All Classes and Interfaces
Class
Description
Default ICUTokenizerConfig that is generally applicable to many languages.
Extension of CharTermAttributeImpl that encodes the term text as a binary Unicode collation key instead of as UTF-8 bytes.
Converts each token into its CollationKey, and then encodes bytes as an index term.
Indexes collation keys as a single-valued SortedDocValuesField.
Configures KeywordTokenizer with ICUCollationAttributeFactory.
A TokenFilter that applies search term folding to Unicode text, applying foldings from UTR#30 Character Foldings.
Factory for ICUFoldingFilter.
Normalize token text with ICU's Normalizer2.
Factory for ICUNormalizer2CharFilter.
Normalize token text with ICU's Normalizer2.
Factory for ICUNormalizer2Filter.
Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
Class that allows for tailored Unicode Text Segmentation on a per-writing system basis.
Factory for ICUTokenizer.
A TokenFilter that transforms text with ICU.
Factory for ICUTransformFilter.
This attribute stores the UTR #24 script value for a token of text.
Implementation of ScriptAttribute that stores the script as an integer.
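
Three ideas recur in the descriptions above: encoding a term as binary collation key bytes, normalizing text, and recording a token's script. The sketch below illustrates them using only JDK classes (java.text.Collator, java.text.Normalizer, Character.UnicodeScript) as stand-ins. It does not use the ICU or analysis classes listed in this index, and the JDK collator and normalizer may not behave identically to ICU's. The class name IcuAnalogSketch and its method names are hypothetical, chosen for illustration only.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical JDK-only sketch; not part of the library indexed above. */
public class IcuAnalogSketch {

    /**
     * Encodes a term as binary collation key bytes instead of UTF-8 bytes.
     * Assumption: the JDK Collator at PRIMARY strength stands in for an ICU collator.
     */
    public static byte[] collationKeyBytes(String term, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY); // ignore case differences
        return collator.getCollationKey(term).toByteArray();
    }

    /** Normalizes text to NFKC with the JDK Normalizer (stand-in for ICU's Normalizer2). */
    public static String normalizeNfkc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    /** Returns the Unicode script of the token's first code point. */
    public static Character.UnicodeScript scriptOf(String token) {
        return Character.UnicodeScript.of(token.codePointAt(0));
    }

    public static void main(String[] args) {
        byte[] upper = collationKeyBytes("Resume", Locale.ENGLISH);
        byte[] lower = collationKeyBytes("resume", Locale.ENGLISH);
        // At PRIMARY strength the two spellings yield identical key bytes.
        System.out.println(Arrays.equals(upper, lower));

        // The "fi" ligature (U+FB01) decomposes to the two letters "fi" under NFKC.
        System.out.println(normalizeNfkc("\uFB01"));

        // U+044F is a Cyrillic letter.
        System.out.println(scriptOf("\u044F"));
    }
}
```

Collation key byte arrays are ordered most significant byte first, so comparing them as unsigned bytes (for example with Arrays.compareUnsigned) reproduces the collator's ordering. That property is what makes such bytes usable as sortable index terms.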