All Classes and Interfaces
Class
Description
Base class for payload encoders.
Abstract parent class for analysis factories that accept a stopwords file as input.
An object representing the analysis result of a simple (non-compound) word
An object representing a prefix or a suffix applied to a word stem
Internal class used by Snowball stemmers
Strips all characters after an apostrophe (including the apostrophe itself).
Factory for ApostropheFilter.
Analyzer for Arabic.
A TokenFilter that applies ArabicNormalizer to normalize the orthography.
Factory for ArabicNormalizationFilter.
A TokenFilter that applies ArabicStemmer to stem Arabic words.
Factory for ArabicStemFilter.
This class implements the stemming algorithm defined by a snowball script.
Analyzer for Armenian.
This class implements the stemming algorithm defined by a snowball script.
This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the
 first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one
 exists.
Factory for ASCIIFoldingFilter.
Base utility class for implementing a CharFilter.
Analyzer for Basque.
This class implements the stemming algorithm defined by a snowball script.
Analyzer for Bengali.
A TokenFilter that applies BengaliNormalizer to normalize the orthography.
Factory for BengaliNormalizationFilter.
A TokenFilter that applies BengaliStemmer to stem Bengali words.
Factory for BengaliStemFilter.
Abstract dictionary base class.
Abstract base dictionary writer class.
Analyzer for Brazilian Portuguese language.
A TokenFilter that applies BrazilianStemmer.
Factory for BrazilianStemFilter.
Analyzer for Bulgarian.
A TokenFilter that applies BulgarianStemmer to stem Bulgarian words.
Factory for BulgarianStemFilter.
This class implements a simple byte vector with access to the underlying array.
A filter to apply normal capitalization rules to Tokens.
Factory for CapitalizationFilter.
Analyzer for Catalan.
This class implements the stemming algorithm defined by a snowball script.
Character category data.
Functional interface to lookup character class
Writes character definition file
A CharacterIterator used internally for use with BreakIterator.
An abstract base class for simple, character-oriented tokenizers.
This class implements a simple char vector with access to the underlying array.
An Analyzer that tokenizes text with StandardTokenizer, normalizes content with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK with CJKBigramFilter, and filters stopwords with StopFilter.
Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.
Factory for CJKBigramFilter.
A CharFilter that normalizes CJK width differences:
   Folds fullwidth ASCII variants into the equivalent basic latin
   Folds halfwidth Katakana variants into the equivalent kana
Factory for CJKWidthCharFilter.
A TokenFilter that normalizes CJK width differences:
   Folds fullwidth ASCII variants into the equivalent basic latin
   Folds halfwidth Katakana variants into the equivalent kana
Factory for CJKWidthFilter.
Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
Normalizes tokens extracted with ClassicTokenizer.
Factory for ClassicFilter.
A grammar-based tokenizer constructed with JFlex
Factory for ClassicTokenizer.
Removes words that are too long or too short from the stream.
Factory for CodepointCountFilter.
Extension of CharTermAttributeImpl that encodes the term text as a binary Unicode collation key instead of as UTF-8 bytes.
Converts each token into its CollationKey, and then encodes the bytes as an index term.
Indexes collation keys as a single-valued SortedDocValuesField.
Configures KeywordTokenizer with CollationAttributeFactory.
Construct bigrams for frequently occurring terms while indexing.
Constructs a CommonGramsFilter.
Wrap a CommonGramsFilter optimizing phrase queries by only returning single words when they are not a member of a bigram.
Construct CommonGramsQueryFilter.
Base class for decomposition token filters.
Concatenates/Joins every incoming token with a separator into one output token for every path
 through the token stream (which is a graph).
Attribute providing access to the term builder and UTF-16 conversion
Implementation of ConcatenateGraphFilter.BytesRefBuilderTermAttribute
Factory for ConcatenateGraphFilter.
A TokenStream that takes an array of input TokenStreams as sources, and concatenates them together.
Allows skipping TokenFilters based on the current set of attributes.
Abstract parent class for analysis factories that create ConditionalTokenFilter instances
n-gram connection cost data
Writes connection costs
Utility class for parsing CSV text
A general-purpose Analyzer that can be created with a builder-style API.
Builder for CustomAnalyzer.
Factory class for a ConditionalTokenFilter
Analyzer for Czech language.
A TokenFilter that applies CzechStemmer to stem Czech words.
Factory for CzechStemFilter.
Analyzer for Danish.
This class implements the stemming algorithm defined by a snowball script.
Filters all tokens that cannot be parsed to a date, using the provided DateFormat.
Factory for DateRecognizerFilter.
Folds all Unicode digits in [:General_Category=Decimal_Number:] to Basic Latin digits (0-9).
Factory for DecimalDigitFilter.
Characters before the delimiter are the "token", those after are the boost.
Factory for DelimitedBoostTokenFilter.
Characters before the delimiter are the "token", those after are the payload.
Factory for DelimitedPayloadTokenFilter.
Characters before the delimiter are the "token", the textual integer after is the term frequency.
Factory for DelimitedTermFrequencyTokenFilter.
An object representing homonym dictionary entries.
An object representing *.dic file entry with its word, flags and morphological data.
In-memory structure for the dictionary (.dic) and affix (.aff) data of a hunspell dictionary.
High-level dictionary interface for morphological analyzers.
A TokenFilter that decomposes compound words found in many Germanic languages.
Factory for DictionaryCompoundWordTokenFilter.
Abstract writer class to write dictionary entries.
Dl4jModelReader reads the file generated by the Deeplearning4j library and provides a Word2VecModel with normalized vectors.
Allows Tokens with a given combination of flags to be dropped.
Provides a filter that will drop tokens matching a set of flags.
Analyzer for Dutch language.
This class implements the stemming algorithm defined by a snowball script.
Creates new instances of EdgeNGramTokenFilter.
Tokenizes the given token into n-grams of given size(s).
Tokenizes the input from an edge into n-grams of given size(s).
Creates new instances of EdgeNGramTokenizer.
Removes elisions from a TokenStream.
Factory for ElisionFilter.
An always exhausted token stream.
Analyzer for English.
A TokenFilter that applies EnglishMinimalStemmer to stem English words.
Factory for EnglishMinimalStemFilter.
TokenFilter that removes possessives (trailing 's) from words.
Factory for EnglishPossessiveFilter.
This class implements the stemming algorithm defined by a snowball script.
Suggestion to add/edit dictionary entries to generate a given list of words created by WordFormGenerator.compress(java.util.List<java.lang.String>, java.util.Set<java.lang.String>, java.lang.Runnable).
Analyzer for Estonian.
This class implements the stemming algorithm defined by a snowball script.
Simple ResourceLoader that opens resource files from the local file system, optionally resolving against a base directory.
Filter outputs a single token which is a concatenation of the sorted and de-duplicated set of input tokens.
Factory for FingerprintFilter.
Analyzer for Finnish.
A TokenFilter that applies FinnishLightStemmer to stem Finnish words.
Factory for FinnishLightStemFilter.
This class implements the stemming algorithm defined by a snowball script.
Deprecated.
Fix the token filters that create broken offsets in the first place.
Deprecated.
A FixedShingleFilter constructs shingles (token n-grams) from a token stream.
Factory for FixedShingleFilter
Converts an incoming graph token stream, such as one from SynonymGraphFilter, into a flat form so that all nodes form a single linear chain with no side paths.
Factory for FlattenGraphFilter.
Encode a character array Float as a BytesRef.
An oracle for quickly checking that a specific part of a word can never be a valid word.
Analyzer for French language.
A TokenFilter that applies FrenchLightStemmer to stem French words.
Factory for FrenchLightStemFilter.
A TokenFilter that applies FrenchMinimalStemmer to stem French words.
Factory for FrenchMinimalStemFilter.
This class implements the stemming algorithm defined by a snowball script.
Analyzer for Galician.
A TokenFilter that applies GalicianMinimalStemmer to stem Galician words.
Factory for GalicianMinimalStemFilter.
A TokenFilter that applies GalicianStemmer to stem Galician words.
Factory for GalicianStemFilter.
Analyzer for German language.
A TokenFilter that applies GermanLightStemmer to stem German words.
Factory for GermanLightStemFilter.
A TokenFilter that applies GermanMinimalStemmer to stem German words.
Factory for GermanMinimalStemFilter.
Normalizes German characters according to the heuristics of the German snowball algorithm.
Factory for GermanNormalizationFilter.
A TokenFilter that stems German words.
Factory for GermanStemFilter.
This class implements the stemming algorithm defined by a snowball script.
Outputs the dot (graphviz) string for the viterbi lattice.
Dictionary provider
Analyzer for the Greek language.
Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma to sigma.
Factory for GreekLowerCaseFilter.
A TokenFilter that applies GreekStemmer to stem Greek words.
Factory for GreekStemFilter.
This class implements the stemming algorithm defined by a snowball script.
Analyzer for Hindi.
A TokenFilter that applies HindiNormalizer to normalize the orthography.
Factory for HindiNormalizationFilter.
A TokenFilter that applies HindiStemmer to stem Hindi words.
Factory for HindiStemFilter.
This class implements the stemming algorithm defined by a snowball script.
A CharFilter that wraps another Reader and attempts to strip out HTML constructs.
Factory for HTMLStripCharFilter.
Analyzer for Hungarian.
A TokenFilter that applies HungarianLightStemmer to stem Hungarian words.
Factory for HungarianLightStemFilter.
This class implements the stemming algorithm defined by a snowball script.
A spell checker based on Hunspell dictionaries.
TokenFilter that uses hunspell affix rules and words to stem tokens.
TokenFilterFactory that creates instances of HunspellStemFilter.
This class represents a hyphen.
When the plain text is extracted from documents, we will often have many words hyphenated and
 broken into two lines.
Factory for HyphenatedWordsFilter.
This class represents a hyphenated word.
A TokenFilter that decomposes compound words found in many Germanic languages.
Factory for HyphenationCompoundWordTokenFilter.
This tree structure stores the hyphenation patterns in an efficient way for fast lookup.
Does nothing other than convert the char array to a byte array using the specified encoding.
A TokenFilter that applies IndicNormalizer to normalize text in Indian Languages.
Factory for IndicNormalizationFilter.
Analyzer for Indonesian (Bahasa)
A TokenFilter that applies IndonesianStemmer to stem Indonesian words.
Factory for IndonesianStemFilter.
This class implements the stemming algorithm defined by a snowball script.
Encode a character array Integer as a BytesRef.
Analyzer for Irish.
Normalises token text to lower case, handling t-prothesis and n-eclipsis (i.e., that 'nAthair' should become 'n-athair')
Factory for IrishLowerCaseFilter.
This class implements the stemming algorithm defined by a snowball script.
Analyzer for Italian.
A TokenFilter that applies ItalianLightStemmer to stem Italian words.
Factory for ItalianLightStemFilter.
This class implements the stemming algorithm defined by a snowball script.
A TokenFilter that only keeps tokens with text contained in the required words.
Factory for KeepWordFilter.
"Tokenizes" the entire stream as a single token.
Marks terms as keywords via the KeywordAttribute.
Factory for KeywordMarkerFilter.
This TokenFilter emits each incoming token twice, once as keyword and once as non-keyword; in other words, once with KeywordAttribute.setKeyword(boolean) set to true and once set to false.
Factory for KeywordRepeatFilter.
Emits the entire input as a single token.
Factory for KeywordTokenizer.
A high-performance kstem filter for English.
Factory for KStemFilter.
Analyzer for Latvian.
A TokenFilter that applies LatvianStemmer to stem Latvian words.
Factory for LatvianStemFilter.
Removes words that are too long or too short from the stream.
Factory for LengthFilter.
A LetterTokenizer is a tokenizer that divides text at non-letters.
Factory for LetterTokenizer.
This Analyzer limits the number of tokens while indexing.
This TokenFilter limits the number of tokens while indexing.
Factory for LimitTokenCountFilter.
Lets all tokens pass through until it sees one with a start offset <= a configured limit, which won't pass and ends the stream.
Factory for LimitTokenOffsetFilter.
This TokenFilter limits its emitted tokens to those with positions that are not greater than the configured limit.
Factory for LimitTokenPositionFilter.
Analyzer for Lithuanian.
This class implements the stemming algorithm defined by a snowball script.
Normalizes token text to lower case.
Factory for LowerCaseFilter.
Simplistic CharFilter that applies the mappings contained in a NormalizeCharMap to the character stream, and corrects the resulting changes to the offsets.
Factory for MappingCharFilter.
Generate min hash tokens from an incoming stream of tokens.
High-level interface that represents morphological information in a dictionary
Analyzer for Nepali.
This class implements the stemming algorithm defined by a snowball script.
Factory for NGramTokenFilter.
A FragmentChecker based on all character n-grams possible in a certain language, keeping them in a relatively memory-efficient, but probabilistic data structure.
A callback for n-gram ranges in words
Tokenizes the input into n-grams of the given size(s).
Tokenizes the input into n-grams of the given size(s).
Factory for NGramTokenizer.
Holds a map of String input to String output, to be used with MappingCharFilter.
Builds a NormalizeCharMap.
Analyzer for Norwegian.
A TokenFilter that applies NorwegianLightStemmer to stem Norwegian words.
Factory for NorwegianLightStemFilter.
A TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian words.
Factory for NorwegianMinimalStemFilter.
This filter normalizes use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (ae, oe, aa) by transforming them to åÅæÆøØ.
Factory for NorwegianNormalizationFilter.
This class implements the stemming algorithm defined by a snowball script.
Assigns a payload to a token based on the TypeAttribute
Factory for NumericPayloadTokenFilter.
A StringBuilder that allows one to access the array.
Tokenizer for path-like hierarchies.
Factory for PathHierarchyTokenizer.
Factory for PatternCaptureGroupTokenFilter.
CaptureGroup uses Java regexes to emit multiple tokens - one for each capture group in one or more patterns.
This interface is used to connect the XML pattern file parser to the hyphenation tree.
Marks terms as keywords via the KeywordAttribute.
A SAX document handler to read and parse hyphenation patterns from an XML file.
CharFilter that uses a regular expression for the target of replace string.
Factory for PatternReplaceCharFilter.
A TokenFilter which applies a Pattern to each token in the stream, replacing match occurrences with the specified replacement string.
Factory for PatternReplaceFilter.
This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.
Factory for PatternTokenizer.
Set a type attribute to a parameterized value when tokens are matched by any of several regex patterns.
Value holding class for pattern typing rules.
Provides a filter that will analyze tokens with the analyzer from an arbitrary field type.
Mainly for use with the DelimitedPayloadTokenFilter, converts char buffers to BytesRef.
Utility methods for encoding payloads.
This analyzer is used to facilitate scenarios where different fields require different analysis
 techniques.
Analyzer for Persian.
CharFilter that replaces instances of Zero-width non-joiner with an ordinary space.
Factory for PersianCharFilter.
A TokenFilter that applies PersianNormalizer to normalize the orthography.
Factory for PersianNormalizationFilter.
A TokenFilter that applies PersianStemmer to stem Persian words.
Factory for PersianStemFilter.
Transforms the token stream as per the Porter stemming algorithm.
Factory for PorterStemFilter.
This class implements the stemming algorithm defined by a snowball script.
Analyzer for Portuguese.
A TokenFilter that applies PortugueseLightStemmer to stem Portuguese words.
Factory for PortugueseLightStemFilter.
A TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words.
Factory for PortugueseMinimalStemFilter.
A TokenFilter that applies PortugueseStemmer to stem Portuguese words.
Factory for PortugueseStemFilter.
This class implements the stemming algorithm defined by a snowball script.
A ConditionalTokenFilter that only applies its wrapped filters to tokens that are not contained
 in a protected set.
Factory for a ProtectedTermFilter
An Analyzer used primarily at query time to wrap another analyzer and provide a layer of protection which prevents very common words from being passed into queries.
A TokenFilter which filters out Tokens at the same position and Term text as the previous token in the stream.
Factory for RemoveDuplicatesTokenFilter.
Tokenizer for domain-like hierarchies.
Reverse token string, for example "country" => "yrtnuoc".
Factory for ReverseStringFilter.
Acts like a forever growing char[] as you read characters into it from the provided reader, but internally it uses a circular buffer to only hold the characters that haven't been freed yet.
Analyzer for Romanian.
TokenFilter that normalizes cedilla forms to comma forms.
Factory for RomanianNormalizationFilter.
This class implements the stemming algorithm defined by a snowball script.
Base class for stemmers that use a set of RSLP-like stemming steps.
A basic rule, with no exceptions.
A rule with a set of whole-word exceptions.
A rule with a set of exceptional suffixes.
A step containing a list of rules.
Analyzer for Russian language.
A TokenFilter that applies RussianLightStemmer to stem Russian words.
Factory for RussianLightStemFilter.
This class implements the stemming algorithm defined by a snowball script.
This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.
Factory for ScandinavianFoldingFilter.
This filter normalizes use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
Factory for ScandinavianNormalizationFilter.
This Normalizer does the heavy lifting for a set of Scandinavian normalization filters, normalizing use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
List of possible foldings that can be used when configuring the filter
Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words.
Analyzer for Serbian.
Normalizes Serbian Cyrillic and Latin characters to "bald" Latin.
Factory for SerbianNormalizationFilter.
Normalizes Serbian Cyrillic to Latin.
This class implements the stemming algorithm defined by a snowball script.
Marks terms as keywords via the KeywordAttribute.
A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.
A ShingleFilter constructs shingles (token n-grams) from a token stream.
Factory for ShingleFilter.
Factory for SimplePatternSplitTokenizer, for producing tokens by splitting according to the provided regexp.
Factory for SimplePatternTokenizer, for matching tokens based on the provided regexp.
A filter that stems words using a Snowball-generated stemmer.
Factory for SnowballFilter, with configurable language
Base class for a snowball stemmer
Parent class of all snowball stemmers, which must implement stem
Parser for the Solr synonyms format.
Analyzer for Sorani Kurdish.
A TokenFilter that applies SoraniNormalizer to normalize the orthography.
Factory for SoraniNormalizationFilter.
A TokenFilter that applies SoraniStemmer to stem Sorani words.
Factory for SoraniStemFilter.
The strategy defining how a Hunspell dictionary should be loaded, with different tradeoffs.
Analyzer for Spanish.
A TokenFilter that applies SpanishLightStemmer to stem Spanish words.
Factory for SpanishLightStemFilter.
Deprecated.
Use SpanishPluralStemFilter instead.
Deprecated.
Use SpanishPluralStemFilterFactory instead
A TokenFilter that applies SpanishPluralStemmer to stem Spanish words.
Factory for SpanishPluralStemFilter.
This class implements the stemming algorithm defined by a snowball script.
Provides the ability to override any KeywordAttribute aware stemmer with custom dictionary-based stemming.
This builder builds an FST for the StemmerOverrideFilter
A read-only 4-byte FST backed map that allows fast case-insensitive key value lookups for StemmerOverrideFilter
Factory for StemmerOverrideFilter.
Some commonly-used stemming functions
Removes stop words from a token stream.
Factory for StopFilter.
A generator for misspelled word corrections based on Hunspell flags.
An exception thrown when Hunspell.suggest(java.lang.String) call takes too long, if TimeoutPolicy.THROW_EXCEPTION is used.
Analyzer for Swedish.
A TokenFilter that applies SwedishLightStemmer to stem Swedish words.
Factory for SwedishLightStemFilter.
A TokenFilter that applies SwedishMinimalStemmer to stem Swedish words.
Factory for SwedishMinimalStemFilter.
This class implements the stemming algorithm defined by a snowball script.
Deprecated.
Use SynonymGraphFilter instead, but be sure to also use FlattenGraphFilter at index time (not at search time) as well.
Deprecated.
Use SynonymGraphFilterFactory instead, but be sure to also use FlattenGraphFilterFactory at index time (not at search time) as well.
Applies single- or multi-token synonyms from a SynonymMap to an incoming TokenStream, producing a fully correct graph output.
Factory for SynonymGraphFilter.
A map of synonyms, keys and values are phrases.
Builds an FSTSynonymMap.
Abstraction for parsing synonym files.
Analyzer for Tamil.
This class implements the stemming algorithm defined by a snowball script.
This TokenFilter provides the ability to set aside attribute states that have already been
 analyzed.
TokenStream output from a tee.
Analyzer for Telugu.
A TokenFilter that applies TeluguNormalizer to normalize the orthography.
Factory for TeluguNormalizationFilter.
A TokenFilter that applies TeluguStemmer to stem Telugu words.
Factory for TeluguStemFilter.
Wraps a term and boost
Ternary Search Tree.
Analyzer for Thai language.
Tokenizer that uses BreakIterator to tokenize Thai text.
Factory for ThaiTokenizer.
A strategy determining what to do when Hunspell API calls take too much time
Analyzed token with morphological data.
Thin wrapper around an FST with root-arc caching.
Adds the OffsetAttribute.startOffset() and OffsetAttribute.endOffset() First 4 bytes are the start
Factory for TokenOffsetPayloadTokenFilter.
Token type reflecting the original source of this token
Trims leading and trailing whitespace from Tokens in the stream.
Factory for TrimFilter.
A token filter for truncating the terms into a specific length.
Factory for TruncateTokenFilter.
Analyzer for Turkish.
Normalizes Turkish token text to lower case.
Factory for TurkishLowerCaseFilter.
This class implements the stemming algorithm defined by a snowball script.
Makes the TypeAttribute a payload.
Factory for TypeAsPayloadTokenFilter.
Adds the TypeAttribute.type() as a synonym, i.e.
Factory for TypeAsSynonymFilter.
Removes tokens whose types appear in a set of blocked types from a token stream.
Factory class for TypeTokenFilter.
Filters UAX29URLEmailTokenizer with LowerCaseFilter and StopFilter, using a list of English stop words.
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
Factory for UAX29URLEmailTokenizer.
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
This file contains unicode properties used by various CharTokenizers.
An Analyzer that uses UnicodeWhitespaceTokenizer.
A UnicodeWhitespaceTokenizer is a tokenizer that divides text at whitespace.
Normalizes token text to UPPER CASE.
Factory for UpperCaseFilter.
Performs Viterbi algorithm for morphological Tokenizers, which split texts by Hidden Markov Model or Conditional Random Fields.
Holds all back pointers arriving to this position.
Holds partial graph (array of positions) for calculating the minimum cost path
Viterbi subclass for n-best path calculation.
Yet another lattice data structure for keeping n-best path.
Viterbi.Position extension; this holds all forward pointers to calculate n-best path.
An Analyzer that uses WhitespaceTokenizer.
A tokenizer that divides text at whitespace characters as defined by Character.isWhitespace(int).
Factory for WhitespaceTokenizer.
Extension of StandardTokenizer that is aware of Wikipedia syntax.
Factory for WikipediaTokenizer.
Word2VecModel is a class representing the parsed Word2Vec model containing the vectors for each word in the dictionary
Applies single-token synonyms from a Word2Vec trained network to an incoming TokenStream.
Factory for Word2VecSynonymFilter.
The Word2VecSynonymProvider generates the list of synonyms of a term.
Supplies a Word2VecSynonymProvider cache so that multiple instances of Word2VecSynonymFilterFactory do not instantiate multiple instances of the same SynonymProvider.
Deprecated.
Use WordDelimiterGraphFilter instead: it produces a correct token graph so that e.g.
Deprecated.
Use WordDelimiterGraphFilterFactory instead: it produces a correct token graph so that e.g.
Splits words into subwords and performs optional transformations on subword groups, producing a correct token graph so that e.g.
Factory for WordDelimiterGraphFilter.
A BreakIterator-like API for iterating over subwords in text, according to WordDelimiterGraphFilter rules.
A utility class used for generating possible word forms by adding affixes to stems (WordFormGenerator.getAllWordForms(String, String, Runnable)), and suggesting stems and flags to generate the given set of words (WordFormGenerator.compress(List, Set, Runnable)).
Parser for wordnet prolog format
This class implements the stemming algorithm defined by a snowball script.
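
Several entries in this index describe simple token transformations: edge n-grams ("Tokenizes the given token into n-grams of given size(s)"), shingles ("token n-grams") and string reversal ("country" => "yrtnuoc"). The following is a minimal sketch of those three ideas in plain Java, operating on strings rather than token streams. It is not the library's implementation: the class NGramSketch and the methods edgeNGrams, shingles and reverse are hypothetical names introduced here for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the indexed library.
final class NGramSketch {

    // Edge n-grams: prefixes of the token whose length (in code points)
    // runs from minGram up to maxGram, capped at the token length.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        int codePoints = token.codePointCount(0, token.length());
        for (int n = minGram; n <= maxGram && n <= codePoints; n++) {
            out.add(token.substring(0, token.offsetByCodePoints(0, n)));
        }
        return out;
    }

    // Shingles: every run of exactly `size` adjacent tokens,
    // joined with a single space (separator chosen for this sketch).
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    // Reverses the token text; StringBuilder.reverse keeps surrogate pairs intact.
    static String reverse(String token) {
        return new StringBuilder(token).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("lucene", 1, 3));                        // [l, lu, luc]
        System.out.println(shingles(List.of("please", "divide", "this"), 2)); // [please divide, divide this]
        System.out.println(reverse("country"));                               // yrtnuoc
    }
}
```

Real token filters also track offsets, position increments and other attributes for each token; the sketch leaves those out to keep the core transformation visible.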