Package org.apache.lucene.analysis.ta
Class TamilAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.StopwordAnalyzerBase
org.apache.lucene.analysis.ta.TamilAnalyzer
- All Implemented Interfaces:
- Closeable, AutoCloseable
Analyzer for Tamil.
- Since:
- 9.0
- 
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
- Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
- 
Field Summary
Fields
- DEFAULT_STOPWORD_FILE: File containing default Tamil stopwords.
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
- stopwords
Fields inherited from class org.apache.lucene.analysis.Analyzer
- GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
- 
Constructor Summary
Constructors
- TamilAnalyzer(): Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.
- TamilAnalyzer(CharArraySet stopwords): Builds an analyzer with the given stop words.
- TamilAnalyzer(CharArraySet stopwords, CharArraySet stemExclusionSet): Builds an analyzer with the given stop words and stem exclusion set.
- 
Method Summary
Modifier and Type, Method, Description
- protected Analyzer.TokenStreamComponents createComponents(String fieldName): Creates Analyzer.TokenStreamComponents used to tokenize all the text in the provided Reader.
- static CharArraySet getDefaultStopSet(): Returns an unmodifiable instance of the default stop-words set.
- protected TokenStream normalize(String fieldName, TokenStream in)
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
- getStopwordSet, loadStopwordSet, loadStopwordSet
Methods inherited from class org.apache.lucene.analysis.Analyzer
- attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream
- 
Field Details
DEFAULT_STOPWORD_FILE
File containing default Tamil stopwords. The default stopword list is from https://github.com/AshokR/TamilNLP (Apache 2 License).
 
 
- 
- 
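Constructor Details
TamilAnalyzer
public TamilAnalyzer(CharArraySet stopwords, CharArraySet stemExclusionSet)
Builds an analyzer with the given stop words and stem exclusion set.
- Parameters:
- stopwords - a stopword set
- stemExclusionSet - a stemming exclusion set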
 
- 
TamilAnalyzer
public TamilAnalyzer(CharArraySet stopwords)
Builds an analyzer with the given stop words.
- Parameters:
- stopwords - a stopword set
 
- 
TamilAnalyzer
public TamilAnalyzer()
Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.
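As a rough illustration of the three constructors, the sketch below builds each variant and runs a short string through it. The class name `TamilAnalyzerUsage`, the helper `analyze`, the field name "body", and the sample words are made up for this example. It assumes the Lucene core and analysis-common artifacts are on the classpath, and it prints whatever terms the analyzers produce rather than asserting a particular result.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ta.TamilAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TamilAnalyzerUsage {

  /** Hypothetical helper: collects the terms an analyzer emits for the given text. */
  static List<String> analyze(Analyzer analyzer, String text) throws IOException {
    List<String> terms = new ArrayList<>();
    // "body" is an arbitrary field name chosen for this sketch.
    try (TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();                      // required before the first incrementToken()
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();                        // finalizes end-of-stream state
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    String text = "தமிழ் ஒரு மொழி";   // sample input, chosen for illustration only

    // Caller-supplied sets; the second argument (ignoreCase) is false.
    CharArraySet stopwords = new CharArraySet(List.of("ஒரு"), false);
    CharArraySet stemExclusions = new CharArraySet(List.of("தமிழ்"), false);

    try (Analyzer defaults = new TamilAnalyzer();
         Analyzer customStops = new TamilAnalyzer(stopwords);
         Analyzer customStopsAndExclusions = new TamilAnalyzer(stopwords, stemExclusions)) {
      System.out.println("default stop words:   " + analyze(defaults, text));
      System.out.println("custom stop words:    " + analyze(customStops, text));
      System.out.println("with stem exclusions: " + analyze(customStopsAndExclusions, text));
    }
  }
}
```

Since the class implements Closeable, try-with-resources releases each analyzer once it is no longer needed.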
 
- 
- 
Method Details
getDefaultStopSet
public static CharArraySet getDefaultStopSet()
Returns an unmodifiable instance of the default stop-words set.
- Returns:
- an unmodifiable instance of the default stop-words set.
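Because the returned set is unmodifiable, adding project-specific stop words requires working on a copy. The following is a minimal sketch of one way to do that. The class name `TamilStopSetSketch`, the helper `extendDefaults`, and the extra word are hypothetical and not part of Lucene.

```java
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.ta.TamilAnalyzer;

public class TamilStopSetSketch {

  /** Hypothetical helper: copies the default stop set and adds the given words. */
  static CharArraySet extendDefaults(String... extraWords) {
    // The copy constructor yields a modifiable set; the default set itself is left untouched.
    CharArraySet extended = new CharArraySet(TamilAnalyzer.getDefaultStopSet(), false);
    for (String word : extraWords) {
      extended.add(word);
    }
    return extended;
  }

  public static void main(String[] args) {
    CharArraySet stopwords = extendDefaults("example"); // placeholder extra stop word
    try (TamilAnalyzer analyzer = new TamilAnalyzer(stopwords)) {
      // getStopwordSet() is inherited from StopwordAnalyzerBase.
      System.out.println("stop words in use: " + analyzer.getStopwordSet().size());
    }
  }
}
```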
 
- 
createComponents
protected Analyzer.TokenStreamComponents createComponents(String fieldName)
Creates Analyzer.TokenStreamComponents used to tokenize all the text in the provided Reader.
- Specified by:
- createComponents in class Analyzer
- Returns:
- Analyzer.TokenStreamComponents built from a StandardTokenizer filtered with LowerCaseFilter, DecimalDigitFilter, IndicNormalizationFilter, SetKeywordMarkerFilter if a stem exclusion set is provided, SnowballFilter, and Tamil stop words
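The filter chain described in the return value can be approximated with a hand-written Analyzer subclass. This is a sketch under stated assumptions, not the actual TamilAnalyzer source: the class name `TamilLikeAnalyzer` is hypothetical, and the exact position of the stop filter in the chain is an assumption, since the description only says Tamil stop words are applied.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.DecimalDigitFilter;
import org.apache.lucene.analysis.in.IndicNormalizationFilter;
import org.apache.lucene.analysis.miscellaneous.SetKeywordMarkerFilter;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

/** Hypothetical analyzer that mirrors the documented chain; not part of Lucene. */
public class TamilLikeAnalyzer extends Analyzer {
  private final CharArraySet stopwords;
  private final CharArraySet stemExclusions;

  public TamilLikeAnalyzer(CharArraySet stopwords, CharArraySet stemExclusions) {
    this.stopwords = stopwords;
    this.stemExclusions = stemExclusions;
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new StandardTokenizer();
    TokenStream result = new LowerCaseFilter(source);
    result = new DecimalDigitFilter(result);        // folds Unicode digits to 0-9
    result = new IndicNormalizationFilter(result);
    result = new StopFilter(result, stopwords);     // assumed position in the chain
    if (!stemExclusions.isEmpty()) {
      // Terms in the exclusion set are marked as keywords so the stemmer skips them.
      result = new SetKeywordMarkerFilter(result, stemExclusions);
    }
    result = new SnowballFilter(result, "Tamil");   // named Snowball stemmer
    return new TokenStreamComponents(source, result);
  }
}
```

Writing the chain out this way is mainly useful when one stage needs to be swapped or removed. When the documented behavior is sufficient, the constructors of TamilAnalyzer cover the same configuration.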
 
- 
normalize
protected TokenStream normalize(String fieldName, TokenStream in)
 
-