Package org.apache.lucene.analysis.bn
Class BengaliAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.StopwordAnalyzerBase
org.apache.lucene.analysis.bn.BengaliAnalyzer
- All Implemented Interfaces:
- Closeable, AutoCloseable
Analyzer for Bengali.
- Since:
- 7.1.0
- 
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
- 
Field Summary
Fields
- DEFAULT_STOPWORD_FILE: File containing default Bengali stopwords.
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
stopwords
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
- 
Constructor Summary
Constructors
- BengaliAnalyzer(): Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.
- BengaliAnalyzer(CharArraySet stopwords): Builds an analyzer with the given stop words.
- BengaliAnalyzer(CharArraySet stopwords, CharArraySet stemExclusionSet): Builds an analyzer with the given stop words.
- 
Method Summary
Methods (modifier and type, method, description)
- protected Analyzer.TokenStreamComponents createComponents(String fieldName): Creates Analyzer.TokenStreamComponents used to tokenize all the text in the provided Reader.
- static CharArraySet getDefaultStopSet(): Returns an unmodifiable instance of the default stop-words set.
- protected TokenStream normalize(String fieldName, TokenStream in)
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet
Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream
- 
Field Details
DEFAULT_STOPWORD_FILE
File containing default Bengali stopwords. The default stopword list is from http://members.unine.ch/jacques.savoy/clef/bengaliST.txt. The stopword list is BSD-licensed.
- See Also:
 
 
- 
- 
Constructor Details
BengaliAnalyzer
BengaliAnalyzer(CharArraySet stopwords, CharArraySet stemExclusionSet)
Builds an analyzer with the given stop words.
- Parameters:
- stopwords - a stopword set
- stemExclusionSet - a stemming exclusion set
 
- 
BengaliAnalyzer
BengaliAnalyzer(CharArraySet stopwords)
Builds an analyzer with the given stop words.
- Parameters:
- stopwords - a stopword set
 
- 
BengaliAnalyzer
public BengaliAnalyzer()
Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.
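The three constructors above might be used as in the following minimal sketch. It assumes the Lucene analysis module that contains this package is on the classpath. The class name `BengaliAnalyzerConstruction`, its helper methods, and the sample words are hypothetical and chosen only for illustration.

```java
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.bn.BengaliAnalyzer;

// Hypothetical helper class; not part of Lucene.
final class BengaliAnalyzerConstruction {

  /** Uses the default stop words shipped with the analyzer. */
  static Analyzer withDefaults() {
    return new BengaliAnalyzer();
  }

  /** Uses a caller-supplied stop-word list instead of the default one. */
  static Analyzer withStopwords(List<String> words) {
    // Second argument is ignoreCase; false keeps matching case-sensitive.
    CharArraySet stopwords = new CharArraySet(words, false);
    return new BengaliAnalyzer(stopwords);
  }

  /** Additionally supplies a stem exclusion set (see the stemExclusionSet parameter). */
  static Analyzer withStemExclusions(List<String> words, List<String> exclusions) {
    CharArraySet stopwords = new CharArraySet(words, false);
    CharArraySet stemExclusionSet = new CharArraySet(exclusions, false);
    return new BengaliAnalyzer(stopwords, stemExclusionSet);
  }

  public static void main(String[] args) {
    // Sample words are arbitrary illustrations, not taken from the default list.
    List<String> stop = List.of("এবং", "ও");
    List<String> keep = List.of("বাংলাদেশ");

    // Analyzer is Closeable, so try-with-resources releases it when done.
    try (Analyzer a = withDefaults();
         Analyzer b = withStopwords(stop);
         Analyzer c = withStemExclusions(stop, keep)) {
      System.out.println("Created: " + a + ", " + b + ", " + c);
    }
  }
}
```

An analyzer is typically created once and shared rather than built for every document.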
 
- 
- 
Method Details
getDefaultStopSet
Returns an unmodifiable instance of the default stop-words set.
- Returns:
- an unmodifiable instance of the default stop-words set.
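Because the returned set is unmodifiable, adding words means copying it first. The sketch below shows one way this could be done. The class `BengaliStopwordsDemo`, the method `extendedStopSet`, and the extra word are hypothetical names used only for illustration.

```java
import java.util.List;

import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.bn.BengaliAnalyzer;

// Hypothetical helper class; not part of Lucene.
final class BengaliStopwordsDemo {

  /** Returns a new, modifiable set holding the default stop words plus the extras. */
  static CharArraySet extendedStopSet(List<String> extra) {
    CharArraySet defaults = BengaliAnalyzer.getDefaultStopSet();
    // Copy into a fresh set; the default set itself rejects modification.
    CharArraySet merged = new CharArraySet(defaults, false);
    merged.addAll(extra);
    return merged;
  }

  public static void main(String[] args) {
    CharArraySet defaults = BengaliAnalyzer.getDefaultStopSet();
    System.out.println("Default stop words: " + defaults.size());

    // "উদাহরণ" is an arbitrary sample word for this sketch.
    CharArraySet merged = extendedStopSet(List.of("উদাহরণ"));
    System.out.println("Extended stop words: " + merged.size());

    // The merged set can be passed to the BengaliAnalyzer(CharArraySet) constructor.
    try (BengaliAnalyzer analyzer = new BengaliAnalyzer(merged)) {
      System.out.println("Analyzer built with extended stop set: " + analyzer);
    }
  }
}
```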
 
- 
createComponents
Creates Analyzer.TokenStreamComponents used to tokenize all the text in the provided Reader.
- Specified by:
- createComponents in class Analyzer
- Returns:
- Analyzer.TokenStreamComponents built from a StandardTokenizer filtered with LowerCaseFilter, DecimalDigitFilter, IndicNormalizationFilter, BengaliNormalizationFilter, SetKeywordMarkerFilter if a stem exclusion set is provided, BengaliStemFilter, and Bengali stop words
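createComponents is protected, so callers normally reach this filter chain through the inherited tokenStream method. The following sketch shows one way to collect the terms the analyzer emits. The class `BengaliTokenizeDemo`, the method `analyze`, the field name "body", and the sample sentence are assumptions made for illustration. No expected output is shown because the exact terms depend on the stop-word list and the stemmer.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.bn.BengaliAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class; not part of Lucene.
final class BengaliTokenizeDemo {

  /** Runs the analyzer over the text and returns the emitted terms in order. */
  static List<String> analyze(Analyzer analyzer, String field, String text) throws IOException {
    List<String> terms = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream(field, text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();                      // required before the first incrementToken()
      while (ts.incrementToken()) {
        terms.add(term.toString());    // copy the term; the attribute is reused per token
      }
      ts.end();                        // finalizes end-of-stream state
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    try (Analyzer analyzer = new BengaliAnalyzer()) {
      // Arbitrary sample sentence; "body" is an arbitrary field name.
      System.out.println(analyze(analyzer, "body", "আমি বাংলায় গান গাই"));
    }
  }
}
```

The reset, incrementToken, end, close sequence follows the consumer workflow that TokenStream requires. Skipping reset typically results in an exception.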
 
- 
normalize
 
-