Package org.apache.lucene.analysis.en
Class AbstractWordsFileFilterFactory
java.lang.Object
  org.apache.lucene.analysis.AbstractAnalysisFactory
    org.apache.lucene.analysis.TokenFilterFactory
      org.apache.lucene.analysis.en.AbstractWordsFileFilterFactory
- All Implemented Interfaces:
ResourceLoaderAware
- Direct Known Subclasses:
CommonGramsFilterFactory, KeepWordFilterFactory, StopFilterFactory
public abstract class AbstractWordsFileFilterFactory
extends TokenFilterFactory
implements ResourceLoaderAware
Abstract parent class for analysis factories that accept a stopwords file as input.
Concrete implementations can leverage the following input attributes. All attributes are optional:
- ignoreCase: defaults to false.
- words: the name of a stopwords file to parse. If not specified, the factory uses the words provided by the createDefaultWords() implementation in the concrete subclass.
- format: defines how the words file is parsed; defaults to wordset. If words is not specified, format must not be specified either.
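The attribute rules above can be sketched in plain Java as follows. This is a hypothetical illustration, not Lucene's implementation: the class WordsArgsSketch, its Settings record, and the choice to validate format while reading the arguments are assumptions made for the example.

```java
import java.util.Map;

// Hypothetical sketch of the documented "words" / "format" / "ignoreCase"
// rules. Not Lucene code; names are invented for illustration.
public class WordsArgsSketch {

    /** Resolved settings; a null wordFiles means the subclass default words are used. */
    record Settings(String wordFiles, String format, boolean ignoreCase) {}

    static Settings resolve(Map<String, String> args) {
        String wordFiles = args.get("words");
        String format = args.get("format");
        // ignoreCase is optional; Boolean.parseBoolean(null) yields false.
        boolean ignoreCase = Boolean.parseBoolean(args.get("ignoreCase"));

        if (wordFiles == null) {
            if (format != null) {
                // Documented rule: format must not be given without words.
                throw new IllegalArgumentException("'format' requires an explicit 'words' file");
            }
        } else if (format == null) {
            format = "wordset"; // documented default format
        }
        return new Settings(wordFiles, format, ignoreCase);
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of("words", "stopwords.txt")));
        // Settings[wordFiles=stopwords.txt, format=wordset, ignoreCase=false]
        System.out.println(resolve(Map.of("ignoreCase", "true")));
        // Settings[wordFiles=null, format=null, ignoreCase=true]
    }
}
```

A null wordFiles in the result is the case in which a concrete factory would fall back to createDefaultWords().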
The valid values for the format option are:
- wordset: the default format, which supports one word per line (including any intra-word whitespace) and allows whole-line comments beginning with the "#" character. Blank lines are ignored. See WordlistLoader.getLines for details.
- snowball: allows multiple words on each line; trailing comments may be specified using the vertical line ("|") character. Blank lines are ignored. See WordlistLoader.getSnowballWordSet for details.
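A minimal sketch of the two formats, assuming the file contents are already available as a list of lines. WordFileFormatsSketch, parseWordset and parseSnowball are hypothetical names; the real parsing is done by WordlistLoader, and details such as trimming surrounding whitespace and lower-casing entries when ignoreCase is set are assumptions of this example.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the "wordset" and "snowball" formats on in-memory
// lines. Not Lucene's WordlistLoader; behavior details are assumptions.
public class WordFileFormatsSketch {

    private static String normalize(String word, boolean ignoreCase) {
        return ignoreCase ? word.toLowerCase(Locale.ROOT) : word;
    }

    /** wordset: one entry per line, whole-line '#' comments, blank lines skipped. */
    static Set<String> parseWordset(List<String> lines, boolean ignoreCase) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            if (line.startsWith("#")) {
                continue; // whole-line comment
            }
            String entry = line.trim(); // keeps intra-word whitespace such as "new york"
            if (!entry.isEmpty()) {
                words.add(normalize(entry, ignoreCase));
            }
        }
        return words;
    }

    /** snowball: several words per line; text after '|' is a comment. */
    static Set<String> parseSnowball(List<String> lines, boolean ignoreCase) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            int bar = line.indexOf('|');
            String content = bar >= 0 ? line.substring(0, bar) : line;
            for (String token : content.trim().split("\\s+")) {
                if (!token.isEmpty()) { // a blank line splits into one empty token
                    words.add(normalize(token, ignoreCase));
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> wordset = List.of("# English stopwords", "The", "", "new york");
        List<String> snowball = List.of("the a an | articles", "", "of in  | prepositions");
        System.out.println(parseWordset(wordset, true));    // [the, new york]
        System.out.println(parseSnowball(snowball, false)); // [the, a, an, of, in]
    }
}
```

The wordset parser treats a whole line as one entry, which is why "new york" survives as a single word; the snowball parser splits each line on whitespace.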
Field Summary
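Fields
Modifier and Type | Field
static final String | FORMAT_SNOWBALL
static final String | FORMAT_WORDSET
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory:
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion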
Constructor Summary
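Constructors
Modifier | Constructor | Description
protected | AbstractWordsFileFilterFactory() | Default ctor for compatibility with SPI.
protected | AbstractWordsFileFilterFactory(Map<String,String> args) | Initialize this factory via a set of key-value pairs.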
Method Summary
Modifier and Type | Method | Description
protected abstract CharArraySet | createDefaultWords() | Default word set implementation.
String | getFormat() |
String | getWordFiles() |
CharArraySet | getWords() |
void | inform(ResourceLoader loader) | Initialize the set of stopwords provided via ResourceLoader, or using defaults.
boolean | isIgnoreCase() |
Methods inherited from class org.apache.lucene.analysis.TokenFilterFactory:
availableTokenFilters, create, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory:
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Details
FORMAT_WORDSET
public static final String FORMAT_WORDSET
- See Also:
Constant Field Values
FORMAT_SNOWBALL
public static final String FORMAT_SNOWBALL
- See Also:
Constant Field Values
Constructor Details
AbstractWordsFileFilterFactory
protected AbstractWordsFileFilterFactory()
Default ctor for compatibility with SPI.
AbstractWordsFileFilterFactory
protected AbstractWordsFileFilterFactory(Map<String,String> args)
Initialize this factory via a set of key-value pairs.
Method Details
inform
public void inform(ResourceLoader loader) throws IOException
Initialize the set of stopwords provided via ResourceLoader, or using defaults.
- Specified by:
inform in interface ResourceLoaderAware
- Throws:
IOException
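The inform contract described above (load the configured words file if one was given, otherwise fall back to createDefaultWords()) follows the template-method pattern. The sketch below is hypothetical: WordsFactorySketch, TinyStopFactory and InformSketch are invented names, a Function<String, Set<String>> stands in for ResourceLoader, and a plain Set<String> replaces CharArraySet, so no IOException is declared.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of the inform() fallback logic; not Lucene code.
// The Function "loader" is a stand-in for ResourceLoader.
abstract class WordsFactorySketch {
    private final String wordFiles; // null when no "words" attribute was given
    private Set<String> words;

    protected WordsFactorySketch(String wordFiles) {
        this.wordFiles = wordFiles;
    }

    /** Subclasses supply the word set used when no file is configured. */
    protected abstract Set<String> createDefaultWords();

    void inform(Function<String, Set<String>> loader) {
        // Configured file wins; otherwise use the subclass default.
        words = (wordFiles != null) ? loader.apply(wordFiles) : createDefaultWords();
    }

    Set<String> getWords() {
        return words;
    }
}

// Hypothetical subclass with a tiny built-in default list.
class TinyStopFactory extends WordsFactorySketch {
    TinyStopFactory(String wordFiles) {
        super(wordFiles);
    }

    @Override
    protected Set<String> createDefaultWords() {
        return Set.of("a", "an", "the");
    }
}

public class InformSketch {
    public static void main(String[] args) {
        // A toy loader backed by an in-memory map of file name to words.
        Map<String, Set<String>> files = Map.of("custom.txt", Set.of("foo", "bar"));
        Function<String, Set<String>> loader = files::get;

        WordsFactorySketch withFile = new TinyStopFactory("custom.txt");
        withFile.inform(loader);
        System.out.println(withFile.getWords()); // words from custom.txt

        WordsFactorySketch withoutFile = new TinyStopFactory(null);
        withoutFile.inform(loader);
        System.out.println(withoutFile.getWords()); // subclass default words
    }
}
```

Keeping the fallback in the abstract parent means each concrete factory only has to decide what its default words are.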
createDefaultWords
protected abstract CharArraySet createDefaultWords()
Default word set implementation.
getWords
public CharArraySet getWords()
getWordFiles
public String getWordFiles()
getFormat
public String getFormat()
isIgnoreCase
public boolean isIgnoreCase()