Class CompoundWordTokenFilterBase
java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.TokenFilter
        org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>
Direct Known Subclasses:
DictionaryCompoundWordTokenFilter, HyphenationCompoundWordTokenFilter
public abstract class CompoundWordTokenFilterBase extends TokenFilter
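Base class for decomposition token filters, which split compound words into subword tokens. Subclasses implement decompose(); this base class passes each original token through unchanged and propagates the subwords that decompose() collects to the output.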
Nested Class Summary
Modifier and Type | Class | Description
protected class | CompoundWordTokenFilterBase.CompoundToken | Helper class to hold decompounded token information.
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
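The summary describes CompoundToken only as a helper that holds decompounded token information. For intuition, here is a Lucene-free sketch of such a holder. The class SubwordToken, its slice() factory, and the idea that a match is described by a start index and a length within the term are assumptions for illustration, not Lucene's code. The offsets are simply whatever the caller passes, because this page does not say which offsets subwords receive.

```java
// Hypothetical stand-in for CompoundWordTokenFilterBase.CompoundToken:
// an immutable holder for one subword cut out of the current term.
// Not Lucene's class; names and fields are illustrative assumptions.
final class SubwordToken {
    final String text;      // the subword characters (copied, not shared)
    final int startOffset;  // start offset to report for this subword
    final int endOffset;    // end offset to report for this subword

    SubwordToken(String text, int startOffset, int endOffset) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    // Copies `length` chars starting at `start` out of a term buffer, the way a
    // decompose() implementation might describe a match by (start, length).
    static SubwordToken slice(char[] termBuffer, int start, int length,
                              int startOffset, int endOffset) {
        if (start < 0 || length < 0 || start + length > termBuffer.length) {
            throw new IndexOutOfBoundsException("slice outside term buffer");
        }
        return new SubwordToken(new String(termBuffer, start, length), startOffset, endOffset);
    }

    @Override
    public String toString() {
        return text + "[" + startOffset + "," + endOffset + ")";
    }
}

public class SubwordTokenDemo {
    public static void main(String[] args) {
        char[] term = "fussball".toCharArray();
        // "ball" starts at index 4 and has length 4.
        SubwordToken t = SubwordToken.slice(term, 4, 4, 0, 8);
        System.out.println(t); // prints "ball[0,8)"
    }
}
```

The sketch copies the characters out of the buffer instead of keeping a reference to it. A token stream typically reuses one term buffer for every token, so a stored reference would be overwritten by the next token.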
Field Summary
Modifier and Type | Field | Description
static int | DEFAULT_MAX_SUBWORD_SIZE | The default maximum length of subwords that get propagated to the output of this filter.
static int | DEFAULT_MIN_SUBWORD_SIZE | The default minimum length of subwords that get propagated to the output of this filter.
static int | DEFAULT_MIN_WORD_SIZE | The default minimum length of words that get decomposed.
protected CharArraySet | dictionary
protected int | maxSubwordSize
protected int | minSubwordSize
protected int | minWordSize
protected OffsetAttribute | offsetAtt
protected boolean | onlyLongestMatch
protected CharTermAttribute | termAtt
protected LinkedList<CompoundWordTokenFilterBase.CompoundToken> | tokens
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor Summary
Modifier | Constructor
protected | CompoundWordTokenFilterBase(TokenStream input, CharArraySet dictionary)
protected | CompoundWordTokenFilterBase(TokenStream input, CharArraySet dictionary, boolean onlyLongestMatch)
protected | CompoundWordTokenFilterBase(TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Method Summary
Modifier and Type | Method | Description
protected abstract void | decompose() | Decomposes the current termAtt and places CompoundWordTokenFilterBase.CompoundToken instances in the tokens list.
boolean | incrementToken()
void | reset()
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, unwrap
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
Field Detail

DEFAULT_MIN_WORD_SIZE
public static final int DEFAULT_MIN_WORD_SIZE
The default minimum length of words that get decomposed.
See Also: Constant Field Values

DEFAULT_MIN_SUBWORD_SIZE
public static final int DEFAULT_MIN_SUBWORD_SIZE
The default minimum length of subwords that get propagated to the output of this filter.
See Also: Constant Field Values

DEFAULT_MAX_SUBWORD_SIZE
public static final int DEFAULT_MAX_SUBWORD_SIZE
The default maximum length of subwords that get propagated to the output of this filter.
See Also: Constant Field Values

dictionary
protected final CharArraySet dictionary

tokens
protected final LinkedList<CompoundWordTokenFilterBase.CompoundToken> tokens

minWordSize
protected final int minWordSize

minSubwordSize
protected final int minSubwordSize

maxSubwordSize
protected final int maxSubwordSize

onlyLongestMatch
protected final boolean onlyLongestMatch

termAtt
protected final CharTermAttribute termAtt

offsetAtt
protected final OffsetAttribute offsetAtt
Constructor Detail

CompoundWordTokenFilterBase
protected CompoundWordTokenFilterBase(TokenStream input, CharArraySet dictionary, boolean onlyLongestMatch)

CompoundWordTokenFilterBase
protected CompoundWordTokenFilterBase(TokenStream input, CharArraySet dictionary)

CompoundWordTokenFilterBase
protected CompoundWordTokenFilterBase(TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
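The three protected constructors form a telescoping set. This page does not spell out how they relate. A plausible reading is that the shorter forms delegate to the full one, filling in the DEFAULT_* constants and, for the two-argument form, onlyLongestMatch = false. The sketch below models that pattern without Lucene. The class names, the validation checks, and the constant values 5, 2 and 15 are assumptions for illustration; check the Constant Field Values page for the real defaults.

```java
import java.util.Set;

// Hypothetical, Lucene-free model of telescoping constructors like those of
// CompoundWordTokenFilterBase. Default values are assumptions, not Lucene's.
abstract class DecompounderConfig {
    static final int DEFAULT_MIN_WORD_SIZE = 5;     // assumed value
    static final int DEFAULT_MIN_SUBWORD_SIZE = 2;  // assumed value
    static final int DEFAULT_MAX_SUBWORD_SIZE = 15; // assumed value

    protected final Set<String> dictionary;
    protected final int minWordSize;
    protected final int minSubwordSize;
    protected final int maxSubwordSize;
    protected final boolean onlyLongestMatch;

    // Two-argument analogue: all sizes default, onlyLongestMatch assumed false.
    protected DecompounderConfig(Set<String> dictionary) {
        this(dictionary, false);
    }

    // Three-argument analogue: sizes default, caller picks onlyLongestMatch.
    protected DecompounderConfig(Set<String> dictionary, boolean onlyLongestMatch) {
        this(dictionary, DEFAULT_MIN_WORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE,
             DEFAULT_MAX_SUBWORD_SIZE, onlyLongestMatch);
    }

    // Full form: every other constructor ends up here.
    protected DecompounderConfig(Set<String> dictionary, int minWordSize,
                                 int minSubwordSize, int maxSubwordSize,
                                 boolean onlyLongestMatch) {
        // Sanity check chosen for this sketch only.
        if (minWordSize < 0 || minSubwordSize < 0 || maxSubwordSize < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        this.dictionary = dictionary;
        this.minWordSize = minWordSize;
        this.minSubwordSize = minSubwordSize;
        this.maxSubwordSize = maxSubwordSize;
        this.onlyLongestMatch = onlyLongestMatch;
    }
}

// Minimal concrete subclass so the abstract base can be instantiated.
class PlainConfig extends DecompounderConfig {
    PlainConfig(Set<String> dict) { super(dict); }
    PlainConfig(Set<String> dict, boolean longest) { super(dict, longest); }
    PlainConfig(Set<String> dict, int minWord, int minSub, int maxSub, boolean longest) {
        super(dict, minWord, minSub, maxSub, longest);
    }
}

public class ConfigDemo {
    public static void main(String[] args) {
        PlainConfig c = new PlainConfig(Set.of("ball", "fuss"));
        System.out.println(c.minWordSize + " " + c.minSubwordSize + " "
                + c.maxSubwordSize + " " + c.onlyLongestMatch); // prints "5 2 15 false"
    }
}
```

All assignment and validation live in the full constructor. This keeps the shorter forms from drifting out of sync with it.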
Method Detail

incrementToken
public final boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException
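incrementToken() is final, so subclasses cannot change the pull loop and instead plug in through decompose(). The model below is a Lucene-free sketch of the loop we assume it implements. Each input token passes through unchanged. When that token is at least minWordSize characters long, decompose() runs, and the queued subwords are drained on later calls before the next input token is pulled. The Token record, the position increment of 0 for subwords (stacking them on the original's position), and the body of reset() are assumptions of this sketch, not details given on this page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Lucene-free sketch of a decompounding filter's pull loop. Token, the
// posInc values, and reset() behavior are illustrative assumptions.
abstract class DecompoundingFilterSketch {
    record Token(String text, int posInc) {}

    private final Iterator<String> input;   // stands in for the upstream TokenStream
    protected final int minWordSize;
    protected final LinkedList<String> tokens = new LinkedList<>(); // pending subwords
    protected String current;               // the term being decomposed
    private Token emitted;                  // what a caller reads after each call

    DecompoundingFilterSketch(Iterator<String> input, int minWordSize) {
        this.input = input;
        this.minWordSize = minWordSize;
    }

    // Hook for subclasses: add subwords of `current` to `tokens`.
    // The original term must not be added; it is emitted automatically.
    protected abstract void decompose();

    public final boolean incrementToken() {
        if (!tokens.isEmpty()) {
            // Drain queued subwords first, stacked at the same position.
            emitted = new Token(tokens.removeFirst(), 0);
            return true;
        }
        if (!input.hasNext()) {
            return false;
        }
        current = input.next();
        if (current.length() >= minWordSize) {
            decompose(); // fills `tokens` for the following calls
        }
        emitted = new Token(current, 1); // original token passes through
        return true;
    }

    public Token token() { return emitted; }

    public void reset() {
        tokens.clear(); // drop subwords left over from an abandoned stream
        current = null;
        emitted = null;
    }
}

// Hypothetical subclass: queues every dictionary entry contained in the term.
class ContainsDecompounder extends DecompoundingFilterSketch {
    private final List<String> dictionary;

    ContainsDecompounder(Iterator<String> input, int minWordSize, List<String> dictionary) {
        super(input, minWordSize);
        this.dictionary = dictionary;
    }

    @Override
    protected void decompose() {
        for (String w : dictionary) {
            if (current.contains(w)) {
                tokens.add(w);
            }
        }
    }
}

public class PullLoopDemo {
    public static void main(String[] args) {
        var filter = new ContainsDecompounder(
                List.of("fussball", "ei").iterator(), 5, List.of("fuss", "ball"));
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            DecompoundingFilterSketch.Token t = filter.token();
            out.add(t.text() + "/" + t.posInc());
        }
        System.out.println(out); // prints [fussball/1, fuss/0, ball/0, ei/1]
    }
}
```

"ei" is shorter than minWordSize, so it passes through without calling decompose(). A real reset() would also reset the upstream stream. An Iterator cannot be rewound, so this sketch only clears its own state.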
decompose
protected abstract void decompose()
Decomposes the current termAtt and places CompoundWordTokenFilterBase.CompoundToken instances in the tokens list. The original token must not be placed in the list, because this filter passes it through automatically.
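A subclass's decompose() decides which parts of the current term become subword tokens. The sketch below is a stdlib-only model of one plausible dictionary-driven strategy, assumed here rather than taken from DictionaryCompoundWordTokenFilter. It tries every start position and every length between minSubwordSize and maxSubwordSize, and records substrings that appear in the dictionary. With onlyLongestMatch set, it keeps only the longest match per start position. The class name DictionaryDecompose and its static method signature are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical dictionary-driven decomposition, written without Lucene.
// Returns the subwords that a decompose() implementation might queue.
public class DictionaryDecompose {

    static List<String> decompose(String term, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize,
                                  boolean onlyLongestMatch) {
        List<String> subwords = new ArrayList<>();
        int len = term.length();
        for (int start = 0; start + minSubwordSize <= len; start++) {
            String longest = null; // longest match beginning at `start`
            for (int size = minSubwordSize; size <= maxSubwordSize && start + size <= len; size++) {
                String candidate = term.substring(start, start + size);
                if (!dictionary.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = candidate; // sizes ascend, so the last hit is the longest
                } else {
                    subwords.add(candidate);
                }
            }
            if (onlyLongestMatch && longest != null) {
                subwords.add(longest);
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("fuss", "ball", "balls", "all");
        System.out.println(decompose("fussballs", dict, 2, 15, false));
        // prints [fuss, ball, balls, all]
        System.out.println(decompose("fussballs", dict, 2, 15, true));
        // prints [fuss, balls, all]
    }
}
```

In this sketch onlyLongestMatch works per start position. "ball" is dropped in favour of "balls", but "all" survives because it begins at a different index.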
reset
public void reset() throws IOException
Overrides: reset in class TokenFilter
Throws: IOException