Class HyphenationCompoundWordTokenFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter
- All Implemented Interfaces:
- Closeable, AutoCloseable, Unwrappable<TokenStream>
A TokenFilter that decomposes compound words found in many Germanic languages.
For example, "Donaudampfschiff" is decomposed into Donau, dampf, and schiff, so that a search for "schiff" alone still finds "Donaudampfschiff". The filter uses a hyphenation grammar and a word dictionary to find the subwords.
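The decomposition idea can be illustrated with a minimal plain-Java sketch. This is not Lucene's implementation: the candidate break positions are hard-coded here, where the real filter would derive them from the hyphenation grammar, and the class `DecompositionSketch`, the method `decompose`, and its parameters are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DecompositionSketch {

    /**
     * Returns every substring of {@code word} that starts and ends at a
     * candidate break position and is present in {@code dictionary}
     * (compared in lower case). The whole word itself is skipped.
     */
    public static List<String> decompose(String word, int[] breaks, Set<String> dictionary) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < breaks.length; i++) {
            for (int j = i + 1; j < breaks.length; j++) {
                int start = breaks[i];
                int end = breaks[j];
                if (start == 0 && end == word.length()) {
                    continue; // the original word, not a subword
                }
                String candidate = word.substring(start, end);
                if (dictionary.contains(candidate.toLowerCase(Locale.ROOT))) {
                    parts.add(candidate);
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Assumed break positions for "Donaudampfschiff": Donau|dampf|schiff
        int[] breaks = {0, 5, 10, 16};
        Set<String> dictionary = Set.of("donau", "dampf", "schiff");
        System.out.println(decompose("Donaudampfschiff", breaks, dictionary));
        // prints [Donau, dampf, schiff]
    }
}
```

Candidates such as "Donaudampf" are also formed at the break positions but are dropped because the illustrative dictionary does not contain them.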
Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase: CompoundWordTokenFilterBase.CompoundToken
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource: AttributeSource.State
Field Summary
- Fields inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase: DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, maxSubwordSize, minSubwordSize, minWordSize, offsetAtt, onlyLongestMatch, termAtt, tokens
- Fields inherited from class org.apache.lucene.analysis.TokenFilter: input
- Fields inherited from class org.apache.lucene.analysis.TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor Summary
- HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator): Creates a HyphenationCompoundWordTokenFilter with no dictionary.
- HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, int minWordSize, int minSubwordSize, int maxSubwordSize): Creates a HyphenationCompoundWordTokenFilter with no dictionary.
- HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary): Creates a new HyphenationCompoundWordTokenFilter instance.
- HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch): Creates a new HyphenationCompoundWordTokenFilter instance.
- HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch, boolean noSubMatches, boolean noOverlappingMatches): Creates a new HyphenationCompoundWordTokenFilter instance.
Method Summary
- protected void decompose(): Decomposes the current CompoundWordTokenFilterBase.termAtt and places CompoundWordTokenFilterBase.CompoundToken instances in the CompoundWordTokenFilterBase.tokens list.
- static HyphenationTree getHyphenationTree(String hyphenationFilename): Creates a hyphenator tree.
- static HyphenationTree getHyphenationTree(InputSource hyphenationSource): Creates a hyphenator tree.
- Methods inherited from class org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase: incrementToken, reset
- Methods inherited from class org.apache.lucene.analysis.TokenFilter: close, end, unwrap
- Methods inherited from class org.apache.lucene.util.AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
Constructor Details

HyphenationCompoundWordTokenFilter
public HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary)
Creates a new HyphenationCompoundWordTokenFilter instance.
- Parameters:
- input - the TokenStream to process
- hyphenator - the hyphenation pattern tree to use for hyphenation
- dictionary - the word dictionary to match against
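As a usage illustration, the following sketch wires this constructor into a small token stream. It assumes a recent Lucene release with the analysis-common module on the classpath (the package names of `CharArraySet` and `WhitespaceTokenizer` differ in older versions). The grammar path `de_DR.xml`, the three-word dictionary, and the class name `CompoundFilterExample` are placeholders, not values taken from this page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.List;

import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter;
import org.apache.lucene.analysis.compound.hyphenation.HyphenationTree;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CompoundFilterExample {
    public static void main(String[] args) throws IOException {
        // Placeholder path to an XML hyphenation grammar.
        HyphenationTree hyphenator =
            HyphenationCompoundWordTokenFilter.getHyphenationTree("de_DR.xml");

        // Illustrative dictionary; ignoreCase = true so "Donau" matches "donau".
        CharArraySet dictionary =
            new CharArraySet(List.of("donau", "dampf", "schiff"), true);

        Tokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader("Donaudampfschiff"));

        try (TokenStream stream =
                 new HyphenationCompoundWordTokenFilter(tokenizer, hyphenator, dictionary)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                // Prints each token the filter emits, one per line.
                System.out.println(term.toString());
            }
            stream.end();
        }
    }
}
```

Which subwords appear depends on the hyphenation grammar and the dictionary supplied, so no expected output is shown here.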
 
HyphenationCompoundWordTokenFilter
public HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Creates a new HyphenationCompoundWordTokenFilter instance.
- Parameters:
- input - the TokenStream to process
- hyphenator - the hyphenation pattern tree to use for hyphenation
- dictionary - the word dictionary to match against
- minWordSize - only words longer than this get processed
- minSubwordSize - only subwords longer than this get to the output stream
- maxSubwordSize - only subwords shorter than this get to the output stream
- onlyLongestMatch - add only the longest matching subword to the stream
 
HyphenationCompoundWordTokenFilter
public HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch, boolean noSubMatches, boolean noOverlappingMatches)
Creates a new HyphenationCompoundWordTokenFilter instance.
- Parameters:
- input - the TokenStream to process
- hyphenator - the hyphenation pattern tree to use for hyphenation
- dictionary - the word dictionary to match against
- minWordSize - only words longer than this get processed
- minSubwordSize - only subwords longer than this get to the output stream
- maxSubwordSize - only subwords shorter than this get to the output stream
- onlyLongestMatch - add only the longest matching subword to the stream
- noSubMatches - exclude subwords that are enclosed by another token
- noOverlappingMatches - exclude subwords that overlap with another subword
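With nine positional arguments, a call to this constructor is easy to misread. One option is to wrap it in a small helper that labels each flag. The sketch below assumes that the `DEFAULT_*` constants listed in the Field Summary are publicly accessible on `CompoundWordTokenFilterBase`; the class `CompoundFilters` and the method `strictDecompounder` are hypothetical names, and the flag values are only an example configuration.

```java
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase;
import org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter;
import org.apache.lucene.analysis.compound.hyphenation.HyphenationTree;

public final class CompoundFilters {
    private CompoundFilters() {}

    /** Hypothetical helper: wraps {@code input} with every option spelled out. */
    public static TokenStream strictDecompounder(
            TokenStream input, HyphenationTree hyphenator, CharArraySet dictionary) {
        return new HyphenationCompoundWordTokenFilter(
            input,
            hyphenator,
            dictionary,
            CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE,    // minWordSize
            CompoundWordTokenFilterBase.DEFAULT_MIN_SUBWORD_SIZE, // minSubwordSize
            CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE, // maxSubwordSize
            false,  // onlyLongestMatch
            true,   // noSubMatches
            true);  // noOverlappingMatches
    }
}
```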
 
HyphenationCompoundWordTokenFilter
public HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, int minWordSize, int minSubwordSize, int maxSubwordSize)
Creates a HyphenationCompoundWordTokenFilter with no dictionary.
HyphenationCompoundWordTokenFilter
public HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator)
Creates a HyphenationCompoundWordTokenFilter with no dictionary.
 
Method Details

getHyphenationTree
public static HyphenationTree getHyphenationTree(String hyphenationFilename) throws IOException
Creates a hyphenator tree.
- Parameters:
- hyphenationFilename - the filename of the XML grammar to load
- Returns:
- An object representing the hyphenation patterns
- Throws:
- IOException - if there is a low-level I/O error
 
getHyphenationTree
public static HyphenationTree getHyphenationTree(InputSource hyphenationSource) throws IOException
Creates a hyphenator tree.
- Parameters:
- hyphenationSource - the InputSource pointing to the XML grammar
- Returns:
- An object representing the hyphenation patterns
- Throws:
- IOException - if there is a low-level I/O error
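The InputSource overload is convenient when the grammar is not a plain file, for instance when it ships inside a JAR. The following sketch loads it from the classpath; the class `HyphenationLoader`, the method `fromClasspath`, and the resource name passed in are hypothetical, and `InputSource` is assumed to be `org.xml.sax.InputSource`.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilter;
import org.apache.lucene.analysis.compound.hyphenation.HyphenationTree;
import org.xml.sax.InputSource;

public final class HyphenationLoader {
    private HyphenationLoader() {}

    /** Loads an XML hyphenation grammar from a caller-supplied classpath resource. */
    public static HyphenationTree fromClasspath(String resourceName) throws IOException {
        try (InputStream in = HyphenationLoader.class.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Hyphenation grammar not found: " + resourceName);
            }
            // The stream is closed by try-with-resources once parsing has finished.
            return HyphenationCompoundWordTokenFilter.getHyphenationTree(new InputSource(in));
        }
    }
}
```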
 
decompose
protected void decompose()
Description copied from class: CompoundWordTokenFilterBase
Decomposes the current CompoundWordTokenFilterBase.termAtt and places CompoundWordTokenFilterBase.CompoundToken instances in the CompoundWordTokenFilterBase.tokens list. The original token must not be placed in the list, because it is passed through this filter automatically.
- Specified by:
- decompose in class CompoundWordTokenFilterBase