Class SimplePatternSplitTokenizer
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.Tokenizer
        - org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizer
- All Implemented Interfaces:
Closeable, AutoCloseable
public final class SimplePatternSplitTokenizer extends Tokenizer
This tokenizer uses a Lucene RegExp or (expert usage) a pre-built determinized Automaton to locate tokens. The regexp syntax is more limited than that of PatternTokenizer, but tokenization is considerably faster. It behaves just like SimplePatternTokenizer, except that the pattern should match valid token separator characters, as with String.split. Empty string tokens are never produced.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
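The description above can be illustrated with a minimal sketch. It assumes the Lucene core and analysis-common modules (9.x or later) are on the classpath; the class name `SplitOnWhitespaceExample`, the helper `tokenize`, and the whitespace pattern are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class, not part of Lucene.
public class SplitOnWhitespaceExample {

  /** Splits text on runs of whitespace and returns the terms in order. */
  static List<String> tokenize(String text) {
    List<String> terms = new ArrayList<>();
    // The pattern describes the separators between tokens, not the tokens themselves.
    try (Tokenizer tokenizer = new SimplePatternSplitTokenizer("[ \t\r\n]+")) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        terms.add(term.toString());
      }
      tokenizer.end();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return terms;
  }

  public static void main(String[] args) {
    // Leading, trailing, and repeated separators yield no empty tokens.
    System.out.println(tokenize("  the quick   brown fox "));
  }
}
```

Because the pattern matches separators, changing it to something like `,+` would split comma-delimited input instead, without touching the consuming loop.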
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
-
Field Summary
-
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
-
Constructor Summary
Constructors
- SimplePatternSplitTokenizer(String regexp): See RegExp for the accepted syntax.
- SimplePatternSplitTokenizer(AttributeFactory factory, String regexp, int determinizeWorkLimit): See RegExp for the accepted syntax.
- SimplePatternSplitTokenizer(AttributeFactory factory, Automaton dfa): Runs a pre-built automaton.
- SimplePatternSplitTokenizer(Automaton dfa): Runs a pre-built automaton.
-
Method Summary
All methods are concrete instance methods.
- void end()
- boolean incrementToken()
- void reset()
-
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset, setReader, setReaderTestPoint
-
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
Constructor Detail
-
SimplePatternSplitTokenizer
public SimplePatternSplitTokenizer(String regexp)
See RegExp for the accepted syntax.
-
SimplePatternSplitTokenizer
public SimplePatternSplitTokenizer(Automaton dfa)
Runs a pre-built automaton.
-
SimplePatternSplitTokenizer
public SimplePatternSplitTokenizer(AttributeFactory factory, String regexp, int determinizeWorkLimit)
See RegExp for the accepted syntax.
-
SimplePatternSplitTokenizer
public SimplePatternSplitTokenizer(AttributeFactory factory, Automaton dfa)
Runs a pre-built automaton.
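The constructors that take an AttributeFactory can be exercised with a sketch along these lines. It assumes Lucene 9.x or later, where `Operations.determinize(Automaton, int)` and `Operations.DEFAULT_DETERMINIZE_WORK_LIMIT` are available; the class `PrebuiltAutomatonExample`, its helpers, and the comma pattern are hypothetical. The automaton is determinized explicitly because this tokenizer is documented to require a determinized automaton.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.Operations;
import org.apache.lucene.util.automaton.RegExp;

// Hypothetical example class, not part of Lucene.
public class PrebuiltAutomatonExample {

  /** Builds a deterministic automaton that matches one or more commas. */
  static Automaton commaSeparators() {
    Automaton automaton = new RegExp(",+").toAutomaton();
    // Determinize explicitly so the result is deterministic regardless of
    // what toAutomaton() returns in the Lucene version in use.
    return Operations.determinize(automaton, Operations.DEFAULT_DETERMINIZE_WORK_LIMIT);
  }

  /** Uses the (AttributeFactory, Automaton) constructor. */
  static Tokenizer fromAutomaton() {
    return new SimplePatternSplitTokenizer(
        TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY, commaSeparators());
  }

  /** Uses the (AttributeFactory, String, int) constructor with the same pattern. */
  static Tokenizer fromRegexp() {
    return new SimplePatternSplitTokenizer(
        TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY,
        ",+",
        Operations.DEFAULT_DETERMINIZE_WORK_LIMIT);
  }

  /** Consumes the tokenizer over the given text and returns its terms. */
  static List<String> terms(Tokenizer tokenizer, String text) {
    List<String> out = new ArrayList<>();
    try (Tokenizer t = tokenizer) {
      t.setReader(new StringReader(text));
      CharTermAttribute term = t.addAttribute(CharTermAttribute.class);
      t.reset();
      while (t.incrementToken()) {
        out.add(term.toString());
      }
      t.end();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }
}
```

Building the automaton once and passing it to several tokenizer instances avoids re-parsing and re-determinizing the same pattern, which is the usual motivation for the expert constructors.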
-
-
Method Detail
-
incrementToken
public boolean incrementToken() throws IOException
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
-
end
public void end() throws IOException
- Overrides:
end in class TokenStream
- Throws:
IOException
-
reset
public void reset() throws IOException
- Overrides:
reset in class Tokenizer
- Throws:
IOException
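The three methods above are normally called in a fixed order, together with the inherited setReader and close. The following sketch shows that order while reusing one tokenizer for several inputs. It assumes the general Tokenizer contract that close() must be called before the next setReader; the class `ReuseLifecycleExample` and the helper `tokenizeAll` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class, not part of Lucene.
public class ReuseLifecycleExample {

  /** Tokenizes each input with the same tokenizer instance. */
  static List<List<String>> tokenizeAll(List<String> inputs) {
    List<List<String>> results = new ArrayList<>();
    Tokenizer tokenizer = new SimplePatternSplitTokenizer("[ \t\r\n]+");
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    try {
      for (String input : inputs) {
        List<String> terms = new ArrayList<>();
        tokenizer.setReader(new StringReader(input)); // 1. supply the input
        try {
          tokenizer.reset();                          // 2. prepare for a new stream
          while (tokenizer.incrementToken()) {        // 3. advance token by token
            terms.add(term.toString());
          }
          tokenizer.end();                            // 4. finish the stream
        } finally {
          tokenizer.close();                          // 5. release the reader before reuse
        }
        results.add(terms);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return results;
  }
}
```

Reusing the instance keeps the compiled pattern and internal buffers alive between inputs, which is typically how an Analyzer drives a Tokenizer.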
-
-