Class SimplePatternSplitTokenizerFactory
java.lang.Object
org.apache.lucene.analysis.AbstractAnalysisFactory
org.apache.lucene.analysis.TokenizerFactory
org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizerFactory
Factory for SimplePatternSplitTokenizer, for producing tokens by splitting according to
 the provided regexp.
 This tokenizer uses Lucene RegExp pattern matching to construct distinct tokens for
 the input stream. The syntax is more limited than PatternTokenizer, but the tokenization
 is quite a bit faster. It takes two arguments: 
 
- "pattern" (required) is the regular expression, according to the syntax described at RegExp
- "determinizeWorkLimit" (optional, default Operations.DEFAULT_DETERMINIZE_WORK_LIMIT) is the limit on the total effort spent determinizing the automaton computed from the regexp
The pattern matches the characters that should split tokens, like String.split.
 Matching is greedy: at any given point, the longest matching token separator is
 used. Empty tokens are never created.
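
The splitting behavior described above can be illustrated with a rough sketch. It is only an analogue built on java.util.regex, not the Lucene implementation: Lucene compiles the pattern with its own RegExp syntax into an automaton, whereas this sketch uses Java's regex engine, whose syntax and matching rules differ (for a simple character class followed by + the results coincide). The class SplitSketch and the helper splitTokens are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class, not part of Lucene.
class SplitSketch {

    // Splits input on every match of separatorRegex and drops empty tokens,
    // mimicking the "empty tokens are never created" rule.
    static List<String> splitTokens(String input, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(separatorRegex).matcher(input);
        int tokenStart = 0;
        while (m.find()) {
            if (m.end() == m.start()) {
                continue; // ignore zero-length separator matches
            }
            if (m.start() > tokenStart) {
                tokens.add(input.substring(tokenStart, m.start()));
            }
            tokenStart = m.end();
        }
        if (tokenStart < input.length()) {
            tokens.add(input.substring(tokenStart)); // trailing token, if any
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Leading, trailing, and repeated whitespace produce no empty tokens.
        System.out.println(splitTokens("  hello \t world\n", "[ \\t\\r\\n]+"));
        // prints [hello, world]
    }
}
```

Note that plain String.split would return a leading empty string for this input, which is the main difference from the behavior documented here.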
 
For example, to match tokens delimited by simple whitespace characters:
 <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.SimplePatternSplitTokenizerFactory" pattern="[ \t\r\n]+"/>
   </analyzer>
 </fieldType>
- Since:
- 6.5.0
- WARNING: This API is experimental and might change in incompatible ways in the next release.
- SPI Name (case-insensitive: if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service).
- "simplePatternSplit"
Field Summary
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory:
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
- SimplePatternSplitTokenizerFactory(): Default ctor for compatibility with SPI
- SimplePatternSplitTokenizerFactory: Creates a new SimplePatternSplitTokenizerFactory
Method Summary
Methods inherited from class org.apache.lucene.analysis.TokenizerFactory:
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory:
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Details
- NAME: SPI name
- PATTERN
Constructor Details
- SimplePatternSplitTokenizerFactory
  Creates a new SimplePatternSplitTokenizerFactory
- public SimplePatternSplitTokenizerFactory()
  Default ctor for compatibility with SPI
Method Details
- create
  Specified by: create in class TokenizerFactory