Package org.apache.lucene.analysis
Class TokenStreamToAutomaton
java.lang.Object
org.apache.lucene.analysis.TokenStreamToAutomaton
Consumes a TokenStream and creates an Automaton where the transition labels are UTF8 bytes (or Unicode code points if unicodeArcs is true) from the TermToBytesRefAttribute. Between tokens we insert POS_SEP and for holes we insert HOLE.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
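The label layout described above can be modeled for a simple linear stream with no overlapping tokens. This is a minimal sketch using only the JDK, not Lucene's implementation: the class name, the `labels` helper, and the numeric values given to POS_SEP and HOLE are assumptions made for illustration, and so is the placement of separators around a hole.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LabelSequenceSketch {
    // Assumed values for this sketch only; consult the real constants on TokenStreamToAutomaton.
    static final int POS_SEP = 0x001f;
    static final int HOLE = 0x001e;

    /**
     * Hypothetical helper: flattens a linear token stream into the sequence of
     * transition labels. posIncs[i] is the position increment of terms[i];
     * an increment greater than 1 means positions were skipped (holes).
     */
    static List<Integer> labels(String[] terms, int[] posIncs, boolean unicodeArcs) {
        List<Integer> out = new ArrayList<>();
        for (int t = 0; t < terms.length; t++) {
            // One HOLE label per skipped position, separated like a token.
            for (int skipped = 1; skipped < posIncs[t]; skipped++) {
                if (!out.isEmpty()) {
                    out.add(POS_SEP);
                }
                out.add(HOLE);
            }
            if (!out.isEmpty()) {
                out.add(POS_SEP);
            }
            if (unicodeArcs) {
                // One label per Unicode code point.
                for (int i = 0; i < terms[t].length(); ) {
                    int cp = terms[t].codePointAt(i);
                    out.add(cp);
                    i += Character.charCount(cp);
                }
            } else {
                // One label per UTF-8 byte, as an unsigned value.
                for (byte b : terms[t].getBytes(StandardCharsets.UTF_8)) {
                    out.add(b & 0xff);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "a", then one skipped position, then "c".
        System.out.println(labels(new String[] {"a", "c"}, new int[] {1, 2}, false));
        // prints [97, 31, 30, 31, 99]
    }
}
```

In this model a hole occupies a position of its own, so it is surrounded by separators in the same way a token would be.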
Field Summary
Fields
- static final int POS_SEP
- static final int HOLE
Constructor Summary
Constructors
- TokenStreamToAutomaton()
Method Summary
Modifier and Type, Method, Description
- protected BytesRef changeToken(BytesRef in): Subclass and implement this if you need to change the token (such as escaping certain bytes) before it's turned into a graph.
- void setFinalOffsetGapAsHole(boolean finalOffsetGapAsHole): If true, any final offset gaps will result in adding a position hole.
- void setPreservePositionIncrements(boolean enablePositionIncrements): Whether to generate holes in the automaton for missing positions, true by default.
- void setUnicodeArcs(boolean unicodeArcs): Whether to make transition labels Unicode code points instead of UTF8 bytes, false by default.
- toAutomaton: Pulls the graph (including PositionLengthAttribute) from the provided TokenStream, and creates the corresponding automaton where arcs are bytes (or Unicode code points if unicodeArcs = true) from each term.
Field Details
- POS_SEP
  public static final int POS_SEP
  We create this transition between two adjacent tokens.
- HOLE
  public static final int HOLE
  We add this arc to represent a hole.
Constructor Details
- TokenStreamToAutomaton
  public TokenStreamToAutomaton()
  Sole constructor.
Method Details
- setPreservePositionIncrements
  public void setPreservePositionIncrements(boolean enablePositionIncrements)
  Whether to generate holes in the automaton for missing positions, true by default.
- setFinalOffsetGapAsHole
  public void setFinalOffsetGapAsHole(boolean finalOffsetGapAsHole)
  If true, any final offset gaps will result in adding a position hole.
- setUnicodeArcs
  public void setUnicodeArcs(boolean unicodeArcs)
  Whether to make transition labels Unicode code points instead of UTF8 bytes, false by default.
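To make the difference between the two labeling modes concrete, the sketch below computes both label sequences for a single term using only the JDK. It does not call Lucene; the class and method names are hypothetical and exist only for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ArcLabelSketch {
    /** Labels as unsigned UTF-8 byte values (the default mode, unicodeArcs = false). */
    static int[] utf8Labels(String term) {
        byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
        int[] labels = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            labels[i] = bytes[i] & 0xff;
        }
        return labels;
    }

    /** Labels as Unicode code points (unicodeArcs = true). */
    static int[] codePointLabels(String term) {
        return term.codePoints().toArray();
    }

    public static void main(String[] args) {
        String term = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(Arrays.toString(utf8Labels(term)));      // prints [195, 169]
        System.out.println(Arrays.toString(codePointLabels(term))); // prints [233]
    }
}
```

A non-ASCII character therefore becomes a chain of several byte arcs in the default mode, but a single arc when code points are used.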
- changeToken
  protected BytesRef changeToken(BytesRef in)
  Subclass and implement this if you need to change the token (such as escaping certain bytes) before it's turned into a graph.
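One reason to escape bytes is that a term may contain a byte value that is also used as a reserved label. The sketch below shows one possible escaping scheme, doubling every occurrence of a reserved byte, on a plain byte array. It is an assumption-laden illustration rather than Lucene code: `escapeReserved` is a hypothetical helper, it works on `byte[]` instead of BytesRef, and the reserved value 0x1f is chosen only for the example. An override of changeToken could apply similar logic to the incoming token.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class EscapeSketch {
    /**
     * Hypothetical helper: returns a copy of the input in which every
     * occurrence of the reserved byte is written twice.
     */
    static byte[] escapeReserved(byte[] in, byte reserved) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 4);
        for (byte b : in) {
            if (b == reserved) {
                out.write(reserved); // extra copy marks the byte as escaped
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] token = {0x61, 0x1f, 0x62}; // 'a', reserved byte, 'b'
        System.out.println(Arrays.toString(escapeReserved(token, (byte) 0x1f)));
        // prints [97, 31, 31, 98]
    }
}
```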
- toAutomaton
  Pulls the graph (including PositionLengthAttribute) from the provided TokenStream, and creates the corresponding automaton where arcs are bytes (or Unicode code points if unicodeArcs = true) from each term.
  Throws:
  IOException