Package org.apache.lucene.analysis.morph
Class Viterbi<T extends Token,U extends Viterbi.Position> 
java.lang.Object
org.apache.lucene.analysis.morph.Viterbi<T,U> 
- Type Parameters:
- T - output token class
- U - position class
- Direct Known Subclasses:
- ViterbiNBest
Performs the Viterbi algorithm for morphological tokenizers, which segment text using a Hidden Markov Model or Conditional Random Fields.
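As an illustration of the minimum-cost search this class is built around, the following is a minimal, self-contained sketch and not the Lucene implementation. The class name `ToyViterbi`, the word costs, and the flat connection cost are all hypothetical; a real tokenizer would read word costs from its dictionary and connection costs from `ConnectionCosts`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Toy minimum-cost segmentation over a word lattice; not the Lucene implementation. */
class ToyViterbi {
  // Hypothetical word costs (lower is better).
  static final Map<String, Integer> WORD_COST =
      Map.of("a", 5, "b", 5, "ab", 3, "c", 3, "bc", 2);
  // Hypothetical flat cost charged for every transition between two tokens.
  static final int CONNECTION_COST = 1;

  static List<String> segment(String text) {
    int n = text.length();
    int[] best = new int[n + 1]; // best[i]: minimum cost of any path ending at offset i
    int[] back = new int[n + 1]; // back[i]: start offset of the last token on that path
    Arrays.fill(best, Integer.MAX_VALUE);
    best[0] = 0;
    for (int start = 0; start < n; start++) {
      if (best[start] == Integer.MAX_VALUE) continue; // offset not reachable
      for (int end = start + 1; end <= n; end++) {
        Integer wordCost = WORD_COST.get(text.substring(start, end));
        if (wordCost == null) continue; // not a dictionary word
        int cost = best[start] + wordCost + (start == 0 ? 0 : CONNECTION_COST);
        if (cost < best[end]) {
          best[end] = cost;
          back[end] = start;
        }
      }
    }
    if (best[n] == Integer.MAX_VALUE) {
      throw new IllegalArgumentException("no segmentation covers the input");
    }
    // Follow the back pointers from the end, then reverse into text order.
    List<String> tokens = new ArrayList<>();
    for (int end = n; end > 0; end = back[end]) {
      tokens.add(text.substring(back[end], end));
    }
    Collections.reverse(tokens);
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(segment("abc"));
  }
}
```

With these assumed costs, `segment("abc")` chooses `ab` + `c` (3 + 3 + 1 = 7) over `a` + `bc` (5 + 2 + 1 = 8).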
- 
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
- static class Viterbi.Position: Holds all back pointers arriving to this position.
- static final class Viterbi.WrappedPositionArray<U>: Holds partial graph (array of positions) for calculating the minimum cost path.
- 
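Field Summary
Fields (Modifier and Type, Field):
- protected final RollingCharBuffer buffer
- protected final ConnectionCosts costs
- protected boolean enableSpacePenaltyFactor
- protected boolean end
- protected int lastBackTracePos
- protected static final int MAX_UNKNOWN_WORD_LENGTH
- protected boolean outputLongestUserEntryOnly
- protected boolean outputNBest
- protected int pos
- protected final Viterbi.WrappedPositionArray<U> positions
- protected static final boolean VERBOSE
- protected final IntsRef wordIdRef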
- 
Constructor Summary
Constructors (Modifier, Constructor):
- protected Viterbi(TokenInfoFST fst, FST.BytesReader fstReader, BinaryDictionary<? extends MorphData> dictionary, TokenInfoFST userFST, FST.BytesReader userFSTReader, Dictionary<? extends MorphData> userDictionary, ConnectionCosts costs, Class<U> positionImpl)
- 
Method Summary
Methods (Modifier and Type, Method, Description):
- protected final void add(MorphData morphData, Viterbi.Position fromPosData, int wordPos, int endPos, int wordID, TokenType type, boolean addPenalty): Add a token on the minimum cost path to the pending token list.
- protected abstract void backtrace(Viterbi.Position endPosData, int fromIDX): Backtrace from the provided position, back to the last time we back-traced, accumulating the resulting tokens to the pending list.
- protected void backtraceNBest(Viterbi.Position endPosData, boolean useEOS): Backtrace the n-best path.
- protected int computePenalty(int pos, int length): Returns the penalty for a specific input region.
- protected int computeSpacePenalty(MorphData morphData, int wordID, int numSpaces): Returns the space penalty.
- protected void fixupPendingList(): Remove duplicated tokens from the pending list; this is needed because backtrace(Position, int) and backtraceNBest(Position, boolean) can add the same tokens to the list.
- final void forward(): Incrementally parse some more characters.
- int getPos()
- boolean isEnd()
- boolean isOutputNBest()
- protected abstract int processUnknownWord(boolean anyMatches, Viterbi.Position posData): Add unknown words to the position graph.
- void resetBuffer(Reader reader)
- void resetState()
- protected boolean shouldSkipProcessUnknownWord(int unknownWordEndIndex, Viterbi.Position posData)
- 
Field Details
- 
VERBOSE
protected static final boolean VERBOSE
- 
MAX_UNKNOWN_WORD_LENGTH
protected static final int MAX_UNKNOWN_WORD_LENGTH
- 
costs
protected final ConnectionCosts costs
- 
wordIdRef
protected final IntsRef wordIdRef
- 
buffer
protected final RollingCharBuffer buffer
- 
positions
protected final Viterbi.WrappedPositionArray<U> positions
- 
end
protected boolean end
- 
lastBackTracePos
protected int lastBackTracePos
- 
pos
protected int pos
- 
pending
- 
outputNBest
protected boolean outputNBest
- 
enableSpacePenaltyFactor
protected boolean enableSpacePenaltyFactor
- 
outputLongestUserEntryOnly
protected boolean outputLongestUserEntryOnly
 
- 
- 
Constructor Details
- 
Viterbi
protected Viterbi(TokenInfoFST fst, FST.BytesReader fstReader, BinaryDictionary<? extends MorphData> dictionary, TokenInfoFST userFST, FST.BytesReader userFSTReader, Dictionary<? extends MorphData> userDictionary, ConnectionCosts costs, Class<U> positionImpl)
 
- 
- 
Method Details
- 
forward
public final void forward() throws IOException
Incrementally parse some more characters. This runs the Viterbi search forward far enough to generate some more tokens. How far forward depends on the incoming characters, since some characters cause longer-lasting ambiguity in the parsing. Once the ambiguity is resolved, we backtrace, produce the pending tokens, and return.
- Throws:
- IOException
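One way to picture "ambiguity resolved" is an offset that every candidate path must pass through, because no lattice arc opened earlier extends past it. The sketch below finds such offsets for a toy dictionary. It is an assumption-laden illustration, not the criterion Lucene uses: `SyncPointSketch`, `syncPoints`, the `maxWordLength` parameter, and the single-character unknown-word fallback are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy search for offsets where a backtrace could safely run; not the Lucene logic. */
class SyncPointSketch {
  static List<Integer> syncPoints(String text, Set<String> dictionary, int maxWordLength) {
    List<Integer> result = new ArrayList<>();
    boolean[] reachable = new boolean[text.length() + 1]; // offsets where some token ends
    reachable[0] = true;
    int farthestEnd = 0; // farthest offset reached by any arc seen so far
    for (int pos = 0; pos < text.length(); pos++) {
      if (!reachable[pos]) continue;
      // No earlier arc crosses pos, so every path goes through it.
      if (pos > 0 && farthestEnd == pos) result.add(pos);
      boolean anyMatch = false;
      int limit = Math.min(text.length(), pos + maxWordLength);
      for (int end = pos + 1; end <= limit; end++) {
        if (dictionary.contains(text.substring(pos, end))) {
          reachable[end] = true;
          farthestEnd = Math.max(farthestEnd, end);
          anyMatch = true;
        }
      }
      if (!anyMatch) { // assumed fallback: treat one character as an unknown word
        reachable[pos + 1] = true;
        farthestEnd = Math.max(farthestEnd, pos + 1);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(syncPoints("abcd", Set.of("a", "ab", "b", "cd"), 4));
  }
}
```

For `"abcd"` with this dictionary, offset 2 is the only such point: both `a` + `b` and `ab` end there, and nothing crosses it.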
 
- 
shouldSkipProcessUnknownWord
protected boolean shouldSkipProcessUnknownWord(int unknownWordEndIndex, Viterbi.Position posData)
- 
processUnknownWord
protected abstract int processUnknownWord(boolean anyMatches, Viterbi.Position posData) throws IOException
Add unknown words to the position graph.
- Returns:
- word length
- Throws:
- IOException
 
- 
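backtrace
protected abstract void backtrace(Viterbi.Position endPosData, int fromIDX) throws IOException
Backtrace from the provided position, back to the last time we back-traced, accumulating the resulting tokens to the pending list. The pending list is then in reverse order (the last token in the list should be returned first).
- Throws: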
- IOException
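The reversed pending list can be sketched as follows. This is a simplified illustration under stated assumptions, not the abstract method's contract: `BacktraceSketch`, the `back` array layout (start offset of the best token ending at each offset), and the `drain` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy backtrace that fills a pending list in reverse; not the Lucene implementation. */
class BacktraceSketch {
  /** Walks back pointers from endPos down to lastBackTracePos. */
  static List<String> backtrace(String text, int[] back, int endPos, int lastBackTracePos) {
    List<String> pending = new ArrayList<>();
    int pos = endPos;
    while (pos > lastBackTracePos) {
      int start = back[pos];
      pending.add(text.substring(start, pos)); // the last token in the text is added first
      pos = start;
    }
    return pending;
  }

  /** Pops tokens from the tail of the list so they come out in text order. */
  static List<String> drain(List<String> pending) {
    List<String> out = new ArrayList<>();
    while (!pending.isEmpty()) {
      out.add(pending.remove(pending.size() - 1));
    }
    return out;
  }

  public static void main(String[] args) {
    int[] back = {0, 0, 0, 2}; // assumed best path: "ab" ends at 2, "c" ends at 3
    List<String> pending = backtrace("abc", back, 3, 0);
    System.out.println(pending);
    System.out.println(drain(pending));
  }
}
```

Removing from the tail keeps each pop cheap on an array-backed list, which is one plausible reason to store the tokens in reverse.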
 
- 
backtraceNBest
protected void backtraceNBest(Viterbi.Position endPosData, boolean useEOS) throws IOException
Backtrace the n-best path. Subclasses that support n-best paths should implement this method.
- Throws:
- IOException
 
- 
fixupPendingList
protected void fixupPendingList()
Remove duplicated tokens from the pending list; this is needed because backtrace(Position, int) and backtraceNBest(Position, boolean) can add the same tokens to the list. Subclasses that support n-best paths should implement this method.
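A subclass might remove duplicates along the following lines. This is only a sketch under assumptions: the `Tok` type, the choice to identify a token by its start and end offsets, and the sort-then-compare approach are hypothetical and not taken from any Lucene subclass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy duplicate removal for a reversed pending list; not the Lucene implementation. */
class FixupSketch {
  /** Hypothetical token identified only by its offsets. */
  static final class Tok {
    final int start;
    final int end;

    Tok(int start, int end) {
      this.start = start;
      this.end = end;
    }
  }

  static void fixup(List<Tok> pending) {
    // Keep the list in reverse text order: larger offsets first.
    pending.sort(
        Comparator.comparingInt((Tok t) -> t.start).thenComparingInt(t -> t.end).reversed());
    // After sorting, duplicates are adjacent; drop the later copy of each pair.
    for (int i = pending.size() - 1; i > 0; i--) {
      Tok a = pending.get(i);
      Tok b = pending.get(i - 1);
      if (a.start == b.start && a.end == b.end) {
        pending.remove(i);
      }
    }
  }

  public static void main(String[] args) {
    List<Tok> pending =
        new ArrayList<>(List.of(new Tok(2, 3), new Tok(0, 2), new Tok(2, 3), new Tok(0, 1)));
    fixup(pending);
    for (Tok t : pending) {
      System.out.println(t.start + "-" + t.end);
    }
  }
}
```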
- 
add
protected final void add(MorphData morphData, Viterbi.Position fromPosData, int wordPos, int endPos, int wordID, TokenType type, boolean addPenalty) throws IOException
Add a token on the minimum cost path to the pending token list.
- Throws:
- IOException
 
- 
computeSpacePenalty
protected int computeSpacePenalty(MorphData morphData, int wordID, int numSpaces)
Returns the space penalty.
- 
computePenalty
protected int computePenalty(int pos, int length) throws IOException
Returns the penalty for a specific input region.
- Throws:
- IOException
 
- 
getPos
public int getPos()
- 
isEnd
public boolean isEnd()
- 
getPending
- 
isOutputNBest
public boolean isOutputNBest()
- 
resetBuffer
public void resetBuffer(Reader reader)
- 
resetState
public void resetState()
 
-