Class DirectSpellChecker
Candidates are presented directly from the term dictionary, based on Levenshtein distance.
 This is an alternative to SpellChecker if you are using an edit-distance-like metric such
 as Levenshtein or JaroWinklerDistance.
 
A practical benefit of this spellchecker is that it requires no additional data structures (neither in RAM nor on disk) to do its work.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
- 
Nested Class Summary
- protected static class DirectSpellChecker.ScoreTerm - Holds a spelling correction for internal usage inside DirectSpellChecker.
- 
Field Summary
- static final StringDistance INTERNAL_LEVENSHTEIN - The default StringDistance, Damerau-Levenshtein distance implemented internally via LevenshteinAutomata.
- 
Constructor Summary
- DirectSpellChecker() - Creates a DirectSpellChecker with default configuration values.
- 
Method Summary
- float getAccuracy() - Get the minimal accuracy from the StringDistance for a match.
- Comparator<SuggestWord> getComparator() - Get the current comparator in use.
- StringDistance getDistance() - Get the string distance metric in use.
- boolean getLowerCaseTerms() - true if the spellchecker should lowercase terms.
- int getMaxEdits() - Get the maximum number of Levenshtein edit-distances to draw candidate terms from.
- int getMaxInspections() - Get the maximum number of top-N inspections per suggestion.
- float getMaxQueryFrequency() - Get the maximum threshold of documents a query term can appear in and still receive suggestions.
- int getMaxQueryLength() - Get the maximum length of a query term to return suggestions.
- int getMinPrefix() - Get the minimal number of characters that must match exactly.
- int getMinQueryLength() - Get the minimum length of a query term needed to return suggestions.
- float getThresholdFrequency() - Get the minimal threshold of documents a term must appear in for a match.
- void setAccuracy(float accuracy) - Set the minimal accuracy required (default: 0.5f) from a StringDistance for a suggestion match.
- void setComparator(Comparator<SuggestWord> comparator) - Set the comparator for sorting suggestions.
- void setDistance(StringDistance distance) - Set the string distance metric.
- void setLowerCaseTerms(boolean lowerCaseTerms) - True if the spellchecker should lowercase terms (default: true).
- void setMaxEdits(int maxEdits) - Sets the maximum number of Levenshtein edit-distances to draw candidate terms from.
- void setMaxInspections(int maxInspections) - Set the maximum number of top-N inspections (default: 5) per suggestion.
- void setMaxQueryFrequency(float maxQueryFrequency) - Set the maximum threshold (default: 0.01f) of documents a query term can appear in and still receive suggestions.
- void setMaxQueryLength(int maxQueryLength) - Set the maximum length of a query term to return suggestions.
- void setMinPrefix(int minPrefix) - Sets the minimal number of initial characters (default: 1) that must match exactly.
- void setMinQueryLength(int minQueryLength) - Set the minimum length of a query term (default: 4) needed to return suggestions.
- void setThresholdFrequency(float thresholdFrequency) - Set the minimal threshold of documents a term must appear in for a match.
- SuggestWord[] suggestSimilar(Term term, int numSug, IndexReader ir)
- protected Collection<DirectSpellChecker.ScoreTerm> suggestSimilar(Term term, int numSug, IndexReader ir, int docfreq, int editDistance, float accuracy, CharsRefBuilder spare) - Provide spelling corrections based on several parameters.
- SuggestWord[] suggestSimilar(Term term, int numSug, IndexReader ir, SuggestMode suggestMode)
- SuggestWord[] suggestSimilar(Term term, int numSug, IndexReader ir, SuggestMode suggestMode, float accuracy) - Suggest similar words.
- 
Field Details
INTERNAL_LEVENSHTEIN
public static final StringDistance INTERNAL_LEVENSHTEIN
The default StringDistance, Damerau-Levenshtein distance implemented internally via LevenshteinAutomata.
Note: this is the fastest distance metric because Damerau-Levenshtein is already used to draw candidates from the term dictionary, so this metric just reuses that scoring.
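To make the "Damerau-Levenshtein" wording concrete, here is a minimal sketch of the optimal-string-alignment variant, which counts an adjacent transposition as a single edit. It is a plain dynamic-programming illustration, not the automaton-based code behind INTERNAL_LEVENSHTEIN; the class and method names are hypothetical, and it compares UTF-16 chars rather than code points.

```java
/** Illustrative only: hypothetical helper, not part of Lucene. */
final class EditDistanceSketch {
    private EditDistanceSketch() {}

    /**
     * Counts insertions, deletions, substitutions and adjacent
     * transpositions needed to turn {@code a} into {@code b}.
     */
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete every char of a
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert every char of b
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "en" <-> "ne", costs one edit.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }
}
```

Under this measure a swapped pair of letters such as "lucnee" for "lucene" is one edit away, which is why it stays within reach even when the maximum edit distance is set to 1.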
 
- 
- 
Constructor Details
DirectSpellChecker
public DirectSpellChecker()
Creates a DirectSpellChecker with default configuration values.
 
- 
- 
Method Details
getMaxEdits
public int getMaxEdits()
Get the maximum number of Levenshtein edit-distances to draw candidate terms from.
- 
setMaxEdits
public void setMaxEdits(int maxEdits)
Sets the maximum number of Levenshtein edit-distances to draw candidate terms from. This value can be 1 or 2. The default is 2.
Note: a large number of spelling errors occur with an edit distance of 1; by setting this value to 1 you can increase both performance and precision at the cost of recall.
- 
getMinPrefix
public int getMinPrefix()
Get the minimal number of characters that must match exactly.
- 
setMinPrefix
public void setMinPrefix(int minPrefix)
Sets the minimal number of initial characters (default: 1) that must match exactly.
This can improve both the performance and the accuracy of results, because misspellings are commonly not in the first character.
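The effect of a minimum prefix can be sketched as a simple filter: a candidate is only considered if its first minPrefix characters equal those of the query term. The helper below is hypothetical and only illustrates the rule. How terms shorter than the prefix are treated is an assumption here (they are rejected), and the comparison uses UTF-16 chars.

```java
/** Illustrative only: hypothetical helper, not part of Lucene. */
final class PrefixFilterSketch {
    private PrefixFilterSketch() {}

    /** Returns true if both strings agree on their first {@code minPrefix} chars. */
    static boolean sharesPrefix(String query, String candidate, int minPrefix) {
        if (minPrefix <= 0) {
            return true; // no prefix requirement
        }
        if (query.length() < minPrefix || candidate.length() < minPrefix) {
            return false; // assumption: too-short terms never match
        }
        return query.regionMatches(0, candidate, 0, minPrefix);
    }
}
```

With a prefix of 1, "lucnee" can still reach "lucene", while "kucene" cannot, because its error is in the first character.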
- 
getMaxInspections
public int getMaxInspections()
Get the maximum number of top-N inspections per suggestion.
- 
setMaxInspections
public void setMaxInspections(int maxInspections)
Set the maximum number of top-N inspections (default: 5) per suggestion.
Increasing this number can improve the accuracy of results, at the cost of performance.
- 
getAccuracy
public float getAccuracy()
Get the minimal accuracy from the StringDistance for a match.
- 
setAccuracy
public void setAccuracy(float accuracy)
Set the minimal accuracy required (default: 0.5f) from a StringDistance for a suggestion match.
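As a rough illustration of how an accuracy threshold interacts with an edit-distance-like metric, the sketch below turns an edit count into a similarity between 0 and 1 and compares it with the threshold. The normalization by the longer term length and the inclusive comparison are assumptions made for this example; each StringDistance implementation defines its own scale, and the names are hypothetical.

```java
/** Illustrative only: hypothetical helper, not part of Lucene. */
final class AccuracySketch {
    private AccuracySketch() {}

    /** Assumed normalization: 1 minus edits divided by the longer length. */
    static float similarity(int edits, int lengthA, int lengthB) {
        int longest = Math.max(lengthA, lengthB);
        if (longest == 0) {
            return 1.0f; // two empty strings are identical
        }
        return 1.0f - (float) edits / longest;
    }

    /** A candidate passes when its similarity reaches the accuracy threshold. */
    static boolean accept(float similarity, float accuracy) {
        return similarity >= accuracy;
    }
}
```

Under these assumptions, one edit on a six-character term scores about 0.83 and passes the default 0.5f, while four edits score about 0.33 and are rejected.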
- 
getThresholdFrequency
public float getThresholdFrequency()
Get the minimal threshold of documents a term must appear in for a match.
- 
setThresholdFrequency
public void setThresholdFrequency(float thresholdFrequency)
Set the minimal threshold of documents a term must appear in for a match.
This can improve quality by only suggesting high-frequency terms. Note that very high values might decrease performance slightly, by forcing the spellchecker to draw more candidates from the term dictionary, but a practical value such as 1 can be very useful for improving quality.
This can be specified as a relative fraction of documents (for example, 0.5f for 50%), or as an absolute whole document frequency, such as 4f. Absolute document frequencies may not be fractional.
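The dual meaning of the threshold can be sketched as follows. The example assumes that values of 1 or more are read as absolute document counts and smaller values as a fraction of the total number of documents; that cutoff, the rounding, and the names are assumptions for illustration, not a statement of how the class resolves the value internally.

```java
/** Illustrative only: hypothetical helper, not part of Lucene. */
final class FrequencyThresholdSketch {
    private FrequencyThresholdSketch() {}

    /** Resolves a threshold to a minimum document count for an index of {@code totalDocs}. */
    static int toMinDocFreq(float threshold, int totalDocs) {
        if (threshold < 0f) {
            throw new IllegalArgumentException("Threshold must not be negative: " + threshold);
        }
        if (threshold >= 1f) {
            // Assumed absolute form: must be a whole number such as 4f.
            if (threshold != (int) threshold) {
                throw new IllegalArgumentException(
                        "Absolute document frequencies may not be fractional: " + threshold);
            }
            return (int) threshold;
        }
        // Assumed relative form: a fraction of all documents, rounded down.
        return (int) (threshold * totalDocs);
    }
}
```

For an index of 1000 documents, 4f resolves to 4 documents and 0.5f to 500 documents, while 1.5f is rejected as a fractional absolute value.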
- 
getMinQueryLength
public int getMinQueryLength()
Get the minimum length of a query term needed to return suggestions.
- 
setMinQueryLength
public void setMinQueryLength(int minQueryLength)
Set the minimum length of a query term (default: 4) needed to return suggestions.
Very short query terms will often cause only bad suggestions with any distance metric.
- 
getMaxQueryLength
public int getMaxQueryLength()
Get the maximum length of a query term to return suggestions.
- 
setMaxQueryLength
public void setMaxQueryLength(int maxQueryLength)
Set the maximum length of a query term to return suggestions.
Long queries can be expensive to process and/or trigger exceptions.
- 
getMaxQueryFrequency
public float getMaxQueryFrequency()
Get the maximum threshold of documents a query term can appear in and still receive suggestions.
- 
setMaxQueryFrequency
public void setMaxQueryFrequency(float maxQueryFrequency)
Set the maximum threshold (default: 0.01f) of documents a query term can appear in and still receive suggestions.
Very high-frequency terms are typically spelled correctly. Additionally, this can increase performance, as the spellchecker does no work for the common case of correctly spelled input terms.
This can be specified as a relative fraction of documents (for example, 0.5f for 50%), or as an absolute whole document frequency, such as 4f. Absolute document frequencies may not be fractional.
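Taken together, the query-length bounds and the query-frequency cap act as a cheap gate that decides whether a term is worth spell checking at all. The sketch below shows one way such a gate could look. It is a hypothetical helper, and the treatment of values of 1 or more as absolute counts and the use of code points for length are assumptions.

```java
/** Illustrative only: hypothetical helper, not part of Lucene. */
final class QueryGateSketch {
    private QueryGateSketch() {}

    /** Returns true if the query term should be spell checked under the given limits. */
    static boolean shouldSuggest(String queryTerm, int docFreq, int totalDocs,
                                 int minQueryLength, int maxQueryLength,
                                 float maxQueryFrequency) {
        int length = queryTerm.codePointCount(0, queryTerm.length());
        if (length < minQueryLength || length > maxQueryLength) {
            return false; // too short to correct reliably, or too long to process cheaply
        }
        // Assumed: values >= 1 are absolute counts, smaller values a fraction of all docs.
        float limit = maxQueryFrequency >= 1f
                ? maxQueryFrequency
                : maxQueryFrequency * totalDocs;
        return docFreq <= limit; // frequent terms are presumed correctly spelled
    }
}
```

In an index of 10000 documents with a cap of 0.01f, a term found in 500 documents is skipped, while an unseen term of sufficient length goes on to candidate generation.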
- 
getLowerCaseTerms
public boolean getLowerCaseTerms()
true if the spellchecker should lowercase terms.
- 
setLowerCaseTerms
public void setLowerCaseTerms(boolean lowerCaseTerms)
True if the spellchecker should lowercase terms (default: true).
This is a convenience method. If your index field has more complicated analysis (such as StandardTokenizer removing punctuation), it is probably better to turn this off and instead run your query terms through your Analyzer first. If this option is off, case differences count as an edit.
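The remark that case differences count as an edit can be seen with a small experiment. The sketch below counts differing positions between two equal-length strings, with and without lowercasing the query first. The helper names are hypothetical, and the use of Locale.ROOT for lowercasing is an assumption for this example.

```java
import java.util.Locale;

/** Illustrative only: hypothetical helper, not part of Lucene. */
final class CaseEditSketch {
    private CaseEditSketch() {}

    /** Optionally lowercases the query term before comparison. */
    static String normalize(String term, boolean lowerCaseTerms) {
        return lowerCaseTerms ? term.toLowerCase(Locale.ROOT) : term;
    }

    /** Counts positions at which two equal-length strings differ (substitutions only). */
    static int mismatches(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Strings must have equal length");
        }
        int count = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                count++;
            }
        }
        return count;
    }
}
```

Without lowercasing, "Lucene" is already one substitution away from the indexed term "lucene", which uses up part of the edit budget before any real misspelling is considered.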
- 
getComparator
public Comparator<SuggestWord> getComparator()
Get the current comparator in use.
- 
setComparator
public void setComparator(Comparator<SuggestWord> comparator)
Set the comparator for sorting suggestions. The default is SuggestWordQueue.DEFAULT_COMPARATOR.
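To show what a suggestion comparator has to decide, here is a self-contained sketch that ranks suggestions by score, then by document frequency, then alphabetically. The Suggestion class is a hypothetical stand-in for SuggestWord, and this ordering is an assumption chosen for illustration, not a description of SuggestWordQueue.DEFAULT_COMPARATOR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: hypothetical types, not part of Lucene. */
final class SuggestionOrderSketch {
    private SuggestionOrderSketch() {}

    /** Hypothetical stand-in for a suggested word with its score and frequency. */
    static final class Suggestion {
        final String text;
        final float score;
        final int freq;

        Suggestion(String text, float score, int freq) {
            this.text = text;
            this.score = score;
            this.freq = freq;
        }
    }

    /** Highest score first, then highest frequency, then text as a tie-breaker. */
    static final Comparator<Suggestion> SCORE_THEN_FREQ =
            Comparator.comparingDouble((Suggestion s) -> s.score).reversed()
                    .thenComparing(Comparator.comparingInt((Suggestion s) -> s.freq).reversed())
                    .thenComparing((Suggestion s) -> s.text);

    /** Returns the suggestion texts in ranked order without modifying the input. */
    static List<String> rank(List<Suggestion> suggestions) {
        List<Suggestion> sorted = new ArrayList<>(suggestions);
        sorted.sort(SCORE_THEN_FREQ);
        List<String> texts = new ArrayList<>();
        for (Suggestion s : sorted) {
            texts.add(s.text);
        }
        return texts;
    }
}
```

A frequency-first ordering would simply swap the first two comparison keys, which is the kind of change a custom comparator passed to setComparator would express.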
- 
getDistance
public StringDistance getDistance()
Get the string distance metric in use.
- 
setDistance
public void setDistance(StringDistance distance)
Set the string distance metric. The default is INTERNAL_LEVENSHTEIN.
Note: because this spellchecker draws its candidates from the term dictionary using Damerau-Levenshtein, it works best with an edit-distance-like string metric. If you use a different metric than the default, you might want to consider increasing setMaxInspections(int) to draw more candidates for your metric to rank.
- 
suggestSimilar
public SuggestWord[] suggestSimilar(Term term, int numSug, IndexReader ir) throws IOException
- Throws:
- IOException
 
- 
suggestSimilar
public SuggestWord[] suggestSimilar(Term term, int numSug, IndexReader ir, SuggestMode suggestMode) throws IOException
- Throws:
- IOException
 
- 
suggestSimilar
public SuggestWord[] suggestSimilar(Term term, int numSug, IndexReader ir, SuggestMode suggestMode, float accuracy) throws IOException
Suggest similar words.
Unlike SpellChecker, the similarity used to fetch the most relevant terms is an edit distance, so a low value for numSug typically works very well.
- Parameters:
- term - Term you want to spell check on
- numSug - the maximum number of suggested words
- ir - IndexReader to find terms from
- suggestMode - specifies when to return suggested words
- accuracy - return only suggested words that match with this similarity
- Returns:
- sorted list of the suggested words according to the comparator
- Throws:
- IOException - If there is a low-level I/O error.
 
- 
suggestSimilar
protected Collection<DirectSpellChecker.ScoreTerm> suggestSimilar(Term term, int numSug, IndexReader ir, int docfreq, int editDistance, float accuracy, CharsRefBuilder spare) throws IOException
Provide spelling corrections based on several parameters.
- Parameters:
- term - The term to suggest spelling corrections for
- numSug - The maximum number of spelling corrections
- ir - The index reader to fetch the candidate spelling corrections from
- docfreq - The minimum document frequency a potential suggestion needs to have in order to be included
- editDistance - The maximum edit distance candidates are allowed to have
- accuracy - The minimum accuracy a suggested spelling correction needs to have in order to be included
- spare - a chars scratch
- Returns:
- a collection of spelling corrections sorted by ScoreTerm's natural order.
- Throws:
- IOException - If I/O related errors occur
 
 
-