Class BM25Similarity
java.lang.Object
org.apache.lucene.search.similarities.Similarity
org.apache.lucene.search.similarities.BM25Similarity
BM25 Similarity. Introduced in Stephen E. Robertson, Steve Walker, Susan Jones, Micheline
 Hancock-Beaulieu, and Mike Gatford. Okapi at TREC-3. In Proceedings of the Third Text
 REtrieval Conference (TREC 1994). Gaithersburg, USA, November 1994.
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity:
- Similarity.SimScorer
Constructor Summary
- BM25Similarity() - BM25 with these default values: k1 = 1.2, b = 0.75, discountOverlaps = true.
- BM25Similarity(boolean discountOverlaps) - BM25 with these default values: k1 = 1.2, b = 0.75, and the supplied parameter value.
- BM25Similarity(float k1, float b) - BM25 with the supplied parameter values.
- BM25Similarity(float k1, float b, boolean discountOverlaps) - BM25 with the supplied parameter values.
Method Summary
- protected float avgFieldLength(CollectionStatistics collectionStats) - The default implementation computes the average as sumTotalTermFreq / docCount.
- final float getB() - Returns the b parameter.
- final float getK1() - Returns the k1 parameter.
- protected float idf(long docFreq, long docCount) - Implemented as log(1 + (docCount - docFreq + 0.5)/(docFreq + 0.5)).
- idfExplain(CollectionStatistics collectionStats, TermStatistics termStats) - Computes a score factor for a simple term and returns an explanation for that score factor.
- idfExplain(CollectionStatistics collectionStats, TermStatistics[] termStats) - Computes a score factor for a phrase.
- final Similarity.SimScorer scorer(float boost, CollectionStatistics collectionStats, TermStatistics... termStats) - Compute any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query.
- toString()
Methods inherited from class org.apache.lucene.search.similarities.Similarity: computeNorm, getDiscountOverlaps
Constructor Details
BM25Similarity
public BM25Similarity(float k1, float b, boolean discountOverlaps)
BM25 with the supplied parameter values.
Parameters:
- k1 - Controls non-linear term frequency normalization (saturation).
- b - Controls to what degree document length normalizes tf values.
- discountOverlaps - True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
Throws:
- IllegalArgumentException - if k1 is infinite or negative, or if b is not within the range [0..1]
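The documented argument checks can be sketched in plain Java as follows. This is a standalone illustration under stated assumptions, not the Lucene source: the class name `Bm25ParamCheck`, the method `check`, and the exception messages are hypothetical, and rejecting NaN values is an assumption that goes slightly beyond the wording above.

```java
// Standalone sketch of the documented constructor checks; not the Lucene source.
// Bm25ParamCheck and check(...) are hypothetical names used for illustration only.
final class Bm25ParamCheck {
    private Bm25ParamCheck() {}

    /** Throws IllegalArgumentException when k1 or b is outside the documented ranges. */
    static void check(float k1, float b) {
        // k1 must be finite and non-negative. Float.isFinite also rejects NaN (assumption).
        if (!Float.isFinite(k1) || k1 < 0f) {
            throw new IllegalArgumentException("illegal k1 value: " + k1);
        }
        // b must lie within [0..1]. NaN is rejected explicitly (assumption).
        if (Float.isNaN(b) || b < 0f || b > 1f) {
            throw new IllegalArgumentException("illegal b value: " + b);
        }
    }
}
```

Calling `Bm25ParamCheck.check(1.2f, 0.75f)` returns normally, while a negative `k1` or a `b` of `1.5f` raises `IllegalArgumentException`.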
 
BM25Similarity
public BM25Similarity(float k1, float b)
BM25 with the supplied parameter values.
Parameters:
- k1 - Controls non-linear term frequency normalization (saturation).
- b - Controls to what degree document length normalizes tf values.
Throws:
- IllegalArgumentException - if k1 is infinite or negative, or if b is not within the range [0..1]
 
BM25Similarity
public BM25Similarity(boolean discountOverlaps)
BM25 with these default values:
- k1 = 1.2
- b = 0.75
Parameters:
- discountOverlaps - True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
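To make the effect of discountOverlaps concrete, here is a minimal sketch that counts a field's length from a list of position increments. It is an illustration only: `OverlapLengthSketch` and `fieldLength` are hypothetical names, and Lucene derives the length from its own indexing state rather than from an array like this.

```java
// Standalone sketch; not the Lucene source. Names are hypothetical.
final class OverlapLengthSketch {
    private OverlapLengthSketch() {}

    /**
     * Returns the document length used for normalization.
     * positionIncrements holds one entry per token; an entry of 0 marks an
     * overlap token (for example a synonym stacked on the previous token).
     */
    static int fieldLength(int[] positionIncrements, boolean discountOverlaps) {
        int total = positionIncrements.length;
        if (!discountOverlaps) {
            return total; // every token counts toward the length
        }
        int overlaps = 0;
        for (int increment : positionIncrements) {
            if (increment == 0) {
                overlaps++; // overlap tokens are not counted
            }
        }
        return total - overlaps;
    }
}
```

For a token stream such as "quick", "fast" (a synonym at the same position), "fox", the increments are `{1, 0, 1}`: the length is 2 with discounting enabled and 3 without it.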
 
BM25Similarity
public BM25Similarity()
BM25 with these default values:
- k1 = 1.2
- b = 0.75
- discountOverlaps = true
 
 
Method Details
idf
protected float idf(long docFreq, long docCount)
Implemented as log(1 + (docCount - docFreq + 0.5)/(docFreq + 0.5)).
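The formula above can be written out directly in Java. The sketch below is a standalone illustration, assuming `log` means the natural logarithm (`Math.log`); the class name `Bm25IdfSketch` is hypothetical.

```java
// Standalone sketch of the documented idf formula; the class name is hypothetical.
final class Bm25IdfSketch {
    private Bm25IdfSketch() {}

    /** log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), using the natural logarithm. */
    static float idf(long docFreq, long docCount) {
        // The 0.5D literals force double arithmetic before the final cast to float.
        return (float) Math.log(1 + (docCount - docFreq + 0.5D) / (docFreq + 0.5D));
    }
}
```

Because of the `1 +` inside the logarithm, the result stays positive even when a term occurs in every document, and rarer terms receive larger values than common ones.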
avgFieldLength
protected float avgFieldLength(CollectionStatistics collectionStats)
The default implementation computes the average as sumTotalTermFreq / docCount.
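As a minimal sketch of that average, the two statistics are passed in here as plain numbers rather than read from a CollectionStatistics object. The class name `AvgFieldLengthSketch` is hypothetical, and performing the division in floating point is an assumption.

```java
// Standalone sketch; not the Lucene source. The class name is hypothetical.
final class AvgFieldLengthSketch {
    private AvgFieldLengthSketch() {}

    /** Average number of tokens per document that has the field. */
    static float avgFieldLength(long sumTotalTermFreq, long docCount) {
        // Cast to double first so the division is not truncated (assumption).
        return (float) (sumTotalTermFreq / (double) docCount);
    }
}
```

For example, 1000 tokens spread over 40 documents gives an average field length of 25.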
idfExplain
Computes a score factor for a simple term and returns an explanation for that score factor.
The default implementation uses: idf(docFreq, docCount);
Note that CollectionStatistics.docCount() is used instead of IndexReader#numDocs() because TermStatistics.docFreq() is also used, and when the latter is inaccurate, so is CollectionStatistics.docCount(), and in the same direction. In addition, CollectionStatistics.docCount() does not skew when fields are sparse.
Parameters:
- collectionStats - collection-level statistics
- termStats - term-level statistics for the term
Returns:
- an Explanation object that includes both an idf score factor and an explanation for the term.
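The point about sparse fields can be illustrated with made-up numbers. In this sketch the index is assumed to hold 1000 documents, only 100 of which have the field, and the term occurs in 90 of those. The class `SparseFieldIdfSketch` and all figures are hypothetical, and the idf formula is the one documented for this class.

```java
// Standalone sketch with illustrative numbers; names and values are hypothetical.
final class SparseFieldIdfSketch {
    private SparseFieldIdfSketch() {}

    static final long NUM_DOCS = 1000;        // all documents in the index
    static final long FIELD_DOC_COUNT = 100;  // documents that have the field
    static final long DOC_FREQ = 90;          // documents containing the term

    static float idf(long docFreq, long docCount) {
        return (float) Math.log(1 + (docCount - docFreq + 0.5D) / (docFreq + 0.5D));
    }

    /** idf relative to documents that actually have the field. */
    static float idfFromFieldDocCount() {
        return idf(DOC_FREQ, FIELD_DOC_COUNT);
    }

    /** idf relative to every document in the index, which inflates the value. */
    static float idfFromNumDocs() {
        return idf(DOC_FREQ, NUM_DOCS);
    }
}
```

With these numbers the term appears in 90% of the documents that have the field, so the field-level idf is small (roughly 0.11), while the index-wide count makes the same term look rare (roughly 2.40).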
 
idfExplain
Computes a score factor for a phrase.
The default implementation sums the idf factor for each term in the phrase.
Parameters:
- collectionStats - collection-level statistics
- termStats - term-level statistics for the terms in the phrase
Returns:
- an Explanation object that includes both an idf score factor for the phrase and an explanation for each term.
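The summation described above can be sketched as follows. This is a simplified illustration that returns only the numeric factor, not an explanation object; `PhraseIdfSketch` and `phraseIdf` are hypothetical names, and the per-term document frequencies are passed in as a plain array.

```java
// Standalone sketch; not the Lucene source. Names are hypothetical.
final class PhraseIdfSketch {
    private PhraseIdfSketch() {}

    static float idf(long docFreq, long docCount) {
        return (float) Math.log(1 + (docCount - docFreq + 0.5D) / (docFreq + 0.5D));
    }

    /** Sums the idf of every term in the phrase against the same docCount. */
    static float phraseIdf(long[] docFreqs, long docCount) {
        double sum = 0d; // accumulate in double, cast once at the end
        for (long docFreq : docFreqs) {
            sum += idf(docFreq, docCount);
        }
        return (float) sum;
    }
}
```

A two-term phrase whose terms each occur in 5 of 10 documents therefore gets twice the idf of either term alone.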
 
scorer
public final Similarity.SimScorer scorer(float boost, CollectionStatistics collectionStats, TermStatistics... termStats)
Description copied from class: Similarity
Compute any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query.
Specified by:
- scorer in class Similarity
Parameters:
- boost - a multiplicative factor to apply to the produced scores
- collectionStats - collection-level statistics, such as the number of tokens in the collection.
- termStats - term-level statistics, such as the document frequency of a term across the collection.
Returns:
- a SimScorer object with the information this Similarity needs to score a query.
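The split between collection-level work and per-document scoring can be sketched like this. It is an illustration under stated assumptions, not Lucene's scorer: `Bm25ScorerSketch` is a hypothetical class, the statistics are passed in as plain numbers, and the per-document formula is a commonly cited BM25 form without the (k1 + 1) numerator factor. The actual implementation may differ in details such as how document lengths are encoded.

```java
// Standalone sketch of a two-phase BM25 scorer; not the Lucene source.
// All names are hypothetical and the formula is a textbook-style simplification.
final class Bm25ScorerSketch {
    private final float weight;          // boost * idf, computed once per query term
    private final float k1;              // term frequency saturation
    private final float b;               // strength of length normalization
    private final float avgFieldLength;  // sumTotalTermFreq / docCount

    Bm25ScorerSketch(float boost, float k1, float b,
                     long docFreq, long docCount, long sumTotalTermFreq) {
        float idf = (float) Math.log(1 + (docCount - docFreq + 0.5D) / (docFreq + 0.5D));
        this.weight = boost * idf;
        this.k1 = k1;
        this.b = b;
        this.avgFieldLength = (float) (sumTotalTermFreq / (double) docCount);
    }

    /** Scores one document; assumes freq > 0 and fieldLength > 0. */
    float score(float freq, float fieldLength) {
        // Longer-than-average documents get a larger norm and so a lower score.
        float norm = k1 * (1 - b + b * fieldLength / avgFieldLength);
        // freq / (freq + norm) approaches 1 as freq grows, which is the saturation effect.
        return weight * freq / (freq + norm);
    }
}
```

In this sketch, raising `freq` increases the score with diminishing returns, raising `fieldLength` lowers it, and setting `b` to 0 makes the score independent of document length.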
 
- 
toString
getK1
public final float getK1()
Returns the k1 parameter.
 
getB
public final float getB()
Returns the b parameter.
 
 
-