Class BM25Similarity
- java.lang.Object
  - org.apache.lucene.search.similarities.Similarity
    - org.apache.lucene.search.similarities.BM25Similarity
public class BM25Similarity extends Similarity
BM25 Similarity. Introduced in Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford. Okapi at TREC-3. In Proceedings of the Third Text REtrieval Conference (TREC 1994). Gaithersburg, USA, November 1994.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer
-
-
Constructor Summary
Constructors
- BM25Similarity(): BM25 with these default values: k1 = 1.2, b = 0.75, discountOverlaps = true.
- BM25Similarity(boolean discountOverlaps): BM25 with the default values k1 = 1.2 and b = 0.75, and the supplied discountOverlaps value.
- BM25Similarity(float k1, float b): BM25 with the supplied parameter values.
- BM25Similarity(float k1, float b, boolean discountOverlaps): BM25 with the supplied parameter values.
-
Method Summary
All Methods / Instance Methods / Concrete Methods
- protected float avgFieldLength(CollectionStatistics collectionStats): The default implementation computes the average as sumTotalTermFreq / docCount.
- float getB(): Returns the b parameter.
- float getK1(): Returns the k1 parameter.
- protected float idf(long docFreq, long docCount): Implemented as log(1 + (docCount - docFreq + 0.5)/(docFreq + 0.5)).
- Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats): Computes a score factor for a simple term and returns an explanation for that score factor.
- Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics[] termStats): Computes a score factor for a phrase.
- Similarity.SimScorer scorer(float boost, CollectionStatistics collectionStats, TermStatistics... termStats): Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query.
- String toString()
Methods inherited from class org.apache.lucene.search.similarities.Similarity
computeNorm, getDiscountOverlaps
-
-
-
-
Constructor Detail
-
BM25Similarity
public BM25Similarity(float k1, float b, boolean discountOverlaps)
BM25 with the supplied parameter values.
- Parameters:
k1 - Controls non-linear term frequency normalization (saturation).
b - Controls to what degree document length normalizes tf values.
discountOverlaps - True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
- Throws:
IllegalArgumentException - if k1 is infinite or negative, or if b is not within the range [0..1]
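To make the roles of k1 and b concrete, the following is a minimal sketch of the textbook BM25 term-frequency component. It is an illustration under stated assumptions, not Lucene's internal code: the class name Bm25TfSketch, the method tfNorm, and the parameters docLen and avgDocLen are hypothetical names chosen for this example.

```java
public class Bm25TfSketch {

    // Textbook BM25 term-frequency component (assumption: this mirrors the
    // published formula, not necessarily Lucene's exact internal arithmetic).
    // k1 controls how quickly repeated occurrences saturate;
    // b controls how strongly document length rescales the frequency.
    static double tfNorm(double freq, double k1, double b, double docLen, double avgDocLen) {
        double lengthNorm = 1 - b + b * (docLen / avgDocLen);
        return freq / (freq + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double k1 = 1.2; // documented default
        double b = 0.75; // documented default
        // For a document of average length, the value grows with freq but stays below 1.
        for (int freq : new int[] {1, 2, 5, 20}) {
            System.out.printf("freq=%d -> %.3f%n", freq, tfNorm(freq, k1, b, 100, 100));
        }
        // With b = 0, document length has no effect on the result.
        System.out.println(tfNorm(3, k1, 0.0, 10, 100) == tfNorm(3, k1, 0.0, 1000, 100));
    }
}
```

In this sketch, k1 = 0 reduces the component to a constant for any positive frequency, and b = 1 applies full length normalization, which matches the parameter descriptions above.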
-
BM25Similarity
public BM25Similarity(float k1, float b)
BM25 with the supplied parameter values.
- Parameters:
k1 - Controls non-linear term frequency normalization (saturation).
b - Controls to what degree document length normalizes tf values.
- Throws:
IllegalArgumentException - if k1 is infinite or negative, or if b is not within the range [0..1]
-
BM25Similarity
public BM25Similarity(boolean discountOverlaps)
BM25 with these default values: k1 = 1.2, b = 0.75
- Parameters:
discountOverlaps - True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
-
BM25Similarity
public BM25Similarity()
BM25 with these default values: k1 = 1.2, b = 0.75, discountOverlaps = true
-
-
Method Detail
-
idf
protected float idf(long docFreq, long docCount)
Implemented as log(1 + (docCount - docFreq + 0.5)/(docFreq + 0.5)).
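The formula above can be sketched as a standalone method. This is an illustrative transcription of the documented expression, assuming the natural logarithm; the class name Bm25IdfSketch is hypothetical and the code does not depend on Lucene.

```java
public class Bm25IdfSketch {

    // Direct transcription of the documented formula:
    // log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    // Assumption: "log" is the natural logarithm.
    static float idf(long docFreq, long docCount) {
        return (float) Math.log(1 + (docCount - docFreq + 0.5D) / (docFreq + 0.5D));
    }

    public static void main(String[] args) {
        long docCount = 1000;
        // Rare terms receive a larger factor than common ones.
        System.out.println("docFreq=10   -> " + idf(10, docCount));
        System.out.println("docFreq=500  -> " + idf(500, docCount));
        // The "1 +" inside the log keeps the factor positive even when
        // every document contains the term.
        System.out.println("docFreq=1000 -> " + idf(1000, docCount));
    }
}
```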
-
avgFieldLength
protected float avgFieldLength(CollectionStatistics collectionStats)
The default implementation computes the average as sumTotalTermFreq / docCount
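As a rough illustration of that ratio, the sketch below takes the two statistics as plain long values rather than a CollectionStatistics instance. The class name AvgFieldLengthSketch and the sample numbers are assumptions made for this example.

```java
public class AvgFieldLengthSketch {

    // Average field length as total term occurrences in the field divided by
    // the number of documents that have the field. The division is done in
    // floating point so the fractional part is kept.
    static float avgFieldLength(long sumTotalTermFreq, long docCount) {
        return (float) (sumTotalTermFreq / (double) docCount);
    }

    public static void main(String[] args) {
        // Hypothetical index: 1,500 tokens spread over 10 documents.
        System.out.println(avgFieldLength(1500, 10));
    }
}
```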
-
idfExplain
public Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
Computes a score factor for a simple term and returns an explanation for that score factor.
The default implementation uses:
idf(docFreq, docCount);
Note that CollectionStatistics.docCount() is used instead of IndexReader#numDocs() because TermStatistics.docFreq() is also used, and when the latter is inaccurate, so is CollectionStatistics.docCount(), and in the same direction. In addition, CollectionStatistics.docCount() does not skew when fields are sparse.
- Parameters:
collectionStats - collection-level statistics
termStats - term-level statistics for the term
- Returns:
- an Explanation object that includes both an idf score factor and an explanation for the term.
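The point about sparse fields can be seen with a small numeric sketch. The figures are invented for illustration: an index of 1,000,000 documents in which only 1,000 have the field, and a term that appears in 900 of those. The class name SparseFieldIdfSketch is hypothetical, and idf is a standalone copy of the documented formula, not a call into Lucene.

```java
public class SparseFieldIdfSketch {

    // Standalone copy of the documented idf formula (natural log assumed).
    static float idf(long docFreq, long docCount) {
        return (float) Math.log(1 + (docCount - docFreq + 0.5D) / (docFreq + 0.5D));
    }

    public static void main(String[] args) {
        long docFreq = 900;           // documents containing the term
        long docsWithField = 1_000;   // plays the role of docCount()
        long allDocs = 1_000_000;     // plays the role of numDocs()

        // Relative to documents that actually have the field, the term is common.
        System.out.println("per-field count: " + idf(docFreq, docsWithField));
        // Relative to the whole index, the same term would look rare.
        System.out.println("whole index:     " + idf(docFreq, allDocs));
    }
}
```

Using the per-field count keeps a term that is common within its field from being treated as rare just because most documents lack the field.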
-
idfExplain
public Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics[] termStats)
Computes a score factor for a phrase. The default implementation sums the idf factor for each term in the phrase.
- Parameters:
collectionStats - collection-level statistics
termStats - term-level statistics for the terms in the phrase
- Returns:
- an Explanation object that includes both an idf score factor for the phrase and an explanation for each term.
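The summation described for phrases might look like the following sketch. It works on an array of document frequencies instead of TermStatistics objects; the class name PhraseIdfSketch, the method phraseIdf, and the sample frequencies are assumptions made for this example.

```java
public class PhraseIdfSketch {

    // Standalone copy of the documented idf formula (natural log assumed).
    static float idf(long docFreq, long docCount) {
        return (float) Math.log(1 + (docCount - docFreq + 0.5D) / (docFreq + 0.5D));
    }

    // Phrase factor as the sum of the per-term idf factors.
    static float phraseIdf(long[] docFreqs, long docCount) {
        double sum = 0;
        for (long docFreq : docFreqs) {
            sum += idf(docFreq, docCount);
        }
        return (float) sum;
    }

    public static void main(String[] args) {
        // Hypothetical two-term phrase in a 1,000-document collection.
        long[] docFreqs = {10, 20};
        System.out.println(phraseIdf(docFreqs, 1000));
    }
}
```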
-
scorer
public final Similarity.SimScorer scorer(float boost, CollectionStatistics collectionStats, TermStatistics... termStats)
Description copied from class: Similarity
Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query.
- Specified by:
scorer in class Similarity
- Parameters:
boost - a multiplicative factor to apply to the produced scores
collectionStats - collection-level statistics, such as the number of tokens in the collection.
termStats - term-level statistics, such as the document frequency of a term across the collection.
- Returns:
- a Similarity.SimScorer object with the information this Similarity needs to score a query.
-
getK1
public final float getK1()
Returns the k1 parameter
- See Also:
BM25Similarity(float, float)
-
getB
public final float getB()
Returns the b parameter
- See Also:
BM25Similarity(float, float)
-
-