Class ClassicSimilarity
java.lang.Object
org.apache.lucene.search.similarities.Similarity
org.apache.lucene.search.similarities.TFIDFSimilarity
org.apache.lucene.search.similarities.ClassicSimilarity
Expert: Historical scoring implementation. You might want to consider using
BM25Similarity instead, which is generally considered superior to TF-IDF.
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer
Constructor Summary
Constructors
Constructor  Description
ClassicSimilarity()
    Default constructor: parameter-free
ClassicSimilarity(boolean discountOverlaps)
    Primary constructor.
Method Summary
Modifier and Type  Method  Description
float idf(long docFreq, long docCount)
    Implemented as log((docCount+1)/(docFreq+1)) + 1.
Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
    Computes a score factor for a simple term and returns an explanation for that score factor.
float lengthNorm(int numTerms)
    Implemented as 1/sqrt(length).
float tf(float freq)
    Implemented as sqrt(freq).
String toString()
Methods inherited from class org.apache.lucene.search.similarities.TFIDFSimilarity
    idfExplain, scorer
Methods inherited from class org.apache.lucene.search.similarities.Similarity
    computeNorm, getDiscountOverlaps
Constructor Details
ClassicSimilarity
public ClassicSimilarity()
Default constructor: parameter-free
ClassicSimilarity
public ClassicSimilarity(boolean discountOverlaps)
Primary constructor.
Method Details
lengthNorm
public float lengthNorm(int numTerms)
Implemented as 1/sqrt(length).
Specified by:
    lengthNorm in class TFIDFSimilarity
Parameters:
    numTerms - the number of terms in the field, optionally discounting overlaps
Returns:
    a length normalization value
WARNING: This API is experimental and might change in incompatible ways in the next release.
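As a rough illustration of the formula above, the following standalone sketch computes 1/sqrt(length) with length taken to be numTerms. It does not depend on Lucene: the class name LengthNormSketch is hypothetical, and the code mirrors only the documented formula, not necessarily the library's source.

```java
// Hypothetical helper class for illustration; not part of Lucene.
class LengthNormSketch {
    // Mirrors the documented formula 1/sqrt(length), with length = numTerms.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Longer fields receive a smaller normalization value.
        for (int numTerms : new int[] {1, 4, 16, 100}) {
            System.out.println(numTerms + " terms -> " + lengthNorm(numTerms));
        }
    }
}
```

The value shrinks as the field grows, so under this formula a match in a short field counts for more than the same match in a long one.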
tf
public float tf(float freq)
Implemented as sqrt(freq).
Specified by:
    tf in class TFIDFSimilarity
Parameters:
    freq - the frequency of a term within a document
Returns:
    a score factor based on a term's within-document frequency
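The square-root damping can be seen in a small standalone sketch. The class name TfSketch is hypothetical and the code reproduces only the documented formula sqrt(freq); it is not taken from the Lucene source.

```java
// Hypothetical helper class for illustration; not part of Lucene.
class TfSketch {
    // Mirrors the documented formula sqrt(freq).
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // The factor grows sublinearly with the raw frequency.
        for (float freq : new float[] {1f, 4f, 9f, 100f}) {
            System.out.println("freq " + freq + " -> tf " + tf(freq));
        }
    }
}
```

Because the factor grows sublinearly, quadrupling a term's frequency only doubles its contribution under this formula.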
idfExplain
public Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats)
Description copied from class: TFIDFSimilarity
Computes a score factor for a simple term and returns an explanation for that score factor.
The default implementation uses:
    idf(docFreq, docCount);
Note that CollectionStatistics.docCount() is used instead of IndexReader#numDocs() because TermStatistics.docFreq() is also used, and when the latter is inaccurate, so is CollectionStatistics.docCount(), and in the same direction. In addition, CollectionStatistics.docCount() does not skew when fields are sparse.
Overrides:
    idfExplain in class TFIDFSimilarity
Parameters:
    collectionStats - collection-level statistics
    termStats - term-level statistics for the term
Returns:
    an Explain object that includes both an idf score factor and an explanation for the term.
idf
public float idf(long docFreq, long docCount)
Implemented as log((docCount+1)/(docFreq+1)) + 1.
Specified by:
    idf in class TFIDFSimilarity
Parameters:
    docFreq - the number of documents which contain the term
    docCount - the total number of documents in the collection
Returns:
    a score factor based on the term's document frequency
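A minimal standalone sketch of the idf formula follows. It makes two assumptions that the text above does not state: that log is the natural logarithm, and that the division is done in floating point rather than as truncating integer division. The class name IdfSketch is hypothetical and the code is not taken from the Lucene source.

```java
// Hypothetical helper class for illustration; not part of Lucene.
class IdfSketch {
    // Mirrors the documented formula log((docCount+1)/(docFreq+1)) + 1.
    // Assumptions: natural logarithm, and floating-point division
    // (the cast to double prevents long division from truncating the ratio).
    static float idf(long docFreq, long docCount) {
        return (float) (Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        long docCount = 99;
        // Rarer terms (smaller docFreq) receive a larger factor.
        for (long docFreq : new long[] {0, 9, 49, 99}) {
            System.out.println("docFreq " + docFreq + " -> idf " + idf(docFreq, docCount));
        }
    }
}
```

When a term occurs in every document (docFreq equal to docCount), the logarithm is zero and the factor bottoms out at 1 rather than 0, so such a term still contributes to the score.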
toString