Class IndriDirichletSimilarity
java.lang.Object
  org.apache.lucene.search.similarities.Similarity
    org.apache.lucene.search.similarities.SimilarityBase
      org.apache.lucene.search.similarities.LMSimilarity
        org.apache.lucene.search.similarities.IndriDirichletSimilarity
Bayesian smoothing using Dirichlet priors as implemented in the Indri search engine
(http://www.lemurproject.org/indri.php). Indri Dirichlet smoothing.
$$P(t \mid E) = \frac{tf_E + \mu \cdot P(t \mid D)}{\mathit{documentLength} + \mathit{documentMu}}$$
where
$$P(t \mid D) = \frac{tf_D + \mu \cdot P(t \mid C)}{\mathit{doclen} + \mu}$$
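The two formulas above can be evaluated directly with plain doubles. The following is a minimal sketch, not Lucene code: the class and method names are hypothetical, and the inputs (term frequencies, lengths, P(t|C) = 0.001) are illustrative values. The symbols follow the formulas as written, including μ in the numerator and documentMu in the denominator of P(t|E).

```java
// Hypothetical helper, not part of Lucene: evaluates the two smoothing
// formulas from the class description with plain doubles.
final class IndriFormulaSketch {

    // Document level: P(t|D) = (tf_D + mu * P(t|C)) / (doclen + mu)
    static double documentModel(double tfD, double docLen, double pTC, double mu) {
        return (tfD + mu * pTC) / (docLen + mu);
    }

    // Extent level, symbols as written in the description:
    // P(t|E) = (tf_E + mu * P(t|D)) / (documentLength + documentMu)
    static double extentModel(double tfE, double pTD, double mu,
                              double documentLength, double documentMu) {
        return (tfE + mu * pTD) / (documentLength + documentMu);
    }

    public static void main(String[] args) {
        double pTC = 0.001;                               // illustrative collection probability
        double pTD = documentModel(3, 100, pTC, 2000);    // (3 + 2) / 2100
        double pTE = extentModel(1, pTD, 2000, 20, 2000); // illustrative extent values
        System.out.println("P(t|D) = " + pTD);
        System.out.println("P(t|E) = " + pTE);
    }
}
```

With these inputs P(t|D) = 5/2100 ≈ 0.00238: the unsmoothed estimate 3/100 = 0.03 is pulled strongly toward P(t|C) because μ = 2000 is much larger than the document length of 100.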
A larger value of μ produces more smoothing. Smoothing is most important for short documents, where the unsmoothed probabilities are coarse-grained: in a 10-token document, the estimate tf_D/doclen can only change in steps of 0.1.
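The effect of μ on short and long documents can be checked numerically with the document-level formula. This is a hypothetical sketch (the class name and the value P(t|C) = 0.001 are illustrative): both documents have the same unsmoothed estimate tf/doclen = 0.1, and the code compares their smoothed estimates for μ = 100 and μ = 2000.

```java
// Hypothetical sketch, not part of Lucene: shows how mu shifts the
// document-level estimate toward P(t|C) for a short and a long document.
final class SmoothingEffectSketch {

    // Dirichlet-smoothed estimate: (tf + mu * pC) / (len + mu)
    static double smoothed(double tf, double len, double pC, double mu) {
        return (tf + mu * pC) / (len + mu);
    }

    public static void main(String[] args) {
        double pC = 0.001; // illustrative collection probability
        for (double mu : new double[] {100, 2000}) {
            // Both documents have the same unsmoothed estimate tf/len = 0.1.
            double shortDoc = smoothed(1, 10, pC, mu);
            double longDoc = smoothed(100, 1000, pC, mu);
            System.out.printf("mu=%.0f short=%.4f long=%.4f%n", mu, shortDoc, longDoc);
        }
    }
}
```

Raising μ from 100 to 2000 moves the short document's estimate from about 0.010 to about 0.0015, close to P(t|C), while the long document's estimate only drops from about 0.091 to 0.034.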
Nested Class Summary
Nested Classes
Modifier and Type: static class
Description: Models p(w|C) as the number of occurrences of the term in the collection, divided by the total number of tokens + 1.
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.LMSimilarity:
LMSimilarity.CollectionModel, LMSimilarity.DefaultCollectionModel, LMSimilarity.LMStats
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity:
Similarity.SimScorer
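A minimal sketch of the nested collection model's description, assuming the "+ 1" applies to the token count, i.e. p(w|C) = occurrences / (total tokens + 1). The class and method names are hypothetical and the counts are illustrative; this reads the description literally and is not Lucene's implementation.

```java
// Hypothetical helper, not part of Lucene. Reads the description literally:
// p(w|C) = (occurrences of the term in the collection) / (total tokens + 1).
final class CollectionModelSketch {

    static double collectionProbability(long termOccurrences, long totalTokens) {
        return (double) termOccurrences / (totalTokens + 1.0);
    }

    public static void main(String[] args) {
        // A term that occurs 50 times in a collection of 999 tokens: 50 / 1000.
        System.out.println(collectionProbability(50, 999));
    }
}
```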
Field Summary
Fields inherited from class org.apache.lucene.search.similarities.LMSimilarity
collectionModel
Constructor Summary
Constructors
IndriDirichletSimilarity()
  Instantiates the similarity with the default μ value of 2000.
IndriDirichletSimilarity(float mu)
  Instantiates the similarity with the provided μ parameter.
IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel)
  Instantiates the similarity with the default μ value of 2000.
IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel, boolean discountOverlaps, float mu)
  Instantiates the similarity with the provided parameters.
IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel, float mu)
  Instantiates the similarity with the provided μ parameter.
Method Summary
Methods
protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen)
  Subclasses should implement this method to explain the score.
float getMu()
  Returns the μ parameter.
String getName()
  Returns the name of the LM method.
protected double score(BasicStats stats, double freq, double docLen)
  Scores the document doc.
Methods inherited from class org.apache.lucene.search.similarities.LMSimilarity:
fillBasicStats, newStats, toString
Methods inherited from class org.apache.lucene.search.similarities.SimilarityBase:
explain, log2, scorer
Methods inherited from class org.apache.lucene.search.similarities.Similarity:
computeNorm, getDiscountOverlaps
Constructor Details
IndriDirichletSimilarity
public IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel, boolean discountOverlaps, float mu)
Instantiates the similarity with the provided parameters.
IndriDirichletSimilarity
public IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel, float mu)
Instantiates the similarity with the provided μ parameter.
IndriDirichletSimilarity
public IndriDirichletSimilarity(float mu)
Instantiates the similarity with the provided μ parameter.
IndriDirichletSimilarity
public IndriDirichletSimilarity(LMSimilarity.CollectionModel collectionModel)
Instantiates the similarity with the default μ value of 2000.
IndriDirichletSimilarity
public IndriDirichletSimilarity()
Instantiates the similarity with the default μ value of 2000.
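To make the defaults in the constructor list easy to see, here is a hypothetical stand-in (not the Lucene class) whose overloads chain the same way, so that every overload without a μ argument ends up with 2000. The collection model is reduced to a String label, and the overloads without a discountOverlaps argument are assumed to pass true; this page does not state that default.

```java
// Hypothetical stand-in, not the Lucene class: mirrors the documented overloads.
// The collection model is reduced to a String label (assumption), and the
// discountOverlaps = true default used by the shorter overloads is also an assumption.
final class DirichletConfigSketch {
    static final float DEFAULT_MU = 2000f; // documented default μ

    final String collectionModel;
    final boolean discountOverlaps;
    final float mu;

    // Full form: every other overload delegates here.
    DirichletConfigSketch(String collectionModel, boolean discountOverlaps, float mu) {
        this.collectionModel = collectionModel;
        this.discountOverlaps = discountOverlaps;
        this.mu = mu;
    }

    DirichletConfigSketch(String collectionModel, float mu) {
        this(collectionModel, true, mu); // assumed discountOverlaps default
    }

    DirichletConfigSketch(float mu) {
        this("default collection model", mu); // hypothetical default model label
    }

    DirichletConfigSketch(String collectionModel) {
        this(collectionModel, DEFAULT_MU);
    }

    DirichletConfigSketch() {
        this(DEFAULT_MU);
    }
}
```

For example, new DirichletConfigSketch("custom").mu is 2000, while new DirichletConfigSketch(1500f).mu is 1500.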
Method Details
score
protected double score(BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Scores the document doc. Subclasses must apply their scoring formula in this class.
Specified by:
score in class SimilarityBase
Parameters:
stats - the corpus-level statistics.
freq - the term frequency.
docLen - the document length.
Returns:
the score.
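score(...) receives the term frequency and document length for a single query term. Below is a sketch of a typical Dirichlet query-likelihood computation, assuming the per-term score is the natural logarithm of the document-level P(t|D) from the class description and that a query's score is the sum over its terms. The class is hypothetical and the plain-double arguments stand in for BasicStats; it is not Lucene's implementation.

```java
// Hypothetical sketch, not Lucene's implementation. Assumes the per-term score is
// ln P(t|D) with Dirichlet smoothing and that a query score sums over its terms.
final class QueryLikelihoodSketch {

    // ln((freq + mu * P(t|C)) / (docLen + mu)); negative whenever freq <= docLen and P(t|C) < 1.
    static double termScore(double freq, double docLen, double pC, double mu) {
        return Math.log((freq + mu * pC) / (docLen + mu));
    }

    static double documentScore(double[] freqs, double[] pCs, double docLen, double mu) {
        if (freqs.length != pCs.length) {
            throw new IllegalArgumentException("freqs and pCs must have the same length");
        }
        double total = 0.0;
        for (int i = 0; i < freqs.length; i++) {
            // A query term absent from the document (freq = 0) still contributes
            // ln(mu * P(t|C) / (docLen + mu)) rather than ln(0).
            total += termScore(freqs[i], docLen, pCs[i], mu);
        }
        return total;
    }

    public static void main(String[] args) {
        // Two-term query: the first term occurs 3 times, the second is absent.
        double score = documentScore(new double[] {3, 0}, new double[] {0.001, 0.002}, 100, 2000);
        System.out.println(score);
    }
}
```

Smoothing is what keeps the missing second term from driving the document's score to negative infinity.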
explain
protected void explain(List<Explanation> subs, BasicStats stats, double freq, double docLen)
Description copied from class: SimilarityBase
Subclasses should implement this method to explain the score. expl already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae. The default implementation does nothing.
Overrides:
explain in class LMSimilarity
Parameters:
subs - the list of details of the explanation to extend.
stats - the corpus-level statistics.
freq - the term frequency.
docLen - the document length.
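The contract of explain is to append detail clauses to subs rather than to build the whole explanation. A hypothetical sketch of that pattern follows: Detail is an illustrative stand-in for Lucene's Explanation, and the choice of clauses (μ, P(t|C), freq, docLen) is an assumption about what a Dirichlet explanation would report, not Lucene's actual output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: appends the ingredients of the
// Dirichlet formula to an existing list of explanation details.
final class ExplainSketch {

    // Illustrative stand-in for an explanation clause: a value plus a description.
    static final class Detail {
        final double value;
        final String description;

        Detail(double value, String description) {
            this.value = value;
            this.description = description;
        }

        @Override
        public String toString() {
            return value + " = " + description;
        }
    }

    // The caller already reports the score itself, so only the formula's inputs are added.
    static void explain(List<Detail> subs, double mu, double collectionProbability,
                        double freq, double docLen) {
        subs.add(new Detail(mu, "mu, smoothing parameter"));
        subs.add(new Detail(collectionProbability, "P(t|C), collection probability"));
        subs.add(new Detail(freq, "freq, term frequency in the document"));
        subs.add(new Detail(docLen, "docLen, document length"));
    }

    public static void main(String[] args) {
        List<Detail> subs = new ArrayList<>();
        explain(subs, 2000, 0.001, 3, 100);
        subs.forEach(System.out::println);
    }
}
```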
getMu
public float getMu()
Returns the μ parameter.
getName
public String getName()
Description copied from class: LMSimilarity
Returns the name of the LM method. The values of the parameters should be included as well. Used in LMSimilarity.toString().
Specified by:
getName in class LMSimilarity
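Because getName() should include the parameter values, a name built from μ is one natural shape. The sketch below is hypothetical: it assumes a format of the form IndriDirichlet(μ) with μ printed via %f, and the exact string Lucene produces may differ. Locale.ROOT keeps the decimal separator a '.' regardless of the JVM's default locale.

```java
import java.util.Locale;

// Hypothetical sketch, not Lucene code: a method name that embeds the μ value.
final class NameSketch {

    static String name(float mu) {
        // Locale.ROOT avoids locale-specific decimal separators such as ','.
        return String.format(Locale.ROOT, "IndriDirichlet(%f)", mu);
    }

    public static void main(String[] args) {
        System.out.println(name(2000f));
    }
}
```

Here NameSketch.name(2000f) returns "IndriDirichlet(2000.000000)", since %f prints six decimal places by default.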