Class CommonTermsQuery
- java.lang.Object
-
- org.apache.lucene.search.Query
-
- org.apache.lucene.queries.CommonTermsQuery
-
public class CommonTermsQuery extends Query
A query that executes high-frequency terms in an optional sub-query to prevent slow queries caused by "common" terms such as stopwords. This query builds two queries from the added terms: low-frequency terms are added to a required boolean clause and high-frequency terms are added to an optional boolean clause. The optional clause is only executed if the required "low-frequency" clause matches. In most cases, high-frequency terms are unlikely to contribute significantly to the document score unless at least one of the low-frequency terms is matched. Where it applies, this query can improve query execution times significantly.
CommonTermsQuery has several advantages over stopword filtering at index or query time: a term is "classified" based on its actual document frequency in the index, so slow queries can be prevented even across domains without specialized stopword files.
Note: if the query contains only high-frequency terms, it is rewritten into a plain conjunction query, i.e. all high-frequency terms must match for a document to match.
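As a rough illustration of the split described above, the following sketch models it in plain Java, without Lucene on the classpath. The class name `CommonTermsSketch`, the `describe` method, the integer `cutoff`, and the `REQUIRED`/`OPTIONAL`/`ALL_OF` labels are all hypothetical; the real class works on Lucene query objects rather than strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CommonTermsSketch {

    /**
     * Splits terms by document frequency and renders the resulting clause
     * structure as text. ASSUMPTION: a term counts as high-frequency when its
     * document frequency is strictly greater than the (hypothetical) cutoff.
     */
    static String describe(Map<String, Integer> docFreqByTerm, int cutoff) {
        List<String> low = new ArrayList<>();
        List<String> high = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : docFreqByTerm.entrySet()) {
            if (entry.getValue() > cutoff) {
                high.add(entry.getKey());
            } else {
                low.add(entry.getKey());
            }
        }
        if (low.isEmpty()) {
            // Only high-frequency terms: modeled as a plain conjunction.
            return "ALL_OF" + high;
        }
        // Low-frequency terms form the required part; high-frequency terms,
        // if any, form the optional part.
        String result = "REQUIRED" + low;
        if (!high.isEmpty()) {
            result += " OPTIONAL" + high;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("the", 900);
        docFreqs.put("lucene", 12);
        docFreqs.put("of", 850);
        System.out.println(describe(docFreqs, 100));
    }
}
```

The sketch leaves out boosts, the per-term occur settings, and minimum-should-match handling so that only the required/optional split stays visible.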
-
-
Field Summary
Fields (Modifier and Type, Field)
protected float highFreqBoost
protected float highFreqMinNrShouldMatch
protected BooleanClause.Occur highFreqOccur
protected float lowFreqBoost
protected float lowFreqMinNrShouldMatch
protected BooleanClause.Occur lowFreqOccur
protected float maxTermFrequency
protected List<Term> terms
-
Constructor Summary
Constructors (Constructor, Description)
CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency)
    Creates a new CommonTermsQuery.
-
Method Summary
All Methods (Modifier and Type, Method, Description)
void add(Term term)
    Adds a term to the CommonTermsQuery.
protected Query buildQuery(int maxDoc, TermStates[] contextArray, Term[] queryTerms)
protected int calcHighFreqMinimumNumberShouldMatch(int numOptional)
protected int calcLowFreqMinimumNumberShouldMatch(int numOptional)
void collectTermStates(IndexReader reader, List<LeafReaderContext> leaves, TermStates[] contextArray, Term[] queryTerms)
boolean equals(Object other)
float getHighFreqBoost()
    Gets the boost used for high frequency terms.
float getHighFreqMinimumNumberShouldMatch()
    Gets the minimum number of the optional high frequent BooleanClauses which must be satisfied.
BooleanClause.Occur getHighFreqOccur()
    Gets the BooleanClause.Occur used for high frequency terms.
float getLowFreqBoost()
    Gets the boost used for low frequency terms.
float getLowFreqMinimumNumberShouldMatch()
    Gets the minimum number of the optional low frequent BooleanClauses which must be satisfied.
BooleanClause.Occur getLowFreqOccur()
    Gets the BooleanClause.Occur used for low frequency terms.
float getMaxTermFrequency()
    Gets the maximum threshold of a term's document frequency to be considered a low frequency term.
List<Term> getTerms()
    Gets the list of terms.
int hashCode()
protected Query newTermQuery(Term term, TermStates termStates)
    Builds a new TermQuery instance.
Query rewrite(IndexSearcher indexSearcher)
void setHighFreqMinimumNumberShouldMatch(float min)
    Specifies a minimum number of the high frequent optional BooleanClauses which must be satisfied in order to produce a match on the high frequency terms query part.
void setLowFreqMinimumNumberShouldMatch(float min)
    Specifies a minimum number of the low frequent optional BooleanClauses which must be satisfied in order to produce a match on the low frequency terms query part.
String toString(String field)
void visit(QueryVisitor visitor)
-
Methods inherited from class org.apache.lucene.search.Query
classHash, createWeight, rewrite, sameClassAs, toString
-
-
-
-
Field Detail
-
maxTermFrequency
protected final float maxTermFrequency
-
lowFreqOccur
protected final BooleanClause.Occur lowFreqOccur
-
highFreqOccur
protected final BooleanClause.Occur highFreqOccur
-
lowFreqBoost
protected float lowFreqBoost
-
highFreqBoost
protected float highFreqBoost
-
lowFreqMinNrShouldMatch
protected float lowFreqMinNrShouldMatch
-
highFreqMinNrShouldMatch
protected float highFreqMinNrShouldMatch
-
-
Constructor Detail
-
CommonTermsQuery
public CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency)
Creates a new CommonTermsQuery.
- Parameters:
highFreqOccur - BooleanClause.Occur used for high frequency terms
lowFreqOccur - BooleanClause.Occur used for low frequency terms
maxTermFrequency - a value in [0..1) (or an absolute number >= 1) representing the maximum threshold of a term's document frequency to be considered a low frequency term.
- Throws:
IllegalArgumentException - if BooleanClause.Occur.MUST_NOT is passed as lowFreqOccur or highFreqOccur
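The two readings of maxTermFrequency and the MUST_NOT check can be sketched in plain Java as follows. This is a model under stated assumptions, not the class's actual code: `MaxTermFrequencySketch`, its nested `Occur` enum (a stand-in for `BooleanClause.Occur`), and both method names are hypothetical, and the strict `>` comparison and the rounding up of the fractional threshold are assumptions about how the cutoff is applied.

```java
final class MaxTermFrequencySketch {

    /** Hypothetical stand-in for BooleanClause.Occur so the sketch compiles without Lucene. */
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** Models the documented constructor contract: MUST_NOT is rejected for either argument. */
    static void checkOccur(Occur highFreqOccur, Occur lowFreqOccur) {
        if (highFreqOccur == Occur.MUST_NOT || lowFreqOccur == Occur.MUST_NOT) {
            throw new IllegalArgumentException(
                "MUST_NOT is not accepted for low or high frequency terms");
        }
    }

    /**
     * Returns true when a term with the given document frequency would be
     * treated as high-frequency.
     */
    static boolean isHighFrequency(int docFreq, int maxDoc, float maxTermFrequency) {
        if (maxTermFrequency >= 1f) {
            // Absolute reading: the threshold is a document count.
            return docFreq > maxTermFrequency;
        }
        // Fractional reading: the threshold is a share of all documents.
        // ASSUMPTION: the scaled value is rounded up before comparing.
        return docFreq > (int) Math.ceil(maxTermFrequency * (float) maxDoc);
    }

    public static void main(String[] args) {
        System.out.println(isHighFrequency(501, 1000, 0.5f)); // fractional threshold
        System.out.println(isHighFrequency(21, 1000, 20f));   // absolute threshold
    }
}
```

A fractional threshold adapts as the index grows, while an absolute one stays fixed at a document count; which fits better depends on the index.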
-
-
Method Detail
-
add
public void add(Term term)
Adds a term to the CommonTermsQuery.
- Parameters:
term - the term to add
-
rewrite
public Query rewrite(IndexSearcher indexSearcher) throws IOException
- Overrides:
rewrite in class Query
- Throws:
IOException
-
visit
public void visit(QueryVisitor visitor)
-
calcLowFreqMinimumNumberShouldMatch
protected int calcLowFreqMinimumNumberShouldMatch(int numOptional)
-
calcHighFreqMinimumNumberShouldMatch
protected int calcHighFreqMinimumNumberShouldMatch(int numOptional)
-
buildQuery
protected Query buildQuery(int maxDoc, TermStates[] contextArray, Term[] queryTerms)
-
collectTermStates
public void collectTermStates(IndexReader reader, List<LeafReaderContext> leaves, TermStates[] contextArray, Term[] queryTerms) throws IOException
- Throws:
IOException
-
setLowFreqMinimumNumberShouldMatch
public void setLowFreqMinimumNumberShouldMatch(float min)
Specifies a minimum number of the low frequent optional BooleanClauses which must be satisfied in order to produce a match on the low frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the low frequent clause, or a number >= 1 as an absolute number of clauses that need to match.
By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.
- Parameters:
min - the number of optional clauses that must match
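To make the fraction-versus-absolute rule concrete, here is a minimal sketch in plain Java of how such a value could be turned into a clause count. `MinShouldMatchSketch` and `minNrShouldMatch` are hypothetical names, and rounding the fractional case to the nearest integer is an assumption rather than documented behavior.

```java
final class MinShouldMatchSketch {

    /**
     * Converts a minimum-should-match setting into a number of clauses.
     *
     * @param setting     0 (no minimum), a fraction in (0..1), or an absolute count >= 1
     * @param numOptional number of optional clauses in the sub-query
     */
    static int minNrShouldMatch(float setting, int numOptional) {
        if (setting >= 1.0f || setting == 0.0f) {
            // Absolute count (or the default of zero): used as-is.
            return (int) setting;
        }
        // Fraction of the optional clauses.
        // ASSUMPTION: rounded to the nearest integer.
        return Math.round(setting * numOptional);
    }

    public static void main(String[] args) {
        System.out.println(minNrShouldMatch(0.5f, 4)); // fraction of four clauses
        System.out.println(minNrShouldMatch(2f, 4));   // absolute count
    }
}
```

The same conversion would apply to the high-frequency setting, with the high-frequency clause count passed as numOptional.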
-
getLowFreqMinimumNumberShouldMatch
public float getLowFreqMinimumNumberShouldMatch()
Gets the minimum number of the optional low frequent BooleanClauses which must be satisfied.
-
setHighFreqMinimumNumberShouldMatch
public void setHighFreqMinimumNumberShouldMatch(float min)
Specifies a minimum number of the high frequent optional BooleanClauses which must be satisfied in order to produce a match on the high frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the high frequent clause, or a number >= 1 as an absolute number of clauses that need to match.
By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.
- Parameters:
min - the number of optional clauses that must match
-
getHighFreqMinimumNumberShouldMatch
public float getHighFreqMinimumNumberShouldMatch()
Gets the minimum number of the optional high frequent BooleanClauses which must be satisfied.
-
getMaxTermFrequency
public float getMaxTermFrequency()
Gets the maximum threshold of a term's document frequency to be considered a low frequency term.
-
getLowFreqOccur
public BooleanClause.Occur getLowFreqOccur()
Gets the BooleanClause.Occur used for low frequency terms.
-
getHighFreqOccur
public BooleanClause.Occur getHighFreqOccur()
Gets the BooleanClause.Occur used for high frequency terms.
-
getLowFreqBoost
public float getLowFreqBoost()
Gets the boost used for low frequency terms.
-
getHighFreqBoost
public float getHighFreqBoost()
Gets the boost used for high frequency terms.
-
newTermQuery
protected Query newTermQuery(Term term, TermStates termStates)
Builds a new TermQuery instance.
This is intended for subclasses that wish to customize the generated queries.
- Parameters:
term - term
termStates - the TermStates to be used to create the low level term query. Can be null.
- Returns:
new TermQuery instance
-
-