Class Intervals
Factory functions for creating interval sources.
These sources implement minimum-interval algorithms taken from the paper "Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics".
Note: by default, sources that are sensitive to internal gaps (e.g. PHRASE
and MAXGAPS) rewrite their sub-sources so that disjunctions of different lengths are
pulled up to the top of the interval tree. For example, PHRASE(OR(PHRASE("a", "b", "c"),
"b"), "c") automatically rewrites itself to OR(PHRASE("a", "b", "c", "c"),
PHRASE("b", "c")) to ensure that documents containing "b c" are matched. This can lead
to less efficient queries, because more terms need to be loaded (for example, the "c" iterator
above is loaded twice). If you care more about speed than about accuracy, use the
or(boolean, IntervalsSource...) factory method with rewrite set to false to prevent rewriting.
-
Field Summary
Fields
Modifier and Type, Field, and Description
static final int DEFAULT_MAX_EXPANSIONS
    The default number of expansions in multiterm(CompiledAutomaton, String).
-
Method Summary
Modifier and Type, Method, and Description
static IntervalsSource after(IntervalsSource source, IntervalsSource reference)
    Returns intervals from the source that appear after intervals from the reference.
static IntervalsSource analyzedText(String text, Analyzer analyzer, String field, int maxGaps, boolean ordered)
    Returns intervals that correspond to tokens from a TokenStream returned for text by applying the provided Analyzer as if text was the content of the given field.
static IntervalsSource analyzedText(TokenStream tokenStream, int maxGaps, boolean ordered)
    Returns intervals that correspond to tokens from the provided TokenStream.
static IntervalsSource atLeast(int minShouldMatch, IntervalsSource... sources)
    Return intervals that span combinations of intervals from minShouldMatch of the sources.
static IntervalsSource before(IntervalsSource source, IntervalsSource reference)
    Returns intervals from the source that appear before intervals from the reference.
static IntervalsSource containedBy(IntervalsSource small, IntervalsSource big)
    Create a contained-by IntervalsSource.
static IntervalsSource containing(IntervalsSource big, IntervalsSource small)
    Create a containing IntervalsSource.
static IntervalsSource extend(IntervalsSource source, int before, int after)
    Create an IntervalsSource that wraps another source, extending its intervals by a number of positions before and after.
static IntervalsSource fixField(String field, IntervalsSource source)
    Create an IntervalsSource that always returns intervals from a specific field.
static IntervalsSource fuzzyTerm
    A fuzzy term IntervalsSource matches the disjunction of intervals of terms that are within the specified maxEdits from the provided term.
static IntervalsSource fuzzyTerm(String term, int maxEdits, int prefixLength, boolean transpositions, int maxExpansions)
    A fuzzy term IntervalsSource matches the disjunction of intervals of terms that are within the specified maxEdits from the provided term.
static IntervalsSource maxgaps(int gaps, IntervalsSource subSource)
    Create an IntervalsSource that filters a sub-source by its gaps.
static IntervalsSource maxwidth(int width, IntervalsSource subSource)
    Create an IntervalsSource that filters a sub-source by the width of its intervals.
static IntervalsSource multiterm(CompiledAutomaton ca, int maxExpansions, String pattern)
    Expert: Return an IntervalsSource over the disjunction of all terms that are accepted by the given automaton.
static IntervalsSource multiterm(CompiledAutomaton ca, String pattern)
    Expert: Return an IntervalsSource over the disjunction of all terms that are accepted by the given automaton.
static IntervalsSource noIntervals(String reason)
    Returns a source that produces no intervals.
static IntervalsSource nonOverlapping(IntervalsSource minuend, IntervalsSource subtrahend)
    Create a non-overlapping IntervalsSource.
static IntervalsSource notContainedBy(IntervalsSource small, IntervalsSource big)
    Create a not-contained-by IntervalsSource.
static IntervalsSource notContaining(IntervalsSource minuend, IntervalsSource subtrahend)
    Create a not-containing IntervalsSource.
static IntervalsSource notWithin(IntervalsSource minuend, int positions, IntervalsSource subtrahend)
    Create a not-within IntervalsSource.
static IntervalsSource or(boolean rewrite, List<IntervalsSource> subSources)
    Return an IntervalsSource over the disjunction of a set of sub-sources.
static IntervalsSource or(boolean rewrite, IntervalsSource... subSources)
    Return an IntervalsSource over the disjunction of a set of sub-sources.
static IntervalsSource or(List<IntervalsSource> subSources)
    Return an IntervalsSource over the disjunction of a set of sub-sources.
static IntervalsSource or(IntervalsSource... subSources)
    Return an IntervalsSource over the disjunction of a set of sub-sources.
static IntervalsSource ordered(IntervalsSource... subSources)
    Create an ordered IntervalsSource.
static IntervalsSource overlapping(IntervalsSource source, IntervalsSource reference)
    Returns intervals from a source that overlap with intervals from another source.
static IntervalsSource phrase
    Return an IntervalsSource exposing intervals for a phrase consisting of a list of terms.
static IntervalsSource phrase(IntervalsSource... subSources)
    Return an IntervalsSource exposing intervals for a phrase consisting of a list of interval sources.
static IntervalsSource prefix
    Return an IntervalsSource over the disjunction of all terms that begin with a prefix.
static IntervalsSource prefix
    Expert: Return an IntervalsSource over the disjunction of all terms that begin with a prefix.
static IntervalsSource range(BytesRef lowerTerm, BytesRef upperTerm, boolean includeLower, boolean includeUpper)
    Return an IntervalsSource over the disjunction of all terms that fall within the given range.
static IntervalsSource range(BytesRef lowerTerm, BytesRef upperTerm, boolean includeLower, boolean includeUpper, int maxExpansions)
    Expert: Return an IntervalsSource over the disjunction of all terms that fall within the given range.
static IntervalsSource regexp
    Return an IntervalsSource over the disjunction of all terms that match a regular expression.
static IntervalsSource regexp
    Expert: Return an IntervalsSource over the disjunction of all terms that match a regular expression.
static IntervalsSource term
    Return an IntervalsSource exposing intervals for a term.
static IntervalsSource term
    Return an IntervalsSource exposing intervals for a term, filtered by the value of the term's payload at each position.
static IntervalsSource term
    Return an IntervalsSource exposing intervals for a term.
static IntervalsSource term
    Return an IntervalsSource exposing intervals for a term, filtered by the value of the term's payload at each position.
static IntervalsSource unordered(IntervalsSource... subSources)
    Create an unordered IntervalsSource.
static IntervalsSource unorderedNoOverlaps
    Create an unordered IntervalsSource allowing no overlaps between subsources.
static IntervalsSource wildcard
    Return an IntervalsSource over the disjunction of all terms that match a wildcard glob.
static IntervalsSource wildcard
    Expert: Return an IntervalsSource over the disjunction of all terms that match a wildcard glob.
static IntervalsSource within(IntervalsSource source, int positions, IntervalsSource reference)
    Returns intervals of the source that appear within a set number of positions of intervals from the reference.
-
Field Details
-
DEFAULT_MAX_EXPANSIONS
public static final int DEFAULT_MAX_EXPANSIONS
The default number of expansions in multiterm(CompiledAutomaton, String).
-
-
Method Details
-
term
Return an IntervalsSource exposing intervals for a term.
-
term
Return an IntervalsSource exposing intervals for a term.
-
term
Return an IntervalsSource exposing intervals for a term, filtered by the value of the term's payload at each position.
-
term
Return an IntervalsSource exposing intervals for a term, filtered by the value of the term's payload at each position.
-
phrase
Return an IntervalsSource exposing intervals for a phrase consisting of a list of terms.
-
phrase
Return an IntervalsSource exposing intervals for a phrase consisting of a list of interval sources.
-
or
Return an IntervalsSource over the disjunction of a set of sub-sources.
Automatically rewrites if wrapped by an interval source that is sensitive to internal gaps.
-
or
Return an IntervalsSource over the disjunction of a set of sub-sources.
Parameters:
rewrite - if false, do not rewrite intervals that are sensitive to internal gaps; this may run more efficiently, but can miss valid hits due to minimization
subSources - the sources to combine
-
or
Return an IntervalsSource over the disjunction of a set of sub-sources.
-
or
Return an IntervalsSource over the disjunction of a set of sub-sources.
Parameters:
rewrite - if false, do not rewrite intervals that are sensitive to internal gaps; this may run more efficiently, but can miss valid hits due to minimization
subSources - the sources to combine
-
prefix
Return an IntervalsSource over the disjunction of all terms that begin with a prefix.
Throws:
IllegalStateException - if the prefix expands to more than DEFAULT_MAX_EXPANSIONS terms
-
prefix
Expert: Return an IntervalsSource over the disjunction of all terms that begin with a prefix.
WARNING: Setting maxExpansions higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive.
Parameters:
prefix - the prefix to expand
maxExpansions - the maximum number of terms to expand to
Throws:
IllegalStateException - if the prefix expands to more than maxExpansions terms
-
wildcard
Return an IntervalsSource over the disjunction of all terms that match a wildcard glob.
Throws:
IllegalStateException - if the wildcard glob expands to more than DEFAULT_MAX_EXPANSIONS terms
-
wildcard
Expert: Return an IntervalsSource over the disjunction of all terms that match a wildcard glob.
WARNING: Setting maxExpansions higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive.
Parameters:
wildcard - the glob to expand
maxExpansions - the maximum number of terms to expand to
Throws:
IllegalStateException - if the wildcard glob expands to more than maxExpansions terms
-
regexp
Return an IntervalsSource over the disjunction of all terms that match a regular expression.
Parameters:
regexp - regular expression
Throws:
IllegalStateException - if the regex expands to more than DEFAULT_MAX_EXPANSIONS terms
-
regexp
Expert: Return an IntervalsSource over the disjunction of all terms that match a regular expression.
WARNING: Setting maxExpansions higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive.
Parameters:
regexp - regular expression
maxExpansions - the maximum number of terms to expand to
Throws:
IllegalStateException - if the regex expands to more than maxExpansions terms
-
range
public static IntervalsSource range(BytesRef lowerTerm, BytesRef upperTerm, boolean includeLower, boolean includeUpper)
Return an IntervalsSource over the disjunction of all terms that fall within the given range.
Parameters:
lowerTerm - the term text at the lower end of the range; can be null to indicate an open-ended range at this end
upperTerm - the term text at the upper end of the range; can be null to indicate an open-ended range at this end
includeLower - if true, the lowerTerm is included in the range
includeUpper - if true, the upperTerm is included in the range
Throws:
IllegalStateException - if the range expands to more than DEFAULT_MAX_EXPANSIONS terms
-
range
public static IntervalsSource range(BytesRef lowerTerm, BytesRef upperTerm, boolean includeLower, boolean includeUpper, int maxExpansions)
Expert: Return an IntervalsSource over the disjunction of all terms that fall within the given range.
WARNING: Setting maxExpansions higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive.
Parameters:
lowerTerm - the term text at the lower end of the range; can be null to indicate an open-ended range at this end
upperTerm - the term text at the upper end of the range; can be null to indicate an open-ended range at this end
includeLower - if true, the lowerTerm is included in the range
includeUpper - if true, the upperTerm is included in the range
maxExpansions - the maximum number of terms to expand to
Throws:
IllegalStateException - if the range expands to more than maxExpansions terms
-
fuzzyTerm
A fuzzy term IntervalsSource matches the disjunction of intervals of terms that are within the specified maxEdits from the provided term.
Parameters:
term - the term to search for
maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE; use FuzzyQuery.defaultMaxEdits for the default, if needed
-
fuzzyTerm
public static IntervalsSource fuzzyTerm(String term, int maxEdits, int prefixLength, boolean transpositions, int maxExpansions)
A fuzzy term IntervalsSource matches the disjunction of intervals of terms that are within the specified maxEdits from the provided term.
The implementation is delegated to a multiterm(CompiledAutomaton, int, String) interval source, with an automaton sourced from FuzzyQuery.
Parameters:
term - the term to search for
maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE; use FuzzyQuery.defaultMaxEdits for the default, if needed
prefixLength - length of the common (non-fuzzy) prefix
transpositions - true if transpositions should be treated as a primitive edit operation; if false, comparisons implement the classic Levenshtein algorithm
maxExpansions - the maximum number of terms to match; setting maxExpansions higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive
-
multiterm
Expert: Return an IntervalsSource over the disjunction of all terms that are accepted by the given automaton.
Parameters:
ca - an automaton accepting matching terms
pattern - string representation of the given automaton, mostly used in exception messages
Throws:
IllegalStateException - if the automaton accepts more than DEFAULT_MAX_EXPANSIONS terms
-
multiterm
Expert: Return an IntervalsSource over the disjunction of all terms that are accepted by the given automaton.
WARNING: Setting maxExpansions higher than the default value of DEFAULT_MAX_EXPANSIONS can be both slow and memory-intensive.
Parameters:
ca - an automaton accepting matching terms
maxExpansions - the maximum number of terms to expand to
pattern - string representation of the given automaton, mostly used in exception messages
Throws:
IllegalStateException - if the automaton accepts more than maxExpansions terms
-
maxwidth
Create an IntervalsSource that filters a sub-source by the width of its intervals.
Parameters:
width - the maximum width of intervals in the sub-source to filter
subSource - the sub-source to filter
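As a rough illustration of width filtering, the sketch below models intervals as inclusive {start, end} position pairs in plain Java. It is not Lucene's implementation: the class MaxWidthSketch and its methods are hypothetical names introduced here, and the sketch assumes that width is counted as end - start + 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only; hypothetical names, not part of Lucene. */
final class MaxWidthSketch {

    /** Width of an inclusive position interval [start, end] (assumed: end - start + 1). */
    static int width(int start, int end) {
        return end - start + 1;
    }

    /** Keeps the {start, end} pairs whose width is at most maxWidth. */
    static List<int[]> maxWidth(int maxWidth, List<int[]> intervals) {
        List<int[]> kept = new ArrayList<>();
        for (int[] interval : intervals) {
            if (width(interval[0], interval[1]) <= maxWidth) {
                kept.add(interval);
            }
        }
        return kept;
    }
}
```

Under this model, calling maxWidth(3, ...) on the intervals [0,0], [2,4] and [5,9] keeps the first two and drops the third, whose width is 5.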
-
maxgaps
Create an IntervalsSource that filters a sub-source by its gaps.
Parameters:
gaps - the maximum number of gaps in the sub-source to filter
subSource - the sub-source to filter
-
extend
Create an IntervalsSource that wraps another source, extending its intervals by a number of positions before and after.
This can be useful for adding defined gaps in a block query; for example, to find 'a b [2 arbitrary terms] c', you can call:
    Intervals.phrase(Intervals.term("a"), Intervals.extend(Intervals.term("b"), 0, 2), Intervals.term("c"));
Note that calling IntervalIterator.gaps() on iterators returned by this source delegates directly to the wrapped iterator, and does not include the extensions.
Parameters:
source - the source to extend
before - how many positions to extend before the delegated interval
after - how many positions to extend after the delegated interval
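The 'a b [2 arbitrary terms] c' example can be traced with a small plain-Java model. This is a sketch under stated assumptions, not Lucene's code: ExtendSketch, extend and adjacent are hypothetical names, intervals are inclusive {start, end} position pairs, the extended start is assumed to be clamped at position 0, and a phrase is modelled as each interval starting exactly one position after the previous one ends.

```java
/** Illustrative model only; hypothetical names, not part of Lucene. */
final class ExtendSketch {

    /** Widens [start, end] by `before` and `after` positions; the start is clamped at 0. */
    static int[] extend(int start, int end, int before, int after) {
        return new int[] {Math.max(0, start - before), end + after};
    }

    /** Phrase adjacency in this model: `right` starts one position after `left` ends. */
    static boolean adjacent(int[] left, int[] right) {
        return left[1] + 1 == right[0];
    }
}
```

For the text "a b x y c", the term "b" sits at position 1, so extend(1, 1, 0, 2) covers positions 1 to 3. That extended interval is adjacent to "a" at position 0 and to "c" at position 4, whereas the unextended "b" is not adjacent to "c".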
-
ordered
Create an ordered IntervalsSource.
Returns intervals in which the subsources all appear in the given order.
Parameters:
subSources - an ordered set of IntervalsSource objects
-
unordered
Create an unordered IntervalsSource.
Returns intervals in which all the subsources appear. The subsources may overlap.
Note that if multiple eligible intervals end at the same position, only the narrowest one is returned. For example, unordered(term("apple"), term("banana")) on a field containing "apple wolf apple orange banana" returns only "apple orange banana".
Parameters:
subSources - an unordered set of IntervalsSources
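The "apple wolf apple orange banana" example can be reproduced with a brute-force model in plain Java. This is only a sketch: UnorderedSketch and its methods are hypothetical names, the text is split on single spaces, only two distinct single terms are handled, and the exhaustive pairing below is a stand-in for illustration rather than the lazy algorithm from the paper that the real sources implement.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only; hypothetical names, not part of Lucene. */
final class UnorderedSketch {

    /** 0-based positions at which `term` occurs in the space-separated text. */
    static List<Integer> positions(String text, String term) {
        List<Integer> out = new ArrayList<>();
        String[] tokens = text.split(" ");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                out.add(i);
            }
        }
        return out;
    }

    /** Spans {start, end} covering one occurrence of each term, keeping only minimal spans. */
    static List<int[]> minimalSpans(List<Integer> a, List<Integer> b) {
        List<int[]> candidates = new ArrayList<>();
        for (int x : a) {
            for (int y : b) {
                candidates.add(new int[] {Math.min(x, y), Math.max(x, y)});
            }
        }
        List<int[]> minimal = new ArrayList<>();
        for (int[] c : candidates) {
            boolean containsNarrower = false;
            for (int[] d : candidates) {
                boolean inside = d[0] >= c[0] && d[1] <= c[1];
                boolean narrower = (d[1] - d[0]) < (c[1] - c[0]);
                if (inside && narrower) {
                    containsNarrower = true;
                    break;
                }
            }
            // A span is kept only if no narrower candidate lies inside it.
            if (!containsNarrower) {
                minimal.add(c);
            }
        }
        return minimal;
    }
}
```

With "apple" at positions 0 and 2 and "banana" at position 4, the candidates are [0,4] and [2,4]; the wider one contains the narrower one, so only [2,4] ("apple orange banana") survives.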
-
unorderedNoOverlaps
Create an unordered IntervalsSource allowing no overlaps between subsources.
Returns intervals in which both the subsources appear and do not overlap.
-
fixField
Create an IntervalsSource that always returns intervals from a specific field.
This is useful for comparing intervals across multiple fields, for example fields that have been analyzed differently, allowing you to search for stemmed terms near unstemmed terms, etc.
-
nonOverlapping
Create a non-overlapping IntervalsSource.
Returns intervals of the minuend that do not overlap with intervals from the subtrahend.
Parameters:
minuend - the IntervalsSource to filter
subtrahend - the IntervalsSource to filter by
-
overlapping
Returns intervals from a source that overlap with intervals from another source.
Parameters:
source - the source to filter
reference - the source to filter by
-
notWithin
public static IntervalsSource notWithin(IntervalsSource minuend, int positions, IntervalsSource subtrahend)
Create a not-within IntervalsSource.
Returns intervals of the minuend that do not appear within a set number of positions of intervals from the subtrahend.
Parameters:
minuend - the IntervalsSource to filter
positions - the minimum distance that intervals from the minuend may occur from intervals of the subtrahend
subtrahend - the IntervalsSource to filter by
-
within
public static IntervalsSource within(IntervalsSource source, int positions, IntervalsSource reference)
Returns intervals of the source that appear within a set number of positions of intervals from the reference.
Parameters:
source - the IntervalsSource to filter
positions - the maximum distance that intervals of the source may occur from intervals of the reference
reference - the IntervalsSource to filter by
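One plausible way to read "within a set number of positions" is as containment in the reference interval widened by that many positions on each side. The plain-Java sketch below models that reading only; WithinSketch is a hypothetical name, intervals are inclusive {start, end} positions, and the exact boundary behaviour of the real source is an assumption here rather than something this page confirms.

```java
/** Illustrative model only; hypothetical names, not part of Lucene. */
final class WithinSketch {

    /**
     * True if [start, end] lies inside the reference interval [refStart, refEnd]
     * after widening the reference by `positions` on both sides (start clamped at 0).
     */
    static boolean within(int start, int end, int positions, int refStart, int refEnd) {
        int lowest = Math.max(0, refStart - positions);
        int highest = refEnd + positions;
        return start >= lowest && end <= highest;
    }
}
```

With a reference term at position 7 and positions set to 2, this model accepts source terms at positions 5 through 9 and rejects those at 4 or 10.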
-
notContaining
Create a not-containing IntervalsSource.
Returns intervals from the minuend that do not contain intervals of the subtrahend.
Parameters:
minuend - the IntervalsSource to filter
subtrahend - the IntervalsSource to filter by
-
containing
Create a containing IntervalsSource.
Returns intervals from the big source that contain one or more intervals from the small source.
Parameters:
big - the IntervalsSource to filter
small - the IntervalsSource to filter by
-
notContainedBy
Create a not-contained-by IntervalsSource.
Returns intervals from the small IntervalsSource that do not appear within intervals from the big IntervalsSource.
Parameters:
small - the IntervalsSource to filter
big - the IntervalsSource to filter by
-
containedBy
Create a contained-by IntervalsSource.
Returns intervals from the small source that appear within intervals of the big source.
Parameters:
small - the IntervalsSource to filter
big - the IntervalsSource to filter by
-
atLeast
Return intervals that span combinations of intervals from minShouldMatch of the sources.
-
before
Returns intervals from the source that appear before intervals from the reference.
-
after
Returns intervals from the source that appear after intervals from the reference.
-
noIntervals
Returns a source that produces no intervals.
Parameters:
reason - a reason string that will appear in the toString output of this source
-
analyzedText
public static IntervalsSource analyzedText(String text, Analyzer analyzer, String field, int maxGaps, boolean ordered) throws IOException
Returns intervals that correspond to tokens from a TokenStream returned for text by applying the provided Analyzer as if text was the content of the given field. The intervals can be ordered or unordered and can have optional gaps inside.
Parameters:
text - the text to analyze
analyzer - the Analyzer to use to acquire a TokenStream, which is then converted into intervals
field - the field text should be parsed as
maxGaps - maximum number of allowed gaps between sub-intervals resulting from tokens
ordered - whether sub-intervals should enforce token ordering or not
Returns:
an IntervalsSource that matches tokens acquired from analysis of text. Possibly an empty interval source, never null.
Throws:
IOException - if an I/O exception occurs
-
analyzedText
public static IntervalsSource analyzedText(TokenStream tokenStream, int maxGaps, boolean ordered) throws IOException
Returns intervals that correspond to tokens from the provided TokenStream. This is a low-level counterpart to analyzedText(String, Analyzer, String, int, boolean). The intervals can be ordered or unordered and can have optional gaps inside.
Parameters:
tokenStream - the token stream to produce intervals for; it may be fully or partially consumed after returning from this method
maxGaps - maximum number of allowed gaps between sub-intervals resulting from tokens
ordered - whether sub-intervals should enforce token ordering or not
Returns:
an IntervalsSource that matches tokens from tokenStream. Possibly an empty interval source, never null.
Throws:
IOException - if an I/O exception occurs
-