Class PatternCaptureGroupTokenFilter
java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.TokenFilter
        org.apache.lucene.analysis.pattern.PatternCaptureGroupTokenFilter
- All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>
public final class PatternCaptureGroupTokenFilter extends TokenFilter
PatternCaptureGroupTokenFilter uses Java regular expressions to emit multiple tokens, one for each capture group in one or more patterns. For example, a pattern like
"(https?://([a-zA-Z\-_0-9.]+))", when matched against the string "http://www.foo.com/index", would return the tokens "http://www.foo.com" and "www.foo.com".
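To see where those two tokens come from, the pattern can be tried with plain java.util.regex. This is an illustration only, not Lucene code; the class UrlCaptureGroups and its groups method are hypothetical names. It collects the text of every capture group for every match and skips groups that matched nothing (a choice made for this sketch). Group 1 captures the scheme exactly as it appears in the input, so for this input it is http://www.foo.com.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only (not Lucene code): lists the capture groups a regex yields.
final class UrlCaptureGroups {
    // The URL pattern from the description, written as a Java string literal.
    static final Pattern URL = Pattern.compile("(https?://([a-zA-Z\\-_0-9.]+))");

    // Collects groups 1..groupCount() for every match, in group-number order.
    static List<String> groups(Pattern pattern, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                String text = m.group(g);
                // Skip groups that did not take part in the match or matched nothing.
                if (text != null && !text.isEmpty()) {
                    out.add(text);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [http://www.foo.com, www.foo.com]
        System.out.println(groups(URL, "http://www.foo.com/index"));
    }
}
```

The character class stops at "/", which is why the path "/index" appears in neither token.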
If none of the patterns match, or if preserveOriginal is true, the original token will be preserved.
Each pattern is matched as often as it can be, so the pattern "(...)", when matched against "abcdefghi", would produce ["abc", "def", "ghi"].
A camelCaseFilter could be written with the following patterns (shown as Java string literals):
"([A-Z]{2,})", "(?<![A-Z])([A-Z][a-z]+)", "(?:^|\\b|(?<=[0-9_])|(?<=[A-Z]{2}))([a-z]+)", "([0-9]+)"
Applied to the token "camelCaseFilter", these patterns produce "camel", "Case" and "Filter"; if preserveOriginal is true, the filter also returns "camelCaseFilter" itself.
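The matching rules above (capture groups become tokens, each pattern is applied repeatedly, and the original token is kept when nothing matches or when preserveOriginal is true) can be imitated in a few lines of plain Java. This is a simplified sketch under those stated rules, not the Lucene implementation: CaptureGroupSketch and capture are hypothetical names, it works on a String rather than a TokenStream, it skips empty groups, orders captured groups by start offset, lists the original first, and makes no attempt to drop a group identical to the whole token. Those ordering and skipping choices belong to this sketch and are not claims about the order in which Lucene emits tokens.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, Lucene-free sketch of the per-token behaviour described above.
final class CaptureGroupSketch {
    // One captured group and the offset at which it starts in the token.
    private static final class Capture {
        final int start;
        final String text;

        Capture(int start, String text) {
            this.start = start;
            this.text = text;
        }
    }

    static List<String> capture(String token, boolean preserveOriginal, Pattern... patterns) {
        List<Capture> captures = new ArrayList<>();
        for (Pattern p : patterns) {
            Matcher m = p.matcher(token);
            while (m.find()) { // match each pattern as often as it can be
                for (int g = 1; g <= m.groupCount(); g++) {
                    String text = m.group(g);
                    if (text != null && !text.isEmpty()) { // skip unset or empty groups
                        captures.add(new Capture(m.start(g), text));
                    }
                }
            }
        }
        // Order by start offset (a readability choice of this sketch; the sort is stable).
        captures.sort(Comparator.comparingInt(c -> c.start));

        List<String> out = new ArrayList<>();
        if (captures.isEmpty() || preserveOriginal) {
            out.add(token); // nothing matched, or preserveOriginal is set: keep the original
        }
        for (Capture c : captures) {
            out.add(c.text);
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern[] camelCase = {
            Pattern.compile("([A-Z]{2,})"),
            Pattern.compile("(?<![A-Z])([A-Z][a-z]+)"),
            Pattern.compile("(?:^|\\b|(?<=[0-9_])|(?<=[A-Z]{2}))([a-z]+)"),
            Pattern.compile("([0-9]+)"),
        };
        System.out.println(capture("abcdefghi", false, Pattern.compile("(...)"))); // [abc, def, ghi]
        System.out.println(capture("camelCaseFilter", false, camelCase)); // [camel, Case, Filter]
        System.out.println(capture("camelCaseFilter", true, camelCase)); // [camelCaseFilter, camel, Case, Filter]
    }
}
```

For "camelCaseFilter", "camel" comes from the third pattern (anchored at the start of the token) and "Case" and "Filter" from the second; the first and fourth patterns find nothing because the token has no run of two capitals and no digits.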
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
Field Summary
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor Summary
Constructor
PatternCaptureGroupTokenFilter(TokenStream input, boolean preserveOriginal, Pattern... patterns)
Method Summary
Modifier and Type    Method
boolean              incrementToken()
void                 reset()
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, unwrap
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
Constructor Detail
PatternCaptureGroupTokenFilter
public PatternCaptureGroupTokenFilter(TokenStream input, boolean preserveOriginal, Pattern... patterns)
Parameters:
input - the input TokenStream
preserveOriginal - set to true to return the original token even if one of the patterns matches
patterns - an array of Pattern objects to match against each token
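Because patterns is a Pattern... varargs parameter, regular expressions written as strings (like the camelCase patterns in the class description) must be compiled before they are passed in. A minimal sketch, assuming a hypothetical helper PatternArgs.compileAll: it compiles each string and, as a precaution of its own rather than a documented requirement, rejects any pattern that has no capturing group, since according to the class description tokens are emitted per capture group.

```java
import java.util.regex.Pattern;

// Hypothetical helper: turns regex source strings into the Pattern[] a
// Pattern... parameter accepts.
final class PatternArgs {
    static Pattern[] compileAll(String... regexes) {
        Pattern[] patterns = new Pattern[regexes.length];
        for (int i = 0; i < regexes.length; i++) {
            Pattern p = Pattern.compile(regexes[i]); // may throw PatternSyntaxException
            // groupCount() counts capturing groups only; (?:...) and lookarounds are not counted.
            if (p.matcher("").groupCount() == 0) {
                // Sketch's own precaution: such a pattern would capture nothing.
                throw new IllegalArgumentException("no capturing group in: " + regexes[i]);
            }
            patterns[i] = p;
        }
        return patterns;
    }

    public static void main(String[] args) {
        Pattern[] camelCase = compileAll(
            "([A-Z]{2,})",
            "(?<![A-Z])([A-Z][a-z]+)",
            "(?:^|\\b|(?<=[0-9_])|(?<=[A-Z]{2}))([a-z]+)",
            "([0-9]+)");
        System.out.println(camelCase.length + " patterns compiled"); // 4 patterns compiled
    }
}
```

The resulting array could then be passed as the last constructor argument, for example new PatternCaptureGroupTokenFilter(input, false, PatternArgs.compileAll(regexes)), where input is an existing TokenStream and regexes is a String array; Java accepts an array wherever a varargs parameter is expected. PatternSyntaxException is itself a subclass of IllegalArgumentException, so a caller catching the latter also sees malformed regexes.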
Method Detail
incrementToken
public boolean incrementToken() throws IOException
Specified by:
incrementToken in class TokenStream
Throws:
IOException
reset
public void reset() throws IOException
Overrides:
reset in class TokenFilter
Throws:
IOException
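incrementToken() and reset() follow the usual TokenStream pull model: the consumer calls reset() once, then calls incrementToken() until it returns false (followed by end() and close()), and each true return exposes one token. Since one input token can yield several capture-group tokens, a filter like this one has to buffer the extras between calls. The Lucene-free sketch below imitates that bookkeeping with a queue. PullFilterSketch, term() and the List<String> standing in for the upstream TokenStream are hypothetical; preserveOriginal is fixed to false for brevity, and the code is not the actual PatternCaptureGroupTokenFilter implementation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, Lucene-free sketch of the pull loop behind incrementToken()/reset().
// preserveOriginal is fixed to false here for brevity.
final class PullFilterSketch {
    private final List<String> upstream;                      // stands in for the wrapped TokenStream
    private final Pattern[] patterns;
    private final Deque<String> pending = new ArrayDeque<>(); // tokens not yet handed out
    private Iterator<String> input;                           // null until reset() is called
    private String term;                                      // the current token

    PullFilterSketch(List<String> upstream, Pattern... patterns) {
        this.upstream = upstream;
        this.patterns = patterns;
    }

    // Discards buffered tokens and starts reading the upstream list from the beginning.
    void reset() {
        pending.clear();
        term = null;
        input = upstream.iterator();
    }

    // Exposes exactly one token per call; returns false once everything is consumed.
    boolean incrementToken() {
        if (input == null) {
            throw new IllegalStateException("call reset() before incrementToken()");
        }
        if (pending.isEmpty()) {
            if (!input.hasNext()) {
                return false;
            }
            String token = input.next();
            for (Pattern p : patterns) {
                Matcher m = p.matcher(token);
                while (m.find()) { // each pattern is matched as often as it can be
                    for (int g = 1; g <= m.groupCount(); g++) {
                        String text = m.group(g);
                        if (text != null && !text.isEmpty()) {
                            pending.add(text);
                        }
                    }
                }
            }
            if (pending.isEmpty()) {
                pending.add(token); // no pattern matched: keep the original
            }
        }
        term = pending.poll();
        return true;
    }

    String term() {
        return term;
    }

    public static void main(String[] args) {
        PullFilterSketch filter =
            new PullFilterSketch(Arrays.asList("abcdefghi", "xy"), Pattern.compile("(...)"));
        filter.reset();
        while (filter.incrementToken()) {
            System.out.println(filter.term()); // abc, def, ghi, then xy
        }
    }
}
```

Keeping pending tokens in a queue means each call does a small amount of work and never needs to look ahead in the input; clearing the queue in reset() keeps leftovers from one pass out of the next.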