Package org.apache.lucene.analysis.email
Fast, general-purpose tokenizers for URLs and email addresses.
UAX29URLEmailTokenizer: implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29, except that URLs and email addresses are also recognized and tokenized according to the relevant RFCs.
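The sketch below feeds a short sentence through UAX29URLEmailTokenizer and records each token with its type attribute, so you can see the URL and the email address come out whole. It assumes Lucene 9.x, where this package lives in the analysis-common module. The class UrlEmailTokenizerDemo and its tokenize method are hypothetical names used only for illustration; the tokenizer, CharTermAttribute and TypeAttribute are real Lucene API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.email.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

// Hypothetical demo class; only the Lucene types are real API.
class UrlEmailTokenizerDemo {

  // Returns each token as "term type", e.g. an email token is typed "<EMAIL>".
  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    try (UAX29URLEmailTokenizer tokenizer = new UAX29URLEmailTokenizer()) {
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      TypeAttribute type = tokenizer.addAttribute(TypeAttribute.class);
      tokenizer.setReader(new StringReader(text));
      tokenizer.reset(); // mandatory before the first incrementToken()
      while (tokenizer.incrementToken()) {
        tokens.add(term.toString() + " " + type.type());
      }
      tokenizer.end(); // close() is then called by try-with-resources
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return tokens;
  }

  public static void main(String[] args) {
    tokenize("Mail jane@example.com or see https://lucene.apache.org now")
        .forEach(System.out::println);
  }
}
```

Note that the tokenizer alone does not change case; "Mail" keeps its capital letter until a filter such as LowerCaseFilter is applied.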
UAX29URLEmailAnalyzer: includes UAX29URLEmailTokenizer, LowerCaseFilter and StopFilter.
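As a rough illustration of that chain, the following sketch runs text through UAX29URLEmailAnalyzer built with its no-argument constructor, which, as far as we know, uses the default English stop word set. It assumes Lucene 9.x; the class UrlEmailAnalyzerDemo, its analyze method and the field name "body" are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.email.UAX29URLEmailAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class; tokenizer -> LowerCaseFilter -> StopFilter happens inside the analyzer.
class UrlEmailAnalyzerDemo {

  static List<String> analyze(String text) {
    List<String> terms = new ArrayList<>();
    // Resources close in reverse order: the token stream first, then the analyzer.
    try (Analyzer analyzer = new UAX29URLEmailAnalyzer();
        TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return terms;
  }

  public static void main(String[] args) {
    System.out.println(analyze("Contact jane@example.com for the docs"));
  }
}
```

Here "Contact" is lowercased, the email address survives as one term, and the English stop words "for" and "the" are dropped.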
Class Summary
UAX29URLEmailAnalyzer: Filters UAX29URLEmailTokenizer with LowerCaseFilter and StopFilter, using a list of English stop words.
UAX29URLEmailTokenizer: This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
UAX29URLEmailTokenizerFactory: Factory for UAX29URLEmailTokenizer.
UAX29URLEmailTokenizerImpl: This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
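One way to use UAX29URLEmailTokenizerFactory is through CustomAnalyzer, which builds an analysis chain from factories. The sketch below assumes Lucene 9.x, where CustomAnalyzer is in org.apache.lucene.analysis.custom and LowerCaseFilterFactory in org.apache.lucene.analysis.core, and that the factory accepts a "maxTokenLength" parameter (255 is the documented default). The class FactoryDemo and its analyze method are hypothetical. Unlike UAX29URLEmailAnalyzer, this chain has no stop filter.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.email.UAX29URLEmailTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class: tokenizer factory + lowercase filter, no stop words.
class FactoryDemo {

  static List<String> analyze(String text) {
    List<String> terms = new ArrayList<>();
    try (Analyzer analyzer =
            CustomAnalyzer.builder()
                .withTokenizer(UAX29URLEmailTokenizerFactory.class, "maxTokenLength", "255")
                .addTokenFilter(LowerCaseFilterFactory.class)
                .build();
        TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return terms;
  }

  public static void main(String[] args) {
    System.out.println(analyze("Email Jane at jane@example.com"));
  }
}
```

Because no StopFilter is configured, "at" is kept here, whereas UAX29URLEmailAnalyzer would remove it.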