Class SimpleNaiveBayesDocumentClassifier
java.lang.Object
org.apache.lucene.classification.SimpleNaiveBayesClassifier
org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier
- All Implemented Interfaces:
Classifier<BytesRef>, DocumentClassifier<BytesRef>
public class SimpleNaiveBayesDocumentClassifier
extends SimpleNaiveBayesClassifier
implements DocumentClassifier<BytesRef>
A simplistic Lucene-based NaiveBayes classifier, see
http://en.wikipedia.org/wiki/Naive_Bayes_classifier
- WARNING: This API is experimental and might change in incompatible ways in the next release.
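For readers unfamiliar with the technique, the following is a minimal sketch of multinomial naive Bayes scoring over in-memory token counts. It is not the Lucene implementation, which derives its statistics from the index. The class name ToyNaiveBayes, its methods, and the add-one smoothing are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy multinomial naive Bayes (hypothetical helper, not part of Lucene). */
final class ToyNaiveBayes {
  private final Map<String, Integer> docsPerClass = new HashMap<>();
  private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
  private final Map<String, Integer> tokensPerClass = new HashMap<>();
  private final Set<String> vocabulary = new HashSet<>();
  private int totalDocs = 0;

  /** Records one training document with its class label and tokens. */
  void train(String label, String... tokens) {
    totalDocs++;
    docsPerClass.merge(label, 1, Integer::sum);
    Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
    for (String token : tokens) {
      counts.merge(token, 1, Integer::sum);
      tokensPerClass.merge(label, 1, Integer::sum);
      vocabulary.add(token);
    }
  }

  /** log P(class) + sum of log P(token | class), with add-one smoothing. */
  private double logScore(String label, String... tokens) {
    double score = Math.log((double) docsPerClass.get(label) / totalDocs);
    Map<String, Integer> counts = termCounts.get(label);
    int classTokens = tokensPerClass.getOrDefault(label, 0);
    for (String token : tokens) {
      int count = counts.getOrDefault(token, 0);
      score += Math.log((count + 1.0) / (classTokens + vocabulary.size()));
    }
    return score;
  }

  /** Returns the trained label with the highest score, or null if nothing was trained. */
  String classify(String... tokens) {
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (String label : docsPerClass.keySet()) {
      double score = logScore(label, tokens);
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }
}
```

After training on one "sports" document (ball, goal, team) and one "tech" document (code, java, team), calling classify("goal", "ball") on this sketch returns "sports".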
-
Field Summary
Fields inherited from class org.apache.lucene.classification.SimpleNaiveBayesClassifier
analyzer, classFieldName, indexReader, indexSearcher, query, textFieldNames
Constructor Summary
Constructor and Description
SimpleNaiveBayesDocumentClassifier(IndexReader indexReader, Query query, String classFieldName, Map<String, Analyzer> field2analyzer, String... textFieldNames)
Creates a new NaiveBayes classifier.
Method Summary
Method and Description
- assignClass(Document document): Assign a class (with score) to the given Document.
- getClasses(Document document): Get all the classes (sorted by score, descending) assigned to the given Document.
- getClasses(Document document, int max): Get the first max classes (sorted by score, descending) assigned to the given Document.
- protected String[] getTokenArray(TokenStream tokenizedText): Returns a token array from the input TokenStream.
Methods inherited from class org.apache.lucene.classification.SimpleNaiveBayesClassifier
assignClass, assignClassNormalizedList, countDocsWithClass, getClasses, getClasses, normClassificationResults, tokenize
-
Field Details
-
field2analyzer
Analyzer to be used for tokenizing document fields
-
-
Constructor Details
-
SimpleNaiveBayesDocumentClassifier
public SimpleNaiveBayesDocumentClassifier(IndexReader indexReader, Query query, String classFieldName, Map<String, Analyzer> field2analyzer, String... textFieldNames)
Creates a new NaiveBayes classifier.
- Parameters:
indexReader - the reader on the index to be used for classification
query - a Query to optionally filter the docs used for training the classifier, or null if all the indexed docs should be used
classFieldName - the name of the field used as the output for the classifier. NOTE: this field must not be heavily analyzed, as the returned class will be a token indexed for this field
textFieldNames - the names of the fields used as the inputs for the classifier; they can contain a boosting indication, e.g. title^10
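The boosting indication accepted in textFieldNames (e.g. title^10) can be read as a field name followed by a caret and a numeric weight. The sketch below shows one way such a string could be split. BoostedField is a hypothetical helper, not a Lucene class, and the default boost of 1.0 for names without a caret is an assumption of this example rather than documented behavior.

```java
/** Hypothetical holder for a field name and its boost; not part of Lucene. */
final class BoostedField {
  final String name;
  final float boost;

  BoostedField(String name, float boost) {
    this.name = name;
    this.boost = boost;
  }

  /**
   * Splits a spec such as "title^10" into name and boost.
   * A spec without '^' is assumed (for this sketch) to carry a boost of 1.0.
   */
  static BoostedField parse(String spec) {
    int caret = spec.lastIndexOf('^');
    if (caret < 0) {
      return new BoostedField(spec, 1.0f);
    }
    String name = spec.substring(0, caret);
    // Throws NumberFormatException if the part after '^' is not a number.
    float boost = Float.parseFloat(spec.substring(caret + 1));
    return new BoostedField(name, boost);
  }
}
```

With this sketch, BoostedField.parse("title^10") yields the name "title" with boost 10.0, and BoostedField.parse("body") yields "body" with boost 1.0.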
-
-
Method Details
-
assignClass
Description copied from interface: DocumentClassifier
Assign a class (with score) to the given Document.
- Specified by:
assignClass in interface DocumentClassifier<BytesRef>
- Parameters:
document - a Document to be classified. Fields are considered features for the classification.
- Returns:
a ClassificationResult holding the assigned class of type T and score
- Throws:
IOException - If there is a low-level I/O error.
-
getClasses
Description copied from interface: DocumentClassifier
Get all the classes (sorted by score, descending) assigned to the given Document.
- Specified by:
getClasses in interface DocumentClassifier<BytesRef>
- Parameters:
document - a Document to be classified. Fields are considered features for the classification.
- Returns:
the whole list of ClassificationResult, the classes and scores. Returns null if the classifier can't make lists.
- Throws:
IOException - If there is a low-level I/O error.
-
getClasses
public List<ClassificationResult<BytesRef>> getClasses(Document document, int max) throws IOException
Description copied from interface: DocumentClassifier
Get the first max classes (sorted by score, descending) assigned to the given Document.
- Specified by:
getClasses in interface DocumentClassifier<BytesRef>
- Parameters:
document - a Document to be classified. Fields are considered features for the classification.
max - the maximum number of elements in the returned list
- Returns:
the list of ClassificationResult (classes and scores), cut to at most max elements. Returns null if the classifier can't make lists.
- Throws:
IOException - If there is a low-level I/O error.
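The ordering and truncation described for getClasses(Document, int) can be illustrated without Lucene. The following is a sketch under stated assumptions: ScoredClass and TopClasses are hypothetical stand-ins for ClassificationResult and the classifier, and the clamping of negative max values to zero is a choice made here, not documented library behavior.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a (class, score) result; not the Lucene type. */
final class ScoredClass {
  final String label;
  final double score;

  ScoredClass(String label, double score) {
    this.label = label;
    this.score = score;
  }
}

final class TopClasses {
  /** Returns a new list sorted by score, descending, cut to at most max elements. */
  static List<ScoredClass> top(List<ScoredClass> results, int max) {
    List<ScoredClass> sorted = new ArrayList<>(results);
    sorted.sort(Comparator.comparingDouble((ScoredClass r) -> r.score).reversed());
    // Clamp so that a negative or oversized max never causes an exception.
    int limit = Math.min(Math.max(max, 0), sorted.size());
    return new ArrayList<>(sorted.subList(0, limit));
  }
}
```

Given results a (0.2), b (0.7), and c (0.1), TopClasses.top(results, 2) returns b followed by a; the input list is left unmodified.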
-
getTokenArray
Returns a token array from the input TokenStream.
- Parameters:
tokenizedText - the tokenized content of a field
- Returns:
a String array of the resulting tokens
- Throws:
IOException - If tokenization fails because there is a low-level I/O error
-