Class DemoHTMLParser
java.lang.Object
org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser
- All Implemented Interfaces:
- HTMLParser
Simple HTML parser, based on NekoHTML, that extracts the title, meta tags, and body text.
Nested Class Summary
- Nested Classes:
- static final class: The actual parser to read HTML documents
Constructor Summary
- Constructors:
- DemoHTMLParser()
Method Summary
- DocData parse(DocData docData, String name, Date date, Reader reader, TrecContentSource trecSrc): Parse the input Reader and return DocData.
- DocData parse(DocData docData, String name, Date date, InputSource source, TrecContentSource trecSrc)
Constructor Details
DemoHTMLParser
public DemoHTMLParser()
Method Details
parse
public DocData parse(DocData docData, String name, Date date, Reader reader, TrecContentSource trecSrc) throws IOException
Description copied from interface: HTMLParser
Parse the input Reader and return DocData. The provided name, title, and date are used for the result unless they are null, in which case an attempt is made to set them from the parsed data.
- Specified by:
- parse in interface HTMLParser
- Parameters:
- docData - result reused.
- name - name of the result doc data.
- date - date of the result doc data. If null, attempt to set by parsed data.
- reader - reader of HTML text to parse.
- trecSrc - the TrecContentSource used to parse dates.
- Returns:
- Parsed doc data.
- Throws:
- IOException - If there is a low-level I/O error.
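The null-fallback rule described for parse (a caller-supplied value is used unless it is null, in which case the value found in the parsed HTML is tried) can be illustrated with a minimal sketch. The class FallbackSketch and the method preferProvided are hypothetical names introduced here for illustration; they are not part of Lucene and make no claim about how DemoHTMLParser implements the rule internally.

```java
import java.util.Date;

// Hypothetical helper, NOT part of Lucene: it only illustrates the documented
// "provided value wins, parsed value is the fallback" rule.
final class FallbackSketch {
    private FallbackSketch() {}

    /**
     * Returns the caller-supplied value unless it is null; otherwise returns
     * the value extracted while parsing (which may itself be null).
     */
    static <T> T preferProvided(T provided, T parsed) {
        return provided != null ? provided : parsed;
    }

    public static void main(String[] args) {
        Date supplied = new Date(0L);        // value passed in by the caller
        Date fromDocument = new Date(1000L); // value assumed to come from the parsed HTML

        // Caller-supplied date is kept.
        System.out.println(preferProvided(supplied, fromDocument).getTime());
        // No date supplied, so the parsed one is used.
        System.out.println(preferProvided(null, fromDocument).getTime());
    }
}
```

Under this reading, passing null for date is how a caller asks the parser to take the date from the document.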
 
parse
public DocData parse(DocData docData, String name, Date date, InputSource source, TrecContentSource trecSrc) throws IOException, SAXException
- Throws:
- IOException
- SAXException
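This overload takes an org.xml.sax.InputSource instead of a Reader. A caller that holds character data can wrap it with the standard InputSource(Reader) constructor, as in the sketch below. InputSourceSketch and its methods are hypothetical names used only for this example; the sketch uses JDK classes only and does not call Lucene.

```java
import java.io.Reader;
import java.io.StringReader;
import org.xml.sax.InputSource;

// Hypothetical helper, NOT part of Lucene: shows how to build the InputSource
// that the second parse overload expects as its "source" argument.
final class InputSourceSketch {
    private InputSourceSketch() {}

    /** Wraps an existing character stream in a SAX InputSource. */
    static InputSource fromReader(Reader reader) {
        return new InputSource(reader);
    }

    /** Convenience variant for HTML that is already held in memory as a String. */
    static InputSource fromString(String html) {
        return fromReader(new StringReader(html));
    }

    public static void main(String[] args) {
        InputSource source = fromString("<html><head><title>t</title></head><body>b</body></html>");
        // The character stream is set; no byte stream was supplied.
        System.out.println(source.getCharacterStream() != null);
        System.out.println(source.getByteStream() == null);
    }
}
```

The resulting object could then be passed as the source argument of parse, with the caller handling both IOException and SAXException.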