Package opennlp.tools.formats.ad
Class ADNameSampleStream
java.lang.Object
opennlp.tools.formats.ad.ADNameSampleStream
- All Implemented Interfaces:
- AutoCloseable, ObjectStream<NameSample>
Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus. Its output is
 intended for Portuguese NER training.
 
 The data contains ten named entity types: Person, Organization, Group,
 Place, Event, ArtProd, Abstract, Thing, Time and Numeric.
 
Data can be found on this web site.
 Information about the format:
 Susana Afonso, "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica", 12 February 2006.
 
Detailed info about the NER tagset.
Note: Do not use this class, internal use only!
Constructor Summary
- ADNameSampleStream(InputStreamFactory in, String charsetName, boolean splitHyphenatedTokens): Deprecated, for removal: This API element is subject to removal in a future version.
- ADNameSampleStream(ObjectStream<String> lineStream, boolean splitHyphenatedTokens): Initializes a new ADNameSampleStream from an ObjectStream<String>, which could be a PlainTextByLineStream object.
Method Summary
- void close(): Closes the ObjectStream and releases all allocated resources.
- NameSample read(): Returns the next ObjectStream object.
- void reset(): Repositions the stream at the beginning so that the previously seen object sequence is repeated exactly.
Constructor Details
ADNameSampleStream
Initializes a new ADNameSampleStream from an ObjectStream<String>, which could be a PlainTextByLineStream object.
- Parameters:
- lineStream - An ObjectStream<String> as input.
- splitHyphenatedTokens - If true, hyphenated tokens will be separated: "carros-monstro" > "carros" "-" "monstro".
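The effect of splitHyphenatedTokens can be illustrated with a small standalone sketch. The class HyphenSplitSketch and the helper splitHyphenated are hypothetical names introduced here for illustration only; they are not part of OpenNLP, and the real class may treat edge cases (leading, trailing, or repeated hyphens) differently.

```java
import java.util.ArrayList;
import java.util.List;

public class HyphenSplitSketch {

    // Hypothetical helper, not OpenNLP code: splits a token at each hyphen
    // and keeps every hyphen as a separate token.
    static List<String> splitHyphenated(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (c == '-') {
                // Flush the characters collected so far, then emit the hyphen.
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
                parts.add("-");
            } else {
                current.append(c);
            }
        }
        // Flush whatever follows the last hyphen.
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        // Prints [carros, -, monstro]
        System.out.println(splitHyphenated("carros-monstro"));
    }
}
```

A token without a hyphen comes back unchanged as a single-element list, which matches the behaviour one would expect when the flag is true but the token is not hyphenated.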
 
ADNameSampleStream
@Deprecated(forRemoval=true) public ADNameSampleStream(InputStreamFactory in, String charsetName, boolean splitHyphenatedTokens) throws IOException
Deprecated, for removal: This API element is subject to removal in a future version.
Initializes a new ADNameSampleStream from an InputStreamFactory.
- Parameters:
- in - The corpus InputStreamFactory.
- charsetName - The charset to use for reading the corpus.
- splitHyphenatedTokens - If true, hyphenated tokens will be separated: "carros-monstro" > "carros" "-" "monstro".
- Throws:
- IOException
 
 
Method Details
read
Description copied from interface: ObjectStream
Returns the next ObjectStream object. Calling this method repeatedly until it returns null will return each object from the underlying source exactly once.
- Specified by:
- read in interface ObjectStream<NameSample>
- Returns:
- The next object, or null to signal that the stream is exhausted.
- Throws:
- IOException - Thrown if there is an error during reading.
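The read-until-null contract can be sketched as a simple loop. To keep the example runnable without the OpenNLP jar, it uses a stand-in interface, SimpleStream, that mirrors only the documented read() and close() methods; SimpleStream, fromList, drain and drainSentences are hypothetical names, not OpenNLP API. With the real class, the same loop shape would apply to an ObjectStream<NameSample>.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {

    /** Stand-in mirroring the documented read()/close() contract; not the OpenNLP interface. */
    interface SimpleStream<T> extends AutoCloseable {
        T read() throws IOException;

        @Override
        void close() throws IOException;
    }

    /** Hypothetical in-memory stream, used only for illustration. */
    static <T> SimpleStream<T> fromList(List<T> items) {
        Iterator<T> it = items.iterator();
        return new SimpleStream<T>() {
            @Override
            public T read() {
                return it.hasNext() ? it.next() : null; // null signals exhaustion
            }

            @Override
            public void close() {
                // Nothing to release for an in-memory list.
            }
        };
    }

    /** Reads until read() returns null, so each object is collected exactly once. */
    static <T> List<T> drain(SimpleStream<T> stream) throws IOException {
        List<T> out = new ArrayList<>();
        T item;
        while ((item = stream.read()) != null) {
            out.add(item);
        }
        return out;
    }

    /** Opens the stream in try-with-resources so that close() is always called. */
    static List<String> drainSentences(List<String> sentences) {
        try (SimpleStream<String> stream = fromList(sentences)) {
            return drain(stream);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints [first, second]
        System.out.println(drainSentences(List.of("first", "second")));
    }
}
```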
 
reset
Description copied from interface: ObjectStream
Repositions the stream at the beginning so that the previously seen object sequence is repeated exactly. This method can be used to re-read the stream if multiple passes over the objects are required. The implementation of this method is optional.
- Specified by:
- reset in interface ObjectStream<NameSample>
- Throws:
- IOException - Thrown if there is an error while resetting the stream.
- UnsupportedOperationException - Thrown if reset() is not supported. By default, this is the case.
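Because reset() is optional, a caller that wants a second pass may need a fallback for streams that do not support it. The sketch below assumes a stand-in interface, SimpleStream, whose default reset() throws UnsupportedOperationException as described above; SimpleStream, ListStream, count and countOverTwoPasses are hypothetical names rather than OpenNLP API. Whether ADNameSampleStream itself supports reset() is not stated here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

public class ResetSketch {

    /** Stand-in mirroring the documented read()/reset() contract; not the OpenNLP interface. */
    interface SimpleStream<T> {
        T read() throws IOException;

        // Mirrors the documented default: reset() is unsupported unless overridden.
        default void reset() throws IOException {
            throw new UnsupportedOperationException("reset is not supported");
        }
    }

    /** Hypothetical resettable in-memory stream. */
    static final class ListStream<T> implements SimpleStream<T> {
        private final List<T> items;
        private int next = 0;

        ListStream(List<T> items) {
            this.items = items;
        }

        @Override
        public T read() {
            return next < items.size() ? items.get(next++) : null;
        }

        @Override
        public void reset() {
            next = 0; // the same sequence will be returned again
        }
    }

    /** Counts the objects remaining in the stream. */
    static <T> int count(SimpleStream<T> stream) throws IOException {
        int n = 0;
        while (stream.read() != null) {
            n++;
        }
        return n;
    }

    /** Counts objects over two passes, or over one pass if reset() is unsupported. */
    static <T> int countOverTwoPasses(SimpleStream<T> stream) {
        try {
            int total = count(stream);
            try {
                stream.reset();
            } catch (UnsupportedOperationException e) {
                return total; // single pass only
            }
            return total + count(stream);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints 6: three objects seen on each of two passes.
        System.out.println(countOverTwoPasses(new ListStream<>(List.of("a", "b", "c"))));
    }
}
```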
 
close
Description copied from interface: ObjectStream
Closes the ObjectStream and releases all allocated resources. After close has been called, ObjectStream.read() and ObjectStream.reset() must not be called.
- Specified by:
- close in interface AutoCloseable
- Specified by:
- close in interface ObjectStream<NameSample>
- Throws:
- IOException - Thrown if there is an error while closing the stream.
 
 