java.lang.Object
   |
   +----com.jclark.xml.tok.Encoding
An Encoding object corresponds to a possible
 encoding (a mapping from characters to sequences of bytes).
 It provides operations on byte arrays
 that represent all or part of a parsed XML entity in that encoding.
 
 The set of ASCII characters excluding $@\^`{}~
 have a special status; these are called XML significant
 characters.
 
This class imposes certain restrictions on an encoding:
 Several methods operate on byte subarrays. The subarray is specified
 by a byte array buf and two integers,
 off and end; off
 gives the index in buf of the first byte of the subarray
 and end gives the
 index in buf of the byte immediately after the last byte.
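 As a small illustration of the (buf, off, end) convention described above, the following sketch computes the length of a subarray and copies it out of the buffer. The class SubarrayDemo and its methods are hypothetical helpers, not part of this package; only the half-open range convention (off inclusive, end exclusive) comes from the text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of com.jclark.xml.tok.
final class SubarrayDemo {
    // Number of bytes in the subarray that starts at off and stops before end.
    static int length(int off, int end) {
        if (end < off) {
            throw new IllegalArgumentException("end must not be less than off");
        }
        return end - off;
    }

    // Copies the subarray out of buf; off is inclusive, end is exclusive.
    static byte[] slice(byte[] buf, int off, int end) {
        return Arrays.copyOfRange(buf, off, end);
    }

    public static void main(String[] args) {
        byte[] buf = "<doc>text</doc>".getBytes(StandardCharsets.US_ASCII);
        // The start-tag occupies indices 0 to 4, so end is 5.
        byte[] tag = slice(buf, 0, 5);
        System.out.println(new String(tag, StandardCharsets.US_ASCII));
    }
}
```

 Because end is exclusive, the token that follows a subarray starts at exactly end, so a caller can pass the previous end as the next off without adjustment.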
 
 Use the getInitialEncoding method to get an
 Encoding object to use to start parsing an entity.
 
 The main operations provided by Encoding are
 tokenizeProlog, tokenizeContent and
 tokenizeCdataSection;
 these are used to divide up an XML entity into tokens.
 tokenizeProlog is used for the prolog of an XML document
 as well as for the external subset and parameter entities (except
 when referenced in an EntityValue);
 it can also be used for parsing the Misc* that follows
 the document element.
 tokenizeContent is used for the document element and for
 parsed general entities that are referenced in content
 except for CDATA sections.
 tokenizeCdataSection is used for CDATA sections, following
 the <![CDATA[ up to and including the ]]>.
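 The choice between the three main operations can be pictured as a small state machine. The sketch below is only an illustration of that flow: the class TokenizerMode and its enums are hypothetical and do not exist in this package, and it assumes that the end of the prolog is signalled by the EndOfPrologException listed in the throws clause of tokenizeProlog.

```java
// Hypothetical sketch: tracks which tokenize method a caller would use next.
final class TokenizerMode {
    // PROLOG -> tokenizeProlog, CONTENT -> tokenizeContent,
    // CDATA_SECTION -> tokenizeCdataSection.
    enum Mode { PROLOG, CONTENT, CDATA_SECTION }

    // Events a caller observes. The names mirror tokens and an exception
    // described for this class, but the enum itself is an assumption.
    enum Event { END_OF_PROLOG, CDATA_SECT_OPEN, CDATA_SECT_CLOSE, OTHER }

    static Mode next(Mode current, Event event) {
        switch (current) {
            case PROLOG:
                // The prolog ends where the document element starts.
                return event == Event.END_OF_PROLOG ? Mode.CONTENT : Mode.PROLOG;
            case CONTENT:
                // A CDATA section switches to the CDATA tokenizer.
                return event == Event.CDATA_SECT_OPEN ? Mode.CDATA_SECTION : Mode.CONTENT;
            case CDATA_SECTION:
                // The closing delimiter switches back to content.
                return event == Event.CDATA_SECT_CLOSE ? Mode.CONTENT : Mode.CDATA_SECTION;
            default:
                throw new IllegalStateException("unknown mode");
        }
    }

    public static void main(String[] args) {
        Mode m = Mode.PROLOG;
        m = next(m, Event.END_OF_PROLOG);
        m = next(m, Event.CDATA_SECT_OPEN);
        m = next(m, Event.CDATA_SECT_CLOSE);
        System.out.println(m);
    }
}
```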
 
 tokenizeAttributeValue and tokenizeEntityValue
 are used to further divide up tokens returned by tokenizeProlog
 and tokenizeContent; they are also used to divide up entities
 referenced in attribute values or entity values.
 
 TOK_ATTRIBUTE_VALUE_S
 TOK_CDATA_SECT_CLOSE
  ]]>.
 TOK_CDATA_SECT_OPEN
  <![CDATA[.
 TOK_CHAR_PAIR_REF
 TOK_CHAR_REF
 TOK_CLOSE_BRACKET
  ] in the prolog.
 TOK_CLOSE_PAREN
  ) in the prolog that is not followed immediately by any of *, + or ?.
 TOK_CLOSE_PAREN_ASTERISK
  )* in the prolog.
 TOK_CLOSE_PAREN_PLUS
  )+ in the prolog.
 TOK_CLOSE_PAREN_QUESTION
  )? in the prolog.
 TOK_COMMA
  , in the prolog.
 TOK_COMMENT
  <!-- comment -->.
 TOK_COND_SECT_CLOSE
  ]]> in the prolog.
 TOK_COND_SECT_OPEN
  <![ in the prolog.
 TOK_DATA_CHARS
 TOK_DATA_NEWLINE
 TOK_DECL_CLOSE
  > in the prolog.
 TOK_DECL_OPEN
  <!NAME in the prolog.
 TOK_EMPTY_ELEMENT_NO_ATTS
  <name/>, that doesn't have any attribute specifications.
 TOK_EMPTY_ELEMENT_WITH_ATTS
  <name att="val"/>, that contains one or more attribute specifications.
 TOK_END_TAG
  </name>.
 TOK_ENTITY_REF
 TOK_LITERAL
 TOK_MAGIC_ENTITY_REF
  amp, lt, gt, quot, apos.
 TOK_NAME
 TOK_NAME_ASTERISK
  *.
 TOK_NAME_PLUS
  +.
 TOK_NAME_QUESTION
  ?.
 TOK_NMTOKEN
 TOK_OPEN_BRACKET
  [ in the prolog.
 TOK_OPEN_PAREN
  ( in the prolog.
 TOK_OR
  | in the prolog.
 TOK_PARAM_ENTITY_REF
 TOK_PERCENT
  % in the prolog that does not start a parameter entity reference.
 TOK_PI
 TOK_POUND_NAME
  #NAME in the prolog.
 TOK_PROLOG_S
 TOK_START_TAG_NO_ATTS
  <name>, that doesn't have any attribute specifications.
 TOK_START_TAG_WITH_ATTS
  <name att="val">, that contains one or more attribute specifications.
 TOK_XML_DECL
  An XML declaration or text declaration (a processing instruction whose target is xml).
 
 convert(byte[], int, int, char[], int)
 getEncoding(String)
  Returns an Encoding corresponding to the specified IANA character set name.
 getFixedBytesPerChar()
  Returns the number of bytes required to represent each char, or zero if different chars are represented by different numbers of bytes.
 getInitialEncoding(byte[], int, int, Token)
 getInternalEncoding()
  Returns an Encoding object for use with internal entities.
 getMinBytesPerChar()
 getPublicId(byte[], int, int)
 getSingleByteEncoding(String)
  Returns an Encoding for entities encoded with a single-byte encoding (an encoding in which each byte represents exactly one character).
 matchesXMLString(byte[], int, int, String)
 movePosition(byte[], int, int, Position)
 skipIgnoreSect(byte[], int, int)
 skipS(byte[], int, int)
 tokenizeAttributeValue(byte[], int, int, Token)
 tokenizeCdataSection(byte[], int, int, Token)
 tokenizeContent(byte[], int, int, ContentToken)
 tokenizeEntityValue(byte[], int, int, Token)
 tokenizeProlog(byte[], int, int, Token)
   
TOK_DATA_CHARS
public static final int TOK_DATA_CHARS
TOK_DATA_NEWLINE
public static final int TOK_DATA_NEWLINE
TOK_START_TAG_NO_ATTS
public static final int TOK_START_TAG_NO_ATTS
  <name>, that doesn't have any attribute specifications.
TOK_START_TAG_WITH_ATTS
public static final int TOK_START_TAG_WITH_ATTS
  <name att="val">, that contains one or more attribute specifications.
TOK_EMPTY_ELEMENT_NO_ATTS
public static final int TOK_EMPTY_ELEMENT_NO_ATTS
  <name/>, that doesn't have any attribute specifications.
TOK_EMPTY_ELEMENT_WITH_ATTS
public static final int TOK_EMPTY_ELEMENT_WITH_ATTS
  <name att="val"/>, that contains one or more attribute specifications.
TOK_END_TAG
public static final int TOK_END_TAG
  </name>.
TOK_CDATA_SECT_OPEN
public static final int TOK_CDATA_SECT_OPEN
  <![CDATA[.
TOK_CDATA_SECT_CLOSE
public static final int TOK_CDATA_SECT_CLOSE
  ]]>.
TOK_ENTITY_REF
public static final int TOK_ENTITY_REF
TOK_MAGIC_ENTITY_REF
public static final int TOK_MAGIC_ENTITY_REF
  amp, lt, gt, quot, apos.
TOK_CHAR_REF
public static final int TOK_CHAR_REF
TOK_CHAR_PAIR_REF
public static final int TOK_CHAR_PAIR_REF
TOK_PI
public static final int TOK_PI
TOK_XML_DECL
public static final int TOK_XML_DECL
  An XML declaration or text declaration (a processing instruction whose target is xml).
TOK_COMMENT
public static final int TOK_COMMENT
  <!-- comment -->.
  This can occur both in the prolog and in content.
TOK_ATTRIBUTE_VALUE_S
public static final int TOK_ATTRIBUTE_VALUE_S
TOK_PARAM_ENTITY_REF
public static final int TOK_PARAM_ENTITY_REF
TOK_PROLOG_S
public static final int TOK_PROLOG_S
TOK_DECL_OPEN
public static final int TOK_DECL_OPEN
  <!NAME in the prolog.
TOK_DECL_CLOSE
public static final int TOK_DECL_CLOSE
  > in the prolog.
TOK_NAME
public static final int TOK_NAME
TOK_NMTOKEN
public static final int TOK_NMTOKEN
TOK_POUND_NAME
public static final int TOK_POUND_NAME
  #NAME in the prolog.
TOK_OR
public static final int TOK_OR
  | in the prolog.
TOK_PERCENT
public static final int TOK_PERCENT
  % in the prolog that does not start a parameter entity reference.
  This can occur in an entity declaration.
TOK_OPEN_PAREN
public static final int TOK_OPEN_PAREN
  ( in the prolog.
TOK_CLOSE_PAREN
public static final int TOK_CLOSE_PAREN
  ) in the prolog that is not followed immediately by any of *, + or ?.
TOK_OPEN_BRACKET
public static final int TOK_OPEN_BRACKET
  [ in the prolog.
TOK_CLOSE_BRACKET
public static final int TOK_CLOSE_BRACKET
  ] in the prolog.
TOK_LITERAL
public static final int TOK_LITERAL
TOK_NAME_QUESTION
public static final int TOK_NAME_QUESTION
  ?.
TOK_NAME_ASTERISK
public static final int TOK_NAME_ASTERISK
  *.
TOK_NAME_PLUS
public static final int TOK_NAME_PLUS
  +.
TOK_COND_SECT_OPEN
public static final int TOK_COND_SECT_OPEN
  <![ in the prolog.
TOK_COND_SECT_CLOSE
public static final int TOK_COND_SECT_CLOSE
  ]]> in the prolog.
TOK_CLOSE_PAREN_QUESTION
public static final int TOK_CLOSE_PAREN_QUESTION
  )? in the prolog.
TOK_CLOSE_PAREN_ASTERISK
public static final int TOK_CLOSE_PAREN_ASTERISK
  )* in the prolog.
TOK_CLOSE_PAREN_PLUS
public static final int TOK_CLOSE_PAREN_PLUS
  )+ in the prolog.
TOK_COMMA
public static final int TOK_COMMA
  , in the prolog.
 
convert
 public abstract int convert(byte sourceBuf[],
                             int sourceStart,
                             int sourceEnd,
                             char targetBuf[],
                             int targetStart)
The bytes in sourceBuf between sourceStart
 and sourceEnd are converted to characters and stored
 in targetBuf starting at targetStart.
 (targetBuf.length - targetStart) * getMinBytesPerChar()
 must be greater than or equal to
 sourceEnd - sourceStart.
 If getFixedBytesPerChar returns a value greater than 0,
 then the return value will be equal to
 (sourceEnd - sourceStart)/getFixedBytesPerChar().
Returns: the number of characters stored in targetBuf
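 The size precondition for convert can be checked before the call. The sketch below is a worked version of that arithmetic only; ConvertSizing and its methods are hypothetical helpers, and the minBytesPerChar argument stands in for whatever getMinBytesPerChar() returns for the encoding in use.

```java
// Hypothetical helper illustrating the buffer-size precondition of convert.
final class ConvertSizing {
    // Smallest number of free chars needed in targetBuf so that
    // freeChars * minBytesPerChar >= sourceEnd - sourceStart.
    static int requiredChars(int sourceStart, int sourceEnd, int minBytesPerChar) {
        int bytes = sourceEnd - sourceStart;
        return (bytes + minBytesPerChar - 1) / minBytesPerChar; // ceiling division
    }

    // Checks the documented precondition for a target buffer of the given
    // length when writing starts at targetStart.
    static boolean fits(int targetLength, int targetStart,
                        int sourceStart, int sourceEnd, int minBytesPerChar) {
        long capacityInBytes = (long) (targetLength - targetStart) * minBytesPerChar;
        return capacityInBytes >= sourceEnd - sourceStart;
    }

    public static void main(String[] args) {
        // 10 source bytes in an encoding with at least 2 bytes per char.
        System.out.println(requiredChars(0, 10, 2));
        System.out.println(fits(5, 0, 0, 10, 2));
    }
}
```

 Sizing by the minimum bytes per char gives an upper bound on the number of chars produced, so a buffer that passes this check cannot overflow even when some characters take more bytes than the minimum.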
getFixedBytesPerChar
public abstract int getFixedBytesPerChar()
Returns the number of bytes required to represent each char,
 or zero if different chars are represented by different
 numbers of bytes.  The value returned will be 0, 1, 2, or 4.
movePosition
 public abstract void movePosition(byte buf[],
                                   int off,
                                   int end,
                                   Position pos)
On entry, pos gives the position of the byte at index
 off in buf.
 On exit, pos will give the position of the byte at index
 end, which must be greater than or equal to off.
 The bytes between off and end must encode
 one or more complete characters.
 A carriage return followed by a line feed will be treated as a single
 line delimiter provided that they are given to movePosition
 together.
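 The carriage return and line feed rule can be illustrated with a simplified position tracker. This is a sketch for a single-byte, ASCII-compatible encoding only: PositionSketch and LinePos are hypothetical, LinePos is not the Position class of this package, and the line and column numbering (line from 1, column from 0) is an assumption made for the example.

```java
// Hypothetical sketch of position tracking; not the package's implementation.
final class PositionSketch {
    // Stand-in for a position: a line number and a column number.
    static final class LinePos {
        int line = 1;
        int column = 0;
    }

    // Advances pos over buf from off (inclusive) to end (exclusive).
    static void movePosition(byte[] buf, int off, int end, LinePos pos) {
        int i = off;
        while (i < end) {
            byte b = buf[i++];
            if (b == '\r') {
                // CR LF inside the same call counts as one line delimiter.
                if (i < end && buf[i] == '\n') {
                    i++;
                }
                pos.line++;
                pos.column = 0;
            } else if (b == '\n') {
                pos.line++;
                pos.column = 0;
            } else {
                pos.column++;
            }
        }
    }

    public static void main(String[] args) {
        byte[] buf = {'a', 'b', '\r', '\n', 'c', 'd'};
        LinePos pos = new LinePos();
        movePosition(buf, 0, buf.length, pos);
        System.out.println(pos.line + ":" + pos.column);
    }
}
```

 In this sketch no state is carried between calls, so a carriage return at the end of one call and a line feed at the start of the next are counted as two line delimiters. That is why the pair has to be passed in a single call.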
tokenizeCdataSection
 public final int tokenizeCdataSection(byte buf[],
                                       int off,
                                       int end,
                                       Token token) throws EmptyTokenException, PartialTokenException, InvalidTokenException, ExtensibleTokenException
Returns one of the following token types:
 TOK_DATA_CHARS
 TOK_DATA_NEWLINE
 TOK_CDATA_SECT_CLOSE
 
 Information about the token is stored in token.
 
 After TOK_CDATA_SECT_CLOSE is returned, the application
 should use tokenizeContent.
tokenizeContent
 public final int tokenizeContent(byte buf[],
                                  int off,
                                  int end,
                                  ContentToken token) throws PartialTokenException, InvalidTokenException, EmptyTokenException, ExtensibleTokenException
Returns one of the following token types:
 TOK_START_TAG_NO_ATTS
 TOK_START_TAG_WITH_ATTS
 TOK_EMPTY_ELEMENT_NO_ATTS
 TOK_EMPTY_ELEMENT_WITH_ATTS
 TOK_END_TAG
 TOK_DATA_CHARS
 TOK_DATA_NEWLINE
 TOK_CDATA_SECT_OPEN
 TOK_ENTITY_REF
 TOK_MAGIC_ENTITY_REF
 TOK_CHAR_REF
 TOK_CHAR_PAIR_REF
 TOK_PI
 TOK_XML_DECL
 TOK_COMMENT
 
 Information about the token is stored in token.
 
 When TOK_CDATA_SECT_OPEN is returned,
 tokenizeCdataSection should be called until
 it returns TOK_CDATA_SECT_CLOSE.
getInitialEncoding
 public static final Encoding getInitialEncoding(byte buf[],
                                                 int off,
                                                 int end,
                                                 Token token)
Parameters:
 off - the index in buf of the first byte of the entity
 end - the index in buf following the last available
  byte of the entity; end - off must be greater than or equal
  to 4 unless the entity has fewer than 4 bytes, in which case it must
  be equal to the length of the entity
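 token - receives information about the presence of a byte order mark;
  if a byte order mark was present then token.getTokenEnd()
  will return off + 2, otherwise it will return
  off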
getEncoding
public final Encoding getEncoding(String name)
Returns an Encoding corresponding to
 the specified IANA character set name.
 Returns this Encoding if the name is null.
 Returns null if the specified encoding is not supported.
 Note that there are two distinct Encoding objects
 associated with the name UTF-16, one for
 each possible byte order; if this Encoding
 is UTF-16 with little-endian byte ordering, then
 getEncoding("UTF-16") will return this,
 otherwise it will return an Encoding for
 UTF-16 with big-endian byte ordering.
getSingleByteEncoding
public final Encoding getSingleByteEncoding(String map)
Returns an Encoding for entities encoded with
 a single-byte encoding (an encoding in which each byte represents
 exactly one character).
Parameters:
 map - map.charAt(b)
  specifies the character encoded by byte b; bytes that do
  not represent any character should be mapped to ?
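 A map string of the kind described above can be built programmatically. The sketch below builds one for ISO-8859-1, where every byte value maps to the character with the same code, and uses it to decode a subarray. SingleByteMapSketch is a hypothetical helper, and treating b as an unsigned value from 0 to 255 when indexing the map is an assumption of this example.

```java
// Hypothetical sketch of a single-byte character map; not part of the package.
final class SingleByteMapSketch {
    // Builds a 256-character map in which byte value b maps to the character
    // with the same code, which is how ISO-8859-1 assigns its characters.
    static String latin1Map() {
        StringBuilder sb = new StringBuilder(256);
        for (int b = 0; b < 256; b++) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Decodes buf from off (inclusive) to end (exclusive) by looking each
    // byte up in the map; the mask converts the signed byte to 0..255.
    static String decode(String map, byte[] buf, int off, int end) {
        StringBuilder out = new StringBuilder(end - off);
        for (int i = off; i < end; i++) {
            out.append(map.charAt(buf[i] & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] buf = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(decode(latin1Map(), buf, 0, buf.length));
    }
}
```

 ISO-8859-1 assigns a character to all 256 byte values, so the case of bytes that represent no character does not arise in this example.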
getInternalEncoding
public static final Encoding getInternalEncoding()
Returns an Encoding object for use with internal entities.
 This is a UTF-16 big endian encoding, except that newlines
 are assumed to have been normalized into line feed,
 so carriage return is treated like a space.
tokenizeProlog
 public final int tokenizeProlog(byte buf[],
                                 int off,
                                 int end,
                                 Token token) throws PartialTokenException, InvalidTokenException, EmptyTokenException, ExtensibleTokenException, EndOfPrologException
Returns one of the following token types:
 TOK_PI
 TOK_XML_DECL
 TOK_COMMENT
 TOK_PARAM_ENTITY_REF
 TOK_PROLOG_S
 TOK_DECL_OPEN
 TOK_DECL_CLOSE
 TOK_NAME
 TOK_NMTOKEN
 TOK_POUND_NAME
 TOK_OR
 TOK_PERCENT
 TOK_OPEN_PAREN
 TOK_CLOSE_PAREN
 TOK_OPEN_BRACKET
 TOK_CLOSE_BRACKET
 TOK_LITERAL
 TOK_NAME_QUESTION
 TOK_NAME_ASTERISK
 TOK_NAME_PLUS
 TOK_COND_SECT_OPEN
 TOK_COND_SECT_CLOSE
 TOK_CLOSE_PAREN_QUESTION
 TOK_CLOSE_PAREN_ASTERISK
 TOK_CLOSE_PAREN_PLUS
 TOK_COMMA
 
Throws: EndOfPrologException
 if the subarray starts with the document element;
 tokenizeContent should be used on the remainder
 of the entity
tokenizeAttributeValue
 public final int tokenizeAttributeValue(byte buf[],
                                         int off,
                                         int end,
                                         Token token) throws PartialTokenException, InvalidTokenException, EmptyTokenException, ExtensibleTokenException
Returns one of the following token types:
 TOK_DATA_CHARS
 TOK_DATA_NEWLINE
 TOK_ATTRIBUTE_VALUE_S
 TOK_MAGIC_ENTITY_REF
 TOK_ENTITY_REF
 TOK_CHAR_REF
 TOK_CHAR_PAIR_REF
 
tokenizeEntityValue
 public final int tokenizeEntityValue(byte buf[],
                                      int off,
                                      int end,
                                      Token token) throws PartialTokenException, InvalidTokenException, EmptyTokenException, ExtensibleTokenException
Returns one of the following token types:
 TOK_DATA_CHARS
 TOK_DATA_NEWLINE
 TOK_PARAM_ENTITY_REF
 TOK_MAGIC_ENTITY_REF
 TOK_ENTITY_REF
 TOK_CHAR_REF
 TOK_CHAR_PAIR_REF
 
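skipIgnoreSect
 public final int skipIgnoreSect(byte buf[],
                                 int off,
                                 int end) throws PartialTokenException, InvalidTokenException
Skips over an ignored conditional section.
 The subarray starts following the <![ IGNORE [.
Returns: the index of the character following the closing ]]>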
getPublicId
 public final String getPublicId(byte buf[],
                                 int off,
                                 int end) throws InvalidTokenException
matchesXMLString
 public final boolean matchesXMLString(byte buf[],
                                       int off,
                                       int end,
                                       String str)
skipS
 public final int skipS(byte buf[],
                        int off,
                        int end)
Returns: the index of the first non-whitespace character,
 or end if the subarray is all whitespace
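 The return convention of skipS can be sketched for the simple case of a single-byte, ASCII-compatible encoding. SkipWhitespaceSketch is a hypothetical stand-in, not the package's implementation; it treats space, tab, carriage return and line feed as whitespace, which are the white space characters defined by XML.

```java
// Hypothetical sketch of whitespace skipping over a byte subarray.
final class SkipWhitespaceSketch {
    // Returns the index of the first byte in buf from off (inclusive) to
    // end (exclusive) that is not XML whitespace, or end if there is none.
    static int skipS(byte[] buf, int off, int end) {
        while (off < end) {
            byte b = buf[off];
            if (b == ' ' || b == '\t' || b == '\r' || b == '\n') {
                off++;
            } else {
                return off;
            }
        }
        return end;
    }

    public static void main(String[] args) {
        byte[] buf = {' ', ' ', '\t', '\n', 'x', ' '};
        System.out.println(skipS(buf, 0, buf.length));
    }
}
```

 Returning end for an all-whitespace subarray means the result can always be used as the off of the next call without a separate check for "nothing found".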
getMinBytesPerChar
public final int getMinBytesPerChar()