Class XMLTokenizer

java.lang.Object
de.pdark.decentxml.XMLTokenizer
Direct Known Subclasses:
DTDTokenizer

public class XMLTokenizer extends Object
This class lets you chop an XMLSource into tokens.

You can use it to parse XML yourself, or let the XMLParser use it to parse XML into a Document.

  • Field Details

    • source

      protected final XMLSource source
    • pos

      protected int pos
      The current position in the source
    • inStartElement

      protected boolean inStartElement
      true if we're currently inside of a start tag
    • treatEntitiesAsText

      private boolean treatEntitiesAsText
      If true, the tokenizer treats entities as text instead of returning them as entities. Default is true.
    • charValidator

      private CharValidator charValidator
      The character validator for this tokenizer.
    • entityResolver

      private EntityResolver entityResolver
      The entity resolver to use to expand and verify entities.
  • Constructor Details

    • XMLTokenizer

      public XMLTokenizer(XMLSource source)
  • Method Details

    • setTreatEntitiesAsText

      public XMLTokenizer setTreatEntitiesAsText(boolean treatEntitiesAsText)
    • isTreatEntitiesAsText

      public boolean isTreatEntitiesAsText()
    • getCharValidator

      public CharValidator getCharValidator()
    • setCharValidator

      public XMLTokenizer setCharValidator(CharValidator charValidator)
    • getEntityResolver

      public EntityResolver getEntityResolver()
    • setEntityResolver

      public XMLTokenizer setEntityResolver(EntityResolver resolver)
    • next

      public Token next()
      Fetch the next token from the source. Returns null if there are no more tokens in the input.
      Returns:
      The next token or null at EOF
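      Because next() returns null at EOF, callers can drain a tokenizer with a plain loop. The sketch below is self-contained and does not use the decentxml classes: MiniTokenizer is a hypothetical stand-in that only separates "<...>" markup from text, which is far cruder than what XMLTokenizer does. It is meant to show the loop shape only.

```java
import java.util.ArrayList;
import java.util.List;

public class PullLoopSketch {
    /** Hypothetical stand-in for a tokenizer whose next() returns null at EOF. */
    public static class MiniTokenizer {
        private final String source;
        private int pos = 0;

        public MiniTokenizer(String source) {
            this.source = source;
        }

        /** Returns the next chunk ("<...>" markup or plain text), or null at EOF. */
        public String next() {
            if (pos >= source.length()) {
                return null;
            }
            int start = pos;
            if (source.charAt(pos) == '<') {
                int end = source.indexOf('>', pos);
                // Unterminated markup: consume the rest of the input.
                pos = (end < 0) ? source.length() : end + 1;
            } else {
                int end = source.indexOf('<', pos);
                pos = (end < 0) ? source.length() : end;
            }
            return source.substring(start, pos);
        }
    }

    /** Drains the tokenizer with the "loop until null" idiom. */
    public static List<String> tokenize(String xml) {
        List<String> tokens = new ArrayList<>();
        MiniTokenizer tokenizer = new MiniTokenizer(xml);
        String token;
        while ((token = tokenizer.next()) != null) {
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("<a>text</a>"));
    }
}
```

      With the real class, the same loop would presumably read a Token from XMLTokenizer.next() instead of a String.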
    • createToken

      protected Token createToken()
      All tokens are created here.

      Use this method to create custom tokens with additional information.

      Returns:
      a new, pre-initialized token
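      Routing all token creation through one overridable method is the factory-method pattern. The following sketch illustrates it with hypothetical stand-in types (Tok, LineTok, BaseTokenizer, LineTokenizer); none of them are decentxml classes, and the line-number field is an invented example of "additional information".

```java
public class CreateTokenSketch {
    /** Hypothetical stand-in for a token type. */
    public static class Tok {
        public int startOffset;
    }

    /** Custom token that carries extra information (here: a line number). */
    public static class LineTok extends Tok {
        public int line;
    }

    /** Hypothetical tokenizer that creates every token via createToken(). */
    public static class BaseTokenizer {
        protected int pos;

        protected Tok createToken() {
            Tok t = new Tok();
            t.startOffset = pos; // pre-initialize with the current position
            return t;
        }

        public Tok next() {
            Tok t = createToken();
            pos++; // pretend one character was consumed
            return t;
        }
    }

    /** Subclass that swaps in the custom token type. */
    public static class LineTokenizer extends BaseTokenizer {
        protected int currentLine = 1;

        @Override
        protected Tok createToken() {
            LineTok t = new LineTok();
            t.startOffset = pos;
            t.line = currentLine;
            return t;
        }
    }

    public static void main(String[] args) {
        Tok t = new LineTokenizer().next();
        System.out.println(t instanceof LineTok);
    }
}
```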
    • getSource

      public XMLSource getSource()
    • getOffset

      public int getOffset()
      Get the current parsing position (for error handling, for example).

      This value is not very accurate because the tokenizer might be anywhere in the stream.

    • setOffset

      public void setOffset(int offset)
      Set the current parsing position. You can use this to restart parsing after an error or to jump around in the input.
    • parseBeginSomething

      protected void parseBeginSomething(Token token)
      Read one of "<tag", "<?pi", "<!--", "<![CDATA[" or an end tag.
    • parseBeginElement

      protected void parseBeginElement(Token token)
      Read the name of an element.

      The resulting token will contain the '<' plus any whitespace between it and the name plus the name itself but no whitespace after the name.
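      To make the token extent concrete, here is a minimal standalone sketch that computes the same span over a String. It is not the library's implementation: the method name beginElementText and the simplified isNameChar test are assumptions for illustration, and the XML specification permits many more name characters than checked here.

```java
public class BeginElementSketch {
    /** Simplified name-character test; the XML spec allows many more characters. */
    static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == ':' || c == '-' || c == '.';
    }

    /** Returns the text a begin-element token would cover, starting at the '<' at pos. */
    public static String beginElementText(String source, int pos) {
        int start = pos;
        if (source.charAt(pos) != '<') {
            throw new IllegalArgumentException("Expected '<' at offset " + pos);
        }
        pos++; // the '<' itself
        // Whitespace between '<' and the name is included in the token.
        while (pos < source.length() && Character.isWhitespace(source.charAt(pos))) {
            pos++;
        }
        // The name is included; whitespace after it is not.
        while (pos < source.length() && isNameChar(source.charAt(pos))) {
            pos++;
        }
        return source.substring(start, pos);
    }

    public static void main(String[] args) {
        System.out.println(beginElementText("<root id=\"1\">", 0)); // prints "<root"
    }
}
```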

    • parseEndElement

      protected void parseEndElement(Token token)
      Read an end tag.

      The resulting token will contain the '</' and '>' plus the name plus any whitespace between those three.

    • parseExcalamation

      protected void parseExcalamation(Token token)
      Parse "<!--" or "<![CDATA["
    • parseDocType

      protected void parseDocType(Token token)
      Parse a doctype declaration

      The resulting token will contain "<!DOCTYPE"

    • parseCData

      protected void parseCData(Token token)
      Parse a CDATA element.

      The resulting token will contain the "<![CDATA[" plus the terminating "]]>".

    • parseComment

      protected void parseComment(Token token)
      Read a comment.

      The resulting token will contain the "<!--" plus the terminating "-->".

    • parseProcessingInstruction

      protected void parseProcessingInstruction(Token token)
      Read a processing instruction.

      The resulting token will contain the "<?" plus the terminating "?>".

    • parseAttribute

      protected void parseAttribute(Token token)
      Read the attribute of an element.

      The resulting token will contain the name, the "=", the quotes and the value.
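      As an illustration of that extent, the standalone sketch below scans one attribute from a String and returns the covered text. It is an assumption-laden simplification, not the library's code: attributeText is a hypothetical helper, names are read up to whitespace or "=", and whitespace around "=" is assumed to be part of the token.

```java
public class AttributeSketch {
    /** Returns the text of one attribute (name, "=", quotes, value) starting at pos. */
    public static String attributeText(String source, int pos) {
        int start = pos;
        // Read the name: everything up to whitespace or '='.
        while (pos < source.length()
                && !Character.isWhitespace(source.charAt(pos))
                && source.charAt(pos) != '=') {
            pos++;
        }
        pos = skipSpace(source, pos);
        if (pos >= source.length() || source.charAt(pos) != '=') {
            throw new IllegalArgumentException("Expected '=' at offset " + pos);
        }
        pos = skipSpace(source, pos + 1);
        if (pos >= source.length()) {
            throw new IllegalArgumentException("Missing attribute value");
        }
        char quote = source.charAt(pos);
        if (quote != '"' && quote != '\'') {
            throw new IllegalArgumentException("Expected quote at offset " + pos);
        }
        // The value ends at the next occurrence of the same quote character.
        int end = source.indexOf(quote, pos + 1);
        if (end < 0) {
            throw new IllegalArgumentException("Unterminated attribute value");
        }
        return source.substring(start, end + 1);
    }

    private static int skipSpace(String source, int pos) {
        while (pos < source.length() && Character.isWhitespace(source.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(attributeText("id = 'x' b", 0)); // prints "id = 'x'"
    }
}
```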

    • parseName

      protected void parseName(String objectName)
      Read an XML name
    • parseText

      protected void parseText(Token token)
      Read a piece of text.

      The resulting token will contain the text as is with all the entity and numeric character references.

    • skipChar

      protected void skipChar(char c)
      Advance one or two positions, depending on whether the current character is the high part of a surrogate pair.
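      The surrogate-pair rule can be sketched with the standard java.lang.Character helpers. This version is a standalone illustration with a different signature from the method documented here (it takes the source and a position and returns the new position), and it additionally checks that a low surrogate actually follows, which is an assumption rather than documented behavior.

```java
public class SkipCharSketch {
    /** Returns the position after the character at pos, stepping over a full surrogate pair. */
    public static int skipChar(String source, int pos) {
        char c = source.charAt(pos);
        boolean pair = Character.isHighSurrogate(c)
                && pos + 1 < source.length()
                && Character.isLowSurrogate(source.charAt(pos + 1));
        return pos + (pair ? 2 : 1);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', one supplementary code point (two chars), 'b'
        int pos = 0;
        int steps = 0;
        while (pos < s.length()) {
            pos = skipChar(s, pos);
            steps++;
        }
        System.out.println(steps); // prints 3
    }
}
```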
    • verifyEntity

      protected void verifyEntity(int start, int end)
      Verify an entity. If no entityResolver is installed, this does nothing.
    • parseEntity

      protected void parseEntity(Token token)
    • nextChars

      protected void nextChars(String expected, int startPos, String errorMessage)
    • nextChar

      protected char nextChar(String errorMessage)
    • expect

      protected void expect(char expected)
      Check that the next character is the expected one and skip it
    • lookAheadForErrorMessage

      protected String lookAheadForErrorMessage(String conditionalPrefix, int pos, int len)
    • skipWhiteSpace

      protected void skipWhiteSpace()
      Advance the current position past any whitespace in the input
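      A minimal standalone sketch of whitespace skipping follows. It assumes the XML 1.0 definition of whitespace (space, tab, carriage return, line feed); whether the tokenizer uses exactly this set or delegates to its CharValidator is not stated here. Unlike the method above, the sketch takes the source and position as parameters and returns the new position.

```java
public class SkipWhiteSpaceSketch {
    /** XML 1.0 whitespace: space, tab, carriage return, line feed. */
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    /** Returns the first position at or after pos that is not XML whitespace. */
    public static int skipWhiteSpace(String source, int pos) {
        while (pos < source.length() && isXmlWhitespace(source.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(skipWhiteSpace(" \t\nx", 0)); // prints 3
    }
}
```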