Class XMLTokenizer
- Direct Known Subclasses:
DTDTokenizer
Use this class to tokenize XML yourself, or use XMLParser to parse XML into a Document.
-
Nested Class Summary
Nested Classes
static enum
    Types of tokens the tokenizer can return.
-
Field Summary
Fields
private CharValidator charValidator
    The character validator for this tokenizer.
private EntityResolver entityResolver
    The entity resolver to use to expand and verify entities.
protected boolean inStartElement
    true if we're currently inside of a start tag.
protected int pos
    The current position in the source.
protected final XMLSource source
private boolean treatEntitiesAsText
    If true (the default), entities are treated as text; if false, the tokenizer returns them as entities.
-
Constructor Summary
Constructors -
Method Summary
protected Token createToken
    All tokens are created here.
protected void expect(char expected)
    Check that the next character is the expected one and skip it.
getCharValidator
getEntityResolver
int getOffset()
    Get the current parsing position (for error handling, for example).
getSource
boolean isTreatEntitiesAsText()
protected String lookAheadForErrorMessage(String conditionalPrefix, int pos, int len)
next()
    Fetch the next token from the source.
protected char nextChar
protected void nextChars
protected void parseAttribute(Token token)
    Read the attribute of an element.
protected void parseBeginElement(Token token)
    Read the name of an element.
protected void parseBeginSomething(Token token)
    Read one of "<tag", "<?pi", "<!--", "<![CDATA[" or an end tag.
protected void parseCData(Token token)
    Parse a CDATA element.
protected void parseComment(Token token)
    Read a comment.
protected void parseDocType(Token token)
    Parse a doctype declaration.
protected void parseEndElement(Token token)
    Read an end tag.
protected void parseEntity(Token token)
protected void parseExcalamation(Token token)
    Parse "<!--" or "<![CDATA[".
protected void parseName
    Read an XML name.
protected void parseProcessingInstruction(Token token)
    Read a processing instruction.
protected void parseText
    Read a piece of text.
setCharValidator(CharValidator charValidator)
setEntityResolver(EntityResolver resolver)
void setOffset(int offset)
    Set the current parsing position.
setTreatEntitiesAsText(boolean treatEntitiesAsText)
protected void skipChar(char c)
    Advance one or two positions, depending on whether the current character is the high part of a surrogate pair.
protected void skipWhiteSpace()
    Advance the current position past any whitespace in the input.
protected void verifyEntity(int start, int end)
    Verify an entity.
-
Field Details
-
source
-
pos
protected int pos
The current position in the source.
-
inStartElement
protected boolean inStartElement
true if we're currently inside of a start tag.
-
treatEntitiesAsText
private boolean treatEntitiesAsText
If true (the default), entities are treated as text; if false, the tokenizer returns them as entities.
-
charValidator
The character validator for this tokenizer. -
entityResolver
The entity resolver to use to expand and verify entities.
-
-
Constructor Details
-
XMLTokenizer
-
-
Method Details
-
setTreatEntitiesAsText
-
isTreatEntitiesAsText
public boolean isTreatEntitiesAsText()
-
getCharValidator
-
setCharValidator
-
getEntityResolver
-
setEntityResolver
-
next
Fetch the next token from the source. Returns null if there are no more tokens in the input.
- Returns:
- The next token or null at EOF
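The null-at-EOF contract of next() suggests a simple pull loop. The following is a minimal, self-contained sketch of that pattern only: MiniTokenizer is a hypothetical stand-in written for this example, not the library class, and it splits the input into crude "<...>" and text chunks rather than real XML tokens.

```java
import java.util.ArrayList;
import java.util.List;

class PullLoopSketch {

    /** Hypothetical stand-in tokenizer; only its next() contract mirrors the description above. */
    static class MiniTokenizer {
        private final String source;
        private int pos = 0;

        MiniTokenizer(String source) {
            this.source = source;
        }

        /** Returns the next chunk ("<...>" or a run of text), or null when the input is exhausted. */
        String next() {
            if (pos >= source.length()) {
                return null;
            }
            int start = pos;
            if (source.charAt(pos) == '<') {
                int end = source.indexOf('>', pos);
                pos = (end < 0) ? source.length() : end + 1;
            } else {
                int end = source.indexOf('<', pos);
                pos = (end < 0) ? source.length() : end;
            }
            return source.substring(start, pos);
        }
    }

    /** Drains the tokenizer by calling next() until it returns null. */
    public static List<String> readAll(String xml) {
        MiniTokenizer tokenizer = new MiniTokenizer(xml);
        List<String> tokens = new ArrayList<>();
        String token;
        while ((token = tokenizer.next()) != null) {
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(readAll("<a>text</a>")); // prints [<a>, text, </a>]
    }
}
```

With the real tokenizer the loop has the same shape, but the loop variable is a Token and the constructor arguments follow the Constructor Details section.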
-
createToken
All tokens are created here.
Use this method to create custom tokens with additional information.
- Returns:
- a new, pre-initialized token
-
getSource
-
getOffset
public int getOffset()
Get the current parsing position (for error handling, for example).
This value is not very accurate because the tokenizer might be anywhere in the stream.
-
setOffset
public void setOffset(int offset)
Set the current parsing position. You can use this to restart parsing after an error or to jump around in the input.
-
parseBeginSomething
Read one of "<tag", "<?pi", "<!--", "<![CDATA[" or an end tag.
-
parseBeginElement
Read the name of an element.
The resulting token will contain the '<' plus any whitespace between it and the name plus the name itself but no whitespace after the name.
-
parseEndElement
Read an end tag.
The resulting token will contain the '</' and '>' plus the name plus any whitespace between those three.
-
parseExcalamation
Parse "<!--" or "<![CDATA[".
-
parseDocType
Parse a doctype declaration.
The resulting token will contain "<!DOCTYPE"
-
parseCData
Parse a CDATA element.
The resulting token will contain the "<![CDATA[" plus the terminating "]]>".
-
parseComment
Read a comment.
The resulting token will contain the "<!--" plus the terminating "-->".
-
parseProcessingInstruction
Read a processing instruction.
The resulting token will contain the "<?" plus the terminating "?>".
-
parseAttribute
Read the attribute of an element.
The resulting token will contain the name, "=" plus the quotes and the value.
-
parseName
Read an XML name.
-
parseText
Read a piece of text.
The resulting token will contain the text as is, with all the entity and numeric character references.
-
skipChar
protected void skipChar(char c)
Advance one or two positions, depending on whether the current character is the high part of a surrogate pair.
-
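The surrogate-pair rule described for skipChar can be illustrated with a small standalone sketch. It is an assumption-laden illustration, not the library's code: the helper below takes the source string and a position and returns the new position, whereas the documented method takes the current character and updates the tokenizer's own pos field. Checking for a following low surrogate is also an addition made for this sketch.

```java
class SurrogateSkipSketch {

    /**
     * Returns the position just past the character that starts at pos:
     * pos + 2 if a high surrogate is followed by a low surrogate, else pos + 1.
     */
    public static int skipChar(String source, int pos) {
        char c = source.charAt(pos);
        boolean isPair = Character.isHighSurrogate(c)
                && pos + 1 < source.length()
                && Character.isLowSurrogate(source.charAt(pos + 1));
        return pos + (isPair ? 2 : 1);
    }

    public static void main(String[] args) {
        // "a", then U+1F600 encoded as a surrogate pair, then "b"
        String s = "a\uD83D\uDE00b";
        System.out.println(skipChar(s, 0)); // prints 1
        System.out.println(skipChar(s, 1)); // prints 3
        System.out.println(skipChar(s, 3)); // prints 4
    }
}
```

Advancing by two in the pair case keeps the position from landing in the middle of a supplementary character.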
verifyEntity
protected void verifyEntity(int start, int end)
Verify an entity. If no entityResolver is installed, this does nothing.
-
parseEntity
-
nextChars
-
nextChar
-
expect
protected void expect(char expected)
Check that the next character is expected and skip it.
-
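The check-and-skip behaviour of expect can be sketched as follows. This is a hedged illustration only: the helper works on a plain string and returns the new position, and it throws IllegalStateException on a mismatch. The exception type and message thrown by the real method are not stated in this page, so both are assumptions here.

```java
class ExpectSketch {

    /**
     * Verifies that the character at pos equals expected and returns pos + 1.
     * Throws IllegalStateException (an assumed error type) on mismatch or end of input.
     */
    public static int expect(String source, int pos, char expected) {
        if (pos >= source.length()) {
            throw new IllegalStateException(
                    "Expected '" + expected + "' but reached end of input at offset " + pos);
        }
        char actual = source.charAt(pos);
        if (actual != expected) {
            throw new IllegalStateException(
                    "Expected '" + expected + "' but found '" + actual + "' at offset " + pos);
        }
        return pos + 1;
    }

    public static void main(String[] args) {
        System.out.println(expect("<a>", 0, '<')); // prints 1
    }
}
```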
lookAheadForErrorMessage
-
skipWhiteSpace
protected void skipWhiteSpace()
Advance the current position past any whitespace in the input.
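As a rough illustration of what skipping whitespace involves, the sketch below advances past the four characters that XML 1.0 defines as whitespace (space, tab, carriage return, line feed). It is a standalone helper over a string, written for this example; the real method works on the tokenizer's source and pos, and may delegate the whitespace test to the CharValidator, which is an assumption.

```java
class SkipWhitespaceSketch {

    /** XML 1.0 whitespace: space, tab, carriage return, line feed. */
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    /** Returns the first position at or after pos that is not XML whitespace. */
    public static int skipWhiteSpace(String source, int pos) {
        while (pos < source.length() && isXmlWhitespace(source.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(skipWhiteSpace("  \n<a/>", 0)); // prints 3
        System.out.println(skipWhiteSpace("<a/>", 0));     // prints 0
    }
}
```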
-