Class XmlScanner

java.lang.Object
com.fasterxml.aalto.in.XmlScanner
All Implemented Interfaces:
XmlConsts, NamespaceContext, XMLStreamConstants
Direct Known Subclasses:
ByteBasedScanner, ReaderScanner

public abstract class XmlScanner extends Object implements XmlConsts, XMLStreamConstants, NamespaceContext
This is the abstract base class for all scanner implementations. It defines the operations that the actual parser requires from the low-level scanners. Scanners are specific to both encoding and input type (byte or char; stream or block), so there are many implementations.
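The following minimal sketch illustrates this design. It is not Aalto's code, and all class and method names in it are hypothetical. An abstract base class implements shared scanning logic once. Input-specific subclasses supply only character access, loosely mirroring the split between ByteBasedScanner and ReaderScanner. The byte-based variant assumes a single-byte encoding such as ISO-8859-1, so each byte maps to exactly one char.

```java
// Hypothetical base class: shared scanning logic is written once here.
abstract class MiniScanner {
    protected int ptr = 0;

    /** Returns the next character, or -1 at end of input (ptr is not advanced then). */
    protected abstract int nextChar();

    /** Puts back the last character successfully read. */
    protected void pushBack() { --ptr; }

    /** Shared logic: skips XML whitespace and returns how many chars were skipped. */
    public int skipWhitespace() {
        int count = 0;
        while (true) {
            int c = nextChar();
            if (c == -1) {
                return count;
            }
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                pushBack();
                return count;
            }
            ++count;
        }
    }
}

// Byte-based variant: assumes a single-byte encoding (ISO-8859-1).
class ByteMiniScanner extends MiniScanner {
    private final byte[] input;

    ByteMiniScanner(byte[] input) { this.input = input; }

    @Override
    protected int nextChar() {
        return (ptr < input.length) ? (input[ptr++] & 0xFF) : -1;
    }
}

// Char-based variant: input has already been decoded into chars.
class CharMiniScanner extends MiniScanner {
    private final char[] input;

    CharMiniScanner(char[] input) { this.input = input; }

    @Override
    protected int nextChar() {
        return (ptr < input.length) ? input[ptr++] : -1;
    }
}
```

Keeping the shared logic in the base class means that only the small `nextChar()` primitive has to be specialized per input type. A production scanner would more likely specialize larger hot loops for speed.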
  • Field Details

    • CDATA_STR

      protected final String CDATA_STR
      String that identifies a CDATA section (the part following the "<![" prefix).
    • TOKEN_EOI

      public static final int TOKEN_EOI
      This token type signifies end-of-input, in cases where it can be returned. In other cases, an exception may be thrown.
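      A caller that can receive this token would typically loop until it sees the sentinel. The sketch below makes several assumptions. It uses a value of -1 for TOKEN_EOI, since the actual value is not shown on this page. It also uses a hypothetical TokenSource interface in place of a real scanner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLStreamConstants;

final class EoiLoopSketch {
    // Assumed sentinel value; the real TOKEN_EOI value is not given here.
    static final int TOKEN_EOI = -1;

    // Hypothetical stand-in for a scanner that returns token types.
    interface TokenSource {
        int nextToken();
    }

    /** Builds a source that replays the given token types, then reports TOKEN_EOI. */
    static TokenSource fromList(List<Integer> tokens) {
        Iterator<Integer> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : TOKEN_EOI;
    }

    /** Collects token types until end-of-input is signalled. */
    static List<Integer> drain(TokenSource src) {
        List<Integer> seen = new ArrayList<>();
        int type;
        while ((type = src.nextToken()) != TOKEN_EOI) {
            seen.add(type);
        }
        return seen;
    }

    public static void main(String[] args) {
        TokenSource src = fromList(Arrays.asList(
                XMLStreamConstants.START_ELEMENT,
                XMLStreamConstants.CHARACTERS,
                XMLStreamConstants.END_ELEMENT));
        System.out.println(drain(src));
    }
}
```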
    • MAX_UNICODE_CHAR

      protected static final int MAX_UNICODE_CHAR
      This constant defines the highest Unicode character allowed in XML content.
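      For example, the value of a numeric character reference such as &#x1F600; has to be checked against this upper bound before it is emitted. It also has to be checked against the rest of the XML 1.0 Char production. The sketch below assumes that MAX_UNICODE_CHAR is 0x10FFFF, the highest Unicode code point. It uses the standard Character.toChars to produce a surrogate pair for supplementary characters. The class and method names are hypothetical.

```java
final class CharRefCheck {
    // Assumed value: the highest Unicode code point.
    static final int MAX_UNICODE_CHAR = 0x10FFFF;

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isValidXml10Char(int cp) {
        if (cp < 0x20) {
            return cp == 0x9 || cp == 0xA || cp == 0xD;
        }
        if (cp <= 0xD7FF) {
            return true;
        }
        if (cp < 0xE000) {
            return false; // surrogate code points are never allowed
        }
        if (cp <= 0xFFFD) {
            return true;
        }
        // 0xFFFE and 0xFFFF are excluded; supplementary range up to the maximum is allowed
        return cp >= 0x10000 && cp <= MAX_UNICODE_CHAR;
    }

    /** Converts a validated code point to UTF-16 (a surrogate pair above U+FFFF). */
    static char[] decode(int cp) {
        if (!isValidXml10Char(cp)) {
            throw new IllegalArgumentException("Invalid XML character: 0x" + Integer.toHexString(cp));
        }
        return Character.toChars(cp);
    }
}
```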
    • INT_NULL

      protected static final int INT_NULL
    • INT_CR

      protected static final int INT_CR
    • INT_LF

      protected static final int INT_LF
    • INT_TAB

      protected static final int INT_TAB
    • INT_SPACE

      protected static final int INT_SPACE
    • INT_HYPHEN

      protected static final int INT_HYPHEN
    • INT_QMARK

      protected static final int INT_QMARK
    • INT_AMP

      protected static final int INT_AMP
    • INT_LT

      protected static final int INT_LT
    • INT_GT

      protected static final int INT_GT
    • INT_QUOTE

      protected static final int INT_QUOTE
    • INT_APOS

      protected static final int INT_APOS
    • INT_EXCL

      protected static final int INT_EXCL
    • INT_COLON

      protected static final int INT_COLON
    • INT_LBRACKET

      protected static final int INT_LBRACKET
    • INT_RBRACKET

      protected static final int INT_RBRACKET
    • INT_SLASH

      protected static final int INT_SLASH
    • INT_EQ

      protected static final int INT_EQ
    • INT_A

      protected static final int INT_A
    • INT_F

      protected static final int INT_F
    • INT_a

      protected static final int INT_a
    • INT_f

      protected static final int INT_f
    • INT_z

      protected static final int INT_z
    • INT_0

      protected static final int INT_0
    • INT_9

      protected static final int INT_9
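      Integer constants like these let a scanner compare the int value of the current byte or char directly against ASCII delimiters and digit ranges. For instance, the hexadecimal digits of a reference such as &#x1F600; could be decoded as shown below. The sketch assumes that each INT_ constant equals the code of the character its name suggests (INT_0 = '0', INT_a = 'a', INT_F = 'F', and so on). The class and method names are hypothetical.

```java
final class HexRefSketch {
    // Assumed values, matching the character each constant name refers to.
    static final int INT_0 = '0', INT_9 = '9';
    static final int INT_a = 'a', INT_f = 'f';
    static final int INT_A = 'A', INT_F = 'F';

    /**
     * Parses the hex digits of a character reference body (for example "1F600").
     * Returns -1 for an empty string, an invalid digit, or a value above 0x10FFFF.
     */
    static int parseHex(CharSequence digits) {
        if (digits.length() == 0) {
            return -1;
        }
        int value = 0;
        for (int i = 0; i < digits.length(); ++i) {
            int c = digits.charAt(i);
            int d;
            if (c >= INT_0 && c <= INT_9) {
                d = c - INT_0;
            } else if (c >= INT_a && c <= INT_f) {
                d = 10 + (c - INT_a);
            } else if (c >= INT_A && c <= INT_F) {
                d = 10 + (c - INT_A);
            } else {
                return -1;
            }
            value = (value << 4) + d;
            // Checking after every digit also rules out int overflow on long inputs
            if (value > 0x10FFFF) {
                return -1;
            }
        }
        return value;
    }
}
```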
    • BIND_MISSES_TO_ACTIVATE_CACHE

      private static final int BIND_MISSES_TO_ACTIVATE_CACHE
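      Number of binding cache misses after which the cache is activated. The cache is activated fairly soon, with no need to wait for hundreds of misses. The threshold only tries to avoid building the cache when all a document contains is a SOAP envelope element or something similar.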
    • BIND_CACHE_SIZE

      private static final int BIND_CACHE_SIZE
      Size of the bind cache. It can be reasonably small and should still achieve a high enough hit rate.
    • BIND_CACHE_MASK

      private static final int BIND_CACHE_MASK
    • _config

      protected final ReaderConfig _config
    • _xml11

      protected final boolean _xml11
      Whether validity checks (of name and text characters) and normalization (of linefeeds) are to be done using XML 1.1 rules rather than basic XML 1.0 rules. Default is 1.0.
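      Linefeed normalization is one place where the two rule sets differ. XML 1.0 translates CR LF and any lone CR to LF. XML 1.1 additionally translates CR NEL, a lone NEL (U+0085) and LINE SEPARATOR (U+2028). The sketch below applies these rules to a whole string at once. A real scanner would apply them incrementally across buffer boundaries. The class and method names are hypothetical.

```java
final class LinefeedNormalizer {
    /** Normalizes line endings using XML 1.0 rules, or XML 1.1 rules if xml11 is true. */
    static String normalize(String in, boolean xml11) {
        StringBuilder sb = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ++i) {
            char c = in.charAt(i);
            if (c == '\r') {
                // CR LF (and, for XML 1.1, CR NEL) collapse into a single LF
                if (i + 1 < in.length()) {
                    char next = in.charAt(i + 1);
                    if (next == '\n' || (xml11 && next == '\u0085')) {
                        ++i; // skip the second char of the pair
                    }
                }
                sb.append('\n');
            } else if (xml11 && (c == '\u0085' || c == '\u2028')) {
                // XML 1.1 only: lone NEL and LINE SEPARATOR become LF
                sb.append('\n');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```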
    • _cfgCoalescing

      protected final boolean _cfgCoalescing
    • _cfgLazyParsing

      protected boolean _cfgLazyParsing
    • _currToken

      protected int _currToken
    • _tokenIncomplete

      protected boolean _tokenIncomplete
    • _depth

      protected int _depth
      Number of START_ELEMENT events returned for which no matching END_ELEMENT has yet been returned, including the current event.
    • _textBuilder

      protected final TextBuilder _textBuilder
      Textual content of the current event
    • _entityPending

      protected boolean _entityPending
      Flag set to indicate that an entity is pending
    • _nameBuffer

      protected char[] _nameBuffer
      Char buffer used for actual String construction (in future, a StringBuilder could perhaps be used instead). It holds things like element and attribute names, and attribute values.
    • _tokenName

      protected PName _tokenName
      Current name associated with the token, if any. Name of the current element, target of processing instruction, or name of an unexpanded entity.
    • _isEmptyTag

      protected boolean _isEmptyTag
      Flag used when the current state is START_ELEMENT or END_ELEMENT. It indicates whether the underlying physical tag is a so-called empty tag (one ending with "/>").
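      The flag is needed because StAX reports a single empty tag as two events. This can be observed with any StAX implementation. The example below uses whatever XMLInputFactory.newInstance() returns, which is the JDK's built-in parser unless Aalto or another implementation is on the classpath. It records the element event sequence for <root><a/></root>. The class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class EmptyTagEvents {
    /** Returns "S:name" / "E:name" entries for start and end element events. */
    static List<String> elementEvents(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        try {
            while (r.hasNext()) {
                int type = r.next();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    out.add("S:" + r.getLocalName());
                } else if (type == XMLStreamConstants.END_ELEMENT) {
                    out.add("E:" + r.getLocalName());
                }
            }
        } finally {
            r.close();
        }
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        // The empty tag <a/> yields both a START_ELEMENT and an END_ELEMENT.
        System.out.println(elementEvents("<root><a/></root>")); // prints [S:root, S:a, E:a, E:root]
    }
}
```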
    • _currElem

      protected ElementScope _currElem
      Information about the current element on the stack
    • _publicId

      protected String _publicId
      Public id of the current event (DTD), if any.
    • _systemId

      protected String _systemId
      System id of the current event (DTD), if any.
    • _lastNsDecl

      protected NsDeclaration _lastNsDecl
      Pointer to the last namespace declaration encountered. Because of backwards linking, it also serves as the head of the linked list of all active namespace declarations starting from the most recent one.
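      This backwards-linked structure can be pictured with a minimal sketch. It uses hypothetical names and is not Aalto's NsDeclaration class. Each declaration points to the previously declared one and records the element depth at which it was declared. A prefix lookup walks from the most recent declaration, so the innermost binding wins. Leaving an element pops every declaration made at that depth.

```java
final class NsDeclStack {
    /** One namespace declaration, linked to the previously declared one. */
    static final class Decl {
        final String prefix; // "" for the default namespace
        final String uri;
        final int level;     // element depth at which it was declared
        final Decl prev;

        Decl(String prefix, String uri, int level, Decl prev) {
            this.prefix = prefix;
            this.uri = uri;
            this.level = level;
            this.prev = prev;
        }
    }

    private Decl last; // head of the list: the most recent declaration

    void declare(String prefix, String uri, int level) {
        last = new Decl(prefix, uri, level, last);
    }

    /** Drops all declarations made at the given (closing) depth or deeper. */
    void popLevel(int level) {
        while (last != null && last.level >= level) {
            last = last.prev;
        }
    }

    /** Returns the innermost binding for the prefix, or null if it is unbound. */
    String lookup(String prefix) {
        for (Decl d = last; d != null; d = d.prev) {
            if (d.prefix.equals(prefix)) {
                return d.uri;
            }
        }
        return null;
    }
}
```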
    • _currNsCount

      protected int _currNsCount
      Temporary state variable, valid during START_ELEMENT events, for which it contains the number of namespace declarations available. For END_ELEMENT, this count is computed on the fly.
    • _defaultNs

      protected NsBinding _defaultNs
      The default namespace binding is a per-document singleton, like explicit bindings, and is used for elements (never for attributes).
    • _nsBindings

      protected NsBinding[] _nsBindings
      Array containing all prefix bindings needed so far within the current document (if any). These bindings are in no particular order, and they specifically do NOT represent actual namespace declarations parsed from XML content.
    • _nsBindingCount

      protected int _nsBindingCount
    • _nsBindingCache

      protected PName[] _nsBindingCache
      Although unbound PName instances can be easily and safely reused, bound ones are per-document. Still, it makes sense to try to reuse them too, at least via a minimal static cache. That cache is activated only after a certain number of cache misses, to avoid overhead for tiny documents or for documents with few or no namespace prefixes.
    • _nsBindMisses

      protected int _nsBindMisses
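      A lazily activated binding cache of this kind can be sketched as a small direct-mapped table. Its size is a power of two, so a hash can be reduced to a slot index with a bit mask, as the BIND_CACHE_SIZE / BIND_CACHE_MASK pair suggests. The table is only allocated after a threshold of misses has been reached. The threshold of 10, the size of 64, the class and its methods are all assumptions for illustration and are not Aalto's actual values or code.

```java
import java.util.function.Function;

final class LazyBindCache<V> {
    static final int MISSES_TO_ACTIVATE = 10; // assumed threshold
    static final int CACHE_SIZE = 64;          // assumed; must be a power of two
    static final int CACHE_MASK = CACHE_SIZE - 1;

    private final Function<String, V> factory; // builds a value on a miss
    private String[] keys;                     // null until the cache is activated
    private Object[] values;
    private int misses;

    LazyBindCache(Function<String, V> factory) {
        this.factory = factory;
    }

    @SuppressWarnings("unchecked")
    V get(String key) {
        if (keys != null) {
            // Masking is a cheap modulo for power-of-two sizes (and handles negative hashes)
            int slot = key.hashCode() & CACHE_MASK;
            if (key.equals(keys[slot])) {
                return (V) values[slot];
            }
            V v = factory.apply(key);
            keys[slot] = key; // direct-mapped: simply overwrite whatever was there
            values[slot] = v;
            return v;
        }
        // Cache not active yet: every lookup is a miss; activate after enough of them
        if (++misses >= MISSES_TO_ACTIVATE) {
            keys = new String[CACHE_SIZE];
            values = new Object[CACHE_SIZE];
        }
        return factory.apply(key);
    }

    boolean isActive() {
        return keys != null;
    }
}
```

The threshold check keeps tiny documents from paying for the table allocation. A direct-mapped table avoids any collision handling, at the cost of occasionally evicting a useful entry.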
    • _lastNsContext

      protected FixedNsContext _lastNsContext
      Last returned NamespaceContext, created for a call to getNonTransientNamespaceContext(), iff this would still be a valid context.
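      A non-transient context must keep answering correctly after the parser has moved on, so it needs its own copy of the bindings in scope. The sketch below shows such an immutable snapshot implementing javax.xml.namespace.NamespaceContext. The class name is hypothetical, and the code is not Aalto's FixedNsContext. It follows the interface contract for the reserved xml and xmlns prefixes and for unbound prefixes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

final class SnapshotNsContext implements NamespaceContext {
    private final Map<String, String> prefixToUri;

    /** Copies the bindings so later changes to parser state cannot affect this context. */
    SnapshotNsContext(Map<String, String> bindings) {
        this.prefixToUri = Collections.unmodifiableMap(new LinkedHashMap<>(bindings));
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("null prefix");
        }
        if (XMLConstants.XML_NS_PREFIX.equals(prefix)) {
            return XMLConstants.XML_NS_URI;
        }
        if (XMLConstants.XMLNS_ATTRIBUTE.equals(prefix)) {
            return XMLConstants.XMLNS_ATTRIBUTE_NS_URI;
        }
        String uri = prefixToUri.get(prefix);
        return (uri == null) ? XMLConstants.NULL_NS_URI : uri; // unbound prefix -> ""
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> it = getPrefixes(namespaceURI);
        return it.hasNext() ? it.next() : null; // null if no prefix is bound
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) {
            throw new IllegalArgumentException("null namespace URI");
        }
        if (XMLConstants.XML_NS_URI.equals(namespaceURI)) {
            return Collections.singletonList(XMLConstants.XML_NS_PREFIX).iterator();
        }
        if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(namespaceURI)) {
            return Collections.singletonList(XMLConstants.XMLNS_ATTRIBUTE).iterator();
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            if (e.getValue().equals(namespaceURI)) {
                result.add(e.getKey());
            }
        }
        return Collections.unmodifiableList(result).iterator();
    }
}
```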
    • _attrCollector

      protected final AttributeCollector _attrCollector
    • _attrCount

      protected int _attrCount
    • _pastBytesOrChars

      protected long _pastBytesOrChars
      Number of bytes (or chars, for char-based scanners) that were read and processed before the contents of the current buffer; used for calculating absolute offsets.
    • _currRow

      protected int _currRow
      The row on which the next character to read is located. Note that it is 0-based, so the API will generally add one to it before returning the value.
    • _rowStartOffset

      protected int _rowStartOffset
      Offset used to calculate the column value from the current input buffer pointer. May be negative if the first character of the row was contained within an earlier buffer.
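      The way these three fields combine can be shown with a small sketch. The class and method names are hypothetical; only the arithmetic follows the descriptions above. The absolute offset is the past count plus the buffer pointer, and the column is the pointer minus the row start offset. When a new buffer is loaded, the row start offset is shifted back by the length of the consumed buffer, which is how it becomes negative.

```java
final class LocationSketch {
    long pastBytesOrChars; // input consumed before the current buffer
    int currRow;           // 0-based
    int rowStartOffset;    // may become negative across buffer boundaries

    /** Called when a linefeed is consumed; ptr points just past it. */
    void markLF(int ptr) {
        ++currRow;
        rowStartOffset = ptr;
    }

    /** Called before a fully consumed buffer of the given length is replaced. */
    void bufferConsumed(int bufferLength) {
        pastBytesOrChars += bufferLength;
        rowStartOffset -= bufferLength;
    }

    long absoluteOffset(int ptr) {
        return pastBytesOrChars + ptr;
    }

    /** 1-based row, as an API would report it. */
    int row() {
        return currRow + 1;
    }

    /** 1-based column of the character at ptr in the current buffer. */
    int column(int ptr) {
        return ptr - rowStartOffset + 1;
    }
}
```

For example, suppose the first buffer is "ab\ncd" (length 5) and the second is "ef". The row "cdef" then starts 2 chars before the second buffer, so rowStartOffset is -2 and 'f' (pointer 1) is reported at column 4, absolute offset 6.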
    • _startRawOffset

      protected long _startRawOffset
      Offset (in chars or bytes) at start of current token
    • _startRow

      protected long _startRow
      Current row at start of current (last returned) token
    • _startColumn

      protected long _startColumn
      Current column at start of current (last returned) token
  • Constructor Details

  • Method Details