Class HTMLConfiguration

java.lang.Object
org.apache.xerces.util.ParserConfigurationSettings
org.cyberneko.html.HTMLConfiguration
All Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponentManager, org.apache.xerces.xni.parser.XMLParserConfiguration, org.apache.xerces.xni.parser.XMLPullParserConfiguration

public class HTMLConfiguration extends org.apache.xerces.util.ParserConfigurationSettings implements org.apache.xerces.xni.parser.XMLPullParserConfiguration
An XNI-based parser configuration for parsing HTML documents. It can be used directly, or in conjunction with any XNI-based tools, such as the Xerces2 implementation.

This configuration recognizes the following features:

  • http://cyberneko.org/html/features/augmentations
  • http://cyberneko.org/html/features/report-errors
  • http://cyberneko.org/html/features/report-errors/simple
  • http://cyberneko.org/html/features/balance-tags
  • and the features supported by the scanner and tag balancer components.

This configuration recognizes the following properties:

  • http://cyberneko.org/html/properties/names/elems
  • http://cyberneko.org/html/properties/names/attrs
  • http://cyberneko.org/html/properties/filters
  • http://cyberneko.org/html/properties/error-reporter
  • and the properties supported by the scanner and tag balancer.

For complete usage information, refer to the documentation.

Version:
$Id: HTMLConfiguration.java,v 1.9 2005/02/14 03:56:54 andyc Exp $
Author:
Andy Clark
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    protected class 
    Defines an error reporter for reporting HTML errors.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected static final String
    AUGMENTATIONS
    Include infoset augmentations.
    protected static final String
    BALANCE_TAGS
    Balance tags.
    protected static final String
    ERROR_DOMAIN
    Error domain.
    protected static final String
    ERROR_REPORTER
    Error reporter.
    protected boolean
    fCloseStream
    Stream opened by parser.
    protected org.apache.xerces.xni.XMLDocumentHandler
    fDocumentHandler
    Document handler.
    protected final HTMLScanner
    fDocumentScanner
    Document scanner.
    protected org.apache.xerces.xni.XMLDTDContentModelHandler
    fDTDContentModelHandler
    DTD content model handler.
    protected org.apache.xerces.xni.XMLDTDHandler
    fDTDHandler
    DTD handler.
    protected org.apache.xerces.xni.parser.XMLEntityResolver
    fEntityResolver
    Entity resolver.
    protected org.apache.xerces.xni.parser.XMLErrorHandler
    fErrorHandler
    Error handler.
    protected final HTMLErrorReporter
    fErrorReporter
    Error reporter.
    protected final Vector
    fHTMLComponents
    Components.
    protected static final String
    FILTERS
    Pipeline filters.
    protected Locale
    fLocale
    Locale.
    protected final NamespaceBinder
    fNamespaceBinder
    Namespace binder.
    protected final HTMLTagBalancer
    fTagBalancer
    HTML tag balancer.
    protected static final String
    NAMES_ATTRS
    Modify HTML attribute names: { "upper", "lower", "default" }.
    protected static final String
    NAMES_ELEMS
    Modify HTML element names: { "upper", "lower", "default" }.
    protected static final String
    NAMESPACES
    Namespaces.
    protected static final String
    REPORT_ERRORS
    Report errors.
    protected static final String
    SIMPLE_ERROR_FORMAT
    Simple report format.
    protected static boolean
    XERCES_2_0_0
    Parser version is Xerces 2.0.0.
    protected static boolean
    XERCES_2_0_1
    Parser version is Xerces 2.0.1.
    protected static boolean
    XML4J_4_0_x
    Parser version is XML4J 4.0.x.

    Fields inherited from class org.apache.xerces.util.ParserConfigurationSettings

    fFeatures, fParentSettings, fProperties, fRecognizedFeatures, fRecognizedProperties, PARSER_SETTINGS
  • Constructor Summary

    Constructors
    Constructor
    Description
    HTMLConfiguration()
    Default constructor.
  • Method Summary

    Modifier and Type
    Method
    Description
    protected void
    addComponent(HTMLComponent component)
    Adds a component.
    void
    cleanup()
    If the application decides to terminate parsing before the XML document is fully parsed, it should call this method to free any resources allocated during parsing.
    protected HTMLScanner
    createDocumentScanner()
    void
    evaluateInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
    EXPERIMENTAL: may change in next release
    Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
    org.apache.xerces.xni.XMLDocumentHandler
    getDocumentHandler()
    Returns the document handler.
    org.apache.xerces.xni.XMLDTDContentModelHandler
    getDTDContentModelHandler()
    Returns the DTD content model handler.
    org.apache.xerces.xni.XMLDTDHandler
    getDTDHandler()
    Returns the DTD handler.
    org.apache.xerces.xni.parser.XMLEntityResolver
    getEntityResolver()
    Returns the entity resolver.
    org.apache.xerces.xni.parser.XMLErrorHandler
    getErrorHandler()
    Returns the error handler.
    Locale
    getLocale()
    Returns the locale.
    boolean
    parse(boolean complete)
    Parses the document in a pull parsing fashion.
    void
    parse(org.apache.xerces.xni.parser.XMLInputSource source)
    Parses a document.
    void
    pushInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
    Pushes an input source onto the current entity stack.
    protected void
    reset()
    Resets the parser configuration.
    void
    setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler)
    Sets the document handler.
    void
    setDTDContentModelHandler(org.apache.xerces.xni.XMLDTDContentModelHandler handler)
    Sets the DTD content model handler.
    void
    setDTDHandler(org.apache.xerces.xni.XMLDTDHandler handler)
    Sets the DTD handler.
    void
    setEntityResolver(org.apache.xerces.xni.parser.XMLEntityResolver resolver)
    Sets the entity resolver.
    void
    setErrorHandler(org.apache.xerces.xni.parser.XMLErrorHandler handler)
    Sets the error handler.
    void
    setFeature(String featureId, boolean state)
    Sets a feature.
    void
    setInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
    Sets the input source for the document to parse.
    void
    setLocale(Locale locale)
    Sets the locale.
    void
    setProperty(String propertyId, Object value)
    Sets a property.

    Methods inherited from class org.apache.xerces.util.ParserConfigurationSettings

    addRecognizedFeatures, addRecognizedProperties, checkFeature, checkProperty, getFeature, getProperty

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.xerces.xni.parser.XMLParserConfiguration

    addRecognizedFeatures, addRecognizedProperties, getFeature, getProperty
  • Field Details

    • NAMESPACES

      protected static final String NAMESPACES
      Namespaces.
    • AUGMENTATIONS

      protected static final String AUGMENTATIONS
      Include infoset augmentations.
    • REPORT_ERRORS

      protected static final String REPORT_ERRORS
      Report errors.
    • SIMPLE_ERROR_FORMAT

      protected static final String SIMPLE_ERROR_FORMAT
      Simple report format.
    • BALANCE_TAGS

      protected static final String BALANCE_TAGS
      Balance tags.
    • NAMES_ELEMS

      protected static final String NAMES_ELEMS
      Modify HTML element names: { "upper", "lower", "default" }.
    • NAMES_ATTRS

      protected static final String NAMES_ATTRS
      Modify HTML attribute names: { "upper", "lower", "default" }.
    • FILTERS

      protected static final String FILTERS
      Pipeline filters.
    • ERROR_REPORTER

      protected static final String ERROR_REPORTER
      Error reporter.
    • ERROR_DOMAIN

      protected static final String ERROR_DOMAIN
      Error domain.
    • fDocumentHandler

      protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
      Document handler.
    • fDTDHandler

      protected org.apache.xerces.xni.XMLDTDHandler fDTDHandler
      DTD handler.
    • fDTDContentModelHandler

      protected org.apache.xerces.xni.XMLDTDContentModelHandler fDTDContentModelHandler
      DTD content model handler.
    • fErrorHandler

      protected org.apache.xerces.xni.parser.XMLErrorHandler fErrorHandler
      Error handler.
    • fEntityResolver

      protected org.apache.xerces.xni.parser.XMLEntityResolver fEntityResolver
      Entity resolver.
    • fLocale

      protected Locale fLocale
      Locale.
    • fCloseStream

      protected boolean fCloseStream
      Stream opened by parser. A stream that the parser opened itself must be closed manually when parsing terminates.
    • fHTMLComponents

      protected final Vector fHTMLComponents
      Components.
    • fDocumentScanner

      protected final HTMLScanner fDocumentScanner
      Document scanner.
    • fTagBalancer

      protected final HTMLTagBalancer fTagBalancer
      HTML tag balancer.
    • fNamespaceBinder

      protected final NamespaceBinder fNamespaceBinder
      Namespace binder.
    • fErrorReporter

      protected final HTMLErrorReporter fErrorReporter
      Error reporter.
    • XERCES_2_0_0

      protected static boolean XERCES_2_0_0
      Parser version is Xerces 2.0.0.
    • XERCES_2_0_1

      protected static boolean XERCES_2_0_1
      Parser version is Xerces 2.0.1.
    • XML4J_4_0_x

      protected static boolean XML4J_4_0_x
      Parser version is XML4J 4.0.x.
  • Constructor Details

    • HTMLConfiguration

      public HTMLConfiguration()
      Default constructor.
  • Method Details

    • createDocumentScanner

      protected HTMLScanner createDocumentScanner()
    • pushInputSource

      public void pushInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
      Pushes an input source onto the current entity stack. This enables the scanner to transparently scan new content (e.g. the output written by an embedded script). At the end of the current entity, the scanner returns where it left off at the time this entity source was pushed.

      Hint: To use this feature to insert the output of <SCRIPT> tags, remember to buffer the entire output of the processed instructions before pushing a new input source. Otherwise, events may appear out of sequence.

      Parameters:
      inputSource - The new input source to start scanning.
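
The stack behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `EntityStackSketch`, its `push` and `scan` methods, and the `#` marker are hypothetical names invented for illustration and are not part of NekoHTML. The model shows why a pushed source is read to its end before scanning resumes in the source below it, and why the inserted text should be fully buffered before it is pushed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of an entity stack (hypothetical; not the NekoHTML scanner). */
final class EntityStackSketch {
    /** One input source plus the position reached so far. */
    private static final class Entity {
        final String text;
        int pos;
        Entity(String text) { this.text = text; }
    }

    private final Deque<Entity> stack = new ArrayDeque<>();
    private final StringBuilder seen = new StringBuilder();

    /** Hypothetical analogue of pushInputSource: scanning continues in the new source. */
    void push(String text) {
        stack.push(new Entity(text));
    }

    /** Scans the document, pushing {@code inserted} whenever the marker '#' is read. */
    String scan(String document, String inserted) {
        push(document);
        while (!stack.isEmpty()) {
            Entity current = stack.peek();
            if (current.pos == current.text.length()) {
                stack.pop();      // end of this entity: resume the one below it
                continue;
            }
            char c = current.text.charAt(current.pos++);
            if (c == '#') {
                push(inserted);   // fully buffered text, pushed in one piece
            } else {
                seen.append(c);
            }
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        String result = new EntityStackSketch().scan("head # tail", "script-output");
        System.out.println(result); // prints "head script-output tail"
    }
}
```

The position of the outer source is saved on the stack, so the text after the marker is scanned only once the pushed source is exhausted.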
    • evaluateInputSource

      public void evaluateInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
      EXPERIMENTAL: may change in next release
      Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
      Parameters:
      inputSource - The new input source to start scanning.
    • setFeature

      public void setFeature(String featureId, boolean state) throws org.apache.xerces.xni.parser.XMLConfigurationException
      Sets a feature.
      Specified by:
      setFeature in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      Overrides:
      setFeature in class org.apache.xerces.util.ParserConfigurationSettings
      Throws:
      org.apache.xerces.xni.parser.XMLConfigurationException
    • setProperty

      public void setProperty(String propertyId, Object value) throws org.apache.xerces.xni.parser.XMLConfigurationException
      Sets a property.
      Specified by:
      setProperty in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      Overrides:
      setProperty in class org.apache.xerces.util.ParserConfigurationSettings
      Throws:
      org.apache.xerces.xni.parser.XMLConfigurationException
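
As a rough illustration of how a configuration can restrict `setFeature` to a set of recognized identifiers, the following self-contained sketch uses a stand-in class. The assumptions are: `RecognizedSettingsSketch` is hypothetical, the pairing of constant names with the feature URIs listed in the class description is inferred rather than stated on this page, `IllegalArgumentException` stands in for `XMLConfigurationException`, and rejecting unrecognized identifiers is assumed behaviour. Properties set through `setProperty` would follow the same pattern with `Object` values.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Stand-in for a settings holder (hypothetical; not the Xerces or NekoHTML class). */
final class RecognizedSettingsSketch {
    // Feature identifiers copied from the class description; constant names are assumed.
    static final String AUGMENTATIONS = "http://cyberneko.org/html/features/augmentations";
    static final String REPORT_ERRORS = "http://cyberneko.org/html/features/report-errors";
    static final String SIMPLE_ERROR_FORMAT = "http://cyberneko.org/html/features/report-errors/simple";
    static final String BALANCE_TAGS = "http://cyberneko.org/html/features/balance-tags";

    private final Set<String> recognized = new HashSet<>();
    private final Map<String, Boolean> states = new HashMap<>();

    RecognizedSettingsSketch() {
        recognized.add(AUGMENTATIONS);
        recognized.add(REPORT_ERRORS);
        recognized.add(SIMPLE_ERROR_FORMAT);
        recognized.add(BALANCE_TAGS);
    }

    /** Stores the state of a recognized feature; rejects anything else. */
    void setFeature(String featureId, boolean state) {
        check(featureId);
        states.put(featureId, state);
    }

    /** The sketch treats an unset feature as false; real defaults may differ. */
    boolean getFeature(String featureId) {
        check(featureId);
        return states.getOrDefault(featureId, false);
    }

    private void check(String featureId) {
        if (!recognized.contains(featureId)) {
            // Stand-in for XMLConfigurationException in the real API.
            throw new IllegalArgumentException("Feature not recognized: " + featureId);
        }
    }

    public static void main(String[] args) {
        RecognizedSettingsSketch settings = new RecognizedSettingsSketch();
        settings.setFeature(REPORT_ERRORS, true);
        System.out.println("report-errors = " + settings.getFeature(REPORT_ERRORS));
        try {
            settings.setFeature("http://example.org/unknown", true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```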
    • setDocumentHandler

      public void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler)
      Sets the document handler.
      Specified by:
      setDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
    • getDocumentHandler

      public org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
      Returns the document handler.
      Specified by:
      getDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
    • setDTDHandler

      public void setDTDHandler(org.apache.xerces.xni.XMLDTDHandler handler)
      Sets the DTD handler.
      Specified by:
      setDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
    • getDTDHandler

      public org.apache.xerces.xni.XMLDTDHandler getDTDHandler()
      Returns the DTD handler.
      Specified by:
      getDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
    • setDTDContentModelHandler

      public void setDTDContentModelHandler(org.apache.xerces.xni.XMLDTDContentModelHandler handler)
      Sets the DTD content model handler.
      Specified by:
      setDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
    • getDTDContentModelHandler

      public org.apache.xerces.xni.XMLDTDContentModelHandler getDTDContentModelHandler()
      Returns the DTD content model handler.
      Specified by:
      getDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
    • setErrorHandler

      public void setErrorHandler(org.apache.xerces.xni.parser.XMLErrorHandler handler)
      Sets the error handler.
      Specified by:
      setErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
    • getErrorHandler

      public org.apache.xerces.xni.parser.XMLErrorHandler getErrorHandler()
      Returns the error handler.
      Specified by:
      getErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
    • setEntityResolver

      public void setEntityResolver(org.apache.xerces.xni.parser.XMLEntityResolver resolver)
      Sets the entity resolver.
      Specified by:
      setEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration
    • getEntityResolver

      public org.apache.xerces.xni.parser.XMLEntityResolver getEntityResolver()
      Returns the entity resolver.
      Specified by:
      getEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration
    • setLocale

      public void setLocale(Locale locale)
      Sets the locale.
      Specified by:
      setLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration
    • getLocale

      public Locale getLocale()
      Returns the locale.
      Specified by:
      getLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration
    • parse

      public void parse(org.apache.xerces.xni.parser.XMLInputSource source) throws org.apache.xerces.xni.XNIException, IOException
      Parses a document.
      Specified by:
      parse in interface org.apache.xerces.xni.parser.XMLParserConfiguration
      Throws:
      org.apache.xerces.xni.XNIException
      IOException
    • setInputSource

      public void setInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource) throws org.apache.xerces.xni.parser.XMLConfigurationException, IOException
      Sets the input source for the document to parse.
      Specified by:
      setInputSource in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
      Parameters:
      inputSource - The document's input source.
      Throws:
      org.apache.xerces.xni.parser.XMLConfigurationException - Thrown if there is a configuration error when initializing the parser.
      IOException - Thrown on I/O error.
    • parse

      public boolean parse(boolean complete) throws org.apache.xerces.xni.XNIException, IOException
      Parses the document in a pull parsing fashion.
      Specified by:
      parse in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
      Parameters:
      complete - True if the pull parser should parse the remaining document completely.
      Returns:
      True if there is more document to parse.
      Throws:
      org.apache.xerces.xni.XNIException - Any XNI exception, possibly wrapping another exception.
      IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the parser.
    • cleanup

      public void cleanup()
      If the application decides to terminate parsing before the XML document is fully parsed, it should call this method to free any resources allocated during parsing, for example by closing all opened streams.
      Specified by:
      cleanup in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
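
The contract of `parse(boolean)` and `cleanup()` suggests a loop of the following shape. This is a minimal sketch, not the library's implementation: `PullSource` and `ChunkSource` are hypothetical stand-ins that mirror only the two method signatures documented above, and the chunking into one string per call is an assumption made so the example can run without the parser.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class PullLoopSketch {
    /** Hypothetical stand-in for the pull-parsing contract; not an XNI type. */
    interface PullSource {
        /** Parses the next piece, or everything if complete is true; true means more remains. */
        boolean parse(boolean complete);
        /** Releases resources when the caller stops before the end of the document. */
        void cleanup();
    }

    /** Toy source that "parses" one chunk per call. */
    static final class ChunkSource implements PullSource {
        private final Iterator<String> chunks;
        final StringBuilder parsed = new StringBuilder();
        boolean cleanedUp;

        ChunkSource(List<String> chunks) { this.chunks = chunks.iterator(); }

        @Override public boolean parse(boolean complete) {
            do {
                if (chunks.hasNext()) parsed.append(chunks.next());
            } while (complete && chunks.hasNext());
            return chunks.hasNext();
        }

        @Override public void cleanup() { cleanedUp = true; }
    }

    /** Pulls at most maxSteps pieces and calls cleanup() if it stops early. */
    static int pull(PullSource source, int maxSteps) {
        int steps = 0;
        boolean more = true;
        while (more && steps < maxSteps) {
            more = source.parse(false);
            steps++;
        }
        if (more) {
            source.cleanup(); // parsing was terminated before the end of the document
        }
        return steps;
    }

    public static void main(String[] args) {
        ChunkSource source = new ChunkSource(Arrays.asList("a", "b", "c", "d"));
        int steps = pull(source, 2);
        // prints "2 steps, parsed=ab, cleanedUp=true"
        System.out.println(steps + " steps, parsed=" + source.parsed + ", cleanedUp=" + source.cleanedUp);
    }
}
```

With the real configuration, the input would first be supplied through `setInputSource` before the loop starts.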
    • addComponent

      protected void addComponent(HTMLComponent component)
      Adds a component.
    • reset

      protected void reset() throws org.apache.xerces.xni.parser.XMLConfigurationException
      Resets the parser configuration.
      Throws:
      org.apache.xerces.xni.parser.XMLConfigurationException