org.cyberneko.html
Class HTMLConfiguration

java.lang.Object
  extended by org.apache.xerces.util.ParserConfigurationSettings
      extended by org.cyberneko.html.HTMLConfiguration
All Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponentManager, org.apache.xerces.xni.parser.XMLParserConfiguration, org.apache.xerces.xni.parser.XMLPullParserConfiguration

public class HTMLConfiguration
extends org.apache.xerces.util.ParserConfigurationSettings
implements org.apache.xerces.xni.parser.XMLPullParserConfiguration

An XNI-based parser configuration for parsing HTML documents. This configuration can be used directly to parse HTML documents or in conjunction with any XNI-based tools, such as the Xerces2 implementation.

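This configuration recognizes the following features, identified by the constants in the Field Summary: NAMESPACES, AUGMENTATIONS, REPORT_ERRORS, SIMPLE_ERROR_FORMAT, BALANCE_TAGS.

This configuration recognizes the following properties, identified by the constants in the Field Summary: NAMES_ELEMS, NAMES_ATTRS, FILTERS, ERROR_REPORTER.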

For complete usage information, refer to the documentation.

Version:
$Id: HTMLConfiguration.java,v 1.9 2005/02/14 03:56:54 andyc Exp $
Author:
Andy Clark
See Also:
HTMLScanner, HTMLTagBalancer, HTMLErrorReporter

Nested Class Summary
protected  class HTMLConfiguration.ErrorReporter
          Defines an error reporter for reporting HTML errors.
 
Field Summary
protected static java.lang.String AUGMENTATIONS
          Include infoset augmentations.
protected static java.lang.String BALANCE_TAGS
          Balance tags.
protected static java.lang.String ERROR_DOMAIN
          Error domain.
protected static java.lang.String ERROR_REPORTER
          Error reporter.
protected  boolean fCloseStream
          Stream opened by parser.
protected  org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
          Document handler.
protected  HTMLScanner fDocumentScanner
          Document scanner.
protected  org.apache.xerces.xni.XMLDTDContentModelHandler fDTDContentModelHandler
          DTD content model handler.
protected  org.apache.xerces.xni.XMLDTDHandler fDTDHandler
          DTD handler.
protected  org.apache.xerces.xni.parser.XMLEntityResolver fEntityResolver
          Entity resolver.
protected  org.apache.xerces.xni.parser.XMLErrorHandler fErrorHandler
          Error handler.
protected  HTMLErrorReporter fErrorReporter
          Error reporter.
protected  java.util.Vector fHTMLComponents
          Components.
protected static java.lang.String FILTERS
          Pipeline filters.
protected  java.util.Locale fLocale
          Locale.
protected  NamespaceBinder fNamespaceBinder
          Namespace binder.
protected  HTMLTagBalancer fTagBalancer
          HTML tag balancer.
protected static java.lang.String NAMES_ATTRS
          Modify HTML attribute names: { "upper", "lower", "default" }.
protected static java.lang.String NAMES_ELEMS
          Modify HTML element names: { "upper", "lower", "default" }.
protected static java.lang.String NAMESPACES
          Namespaces.
protected static java.lang.String REPORT_ERRORS
          Report errors.
protected static java.lang.String SIMPLE_ERROR_FORMAT
          Simple report format.
protected static boolean XERCES_2_0_0
          Parser version is Xerces 2.0.0.
protected static boolean XERCES_2_0_1
          Parser version is Xerces 2.0.1.
protected static boolean XML4J_4_0_x
          Parser version is XML4J 4.0.x.
 
Fields inherited from class org.apache.xerces.util.ParserConfigurationSettings
fFeatures, fParentSettings, fProperties, fRecognizedFeatures, fRecognizedProperties, PARSER_SETTINGS
 
Constructor Summary
HTMLConfiguration()
          Default constructor.
 
Method Summary
protected  void addComponent(HTMLComponent component)
          Adds a component.
 void cleanup()
          If the application decides to terminate parsing before the XML document is fully parsed, it should call this method to free any resources allocated during parsing.
 org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
          Returns the document handler.
 org.apache.xerces.xni.XMLDTDContentModelHandler getDTDContentModelHandler()
          Returns the DTD content model handler.
 org.apache.xerces.xni.XMLDTDHandler getDTDHandler()
          Returns the DTD handler.
 org.apache.xerces.xni.parser.XMLEntityResolver getEntityResolver()
          Returns the entity resolver.
 org.apache.xerces.xni.parser.XMLErrorHandler getErrorHandler()
          Returns the error handler.
 java.util.Locale getLocale()
          Returns the locale.
 boolean parse(boolean complete)
          Parses the document in a pull parsing fashion.
 void parse(org.apache.xerces.xni.parser.XMLInputSource source)
          Parses a document.
 void pushInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
          Pushes an input source onto the current entity stack.
protected  void reset()
          Resets the parser configuration.
 void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler)
          Sets the document handler.
 void setDTDContentModelHandler(org.apache.xerces.xni.XMLDTDContentModelHandler handler)
          Sets the DTD content model handler.
 void setDTDHandler(org.apache.xerces.xni.XMLDTDHandler handler)
          Sets the DTD handler.
 void setEntityResolver(org.apache.xerces.xni.parser.XMLEntityResolver resolver)
          Sets the entity resolver.
 void setErrorHandler(org.apache.xerces.xni.parser.XMLErrorHandler handler)
          Sets the error handler.
 void setFeature(java.lang.String featureId, boolean state)
          Sets a feature.
 void setInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
          Sets the input source for the document to parse.
 void setLocale(java.util.Locale locale)
          Sets the locale.
 void setProperty(java.lang.String propertyId, java.lang.Object value)
          Sets a property.
 
Methods inherited from class org.apache.xerces.util.ParserConfigurationSettings
addRecognizedFeatures, addRecognizedProperties, checkFeature, checkProperty, getFeature, getProperty
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.xerces.xni.parser.XMLParserConfiguration
addRecognizedFeatures, addRecognizedProperties, getFeature, getProperty
 

Field Detail

NAMESPACES

protected static final java.lang.String NAMESPACES
Namespaces.

See Also:
Constant Field Values

AUGMENTATIONS

protected static final java.lang.String AUGMENTATIONS
Include infoset augmentations.

See Also:
Constant Field Values

REPORT_ERRORS

protected static final java.lang.String REPORT_ERRORS
Report errors.

See Also:
Constant Field Values

SIMPLE_ERROR_FORMAT

protected static final java.lang.String SIMPLE_ERROR_FORMAT
Simple report format.

See Also:
Constant Field Values

BALANCE_TAGS

protected static final java.lang.String BALANCE_TAGS
Balance tags.

See Also:
Constant Field Values

NAMES_ELEMS

protected static final java.lang.String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.

See Also:
Constant Field Values

NAMES_ATTRS

protected static final java.lang.String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.

See Also:
Constant Field Values

FILTERS

protected static final java.lang.String FILTERS
Pipeline filters.

See Also:
Constant Field Values

ERROR_REPORTER

protected static final java.lang.String ERROR_REPORTER
Error reporter.

See Also:
Constant Field Values

ERROR_DOMAIN

protected static final java.lang.String ERROR_DOMAIN
Error domain.

See Also:
Constant Field Values

fDocumentHandler

protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
Document handler.


fDTDHandler

protected org.apache.xerces.xni.XMLDTDHandler fDTDHandler
DTD handler.


fDTDContentModelHandler

protected org.apache.xerces.xni.XMLDTDContentModelHandler fDTDContentModelHandler
DTD content model handler.


fErrorHandler

protected org.apache.xerces.xni.parser.XMLErrorHandler fErrorHandler
Error handler.


fEntityResolver

protected org.apache.xerces.xni.parser.XMLEntityResolver fEntityResolver
Entity resolver.


fLocale

protected java.util.Locale fLocale
Locale.


fCloseStream

protected boolean fCloseStream
True if the stream was opened by the parser, in which case the parser must close it itself when parsing terminates.


fHTMLComponents

protected java.util.Vector fHTMLComponents
Components.


fDocumentScanner

protected HTMLScanner fDocumentScanner
Document scanner.


fTagBalancer

protected HTMLTagBalancer fTagBalancer
HTML tag balancer.


fNamespaceBinder

protected NamespaceBinder fNamespaceBinder
Namespace binder.


fErrorReporter

protected HTMLErrorReporter fErrorReporter
Error reporter.


XERCES_2_0_0

protected static boolean XERCES_2_0_0
Parser version is Xerces 2.0.0.


XERCES_2_0_1

protected static boolean XERCES_2_0_1
Parser version is Xerces 2.0.1.


XML4J_4_0_x

protected static boolean XML4J_4_0_x
Parser version is XML4J 4.0.x.

Constructor Detail

HTMLConfiguration

public HTMLConfiguration()
Default constructor.

Method Detail

pushInputSource

public void pushInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
Pushes an input source onto the current entity stack. This enables the scanner to transparently scan new content (e.g. the output written by an embedded script). At the end of the pushed entity, the scanner returns to where it left off at the time this input source was pushed.

Hint: To use this feature to insert the output of <SCRIPT> tags, remember to buffer the entire output of the processed instructions before pushing a new input source. Otherwise, events may appear out of sequence.

Parameters:
inputSource - The new input source to start scanning.
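
The entity-stack behaviour described above can be sketched with plain java.io readers. This is only an illustration under stated assumptions: EntityStackSketch, its push and read methods, and the '#' marker standing in for a <SCRIPT> tag are all hypothetical and are not part of NekoHTML. The sketch shows that content pushed mid-scan is consumed first, after which scanning resumes in the original source.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a scanner's entity stack; not NekoHTML code. */
class EntityStackSketch {
    private final Deque<Reader> stack = new ArrayDeque<>();

    /** Starts reading from a new source; the previous one resumes afterwards. */
    void push(Reader source) {
        stack.push(source);
    }

    /** Returns the next character, or -1 once every source is exhausted. */
    int read() throws IOException {
        while (!stack.isEmpty()) {
            int c = stack.peek().read();
            if (c != -1) {
                return c;
            }
            stack.pop().close(); // end of this entity: fall back to the one below
        }
        return -1;
    }

    /** Scans "<a>#<b>" and pushes buffered script output when '#' is seen. */
    static String demo() {
        EntityStackSketch scanner = new EntityStackSketch();
        scanner.push(new StringReader("<a>#<b>"));
        StringBuilder out = new StringBuilder();
        try {
            int c;
            while ((c = scanner.read()) != -1) {
                if (c == '#') {
                    // The script output is fully buffered before it is pushed.
                    scanner.push(new StringReader("<i>generated</i>"));
                } else {
                    out.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "<a><i>generated</i><b>"
    }
}
```

Buffering the whole script output before the push, as the hint recommends, is what keeps the generated content in one contiguous run between the surrounding markup.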

setFeature

public void setFeature(java.lang.String featureId,
                       boolean state)
                throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets a feature.

Specified by:
setFeature in interface org.apache.xerces.xni.parser.XMLParserConfiguration
Throws:
org.apache.xerces.xni.parser.XMLConfigurationException

setProperty

public void setProperty(java.lang.String propertyId,
                        java.lang.Object value)
                 throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets a property.

Specified by:
setProperty in interface org.apache.xerces.xni.parser.XMLParserConfiguration
Throws:
org.apache.xerces.xni.parser.XMLConfigurationException

setDocumentHandler

public void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler)
Sets the document handler.

Specified by:
setDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

getDocumentHandler

public org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
Returns the document handler.

Specified by:
getDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

setDTDHandler

public void setDTDHandler(org.apache.xerces.xni.XMLDTDHandler handler)
Sets the DTD handler.

Specified by:
setDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

getDTDHandler

public org.apache.xerces.xni.XMLDTDHandler getDTDHandler()
Returns the DTD handler.

Specified by:
getDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

setDTDContentModelHandler

public void setDTDContentModelHandler(org.apache.xerces.xni.XMLDTDContentModelHandler handler)
Sets the DTD content model handler.

Specified by:
setDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

getDTDContentModelHandler

public org.apache.xerces.xni.XMLDTDContentModelHandler getDTDContentModelHandler()
Returns the DTD content model handler.

Specified by:
getDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

setErrorHandler

public void setErrorHandler(org.apache.xerces.xni.parser.XMLErrorHandler handler)
Sets the error handler.

Specified by:
setErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

getErrorHandler

public org.apache.xerces.xni.parser.XMLErrorHandler getErrorHandler()
Returns the error handler.

Specified by:
getErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration

setEntityResolver

public void setEntityResolver(org.apache.xerces.xni.parser.XMLEntityResolver resolver)
Sets the entity resolver.

Specified by:
setEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration

getEntityResolver

public org.apache.xerces.xni.parser.XMLEntityResolver getEntityResolver()
Returns the entity resolver.

Specified by:
getEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration

setLocale

public void setLocale(java.util.Locale locale)
Sets the locale.

Specified by:
setLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration

getLocale

public java.util.Locale getLocale()
Returns the locale.

Specified by:
getLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration

parse

public void parse(org.apache.xerces.xni.parser.XMLInputSource source)
           throws org.apache.xerces.xni.XNIException,
                  java.io.IOException
Parses a document.

Specified by:
parse in interface org.apache.xerces.xni.parser.XMLParserConfiguration
Throws:
org.apache.xerces.xni.XNIException
java.io.IOException

setInputSource

public void setInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
                    throws org.apache.xerces.xni.parser.XMLConfigurationException,
                           java.io.IOException
Sets the input source for the document to parse.

Specified by:
setInputSource in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
Parameters:
inputSource - The document's input source.
Throws:
org.apache.xerces.xni.parser.XMLConfigurationException - Thrown if there is a configuration error when initializing the parser.
java.io.IOException - Thrown on I/O error.
See Also:
parse(boolean)

parse

public boolean parse(boolean complete)
              throws org.apache.xerces.xni.XNIException,
                     java.io.IOException
Parses the document in a pull parsing fashion.

Specified by:
parse in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
Parameters:
complete - True if the pull parser should parse the remaining document completely.
Returns:
True if there is more document to parse.
Throws:
org.apache.xerces.xni.XNIException - Any XNI exception, possibly wrapping another exception.
java.io.IOException - An I/O exception from the parser, possibly from a byte stream or character stream supplied by the application.
See Also:
setInputSource(org.apache.xerces.xni.parser.XMLInputSource)

cleanup

public void cleanup()
If the application decides to terminate parsing before the XML document is fully parsed, it should call this method to free any resources allocated during parsing, for example by closing all opened streams.

Specified by:
cleanup in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
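
The pull-parsing contract of setInputSource, parse(boolean) and cleanup can be sketched with a toy parser. This is a minimal sketch, not NekoHTML's implementation: ToyPullParser, its token list, and the streamOpen flag are hypothetical stand-ins that only mirror the method names and return values documented above. It shows a loop that calls parse(false) while more document remains and calls cleanup() only when it stops early.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical model of the pull-parsing contract; not NekoHTML code. */
class PullLoopSketch {

    /** Toy parser: each parse(false) call delivers one token. */
    static class ToyPullParser {
        private Iterator<String> tokens;
        final List<String> events = new ArrayList<>();
        boolean streamOpen;

        void setInputSource(List<String> source) {
            tokens = source.iterator();
            streamOpen = true;
        }

        /** Returns true if there is more document to parse. */
        boolean parse(boolean complete) {
            while (tokens.hasNext()) {
                events.add(tokens.next());
                if (!complete) {
                    break; // incremental mode: one step per call
                }
            }
            boolean more = tokens.hasNext();
            if (!more) {
                streamOpen = false; // normal end of document
            }
            return more;
        }

        /** Frees resources when the caller stops before the end. */
        void cleanup() {
            streamOpen = false;
        }
    }

    /** Parses step by step and stops early once stopToken has been seen. */
    static ToyPullParser run(String stopToken) {
        ToyPullParser parser = new ToyPullParser();
        parser.setInputSource(Arrays.asList("<html>", "<head>", "<body>", "</html>"));
        boolean more = true;
        while (more) {
            more = parser.parse(false);
            String last = parser.events.get(parser.events.size() - 1);
            if (more && last.equals(stopToken)) {
                parser.cleanup(); // terminating early, so release resources
                break;
            }
        }
        return parser;
    }

    public static void main(String[] args) {
        System.out.println(run("<head>").events);        // stops after two tokens
        System.out.println(run("no-such-token").events); // parses all four tokens
    }
}
```

Passing true to parse would instead consume the rest of the document in a single call, which matches the description of the complete parameter.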

addComponent

protected void addComponent(HTMLComponent component)
Adds a component.


reset

protected void reset()
              throws org.apache.xerces.xni.parser.XMLConfigurationException
Resets the parser configuration.

Throws:
org.apache.xerces.xni.parser.XMLConfigurationException


(C) Copyright 2002-2005, Andy Clark. All rights reserved.