gnu.xml.pipeline
Class DomConsumer

java.lang.Object
  |
  +--gnu.xml.pipeline.DomConsumer
All Implemented Interfaces:
EventConsumer
Direct Known Subclasses:
Consumer

public class DomConsumer
extends java.lang.Object
implements EventConsumer

This consumer builds a DOM Document from its input, acting either as a pipeline terminus or as an intermediate buffer. When a document's worth of events has been delivered to this consumer, that document is read with a DomParser and sent to the next consumer. It is also available as a read-once property.
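
The core idea, turning a stream of SAX callbacks into a DOM tree, can be sketched with JDK classes alone. This is a minimal illustration and not DomConsumer's actual implementation: the class name SaxToDomSketch and its build helper are hypothetical, it handles only elements, attributes, and text, and it is good for a single parse.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical single-use helper that builds a DOM tree from SAX callbacks. */
final class SaxToDomSketch extends DefaultHandler {
    private final Document doc;
    // Stack of currently open nodes; the top is the parent of the next node.
    private final Deque<Node> open = new ArrayDeque<>();

    SaxToDomSketch() throws ParserConfigurationException {
        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    @Override
    public void startDocument() {
        open.push(doc);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // Uses the qualified name, so the parser must report original prefixes.
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        open.peek().appendChild(e);
        open.push(e);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        open.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        open.pop();
    }

    Document getDocument() {
        return doc;
    }

    /** Parses the XML text with the JDK's default SAX parser and returns the DOM built from its events. */
    static Document build(String xml) {
        try {
            SaxToDomSketch handler = new SaxToDomSketch();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.getDocument();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping a stack of open nodes is one simple way to track where the next node belongs; comments, DTD events, and ignorable whitespace are deliberately left out of this sketch.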

The DOM tree is constructed as faithfully as possible. There are some complications, since a DOM should expose behaviors that can't be implemented without API backdoors into that DOM, and because some SAX parsers don't report all the information that DOM permits to be exposed. The general problem areas involve information from the Document Type Declaration (DTD). DOM represents only a limited subset of that information, but has some behaviors that depend on much deeper knowledge of a document's DTD. You shouldn't have much to worry about unless you change the handling of "extra" nodes from its default setting (which ignores them all); note that if you use JAXP to populate your DOM trees, it saves "extra" nodes by default. Otherwise, your main worry will be a SAX parser that flags ignorable whitespace only when it is validating (few parsers have this limitation).

The SAX2 events used as input must contain XML Names for elements and attributes, with original prefixes. In SAX2, this is optional unless the "namespace-prefixes" parser feature is set. Moreover, many application components won't provide completely correct structures anyway. Before you convert a DOM to an output document, you should plan to postprocess it to create or repair such namespace information. The NSFilter pipeline stage does such work.
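
As an illustration of the "namespace-prefixes" requirement, the following sketch enables that standard SAX2 feature on the JDK's default parser and records the qualified names it reports. The class and method names (PrefixFeatureSketch, reportedNames) are hypothetical, and the sketch does not involve DomConsumer itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper showing which names a parser reports with namespace-prefixes enabled. */
final class PrefixFeatureSketch {
    // Standard SAX2 feature identifier.
    static final String NS_PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    /** Returns element qNames, plus attribute qNames marked with a leading '@'. */
    static List<String> reportedNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // Ask for original prefixed names and for xmlns* attributes.
            reader.setFeature(NS_PREFIXES, true);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    names.add(qName);
                    for (int i = 0; i < atts.getLength(); i++) {
                        names.add("@" + atts.getQName(i));
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }
}
```

With the feature enabled, namespace declarations such as xmlns:x arrive as ordinary attributes, which is the information a DOM builder needs in order to preserve them.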

Note: changes late in the DOM L2 process made it impractical to attempt to create the DocumentType node in any implementation-neutral way, much less to populate it (L1 didn't support even creating such nodes). To create and populate such a node, subclass the inner DomConsumer.Handler class and teach it about the backdoors into whatever DOM implementation you want. It's possible that some revised DOM API will finally resolve this problem.

Version:
$Date: 2001/07/10 22:29:04 $
Author:
David Brownell
See Also:
DomParser

Inner Class Summary
static class DomConsumer.Handler
          Class used to intercept various parsing events and use them to populate a DOM document.
 
Constructor Summary
DomConsumer(java.lang.Class impl)
          Configures this consumer to use the specified implementation of DOM when constructing its result value.
DomConsumer(java.lang.Class impl, EventConsumer n)
          Configures this consumer as a buffer/filter, using the specified DOM implementation when constructing its result value.
 
Method Summary
 ContentHandler getContentHandler()
          Returns the content handler being used.
 Document getDocument()
          Returns the document constructed from the preceding sequence of events.
 DTDHandler getDTDHandler()
          Returns the DTD handler being used.
 java.lang.Object getProperty(java.lang.String id)
          Returns the lexical handler being used.
 boolean isExpandingReferences()
          Returns true if the consumer is expanding entity references in place (the default), and false if childless EntityReference nodes should instead be created.
 boolean isHidingComments()
          Returns true if the consumer is hiding comments (the default), and false if they should be placed into the output document.
 boolean isHidingWhitespace()
          Returns true if the consumer is hiding ignorable whitespace (the default), and false if such whitespace should be placed into the output document as children of element nodes.
 boolean isSavingExtraNodes()
          Returns true if the consumer is saving "extra" nodes, and false (the default) otherwise.
 boolean isUsingNamespaces()
          Returns true (the default for L2 DOM implementations) if the consumer is using an "XML + Namespaces" style DOM construction, which will cause fatal errors on some legal XML 1.0 documents.
 void setErrorHandler(ErrorHandler handler)
          This method provides a filter stage with a handler that abstracts presentation of warnings and both recoverable and fatal errors.
 void setExpandingReferences(boolean flag)
          Controls whether the consumer will expand entity references in place, or will instead replace them with childless entity reference nodes.
protected  void setHandler(DomConsumer.Handler h)
          This is the hook through which a subclass provides a handler which knows how to access DOM extensions, specific to some implementation, to record additional data in a DOM.
 void setHidingComments(boolean flag)
          Controls whether the consumer is hiding comments.
 void setHidingWhitespace(boolean flag)
          Controls whether the consumer hides ignorable whitespace.
 void setSavingExtraNodes(boolean flag)
          Controls whether the consumer will save "extra" nodes.
 void setUsingNamespaces(boolean flag)
          Controls whether the consumer uses an "XML + Namespaces" style DOM construction.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DomConsumer

public DomConsumer(java.lang.Class impl)
            throws SAXException
Configures this consumer to use the specified implementation of DOM when constructing its result value.
Parameters:
impl - class implementing Document which publicly exposes a default constructor
Throws:
SAXException - when there is a problem creating an empty DOM document using the specified implementation

DomConsumer

public DomConsumer(java.lang.Class impl,
                   EventConsumer n)
            throws SAXException
Configures this consumer as a buffer/filter, using the specified DOM implementation when constructing its result value.

This event consumer acts as a buffer and filter, in that it builds a DOM tree and then writes it out when endDocument is invoked. Because of the limitations of DOM, much information will as a rule not be seen in that replay. To get a full fidelity copy of the input event stream, use a TeeConsumer.

Parameters:
impl - class implementing Document which publicly exposes a default constructor
n - the next consumer; receives a "replayed" sequence of parse events when the endDocument method is invoked.
Throws:
SAXException - when there is a problem creating an empty DOM document using the specified DOM implementation
Method Detail

setHandler

protected void setHandler(DomConsumer.Handler h)
This is the hook through which a subclass provides a handler which knows how to access DOM extensions, specific to some implementation, to record additional data in a DOM. Treat this as part of construction; call it only before (or between) parses.

getDocument

public final Document getDocument()
Returns the document constructed from the preceding sequence of events. This method should not be used again until another sequence of events has been given to this EventConsumer.

setErrorHandler

public void setErrorHandler(ErrorHandler handler)
Description copied from interface: EventConsumer
This method provides a filter stage with a handler that abstracts presentation of warnings and both recoverable and fatal errors. Most pipeline stages should share a single policy and mechanism for such reports, since application components require consistency in such activities. Accordingly, typical responses to this method invocation involve saving the handler for use; filters will pass it on to any other consumers they use.
Specified by:
setErrorHandler in interface EventConsumer
Following copied from interface: gnu.xml.pipeline.EventConsumer
Parameters:
handler - encapsulates error handling policy for this stage

isExpandingReferences

public final boolean isExpandingReferences()
Returns true if the consumer is expanding entity references in place (the default), and false if childless EntityReference nodes should instead be created.
See Also:
setExpandingReferences(boolean)

setExpandingReferences

public final void setExpandingReferences(boolean flag)
Controls whether the consumer will expand entity references in place, or will instead replace them with childless entity reference nodes.
Parameters:
flag - True iff entity references should be expanded in place; false if childless EntityReference nodes should be created instead.
See Also:
isExpandingReferences()

isHidingComments

public final boolean isHidingComments()
Returns true if the consumer is hiding comments (the default), and false if they should be placed into the output document.
See Also:
setHidingComments(boolean)

setHidingComments

public final void setHidingComments(boolean flag)
Controls whether the consumer is hiding comments.
See Also:
isHidingComments()

isHidingWhitespace

public final boolean isHidingWhitespace()
Returns true if the consumer is hiding ignorable whitespace (the default), and false if such whitespace should be placed into the output document as children of element nodes.
See Also:
setHidingWhitespace(boolean)

setHidingWhitespace

public final void setHidingWhitespace(boolean flag)
Controls whether the consumer hides ignorable whitespace.
See Also:
isHidingWhitespace()

isSavingExtraNodes

public final boolean isSavingExtraNodes()
Returns true if the consumer is saving "extra" nodes, and false (the default) otherwise. "Extra" nodes are currently defined to be CDATA nodes (instead of normal text nodes), DocumentType and EntityReference nodes. (Notation and Entity nodes can't be portably created, and won't show up regardless of the setting of this flag.)

You may not consistently see all these node types even if you set this flag to true. Only Level 2 DOM implementations can create DocumentType nodes portably, but they can't be populated with any portable APIs. No DOM implementation can populate EntityReference nodes with any portable APIs. Not all parsers expose comment and CDATA nodes, but if they do, then most DOM implementations are able to expose those nodes. Any SAX parser may expose ignorable whitespace, and most do so, so stripping out such whitespace is the most reliable of this set of inconsistently supportable DOM features.
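
For comparison, JAXP keeps such nodes by default. The sketch below uses only the JDK's DocumentBuilderFactory, not DomConsumer, to show CDATA and comment nodes appearing or disappearing depending on the factory settings. The names JaxpExtraNodesSketch and childTypes are hypothetical, and mapping both settings onto one hideExtras flag is a simplification made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.StringJoiner;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper listing the node types JAXP creates under the root element. */
final class JaxpExtraNodesSketch {
    static String childTypes(String xml, boolean hideExtras) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(hideExtras);       // true: CDATA sections become plain text
            factory.setIgnoringComments(hideExtras); // true: comment nodes are dropped
            NodeList kids = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getChildNodes();
            StringJoiner types = new StringJoiner(",");
            for (int i = 0; i < kids.getLength(); i++) {
                switch (kids.item(i).getNodeType()) {
                    case Node.CDATA_SECTION_NODE: types.add("cdata"); break;
                    case Node.COMMENT_NODE:       types.add("comment"); break;
                    case Node.TEXT_NODE:          types.add("text"); break;
                    default:                      types.add("other"); break;
                }
            }
            return types.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the input `<a><![CDATA[x]]><!--c--></a>`, the default settings yield a CDATA node followed by a comment node, while hideExtras set to true leaves a single text node.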

See Also:
setSavingExtraNodes(boolean)

setSavingExtraNodes

public final void setSavingExtraNodes(boolean flag)
Controls whether the consumer will save "extra" nodes.
Parameters:
flag - True iff extra nodes should be saved; false otherwise.
See Also:
isSavingExtraNodes()

isUsingNamespaces

public boolean isUsingNamespaces()
Returns true (the default for L2 DOM implementations) if the consumer is using an "XML + Namespaces" style DOM construction, which will cause fatal errors on some legal XML 1.0 documents.
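
One example of a legal XML 1.0 document that namespace-style processing rejects is an element name with an undeclared prefix. The sketch below demonstrates this with the JDK's JAXP parser rather than with DomConsumer; the names NamespaceStrictnessSketch and parses are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper reporting whether a document parses with or without namespace processing. */
final class NamespaceStrictnessSketch {
    static boolean parses(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here `parses("<p:doc/>", false)` succeeds because the colon is just a name character in XML 1.0, while `parses("<p:doc/>", true)` fails because the prefix p is not bound to any namespace.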
See Also:
setUsingNamespaces(boolean)

setUsingNamespaces

public void setUsingNamespaces(boolean flag)
Controls whether the consumer uses an "XML + Namespaces" style DOM construction.
Parameters:
flag - True iff namespaces should be enforced; else false.
See Also:
isUsingNamespaces()

getContentHandler

public final ContentHandler getContentHandler()
Returns the content handler being used.
Specified by:
getContentHandler in interface EventConsumer

getDTDHandler

public final DTDHandler getDTDHandler()
Returns the DTD handler being used.
Specified by:
getDTDHandler in interface EventConsumer

getProperty

public final java.lang.Object getProperty(java.lang.String id)
                                   throws SAXNotRecognizedException
Returns the lexical handler being used. (DOM construction can't really use declaration handlers.)
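
For background on what a lexical handler receives, the following sketch registers one with the JDK's default SAX parser and collects comment events. It assumes the property id in question is the standard SAX2 lexical-handler URI; the names LexicalPropertySketch and comments are hypothetical, and the sketch does not call DomConsumer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical helper collecting the comments a SAX parser reports to a LexicalHandler. */
final class LexicalPropertySketch {
    // Standard SAX2 property identifier (assumed to be the id that applies here).
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    static List<String> comments(String xml) {
        List<String> found = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // DefaultHandler2 implements LexicalHandler with no-op methods.
            reader.setProperty(LEXICAL_HANDLER, new DefaultHandler2() {
                @Override
                public void comment(char[] ch, int start, int length) {
                    found.add(new String(ch, start, length));
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return found;
    }
}
```

Comments, CDATA boundaries, and entity boundaries reach a consumer only through this handler, which is why a DOM builder that can save such nodes needs to expose one.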
Specified by:
getProperty in interface EventConsumer
Following copied from interface: gnu.xml.pipeline.EventConsumer
Parameters:
id - This is a URI identifying the type of property desired.
Returns:
The value of that property, if it is defined.
Throws:
SAXNotRecognizedException - Thrown if the particular pipeline stage does not understand the specified identifier.

Source code is GPL'd in the JAXP subproject at http://savannah.gnu.org/projects/classpathx
This documentation was derived from that source code on 2001-07-12.