gnu.xml.pipeline
Class ValidationConsumer

java.lang.Object
  |
  +--gnu.xml.pipeline.EventFilter
        |
        +--gnu.xml.pipeline.ValidationConsumer
All Implemented Interfaces:
ContentHandler, DeclHandler, DTDHandler, EventConsumer, LexicalHandler

public final class ValidationConsumer
extends EventFilter

This class checks SAX2 events to report validity errors; it works as both a filter and a terminus on an event pipeline. It relies on the producer of SAX events to:

  1. Conform to the specification of a non-validating XML parser that reads all external entities, reported using SAX2 events.
  2. Report ignorable whitespace as such (through the ContentHandler interface). This is, strictly speaking, optional for nonvalidating XML processors.
  3. Make SAX2 DeclHandler callbacks, with default attribute values already normalized (and without "<").
  4. Make SAX2 LexicalHandler startDTD() and endDTD() callbacks.
  5. Act as if the SAX2 namespace-prefixes feature (http://xml.org/sax/features/namespace-prefixes) were set to true, by providing XML 1.0 names and all xmlns* attributes (rather than omitting either or both).
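
The requirements above can be probed against a stock parser. The following is a minimal sketch, assuming the SAX2 parser bundled with the JDK; the class ProducerCheck and its probe method are hypothetical names used for illustration and are not part of gnu.xml.pipeline. It turns on the standard namespace-prefixes feature, registers one handler for the lexical and declaration callbacks through the standard SAX2 property URIs, and records which callbacks actually arrive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical probe: records which SAX2 callbacks a producer delivers. */
final class ProducerCheck extends DefaultHandler2 {
    final List<String> seen = new ArrayList<>();

    @Override public void startDTD(String name, String publicId, String systemId) {
        seen.add("startDTD:" + name);
    }

    @Override public void endDTD() {
        seen.add("endDTD");
    }

    @Override public void elementDecl(String name, String model) {
        seen.add("elementDecl:" + name);
    }

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        // With namespace-prefixes enabled, xmlns* attributes must be visible here.
        for (int i = 0; i < atts.getLength(); i++) {
            if (atts.getQName(i).startsWith("xmlns")) {
                seen.add("xmlns-attr:" + atts.getQName(i));
            }
        }
    }

    /** Parses the given text and returns the list of recorded callbacks. */
    static List<String> probe(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // Requirement 5: XML 1.0 names plus all xmlns* attributes.
            reader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
            ProducerCheck handler = new ProducerCheck();
            reader.setContentHandler(handler);
            reader.setDTDHandler(handler);
            reader.setErrorHandler(handler);
            // Requirements 3 and 4: DeclHandler and LexicalHandler callbacks.
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException("producer could not be probed", e);
        }
    }
}
```

A producer that rejects either property, or that never reports startDTD and endDTD, would not be a suitable event source for this class.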

At this writing, the major SAX2 parsers (such as Ælfred2, Crimson, and Xerces) meet these requirements, and this validation module is used by the optional Ælfred2 validation support.

Note that because this is a layered validator, it has to duplicate some work that the parser is doing; there are also other costs to layering. However, because of layering it doesn't need a parser in order to work! You can use it with anything that generates SAX events, such as an application component that wants to detect invalid content in a changed area without validating an entire document, or which wants to ensure that it doesn't write invalid data to a communications partner.
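
To illustrate the layering idea only, here is a toy checker driven directly by application code, with no parser involved. It is a sketch under stated assumptions: DeclaredElementChecker and demo are hypothetical names, and the single rule it enforces (every element must have been declared) is far simpler than what ValidationConsumer checks.

```java
import java.util.HashSet;
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.helpers.AttributesImpl;

/** Toy layered checker (hypothetical): flags elements that were never declared. */
final class DeclaredElementChecker extends DefaultHandler2 {
    private final Set<String> declared = new HashSet<>();
    int validityErrors;

    @Override public void elementDecl(String name, String model) {
        declared.add(name);
    }

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        if (!declared.contains(qName)) {
            // No locator is available when events come from application code.
            error(new SAXParseException("undeclared element: " + qName, null));
        }
    }

    @Override public void error(SAXParseException e) {
        validityErrors++;   // recoverable: count it and keep going
    }

    /** Feeds hand-made SAX events to the checker and returns the error count. */
    static int demo() {
        try {
            DeclaredElementChecker checker = new DeclaredElementChecker();
            Attributes none = new AttributesImpl();
            checker.startDocument();
            checker.elementDecl("order", "(item)*");
            checker.elementDecl("item", "(#PCDATA)");
            checker.startElement("", "", "order", none);
            checker.startElement("", "", "item", none);
            checker.endElement("", "", "item");
            checker.startElement("", "", "bogus", none);   // never declared
            checker.endElement("", "", "bogus");
            checker.endElement("", "", "order");
            checker.endDocument();
            return checker.validityErrors;
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch demo() reports one error, for the undeclared "bogus" element; the same event sequence could equally come from a parser or from a component that serializes application data.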

Also, note that because this is a layered validator, the line numbers reported for some errors may seem strange. For example, if an element does not permit character content, the validator will use the locator provided to it. That might reflect the last character of a characters event callback, rather than the first non-whitespace character.


Current limitations of the validation performed are in roughly three categories.

The first category represents constraints which demand violations of software layering: exposing lexical details, one of the first things that application programming interfaces (APIs) hide. These invariably relate to XML entity handling, and to historical oddities of the XML validation semantics. Curiously, recent (Autumn 1999) conformance testing showed that these constraints are among those handled worst by existing XML validating parsers. Arguments have been made that each of these VCs should be turned into WFCs (most of them) or discarded (popular for the standalone declaration); in short, that these are bugs in the XML specification (not all via SGML):

The second category of limitations on this validation represents constraints associated with information that is not guaranteed to be available (or, in one case, is guaranteed not to be available) through the SAX2 API:

A third category relates to ease of implementation. (Think of this as "bugs".) The most notable issue here is character handling. Rather than attempting to implement the voluminous character tables in the XML specification (Appendix B), Unicode rules are used directly from the java.lang.Character class. Recent JVMs have begun to diverge from the original specification for that class (Unicode 2.0), meaning that different JVMs may handle that aspect of conformance differently.
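
As a rough illustration of this trade-off, the sketch below approximates the XML 1.0 Name production with java.lang.Character predicates. The class NameChars and the exact choice of predicates are assumptions made for this example; the documentation above does not say which Character methods the validator calls. Because the predicates follow whatever Unicode version the running JVM implements, edge cases can be classified differently from the tables in Appendix B.

```java
/** Hypothetical approximation of XML 1.0 Name checking via java.lang.Character. */
final class NameChars {
    /** Approximates NameStartChar: a letter, underscore, or colon. */
    static boolean isNameStart(char c) {
        return Character.isLetter(c) || c == '_' || c == ':';
    }

    /** Approximates NameChar: a letter or digit, or one of . - _ : */
    static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c)
            || c == '.' || c == '-' || c == '_' || c == ':';
    }

    /** Returns true if the string looks like an XML 1.0 Name under this approximation. */
    static boolean looksLikeName(String s) {
        if (s == null || s.isEmpty() || !isNameStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

This version ignores combining characters and extenders, which the specification's tables do allow inside names, so it is stricter than the specification in some places and looser in others.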

Note that for some of the validity errors that SAX2 does not expose, a nonvalidating parser is permitted (by the XML specification) to report validity errors. When used with a parser that does so for the validity constraints mentioned above (or any other SAX2 event stream producer that does the same thing), overall conformance is substantially improved.

Version:
$Date: 2001/07/11 16:55:23 $
Author:
David Brownell
See Also:
SAXDriver, XmlReader

Fields inherited from class gnu.xml.pipeline.EventFilter
DECL_HANDLER, FEATURE_URI, LEXICAL_HANDLER, PROPERTY_URI
 
Constructor Summary
ValidationConsumer()
          Creates a pipeline terminus which consumes all events passed to it; this will report validity errors as if they were fatal errors, unless an error handler is assigned.
ValidationConsumer(EventConsumer next)
          Creates a pipeline filter which reports validity errors and then passes events on to the next consumer if they were not fatal.
ValidationConsumer(java.lang.String rootName, java.lang.String publicId, java.lang.String systemId, java.lang.String internalSubset, EntityResolver resolver, java.lang.String minimalDocument)
          Creates a validation consumer which is preloaded with the DTD provided.
 
Method Summary
 void attributeDecl(java.lang.String element, java.lang.String attribute, java.lang.String type, java.lang.String mode, java.lang.String value)
          DeclHandler Records the attribute declaration for later use in validating document content, and checks validity constraints that are applicable to attribute declarations.
 void characters(char[] buf, int offset, int length)
          ContentHandler Reports a validity error if the element's content model does not permit character data.
 void elementDecl(java.lang.String name, java.lang.String model)
          DeclHandler Records the element declaration for later use when checking document content, and checks validity constraints that apply to element declarations.
 void endDocument()
          ContentHandler Checks whether all ID values that were referenced have been declared, and releases all resources.
 void endDTD()
          LexicalHandler Verifies that all referenced notations and unparsed entities have been declared.
 void endElement(java.lang.String uri, java.lang.String local, java.lang.String name)
          ContentHandler Reports a validity error if the element's content model does not permit end-of-element yet, or a well-formedness error if there was no matching startElement call.
 void externalEntityDecl(java.lang.String name, java.lang.String pubId, java.lang.String sysId)
          DeclHandler Passed to the next consumer, unless this one was preloaded with a particular DTD.
 void internalEntityDecl(java.lang.String name, java.lang.String value)
          DeclHandler Passed to the next consumer, unless this one was preloaded with a particular DTD.
 void notationDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId)
          DTDHandler Records the notation name, for checking NOTATION attribute values and declarations of unparsed entities.
 void skippedEntity(java.lang.String name)
          ContentHandler Reports a fatal exception.
 void startDocument()
          ContentHandler Ensures that state from any previous parse has been deleted.
 void startDTD(java.lang.String name, java.lang.String publicId, java.lang.String systemId)
          LexicalHandler Records the declaration of the root element, so it can be verified later.
 void startElement(java.lang.String uri, java.lang.String local, java.lang.String name, Attributes attributes)
          ContentHandler Performs validity checks against element (and document) content models, and attribute values.
 void unparsedEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId, java.lang.String notation)
          DTDHandler Records the entity name, for checking ENTITY and ENTITIES attribute values; records the notation name if it hasn't yet been declared.
 
Methods inherited from class gnu.xml.pipeline.EventFilter
bind, comment, endCDATA, endEntity, endPrefixMapping, getContentHandler, getDocumentLocator, getDTDHandler, getErrorHandler, getNext, getProperty, ignorableWhitespace, processingInstruction, setContentHandler, setDocumentLocator, setDTDHandler, setErrorHandler, setProperty, startCDATA, startEntity, startPrefixMapping
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ValidationConsumer

public ValidationConsumer()
Creates a pipeline terminus which consumes all events passed to it; this will report validity errors as if they were fatal errors, unless an error handler is assigned.
See Also:
EventFilter.setErrorHandler(org.xml.sax.ErrorHandler)
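
Since validity errors are treated as fatal unless an error handler is assigned, an application that wants to keep going usually supplies one. The following is a minimal sketch of such a handler; CollectingErrorHandler and its format helper are hypothetical names, and only standard org.xml.sax types are used.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical handler: collects recoverable (validity) errors instead of aborting. */
final class CollectingErrorHandler implements ErrorHandler {
    final List<String> validityErrors = new ArrayList<>();

    @Override public void warning(SAXParseException e) {
        // Warnings are ignored in this sketch.
    }

    @Override public void error(SAXParseException e) {
        // SAX reports validity errors through error(); record and continue.
        validityErrors.add(format(e));
    }

    @Override public void fatalError(SAXParseException e) throws SAXException {
        throw e;   // well-formedness problems still stop processing
    }

    /** Formats as "line:column: message"; positions are -1 when no locator was supplied. */
    static String format(SAXParseException e) {
        return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
    }
}
```

An instance of a handler like this would be passed to setErrorHandler, which this class inherits from EventFilter.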

ValidationConsumer

public ValidationConsumer(EventConsumer next)
Creates a pipeline filter which reports validity errors and then passes events on to the next consumer if they were not fatal.
See Also:
EventFilter.setErrorHandler(org.xml.sax.ErrorHandler)

ValidationConsumer

public ValidationConsumer(java.lang.String rootName,
                          java.lang.String publicId,
                          java.lang.String systemId,
                          java.lang.String internalSubset,
                          EntityResolver resolver,
                          java.lang.String minimalDocument)
                   throws SAXException,
                          java.io.IOException
Creates a validation consumer which is preloaded with the DTD provided. It does this by constructing a document with that DTD, then parsing that document and recording its DTD declarations. Then it arranges not to modify that information.

The resulting validation consumer will only validate against the specified DTD, regardless of whether some other DTD is found in a document being parsed.
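
The "construct a document, parse it, record its declarations" approach described above can be sketched with JDK classes alone. This is an illustration under assumptions, not the constructor's actual code: PreloadSketch, minimalDocument and recordElementDecls are hypothetical names, the root name is assumed to be non-null, and identifiers are assumed not to contain single quotes.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical sketch of preloading DTD declarations from a generated document. */
final class PreloadSketch {
    /** Builds a DOCTYPE declaration followed by an empty root element. */
    static String minimalDocument(String rootName, String publicId,
                                  String systemId, String internalSubset) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE ").append(rootName);
        if (systemId != null) {
            if (publicId != null) {
                sb.append(" PUBLIC '").append(publicId).append("' '")
                  .append(systemId).append("'");
            } else {
                sb.append(" SYSTEM '").append(systemId).append("'");
            }
        }
        if (internalSubset != null) {
            sb.append(" [").append(internalSubset).append("]");
        }
        return sb.append("><").append(rootName).append("/>").toString();
    }

    /** Parses the document and records each element declaration (name to content model). */
    static Map<String, String> recordElementDecls(String document) {
        Map<String, String> models = new LinkedHashMap<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setProperty("http://xml.org/sax/properties/declaration-handler",
                new DefaultHandler2() {
                    @Override public void elementDecl(String name, String model) {
                        models.put(name, model);
                    }
                });
            reader.parse(new InputSource(new StringReader(document)));
        } catch (Exception e) {
            throw new IllegalStateException("DTD could not be loaded", e);
        }
        return models;
    }
}
```

A fuller version would also record attribute, notation and entity declarations, and would hand the supplied EntityResolver to the parser before parsing.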

Parameters:
rootName - The name of the required root element; if this is null, any root element name will be accepted.
publicId - If non-null and there is a non-null systemId, this identifier provides an alternate access identifier for the DTD's external subset.
systemId - If non-null, this is a URI (normally URL) that may be used to access the DTD's external subset.
internalSubset - If non-null, holds literal markup declarations comprising the DTD's internal subset.
resolver - If non-null, this will be provided to the parser for use when resolving parameter entities (including any external subset).
minimalDocument - If non-null, a minimal valid document.
Throws:
SAXNotSupportedException - If the default SAX parser does not support the standard lexical or declaration handlers.
SAXParseException - If the specified DTD has either well-formedness or validity errors.
java.io.IOException - If the specified DTD can't be read for some reason.
Method Detail

startDTD

public void startDTD(java.lang.String name,
                     java.lang.String publicId,
                     java.lang.String systemId)
              throws SAXException
LexicalHandler Records the declaration of the root element, so it can be verified later. Passed to the next consumer, unless this one was preloaded with a particular DTD.
Overrides:
startDTD in class EventFilter
Following copied from interface: org.xml.sax.ext.LexicalHandler
Parameters:
name - The document type name.
publicId - The declared public identifier for the external DTD subset, or null if none was declared.
systemId - The declared system identifier for the external DTD subset, or null if none was declared.
Throws:
SAXException - The application may raise an exception.
See Also:
LexicalHandler.endDTD(), LexicalHandler.startEntity(java.lang.String)

endDTD

public void endDTD()
            throws SAXException
LexicalHandler Verifies that all referenced notations and unparsed entities have been declared. Passed to the next consumer, unless this one was preloaded with a particular DTD.
Overrides:
endDTD in class EventFilter
Following copied from interface: org.xml.sax.ext.LexicalHandler
Throws:
SAXException - The application may raise an exception.
See Also:
LexicalHandler.startDTD(java.lang.String, java.lang.String, java.lang.String)

attributeDecl

public void attributeDecl(java.lang.String element,
                          java.lang.String attribute,
                          java.lang.String type,
                          java.lang.String mode,
                          java.lang.String value)
                   throws SAXException
DeclHandler Records the attribute declaration for later use in validating document content, and checks validity constraints that are applicable to attribute declarations. Passed to the next consumer, unless this one was preloaded with a particular DTD.
Overrides:
attributeDecl in class EventFilter
Following copied from interface: org.xml.sax.ext.DeclHandler
Parameters:
eName - The name of the associated element.
aName - The name of the attribute.
type - A string representing the attribute type.
valueDefault - A string representing the attribute default ("#IMPLIED", "#REQUIRED", or "#FIXED") or null if none of these applies.
value - A string representing the attribute's default value, or null if there is none.
Throws:
SAXException - The application may raise an exception.

elementDecl

public void elementDecl(java.lang.String name,
                        java.lang.String model)
                 throws SAXException
DeclHandler Records the element declaration for later use when checking document content, and checks validity constraints that apply to element declarations. Passed to the next consumer, unless this one was preloaded with a particular DTD.
Overrides:
elementDecl in class EventFilter
Following copied from interface: org.xml.sax.ext.DeclHandler
Parameters:
name - The element type name.
model - The content model as a normalized string.
Throws:
SAXException - The application may raise an exception.

internalEntityDecl

public void internalEntityDecl(java.lang.String name,
                               java.lang.String value)
                        throws SAXException
DeclHandler Passed to the next consumer, unless this one was preloaded with a particular DTD.
Overrides:
internalEntityDecl in class EventFilter
Following copied from interface: org.xml.sax.ext.DeclHandler
Parameters:
name - The name of the entity. If it is a parameter entity, the name will begin with '%'.
value - The replacement text of the entity.
Throws:
SAXException - The application may raise an exception.
See Also:
DeclHandler.externalEntityDecl(java.lang.String, java.lang.String, java.lang.String), DTDHandler.unparsedEntityDecl(java.lang.String, java.lang.String, java.lang.String, java.lang.String)

externalEntityDecl

public void externalEntityDecl(java.lang.String name,
                               java.lang.String pubId,
                               java.lang.String sysId)
                        throws SAXException
DeclHandler Passed to the next consumer, unless this one was preloaded with a particular DTD.
Overrides:
externalEntityDecl in class EventFilter
Following copied from interface: org.xml.sax.ext.DeclHandler
Parameters:
name - The name of the entity. If it is a parameter entity, the name will begin with '%'.
publicId - The declared public identifier of the entity, or null if none was declared.
systemId - The declared system identifier of the entity.
Throws:
SAXException - The application may raise an exception.
See Also:
DeclHandler.internalEntityDecl(java.lang.String, java.lang.String), DTDHandler.unparsedEntityDecl(java.lang.String, java.lang.String, java.lang.String, java.lang.String)

notationDecl

public void notationDecl(java.lang.String name,
                         java.lang.String publicId,
                         java.lang.String systemId)
                  throws SAXException
DTDHandler Records the notation name, for checking NOTATION attribute values and declarations of unparsed entities. Passed to the next consumer, unless this one was preloaded with a particular DTD.
Overrides:
notationDecl in class EventFilter
Following copied from interface: org.xml.sax.DTDHandler
Parameters:
name - The notation name.
publicId - The notation's public identifier, or null if none was given.
systemId - The notation's system identifier, or null if none was given.
Throws:
SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
DTDHandler.unparsedEntityDecl(java.lang.String, java.lang.String, java.lang.String, java.lang.String), AttributeList

unparsedEntityDecl

public void unparsedEntityDecl(java.lang.String name,
                               java.lang.String publicId,
                               java.lang.String systemId,
                               java.lang.String notation)
                        throws SAXException
DTDHandler Records the entity name, for checking ENTITY and ENTITIES attribute values; records the notation name if it hasn't yet been declared. Passed to the next consumer, unless this one was preloaded with a particular DTD.
Overrides:
unparsedEntityDecl in class EventFilter
Following copied from interface: org.xml.sax.DTDHandler
Parameters:
name - The unparsed entity's name.
publicId - The entity's public identifier, or null if none was given.
systemId - The entity's system identifier.
notation - The name of the associated notation.
Throws:
SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
DTDHandler.notationDecl(java.lang.String, java.lang.String, java.lang.String), AttributeList

startDocument

public void startDocument()
                   throws SAXException
ContentHandler Ensures that state from any previous parse has been deleted. Passed to the next consumer.
Overrides:
startDocument in class EventFilter
Following copied from interface: org.xml.sax.ContentHandler
Throws:
SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
ContentHandler.endDocument()

skippedEntity

public void skippedEntity(java.lang.String name)
                   throws SAXException
ContentHandler Reports a fatal exception. Validating XML processors may not skip any entities.
Overrides:
skippedEntity in class EventFilter
Following copied from interface: org.xml.sax.ContentHandler
Parameters:
name - The name of the skipped entity. If it is a parameter entity, the name will begin with '%', and if it is the external DTD subset, it will be the string "[dtd]".
Throws:
SAXException - Any SAX exception, possibly wrapping another exception.

startElement

public void startElement(java.lang.String uri,
                         java.lang.String local,
                         java.lang.String name,
                         Attributes attributes)
                  throws SAXException
ContentHandler Performs validity checks against element (and document) content models, and attribute values. Passed to the next consumer.
Overrides:
startElement in class EventFilter
Following copied from interface: org.xml.sax.ContentHandler
Parameters:
uri - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
localName - The local name (without prefix), or the empty string if Namespace processing is not being performed.
qName - The qualified name (with prefix), or the empty string if qualified names are not available.
atts - The attributes attached to the element. If there are no attributes, it shall be an empty Attributes object.
Throws:
SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
ContentHandler.endElement(java.lang.String, java.lang.String, java.lang.String), Attributes

characters

public void characters(char[] buf,
                       int offset,
                       int length)
                throws SAXException
ContentHandler Reports a validity error if the element's content model does not permit character data. Passed to the next consumer.
Overrides:
characters in class EventFilter
Following copied from interface: org.xml.sax.ContentHandler
Parameters:
ch - The characters from the XML document.
start - The start position in the array.
length - The number of characters to read from the array.
Throws:
SAXException - Any SAX exception, possibly wrapping another exception.
See Also:
ContentHandler.ignorableWhitespace(char[], int, int), Locator

endElement

public void endElement(java.lang.String uri,
                       java.lang.String local,
                       java.lang.String name)
                throws SAXException
ContentHandler Reports a validity error if the element's content model does not permit end-of-element yet, or a well-formedness error if there was no matching startElement call. Passed to the next consumer.
Overrides:
endElement in class EventFilter
Following copied from interface: org.xml.sax.ContentHandler
Parameters:
uri - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
localName - The local name (without prefix), or the empty string if Namespace processing is not being performed.
qName - The qualified XML 1.0 name (with prefix), or the empty string if qualified names are not available.
Throws:
SAXException - Any SAX exception, possibly wrapping another exception.

endDocument

public void endDocument()
                 throws SAXException
ContentHandler Checks whether all ID values that were referenced have been declared, and releases all resources. Passed to the next consumer.
Overrides:
endDocument in class EventFilter
See Also:
EventFilter.setDocumentLocator(org.xml.sax.Locator)
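
The end-of-document ID check can be pictured as simple bookkeeping over two sets. The sketch below is an assumption about how such tracking might look, not the class's actual data structures; IdTracker and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical bookkeeping for ID and IDREF attribute values. */
final class IdTracker {
    private final Set<String> declared = new HashSet<>();
    private final Set<String> referenced = new TreeSet<>();

    /** Records an ID value; returns false if that ID was already declared. */
    boolean declare(String id) {
        return declared.add(id);
    }

    /** Records an IDREF value; it may legally precede the matching ID. */
    void reference(String idref) {
        referenced.add(idref);
    }

    /** Called at end of document: referenced values that no ID ever declared. */
    Set<String> unresolved() {
        Set<String> missing = new TreeSet<>(referenced);
        missing.removeAll(declared);
        return missing;
    }
}
```

Deferring the check to the end of the document is what allows forward references: an IDREF seen early is only an error if no matching ID has turned up by the time endDocument is called.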

Source code is GPL'd in the JAXP subproject at http://savannah.gnu.org/projects/classpathx
This documentation was derived from that source code on 2001-07-12.