One problem every software developer meets from time to time is validating XML text against a schema. I am talking about XML Schema (XSD), not the less strict document type definitions (DTD). There are different techniques for programmatic validation, and I want to summarize my Java experiences in this blog.
The most frequent case is that the XML text you want to validate contains a reference to an XML schema.
<?xml version="1.0" encoding="UTF-8"?>
<example
    xmlns="http://www.example.org"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.example.org http://www.example.org/example.xsd">
  <title>....</title>
  <summary>....</summary>
  <content>....</content>
</example>
How to read this? The root element contains three attributes:
- xmlns declares http://www.example.org as the identifier (not location!) for the default namespace of the XML document, i.e. all elements that do not explicitly declare a namespace (namespace:element) belong to that namespace, for example title. XML without a namespace uses the noNamespaceSchemaLocation attribute instead, see the example at the bottom of this page.
- xmlns:xsi declares the namespace prefix xsi, needed to use the attribute xsi:schemaLocation.
- xsi:schemaLocation uses xsi to declare a concrete schema for the default namespace identifier http://www.example.org (first part of the attribute value), and it references http://www.example.org/example.xsd (second part of the attribute value, separated by a space). Mind that there can be several namespace-location pairs in this attribute value!
So the schema for this XML is available at http://www.example.org/example.xsd. Loading this URI in a web browser should display the contents of the XML schema. All of the elements example, title, summary, content must be described there.
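To make the pair structure of the xsi:schemaLocation value concrete, here is a minimal sketch that splits such a value into namespace-location pairs. The class name SchemaLocationPairs and its parse method are hypothetical helpers for illustration only; a validating parser does this internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any XML API.
final class SchemaLocationPairs {

    /** Splits an xsi:schemaLocation value into namespace -> location pairs, keeping their order. */
    static Map<String, String> parse(String schemaLocation) {
        final Map<String, String> pairs = new LinkedHashMap<>();
        final String trimmed = schemaLocation.trim();
        if (trimmed.isEmpty())
            return pairs;

        // The value is a whitespace-separated list: namespace, location, namespace, location, ...
        final String[] tokens = trimmed.split("\\s+");
        if (tokens.length % 2 != 0)
            throw new IllegalArgumentException("Odd number of tokens in schemaLocation: " + schemaLocation);

        for (int i = 0; i < tokens.length; i += 2)
            pairs.put(tokens[i], tokens[i + 1]);
        return pairs;
    }

    public static void main(String[] args) {
        // prints {http://www.example.org=http://www.example.org/example.xsd}
        System.out.println(parse("http://www.example.org http://www.example.org/example.xsd"));
    }
}
```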
The following shows one way to validate this XML in Java.
First we need a SAX parsing handler that receives errors and warnings. For convenience, we also want line numbers in the messages.
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Collects the warnings and errors reported while parsing or validating. */
public class XmlValidationResult extends DefaultHandler
{
    public final List<String> warnings = new ArrayList<String>();
    public final List<String> errors = new ArrayList<String>();
    private Locator locator;

    /**
     * Called by the SAXParser before any other method.
     * @param locator the parser's locator object where you can get line numbers from.
     */
    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void warning(SAXParseException ex) throws SAXException {
        warnings.add(lineNumber()+ex.getMessage());
    }

    @Override
    public void error(SAXParseException ex) throws SAXException {
        errors.add(lineNumber()+ex.getMessage());
    }

    @Override
    public void fatalError(SAXParseException ex) throws SAXException {
        errors.add(lineNumber()+ex.getMessage());
    }

    /** Builds the message prefix; the line number is omitted when no locator was set. */
    private String lineNumber() {
        return "Exception during validation"
            +((locator != null) ? " at line "+locator.getLineNumber() : "")
            +": ";
    }
}
Using this handler, we can now check the XML for validity.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public static XmlValidationResult validateXml(byte [] documentBytes) {
    final InputSource saxSource = new InputSource(new ByteArrayInputStream(documentBytes));
    final SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setValidating(true);
    final XmlValidationResult errorHandler = new XmlValidationResult();
    try {
        final SAXParser parser = factory.newSAXParser();
        // switch from DTD validation to XML Schema validation
        parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", XMLConstants.W3C_XML_SCHEMA_NS_URI);
        parser.parse(saxSource, errorHandler);
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
        errorHandler.errors.add("Unexpected parsing error: "+e.getMessage());
    }
    return errorHandler;
}
For documentation of the classes used, please read their JavaDoc. Unfortunately there isn't a public String constant for "http://java.sun.com/xml/jaxp/properties/schemaLanguage" anywhere in the standard API, although the property name is a fixed value.
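Since no public constant exists, one option is to name the property yourself. The following is a minimal sketch under stated assumptions: the class JaxpSchemaProperties and its constant names are hypothetical. To stay runnable without network access, it also sets the companion JAXP property "http://java.sun.com/xml/jaxp/properties/schemaSource", which is not used in the code above and hands the schema to the parser directly instead of relying on a location inside the XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical holder class; only the property values are defined by JAXP.
final class JaxpSchemaProperties {

    static final String SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";

    /** Validates xml against the schema text xsd and returns the validation error messages. */
    static List<String> validate(String xml, String xsd) throws Exception {
        final List<String> errors = new ArrayList<>();
        final SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);

        final SAXParser parser = factory.newSAXParser();
        // The schema language has to be set before the schema source.
        parser.setProperty(SCHEMA_LANGUAGE, XMLConstants.W3C_XML_SCHEMA_NS_URI);
        parser.setProperty(SCHEMA_SOURCE, new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8)));

        final DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException ex) {
                errors.add("line " + ex.getLineNumber() + ": " + ex.getMessage());
            }
        };
        parser.parse(new InputSource(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))), handler);
        return errors;
    }

    public static void main(String[] args) throws Exception {
        final String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='example'><xs:complexType><xs:sequence>"
            + "<xs:element name='title' type='xs:string'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";
        System.out.println(validate("<example><title>Hi</title></example>", xsd));
        System.out.println(validate("<example><unknown/></example>", xsd));
    }
}
```

The first call should report no errors, the second at least one, because the element unknown is not declared in the schema.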
<?xml version="1.0" encoding="UTF-8"?>
<example>
  <title>....</title>
  <summary>....</summary>
  <content>....</content>
</example>
So here we have some XML that does not declare its schema, and we want to know if it conforms to http://www.example.org/example.xsd.
The following source validates this XML when the schema is passed as a Source parameter.
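import java.io.ByteArrayInputStream;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public static XmlValidationResult validateAgainstSchema(Source schemaSource, byte [] documentBytes) {
    final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    try {
        final Schema schema = schemaFactory.newSchema(schemaSource);
        final Validator validator = schema.newValidator();
        final XmlValidationResult errorHandler = new XmlValidationResult();
        validator.setErrorHandler(errorHandler);
        validator.validate(new StreamSource(new ByteArrayInputStream(documentBytes)));
        return errorHandler;
    }
    catch (Exception e) {
        // keep the original exception as cause so the stack trace is not lost
        throw new RuntimeException("Unexpected validation error: "+e.getMessage(), e);
    }
}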
This implementation uses the javax.xml.validation API introduced in Java 1.5.
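One detail worth keeping in mind: a Validator only knows the ErrorHandler interface, which has no setDocumentLocator method, so a handler that takes its line numbers from a Locator will not have one in this setup. The following self-contained sketch shows the same validation technique with an inline schema, reading the line number from the SAXParseException instead. The class names ExternalSchemaDemo and CollectingHandler are hypothetical and chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the original sources.
final class ExternalSchemaDemo {

    /** Collects all messages, taking the line number from the exception itself. */
    static final class CollectingHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<>();

        @Override public void warning(SAXParseException ex) { add(ex); }
        @Override public void error(SAXParseException ex) { add(ex); }
        @Override public void fatalError(SAXParseException ex) { add(ex); }

        private void add(SAXParseException ex) {
            final int line = ex.getLineNumber(); // -1 when the line is not known
            messages.add((line > 0 ? "line " + line + ": " : "") + ex.getMessage());
        }
    }

    /** Validates the XML text against the schema text and returns the collected messages. */
    static List<String> validate(String xsd, String xml) throws Exception {
        final SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        final Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        final Validator validator = schema.newValidator();
        final CollectingHandler handler = new CollectingHandler();
        validator.setErrorHandler(handler);
        validator.validate(new StreamSource(new StringReader(xml)));
        return handler.messages;
    }

    public static void main(String[] args) throws Exception {
        final String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='example'><xs:complexType><xs:sequence>"
            + "<xs:element name='title' type='xs:string'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";
        // The schema requires a title element, so this document is reported as invalid.
        System.out.println(validate(xsd, "<example>\n<summary>text</summary>\n</example>"));
    }
}
```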
The preferred way to drive validation is surely the one with an internally given schema, i.e. a schema referenced inside the XML itself, because this gives the user the chance to alter the schema after deployment of the application. Otherwise the application would have to maintain a compiled-in mapping of XML files to schemas.
A special problem with an internally given schema arises when you have schema files packed into an application JAR file. Imagine a user edits some XML, and the application has to validate that XML against one of these schemas. The user names the schema by a relative or absolute path instead of an http URI.
<?xml version="1.0" encoding="UTF-8"?>
<addresses
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation='/absolute/path/in/jar/test.xsd'>
  <address>
    <name>Joe Tester</name>
    <street>Baker street 5</street>
  </address>
</addresses>
This is the simplest way to give XML a schema. The noNamespaceSchemaLocation attribute can contain just one schema location, not namespace-location pairs like schemaLocation.
The XML parser will not be able to locate this schema reference. You will get a message like
cvc-elt.1: Cannot find the declaration of element ....
But you can tell the validator how to load the schema via the org.w3c.dom.ls API (LS = Load and Save).
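import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

public static XmlValidationResult validateAgainstSchemaInClasspath(byte [] documentBytes) {
    final SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    try {
        // a schema without source validates against the locations named in the XML
        final Schema schema = factory.newSchema();
        final Validator validator = schema.newValidator();
        final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        validator.setResourceResolver(new LSResourceResolver() {
            @Override
            public LSInput resolveResource(String type, String namespaceURI, String publicId, String systemId, String baseURI) {
                // systemId is the path given in the XML, it is looked up in the classpath
                final InputStream in = getClass().getResourceAsStream(systemId);
                if (in == null)
                    return null;    // not in classpath, let the validator try its default lookup
                final DOMImplementationLS domImplementationLS = (DOMImplementationLS) registry.getDOMImplementation("LS");
                final LSInput input = domImplementationLS.createLSInput();
                input.setByteStream(in);
                return input;
            }
        });
        final XmlValidationResult errorHandler = new XmlValidationResult();
        validator.setErrorHandler(errorHandler);
        validator.validate(new StreamSource(new ByteArrayInputStream(documentBytes)));
        return errorHandler;
    }
    catch (Exception e) {
        throw new RuntimeException("Unexpected validation error: "+e.getMessage(), e);
    }
}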
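The resolver technique is not tied to the classpath. As a minimal sketch for experimenting without a JAR file, the following variant serves schemas from an in-memory map keyed by the path written in the XML. The class InMemoryResolverDemo and the path /schemas/test.xsd are assumptions made for this illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of the original sources.
final class InMemoryResolverDemo {

    /** Validates xml, resolving the schema location named in it against the given map. */
    static List<String> validate(String xml, Map<String, String> schemasByPath) throws Exception {
        final DOMImplementationLS ls = (DOMImplementationLS)
            DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        final Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema()    // no schema source: use the location hints in the XML
            .newValidator();

        validator.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
            final String xsd = schemasByPath.get(systemId);
            if (xsd == null)
                return null;    // unknown path, fall back to the default lookup
            final LSInput input = ls.createLSInput();
            input.setByteStream(new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8)));
            return input;
        });

        final List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException ex) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException ex) { errors.add(ex.getMessage()); }
            @Override public void fatalError(SAXParseException ex) { errors.add(ex.getMessage()); }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        final String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='addresses'><xs:complexType><xs:sequence>"
            + "<xs:element name='address' type='xs:string' maxOccurs='unbounded'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";
        final String xml =
            "<addresses xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
            + " xsi:noNamespaceSchemaLocation='/schemas/test.xsd'>"
            + "<address>Baker street 5</address></addresses>";
        System.out.println(validate(xml, Collections.singletonMap("/schemas/test.xsd", xsd)));
    }
}
```

Returning null for unknown paths leaves the default lookup to the validator, so an unresolvable schema still surfaces as a validation error rather than being silently accepted.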
What can you do with such a validation?
To try this out, here is the source of the XML schema used in this example.
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name="addresses">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="address" minOccurs='1' maxOccurs='unbounded' />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name" minOccurs='0' maxOccurs='1' />
        <xs:element ref="street" minOccurs='0' maxOccurs='1' />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type='xs:string' />
  <xs:element name="street" type='xs:string' />
</xs:schema>
ɔ⃝ Fritz Ritzberger, 2017-05-29