Validating XML - xmlunit/user-guide GitHub Wiki

Validator

The core class of XMLUnit's validation support is Validator. You obtain an instance of it via a forLanguage factory method.

The "languages" are specified as URIs and support is different between Java and .NET. Constants defined in the Languages class hold the URIs supported.

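| Language       | Name in Languages     | Supported for Java | Supported for .NET |
|----------------|-----------------------|--------------------|--------------------|
| W3C XML Schema | W3C_XML_SCHEMA_NS_URI | X                  | X                  |
| DTD            | XML_DTD_NS_URI        | X                  | X                  |
| XDR            | XDR_NS_URI            | -                  | X                  |
| RELAX NG       | RELAXNG_NS_URI        | *                  | -                  |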

XDR support for .NET has been deprecated since .NET 2.0, so it may stop working in future versions. RELAX NG (marked * above) only works if you manage to configure the JAXP validation framework properly and add additional third party libraries.

In the Java case there are actually two implementations of Validator. ParsingValidator, which delegates schema validation to the XML parser, is used for DTDs. JAXPValidator, which sits on top of the javax.xml.validation package, is used for all other languages.

The remainder of this section will loosely use the term "schema" for "the document holding the schema or DTD definition".

The schemas (or DTDs) needed for validation can be specified as Sources (see "Providing Input to XMLUnit"). When validating against a DTD with XMLUnit for Java you may need to set schemaURI to the Public ID of the DTD so that XMLUnit's EntityResolver can provide it to the parser.
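To make the role of the resolver concrete, here is a minimal JDK-only sketch of the underlying mechanism. It is not XMLUnit's own code: a validating SAX parser is handed an in-memory DTD by an EntityResolver keyed on the Public ID. The class name, the Public ID and the DTD text are assumptions made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class DtdResolverSketch {
    // Hypothetical Public ID and DTD text, made up for this example.
    static final String PUBLIC_ID = "-//Example//DTD Book 1.0//EN";
    static final String DTD = "<!ELEMENT book (title)><!ELEMENT title (#PCDATA)>";

    /** Validates xml against DTD, which is served by Public ID; returns the problems found. */
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // DTD validation is done by the parser itself
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // Serve the in-memory DTD whenever the document names our Public ID;
            // returning null falls back to the parser's default resolution.
            reader.setEntityResolver((publicId, systemId) ->
                PUBLIC_ID.equals(publicId) ? new InputSource(new StringReader(DTD)) : null);
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // Validity errors are recoverable, so record them and keep parsing.
                    problems.add(e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage());
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Fatal (well-formedness) errors, I/O and configuration failures end up here.
            problems.add(String.valueOf(e.getMessage()));
        }
        return problems;
    }

    public static void main(String[] args) {
        String doctype = "<!DOCTYPE book PUBLIC \"" + PUBLIC_ID + "\" \"book.dtd\">";
        System.out.println(validate(doctype + "<book><title>XMLUnit</title></book>"));
        System.out.println(validate(doctype + "<book><author>nobody</author></book>"));
    }
}
```

The System ID in the DOCTYPE ("book.dtd") is never fetched here because the resolver answers first. That is the reason a matching Public ID matters when the DTD is supplied as a Source.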

If you don't specify any sources for the schema(s) then the parser or validator will try to resolve the URIs and "SYSTEM ID"s specified inside the document itself in order to find the schema document(s).

Result of Validation

No matter whether you validate a document against a schema or the schema document itself, the result of the validation is an instance of ValidationResult which contains a boolean flag indicating whether the document has been validated successfully and a potentially empty collection of ValidationProblems.

A ValidationProblem may specify an exact location inside the document as line and column, but either may be ValidationProblem.UNKNOWN. It also has a type that reflects the severity of the problem and a message.
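The shape of such a result can be sketched with plain JAXP. This is an illustration of the idea and not XMLUnit's implementation: an ErrorHandler turns every SAXParseException into a small record holding severity, line, column and message. The Problem class, the severity strings and the sample schema are hypothetical; SAX itself reports -1 when a line or column is not available.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class ProblemCollectingSketch {
    // Hypothetical sample schema: a book element holding exactly one title.
    static final String BOOK_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='book'><xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Hypothetical stand-in for a validation problem; not XMLUnit's ValidationProblem. */
    static final class Problem {
        final String severity; // "WARNING" or "ERROR"
        final int line;        // -1 when the parser cannot tell
        final int column;      // -1 when the parser cannot tell
        final String message;

        Problem(String severity, SAXParseException e) {
            this.severity = severity;
            this.line = e.getLineNumber();
            this.column = e.getColumnNumber();
            this.message = e.getMessage();
        }
    }

    /** Validates xml against xsd and returns every problem reported; empty means valid. */
    static List<Problem> validate(String xsd, String xml) {
        List<Problem> problems = new ArrayList<>();
        ErrorHandler collector = new ErrorHandler() {
            @Override public void warning(SAXParseException e) { problems.add(new Problem("WARNING", e)); }
            @Override public void error(SAXParseException e) { problems.add(new Problem("ERROR", e)); }
            @Override public void fatalError(SAXParseException e) { problems.add(new Problem("ERROR", e)); }
        };
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.setErrorHandler(collector);
            Validator validator = factory.newSchema(new StreamSource(new StringReader(xsd))).newValidator();
            validator.setErrorHandler(collector);
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            // The parser gave up; the collector has normally recorded the cause already.
            if (problems.isEmpty()) {
                problems.add(new Problem("ERROR", new SAXParseException(e.getMessage(), null)));
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        for (Problem p : validate(BOOK_XSD, "<book><author>nobody</author></book>")) {
            System.out.println(p.severity + " at " + p.line + ":" + p.column + " " + p.message);
        }
    }
}
```

An empty list plays the role of the "valid" flag here, while XMLUnit keeps the flag and the collection of problems side by side.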

Validating Documents

Most people will try to validate XML documents against one or more schemas. This is the job of Validator's validateInstance method. For example

Validator v = Validator.forLanguage(Languages.W3C_XML_SCHEMA_NS_URI);
v.setSchemaSource(new StreamSource("Book.xsd"));
ValidationResult r = v.validateInstance(new StreamSource(new File("Book.xml")));
assertTrue(r.isValid());

for Java, or in C#

Validator v = Validator.ForLanguage(Languages.W3C_XML_SCHEMA_NS_URI);
v.SchemaSource = new StreamSource("Book.xsd");
ValidationResult r = v.ValidateInstance(new StreamSource("Book.xml"));
Assert.IsTrue(r.Valid);

verifies Book.xml can be validated against Book.xsd.

When using JAXPValidator, validating a document will also validate the schema document(s) specified as schemaSource as a side effect.

When Java's ParsingValidator is used for a DTD, you may need to explicitly set the systemId of the schema source for it to get applied. Something like

Validator v = Validator.forLanguage(Languages.XML_DTD_NS_URI);
StreamSource dtd = new StreamSource(getClass().getResourceAsStream("/My.dtd"));
dtd.setSystemId(getClass().getResource("/My.dtd").toURI().toString());
v.setSchemaSource(dtd);
ValidationResult r = v.validateInstance(xmlDocument);
assertTrue(r.isValid());

Validating Schemas

When you are defining a schema yourself, you may want to verify the schema document itself is valid. After all, the schema document is an XML document as well (at least in the case of W3C's XML Schema). In this case schemaSource is a mandatory property and you use the validateSchema method.

Validator v = Validator.forLanguage(Languages.W3C_XML_SCHEMA_NS_URI);
v.setSchemaSource(new StreamSource("Book.xsd"));
ValidationResult r = v.validateSchema();
assertTrue(r.isValid());

for Java, or in C#

Validator v = Validator.ForLanguage(Languages.W3C_XML_SCHEMA_NS_URI);
v.SchemaSource = new StreamSource("Book.xsd");
ValidationResult r = v.ValidateSchema();
Assert.IsTrue(r.Valid);
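What checking a schema document amounts to can be sketched in Java with plain JAXP, which is the layer JAXPValidator is described as sitting on. This is an approximation under that assumption and not XMLUnit's code: compiling the schema with SchemaFactory.newSchema reports any problems in the schema itself. The class name and both sample schemas are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class SchemaCheckSketch {
    // Hypothetical schemas: the second one refers to a type that does not exist.
    static final String GOOD_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='title' type='xs:string'/></xs:schema>";
    static final String BAD_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='title' type='xs:nosuchtype'/></xs:schema>";

    /** Compiles the schema document and returns the problems found in it. */
    static List<String> schemaProblems(String xsd) {
        List<String> problems = new ArrayList<>();
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { problems.add("warning: " + e.getMessage()); }
            @Override public void error(SAXParseException e) { problems.add("error: " + e.getMessage()); }
            @Override public void fatalError(SAXParseException e) { problems.add("fatal: " + e.getMessage()); }
        });
        try {
            // No instance document is involved; only the schema itself is processed.
            factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // Compilation was aborted; keep the message if the handler saw nothing.
            if (problems.isEmpty()) {
                problems.add(String.valueOf(e.getMessage()));
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(schemaProblems(GOOD_XSD));
        System.out.println(schemaProblems(BAD_XSD));
    }
}
```

BAD_XSD is well-formed XML, so the problem it triggers is a schema error and not a parsing error.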

ValidationMatcher and SchemaValidConstraint

XMLUnit for Java provides a Hamcrest matcher named ValidationMatcher and XMLUnit.NET has an NUnit Constraint [1] named SchemaValidConstraint that can be used to validate schema instances. They only support the W3C XML Schema language.

You use the Hamcrest matcher as in

import static org.xmlunit.matchers.ValidationMatcher.valid;
import static org.hamcrest.CoreMatchers.is;

...

assertThat(new StreamSource(new File("Book.xml")),
           is(valid(new StreamSource(new File("Book.xsd")))));

and the NUnit constraint like

Assert.That(new StreamSource("Book.xml"),
            new SchemaValidConstraint(new StreamSource("Book.xsd")));

in order to verify Book.xml can be validated against Book.xsd.

1: Actually there are two of them, one for NUnit 2.x and one for NUnit 3.x.

XXE Prevention

Whenever you parse XML there is the danger of being vulnerable to XML External Entity Processing (XXE for short).

XMLUnit for Java

Even after the changes in 2.6.0 the default configurations of SAXParserFactory and SchemaFactory used by ParsingValidator and JAXPValidator respectively are not XXE safe. This is because it is more likely that you explicitly want to load external DTDs/Schemas and expand external entities when validating.

If you are concerned about XXE you must pass in factory instances to the respective validators explicitly.
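As a rough sketch of what such factory instances might look like, the following uses only standard JAXP and SAX settings: the JAXP 1.5 access properties on SchemaFactory and the SAX external-entity features on SAXParserFactory. The class and method names are hypothetical, and how the factories are handed to ParsingValidator or JAXPValidator is deliberately not shown because it depends on the XMLUnit version you use.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class HardenedFactoriesSketch {
    /** SchemaFactory that refuses to fetch external DTDs, entities and schemas. */
    static SchemaFactory hardenedSchemaFactory() {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            // An empty protocol list denies every external lookup.
            factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        } catch (SAXException e) {
            throw new IllegalStateException("JAXP 1.5 access properties are not supported", e);
        }
        return factory;
    }

    /** SAXParserFactory that does not expand external entities. */
    static SAXParserFactory hardenedSaxParserFactory() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
            // Also stops the parser from reading the external DTD subset.
            factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("external-entity features are not supported", e);
        }
        return factory;
    }

    public static void main(String[] args) {
        System.out.println(hardenedSchemaFactory().getClass().getName());
        System.out.println(hardenedSaxParserFactory().getClass().getName());
    }
}
```

Note the trade-off: with these settings schemas that import or include other documents, and DTDs that live outside the document, can no longer be loaded by the parser, so they have to be supplied by other means or the restrictions relaxed selectively.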

XMLUnit.NET

The validation process uses XmlSchema and ISource internally. When parsing the sources the section about input applies, so validation should be XXE safe by default with XMLUnit 2.6.0, and it always has been when running on .NET 4.5.2 or higher.
