Validating XML - xmlunit/user-guide GitHub Wiki
The core class of XMLUnit's validation support is Validator. You obtain an instance of it via a forLanguage factory method.
The "languages" are specified as URIs and support is different between Java and .NET. Constants defined in the Languages class hold the URIs supported.
Language | Name in Languages | Supported for Java | Supported for .NET |
---|---|---|---|
W3C XML Schema | W3C_XML_SCHEMA_NS_URI | X | X |
DTD | XML_DTD_NS_URI | X | X |
XDR | XDR_NS_URI | - | X |
RELAX NG | RELAXNG_NS_URI | * | - |
XDR support for .NET has been deprecated since .NET 2.0 and may stop working in future versions. RELAX NG (marked * in the table) only works if you manage to configure the JAXP validation framework properly and add additional third party libraries.
In the Java case there are actually two implementations of Validator. For DTDs, ParsingValidator is used, which delegates schema validation to the XML parser. JAXPValidator, which sits on top of the javax.xml.validation package, is used for all other languages.
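As a rough illustration of why support differs per language on the Java side, the sketch below probes the JDK's javax.xml.validation package directly instead of going through XMLUnit. It assumes that the URIs held by XMLUnit's Languages constants match the JDK's XMLConstants values of the same name. The class and method names are made up for this example.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

// Hypothetical helper, not part of XMLUnit: asks JAXP whether a schema
// language URI has a SchemaFactory implementation available.
class SchemaLanguageProbe {

    static boolean isSupported(String languageUri) {
        try {
            SchemaFactory.newInstance(languageUri);
            return true;
        } catch (IllegalArgumentException e) {
            // JAXP reports "no implementation for this language" this way.
            return false;
        }
    }

    // Summarizes support for the three languages that have JDK constants.
    static String report() {
        return "W3C XML Schema: " + isSupported(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            + ", RELAX NG: " + isSupported(XMLConstants.RELAXNG_NS_URI)
            + ", DTD: " + isSupported(XMLConstants.XML_DTD_NS_URI);
    }
}
```

On a JDK without extra libraries one would expect only W3C XML Schema to be reported as supported, which is consistent with DTD validation being delegated to the parser and RELAX NG needing additional setup.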
The remainder of this section will loosely use the term "schema" for "the document holding the schema or DTD definition".
The schemas (or DTDs) needed for validation can be specified as Sources (see "Providing Input to XMLUnit"). When validating against a DTD with XMLUnit for Java, you may need to set schemaURI to the Public ID of the schema so that XMLUnit's EntityResolver can provide the schema to the parser.
If you don't specify any sources for the schema(s) then the parser or validator will try to resolve the URIs and "SYSTEM ID"s specified inside the document itself in order to find the schema document(s).
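The resolution mechanism can be sketched with plain JAXP. This is not XMLUnit's EntityResolver, only an illustration of the idea: a validating SAX parser asks a resolver for the DTD named in the document's DOCTYPE, and the resolver hands back a DTD from memory instead of letting the parser fetch the URI. The class name, the DTD and the book.dtd system ID are assumptions made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: DTD validation where the DTD is supplied by a resolver.
class DtdValidationSketch {

    static final String BOOK_DTD =
        "<!ELEMENT Book (Title)>\n<!ELEMENT Title (#PCDATA)>";

    // Returns the validation error messages, empty if the document is valid.
    static List<String> validate(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    // Serve the in-memory DTD for the system ID used in the DOCTYPE;
                    // returning null falls back to the parser's default lookup.
                    if (systemId != null && systemId.endsWith("book.dtd")) {
                        return new InputSource(new StringReader(BOOK_DTD));
                    }
                    return null;
                }

                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
        } catch (Exception e) {
            // Malformed XML or parser configuration problems end up here.
            throw new IllegalStateException(e);
        }
        return errors;
    }
}
```

Calling DtdValidationSketch.validate with a document that starts with a DOCTYPE pointing at book.dtd returns an empty list when the content matches the DTD, and one message per violation otherwise.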
No matter whether you validate a document against a schema or the schema document itself, the result of the validation is an instance of ValidationResult, which contains a boolean flag indicating whether the document has been validated successfully and a potentially empty collection of ValidationProblems.
A ValidationProblem may specify an exact location inside the document as line and column, but either may be ValidationProblem.UNKNOWN. Each problem also has a type that reflects its severity, and a message.
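To make the shape of this information concrete, here is a sketch that collects comparable data (severity, line, column, message) using only the JDK's javax.xml.validation package. It is not XMLUnit's implementation and does not use ValidationResult or ValidationProblem. The class name, the schema and the string format are assumptions for illustration. In this JDK API an unknown line or column is reported as -1.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical sketch; Validator here is javax.xml.validation.Validator,
// not XMLUnit's class of the same name.
class JaxpProblemCollector {

    static final String BOOK_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Book'><xs:complexType><xs:sequence>"
        + "<xs:element name='Title' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns one description per problem; an empty list means "valid".
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(BOOK_XSD)))
                .newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    problems.add(describe("WARNING", e));
                }

                @Override
                public void error(SAXParseException e) {
                    problems.add(describe("ERROR", e));
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // malformed XML: give up instead of collecting
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    static String describe(String severity, SAXParseException e) {
        return severity + " at line " + e.getLineNumber()
            + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }
}
```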
Most people will try to validate XML documents against one or more schemas. This is the job of Validator's validateInstance method.
For example
Validator v = Validator.forLanguage(Languages.W3C_XML_SCHEMA_NS_URI);
v.setSchemaSource(new StreamSource("Book.xsd"));
ValidationResult r = v.validateInstance(new StreamSource(new File("Book.xml")));
assertTrue(r.isValid());
for Java, or in C#
Validator v = Validator.ForLanguage(Languages.W3C_XML_SCHEMA_NS_URI);
v.SchemaSource = new StreamSource("Book.xsd");
ValidationResult r = v.ValidateInstance(new StreamSource("Book.xml"));
Assert.IsTrue(r.Valid);
verifies Book.xml can be validated against Book.xsd.
When using JAXPValidator, validating a document will validate the schema document(s) specified as schemaSource as a side effect.
When Java's ParsingValidator is used for a DTD, you may need to explicitly set the systemId of the schema source for it to get applied. Something like
Validator v = Validator.forLanguage(Languages.XML_DTD_NS_URI);
StreamSource dtd = new StreamSource(getClass().getResourceAsStream("/My.dtd"));
dtd.setSystemId(getClass().getResource("/My.dtd").toURI().toString());
v.setSchemaSource(dtd);
ValidationResult r = v.validateInstance(xmlDocument);
assertTrue(r.isValid());
When you are defining a schema yourself, you may want to verify the schema document itself is valid - after all the schema document is an XML document as well (at least in the case of W3C's XML Schema). In this case schemaSource is a mandatory property and you use the validateSchema method.
Validator v = Validator.forLanguage(Languages.W3C_XML_SCHEMA_NS_URI);
v.setSchemaSource(new StreamSource("Book.xsd"));
ValidationResult r = v.validateSchema();
assertTrue(r.isValid());
for Java, or in C#
Validator v = Validator.ForLanguage(Languages.W3C_XML_SCHEMA_NS_URI);
v.SchemaSource = new StreamSource("Book.xsd");
ValidationResult r = v.ValidateSchema();
Assert.IsTrue(r.Valid);
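For comparison, a schema check of this kind can be sketched with the JDK's javax.xml.validation package alone: compiling a schema fails with a SAXException when the schema document is not a valid W3C XML Schema. This is an illustration of the underlying JAXP behaviour, not of how validateSchema is implemented, and the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch: treats "the schema compiles" as "the schema is valid".
class SchemaSelfCheck {

    static boolean isValidSchema(String xsd) {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            // With no ErrorHandler set, errors in the schema are thrown.
            factory.newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

A schema that declares an element with a type that does not exist, for instance, makes isValidSchema return false.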
XMLUnit for Java provides a Hamcrest matcher named ValidationMatcher, and XMLUnit.NET has an NUnit Constraint¹ named SchemaValidConstraint. Both can be used to validate schema instances. They only support the W3C XML Schema language.
You use the Hamcrest matcher as in
import static org.xmlunit.matchers.ValidationMatcher.valid;
import static org.hamcrest.CoreMatchers.is;
...
assertThat(new StreamSource(new File("Book.xml")),
is(valid(new StreamSource(new File("Book.xsd")))));
and the NUnit constraint like
Assert.That(new StreamSource("Book.xml"),
new SchemaValidConstraint(new StreamSource("Book.xsd")));
in order to verify Book.xml can be validated against Book.xsd.
1: Actually there are two of them, one for NUnit 2.x and one for NUnit 3.x.
Whenever you parse XML there is the danger of being vulnerable to XML External Entity Processing - XXE for short.
Even after the changes in XMLUnit 2.6.0, the default configurations of SAXParserFactory and SchemaFactory used by ParsingValidator and JAXPValidator respectively are not XXE safe. This is because it is more likely that you explicitly want to load external DTDs/Schemas and expand external entities when validating.
If you are concerned about XXE, you must pass factory instances to the respective validators explicitly.
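A restricted SchemaFactory could be prepared along the following lines. This is a sketch using standard JAXP properties, not a configuration recommended by XMLUnit, and the class name is hypothetical. How the factory is then handed to the validator is left out because the exact API is not described here. Note that these settings also block schemas that legitimately import or include other files.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch: a SchemaFactory that refuses external DTDs and schemas.
class RestrictedSchemaFactory {

    static SchemaFactory create() {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // An empty string means "no protocol is allowed".
            factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        } catch (SAXException e) {
            throw new IllegalStateException(
                "JAXP implementation does not support the access properties", e);
        }
        return factory;
    }

    // Self-contained schemas still compile with the restricted factory.
    static boolean compiles(String xsd) {
        try {
            create().newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```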
In XMLUnit.NET the validation process uses XmlSchema and ISource internally. When parsing the sources, the section about input applies, so it should be XXE safe by default with XMLUnit 2.6.0 and has been all along for .NET 4.5.2 or higher.