Providing Input to XMLUnit - xmlunit/user-guide GitHub Wiki
All core parts of XMLUnit use a single abstraction for the "pieces of XML" they are supposed to work on. For Java this is javax.xml.transform.Source, and for .NET we've created Org.XmlUnit.ISource, which is basically a wrapper around an XmlReader.
For Java, many implementations of this interface are part of the Java class library; for .NET we've added the corresponding implementations:
- ReaderSource: just wraps an existing XmlReader
- DOMSource: creates a Source from an XmlNode
- StreamSource: creates a Source from a TextReader, a Stream or a string holding a URI
- LinqSource: creates a Source from an XNode
At the time of this writing there is no XML-Serialization based equivalent of JAXBSource for .NET.
To make it easier to create instances of Source or ISource, there is a builder that provides a fluent API.
CommentLessSource is a decorator of a different source and provides XML that consists of the original source's content with all comments removed.
Use this wrapper if you want XMLUnit to ignore comments. This class is used under the covers if you tell DiffBuilder to ignore comments.
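The following minimal JDK-only sketch shows what "all comments removed" means on a DOM tree. It is not how CommentLessSource is implemented, and CommentStripSketch and its methods are hypothetical names. It simply deletes every comment node, including comments outside the root element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class CommentStripSketch {

    // Parses an XML string with the JDK's default DocumentBuilderFactory.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Recursively removes every comment node below the given node.
    static void removeComments(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before a possible removal
            if (child.getNodeType() == Node.COMMENT_NODE) {
                node.removeChild(child);
            } else {
                removeComments(child);
            }
            child = next;
        }
    }

    // Counts the comment nodes in a subtree.
    static int countComments(Node node) {
        int count = node.getNodeType() == Node.COMMENT_NODE ? 1 : 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countComments(c);
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = parse("<!-- prolog --><root><!-- inner --><a>text</a></root>");
        removeComments(doc);
        System.out.println(countComments(doc)); // 0
    }
}
```

In a real test you would not mutate the DOM yourself. Wrap the input instead, e.g. new CommentLessSource(Input.fromString(xml).build()), or call ignoreComments() on DiffBuilder.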
When using XMLUnit.NET 2.10.0 or later, you may want to use XmlWhitespaceStrippedSource instead of the WhitespaceStrippedSource described next; see below.
WhitespaceStrippedSource is a decorator of a different source that removes all empty text nodes and trims the remaining text nodes.
If you only want to remove "element content whitespace", i.e. text content between XML elements that is just an artifact of "pretty printing" XML, then you should use ElementContentWhitespaceStrippedSource instead.
Empty text nodes are removed:
<element>
</element>
becomes
<element></element>
Text nodes are stripped:
<element>
foo
</element>
becomes
<element>foo</element>
If the XML content has been created in memory rather than parsed from an external source, it could contain adjacent Text nodes, so that
<element>
foo
bar
</element>
could become
<element>foobar</element>
or
<element>
foo
bar
</element>
depending on how the document has been structured. For more control, normalize the input (using Document.normalize() or XmlDocument.Normalize()) before wrapping it in a WhitespaceStrippedSource, or add an additional NormalizedSource wrapper.
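The following JDK-only sketch approximates the stripping described above with String.trim() and reproduces the adjacent-Text-node effect. It is not XMLUnit's code. WhitespaceStripSketch, stripWhitespace and elementWithAdjacentText are hypothetical names, and CDATA sections are deliberately left alone.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class WhitespaceStripSketch {

    // Trims every Text node below 'node' with String.trim() and removes those that become empty.
    static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE) {
                String trimmed = child.getNodeValue().trim();
                if (trimmed.isEmpty()) {
                    node.removeChild(child);
                } else {
                    child.setNodeValue(trimmed);
                }
            } else {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    // Builds <element> with two adjacent Text nodes, which normally only happens for in-memory DOM trees.
    static Element elementWithAdjacentText() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element element = doc.createElement("element");
            element.appendChild(doc.createTextNode("\nfoo\n"));
            element.appendChild(doc.createTextNode("bar\n"));
            doc.appendChild(element);
            return element;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element raw = elementWithAdjacentText();
        stripWhitespace(raw); // each Text node is trimmed on its own
        System.out.println(raw.getTextContent().equals("foobar")); // true

        Element normalized = elementWithAdjacentText();
        normalized.normalize(); // merge adjacent Text nodes first
        stripWhitespace(normalized);
        System.out.println(normalized.getTextContent().equals("foo\nbar")); // true
    }
}
```

Normalizing before stripping is what keeps "foo\nbar" intact. With XMLUnit the same ordering comes from wrapping a NormalizedSource inside the WhitespaceStrippedSource.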
When using XMLUnit.NET 2.10.0 or later, you may want to use XmlWhitespaceNormalizedSource instead of the WhitespaceNormalizedSource described next; see below.
WhitespaceNormalizedSource is a decorator of a different source that replaces all whitespace characters found in Text nodes with space characters and collapses consecutive whitespace characters into a single space.
<element>a
b
</element>
becomes
<element>a b </element>
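Here is a sketch of the replace-and-collapse step on a DOM tree, again using plain JDK classes. It assumes that only XML's four whitespace characters (space, tab, CR, LF) count as whitespace. It does not model any other processing XMLUnit's class may perform. WhitespaceNormalizeSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class WhitespaceNormalizeSketch {

    // Replaces each run of XML whitespace (space, tab, CR, LF) with a single space.
    // Assumption: only XML's four whitespace characters are considered here.
    static String collapse(String s) {
        return s.replaceAll("[ \\t\\r\\n]+", " ");
    }

    // Applies collapse() to the value of every Text node below 'node'.
    static void normalizeText(Node node) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                c.setNodeValue(collapse(c.getNodeValue()));
            } else {
                normalizeText(c);
            }
        }
    }

    // Parses an XML string with the JDK's default DocumentBuilderFactory.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<element>a\n  b\n</element>");
        normalizeText(doc);
        // The brackets make the trailing space visible.
        System.out.println("[" + doc.getDocumentElement().getTextContent() + "]"); // [a b ]
    }
}
```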
NormalizedSource performs XML normalization on the wrapped document.
This means adjacent Text nodes are merged into single nodes and empty Text nodes are removed (recursively). For Java, additional normalizations may be performed when wrapping a Document rather than a Node; see XmlNode.Normalize for .NET, and Node#normalize as well as Document#normalizeDocument for Java.
When reading documents a parser usually puts the document into normalized form anyway. You will only need to perform XML normalization on DOM trees you have created programmatically.
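NormalizedSource builds on the DOM's own normalization. The JDK-only sketch below creates a tree that a parser would never produce, with adjacent and empty Text nodes, and shows what Document.normalize() does to it. NormalizeSketch and unnormalizedRoot are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeSketch {

    // Builds <root> programmatically with adjacent and empty Text nodes.
    static Element unnormalizedRoot() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            root.appendChild(doc.createTextNode("a"));
            root.appendChild(doc.createTextNode(""));  // empty Text node
            root.appendChild(doc.createTextNode("b")); // adjacent to "a" once "" is gone
            root.appendChild(doc.createElement("x"));
            root.appendChild(doc.createTextNode("c"));
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = unnormalizedRoot();
        System.out.println(root.getChildNodes().getLength()); // 5
        root.getOwnerDocument().normalize(); // merges "a" and "b", drops the empty node
        System.out.println(root.getChildNodes().getLength()); // 3
        System.out.println(root.getFirstChild().getNodeValue()); // ab
    }
}
```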
When using XMLUnit.NET 2.10.0 or later, you may want to use XmlElementContentWhitespaceStrippedSource instead of the ElementContentWhitespaceStrippedSource described next; see below.
ElementContentWhitespaceStrippedSource is a decorator of a different source that removes all text nodes consisting solely of whitespace.
Its main use is to remove "element content whitespace", i.e. text content between XML elements that is just an artifact of "pretty printing" XML.
This class was added in XMLUnit 2.6.0.
Empty text nodes are removed:
<element>
</element>
becomes
<element></element>
Text nodes are not stripped:
<element>
foo
</element>
remains
<element>
foo
</element>
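Below is a JDK-only sketch of this behavior. It assumes that "whitespace" means XML's space, tab, CR and LF; that is an assumption, and this is not XMLUnit's code. Whitespace-only Text nodes are dropped, while the text inside <element> keeps its surrounding newlines. ElementContentWhitespaceSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class ElementContentWhitespaceSketch {

    // Removes Text nodes consisting only of XML whitespace; other Text nodes stay untouched.
    static void stripElementContentWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().matches("[ \\t\\r\\n]*")) {
                node.removeChild(child);
            } else {
                stripElementContentWhitespace(child);
            }
            child = next;
        }
    }

    // Parses an XML string with the JDK's default DocumentBuilderFactory.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root>\n  <element>\n  foo\n  </element>\n</root>");
        stripElementContentWhitespace(doc);
        Node root = doc.getDocumentElement();
        System.out.println(root.getChildNodes().getLength()); // 1: only <element> is left
        System.out.println(root.getFirstChild().getTextContent().equals("\n  foo\n  ")); // true
    }
}
```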
With the helper class Input you can create Input.Builder instances that build Source instances:
Source source = Input.fromFile("file:/..../test.xml").build();
or with XSL transformations:
Source source = Input.byTransforming(Input.fromFile("file:/..../test.xml"))
.withStylesheet(Input.fromFile("file:/..../test.xsl"))
.build();
In .NET the code examples are very similar; see the API documentation:
Java: http://www.xmlunit.org/api/java/master/org/xmlunit/builder/Input.html
.NET: http://www.xmlunit.org/api/net/master/Org.XmlUnit.Builder/Input.html
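For comparison, the "transform first, then use the result as input" idea can be expressed with plain JAXP, without XMLUnit. This sketch says nothing about how Input.byTransforming is implemented. TransformSketch, transform, rootOf and RENAME_IN_TO_OUT are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class TransformSketch {

    // Stylesheet that turns <in>text</in> into <out>text</out>.
    static final String RENAME_IN_TO_OUT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"/in\"><out><xsl:value-of select=\".\"/></out></xsl:template>"
            + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet to an XML string and returns the result as a DOM-backed Source.
    static Source transform(String xml, String xslt) {
        try {
            DOMResult result = new DOMResult();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)))
                    .transform(new StreamSource(new StringReader(xml)), result);
            return new DOMSource(result.getNode());
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    // Returns the root element of a DOMSource produced by transform().
    static Element rootOf(Source source) {
        Node node = ((DOMSource) source).getNode();
        return node instanceof Document ? ((Document) node).getDocumentElement() : (Element) node;
    }

    public static void main(String[] args) {
        Element root = rootOf(transform("<in>hello</in>", RENAME_IN_TO_OUT));
        System.out.println(root.getNodeName() + ": " + root.getTextContent()); // out: hello
    }
}
```

With XMLUnit you would follow the builder pattern shown above instead, e.g. Input.byTransforming(Input.fromString(xml)).withStylesheet(Input.fromString(xslt)).build().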
A special case is the helper method Input.from(Object).
This generic method creates a Builder instance depending on the type of the given object:
Java type | .NET type | Description |
---|---|---|
org.xmlunit.builder.Input.Builder | Org.XmlUnit.Builder.Input.IBuilder | Builder that creates an XML Source |
javax.xml.transform.Source | Org.XmlUnit.ISource | XML Source |
org.w3c.dom.Document | System.Xml.XmlDocument | DOM Document |
org.w3c.dom.Node | System.Xml.XmlNode | DOM Node |
- | System.Xml.Linq.XDocument | LINQ to XML document |
- | System.Xml.Linq.XNode | LINQ to XML node |
byte[] | byte[] | byte array holding XML content |
String | string | string holding XML content |
java.io.File | - | file containing XML |
java.net.URL | - | URL of an XML document |
java.net.URI | System.Uri | URI of an XML document |
java.io.InputStream | System.IO.Stream | stream of an XML document |
java.nio.channels.ReadableByteChannel | System.IO.TextReader | ReadableByteChannel or TextReader of an XML document |
A JAXB object | - | object that can be marshalled to XML by javax.xml.bind.JAXB.marshal(...) |
This method simplifies the APIs of DiffBuilder and CompareMatcher, which accept nearly any object as input and turn it into a valid Source.
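To make type-based dispatch concrete, here is a hypothetical and heavily simplified version built only on JAXP classes. It covers just a few Java rows of the table and leaves out builders, channels and JAXB objects. It is not XMLUnit's implementation, and FromObjectSketch and toSource are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URI;
import java.net.URL;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Node;

class FromObjectSketch {

    // Hypothetical dispatch in the spirit of Input.from(Object); only a subset of types is handled.
    static Source toSource(Object o) {
        if (o instanceof Source) {
            return (Source) o;                                    // already a Source
        } else if (o instanceof Node) {
            return new DOMSource((Node) o);                       // a Document is a Node, too
        } else if (o instanceof byte[]) {
            return new StreamSource(new ByteArrayInputStream((byte[]) o));
        } else if (o instanceof String) {
            return new StreamSource(new StringReader((String) o)); // XML content, not a file name
        } else if (o instanceof File) {
            return new StreamSource((File) o);
        } else if (o instanceof URL) {
            return new StreamSource(((URL) o).toExternalForm());  // used as system ID
        } else if (o instanceof URI) {
            return new StreamSource(o.toString());                // used as system ID
        } else if (o instanceof InputStream) {
            return new StreamSource((InputStream) o);
        }
        throw new IllegalArgumentException("unsupported input type: " + o.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(toSource("<a/>").getClass().getSimpleName()); // StreamSource
    }
}
```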
Whenever you parse XML there is the danger of being vulnerable to XML External Entity Processing, XXE for short.
When passing input to XMLUnit, the input is transformed into a DOM document with the help of a DocumentBuilder most of the time. Prior to XMLUnit for Java 2.6.0 the DocumentBuilder used by default was not configured to prevent XXE, as Java's defaults are vulnerable. Starting with XMLUnit 2.6.0 the default DocumentBuilder is configured according to OWASP's XXE Prevention Cheat Sheet.
This means that if you want to protect yourself against XXE and use a version of XMLUnit prior to 2.6.0, you have to explicitly set a DocumentBuilderFactory that is configured properly. Likewise, if you rely on DTD loading or expansion of external entities, you must provide an explicit DocumentBuilderFactory when using XMLUnit 2.6.0 or later.
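Here is a sketch of a DocumentBuilderFactory hardened along the lines of OWASP's XXE Prevention Cheat Sheet, using only JDK classes. The feature URIs are the ones understood by the JDK's built-in Xerces-based parser; other parsers may reject them. XxeSafeParsing and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

class XxeSafeParsing {

    // DocumentBuilderFactory hardened along the lines of OWASP's XXE Prevention Cheat Sheet.
    static DocumentBuilderFactory hardenedFactory() {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            // Reject any DOCTYPE, so no entities can be declared at all.
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            // Defence in depth in case the feature above is switched off later.
            dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
            dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("parser does not support a required feature", e);
        }
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    // Parses with the hardened factory; documents containing a DOCTYPE are rejected.
    static Document parse(String xml) {
        try {
            return hardenedFactory().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("rejected or malformed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<foo>bar</foo>").getDocumentElement().getTextContent()); // bar
        try {
            parse("<!DOCTYPE foo [<!ENTITY e \"x\">]><foo>&e;</foo>");
        } catch (IllegalArgumentException expected) {
            System.out.println("DOCTYPE rejected");
        }
    }
}
```

A factory like this can be handed to XMLUnit, for example through DiffBuilder's withDocumentBuilderFactory method. If your tests need DTD loading, pass a factory without these restrictions instead.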
If you use the legacy module, XXE prevention is disabled by default. Starting with XMLUnit 2.6.0 the XMLUnit class has a new setEnableXXEPrevention method that can be used to enable it.
When using .NET 4.5.2 or newer, the default settings used by XMLUnit.NET have always been safe according to OWASP's XXE Prevention Cheat Sheet. Prior to XMLUnit.NET 2.6.0 there were a few places where XmlDocument was used without explicitly disabling the XmlResolver, which means these places were vulnerable.
If you rely on XmlDocument loading external entities, you will need to provide an XmlResolver of your own starting with XMLUnit.NET 2.6.0.
The XML specification has a very limited set of characters it considers whitespace, while Unicode knows many more whitespace characters.
Some of the sources provided by XMLUnit are used to ignore whitespace differences; they use the trim/Trim methods of the String class respectively. For Java, trim's idea of whitespace is compatible with the XML definition (it also removes some control characters which would be illegal inside an XML document). For .NET things are different, though: Trim uses Unicode's definition of whitespace and thus may hide differences in non-XML whitespace.
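The difference is easiest to see on individual characters. The JDK-only sketch below uses hypothetical names. It defines an XML-only trim based on the four XML whitespace characters and compares it with String.trim(). Both keep U+00A0 NO-BREAK SPACE, which Unicode counts as whitespace, while trim() additionally drops control characters.

```java
class XmlWhitespaceSketch {

    // XML 1.0 whitespace (production S): space, tab, carriage return and line feed.
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Hypothetical trim that only removes XML whitespace from both ends.
    static String xmlTrim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isXmlWhitespace(s.charAt(start))) {
            start++;
        }
        while (end > start && isXmlWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String nbsp = "\u00a0x\u00a0"; // NO-BREAK SPACE: Unicode whitespace, not XML whitespace
        System.out.println(nbsp.trim().equals(nbsp));   // true: trim() keeps it
        System.out.println(xmlTrim(nbsp).equals(nbsp)); // true: so does the XML-only trim
        System.out.println(xmlTrim(" \t x \n"));        // x
        System.out.println("\u0001x".trim());           // x: trim() also drops control characters
    }
}
```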
Starting with XMLUnit.NET 2.10.0, the new sources XmlWhitespaceStrippedSource, XmlWhitespaceNormalizedSource and XmlElementContentWhitespaceStrippedSource have been added; they only act on whitespace by XML's definition.
This means Java's WhitespaceStrippedSource acts more like .NET's XmlWhitespaceStrippedSource than like .NET's WhitespaceStrippedSource, and the same is true for the other sources. "Fixing" the original .NET sources would have broken too many existing tests, so new types have been added.
Java 11 introduced a new strip method on the String class that acts much like .NET's Trim. It could be used to implement Source types that behave like .NET's WhitespaceStrippedSource, WhitespaceNormalizedSource and ElementContentWhitespaceStrippedSource respectively.
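Here is a sketch of that idea, assuming Java 11 or later. The helper below trims Text nodes with strip() instead of trim(). Note that strip() relies on Character.isWhitespace, which excludes no-break spaces such as U+00A0 that .NET's Char.IsWhiteSpace counts as whitespace, so the match with .NET is close rather than exact. StripSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class StripSketch {

    // Trims Text nodes with String.strip() (Character.isWhitespace) and drops those that become empty.
    static void stripTextNodesUnicode(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE) {
                String stripped = child.getNodeValue().strip();
                if (stripped.isEmpty()) {
                    node.removeChild(child);
                } else {
                    child.setNodeValue(stripped);
                }
            } else {
                stripTextNodesUnicode(child);
            }
            child = next;
        }
    }

    // Parses an XML string with the JDK's default DocumentBuilderFactory.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String em = "\u2003x\u2003"; // EM SPACE: Unicode whitespace, not XML whitespace
        System.out.println(em.trim().length());  // 3: trim() keeps it
        System.out.println(em.strip().length()); // 1: strip() removes it

        Document doc = parse("<e>&#x2003;foo&#x2003;</e>");
        stripTextNodesUnicode(doc);
        System.out.println(doc.getDocumentElement().getTextContent()); // foo
    }
}
```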