SelectingNodes - xmlunit/user-guide GitHub Wiki
Sometimes you are not interested in the whole of a document but only
want to compare parts of it, for example when your document contains a
lot of boilerplate XML and you are just filling in a small part of
it. You can use a NodeFilter
to tell XMLUnit which parts it should
ignore and which to compare.
Once XMLUnit is focused on the interesting parts, it may need help to
pick the correct pairs of XML nodes to compare. The strictest
scenario is one where the trees must be completely identical and the
order of nodes is significant at every level - but there is a
surprisingly large number of use cases where order is completely
irrelevant. NodeMatcher
is responsible for telling XMLUnit which
nodes of the two documents it compares need to be matched with each
other.
It may help to think of the process XMLUnit applies when comparing documents as a series of four separate steps:
- remove all child nodes of any given element that you are not
  interested in. This is what NodeFilter does.
- match the nodes - and in particular the XML elements - to compare
  with each other among the children of any given element. This is
  NodeMatcher's job.
- for each element remove all attributes you are not interested
  in. AttributeFilter is responsible for this.
- after all this, compare the nodes using DifferenceEvaluator.
If you make NodeMatcher
too picky there won't be anything left for
the difference engine to compare and you end up with a bunch of
CHILD_LOOKUP
differences which are XMLUnit's way of saying "I
haven't found anything to compare this node to". If you make it too
lenient, XMLUnit is going to compare nodes you didn't intend it to
compare.
In order to properly use NodeFilter
and NodeMatcher
it is crucial
to understand that XMLUnit traverses the document from its root
element to the leaves in a depth-first approach and whenever it
encounters an XML element, it consults NodeFilter
to prune the child
nodes that are not interesting and NodeMatcher
to pick the branches
of the two XML documents that should get compared. Once a branch has
been chosen, there is no going back.
For example, assume a control document of
<table>
<tbody>
<tr>
<th>some key</th>
<td>some value</td>
</tr>
<tr>
<th>another key</th>
<td>another value</td>
</tr>
</tbody>
</table>
and a test document of
<table>
<tbody>
<tr>
<th>another key</th>
<td>another value</td>
</tr>
<tr>
<th>some key</th>
<td>some value</td>
</tr>
</tbody>
</table>
If your requirement is to ignore the order of <tr>
s but identify
matching rows based on the textual content of the <th>
nodes, then
NodeMatcher
must already select the "correct" <tr>
elements when
it gets passed in the children of <tbody>
. Once XMLUnit is set on
the <tr>
branches, there is no way to match nodes from one branch to
those of another one.
This is why you can't just say "match elements based on their name and
textual content" because any two <tr>
s have the same element name
and the same textual content - none at all if ignoring element content
whitespace. Therefore XMLUnit would simply match the <tr>
s in
document order and not select the rows the way you want them to be
selected.
So when deciding what to prune in NodeFilter
and in particular which
parts to match in NodeMatcher
you have to follow your structure
towards the root of the document tree and find the common ancestor
that needs to make the right decision for the order of branches you
need.
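To make the <tr> problem concrete, here is a minimal sketch that uses only the JDK's DOM API, not XMLUnit itself. The class RowTextSketch and its helpers are hypothetical names introduced for illustration, and directText is only an approximation that assumes "textual content" means the text nodes that are direct children of an element. It shows that the two rows of the control document are indistinguishable by name and direct text, while the text of their <th> children tells them apart.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RowTextSketch {

    /** Parses the XML and returns its root element (checked exceptions are wrapped). */
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Concatenates only the text and CDATA nodes that are direct children, trimmed. */
    static String directText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE
                || n.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    /** Text of the first th descendant of the row, or "" if there is none. */
    static String headerText(Element row) {
        NodeList ths = row.getElementsByTagName("th");
        return ths.getLength() == 0 ? "" : ths.item(0).getTextContent().trim();
    }
}
```

Both rows yield an empty directText, so a decision that has to be made at the <tr> level needs to look further down the branch, for example at headerText.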
NodeFilter and AttributeFilter aren't interfaces in their own
right but just Predicate<(Xml)Node> and Predicate<(Xml)Attr(ibute)>
functional interfaces or delegates.
When XMLUnit visits an element, it will invoke the configured
NodeFilter
for each of the child nodes and ignore all nodes where
the filter returned false
.
Likewise it will invoke the configured AttributeFilter
for each
attribute of the element and ignore those where the filter returns
false
.
By default - if no NodeFilter
or AttributeFilter
have been
configured at all - all child nodes and attributes are part of the
comparison process.
As of XMLUnit 2.0.0 there is no public built-in implementation of
NodeFilter
or AttributeFilter
.
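Since both filters are plain predicates, writing one is short. The following sketch uses only JDK types; the class FilterSketch, the element name "meta" and the attribute name "timestamp" are made up for illustration. The two helper methods merely mimic the "ignore everything the filter rejects" behaviour described above so the effect can be observed without XMLUnit.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class FilterSketch {

    /** Keeps every child node except comments and elements named "meta". */
    static final Predicate<Node> NODE_FILTER = node ->
        node.getNodeType() != Node.COMMENT_NODE
            && !(node.getNodeType() == Node.ELEMENT_NODE
                 && "meta".equals(node.getNodeName()));

    /** Keeps every attribute except the one named "timestamp". */
    static final Predicate<Attr> ATTRIBUTE_FILTER =
        attr -> !"timestamp".equals(attr.getName());

    /** Names of the child nodes accepted by NODE_FILTER, in document order. */
    static List<String> keptChildren(Element parent) {
        List<String> kept = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (NODE_FILTER.test(n)) {
                kept.add(n.getNodeName());
            }
        }
        return kept;
    }

    /** Names of the attributes accepted by ATTRIBUTE_FILTER. */
    static List<String> keptAttributes(Element e) {
        List<String> kept = new ArrayList<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            if (ATTRIBUTE_FILTER.test(a)) {
                kept.add(a.getName());
            }
        }
        return kept;
    }

    /** Parses the XML and returns its root element (checked exceptions are wrapped). */
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

With XMLUnit for Java such predicates would be handed to the builder via withNodeFilter and withAttributeFilter. Depending on the XMLUnit version the builder may expect XMLUnit's own Predicate type rather than java.util.function.Predicate, in which case a method reference such as FilterSketch.NODE_FILTER::test should adapt it.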
(I)NodeMatcher
searches the nodes which should be compared from the
list of test- and control-nodes. It is invoked with the children of
the current elements of the control and test documents and returns the
matching pairs of nodes. Any node not returned as part of a matching
pair is considered "unmatched" and will result in a failed
CHILD_LOOKUP
comparison.
Usually you won't implement (I)NodeMatcher
itself but rather use the
default implementation DefaultNodeMatcher
and configure it to your needs.
The DefaultNodeMatcher
implementation delegates the decision for
each node to the ElementSelector
and NodeTypeMatcher
implementations passed in as arguments to its constructor.
- ElementSelector: is used for all nodes of type (Xml)Element. The default implementation always returns true which makes XMLUnit compare all elements in document order.
- NodeTypeMatcher: is used for any other nodes that are not (Xml)Elements. The default implementation matches nodes by their node type with one exception: CDATA and Text nodes are considered the same kind of node.
ElementSelector
receives a single element node from the control and
the test document and decides, whether those two elements should be
compared with each other by XMLUnit. DefaultNodeMatcher
will try to
match each control element with each test element that hasn't been
matched already trying to stay in document order.
For example, when comparing
<root>
<a/>
<b/>
<c/>
<d/>
</root>
with
<root>
<d/>
<a/>
<e/>
<b/>
</root>
and assuming the configured ElementSelector returns true if the
element names match, DefaultNodeMatcher would invoke
ElementSelector with the following pairs (the first one from the
control, the second from the test document):
First argument | Second argument | Comment |
---|---|---|
a | d | |
a | a | => matching pair found |
b | e | tries to keep element order, so doesn't start over again |
b | b | => matching pair found |
c | d | hit end of list, start from the front |
c | e | list exhausted, no match for c at all |
d | d | hit end of list, start from the front => match |
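The search order in the table can be reproduced with a small simulation. This is a sketch of the behaviour described above, not XMLUnit's actual implementation: nodes are modelled as plain element names, and the class MatchOrderSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

final class MatchOrderSketch {

    /** Returns every "control/test" pair the selector is consulted for, in order. */
    static List<String> consultedPairs(List<String> control, List<String> test,
                                       BiPredicate<String, String> selector) {
        List<String> consulted = new ArrayList<>();
        Set<Integer> matched = new HashSet<>(); // test indexes that are already paired
        int lastMatch = -1;                     // test index of the most recent pair
        for (String c : control) {
            // first look behind the most recent match to stay in document order
            int found = search(c, test, matched, lastMatch + 1, test.size(),
                               selector, consulted);
            if (found < 0) {
                // nothing there, so start over from the front of the list
                found = search(c, test, matched, 0, lastMatch, selector, consulted);
            }
            if (found >= 0) {
                matched.add(found);
                lastMatch = found;
            }
        }
        return consulted;
    }

    /** Scans test[from, to) for an unmatched node accepted by the selector. */
    private static int search(String c, List<String> test, Set<Integer> matched,
                              int from, int to,
                              BiPredicate<String, String> selector,
                              List<String> consulted) {
        for (int i = from; i < to; i++) {
            if (matched.contains(i)) {
                continue; // already paired with an earlier control node
            }
            consulted.add(c + "/" + test.get(i));
            if (selector.test(c, test.get(i))) {
                return i;
            }
        }
        return -1;
    }
}
```

Calling consultedPairs with the control names a, b, c, d, the test names d, a, e, b and String::equals as selector yields the seven pairs of the table in the same order.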
It is possible to configure DefaultNodeMatcher
to use more than one
ElementSelector
when matching elements. If you do so,
DefaultNodeMatcher
will first try to find a matching test node for a
given control node by consulting the first ElementSelector
. If it
didn't find any match it uses the second ElementSelector
and so on.
ElementSelector
is most likely the part that needs to get customized
most often since the exact logic of matching branches with each other
is very specific to each single use case.
Note that when you make XMLUnit visit elements in a different order
than document order XMLUnit will report differences of type
CHILD_NODELIST_SEQUENCE
which in turn results in a SIMILAR
outcome
by DifferenceEvaluators.Default
. If you want to suppress this
difference completely you'll have to provide a custom
DifferenceEvaluator
as well.
XMLUnit comes with several ElementSelector
implementations most of which
are available as static members of the ElementSelectors
class.
This is the ElementSelector
used by DefaultNodeMatcher
if no
ElementSelector
has been configured explicitly. It simply matches
elements in document order, i.e. the first child element of any given
control element is compared to the first child element of any given
test element, the second to the second and so on.
Actually document order is ensured by DefaultNodeMatcher
itself,
this ElementSelector
simply always returns true
.
This implementation doesn't care about element names at all.
Two elements are matched if their qualified name - i.e. the local name and the namespace URI (if any) are the same.
It doesn't care for namespace prefixes at all, neither does any of
the other built-in ElementSelector
s.
Two elements are matched if their qualified name - i.e. the local name and the namespace URI (if any) are the same and their textual content matches.
Example:
Control XML:
<flowers>
<flower>Roses</flower>
<flower>Daisy</flower>
<flower>Crocus</flower>
</flowers>
Test XML:
<flowers>
<flower>Daisy</flower>
<flower>Roses</flower>
<flower>Crocus</flower>
</flowers>
Without a custom ElementSelector you will get a difference "Expected
text value 'Roses' but was 'Daisy' ... ".
With ElementSelectors.byNameAndText you can ensure the
"right" nodes are compared with each other:
String controlXml = "<flowers><flower>Roses</flower><flower>Daisy</flower><flower>Crocus</flower></flowers>";
String testXml = "<flowers><flower>Daisy</flower><flower>Roses</flower><flower>Crocus</flower></flowers>";
Diff myDiff = DiffBuilder.compare(controlXml).withTest(testXml)
.checkForSimilar() // a different order is always 'similar' not equals.
.withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndText))
.build();
Assert.assertFalse("XML similar " + myDiff.toString(), myDiff.hasDifferences());
for Java, or for .NET:
string controlXml = "<flowers><flower>Roses</flower><flower>Daisy</flower><flower>Crocus</flower></flowers>";
string testXml = "<flowers><flower>Daisy</flower><flower>Roses</flower><flower>Crocus</flower></flowers>";
var myDiff = DiffBuilder.Compare(controlXml).WithTest(testXml)
.CheckForSimilar() // a different order is always 'similar' not equals.
.WithNodeMatcher(new DefaultNodeMatcher(ElementSelectors.ByNameAndText))
.Build();
Assert.IsFalse(myDiff.HasDifferences(), "XML similar " + myDiff.ToString());
Two elements are matched if their qualified name - i.e. the local name and the namespace URI (if any) are the same and all attributes (as identified by their local name and namespace URI) have the same value.
Two elements are matched if their qualified name - i.e. the local name and the namespace URI (if any) - are the same and all attributes whose names have been given as parameters have the same value.
There are two overloads of ElementSelectors.byNameAndAttributes
, one
accepts String
s and one QName
s or XmlQualifiedName
s. The
string-arg version only considers attributes in the null-namespace
(i.e. those with only a local name and no associated namespace URI).
Is a variant of ElementSelectors.byNameAndAttributes
where attribute
local names are given as strings and the namespace URI is expected to
be the one defined for the attribute on the control element - this
only works properly if the local names of the attributes are unique
for the given elements.
Expects an XPath expression yielding elements (where the XPath context
"." is the current control or test element) and another
ElementSelector
as arguments.
When comparing two elements, the XPath expression is applied to the
test and control elements and the resulting node lists are compared to
each other using the given ElementSelector
. The control and test
elements match, if the given ElementSelector
finds matching pairs
for all node lists returned by the XPath expression.
This is a (partial) option for a case like the <table>
example from
the beginning of this chapter.
ElementSelectors.byXPath(".//th", ElementSelectors.byNameAndText)
would match the "correct" <tr>
s to each other. It is only a partial
solution since it also works for <th>
and <td>
only by accident
(the node lists are empty, so they match trivially) and blindly using
byXPath
in more complex scenarios is likely to fail.
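The "empty node lists" remark can be checked with the JDK's own XPath engine. The sketch below does not call XMLUnit at all; it only evaluates an expression with "." bound to different elements. XPathContextSketch and its helpers are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathContextSketch {

    /** Parses the XML and returns its root element (checked exceptions are wrapped). */
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Number of nodes the expression selects when "." is the given element. */
    static int count(String expression, Element context) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, context, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For a <tr> context ".//th" selects the row's header cell, whereas for a <th> or <td> context it selects nothing, which is why those elements only match trivially.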
If your documents use namespaces, you may want to tell the XPath engine which prefix maps to which namespace URI in order to get the expected results. An additional overload allows you to do that.
Example
<root xmlns="http://namespace.xml">
<a>x</a>
</root>
Map<String, String> prefix2Uri = Collections.singletonMap("pref", "http://namespace.xml");
ElementSelectors.byXPath("./pref:a", prefix2Uri, ElementSelectors.byNameAndText)
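Why the prefix map matters can be seen with the JDK's XPath engine alone. In this sketch, which does not use XMLUnit, an unprefixed step does not select elements that live in a default namespace, while a prefixed step with a matching mapping does. NamespaceXPathSketch and count are hypothetical names; the prefix "pref" is arbitrary and only has to agree between the expression and the map.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceXPathSketch {

    /** Counts the nodes selected by the expression relative to the root element. */
    static int count(String xml, String expression, Map<String, String> prefix2Uri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-aware XPath
            Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    String uri = prefix2Uri.get(prefix);
                    return uri != null ? uri : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    return null; // reverse lookup is not needed for evaluation
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            NodeList nodes =
                (NodeList) xpath.evaluate(expression, root, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Applied to the example document above, "./a" selects nothing while "./pref:a" with the map from "pref" to "http://namespace.xml" selects the <a> element.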
These are combiners for other ElementSelectors: ElementSelectors.not
negates an ElementSelector, ElementSelectors.or returns true if any of
the given selectors does, ElementSelectors.and returns true if all of
the given selectors do, and ElementSelectors.xor returns true if one of
the two given selectors returns true and the other one returns false.
To be honest, xor is only there for completeness; so far we haven't
seen any use case for it.
There is an important difference between ElementSelectors.or
and
passing several ElementSelectors
to the constructor of
DefaultNodeMatcher
. or
will apply all ElementSelector
s to each
pair of elements immediately, while DefaultNodeMatcher
tries all
control elements for the first ElementSelector
before consulting the
second.
Example
Assuming
<root>
<a>x</a>
<b/>
<a>y</a>
</root>
and
<root>
<a>y</a>
<b>some text</b>
<a>x</a>
</root>
and you want to match by element name and nested textual content - but fall back to just the element's name if there is no match including the textual content.
Using DefaultNodeMatcher(ElementSelectors.byNameAndText, ElementSelectors.byName)
will match the <a>
s with matching textual
content, just as required. Using
ElementSelectors.or(ElementSelectors.byNameAndText, ElementSelectors.byName)
the byNameAndText
will return false
for
the first <a>
elements, but byName
will return true
and so the
"wrong" <a>
s get compared to each other.
ElementSelectors.conditionalSelector
expects a Predicate
that is
applied to the control element when the selector is invoked and
another ElementSelector
. It returns true
if and only if both the
Predicate
and the wrapped ElementSelector
return true. This can
be used together with the boolean combiners to build more complex
ElementSelector
s.
ElementSelectors.selectorForElementNamed
is a convenience shortcut
for conditionalSelector
for the pretty common case of applying a
given ElementSelector
only to elements of a certain name. It has two
overloads that use either only the element's local name (when given a
string argument) or the local name and namespace URI (when given a
QName or XmlQualifiedName argument).
Using ElementSelectors.conditionalBuilder
allows several
ElementSelector
s to be combined based on Predicate
s. It can be
used to set up specific selectors for special nodes and combine them
with a default to use for all elements that didn't match any of the
predicates.
As explained above byXPath
is only a partial solution to the problem
of the beginning of this document. A more robust solution would be
ElementSelectors.conditionalBuilder()
.whenElementIsNamed("tr").thenUse(ElementSelectors.byXPath("./th", ElementSelectors.byNameAndText))
.elseUse(ElementSelectors.byName)
.build();
The C# version would look almost the same just with capitalized member names.
MultiLevelByNameAndTextSelector is one of two built-in
ElementSelectors that are not accessible via a member of
ElementSelectors but rather implemented in a class of their own - for
.NET MultiLevelByNameAndTextSelector.CanBeCompared is the actual
ElementSelector delegate.
It extends the idea of ElementSelectors.byNameAndText: two elements
match if their names match, as must the names of their only child
elements for as many levels as configured in the constructor, and in
addition the text nested in the child at the deepest configured level
must match. This means new MultiLevelByNameAndTextSelector(1) and
ElementSelectors.byNameAndText check the same properties.
This ElementSelector
is only useful in very specific scenarios and
has mostly been added to provide a replacement for XMLUnit for Java
1.x's MultiLevelElementNameAndTextQualifier
.
ByNameAndTextRecSelector
- or
ByNameAndTextRecSelector.CanBeCompared
in the .NET case - is the
heir of XMLUnit for Java 1.x's RecursiveElementNameAndTextQualifier
which has become the infamous default answer on Stackoverflow for user
questions about XMLUnit not matching the proper elements - even and in
particular when it would be the wrong choice.
ByNameAndTextRecSelector
matches two elements, if their local name
and namespace URIs (if any) and their nested text is the same and this
condition also holds true for all nested child elements.
Many times it works in complex scenarios but often only by accident -
for example if there is no nested text at all and a byName element
selector would work just as well.
Rather than using ByNameAndTextRecSelector
blindly it is recommended
to apply it only sparingly combined with more specific
ElementSelector
s inside a conditional construct. It would solve the
problem of the example at the beginning of this document and probably
is a better choice than byXPath
as XPath expressions come at a
higher price than the DOM tree traversal ByNameAndTextRecSelector
has to perform. So for the sake of completeness
ElementSelectors.ConditionalBuilder()
.WhenElementIsNamed("tr").ThenUse(ByNameAndTextRecSelector.CanBeCompared)
.ElseUse(ElementSelectors.ByName)
.Build();
would select the proper <tr>
s - the Java version would be
ElementSelectors.conditionalBuilder()
.whenElementIsNamed("tr").thenUse(new ByNameAndTextRecSelector())
.elseUse(ElementSelectors.byName)
.build();
At first glance there are three ways to combine multiple ElementSelectors where each should only apply under certain conditions. You could pass multiple ElementSelectors as arguments to DefaultNodeMatcher's constructor. You could combine the ElementSelectors using or. Or, in the case of conditional ElementSelectors, you could build a single selector with conditionalBuilder.
The difference between or
and the multi-arg constructor has already been explained. To recap: assume I usually want to compare elements by their name and nested text and am willing to fall back to just the element name if that fails. I've got documents
<root>
<a>foo</a>
<a>bar</a>
</root>
and
<root>
<a>bar</a>
<a>foo</a>
</root>
If I use ElementSelectors.or(ElementSelectors.byNameAndText, ElementSelectors.byName)
and ask it for the first a
element in either document byNameAndText
will return false
and or
asks byName
which returns true
and so the two first a
elements will be matched by XMLUnit.
If instead I use new DefaultNodeMatcher(ElementSelectors.byNameAndText, ElementSelectors.byName)
byNameAndText
gets a chance to see all a
elements first. byName
will only be consulted for the elements that haven't been matched before. This way XMLUnit matches the first a
of the control document to the second a
of the test and vice versa.
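The recap can be turned into a small simulation. It is only a sketch under simplifying assumptions: elements are modelled as {name, text} pairs, the search for each control element always starts at the front of the test list, and the selector list is tried per control element as described earlier for DefaultNodeMatcher with several ElementSelectors. FallbackSketch, el, pair and the two constants are hypothetical stand-ins, not XMLUnit classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

final class FallbackSketch {

    /** Stand-in for a selector that only compares element names. */
    static final BiPredicate<String[], String[]> BY_NAME =
        (c, t) -> c[0].equals(t[0]);

    /** Stand-in for a selector that compares element names and nested text. */
    static final BiPredicate<String[], String[]> BY_NAME_AND_TEXT =
        (c, t) -> c[0].equals(t[0]) && c[1].equals(t[1]);

    /** Models an element as a {name, text} pair. */
    static String[] el(String name, String text) {
        return new String[] {name, text};
    }

    /**
     * For each control element returns the index of the test element it is
     * paired with, or -1. Later selectors are only consulted for a control
     * element if the earlier ones found no partner among the free test elements.
     */
    @SafeVarargs
    static List<Integer> pair(List<String[]> control, List<String[]> test,
                              BiPredicate<String[], String[]>... selectors) {
        List<Integer> result = new ArrayList<>();
        Set<Integer> taken = new HashSet<>();
        for (String[] c : control) {
            int found = -1;
            for (BiPredicate<String[], String[]> selector : selectors) {
                for (int i = 0; i < test.size() && found < 0; i++) {
                    if (!taken.contains(i) && selector.test(c, test.get(i))) {
                        found = i;
                    }
                }
                if (found >= 0) {
                    break; // no need to ask the fallback selectors
                }
            }
            if (found >= 0) {
                taken.add(found);
            }
            result.add(found);
        }
        return result;
    }
}
```

For the two documents above, passing BY_NAME_AND_TEXT and BY_NAME as separate selectors pairs the control elements with test indexes 1 and 0, whereas the single combined selector BY_NAME_AND_TEXT.or(BY_NAME) pairs them with 0 and 1, i.e. in document order.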
Another subtle difference arises with conditional selectors. Assume I want to compare all a
s based on their nested text and any other element based on its name only. This time the docs shall be
<root>
<a>foo</a>
<a>bar</a>
</root>
and
<root>
<a>foo</a>
<a>baz</a>
</root>
If I try new DefaultNodeMatcher(ElementSelectors.selectorForElementNamed("a", ElementSelectors.byNameAndText), ElementSelectors.byName)
the first selector will match the first a
elements and leave unmatched the second elements in either document. Then byName
gets to look at them and will match the second a
s to each other - causing a TEXT_VALUE
difference.
If I instead use ElementSelectors.conditionalBuilder().whenElementIsNamed("a").thenUse(ElementSelectors.byNameAndText).elseUse(ElementSelectors.byName).build()
then the byName
selector will never be applied to any a
element (because the whenElementIsNamed
-clause precludes it) and the result will be two CHILD_LOOKUP
differences.
By picking an ElementSelector you not only make the comparison
process do the right thing for your use case when the documents are
similar enough, you also strongly influence how much detail you get
when the documents are different. Let's assume in your case you
cannot identify elements by their name alone, you also need to look
at some attribute values to find matching pairs. And generally you
expect all attributes to match.
Let's say you've got a control document of
<root>
<child id="1" attr2="foo" attr3="bar"/>
<child id="2" attr2="xyzzy"/>
</root>
and a test document of
<root>
<child id="2" attr2="xyzzy"/>
<child id="1" attr2="foo" attr3="baz"/>
</root>
The child nodes not only differ in order but also the attr3
attribute has different values.
If you now choose an ElementSelector
that requires "element name and
all attribute values must match", then XMLUnit will compare the two
child nodes with id
2 and in addition create two CHILD_LOOKUP
differences - one with the test node being null, one with the control
node being null - because no matching nodes could be found.
If - on the other hand - your ElementSelector
required "element name
and value of the id
attribute", you'd get an ATTR_VALUE
difference
for the attr3
attribute - which is more detailed and likely more
useful.
Therefore you should try to pick the least restrictive
ElementSelector
that describes your use case. Sometimes it is the
best to create a conditional selector with something simple like
byName
as the default and only add more restrictive selectors when
needed.