Whitespace only Text Nodes - xspec/xspec GitHub Wiki

In short

XSpec handles whitespace-only text nodes in a way similar to XSLT.

When comparing nodes

  • In XSLT, when deep-equal() compares nodes, whitespace-only text nodes are not ignored.
  • Likewise, when XSpec compares the actual result and the expected result, whitespace-only text nodes are not ignored.

When loading embedded XML

  • XSLT ignores whitespace-only text nodes written directly in a stylesheet.
  • Likewise, XSpec ignores whitespace-only text nodes written directly in an XSpec document.

When loading external XML

  • In XSLT, doc() and document() keep whitespace-only text nodes intact.
  • Likewise, in XSpec, @href keeps whitespace-only text nodes intact.
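
The XSLT side of the first two rules can be seen in a small standalone stylesheet. This is only an illustrative sketch: the variable names and the result element are made up for this example, and it assumes an XSLT 3.0 processor started at the default initial template (for example, Saxon with -it).

```xml
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template name="xsl:initial-template">
    <!-- The indentation around <body> below is written directly in the
         stylesheet, so those whitespace-only text nodes are ignored. -->
    <xsl:variable name="compact" as="element(body)">
      <body><p>abc</p></body>
    </xsl:variable>
    <!-- xsl:text forces whitespace-only text nodes into the constructed body. -->
    <xsl:variable name="indented" as="element(body)">
      <body><xsl:text>&#x0A;  </xsl:text><p>abc</p><xsl:text>&#x0A;</xsl:text></body>
    </xsl:variable>
    <!-- deep-equal() sees the extra text nodes, so @equal is "false". -->
    <result equal="{deep-equal($compact, $indented)}"/>
  </xsl:template>
</xsl:stylesheet>
```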

Comparing the actual result and the expected result containing whitespace-only text nodes

When XSpec compares the actual result and the expected result, all nodes are considered significant. Whitespace-only text nodes are not ignored.

tested.xsl

Note that the constructed body element will have no whitespace-only text nodes (unless it is serialized with indentation and then reloaded).

<xsl:stylesheet exclude-result-prefixes="#all" version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template as="element(body)" name="construct-body">
    <body>
      <p>abc</p>
    </body>
  </xsl:template>
</xsl:stylesheet>

expected.xml

Note that this body element was serialized with indentation.

<?xml version="1.0" encoding="UTF-8"?>
<body>
  <p>abc</p>
</body>

test.xspec

<x:description stylesheet="tested.xsl" xmlns:x="http://www.jenitennison.com/xslt/xspec">
  <x:scenario label="Calling body constructor template">
    <x:call template="construct-body" />
    <x:expect label="Expect body" href="expected.xml" select="body" />
  </x:scenario>
</x:description>

The result of this x:expect is Failure, because the actual result (<body><p>abc</p></body>) differs from the expected result (<body>&#x0A;&#x20;&#x20;<p>abc</p>&#x0A;</body>).

In the comparison report, the characters of whitespace-only text nodes are shown in grey: tab as \t, line feed as \n, carriage return as \r, and space as a visible marker character.
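
One possible way to make this scenario pass, sketched here as an alternative rather than as the recommended fix, is to embed the expected body in x:expect instead of loading it with @href. This relies on the embedded-XML behavior described in the section on loading embedded XML: the indentation inside x:expect is discarded, so the expected result becomes <body><p>abc</p></body>.

```xml
<x:description stylesheet="tested.xsl" xmlns:x="http://www.jenitennison.com/xslt/xspec">
  <x:scenario label="Calling body constructor template">
    <x:call template="construct-body" />
    <!-- Embedded expectation: the whitespace-only text nodes used for
         indentation here are discarded when XSpec loads this document. -->
    <x:expect label="Expect body">
      <body>
        <p>abc</p>
      </body>
    </x:expect>
  </x:scenario>
</x:description>
```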

Loading embedded XML containing whitespace-only text nodes

XSpec discards whitespace-only text nodes when loading embedded XML.

In this XSpec example,

<x:param name="span">
  <span>&#x09;&#x0A;&#x0D;&#x20;</span>
</x:param>

$span is not &#x0A;&#x20;&#x20;<span>&#x09;&#x0A;&#x0D;&#x20;</span>&#x0A; but <span/>.

A whitespace-only text node in embedded XML is kept intact only when one of the following conditions is met:

  • Its nearest ancestor element with @xml:space has @xml:space="preserve". For example,

    <x:context xml:space="preserve"><span>&#x09;&#x0A;&#x0D;&#x20;</span></x:context>
  • Its parent element name is specified in /x:description/@preserve-space. For example,

    <x:description preserve-space="code pre">
    ...
      <x:param>
        <pre>&#x09;&#x0A;&#x0D;&#x20;</pre>
      </x:param>
  • Its parent element is x:text. For example,

    <x:expect label="Expects a whitespace-only text node">
      <x:text>&#x09;&#x0A;&#x0D;&#x20;</x:text>
    </x:expect>

Loading external XML containing whitespace-only text nodes

XSpec keeps whitespace-only text nodes intact when loading external XML. For example,

XSpec

<x:param name="href" href="body.xml" />
<x:param name="doc" select="doc('.../body.xml')" />

body.xml

<?xml version="1.0" encoding="UTF-8"?>
<body>
  <p>abc</p>
</body>

$href and $doc are not <body><p>abc</p></body> but <body>&#x0A;&#x20;&#x20;<p>abc</p>&#x0A;</body>.

x:*/@xml:space and /x:description/@preserve-space have no effect on external XML. For example, this XSpec has no effect on whitespace-only text nodes in body.xml:

<x:description preserve-space="p">
  ...
  <x:param name="href" href="body.xml" xml:space="default" />
  <x:param name="doc" select="doc('.../body.xml')" xml:space="default" />
  ...

Controlling whitespace-only text nodes in external XML

You may want to remove some or all of the whitespace-only text nodes in external XML. There is more than one way to do it.

Remove whitespace-only text nodes after loading external XML

You can write a test helper function and use it in @select. For example, see tutorial/helper/ws-only-text/.
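
As a rough idea of what such a helper might look like, here is a minimal sketch. It is not the helper shipped in the tutorial directory: the function name my:strip-ws-only-text, the urn:example:test-helper namespace and the file name test-helper.xsl are all hypothetical. The function returns a copy of its input from which every whitespace-only text node has been dropped.

```xml
<xsl:stylesheet version="3.0" exclude-result-prefixes="#all"
  xmlns:my="urn:example:test-helper"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical helper: copies $input without whitespace-only text nodes. -->
  <xsl:function name="my:strip-ws-only-text" as="node()*">
    <xsl:param name="input" as="node()*" />
    <xsl:apply-templates select="$input" mode="my:strip-ws-only-text" />
  </xsl:function>

  <!-- Copy everything by default... -->
  <xsl:mode name="my:strip-ws-only-text" on-no-match="shallow-copy" />

  <!-- ...except text nodes that consist of whitespace only (or are empty). -->
  <xsl:template match="text()[not(normalize-space())]" mode="my:strip-ws-only-text" />
</xsl:stylesheet>
```

Assuming an XSpec version that supports x:helper, the helper could then be wired into the earlier test roughly like this:

```xml
<x:description stylesheet="tested.xsl"
  xmlns:my="urn:example:test-helper"
  xmlns:x="http://www.jenitennison.com/xslt/xspec">
  <x:helper stylesheet="test-helper.xsl" />
  <x:scenario label="Calling body constructor template">
    <x:call template="construct-body" />
    <!-- The indentation in expected.xml is removed before the comparison. -->
    <x:expect label="Expect body" href="expected.xml"
      select="my:strip-ws-only-text(body)" />
  </x:scenario>
</x:description>
```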

Remove whitespace-only text nodes transparently at a lower level

You can use a Saxon-specific URI query parameter, strip-space. For example,

<x:param href="space.xml?strip-space=yes" />

By default, Saxon-specific URI query parameters, including strip-space, are not recognized. To enable them, you need to turn on a Saxon-specific configuration option, RECOGNIZE_URI_QUERY_PARAMETERS. To set Saxon-specific configuration options, you can use the SAXON_CUSTOM_OPTIONS environment variable (for the command line) or the saxon.custom.options property (for Ant). For example,

  • Command line

    Linux/macOS

    export SAXON_CUSTOM_OPTIONS=--recognize-uri-query-parameters:true

    Windows

    set SAXON_CUSTOM_OPTIONS=--recognize-uri-query-parameters:true
  • Ant

    ant ... -Dsaxon.custom.options=--recognize-uri-query-parameters:true ...

Unfortunately, the RECOGNIZE_URI_QUERY_PARAMETERS option (the --recognize-uri-query-parameters command line parameter) does not work side by side with XML Catalog. When XML Catalog support is enabled, Saxon does not recognize the strip-space query parameter.
