Mastering Arrays - beckchr/staxon GitHub Wiki

Much of StAXON's Mapping Convention deals with expressing XML-specific concepts such as attributes and namespaces as JSON.

However, as long as you're using StAXON to process "natural" JSON, you won't need to know those rules, simply because your JSON does not have such things as attributes and namespaces.

On the other hand, most JSON constructs are easily represented by XML. For example, a JSON property is represented by an XML element containing the XML representing the property's value. In the case of a JSON object, this results in a sequence of XML elements, whereas a simple JSON value (string, number, boolean) is represented by text.

That is, the object property

"alice": { ... }

becomes

<alice> ... </alice>

and the simple value property

"alice": "bob"

becomes

<alice>bob</alice>
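StAXON writers consume the same StAX events as any XML writer. The sketch below (the class PropertyEvents is hypothetical) uses the JDK's default XMLOutputFactory rather than StAXON, so it prints XML instead of JSON; it only shows the event sequence behind the simple value property. Swapping in StAXON's JsonXMLOutputFactory should turn the same calls into the JSON form.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class PropertyEvents {
    /** Emits the StAX events for a simple value property and returns them as XML text. */
    static String simpleProperty(String name, String value) {
        StringWriter out = new StringWriter();
        try {
            // JDK default factory (not StAXON), so the result is XML, not JSON
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement(name);  // property name -> element name
            writer.writeCharacters(value);   // simple value -> text content
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

For example, simpleProperty("alice", "bob") yields the <alice>bob</alice> element shown above.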

That's pretty much it! But wait...

What about JSON arrays? Unfortunately, XML has nothing like them. To be honest, this causes most of the trouble when writing JSON via an XML API like StAX: simply omitting the array boundaries would lead to duplicate JSON property names, which is usually not desired.

StAXON provides several ways to deal with JSON arrays. At its core is the idea of using XML processing instructions to tell a StAXON writer that an array is about to start.

Initiating Arrays with <?xml-multiple?>

StAXON supports the <?xml-multiple?> processing instruction to map a sequence of XML elements with the same name to a JSON array.

The processing instruction optionally takes the array element's tag name (including prefix) as data. There's no end-of-array hint, because StAXON detects the end of an array sequence and closes the array automatically.

Consider the following JSON:

{
  "alice" : {
    "bob" : [ "edgar", "charlie" ],
    "peter" : null
  }
}

In order to get a "bob" array instead of two separate "bob" properties, we need to provide XML events corresponding to this:

<?xml version="1.0"?>
<alice>
  <?xml-multiple bob?>
  <bob>edgar</bob>
  <bob>charlie</bob>
  <peter/>
</alice>

To do this with the StAX Cursor-API, get a stream writer from StAXON:

XMLOutputFactory factory = new JsonXMLOutputFactory();
factory.setProperty(JsonXMLOutputFactory.PROP_PRETTY_PRINT, true);
factory.setProperty(JsonXMLOutputFactory.PROP_MULTIPLE_PI, true);
XMLStreamWriter writer = factory.createXMLStreamWriter(...);

Having the writer, continue like this:

writer.writeStartDocument();
writer.writeStartElement("alice");

writer.writeProcessingInstruction(JsonXMLStreamConstants.MULTIPLE_PI_TARGET, "bob");

writer.writeStartElement("bob");
writer.writeCharacters("edgar");
writer.writeEndElement();

writer.writeStartElement("bob");
writer.writeCharacters("charlie");
writer.writeEndElement();

writer.writeEmptyElement("peter");

writer.writeEndElement();
writer.writeEndDocument();

writer.close();
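Because the processing instruction is an ordinary StAX event, the call sequence above can be factored into a method that accepts any XMLStreamWriter. The hypothetical AliceEvents class below does that. Passing it a writer from JsonXMLOutputFactory should produce the JSON; asXml() instead uses the JDK's default factory, so the sketch runs without StAXON and reproduces the XML listing from earlier. The literal "xml-multiple" stands in for JsonXMLStreamConstants.MULTIPLE_PI_TARGET to avoid the StAXON dependency.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class AliceEvents {
    /** The cursor-API calls from above, usable with any XMLStreamWriter (StAXON or not). */
    static void write(XMLStreamWriter writer) throws XMLStreamException {
        writer.writeStartDocument();
        writer.writeStartElement("alice");
        // same value as JsonXMLStreamConstants.MULTIPLE_PI_TARGET
        writer.writeProcessingInstruction("xml-multiple", "bob");
        for (String value : new String[] {"edgar", "charlie"}) {
            writer.writeStartElement("bob");
            writer.writeCharacters(value);
            writer.writeEndElement();
        }
        writer.writeEmptyElement("peter");
        writer.writeEndElement();
        writer.writeEndDocument();
    }

    /** Renders the same events as XML with the JDK's default factory (no StAXON needed). */
    static String asXml() {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            write(writer);
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Keeping the event sequence separate from the writer makes it easy to compare the XML and JSON renderings of the same document.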

There are a few things to note:

  • The constant JsonXMLStreamConstants.MULTIPLE_PI_TARGET has the value "xml-multiple".
  • The JsonXMLOutputFactory.PROP_MULTIPLE_PI property is true by default.
  • The name provided as processing instruction data is optional. That is, <?xml-multiple?> will trigger an array start for the next element. However, there's a caveat: to get an empty array (a sequence of zero elements), you must specify the name; otherwise the processing instruction accidentally applies to the following element!
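The rules in the list above can be illustrated with a small toy converter. MultiplePiModel is hypothetical and is not StAXON's implementation: it handles only the children of one element, treats values as strings or null, ignores prefixes, and does no JSON escaping. It models the automatic end-of-array detection, the empty array produced by a named PI, the caveat for an unnamed PI, and the duplicate properties you get without any PI.

```java
import java.util.ArrayList;
import java.util.List;

class MultiplePiModel {
    /** Either a PI (isPi == true, name may be null) or an element with a text value. */
    record Child(boolean isPi, String name, String value) {
        static Child pi(String name) { return new Child(true, name, null); }
        static Child element(String name, String value) { return new Child(false, name, value); }
    }

    private final List<String> members = new ArrayList<>();
    private final List<String> items = new ArrayList<>();
    private String pending;    // PI seen but not yet applied; "" stands for an unnamed PI
    private String arrayName;  // name of the currently open array, or null

    /** Renders the children of one element as a JSON object. */
    static String toJsonObject(List<Child> children) {
        MultiplePiModel model = new MultiplePiModel();
        for (Child child : children) {
            if (child.isPi()) {
                model.processingInstruction(child.name());
            } else {
                model.element(child.name(), child.value());
            }
        }
        model.closeArray();
        model.emitPendingEmptyArray();
        return "{" + String.join(",", model.members) + "}";
    }

    private void processingInstruction(String name) {
        closeArray();
        emitPendingEmptyArray();
        pending = name == null ? "" : name;
    }

    private void element(String name, String value) {
        if (name.equals(arrayName)) {  // same name: the sequence continues
            items.add(quote(value));
            return;
        }
        closeArray();                  // different name: the sequence has ended
        if (pending != null && (pending.isEmpty() || pending.equals(name))) {
            pending = null;            // the PI starts an array at this element
            arrayName = name;
            items.add(quote(value));
        } else {
            emitPendingEmptyArray();   // a named PI whose elements never came
            members.add(quote(name) + ":" + quote(value));
        }
    }

    private void closeArray() {
        if (arrayName != null) {
            members.add(quote(arrayName) + ":[" + String.join(",", items) + "]");
            arrayName = null;
            items.clear();
        }
    }

    private void emitPendingEmptyArray() {
        if (pending != null && !pending.isEmpty()) {
            members.add(quote(pending) + ":[]");
        }
        pending = null;                // an unnamed PI with nothing to apply to is dropped
    }

    private static String quote(String s) {  // no escaping: toy values only
        return s == null ? "null" : "\"" + s + "\"";
    }
}
```

With a named PI and no bob elements, the model yields {"bob":[],"peter":null}; with an unnamed PI it wrongly wraps peter instead, yielding {"peter":[null]}.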

When reading JSON, StAXON's readers may insert xml-multiple processing instructions when encountering arrays, too. This feature is controlled by the JsonXMLInputFactory.PROP_MULTIPLE_PI property, which is true by default.
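Since the processing instruction is an ordinary XML event, any StAX reader reports it, not just StAXON's. To inspect a document like the XML example above, assuming only the JDK's built-in parser, a small hypothetical helper can list element starts and processing instructions:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class MultiplePiTrace {
    /** Lists element starts and processing instructions in document order. */
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    events.add("START " + reader.getLocalName());
                } else if (type == XMLStreamConstants.PROCESSING_INSTRUCTION) {
                    events.add("PI " + reader.getPITarget() + " " + reader.getPIData());
                }
                // text, end tags and whitespace are ignored in this sketch
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }
}
```

Note that the <?xml version="1.0"?> declaration does not show up as a PI: StAX reports it as the start of the document.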

Initiating Arrays with Element Paths

Sometimes it is not possible to generate <?xml-multiple?> processing instructions to control arrays. This is the case when the actual writing isn't done by your code but by some other framework, such as JAXB, and you only provide a stream writer.

For such scenarios, wouldn't it be nice to be able to tell the writer beforehand which elements should trigger a JSON array?

This is where the XMLMultipleStreamWriter and XMLMultipleEventWriter step in. These writers wrap another writer and "know" the paths of the array elements. Before delegating the start of a sequence of any of these array elements to its underlying writer, each wrapper inserts an <?xml-multiple?> processing instruction into the stream.

A multiple path has the form

path ::= '/'? <localName> ('/' <localName>)*

That is, paths may be absolute (with a leading '/') or relative, and path segments are local names separated by '/'.
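The grammar can be checked mechanically. The hypothetical MultiplePathSyntax helper below approximates <localName> as any run of characters other than '/', ':' and whitespace (real XML names are stricter) and splits a path into an absolute flag and its segments. It is a sketch for illustration, not StAXON's parser.

```java
import java.util.List;
import java.util.regex.Pattern;

class MultiplePathSyntax {
    // path ::= '/'? <localName> ('/' <localName>)*
    // <localName> is approximated as one or more characters other than '/', ':' and whitespace
    private static final Pattern PATH = Pattern.compile("/?[^/:\\s]+(/[^/:\\s]+)*");

    record Parsed(boolean absolute, List<String> segments) {}

    static Parsed parse(String path) {
        if (!PATH.matcher(path).matches()) {
            throw new IllegalArgumentException("not a multiple path: " + path);
        }
        boolean absolute = path.startsWith("/");
        String body = absolute ? path.substring(1) : path;
        return new Parsed(absolute, List.of(body.split("/")));
    }
}
```

For instance, "/alice/bob" parses as absolute with segments [alice, bob], "bob" as relative, and "alice//bob" is rejected because of the empty segment.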

As an example, let's write the following JSON using an XMLMultipleStreamWriter:

{
  "alice" : {
    "bob" : [ "edgar", "charlie" ],
    "peter" : null
  }
}

Obviously, we want to specify "/alice/bob" as a multiple path:

XMLOutputFactory outputFactory = new JsonXMLOutputFactory();
outputFactory.setProperty(JsonXMLOutputFactory.PROP_PRETTY_PRINT, true);
XMLStreamWriter writer = outputFactory.createXMLStreamWriter(System.out);

writer = new XMLMultipleStreamWriter(writer, true, "/alice/bob");

The boolean parameter specifies whether the paths include the root element (alice). That is, we could also omit the root and use

writer = new XMLMultipleStreamWriter(writer, false, "/bob");

To wrap all bob elements into arrays (not just children of alice), we can use a relative path without a leading slash:

writer = new XMLMultipleStreamWriter(writer, false, "bob");

Now we (or some legacy code, framework, ...) may write our document, and the writer will take care to trigger the bob array for us:

writer.writeStartDocument();
writer.writeStartElement("alice");

writer.writeStartElement("bob");
writer.writeCharacters("edgar");
writer.writeEndElement();

writer.writeStartElement("bob");
writer.writeCharacters("charlie");
writer.writeEndElement();

writer.writeEmptyElement("peter");

writer.writeEndElement();
writer.writeEndDocument();

writer.close();

Instead of providing the multiple paths at construction time, we could also have used the addMultiplePath(String) method.
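The matching behavior described earlier for the boolean parameter and for relative paths can be modeled in a few lines. MultiplePathMatcher is a hypothetical sketch that assumes a relative path matches any element whose path ends with the given segments; StAXON's actual matching may differ in details.

```java
import java.util.List;

class MultiplePathMatcher {
    /**
     * @param path      absolute ("/alice/bob") or relative ("bob") multiple path
     * @param stack     local names from the root to the current element, never empty
     * @param matchRoot whether paths include the root element (the boolean constructor argument)
     */
    static boolean matches(String path, List<String> stack, boolean matchRoot) {
        boolean absolute = path.startsWith("/");
        List<String> segments = List.of((absolute ? path.substring(1) : path).split("/"));
        // when paths are written without the root, drop it from the stack
        List<String> effective = matchRoot ? stack : stack.subList(1, stack.size());
        if (absolute) {
            return effective.equals(segments);            // the whole path must match
        }
        int offset = effective.size() - segments.size();  // relative: match a suffix
        return offset >= 0 && effective.subList(offset, effective.size()).equals(segments);
    }
}
```

Under these assumptions, "/alice/bob" with matchRoot set and "/bob" without it both match the bob children of alice, while "bob" also matches a bob nested deeper, such as /alice/carol/bob.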

See the Using JAXB wiki page for a more realistic example of how to use XMLMultipleStreamWriter.

The XMLMultipleEventWriter is used pretty much the same way. As an example, consider the following "ordinary" XML document:

<alice>
  <bob>edgar</bob>
  <bob>charlie</bob>
  <peter/>
</alice>

This is equivalent to our JSON above, but without the <?xml-multiple?> processing instruction. Let's use an XMLMultipleEventWriter to copy the XML document to JSON:

InputStream input = ...;   // source XML document
OutputStream output = ...; // JSON destination
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_COALESCING, true);
XMLEventReader reader = inputFactory.createXMLEventReader(input);

XMLOutputFactory outputFactory = new JsonXMLOutputFactory();
outputFactory.setProperty(JsonXMLOutputFactory.PROP_PRETTY_PRINT, true);
XMLEventWriter writer = outputFactory.createXMLEventWriter(output);

writer = new XMLMultipleEventWriter(writer, "/alice/bob");

writer.add(reader);

reader.close();
writer.close();
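The idea behind XMLMultipleEventWriter (track the current element path and emit a PI before the first element of each matching sequence) can be sketched with plain JDK classes. PiInsertingEventWriter is hypothetical and much simpler than StAXON's class: it only accepts absolute paths that include the root, ignores namespaces and prefixes, and hard-codes the target "xml-multiple". The copy helper writes through the JDK's default XML writer so the inserted PI stays visible, and it skips document-level events to keep the output minimal.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import javax.xml.namespace.NamespaceContext;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class PiInsertingEventWriter implements XMLEventWriter {
    private final XMLEventWriter delegate;
    private final Set<String> multiplePaths;                    // absolute paths including the root
    private final XMLEventFactory eventFactory = XMLEventFactory.newInstance();
    private final Deque<String> openPaths = new ArrayDeque<>(); // path of each open element
    private final Deque<String> lastChild = new ArrayDeque<>(); // last closed child name per open element

    PiInsertingEventWriter(XMLEventWriter delegate, Set<String> multiplePaths) {
        this.delegate = delegate;
        this.multiplePaths = multiplePaths;
    }

    @Override
    public void add(XMLEvent event) throws XMLStreamException {
        if (event.isStartElement()) {
            String name = event.asStartElement().getName().getLocalPart();
            String path = (openPaths.isEmpty() ? "" : openPaths.peek()) + "/" + name;
            // a new sequence starts unless the previous sibling had the same name
            if (multiplePaths.contains(path) && !name.equals(lastChild.peek())) {
                delegate.add(eventFactory.createProcessingInstruction("xml-multiple", name));
            }
            openPaths.push(path);
            lastChild.push("");
        } else if (event.isEndElement()) {
            openPaths.pop();
            lastChild.pop();
            if (!lastChild.isEmpty()) {  // remember this element as its parent's last child
                lastChild.pop();
                lastChild.push(event.asEndElement().getName().getLocalPart());
            }
        }
        delegate.add(event);
    }

    @Override
    public void add(XMLEventReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            add(reader.nextEvent());  // route through add(XMLEvent) so PIs get inserted
        }
    }

    @Override public void flush() throws XMLStreamException { delegate.flush(); }
    @Override public void close() throws XMLStreamException { delegate.close(); }
    @Override public String getPrefix(String uri) throws XMLStreamException { return delegate.getPrefix(uri); }
    @Override public void setPrefix(String prefix, String uri) throws XMLStreamException { delegate.setPrefix(prefix, uri); }
    @Override public void setDefaultNamespace(String uri) throws XMLStreamException { delegate.setDefaultNamespace(uri); }
    @Override public void setNamespaceContext(NamespaceContext context) throws XMLStreamException { delegate.setNamespaceContext(context); }
    @Override public NamespaceContext getNamespaceContext() { return delegate.getNamespaceContext(); }

    /** Copies an XML string through the wrapper into plain XML using JDK factories. */
    static String copy(String xml, Set<String> multiplePaths) {
        StringWriter out = new StringWriter();
        try {
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            inputFactory.setProperty(XMLInputFactory.IS_COALESCING, true);
            XMLEventReader reader = inputFactory.createXMLEventReader(new StringReader(xml));
            XMLEventWriter writer = new PiInsertingEventWriter(
                    XMLOutputFactory.newInstance().createXMLEventWriter(out), multiplePaths);
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                // skip document-level events to keep the output to the element content
                if (!event.isStartDocument() && !event.isEndDocument()) {
                    writer.add(event);
                }
            }
            reader.close();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Copying the "ordinary" alice document with the path /alice/bob places a single <?xml-multiple bob?> right before the first bob element, which is exactly the hint a StAXON writer needs.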

In conclusion, XMLMultipleStreamWriter and XMLMultipleEventWriter provide a simple mechanism to correctly insert array boundaries when producing JSON, which works great as long as you know the array paths in advance.

Triggering Arrays Automatically

Finally, if nothing else works for you, you may also let StAXON fully automatically determine array boundaries. Use this only if you cannot provide <?xml-multiple?> processing instructions and cannot provide the paths of the elements that should be wrapped into JSON arrays.

However, using this method has several drawbacks:

  • The writer basically needs to cache the entire document in memory, eating both space and time.
  • The writer will not be able to produce empty arrays or arrays with a single element.
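Both drawbacks follow from how arrays must be detected without hints. The hypothetical AutoArraySketch below (not StAXON's algorithm) groups consecutive same-named siblings of one element: it needs the complete list of siblings before writing anything, and a lone element looks exactly like a plain property, so single-element and empty arrays cannot be expressed.

```java
import java.util.ArrayList;
import java.util.List;

class AutoArraySketch {
    record Child(String name, String value) {}

    static String toJsonObject(List<Child> children) {
        // all siblings must be known up front: only a second same-named
        // sibling reveals that the first one belonged to an array
        List<String> members = new ArrayList<>();
        int i = 0;
        while (i < children.size()) {
            String name = children.get(i).name();
            int end = i;
            while (end < children.size() && children.get(end).name().equals(name)) {
                end++;
            }
            if (end - i > 1) {
                List<String> items = new ArrayList<>();
                for (Child c : children.subList(i, end)) {
                    items.add(quote(c.value()));
                }
                members.add(quote(name) + ":[" + String.join(",", items) + "]");
            } else {
                // a lone element is indistinguishable from a plain property
                members.add(quote(name) + ":" + quote(children.get(i).value()));
            }
            i = end;
        }
        return "{" + String.join(",", members) + "}";
    }

    private static String quote(String s) {  // no escaping: toy values only
        return s == null ? "null" : "\"" + s + "\"";
    }
}
```

With two bob children the sketch produces an array, but with a single bob it can only produce "bob":"edgar", and with no bob children there is nothing to turn into an empty array.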

To enable this feature, set the JsonXMLOutputFactory.PROP_AUTO_ARRAY property to true.

Note: Though in general this might not be a good idea, it is possible to mix the auto-array and multiple-paths features.

Triggering Document Arrays

StAXON's writer implementation allows you to wrap a sequence of documents into a JSON array. To do this, write the <?xml-multiple?> PI before writing anything else:

writer.writeProcessingInstruction(JsonXMLStreamConstants.MULTIPLE_PI_TARGET);
writer.writeStartDocument(); // first array component
...
writer.writeEndDocument();
writer.writeStartDocument(); // second array component
...
writer.writeEndDocument();
...
writer.close();

The writer.close() call is crucial here, as it will close the JSON array.
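A toy model makes the role of close() concrete. DocumentArrayModel is hypothetical and not StAXON code: the leading PI opens the array, each document becomes one array element, and only close() writes the closing bracket, so skipping it would leave the JSON incomplete.

```java
class DocumentArrayModel {
    private final StringBuilder out = new StringBuilder();
    private boolean arrayOpen;
    private int documents;

    /** Models the leading <?xml-multiple?> PI: it must precede everything else. */
    void multiplePi() {
        if (documents > 0 || arrayOpen) {
            throw new IllegalStateException("the PI must be written before anything else");
        }
        out.append('[');
        arrayOpen = true;
    }

    /** Appends one already-rendered JSON document as an array component. */
    void writeDocument(String json) {
        if (documents > 0 && !arrayOpen) {
            throw new IllegalStateException("without the PI only one document is allowed");
        }
        if (arrayOpen && documents > 0) {
            out.append(',');
        }
        out.append(json);
        documents++;
    }

    /** Closes the JSON array, mirroring the writer.close() call above. */
    void close() {
        if (arrayOpen) {
            out.append(']');
            arrayOpen = false;
        }
    }

    String output() {
        return out.toString();
    }
}
```

Until close() is called, the output of this model is a JSON array without its closing bracket.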
