Mastering Arrays - beckchr/staxon GitHub Wiki
Much of StAXON's Mapping Convention deals with expressing XML-specific concepts such as attributes and namespaces as JSON.
However, as long as you're using StAXON to process "natural" JSON, you won't need to know those rules, simply because your JSON does not have such things as attributes and namespaces.
On the other hand, most JSON constructs are easily represented by XML. For example, a JSON property is represented by an XML element containing the XML representing the property's value. In case of a JSON object, this results in a sequence of XML elements whereas a simple JSON value (string, number, boolean) is represented by text.
That is, the object property
"alice": { ... }
becomes
<alice> ... </alice>
and the simple value property
"alice": "bob"
becomes
<alice>bob</alice>
That's pretty much it! But wait...
What about JSON arrays? Unfortunately, there's nothing like this in XML. And to be honest, this causes most of the trouble when writing JSON via an XML API like StAX. Simply omitting the array boundaries would lead to non-unique JSON properties, which is usually not desired.
StAXON provides several ways to deal with JSON arrays. At the core is the idea of using XML processing instructions to tell a StAXON writer that an array is about to start.
StAXON supports the <?xml-multiple?> processing instruction to map a sequence of XML elements with the same name to a JSON array.
The processing instruction optionally takes the array element tag name (with prefix) as data. There's no end array hint as StAXON detects the end of an array sequence and closes it automatically.
Consider the following JSON:
{
"alice" : {
"bob" : [ "edgar", "charlie" ],
"peter" : null
}
}
In order to get a "bob" array instead of two separate "bob" properties, we need to provide XML events corresponding to this:
<?xml version="1.0"?>
<alice>
<?xml-multiple bob?>
<bob>edgar</bob>
<bob>charlie</bob>
<peter/>
</alice>
To do this with the StAX Cursor-API, get a stream writer from StAXON:
XMLOutputFactory factory = new JsonXMLOutputFactory();
factory.setProperty(JsonXMLOutputFactory.PROP_PRETTY_PRINT, true);
factory.setProperty(JsonXMLOutputFactory.PROP_MULTIPLE_PI, true);
XMLStreamWriter writer = factory.createXMLStreamWriter(...);
Having the writer, continue like this:
writer.writeStartDocument();
writer.writeStartElement("alice");
writer.writeProcessingInstruction(JsonXMLStreamConstants.MULTIPLE_PI_TARGET, "bob");
writer.writeStartElement("bob");
writer.writeCharacters("edgar");
writer.writeEndElement();
writer.writeStartElement("bob");
writer.writeCharacters("charlie");
writer.writeEndElement();
writer.writeEmptyElement("peter");
writer.writeEndElement();
writer.writeEndDocument();
writer.close();
There are a few things to note:
- The constant JsonXMLStreamConstants.MULTIPLE_PI_TARGET has the value "xml-multiple".
- The JsonXMLOutputFactory.PROP_MULTIPLE_PI property is true by default.
- The name provided as processing instruction data is optional. That is, <?xml-multiple?> will trigger an array start for the next element. However, there's a caveat: you must specify the name when writing an empty element sequence to get an empty array; otherwise the processing instruction accidentally applies to the following element!
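The empty-array caveat can be made concrete with a small sketch. It uses only the JDK's own StAX writer, so the event sequence is visible as plain XML; the class and method names (EmptyArrayHint, write) are made up for this illustration and are not part of StAXON. Sending the same calls to a StAXON writer should, according to the note above, yield an empty "bob" array in the named case, whereas the unnamed hint would attach to peter instead.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of StAXON: emits the events for an empty "bob" sequence.
class EmptyArrayHint {

    /** Writes alice with an array hint but no bob elements, then returns the XML text. */
    static String write(boolean namedHint) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("alice");
            if (namedHint) {
                // Named hint: the array is tied to "bob", even if no bob element follows.
                writer.writeProcessingInstruction("xml-multiple", "bob");
            } else {
                // Unnamed hint: applies to whatever element comes next (here: peter).
                writer.writeProcessingInstruction("xml-multiple");
            }
            // No <bob> elements are written: the sequence is empty.
            writer.writeEmptyElement("peter");
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(write(true));
        System.out.println(write(false));
    }
}
```

Comparing the two outputs shows that the only difference is the data of the processing instruction, which is exactly the piece of information the writer needs to tell an empty "bob" array from a hint meant for the next element.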
When reading JSON, StAXON's readers may insert xml-multiple processing instructions when encountering arrays, too. This feature is controlled by the JsonXMLInputFactory.PROP_MULTIPLE_PI property, which is true by default.
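On the consuming side, these hints arrive as ordinary StAX processing instruction events. The following sketch collects them from any XMLStreamReader. The class name MultipleHintScanner and its methods are hypothetical, and the demo feeds it plain XML through the JDK parser so that it runs without StAXON; a reader obtained from a JsonXMLInputFactory could presumably be passed to hints(...) in the same way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of StAXON.
class MultipleHintScanner {

    /** Returns the data of every xml-multiple processing instruction, in document order. */
    static List<String> hints(XMLStreamReader reader) throws XMLStreamException {
        List<String> found = new ArrayList<>();
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.PROCESSING_INSTRUCTION
                    && "xml-multiple".equals(reader.getPITarget())) {
                String data = reader.getPIData();
                // An unnamed hint carries no data; record it as an empty string.
                found.add(data == null ? "" : data.trim());
            }
        }
        return found;
    }

    /** Convenience wrapper for the demo: parses XML text with the JDK's StAX reader. */
    static List<String> hintsInXml(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                return hints(reader);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<alice><?xml-multiple bob?><bob>edgar</bob><bob>charlie</bob><peter/></alice>";
        System.out.println(hintsInXml(xml));
    }
}
```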
Sometimes it is not possible to generate <?xml-multiple?> processing instructions to control arrays. This is the case if the actual writing isn't done by your code but by some other framework such as JAXB, and you only provide a stream writer.
To address such a scenario, wouldn't it be nice to be able to tell the writer beforehand which elements should trigger a JSON array?
This is where the XMLMultipleStreamWriter and XMLMultipleEventWriter step in. These writers wrap another writer and "know" the paths of the array elements. Before delegating to the underlying writer to begin a sequence of any of these array elements, the wrapper inserts a <?xml-multiple?> processing instruction into the stream.
A multiple path has the form
path ::= '/'? <localName> ('/' <localName>)*
That is, paths are absolute or relative, and path segments are local names separated by '/'.
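To illustrate how such paths could be interpreted, here is a minimal matching sketch. It is not StAXON's implementation: the class MultiplePathSketch is hypothetical, and it assumes that an absolute path must equal the element's full path from the root, while a relative path only has to match the trailing segments.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the path grammar, not StAXON code.
class MultiplePathSketch {

    /**
     * Returns true if the element identified by elementPath (local names from the
     * root down to the element itself) is selected by the given multiple path.
     */
    static boolean matches(String path, List<String> elementPath) {
        boolean absolute = path.startsWith("/");
        String body = absolute ? path.substring(1) : path;
        List<String> segments = Arrays.asList(body.split("/"));
        if (absolute) {
            // Assumption: absolute paths are anchored at the document root.
            return segments.equals(elementPath);
        }
        // Assumption: relative paths match any element whose path ends with the segments.
        int offset = elementPath.size() - segments.size();
        return offset >= 0
                && elementPath.subList(offset, elementPath.size()).equals(segments);
    }

    public static void main(String[] args) {
        List<String> bobUnderAlice = Arrays.asList("alice", "bob");
        System.out.println(matches("/alice/bob", bobUnderAlice));
        System.out.println(matches("/bob", bobUnderAlice));
        System.out.println(matches("bob", bobUnderAlice));
    }
}
```

Under these assumptions, "/alice/bob" selects only bob children of the root alice, whereas the relative path "bob" selects bob elements at any depth.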
As an example, let's write the following JSON using an XMLMultipleStreamWriter:
{
"alice" : {
"bob" : [ "edgar", "charlie" ],
"peter" : null
}
}
Obviously, we want to specify "/alice/bob" as a multiple path:
XMLOutputFactory outputFactory = new JsonXMLOutputFactory();
outputFactory.setProperty(JsonXMLOutputFactory.PROP_PRETTY_PRINT, true);
XMLStreamWriter writer = outputFactory.createXMLStreamWriter(System.out);
writer = new XMLMultipleStreamWriter(writer, true, "/alice/bob");
The boolean parameter specifies whether our paths include the root node (alice). That is, we could also pass false and leave the root out of the path:
writer = new XMLMultipleStreamWriter(writer, false, "/bob");
To wrap all bob fields into arrays (not just alice children), we can use a relative path, without a leading slash:
writer = new XMLMultipleStreamWriter(writer, false, "bob");
Now we (or some legacy code, framework, ...) may write our document, and the writer will take care of triggering the bob array for us:
writer.writeStartDocument();
writer.writeStartElement("alice");
writer.writeStartElement("bob");
writer.writeCharacters("edgar");
writer.writeEndElement();
writer.writeStartElement("bob");
writer.writeCharacters("charlie");
writer.writeEndElement();
writer.writeEmptyElement("peter");
writer.writeEndElement();
writer.writeEndDocument();
writer.close();
Instead of providing the multiple paths at construction time, we could also have used the addMultiplePath(String) method.
See Using JAXB for a more realistic example of using XMLMultipleStreamWriter.
The XMLMultipleEventWriter is used in much the same way. As an example, consider the following "ordinary" XML document:
<alice>
<bob>edgar</bob>
<bob>charlie</bob>
<peter/>
</alice>
This is equivalent to our JSON above, but without the <?xml-multiple?> processing instruction. Let's use an XMLMultipleEventWriter to copy the XML document to JSON:
InputStream input = ...
OutputStream output = ...
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_COALESCING, true);
XMLEventReader reader = inputFactory.createXMLEventReader(input);
XMLOutputFactory outputFactory = new JsonXMLOutputFactory();
outputFactory.setProperty(JsonXMLOutputFactory.PROP_PRETTY_PRINT, true);
XMLEventWriter writer = outputFactory.createXMLEventWriter(output);
writer = new XMLMultipleEventWriter(writer, "/alice/bob");
writer.add(reader);
reader.close();
writer.close();
In conclusion, XMLMultipleStreamWriter and XMLMultipleEventWriter provide a simple mechanism to correctly insert array boundaries when producing JSON, which works great as long as you know the array paths in advance.
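To make the mechanism tangible, here is a minimal sketch of the idea using only the JDK's StAX event API. It is not StAXON's implementation: the class MultiplePiInserter and its method are hypothetical, it handles a single absolute path, and it writes XML rather than JSON so that the inserted hint stays visible in the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical illustration, not StAXON code.
class MultiplePiInserter {

    /** Copies xml and adds an xml-multiple hint before each run of elements at multiplePath. */
    static String insertHints(String xml, String multiplePath) {
        StringWriter out = new StringWriter();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            XMLEventFactory events = XMLEventFactory.newInstance();

            Deque<String> path = new ArrayDeque<>();        // open elements, root first
            Deque<String> lastSibling = new ArrayDeque<>(); // per level: previous sibling name
            lastSibling.push("");                           // "" means "no sibling yet"

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartDocument() || event.isEndDocument()) {
                    continue; // keep the demo output free of an XML declaration
                }
                if (event.isStartElement()) {
                    String name = event.asStartElement().getName().getLocalPart();
                    path.addLast(name);
                    String current = "/" + String.join("/", path);
                    boolean startsRun = !name.equals(lastSibling.peek());
                    if (startsRun && current.equals(multiplePath)) {
                        writer.add(events.createProcessingInstruction("xml-multiple", name));
                    }
                    lastSibling.pop();
                    lastSibling.push(name); // remember this element for its next sibling
                    lastSibling.push("");   // open a fresh level for its children
                } else if (event.isEndElement()) {
                    path.removeLast();
                    lastSibling.pop();      // leave the children's level
                }
                writer.add(event);
            }
            writer.flush();
            writer.close();
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<alice><bob>edgar</bob><bob>charlie</bob><peter/></alice>";
        System.out.println(insertHints(xml, "/alice/bob"));
    }
}
```

The sketch emits the hint only once per run of consecutive bob elements, which mirrors the rule that the end of an array sequence is detected automatically and needs no closing hint.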
Finally, if nothing else works for you, you may also let StAXON determine array boundaries fully automatically.
Use this only if you can provide neither <?xml-multiple?> processing instructions nor the paths of the elements that should be wrapped into JSON arrays.
However, using this method has several drawbacks:
- The writer basically needs to cache the entire document in memory, eating both space and time.
- The writer will not be able to produce empty arrays or arrays with a single element.
To enable this feature, set the JsonXMLOutputFactory.PROP_AUTO_ARRAY property to true.
Note: Though in general this might not be a good idea, it is possible to mix the auto-array and multiple-paths features.
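The drawbacks of automatic detection follow from the information available to the writer. As a deliberately simplified model (an assumption for illustration, not StAXON's actual algorithm), suppose a name is treated as an array only if it occurs more than once among the children of one element. The hypothetical class AutoArraySketch below shows why a single bob can never become a one-element array under such a rule, and why all siblings must be seen before anything can be decided.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of array inference; not StAXON code.
class AutoArraySketch {

    /** Returns the child names that repeat and would therefore be inferred as arrays. */
    static Set<String> inferArrayNames(List<String> siblingNames) {
        // The decision needs the complete sibling list, hence the need to buffer.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : siblingNames) {
            counts.merge(name, 1, Integer::sum);
        }
        Set<String> arrays = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                arrays.add(entry.getKey());
            }
        }
        return arrays;
    }

    public static void main(String[] args) {
        System.out.println(inferArrayNames(Arrays.asList("bob", "bob", "peter")));
        System.out.println(inferArrayNames(Arrays.asList("bob", "peter")));
    }
}
```

In the second call, bob appears once and is indistinguishable from a plain property; an empty array leaves no element at all, so there is nothing to infer from.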
StAXON's writer implementation allows you to wrap a sequence of documents into a JSON array. To do this, write the <?xml-multiple?> PI before writing anything else:
writer.writeProcessingInstruction(JsonXMLStreamConstants.MULTIPLE_PI_TARGET);
writer.writeStartDocument(); // first array component
...
writer.writeEndDocument();
writer.writeStartDocument(); // second array component
...
writer.writeEndDocument();
...
writer.close();
The writer.close() call is crucial here, as it will close the JSON array.