Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XmlMapper/UntypedObjectDeserializer swallows duplicated elements in XML documents #205

Closed
joaovarandas opened this issue Aug 4, 2016 · 31 comments
Milestone

Comments

@joaovarandas
Copy link

joaovarandas commented Aug 4, 2016

Hello guys.

I think I have already seen this issue around, but, for UntypedObjectDeserializer the solution maybe a bit different.

Entity

<?xml version="1.0" encoding="UTF-8"?>
<person>
   <name>John</name>
   <parent>Jose</parent>
   <parent>Maria</parent>
   <dogs>
      <count>3</count>
      <dog>
         <name>Spike</name>
         <age>12</age>
      </dog>
      <dog>
         <name>Brutus</name>
         <age>9</age>
      </dog>
      <dog>
         <name>Bob</name>
         <age>14</age>
      </dog>
   </dogs>
</person>

Code

new XmlMapper().readValue(xml, Object.class);       

Output

{
  "name" : "John",
  "parent" : "Maria",
  "dogs" : {
    "count" : "3",
    "dog" : {
      "name" : "Bob",
      "age" : "14"
    }
  }
}

Problem

Duplicated elements in the entity get swallowed by current UntypedObjectDeserializer implementation.

I can't use Typed Objects. In my use case, I don't have any typed objects, because I don't know how objects are sent to me.

Possible Solution

While creating the Map for the data, check if there are duplicated keys, and start an Array, with this approach, the output would be:

{
  "name" : "John",
  "parent" : [ "Jose", "Maria" ],
  "dogs" : {
    "count" : "3",
    "dog" : [ {
      "name" : "Spike",
      "age" : "12"
    }, {
      "name" : "Brutus",
      "age" : "9"
    }, {
      "name" : "Bob",
      "age" : "14"
    } ]
  }
}

How to Reproduce

Gist

Artifacts:

  • jackson-core
  • jackson-databind
  • jackson-dataformat-xml

Version:

  • 2.7.4

Conclusion

The gist implements the solution using an extended version of the UntypedObjectDeserializer.

If not the default behavior, what about creating a new DeserializationFeature to enable this(default or not)?

Should you guys like/aprove this solution, I can always fork the project and submit a pull request with the full solution as a feature or default behavior.

Thanks!
Jp

@joaovarandas joaovarandas changed the title XmlMapper swallows duplicated elements in UntypedObjectDeserializer XmlMapper/UntypedObjectDeserializer swallows duplicated elements in XML documents Aug 4, 2016
joaovarandas added a commit to joaovarandas/jackson-dataformat-xml that referenced this issue Aug 4, 2016
…bject deser

Signed-off-by: joaovarandas <jpvarandas@gmail.com>
@joaovarandas
Copy link
Author

The new test 'testDuplicatedElementsSwallowing' in class 'XmlParserTest' ensures it is working as proposed.

Let me know if you want me to pull this.

@cowtowncoder
Copy link
Member

I don't think this should be the default behavior, but I would be fine with a feature that allows doing this.

One question here is regarding how it should be enabled; at least two possibilities:

  1. Straight setter/getter in XmlMapper (similar to setDefaultUseWrapper)
  2. FromXmlParser.Feature

Of these I would otherwise favor (2) since it's easier to expand, but one problem is that this feature can not be changed on per-call basis and as such should not be a (format-specific) DeserializationFeature. So perhaps (1) is the better way to go.

Alternatively perhaps it would be better to add option (3); add XmlMapper.Feature.

So if you can do a PR for feature itself (with simple setter, for now), and I can retrofit XmlMapper.Feature?

@joaovarandas
Copy link
Author

The problem with adding as a "optional" feature (or setter) is that "unaware" developers could lose data deserialized from XML. That's what happened to me: - Users found out the bug weeks after it has been deployed to production, since during the tests there were no repeated elements.

Anyway, sure I can do that. And it fits my use case as well.
Thanks!

@cowtowncoder
Copy link
Member

Another option would be to explicitly fail on any attempt to bind to "untyped" (java.lang.Object), to make it explicitly clear that failure occurs. I am open to doing this, although that would have to go in next minor version. It would probably also have to apply to Map...

In the end I do not think there are good options: ideally XML is not being mapped to either Object or Map. Trying to auto-detect between List of things and single thing does not work well with Java style typing -- access to such values is awkward. In fact, I might go as far as claiming that it might make more sense to use DOM (and alternatives, JDOM, DOM4J) instead of (Jackson) databind.

@joaovarandas
Copy link
Author

joaovarandas commented Aug 8, 2016

Yeah... I agree with you, but:

I'm currently building a facade for the Apache Components/HTTPClient, and invoking SOAP services... Then I just "transform" the return object to a simple Map/List, and invoke JavaScript code (using Nashorn).

In that use case, it doesn't matter if the object is an Array or a Map since I access them using simple JavaScript bindings like: data["attribute-key"] or data[0] for an array ...

Probably for Scala or other languages inside JVM it would be the same.

I think failing is good too, to fail when there are duplicated elements in a default UntypedObject deser ...

It's not final yet, but if you wanna take a look at the project:
https://github.com/inpaas/inpaas-httpclient

@joaovarandas
Copy link
Author

@cowtowncoder how could I "deregister" a Module or the deserializer?

See the code below:

public XmlMapper setUseXmlUntypedObjectDeserModule(boolean state) {
    if (state == true) {
        Module xmlUntypedObjectDeserModule = new SimpleModule().addDeserializer(Object.class, new XmlUntypedObjectDeserializer());
        registerModule(xmlUntypedObjectDeserModule);
    } else {
        // is there anyway to rollback the state?
    }

    return this;
}

joaovarandas added a commit to joaovarandas/jackson-dataformat-xml that referenced this issue Aug 9, 2016
Signed-off-by: joaovarandas <jpvarandas@gmail.com>
@cowtowncoder
Copy link
Member

There is no way to deregister a module, since registration really is just a callback to get a chance to register other things. Further, there is no metadata on what registered any given handler (no back-references to module). So answer is you can not. You can just register more handlers.

So question would rather become "what are you trying to do". It is possible to add suitable Serializers / Deserializers handler to -- for example -- avoid return custom serializer/deserializer depending configuration.

@Paradoxia
Copy link

@joaovarandas Thank you, your gist solution saved my day.

@arakelian
Copy link

@joaovarandas Saved me, too. Thanks.

@perezmlal
Copy link

@joaovarandas thanks, your gist solution works perfect.

@gonzalogarciajaubert
Copy link

Thanks @joaovarandas for your gist

@isipkh
Copy link

isipkh commented Mar 13, 2018

Thanks @joaovarandas for the solution in gist

@dadoonet
Copy link

Thanks @joaovarandas for the gist. Worked perfectly.

@vy
Copy link
Contributor

vy commented Sep 14, 2018

I linked this gem to a relevant StackOverflow post. Thanks @joaovarandas for the report and the temporary fix!

@Musikolo
Copy link

@cowtowncoder, could you please clarify whether this issue has been solved?
I wonder whether there is a way to support XML with duplicate child nodes?
In the worst case scenario, is there any way to get all XML child node loaded?

Using version 2.9.4 currently.

Thank you!

@cowtowncoder
Copy link
Member

@Musikolo it hasn't, and I don't see a way to handle it via Untyped object deserialization. Jackson XML module does not offer general-purpose node approach to XML -- problem is that XML simply has no native difference that would allow separating Arrays (lists) from Objects (maps), and I really really dislike heuristics that try to separate single-value from multiple-values, as this has proven not to be reliable or stable handling for anything.

@sschuberth
Copy link

Excuse my probably naive question, but why can't we simply use a MultiMap instead of a Map and be good?

BTW: The problem with the approach suggested in this issue is that it does not work if you also need to parse the attributes of the keys / tags with the identical names.

@cowtowncoder
Copy link
Member

@sschuberth Because "untyped" databinding is defined as using basic Maps, Lists, Numbers, Strings and booleans. It can not be changed to use MultiMaps (and I don't think JDK has MultiMaps at this point, so API shouldn't be tied to specific external library that provides type). Since JSON does not allow multiple values per property it would also not make much sense to complicate handling for data formats other than XML, at least not at this level.

One possibility, I think, could be to make XMLMapper expose different type of tree model.
But I am not sure how valuable that would be over someone just using DOM values.

@goodlifeinc
Copy link

Absolute life saver!

In order to throw my 5 cents on the topic, for me it was crucial to readValue to Object.class instead of TypeReference of Map<String, Object>.

Thanks for the workaround provided!

@ST-DDT
Copy link

ST-DDT commented Apr 10, 2019

Is there a workaround for the (XML) JsonNode way as well?

@ViniciusArnhold
Copy link

@cowtowncoder Is this (Handling duplicate elements natively) targeted for 3.0?

@cowtowncoder
Copy link
Member

cowtowncoder commented Sep 11, 2019

I hope it can be addressed, although I do not really have solid plans. But if there was, I think it'd be one
of "Big Ideas", to be described here:

https://github.com/FasterXML/jackson-future-ideas/wiki/JSTEP

with a new JSTEP id, description of basic idea. I wish I had some concrete ideas, as I agree with everyone that availability of a tree model is a HUGE win, and that with XML there has to be a way to indicate element sequences, element/attribute distinction. But still isolating it from some of complexities of XML, while keeping model interoperable with other representations.

I added tentative placeholder (looks like it'd be `JSTEP-6') on page linked-to above.

@cowtowncoder
Copy link
Member

cowtowncoder commented May 20, 2020

One update: created FasterXML/jackson-databind#2732 to help with JsonNode to -- I hope -- essentially implement "coerce as Array if you see "duplicate" property.
This could also pave the way to address "untyped" deserialization case.

@cowtowncoder cowtowncoder added this to the 2.12.0 milestone May 22, 2020
@cowtowncoder cowtowncoder changed the title XmlMapper/UntypedObjectDeserializer swallows duplicated elements in XML documents XmlMapper/UntypedObjectDeserializer swallows duplicated elements in XML documents May 22, 2020
@cowtowncoder
Copy link
Member

I ended up implementing suggestion, so that repeated elements result in automatic "upgrade" into List. This is done automatically when reading xml content.

Will be in 2.12(.0).

alex-bel-apica pushed a commit to ApicaSystem/jackson-dataformat-xml that referenced this issue Sep 4, 2020
…2733)

# Conflicts:
#	release-notes/VERSION-2.x
@taojoe
Copy link

taojoe commented Sep 28, 2020

@cowtowncoder does 2.12 release in a few days?

@cowtowncoder
Copy link
Member

@taojoe not quite that fast: I am hoping to get the first release candidate (rc1) out within a week, and then bit of time is needed for testing. So earliest I think would be in maybe 4 weeks.

@taojoe
Copy link

taojoe commented Sep 29, 2020

@cowtowncoder rc1 is ok for me. Thanks a lot. ' repeated elements result in automatic "upgrade" into List' is the key part in my project. It'll save a lot of time.

@taojoe
Copy link

taojoe commented Oct 7, 2020

@cowtowncoder when will rc1 release ?

@cowtowncoder
Copy link
Member

Missed doing it last weekend, now hoping to have time over upcoming weekend, in 4-5 days.

@taojoe
Copy link

taojoe commented Oct 8, 2020

@cowtowncoder thanks

@yairlenga
Copy link

I added a ticket to improve handling of repeated elements, see [https://github.com//issues/495]

For those with Perl background: current solution is similar to XML::Simple forceArray = true, the addition will follow the forceArray = [ tag1, tag2, …] approach. For parsing in Perl, it was a life saver.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests