readValue for Map[String, Any] or List[Any] is very slow #145

Closed
anantn opened this issue Apr 22, 2014 · 12 comments

@anantn
anantn commented Apr 22, 2014

Whenever the target type contains 'Any', 'AnyRef' or '_', deserialization with the Scala module is very slow.

Consider JSON of the form ["foo", "bar", "baz"]. objectMapper.readValue[List[_]](json) is 10x slower than objectMapper.readValue[List[String]](json).

This would be fine on its own, except in the cases where the JSON contains mixed types. What would be the optimal way to parse JSON that looks like the following?

{
  "test": [
    {
      "name": "foobar",
      "list": [
        {"type": "int"},
        {"type": "float"}
      ]
    }
  ]
}

@christophercurrie
Member

10x seems unreasonable to have to work around, but I'll have to do some benchmarking to find what's causing it. Do you find the same slowdown with java.util.List, or just Scala lists?

@anantn
Author

anantn commented Apr 23, 2014

It works great with java.util.List and java.util.Map, which is the workaround I am currently using.

@christophercurrie
Member

Ok, thanks; I will investigate and see what can be done.

@anantn
Author

anantn commented Apr 30, 2014

I have some stats and a simple benchmarking script to share: https://gist.github.com/anantn/87a7a696b9059979189d - 100 iterations over an 800-line JSON file.

jackson.readValue[Map[_,_]] ::
Average time: 161523670 ns, Max time: 614400000 ns, Min time: 108651000 ns

jackson.readValue(json, classOf[Map[_,_]]) ::
Average time: 114812600 ns, Max time: 164650000 ns, Min time: 107193000 ns

jackson.readValue(json, classOf[Map[String, _]]) ::
Average time: 115764080 ns, Max time: 127452000 ns, Min time: 108470000 ns

jackson.readValue(json, classOf[java.util.Map[Object, Object]]) ::
Average time: 604850 ns, Max time: 3465000 ns, Min time: 364000 ns

jackson.readValue(json, classOf[java.util.Map[String, Object]]) ::
Average time: 484300 ns, Max time: 810000 ns, Min time: 355000 ns
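
For readers who want to reproduce numbers of this shape, here is a minimal sketch of a timing loop that reports average, maximum, and minimum wall-clock time over a fixed number of iterations. It is not the script from the gist: the class name TimingSketch, the Stats type, and the stand-in workload are all hypothetical, and it uses only the JDK.

```java
/** Hypothetical micro-benchmark helper; NOT the script from the gist above. */
public class TimingSketch {

    /** Per-iteration wall-clock summary, in nanoseconds. */
    public static final class Stats {
        public final long averageNs;
        public final long maxNs;
        public final long minNs;

        Stats(long averageNs, long maxNs, long minNs) {
            this.averageNs = averageNs;
            this.maxNs = maxNs;
            this.minNs = minNs;
        }

        @Override
        public String toString() {
            return "Average time: " + averageNs + " ns, Max time: " + maxNs
                    + " ns, Min time: " + minNs + " ns";
        }
    }

    /** Runs the task the given number of times, timing each run with System.nanoTime(). */
    public static Stats measure(Runnable task, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long total = 0;
        long max = Long.MIN_VALUE;
        long min = Long.MAX_VALUE;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            total += elapsed;
            max = Math.max(max, elapsed);
            min = Math.min(min, elapsed);
        }
        return new Stats(total / iterations, max, min);
    }

    public static void main(String[] args) {
        // Stand-in workload (an assumption); swap in the parse call being measured.
        Stats stats = measure(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1000; i++) {
                sb.append(i);
            }
        }, 100);
        System.out.println(stats);
    }
}
```

A loop like this has no warm-up phase, so the first iterations usually include class loading and JIT compilation. That tends to inflate the maximum and, to a lesser degree, the average, so the minimum is often the steadier figure to compare.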

@darkjh
darkjh commented Apr 30, 2014

I have the same problem.
My deser helper looks like this.

  def deser(value: String): Map[String, Any] = {
    mapper.readValue(value, classOf[Map[String, Any]])
  }

On a 31 MB file containing 270,560 JSON documents, it takes 191 seconds. On the same machine, ujson in Python takes less than one second.

@cowtowncoder
Member

Not sure if it will help, but there are at least two possible sources of the problem:

  1. Type resolution
  2. Locating deserializers for values

To work around the first one, you may be able to pre-resolve classOf[Map[String, Any]] into a JavaType. In Java, you'd do this with:

// cls is the target class, e.g. Map.class
final JavaType MAP_TYPE = mapper.constructType(cls);
// or, for a fully generic type (use one declaration or the other)
final JavaType MAP_TYPE = mapper.getTypeFactory().constructType(new TypeReference<Map<String,Object>>() { });

and then pass the JavaType instead of a class or TypeReference. The benefit is that generic type resolution is done only once; repeated on every call, it can be a significant overhead.

For the second part (which also solves the first one, if that was an issue), you can pre-construct an ObjectReader and use that:

final ObjectReader reader = mapper.reader(type);

which will resolve the type and pre-fetch the deserializer needed for it. ObjectReader instances are fully thread-safe and immutable, and their use is recommended over ObjectMapper (similarly for ObjectWriter when writing JSON).

I don't know if this works around the performance issue, but it is worth trying until the root cause is found.

@anantn
Author

anantn commented Apr 30, 2014

I'm circumventing this by manually converting Java collections into Scala types recursively:

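import com.fasterxml.jackson.databind.ObjectMapper

// Recursively convert java.util collections into Scala collections.
def convert(obj: Any): Any = {
  import collection.JavaConverters._
  obj match {
    case l: java.util.List[_] => l.asScala.map(convert).toList
    // mapValues returns a lazy wrapper; .view.force evaluates it eagerly
    case m: java.util.Map[_, _] => m.asScala.mapValues(convert).view.force
    case _ => obj
  }
}

// Parse into plain Java collections first, then convert
val mapper = new ObjectMapper
convert(mapper.readValue(json, classOf[java.util.Map[String, _]]))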

That gives me an immutable.Map[String, _] / immutable.List[_] from JSON, at about the same speed as plain Java collections.
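
For comparison, the same recursive walk can be sketched in plain Java. This is not the workaround above: it swaps the Scala targets for unmodifiable java.util copies, and the class name DeepImmutableSketch is hypothetical. The input built in main is hand-made to mirror the shape of the JSON from the original report, so no JSON library is involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Java analogue of a recursive convert function. */
public class DeepImmutableSketch {

    /**
     * Recursively copies nested java.util lists and maps into unmodifiable
     * collections. Any other value (String, Number, null, ...) is returned as-is.
     */
    public static Object convert(Object obj) {
        if (obj instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) obj) {
                out.add(convert(item));
            }
            return Collections.unmodifiableList(out);
        }
        if (obj instanceof Map) {
            // LinkedHashMap keeps the iteration order of the source map.
            Map<Object, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) obj).entrySet()) {
                out.put(e.getKey(), convert(e.getValue()));
            }
            return Collections.unmodifiableMap(out);
        }
        return obj;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for a parsed document: {"test": [{"name": ..., "list": [...]}]}
        Map<String, Object> typeInt = new LinkedHashMap<>();
        typeInt.put("type", "int");
        Map<String, Object> typeFloat = new LinkedHashMap<>();
        typeFloat.put("type", "float");

        List<Object> inner = new ArrayList<>();
        inner.add(typeInt);
        inner.add(typeFloat);

        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("name", "foobar");
        entry.put("list", inner);

        List<Object> test = new ArrayList<>();
        test.add(entry);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("test", test);

        System.out.println(convert(root));
    }
}
```

The walk visits every node once, so its cost grows linearly with the size of the parsed document.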

@christophercurrie
Member

@anantn, can you share your 800-line test file? I have my own test data to run with, but I will be better able to reproduce your issue if I can profile the same data. Feel free to contact me privately if you don't want the data to be public.

@anantn
Author

anantn commented May 2, 2014

christophercurrie added this to the 2.4.5 milestone Nov 26, 2014

@christophercurrie
Member

Some work has been done on this issue in the 2.4 branch, which I need to release in the next few days; if any of you have the chance to run your local perf tests on 2.4.4-SNAPSHOT, your feedback would be appreciated.

@christophercurrie
Member

FYI, version 2.4.4 was released today. Hopefully it will improve things for you.

@christophercurrie
Member

Closing this issue as resolved; reports have come in on a similar issue that performance has improved in 2.4.4. If this is not your experience, feel free to open a new issue.
