Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JsonPointer quadratic memory use: OOME on deep inputs #736

Closed
htmldoug opened this issue Dec 29, 2021 · 7 comments
Closed

JsonPointer quadratic memory use: OOME on deep inputs #736

htmldoug opened this issue Dec 29, 2021 · 7 comments
Labels
2.14 Issue planned (at earliest) for 2.14 performance Issue related to performance problems or enhancements
Milestone

Comments

@htmldoug
Copy link
Contributor

htmldoug commented Dec 29, 2021

Adding https://github.com/nst/JSONTestSuite in rallyhealth/weePickle#106 uncovered that JsonPointer memory usage is quadratic over depth. -Xmx8g is insufficient to create a JsonPointer for 100k opening arrays without an OutOfMemoryError.

JsonPointer seems to be a linked list where each node contains an _asString field containing the full path to that node.

Minimal test to repro the issue: 3764ff4

    // such as https://github.com/nst/JSONTestSuite/blob/master/test_parsing/n_structure_100000_opening_arrays.json
    public void testDeepJsonPointer() throws Exception {
        int DEPTH = 100000;
        String INPUT = new String(new char[DEPTH]).replace("\0", "[");
        JsonParser parser = createParser(MODE_READER, INPUT);
        try {
            while (true) {
                parser.nextToken();
            }
        } catch (Exception e) {
            JsonStreamContext parsingContext = parser.getParsingContext();
            JsonPointer jsonPointer = parsingContext.pathAsPointer(); // OOME
            String pointer = jsonPointer.toString();
            String expected = new String(new char[DEPTH - 1]).replace("\0", "/0");
            assertEquals(expected, pointer);
        }
    }

heap dump filled with JsonPointer instances, each containing a String of the full path

@htmldoug htmldoug changed the title JsonPointer quadratic memory usage: OOME on deep inputs JsonPointer quadratic memory use: OOME on deep inputs Dec 29, 2021
@cowtowncoder
Copy link
Member

Interesting. I will have to look into this a bit. My immediate thinking is that this might be an unfortunate side effect of attempting to retain lazily computed description for JsonPointer for common (if wasteful) use cases where JsonPointer.toString() is called multiple times.

But it sounds like here it is more directly called as a side effect, causing every single pointer "node" to create and retain its full path description.

@cowtowncoder
Copy link
Member

Hmmmh. Unfortunately, looking at code involved, it is not the way I thought -- actual full path is eagerly constructed. This makes changing a whole lot more difficult and probably only doable for a new minor version (2.14).
Not sure if and how to address this, need to think a bit more. :-/

cowtowncoder added a commit that referenced this issue Jan 16, 2022
@pjfanning
Copy link
Member

Can I make a case for Caffeine cache? The cache could be configured to hold onto a max number of pointer strings and to remove others on a LRU basis.

If jackson were to buy into real LRU behaviour with its caches. The LRUMap could be rewritten to use a similar caffeine cache - like the one I support at https://github.com/pjfanning/jackson-caffeine-cache

@cowtowncoder
Copy link
Member

@pjfanning If we needed cache Caffeine is great. But in this case unfortunately caches would not really work, I think. Problem is rather that of how to remove eager construction and retaining of String representation while still allowing efficient operation for common usage.

Caches are highly problematic for this usage due to various reasons, including lack of clear ownership (JsonPointer instances should not have references back to any larger retained objects, all caches need to be owned by life-cycled entities like either JsonFactory or ObjectMapper and so on).

@cowtowncoder
Copy link
Member

cowtowncoder commented Feb 13, 2022

Quick note: still thinking about how to tackle this problem. No solution yet, but one very preliminary idea is around a limit for "_asString" such that existing processing would only be applied for shorter paths (either by length of path representation or number of segments; whichever is easier to track), and different, more dynamic (and less pre-processing) approach was taken beyond that initial set.
I don't have a good idea of how to implement that, starting from what exists; my specific concern is that this might require some sort of back-reference to allow dynamic generation of full path. Such references might be just fine, as long as things are kept immutable, but require some care (mutable ones would be just asking for trouble and within strict no-fly zone for me personally :) ).

Note: marking as performance related since that label covers efficiency aspects too (excessive memory usage).

@cowtowncoder cowtowncoder added the performance Issue related to performance problems or enhancements label Feb 13, 2022
@cowtowncoder
Copy link
Member

Actually I have a bit more specific idea to consider: when constructing an instance, create full path for the deepest level -- but then for parent levels have reference to full path and length (from beginning), "leftstr". New _length field would then be used so that:

  1. If length 1 or larger, create sub-string when needed
  2. Otherwise use String

This would avoid eagerly allocating N String representations upon construction. It would not necessarily be optimal for use cases where JsonPointer instances were retained, but it would at least reduce complexity by one dimension (from quadratic to linear).

I'll see if I can implement something along these lines: it seems doable without sacrificing backwards compatibility, and doable for 2.14 (but just to play it safe probably not for 2.13).

@cowtowncoder cowtowncoder added 2.14 Issue planned (at earliest) for 2.14 and removed 2.13 labels Jul 30, 2022
@cowtowncoder cowtowncoder added this to the 2.14.0 milestone Oct 11, 2022
@cowtowncoder cowtowncoder changed the title JsonPointer quadratic memory use: OOME on deep inputs JsonPointer quadratic memory use: OOME on deep inputs Oct 11, 2022
@cowtowncoder
Copy link
Member

Finally found time to fix this; will be in 2.14.0 final.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.14 Issue planned (at earliest) for 2.14 performance Issue related to performance problems or enhancements
Projects
None yet
Development

No branches or pull requests

3 participants