Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase default max allowed String value length from 5 megs to 20 megs #1014

Closed
cowtowncoder opened this issue May 5, 2023 · 17 comments
Closed
Labels
2.15 Issue planned (at earliest) for 2.15 performance Issue related to performance problems or enhancements
Milestone

Comments

@cowtowncoder
Copy link
Member

cowtowncoder commented May 5, 2023

(note: possible follow-up to #959)

Based on user/dev feedback wrt 2.15.0 release, there seem to be use cases where JSON String values (note: NOT Document size) exceed 5 million characters. While this is configurable limit it seems best to try to change default to be yet more lenient, given that this is not the likeliest way to try to cause DoS. It is also the case that those who do wish to use more modest sizes will anyway need to lower limits, regardless of defaults we choose.

My initial thinking is to increase this further to 20 million (giving max memory usage of about 40 megs in JVM -- although temporarily up to twice that due to segmented storage), for Jackson 2.15.1.

@cowtowncoder cowtowncoder added performance Issue related to performance problems or enhancements 2.15 Issue planned (at earliest) for 2.15 labels May 5, 2023
@gdimitrov7
Copy link

We hit the same issue in our product. 20M should be much better and less intrusive and I fully support it.

Q1: Why there is a limit for the max String value? I mean I understand the other limits - maxDepth, maxNumLen - there are performance penalties for them. In the same I do not see exponential increase for the String values length. And yes, there can be OOM but most apps limit the input to be processed - for example on HTTP request body length. Example: Our server has 2GiB heap with max request body 20M - it does not matter if we have JSON with:

  • 2000 fields by 10k chars each (~20M)
  • or 1 field by 20M chars (~20M)

in both cases it should take 2x40M bytes to process the request and parse the body, meaning that we can process ~25 request in the same (if there is no other mem consumed).

I do not say we have to remove the limit, but can it be much bigger by default?

Q2: Is it possible to read the default max from system property?

   public static final int DEFAULT_MAX_STRING_LEN =  
              Integer.getInteger("com.fasterxml.jackson.core.StreamReadConstraints.maxStringLen", 20_000_000);

It will be much easy to be overwritten by apps that need bigger limit by some valid reasons.

@cowtowncoder
Copy link
Member Author

cowtowncoder commented May 12, 2023

Yeah, String value length limit is sort of inspired by number value limit; but practical effects are quite different. In theory I think that if limits were included from beginning they wouldn't have been problematic; going from unlimited to limit is the issue.
On multiple vs individual fields; this is where maximum doc length limit (not yet implemented) would be needed anyway.

As to system properties: for libraries like Jackson, which may be used by multiple other things, frameworks, concurrently, system-wide settings do not make sense in my opinion (system properties essentially being mutable singletons).

Be that as it may, Jackson does not use system properties for anything and my view is that while I do not object to their use, it should occur via extension modules (although in this particular case that'd be difficult to apply).

@gdimitrov7
Copy link

I understand that system properties are not used in Jackson. In the same time they have place in other libraries like logback or even in the JDK.

I agree they are mutable singletons but in cases like this - enforcing new limits - they provide easy way to change to custom bigger value at runtime with just restarting the application and with no code changes and no new version. And provides much easy support when we have issue at the customer environment.

@pjfanning
Copy link
Member

I suggested before the idea of having static methods on StreamReadConstraints that can be used to change the settings for the default StreamReadConstraints. I think this is a viable solution with one downside - some rogue lib maintainer could set values using this approach and that affects any other code running using that ClassLoader. The static methods would ideally only be used by users providing the actual services.

Most users should strongly consider creating their own factory methods for creating ObjectMapper instances - so that they can centralise their preferred config settings.

@cowtowncoder
Copy link
Member Author

Ok. So, although I have been resistant to ideas like what @pjfanning proposed, I am starting to think about this bit more. And I prefer static setter methods over system properties, as mechanism jackson-core could provide; and then allow something else to offer system properties mechanism if that is deemed useful.

As long as static setter method was properly documented with all the caveat and warnings -- basically stating that it is not recommended to be used except by end-user application; and that there are problems with stateful global singletons in general -- I think I'd be +1 for adding such methods in 2.16.
It seems like one of those things that while not generally recommended, could solve some actual real-world issues.

@pjfanning
Copy link
Member

pjfanning commented May 13, 2023

@cowtowncoder the static methods are most useful for v2.15.1 - to make it easier for people who are migrating now. They won't be very useful if left till v2.16.0.

I've raised issues in a number of projects (libs and apps) that use Jackson and there hasn't been much interest in them supporting allowing their users to configure StreamReadConstraints.

I think we are stuck with having to support a mechanism that allows users to modify the default values that StreamReadConstraints applies when users don't set their own via creating their own JsonFactory with their own StreamReadConstraints instance attached.

@cowtowncoder cowtowncoder changed the title Further increase default max allowed String value from 5 megs (2.15.0) Increase default max allowed String value length from 5 megs to 20 megs May 14, 2023
@cowtowncoder cowtowncoder added this to the 2.15.1 milestone May 14, 2023
@cowtowncoder
Copy link
Member Author

At this point I don't want to make API changes against semantic versioning, adding new functionality in patch versions.
We'll see if this change alone helps wrt 2.15.1.

As to adding change in 2.16 I'll wait for someone requesting such a thing; all I am saying is am no longer against that possibility.

@cowtowncoder
Copy link
Member Author

Changed my mind, see #1019 to be included in 2.15.2.

@stolsvik
Copy link

stolsvik commented Jun 7, 2023

I realize I am late to the party here, but I'd prefer you set the default limit to some very high value, like 1G or even higher.

It makes exceptionally little sense for me that a deserialization library should hinder the sizes of something as ordinary as the individual strings within a JSON.

I do get the problem of DoS-style attacks, and that the other limits makes some sense. But string values are absolutely fundamental. If you have problems with "random JSONs" coming over your wires, I fail to see why it should be the (de)serialization library that hinders very long strings - this should be caught by some earlier mechanism where you control the size of the received payload.

Again, the other limits (e.g. excessive number of decimals, excessive nesting etc) makes more sense: These are arguably more subtle vectors for DoSing when using deserialization mechanisms, that are harder to envision and comprehend, compared to plain "large payload size". But even here the default limits should be very high, so that they never come in the way for something that most probably are standard processing.

If you as a developer uncritically accept a payload of 10GB, then that is on you - I just fail to see why "a random JSON deserialization library" should have the default obligation to protect you from this. It could be an optional feature, but why default?

Do I make any sense here?

Edit: Also, as probably mentioned earlier: The total document size (either just the consumed bytes - or the deserialized object tree or whatever) would be much saner to limit the size of - not any individual strings within such a document. I mean, as a DoS attack, I could now just hammer in 1M x 20M strings and rip your memory to pieces anyway.

@cowtowncoder
Copy link
Member Author

I think it's safe to say that opinions on how much "default protection" should be provided vary A LOT. From "why would you ever accept longer than 64k string unless someone enables it" to "just leave it at Integer.MAX_VALUE unless I say otherwise". There are valid points to all views and it really depends.

In hindsight this particular setting should probably have been left at relatively high value; but with versions being out there I don't think I'll be changing defaults for 2.15.x at least.
For 2.16 we can still consider changes; if so, please file a new issue.

And yes, I agree wrt points on max doc size limit -- it was left out due to timing constraints.
I do not necessarily agree that max doc size only matters (if we are talking about max input length), since there is streaming use case where input size may be sort of infinite, as it is never read fully in memory in its entirety: this is different from JSON Strings which are always read fully in memory (even if lazily).

And conversely limiting String value length was inspired by number length limit; however, it's not a good analogy as the trade-offs are quite different.

One takeway for me, too, is that I should have heeded my own "never add a feature just for sake of consistency; only implement things that have been specifically requested".
That'd have worked well here, addressing just number length & nesting depth -- both of which have specific known threats.

Thank you everyone for your feedback here!

@ghost
Copy link

ghost commented Jul 6, 2023

Hi,
I'm late in the game too, but I would like to thank you for correcting that "bug" with the 5'000'000 bytes size and increasing to 20'000'000 bytes. That solved our issue.
That said, since we are handling documents that can vary very much in size, even the 20'000'000 limit that you put, is already too small for us.
We installed the patch for our application yesterday with the version 2.15.2, and it solved all our problems, and today we receive a document which was bigger, and got the following error:

Caused by: org.springframework.core.codec.DecodingException: JSON decoding error: String length (20054016) exceeds the maximum length (20000000)

And for end of year, it will probably skyrocket to even higher numbers.
So for me, what would make more sense, is to have a generic solution, where the developer can decide what is the limit he wants.
I will open a new issue ticket that in mind.
For the moment, we will try to have a custom version of this library and set the limit higher, so until an official solution is found, we can go back to the official one.

PS: In our case, we are not subject to attacks, since our systems are not in internet, and the cycle of handling documents are well controlled from each step of the process. So we don't "care" if the limit was set to 10GB. Well we care about the memory used :-D of course!!! But these potential DOS-style attack in our really particular case is not a concern.

PS2: Well I didn't open a new ticket since there are already tickets open for that and other aspects that are also important! So I won't pollute by opening a duplicate one! :-D

@cowtowncoder
Copy link
Member Author

There is already way to change maximum limits; both via per-factory StreamReadConstraints configuration and static (2.15.2) setter method on StreamReadConstraints.

@singleta
Copy link

Hi, I came across this discussion when the system I work on encountered the limit error this week after we updated our dependency to 2.15.2. We process PDF and JPEG files from a few KB up to 100MB as Base64 encoded strings in the request body. We have a data security restriction that prevents us from storing this data as a temporary file so we have to process it from the request body.

Fortunately our production code is still using 2.14.0-rc1 so that is not suffering.

As AlessandroPerucchi mentioned in his comment, we would also appreciate a user-configurable limit. Our data comes from trusted sources that have already had to jump through some security hoops so we are only concerned with the memory footprint of the large files.

@pjfanning
Copy link
Member

pjfanning commented Jul 19, 2023

@singleta StreamReadConstraints is configurable - have you read #1014 (comment) ?

@ghost
Copy link

ghost commented Jul 19, 2023

@cowtowncoder Thank you, I wasn't aware of this possibility :)

@dan2097
Copy link

dan2097 commented Aug 20, 2023

My initial thinking is to increase this further to 20 million (giving max memory usage of about 40 megs in JVM -- although temporarily up to twice that due to segmented storage), for Jackson 2.15.1.

Typical machines have gigabytes of memory!

The limits like nesting depth make sense as these can't easily be checked by the caller, but string length definitely is checkable.
Given that the JSON specification doesn't set a maximum length there should be a damn good reason for the out of the box behaviour to be incompatible with the specification and a "unlikely attack vector" that can be easily checked by the caller to me doesn't seem like sufficient justification. (EDIT RFC 7159 does allow a limit on length to be specified, so it is still technically complain with the JSON specifications)

These sort of limits are especially insidious as code that worked on test examples can fail for a small percent of input in production that happens to exceed this arbritrary limit.

@mrexodia
Copy link

In case anyone needs to set a reasonable limit (String length (5041581) exceeds the maximum length (5000000)):

// NOTE: This object mapper should be reused across the application
    public static final ObjectMapper OBJECT_MAPPER = new ObjectMapper(
            new JsonFactoryBuilder()
                    .streamReadConstraints(
                            StreamReadConstraints
                                    .builder()
                                    .maxStringLength(Integer.MAX_VALUE) // your limit here
                                    .build()
                    ).build()
    );

There is no security issue being fixed here (see https://www.youtube.com/watch?v=lr1KuL8OmJY for a good explanation), so all this limit does is cause unexpected breakages in production systems and waste engineering time 🤦🏻‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.15 Issue planned (at earliest) for 2.15 performance Issue related to performance problems or enhancements
Projects
None yet
Development

No branches or pull requests

7 participants