Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JsonFactory.Feature.CHARSET_DETECTION to disable charset detection (default to UTF-8) #921

Merged
merged 1 commit into from Feb 21, 2023

Conversation

yawkat
Copy link
Member

@yawkat yawkat commented Feb 14, 2023

No description provided.

@cowtowncoder
Copy link
Member

Ok this looks nice and reasonable. But let me think about this for just a bit before merging. :)

Thank you again!

@cowtowncoder
Copy link
Member

Oh, one idea/suggestion: although JsonFactory.Feature is easier to add, making is JsonReadFeature would allow per-call changes (at least when used via ObjectReader).
In case someone wanted to change this more dynamically.

I don't have strong opinion but this just occurred to me and thought I'd share.

@yawkat
Copy link
Member Author

yawkat commented Feb 20, 2023

@cowtowncoder i used a JsonFactory.Feature because it was easier in this case. it looks like JsonReadFeature also requires a JsonParser.Feature at the moment, which did not seem appropriate here, the other JsonParser.Features seem more like they operate to all formats (not just bytes). Do you think it should be a JsonParser.Feature as well, or should I figure out something to add a JsonReadFeature without a JsonParser.Feature?

@cowtowncoder
Copy link
Member

@cowtowncoder i used a JsonFactory.Feature because it was easier in this case. it looks like JsonReadFeature also requires a JsonParser.Feature at the moment, which did not seem appropriate here, the other JsonParser.Features seem more like they operate to all formats (not just bytes). Do you think it should be a JsonParser.Feature as well, or should I figure out something to add a JsonReadFeature without a JsonParser.Feature?

Good point. Unfortunately coupling for 2.x remains (untangling is not super easy), so let's just leave things the way they are as JsonFactory.Feature.
We can reconsider this for 3.0 if there are reasons to change it.

@cowtowncoder cowtowncoder changed the title Add JsonFactory.Feature to disable charset detection Add JsonFactory.Feature to disable charset detection Feb 21, 2023
@cowtowncoder cowtowncoder merged commit 36cd882 into FasterXML:2.15 Feb 21, 2023
cowtowncoder added a commit that referenced this pull request Feb 21, 2023
@yawkat
Copy link
Member Author

yawkat commented Feb 21, 2023

ok thanks!

graemerocher pushed a commit to micronaut-projects/micronaut-core that referenced this pull request Feb 24, 2023
Currently, we use a non-blocking parser to transform input JSON to a JsonNode, then send it downstream, and finally convert it to the desired request type when the route arguments are bound. This patch delays parsing until the conversion to the route type. It consists of these parts:

- A change to JsonContentProcessor. The JsonContentProcessor has to handle certain scenarios (`application/x-json-stream`, or our support for streaming a json array as a Publisher) where the input contains multiple json tokens, Instead of forwarding the input bytes to an async parser, they are first split into individual json values, that are then forwarded or parsed in bulk.
- A new JsonCounter class. This class handles splitting the input into its individual json nodes. It is basically a limited JSON parser that counts braces. It is already very optimized, but it also has a special optimization when the content type is `application/json`, in which case it assumes only a single json value in the input (this is technically a breaking change, but breaks no tests). I also wrote fuzz tests for JsonCounter, I will submit them in a separate PR to https://github.com/micronaut-projects/micronaut-fuzzing .
- A benchmark. It's not run by the build tools, but it is good to have around.
- An option in NettyHttpServerConfiguration to re-enable eager parsing to JsonNode. JsonCounter is still used, so it's not 100% the old behavior, but it should be more compatible in some respects because it doesn't change the conversion logic anymore.
- A new LazyJsonNode class. It contains a ByteBuffer with the actual unparsed bytes. It's a bit complicated because it contains a reference counted buffer that has to be released after conversion, but also sometimes multiple converters are called for the same LazyJsonNode (once to JsonNode, once to a user-defined type). To solve this, when there is a conversion to JsonNode, it keeps that JsonNode around (and releases the buffer). If there are future conversions to a specific type, it will use the JsonNode as the source instead.
- A new JsonSyntaxException class. JsonMappers can throw this to signal a syntax error in the JSON, which will be reported differently than a data binding error.
- Changes to JsonMapper API: The asynchronous parser is deprecated. There is a new readValue method that takes a ByteBuffer. The jackson implementation has an optimization for netty ByteBuffers.
- New converters for LazyJsonNode. Also removed some old superfluous converters for JsonNode.
- A change to RequestLifecycle to show JSON syntax error messages like we did previously, instead of the opaque 400 that we usually send for conversion errors.
- Some test changes to reflect changes in error messages. Because we now parse the input in bulk, in some cases jackson can decorate errors with short snippets of the failing input data. Is this an issue?

There are a few potential incompatibilities with this change:

- Non-standard JSON features when combined with JsonCounter (i.e. when streaming a JSON array, or when using `application/x-json-stream`) will break even when the JSON parser is configured to support them. e.g. if the user configured jackson to ignore comments, that may fail now.
- When the input body is never bound, JSON syntax errors may not be caught.
- JsonCounter only supports UTF-8, by design. This is permitted by the JSON standard, however Jackson also supports UTF-16 and UTF-32. imo this is a net benefit, to avoid potential parser differential vulnerabilities. I also made a jackson feature to match this, though JsonCounter is sufficient now: FasterXML/jackson-core#921
- JsonMapper implementations that do not use JsonSyntaxException yet will have less verbose HTTP errors until they switch to JsonSyntaxException.
@cowtowncoder cowtowncoder changed the title Add JsonFactory.Feature to disable charset detection Add JsonFactory.Feature.CHARSET_DETECTION to disable charset detection Mar 18, 2023
@cowtowncoder cowtowncoder changed the title Add JsonFactory.Feature.CHARSET_DETECTION to disable charset detection Add JsonFactory.Feature.CHARSET_DETECTION to disable charset detection (default to UTF-8) Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants