Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide implementation of async JSON parser fed by ByteBufferFeeder #478

Closed
poutsma opened this issue Sep 17, 2018 · 11 comments
Closed

Provide implementation of async JSON parser fed by ByteBufferFeeder #478

poutsma opened this issue Sep 17, 2018 · 11 comments
Labels
2.14 Issue planned (at earliest) for 2.14
Milestone

Comments

@poutsma
Copy link

poutsma commented Sep 17, 2018

Currently, there is no implementation of com.fasterxml.jackson.core.async.ByteBufferFeeder provided in Jackson Core; there is only an implementation of ByteArrayFeeder. Within Spring Framework, this requires us to covert the (potentially non-heap-allocated) ByteBuffers into byte arrays before feeding them to the parser, resulting in insufficient memory usage.

Please provide an implementation of ByteBufferFeeder.

@cowtowncoder
Copy link
Member

That would be good addition: I always assumed there would be need for it.

Note to potential implementors: beyond feeder, bigger job is to implement actual parser that uses ByteBuffer as input source. That should be mostly monkey work in copying array-backed variant, changing accessors (but not logic): unfortunately more modular design that could cover both turns out to have non-negligible amount of overhead (JVM can not quite optimize access patterns to remove all call dispatch, I am guessing, or something similar).

@kptfh
Copy link

kptfh commented Oct 1, 2018

Does it make sense to switch internals of NonBlockingJsonParser to ByteBuffer and add 2 new methods
public ByteBufferFeeder getByteBufferNonBlockingInputFeeder() and
public void feedInput(ByteBuffer buf) throws IOException
while retaining previous byte array methods for backward compatibility?

@mizosoft
Copy link

mizosoft commented Mar 8, 2020

Any updates on this? Having to convert ByteBuffers to byte[] is annoying :(

It seems that #483 is hanging. Are there any performance concerns for it?

@kptfh
Copy link

kptfh commented Mar 8, 2020

@mizosoft
Copy link

mizosoft commented Mar 9, 2020

@kptfh Thanks. Might use your NonBlockingJsonParser port. Wonder why it's not merged :/

@cowtowncoder
Copy link
Member

Some would need to show that adding a layer of abstraction does not measurably (say, no more than 5% slower) slow down existing byte[] backed implementation; or, alternatively, provide implementation that does not require abstraction away single-byte accessors. Since single-byte access is on critical path wrt performance I wouldn't want to simply merge something that could significantly slow existing implementation.

Either approach (show suggested patch has no significant performance overhead; add more sizable complete ByteBuffer backed impl) would be acceptable.

On testing, there is:

https://github.com/FasterXML/jackson-benchmarks

which actually does have async variant already. Could test one of releases, and then build from branch. I unfortunately do not have time to do this.

@mizosoft
Copy link

mizosoft commented Mar 10, 2020

Ok so here's my take on this... (see here)

I've tried two possible approaches:

  • UseByteBuffer instead of byte[] as the internal buffer (suggested by the mentioned PR).
  • Have two byte[] and ByteBuffer internal buffers and put them on a mutual exclusive state (only
    one is non-null according to which feeder was used). This requires adding an if/else pair on
    each buffer access.

Another option might be to have two versions of NonBlockingJsonParser for byte[] and ByteBuffer that differ only on buffer accessors. I'm sure this will be a maintenance hell since they both will have the same code (not sure if there is a clean way to extract an abstraction as NonBlockingJsonParser's logic seems pretty complex).

As with performance, the "toggling" parser's thrpt seems exactly the same as the original with byte[] chunks. I'm guessing the JVM is smart enough to branch-predict the accessor as input is always fed from one side (_inputBuffer will always be a non-null). The "nio wrapping" parser is ~27% slower than the original for byte[] (also same with byte[] vs ByteBuffer access). However, it's as fast as the "toggling" parser with ByteBuffer chunks.

@cowtowncoder
Copy link
Member

@mizosoft first of all, thank you for running tests, sharing result. This is useful and gives some data to consider performance aspect.

Couple of questions:

  1. On testing, did you use warmup time? Typically should have a few rounds (5 - 10 seconds) before actual test for everything to compare steady-state. Actual run itself should also last for 10+ seconds.
  2. On toggling... yeah, JVM can possibly branch-predict. My concern there would whether it'd be reliably in real-world usage. I assume most users would only use one, not the other so maybe that's fine.

I am not surprised by wrapping (or, use of ByteBuffer in general) having lower performance than toggling (or native access).

One other thought: I agree that maintenance of 2 implementations would be unfortunate. Not the end of the world -- already there are 4 backends (byte / char, non-blocking, DataInput) -- but not ideal.
But I wonder if it was possible to use code generating with templating to use same template, and have placeholder for accessor. This would generate 2 separate implementations, letting JVM optimize them, but with only 1 source. I do not have much experience with this, but I know tools exist (and Maven plug-ins to use from build). If this could be done it seems like reasonable compromise.

Alternatively it might be possible to do extraction as you mention, and trust JVM to inline single-byte access method (the only difference between two). That is a possibility too.

@mizosoft
Copy link

mizosoft commented Mar 10, 2020

@cowtowncoder No worries :)

On testing, did you use warmup time? Typically should have a few rounds (5 - 10 seconds) before actual test for everything to compare steady-state. Actual run itself should also last for 10+
seconds.

Yes, parser benchmarks have 5 warmup and 10 measurement iterations, 5 seconds each.

On toggling... yeah, JVM can possibly branch-predict. My concern there would whether it'd be reliably in real-world usage. I assume most users would only use one, not the other so maybe that's fine.

Aha. I also added a benchmark for feeding mixed chunks. I think this skewed JVM's branch-prediction and regressed performance w.r.t the wrapping variant. But as you said, most uses come from one feeder and this should rarely affect real world cases.

As with code generation, I think this might be the best of both worlds since you can maintain one version and have the other generated without sacrificing performance. I'm not sure I know a tool that would cover this exact use case, but maybe one can come up with an ad-hoc solution using something like javapoet. Will have to do some research for this.

@pjfanning
Copy link
Member

pjfanning commented Jul 25, 2022

This should be sorted out by #795

That PR adds a createNonBlockingByteBufferParser() method to JsonFactory - which is the ByteBuffer equivalent of the existing createNonBlockingByteArrayParser() method.

@cowtowncoder cowtowncoder added 2.14 Issue planned (at earliest) for 2.14 and removed 3.x labels Jul 25, 2022
@cowtowncoder cowtowncoder changed the title Provide implementation of ByteBufferFeeder Provide implementation of async JSON parser fed by ByteBufferFeeder Jul 25, 2022
cowtowncoder added a commit that referenced this issue Jul 25, 2022
@cowtowncoder cowtowncoder added this to the 2.14.0 milestone Jul 25, 2022
@cowtowncoder
Copy link
Member

Yes. Thanks to @pjfanning we now finally have ByteBuffer backed implementation as well. And the possibility of adding other similar backends either via this core component or (more likely) via extensions.
Although TBH would need to think of actual details of extensibility for alternate implementations.

poutsma added a commit to spring-projects/spring-framework that referenced this issue Oct 19, 2022
This commit upgrades Jackson to 2.14.0-rc2, and uses the new
ByteBufferFeeder in Jackson2Tokenizer.

Unfortunately, because of FasterXML/jackson-core#478,
we had to change the CompilerConventions to suppress class file warnings.

Closes gh-29343
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.14 Issue planned (at earliest) for 2.14
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants