Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow use of faster floating-point number parsing (Schubfach) with StreamReadFeature.USE_FAST_DOUBLE_PARSER #577

Closed
cowtowncoder opened this issue Nov 6, 2019 · 26 comments
Assignees
Labels
2.14 Issue planned (at earliest) for 2.14 performance Issue related to performance problems or enhancements
Milestone

Comments

@cowtowncoder
Copy link
Member

cowtowncoder commented Nov 6, 2019

Jsoniter project (https://github.com/plokhotnyuk/jsoniter-scala) has many impressive performance optimizations; linked f.ex from here:

https://www.reddit.com/r/java/comments/darehu/jackson_release_210/f1ysf1e/

Of ones included, number-parsing would be relevant for this repo.

EDIT: also see (from the comment below)

"Unrelated to jsoniter but this recent port of Lemire's Double parser:

https://github.com/wrandelshofer/FastDoubleParser

and the original paper https://arxiv.org/abs/2101.11408 also relevant"

@cowtowncoder cowtowncoder added the performance Issue related to performance problems or enhancements label Aug 23, 2020
@cowtowncoder cowtowncoder added good first issue Issue that seems easy to resolve and is likely a good candidate for contributors new to project hacktoberfest Issue related to Hactoberfest2020 activities, eligible for additional rewards labels Oct 9, 2020
@anneloreegger
Copy link
Contributor

I'd like to work on this issue, is it still available?

@cowtowncoder
Copy link
Member Author

@anneloreegger Yes, this is available! I'll assign it to you (no obligation to work, just a marker so others know you are considering it).

@anneloreegger
Copy link
Contributor

Just to be sure I understand the issue correctly. Shall I especially improve the parsingMethods inside the NumberInput-class? (especially parseBigDecimal)
And should this be done on 2.12 or master branch?
Thanks for the answers :)

@cowtowncoder
Copy link
Member Author

@anneloreegger Yes, I think NumberInput has stubs that would allow for this. On 2.12 vs master -- ideally 2.12, if this seems doable on short term with well contained changes. But master if there were need to change API or internal interfaces.
I have not really investigated jsoniter's approach enough to know how involved this gets unfortunately.

@cowtowncoder
Copy link
Member Author

NOTE: BigDecimal use case covered by #677 (for 2.13), will edit title

@cowtowncoder cowtowncoder changed the title Consider number-decoding improvements from jsoniter (esp. for double/float, BigInteger, BigDecimal) Consider number-decoding improvements from jsoniter (esp. for double/float, BigInteger) Feb 5, 2021
@cowtowncoder cowtowncoder removed the hacktoberfest Issue related to Hactoberfest2020 activities, eligible for additional rewards label Feb 11, 2021
@LifeIsStrange
Copy link

LifeIsStrange commented Mar 22, 2021

Unrelated to jsoniter but this recent port of Lemire's Double parser is 5.5X faster than the default double parser!!
(if the microbenchmark generalize)
https://github.com/wrandelshofer/FastDoubleParser

see also the original paper https://arxiv.org/abs/2101.11408

@cowtowncoder
Copy link
Member Author

@LifeIsStrange thank you for sharing this! I hope someone might have time to maybe investigate possibility of PR for improvements here.

@cowtowncoder cowtowncoder changed the title Consider number-decoding improvements from jsoniter (esp. for double/float, BigInteger) Consider number-decoding improvements from jsoniter or Lemire's "fast double parser" (esp. for double/float, BigInteger) Mar 27, 2021
@abc12345678912345
Copy link

abc12345678912345 commented Sep 3, 2021

@cowtowncoder I'd like to work on this issue, could you assign it to me?

@cowtowncoder
Copy link
Member Author

I think @anneloreegger is not working on this, so will change assignee to @abc12345678912345.

If anyone else wants to try to do it instead, please simple add a note here as courtesy; assignments are about intentions but with OSS things come and go.

@pjfanning
Copy link
Member

@cowtowncoder if FastDoubleParser was to be used in jacakson-core - would you accept it as dependency or would the code need to be inlined in jackson-core?

jackson-core is still java6 bound - is this going to change? FastDoubleParser 0.2.0 is Java 8 and 0.3.0 is Java 11 bound.

@cowtowncoder
Copy link
Member Author

I would require inlining; using Maven shade plug-in might be acceptable. But not external dependency.

As to Java 6... tough call. Was hoping to leave it for Jackson 2.x, but may reconsider. Java 6 support is iffy anyway; most likely we have Java 7. And not sure if much usage of latest versions by anyone is pre-Java 8 (although Android may have some odd limits).

But Java 11 would be step too far for Jackson 2.x I think. Jackson 3.0, once I get back to doing it, may well choose different baseline -- Java 14? -- but right now it still only requires Java 8.

@pjfanning
Copy link
Member

pjfanning commented Mar 31, 2022

com.fasterxml.jackson.core.io.ContentReferenece uses java.lang.Objects and that class only arrived in Java 7.

My IDE won't compile jackson-core because of this. Only command line maven build seems to work.

IntelliJ seems more concerned with enforcing these props - https://github.com/FasterXML/jackson-core/blob/2.14/pom.xml#L40 - than maven itself is

@cowtowncoder
Copy link
Member Author

Ok, that is a good example of accidental inclusion of Java 7 things that default tooling (compilation is already with JDK 8) cannot detect, and that we haven't added anything to guard.
Earlier when JDK 6 was available on CI platforms this was easier to guard against.

So I am not against proposing to raise Jackson 2.14 (for example) baseline officially to Java 8.

@pjfanning
Copy link
Member

one extra question - FastDoubleParser has a copyright and is generally MIT licensed - how would you handle this if the classes were copied to jackson-core - a special notice file? - could I keep the license headers in the source files too?

example: https://github.com/wrandelshofer/FastDoubleParser/blob/java8/src/main/java/ch/randelshofer/fastdoubleparser/FastDoubleParser.java#L3

@cowtowncoder
Copy link
Member Author

@pjfanning Looks good. Just one question: are references to "ints" in this context still double values (like, 1.0 and 25.0), or do they relate to actual Java integer numbers?

@pjfanning
Copy link
Member

@cowtowncoder this issue is more about parsing and the FastDoubleParser can handle numbers with and without decimal points. When looking at serializing numbers, we can look whether the suggested alternatives like Schubfach output different values from Double.toString. Ultimately, I think any changes will be hidden behind config settings that are documented to warn users about potential diffs.

@re-thc
Copy link

re-thc commented Jun 7, 2022

Note that a similar but likely faster version has already been integrated into the JDK. See this PR for reference. A summary exists in this reddit post.

@plokhotnyuk
Copy link

plokhotnyuk commented Jun 7, 2022

@re-thc Also, Giulietti's "The Schubfach way to render doubles" was adopted for JDK 8 here and improved for JDK 11 here.

Below are benchmark results for serialization of doubles by different JSON parsers for Scala using JDK 19.

Before openjdk/jdk#3402

[info] Benchmark                                    (size)   Mode  Cnt       Score      Error  Units
[info] ArrayOfDoublesWriting.avSystemGenCodec          128  thrpt    5   49126.034 ±  752.732  ops/s
[info] ArrayOfDoublesWriting.borer                     128  thrpt    5   48017.213 ±  576.038  ops/s
[info] ArrayOfDoublesWriting.circe                     128  thrpt    5   53851.533 ±  727.258  ops/s
[info] ArrayOfDoublesWriting.circeJsoniter             128  thrpt    5  211738.017 ± 5150.114  ops/s
[info] ArrayOfDoublesWriting.dslJsonScala              128  thrpt    5   92222.619 ±  965.339  ops/s
[info] ArrayOfDoublesWriting.jacksonScala              128  thrpt    5   52508.342 ±  350.423  ops/s
[info] ArrayOfDoublesWriting.jsoniterScala             128  thrpt    5  284303.731 ± 3273.204  ops/s
[info] ArrayOfDoublesWriting.jsoniterScalaPrealloc     128  thrpt    5  298571.032 ± 3925.366  ops/s
[info] ArrayOfDoublesWriting.ninnyJson                 128  thrpt    5  212634.305 ± 5421.051  ops/s
[info] ArrayOfDoublesWriting.playJson                  128  thrpt    5   17558.393 ±   89.862  ops/s
[info] ArrayOfDoublesWriting.playJsonJsoniter          128  thrpt    5   35317.875 ± 1251.812  ops/s
[info] ArrayOfDoublesWriting.smithy4sJson              128  thrpt    5  236212.794 ± 4307.005  ops/s
[info] ArrayOfDoublesWriting.sprayJson                 128  thrpt    5   30913.349 ± 1243.390  ops/s
[info] ArrayOfDoublesWriting.uPickle                   128  thrpt    5   51697.516 ± 1157.040  ops/s
[info] ArrayOfDoublesWriting.weePickle                 128  thrpt    5   53931.179 ±  655.922  ops/s
[info] ArrayOfDoublesWriting.zioJson                   128  thrpt    5  126197.585 ±  863.407  ops/s

After openjdk/jdk#3402

[info] Benchmark                                    (size)   Mode  Cnt       Score       Error  Units
[info] ArrayOfDoublesWriting.avSystemGenCodec          128  thrpt    5  134274.744 ±  1186.555  ops/s
[info] ArrayOfDoublesWriting.borer                     128  thrpt    5  135609.524 ±  3682.101  ops/s
[info] ArrayOfDoublesWriting.circe                     128  thrpt    5  140578.748 ±   599.352  ops/s
[info] ArrayOfDoublesWriting.circeJsoniter             128  thrpt    5  218366.982 ±  2461.686  ops/s
[info] ArrayOfDoublesWriting.dslJsonScala              128  thrpt    5  101039.519 ±  2052.939  ops/s
[info] ArrayOfDoublesWriting.jacksonScala              128  thrpt    5  151821.728 ±  1679.314  ops/s
[info] ArrayOfDoublesWriting.jsoniterScala             128  thrpt    5  287645.795 ±  8382.984  ops/s
[info] ArrayOfDoublesWriting.jsoniterScalaPrealloc     128  thrpt    5  308345.371 ± 11930.089  ops/s
[info] ArrayOfDoublesWriting.ninnyJson                 128  thrpt    5  215322.691 ±  2858.767  ops/s
[info] ArrayOfDoublesWriting.playJson                  128  thrpt    5   23084.133 ±   499.170  ops/s
[info] ArrayOfDoublesWriting.playJsonJsoniter          128  thrpt    5   68504.541 ±  1246.185  ops/s
[info] ArrayOfDoublesWriting.smithy4sJson              128  thrpt    5  239066.484 ±  3021.682  ops/s
[info] ArrayOfDoublesWriting.sprayJson                 128  thrpt    5   55431.626 ±   999.962  ops/s
[info] ArrayOfDoublesWriting.uPickle                   128  thrpt    5  141348.749 ±  1193.024  ops/s
[info] ArrayOfDoublesWriting.weePickle                 128  thrpt    5  149880.063 ±  2034.722  ops/s
[info] ArrayOfDoublesWriting.zioJson                   128  thrpt    5  126290.331 ±  1874.128  ops/s

@cowtowncoder
Copy link
Member Author

Thank you for sharing @plokhotnyuk. Good to know JDK is improving as well.

@cowtowncoder cowtowncoder changed the title Consider number-decoding improvements from jsoniter or Lemire's "fast double parser" (esp. for double/float, BigInteger) Improve parsing of floating-point numbers Jun 22, 2022
@cowtowncoder cowtowncoder added 2.14 Issue planned (at earliest) for 2.14 and removed good first issue Issue that seems easy to resolve and is likely a good candidate for contributors new to project labels Jun 22, 2022
@cowtowncoder cowtowncoder added this to the 2.14.0 milestone Jun 22, 2022
@cowtowncoder
Copy link
Member Author

Ok. Thanks to @pjfanning we now have improved floating-point parsing functionality; merges are scattered (alas) through a few merges, purpose of which was to make it easier to merge to 3.0.

PRs include #747 and #766 for anyone interested in details.

@cowtowncoder cowtowncoder changed the title Improve parsing of floating-point numbers Improve performant of floating-point number parsing Jun 22, 2022
@cowtowncoder cowtowncoder changed the title Improve performant of floating-point number parsing Improve performance of floating-point number parsing Jun 22, 2022
@cowtowncoder cowtowncoder changed the title Improve performance of floating-point number parsing Improve performance of floating-point number parsing (Schubfach) Jun 24, 2022
@cowtowncoder cowtowncoder changed the title Improve performance of floating-point number parsing (Schubfach) Allow use of faster floating-point number parsing (Schubfach) with StreamReadFeature.USE_FAST_DOUBLE_PARSER Jul 28, 2022
pjfanning added a commit to pjfanning/jackson-core that referenced this issue Apr 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.14 Issue planned (at earliest) for 2.14 performance Issue related to performance problems or enhancements
Projects
None yet
Development

No branches or pull requests

8 participants