Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor scala number deserialization (BigDecimal/BigInt) #572

Closed
pjfanning opened this issue Mar 31, 2022 · 8 comments
Closed

refactor scala number deserialization (BigDecimal/BigInt) #572

pjfanning opened this issue Mar 31, 2022 · 8 comments
Assignees
Labels

Comments

@pjfanning
Copy link
Member

pjfanning commented Mar 31, 2022

  • the java number deserialization is more efficient than the old scala code in jackson-module-scala
@pjfanning pjfanning added the 2.14 label Mar 31, 2022
@pjfanning pjfanning changed the title use jackson-core BigDecimalParser refactor scala number deserialization (BigDecimal/BigInt) Mar 31, 2022
@pjfanning pjfanning self-assigned this Apr 20, 2022
@plokhotnyuk
Copy link

You can get much safe and more efficient parsing of BigDecimal and BigInt values if you adopt the following code from jsoniter-scala:
https://github.com/plokhotnyuk/jsoniter-scala/blob/3b062f77f566d64b68b765ceed3738ad93d475dc/jsoniter-scala-core/native/src/main/scala/com/github/plokhotnyuk/jsoniter_scala/core/JsonReader.scala#L1647-L1874

@pjfanning
Copy link
Member Author

Thanks @plokhotnyuk - your BigDecimal parser code is worth looking at.

I guess it would be more advantageous to Jackson users to port it to Java and use it in jackson-core. That lib has a BigDecimalParser but it only has the special case that your code has for handling numbers with a lot of digits (more than 500 in jackson-core code).

Is your Scala code based on eobermuhlner/big-math like jackson-core's code is?

@plokhotnyuk
Copy link

Yes, but also adding limits for mantissa length, scale, and math context too avoid vulnerabilities on subsequent (after parsing) operations, like those that was described here.

@pjfanning
Copy link
Member Author

Thanks @plokhotnyuk - v2.14 is due out soon so I don't want to delay it by taking on new items - I'll look at this for v2.15.

There might be scope in v2.14 to DoS cases. Would you be able to provide a couple of example numbers I could test? If you think such values are sensitive (that you don't want to provide ammo for malicious actors), you can DM me at https://www.linkedin.com/in/pj-fanning/

@plokhotnyuk
Copy link

Most of possible BigInt and BigDecimal values are vulnerable. Safe values have small mantissa size, scale, and math context values. Beside steps described in that link above you can use jsoniter-scala project benchmarks to measure speed of parsing depending of those parameters.

Below are results for different mantissa sizes:

[info] Benchmark                         (size)   Mode  Cnt          Score         Error  Units
[info] BigDecimalReading.jacksonScala         1  thrpt    5    8702801.402 ±  115050.742  ops/s
[info] BigDecimalReading.jacksonScala        10  thrpt    5    7503172.943 ±  277137.658  ops/s
[info] BigDecimalReading.jacksonScala       100  thrpt    5    1535525.956 ±   11051.667  ops/s
[info] BigDecimalReading.jacksonScala      1000  thrpt    5      76374.133 ±     360.840  ops/s
[info] BigDecimalReading.jacksonScala     10000  thrpt    5        926.579 ±      22.904  ops/s
[info] BigDecimalReading.jacksonScala    100000  thrpt    5          9.844 ±       0.059  ops/s
[info] BigDecimalReading.jacksonScala   1000000  thrpt    5          0.098 ±       0.001  ops/s
[info] BigDecimalReading.jsoniterScala        1  thrpt    5  100881269.172 ± 2705794.855  ops/s
[info] BigDecimalReading.jsoniterScala       10  thrpt    5   57269223.425 ± 1551127.833  ops/s
[info] BigDecimalReading.jsoniterScala      100  thrpt    5    8514257.007 ±   30748.373  ops/s
[info] BigDecimalReading.jsoniterScala     1000  thrpt    5     365600.944 ±    3715.462  ops/s
[info] BigDecimalReading.jsoniterScala    10000  thrpt    5       7343.474 ±      39.253  ops/s
[info] BigDecimalReading.jsoniterScala   100000  thrpt    5        163.458 ±       0.883  ops/s
[info] BigDecimalReading.jsoniterScala  1000000  thrpt    5          4.751 ±       0.010  ops/s

For getting them the following config for macros was used:
https://github.com/plokhotnyuk/jsoniter-scala/blob/1ce46b241aa2f30f19e567521d26f79726f4469a/jsoniter-scala-benchmark/shared/src/main/scala/com/github/plokhotnyuk/jsoniter_scala/benchmark/JsoniterScalaCodecs.scala#L33

@pjfanning
Copy link
Member Author

@plokhotnyuk I'm experimenting with porting some of your code to Java. One thing that confuses me is the use of the magnitude param in JsonReader class. I don't quite get why that needs to be leaked outside the toBigDecimal308 function. Is that just a performance optimisation - so that you can reuse the same array over and over?

@plokhotnyuk
Copy link

plokhotnyuk commented Sep 22, 2022

Yes, It is a small reusable array of Int values that will be initialized on demand for faster parsing of numbers with more than 36 digits

@pjfanning
Copy link
Member Author

2.14.0-rc1 is out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants