
Prevent inefficient internal conversion from BigDecimal to BigInteger wrt ultra-large scale #968

Closed
cowtowncoder opened this issue Apr 1, 2023 · 12 comments

Labels: 2.15 (Issue planned (at earliest) for 2.15), performance (Issue related to performance problems or enhancements)
Milestone: 2.15.0-rc3

Comments

@cowtowncoder
Member

(note: somewhat related to #967)

Although we have reasonable protections against direct parsing/decoding of both BigDecimal and BigInteger (as of the 2.15 release candidates) with respect to "too long" numbers (by textual representation), it appears there may be one performance problem that only occurs if:

  1. The incoming number is a large JSON floating-point number using scientific notation (i.e. not long textually); it is decoded internally as BigDecimal (or double, depending)
  2. Because the target type is BigInteger, a coercion is performed (BigDecimal.toBigInteger())

If both hold, performance can deteriorate significantly.
If this turns out to be true, we may need to limit the magnitude (scale) of floating-point numbers that are legal to convert; this could be a configurable limit (either a new value in StreamReadConstraints, or a derivative of the maximum number length?) or possibly just a hard-coded value.
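For reference, the suspected hot spot can be reproduced with nothing but java.math; a minimal sketch (the 1e20000000 literal is the same one discussed later in this thread, and the conversion may take a very long time):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class BigDecimalToBigIntegerRepro {
    public static void main(String[] args) {
        // Short textual form, so it slips past any max-number-length check...
        BigDecimal dec = new BigDecimal("1e20000000");

        // ...but the conversion has to materialize a ~20-million-digit integer,
        // which is where the time goes.
        long start = System.nanoTime();
        BigInteger big = dec.toBigInteger();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("bitLength=" + big.bitLength() + ", took " + elapsedMs + " ms");
    }
}
```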

@cowtowncoder
Member Author

/cc @pjfanning 2/2 of issues discussed separately.

@cowtowncoder added the performance and 2.15 labels Apr 1, 2023
@pjfanning
Member

pjfanning commented Apr 1, 2023

It might be useful to clone the BigDecimalParser code but have the methods use BigInteger instead. This would avoid creating a BigDecimal and then converting it to a BigInteger. It is a small duplication of code, but it should be more performant.

@cowtowncoder
Member Author

@pjfanning Not sure it'd work, since the input uses engineering notation... is that legal for BigInteger?

@pjfanning
Member

pjfanning commented Apr 4, 2023

The FastDoubleParser lib won't accept '1e20000000'. Do we need to support this value for BigInteger or do we need to ask the maintainer of FastDoubleParser lib to support this as a valid BigInteger?

new BigInteger("1e20000000") also fails.

Are we better off modifying jackson-core to fail if an integer uses 'e' notation?
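For reference, the plain JDK behaviour being described here (no Jackson involved): BigDecimal happily accepts the E-notation form, while BigInteger rejects it outright.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class ENotationCheck {
    public static void main(String[] args) {
        // BigDecimal understands scientific notation; the representation itself is tiny.
        BigDecimal dec = new BigDecimal("1e20000000");
        System.out.println("scale=" + dec.scale()); // prints scale=-20000000

        // BigInteger has no notion of 'e' notation at all.
        try {
            new BigInteger("1e20000000");
        } catch (NumberFormatException e) {
            System.out.println("BigInteger rejects e-notation: " + e);
        }
    }
}
```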

@cowtowncoder
Member Author

@pjfanning It's a little bit different from that: if e notation is used, we will always get JsonToken.VALUE_NUMBER_FLOAT, not VALUE_NUMBER_INT. So we never really (try to) parse a BigInteger from E notation; it always goes via BigDecimal. And I don't think we want to try to change this, because it would get into complications over variations (whether the engineering-notation value is integral or not).

But it seems to me that since the problem is the conversion of BigDecimal into BigInteger, we could impose a limit on the maximum scale -- from the little I tried, that seemed to be the key.
Whether to make the maximum scale magnitude (I am assuming both 20000000 and -20000000 are problematic, although I haven't tested) configurable or just hard-coded is a question.
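A small sketch of what that looks like at the streaming level, assuming jackson-core 2.15 API names; the commented-out getBigIntegerValue() call is where the BigDecimal -> BigInteger coercion (and hence the slowness) would kick in:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

public class TokenTypeDemo {
    public static void main(String[] args) throws Exception {
        JsonFactory factory = new JsonFactory();
        try (JsonParser p = factory.createParser("1e20000000")) {
            // E-notation is always reported as a floating-point number...
            JsonToken t = p.nextToken();
            System.out.println(t); // VALUE_NUMBER_FLOAT, not VALUE_NUMBER_INT

            // ...so asking for a BigInteger would force the BigDecimal -> BigInteger
            // coercion discussed in this issue (and may be extremely slow):
            // System.out.println(p.getBigIntegerValue());
        }
    }
}
```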

@cowtowncoder
Member Author

One interesting note: I can only reproduce this with 2.15 -- 2.14 and 3.0 fail with a different error, probably value overflow (I think the given value exceeds the Double range). That's not a real solution of course, but fwiw the specific performance problem is N/A for pre-2.15, I think.

@pjfanning
Member

Would it make sense to add a StreamReadConstraints setting for the maximum absolute BigInteger exponent? We can add a sensible limit but let people who know the risks and need to support big exponents go ahead and change the config to suit themselves.
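A hedged sketch of what such a setting might look like: StreamReadConstraints.builder() and maxNumberLength() exist in 2.15, while maxBigIntegerScale() is purely hypothetical here, included only to illustrate the shape of the proposal.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;

public class ConstraintsSketch {
    public static void main(String[] args) {
        StreamReadConstraints constraints = StreamReadConstraints.builder()
                .maxNumberLength(1000)          // existing 2.15 limit on textual length
                // .maxBigIntegerScale(100_000) // hypothetical new limit discussed here
                .build();

        JsonFactory factory = JsonFactory.builder()
                .streamReadConstraints(constraints)
                .build();
        System.out.println(constraints.getMaxNumberLength());
    }
}
```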

@cowtowncoder
Member Author

cowtowncoder commented Apr 4, 2023

@pjfanning That sounds reasonable. One question I have is whether it should apply only to this conversion (BigDecimal->BigInteger) or to BigDecimal in general. It seems like a huge scale is not necessarily dangerous for general BigDecimal use.

And the other angle is that with a scale of 1000 you get a numeric string of ~1000 characters, so in a way we could actually simply reuse the existing maxNumberLength() value for the conversion case -- especially since we do not allow engineering notation for integers anyway.
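A rough sketch of the kind of guard this would imply, in plain java.math terms (the constant stands in for the configured maxNumberLength(); the actual wiring into jackson-core would of course look different):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class ScaleGuardSketch {
    // Stand-in for StreamReadConstraints.maxNumberLength() (default 1000 in 2.15)
    static final int MAX_NUMBER_LENGTH = 1000;

    static BigInteger toBigIntegerChecked(BigDecimal value) {
        // The scale magnitude roughly tracks how many digits toBigInteger() must
        // materialize, so reject conversions whose plain form would blow far past
        // the textual length limit -- before doing any expensive work.
        int scale = value.scale();
        if (Math.abs(scale) > MAX_NUMBER_LENGTH) {
            throw new NumberFormatException(
                    "BigDecimal scale (" + scale + ") exceeds limit (" + MAX_NUMBER_LENGTH + ")");
        }
        return value.toBigInteger();
    }

    public static void main(String[] args) {
        System.out.println(toBigIntegerChecked(new BigDecimal("1e100"))); // fine
        try {
            toBigIntegerChecked(new BigDecimal("1e20000000")); // rejected before the slow conversion
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```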

@pjfanning
Member

I would suggest just adding it for (big) ints.

cowtowncoder added a commit that referenced this issue Apr 4, 2023
cowtowncoder added a commit that referenced this issue Apr 4, 2023
@cowtowncoder
Member Author

@pjfanning Makes sense. But I am still wondering if a new limit is even needed, given that this is something of an edge case (from floating-point number to integer), and the problematic scale magnitude is orders of magnitude bigger than the maximum number length. That is,

1e999999

is a 1 MB string when written out as a "plain" BigInteger, and by default we only allow number strings of 1000 characters, so we could consider one of:

  1. Use a limit that is some multiple of maximum-number-length (10x ?)
  2. Use a fixed but higher limit

since users can work around the issue of big scale by using a BigDecimal target and handling post-processing on their own, if the limit becomes onerous.

It is not that I couldn't add a new limit constant, but there is some maintenance overhead.

Also: I think that validation of the scale limit, whatever it is, could be done via StreamReadConstraints, making it a bit easier for us to add an explicit override if needed.

I guess I can cobble together a PR to show what I am thinking, as a PoC.

@pjfanning
Member

@plokhotnyuk - feel free to ignore this, but we ran into an edge case where deserialization to BigInteger can be very slow if the input has a large exponent (e.g. '1e20000000'). jackson-core parses numbers like this as BigDecimal and then uses the .toBigInteger() method on BigDecimal, because new BigInteger(str) can't parse numbers with e notation. It is the .toBigInteger() method that is very slow.

You have shown great diligence about these problematic inputs in the past. I'm just wondering if you have any thoughts on the best way to handle them.

@cowtowncoder changed the title from "Investigate performance problem wrt internal conversion from BigDecimal to BigInteger wrt large exponents" to "Prevent inefficient internal conversion from BigDecimal to BigInteger wrt ultra-large scale" Apr 5, 2023
@cowtowncoder added this to the 2.15.0-rc3 milestone Apr 5, 2023
@cowtowncoder
Member Author

Quick note: there's a PR (#980) to block this specific issue, but it would be good to know if there are other approaches that would avoid having to block it
(although, TBH, allowing this notation could lead to asymmetric processing on output -- we force non-E ("plain") notation on output, so it could easily explode the output size until/unless we start limiting output sizes).
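To make the output-side concern concrete (pure JDK, no Jackson): a handful of characters of E-notation input forces a plain-notation output whose size scales with the exponent.

```java
import java.math.BigDecimal;

public class PlainNotationSize {
    public static void main(String[] args) {
        // 7 characters of input...
        BigDecimal dec = new BigDecimal("1e99999");

        // ...but forcing non-E ("plain") notation yields a 100000-character string.
        String plain = dec.toPlainString();
        System.out.println("input length=7, plain output length=" + plain.length());
    }
}
```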
