
Optimize parsing 19 digit longs #865

Merged
merged 2 commits into FasterXML:2.15 on Dec 20, 2022

Conversation


@marschall marschall commented Dec 15, 2022

Parsing 19 digit long values is significantly slower than parsing 18 digit long values as we are hitting a slow path that does a string allocation.

19 digit long values are quite common when a random number generator is used to generate long values.

We have to take care to use this fast path only if the value is a valid long. This can be ensured by calling #inLongRange first.
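The range check can be illustrated with a standalone sketch (illustrative only, not the actual jackson-core NumberInput code): a 19-digit number fits in a long exactly when its digits are lexicographically no greater than those of Long.MAX_VALUE (or of abs(Long.MIN_VALUE) for negative values).

```java
// Standalone sketch of an in-long-range check for exactly 19 digits.
// This mirrors the idea of NumberInput#inLongRange but is illustrative
// code, not the jackson-core implementation.
public class LongRangeCheck {
    private static final String MAX_LONG_DIGITS = "9223372036854775807"; // Long.MAX_VALUE
    private static final String MIN_LONG_DIGITS = "9223372036854775808"; // abs(Long.MIN_VALUE)

    // True if the 19 ASCII digits at ch[off..off+19) fit in a long.
    public static boolean inLongRange(char[] ch, int off, boolean negative) {
        String bound = negative ? MIN_LONG_DIGITS : MAX_LONG_DIGITS;
        for (int i = 0; i < 19; i++) {
            int diff = ch[off + i] - bound.charAt(i);
            if (diff != 0) {
                return diff < 0; // first differing digit decides
            }
        }
        return true; // exactly the boundary value: still representable
    }

    public static void main(String[] args) {
        if (!inLongRange("9223372036854775807".toCharArray(), 0, false)) throw new AssertionError();
        if (inLongRange("9223372036854775808".toCharArray(), 0, false)) throw new AssertionError();
        if (!inLongRange("9223372036854775808".toCharArray(), 0, true)) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Only once this check passes can the unsigned fast path run without overflow concerns.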

In the following synthetic microbenchmark a throughput increase of about 33% with a drastic reduction in the allocation rate can be achieved.

import java.io.IOException;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.infra.Blackhole;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

@OutputTimeUnit(TimeUnit.SECONDS)
public class JacksonStdLongReadVanilla
{

    private static final JsonFactory JSON_FACTORY = new JsonFactory();

    private static final String JSON;

    static {
        StringBuilder json = new StringBuilder();
        Random random = new Random(0L);
        json.append('[');
        for (int i = 0; i < 1_000; i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append(random.nextLong());
        }
        json.append(']');
        JSON = json.toString();
    }

    @Benchmark
    public void parseLong(Blackhole blackhole) throws IOException {
        try (JsonParser parser = JSON_FACTORY.createParser(JSON)) {
            parser.nextToken();

            JsonToken nextToken = parser.nextToken();
            while (nextToken == JsonToken.VALUE_NUMBER_INT) {
                blackhole.consume(parser.getLongValue());
                nextToken = parser.nextToken();
            }
        }
    }

}

If we look at the allocations before the change, we see they are dominated by byte[] allocations backing String allocations

(screenshot: allocation-site-before)

resulting in many GCs:

(screenshot: gcs-before)

With the changes in this PR the byte[] and String allocations are gone

(screenshot: allocation-site-after)

resulting in no more GCs:

(screenshot: gcs-after)

Avoid slow path and allocation when parsing 19 digit longs.
@@ -152,6 +152,32 @@ public static long parseLong(char[] ch, int off, int len)
        long val = parseInt(ch, off, len1) * L_BILLION;
        return val + (long) parseInt(ch, off+len1, 9);
    }

    /**
     * Parses an unsigned long made up of exactly 19 digits.
@pjfanning pjfanning (Member) Dec 15, 2022

this looks wrong - the code seems to handle signed numbers

Contributor Author

This is the same approach that com.fasterxml.jackson.core.util.TextBuffer#contentsAsInt(boolean) uses: it calls com.fasterxml.jackson.core.io.NumberInput#parseInt(char[], int, int), which parses an optionally signed value as unsigned by ignoring the sign, then flips the sign at the end if there was a minus sign.

            num = (num * 10) + (c - '0');
        }
        return negative ? -num : num;
    }
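For illustration, the complete parse-unsigned-then-flip pattern might look like this standalone sketch (not the actual NumberInput code):

```java
// Sketch of the pattern: accumulate digits ignoring the sign, then
// negate at the end if a leading '-' was seen. Note that even
// Long.MIN_VALUE works out: accumulating 9223372036854775808 overflows
// to Long.MIN_VALUE, and negating Long.MIN_VALUE yields itself.
public class UnsignedThenFlip {
    public static long parse(char[] ch, int off, int len) {
        boolean negative = ch[off] == '-';
        int i = negative ? off + 1 : off;
        int end = off + len;
        long num = 0;
        while (i < end) {
            num = (num * 10) + (ch[i++] - '0'); // ASCII digit to value
        }
        return negative ? -num : num;
    }

    public static void main(String[] args) {
        char[] s = "-1234567890123456789".toCharArray();
        if (parse(s, 0, s.length) != -1234567890123456789L) throw new AssertionError();
        System.out.println("ok");
    }
}
```

A caller would still have to validate range first (e.g. via an inLongRange-style check), since the loop itself silently overflows for out-of-range input.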
@pjfanning pjfanning (Member) Dec 15, 2022

can you provide a jmh test project to prove this is faster? you can extend https://github.com/pjfanning/jackson-number-parse-bench if you like - you obviously have the test case, just would be handy to be able to run it independently

Member

if this is really such a performance drag, shouldn't there be an openjdk issue - or something similar?

Contributor Author

if this is really such a performance drag, shouldn't there be an openjdk issue - or something similar?

Outsiders can't create OpenJDK issues. And even if we could, the discussions alone would probably take years; the time investment is massive with no guaranteed outcome. For a concrete example, see openjdk/jdk#6275.

Even if everything went smoothly, we would still have to wait years until Jackson raises its minimum version to one that has the fixes. Jackson currently has a baseline of JDK 8, which was released 8 years ago. So we're looking at a best case of 10 years.

Contributor Author

can you provide a jmh test project to prove this is faster? you can extend https://github.com/pjfanning/jackson-number-parse-bench if you like - you obviously have the test case, just would be handy to be able to run it independently

I saw there is also https://github.com/FasterXML/jackson-benchmarks let me know what you prefer.

Member

Whichever suits you. I have a vague recollection that jackson-benchmarks is configured to run the benchmarks for a long time. There was something about the project that made me go off and create my own where I could reconfigure the settings more readily.

@pjfanning (Member)

Another thing that I'm concerned with is that with a custom long parser, we would need to have test coverage for a wide range of possible values.

The FastDoubleParser that we allow use of is

  • optional (and not the default)
  • well covered by tests
  • admittedly still quite experimental, so users who opt in to using it will need to do so with caution

@marschall (Contributor, Author)

Another thing that I'm concerned with is that with a custom long parser, we would need to have test coverage for a wide range of possible values.

NumberInput is already full of custom number parsing code, I refer you to the various #parseInt methods. There is also the custom BigDecimalParser. But I can extend the tests in NumberParsingTest.

@cowtowncoder (Member)

Although jackson-benchmarks is useful for comparing performance of various Jackson backends over relatively simple POJO type, it's not as good for micro-benchmarks for aspects like this. So I think I'd go with https://github.com/pjfanning/jackson-number-parse-bench.

@pjfanning (Member)

Could NumberInput.parseLong be changed to call parseLong19 where appropriate - it already uses parseInt for small nums. Would it be simpler just to have parseLong parse all numbers the new way (if the performance was not worse)?

@cowtowncoder (Member)

cowtowncoder commented Dec 20, 2022

Could NumberInput.parseLong be changed to call parseLong19 where appropriate - it already uses parseInt for small nums. Would it be simpler just to have parseLong parse all numbers the new way (if the performance was not worse)?

Right now the method is quite particular, expecting a length of 10 - 18 characters, and there'd be one more check (and branch).
Not sure it would affect the optimizer that much; I don't have a strong opinion on whether to modify it and add checks, or keep a separate method. A separate method would do well to follow the same pattern though; decoding 2 ints using int multiplications is faster than doing 19 long multiplications. I remember benchmarking that difference years ago and it was non-trivial.
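That decomposition - two 9-digit int parses combined with a single long multiply, instead of one long multiply per digit - can be sketched as follows for the existing 18-digit case (illustrative code, not the actual jackson-core implementation):

```java
// Sketch: parse an 18-digit number as two 9-digit halves using only
// int arithmetic, then combine with one long multiplication.
public class TwoIntDecode {
    private static final long L_BILLION = 1_000_000_000L;

    // Parse exactly 9 ASCII digits using int arithmetic only.
    static int parseInt9(char[] ch, int off) {
        int num = 0;
        for (int i = 0; i < 9; i++) {
            num = num * 10 + (ch[off + i] - '0');
        }
        return num;
    }

    // Combine the two halves: high * 10^9 + low.
    public static long parse18(char[] ch, int off) {
        long hi = parseInt9(ch, off);
        return hi * L_BILLION + parseInt9(ch, off + 9);
    }

    public static void main(String[] args) {
        char[] s = "123456789012345678".toCharArray();
        if (parse18(s, 0) != 123456789012345678L) throw new AssertionError();
        System.out.println("ok");
    }
}
```

A 19-digit variant would follow the same shape with an uneven split (e.g. 9 + 10 digits), which is presumably why the PR keeps it as a separate method.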

@marschall (Contributor, Author)

marschall commented Dec 20, 2022

Although jackson-benchmarks is useful for comparing performance of various Jackson backends over relatively simple POJO type, it's not as good for micro-benchmarks for aspects like this. So I think I'd go with https://github.com/pjfanning/jackson-number-parse-bench.

There you go pjfanning/jackson-number-parse-bench#2

I had to change build.gradle

repositories {
    mavenLocal()
}

jmh {
    includes = ['org.example.jackson.bench.LongParserBench']
    profilers = ['gc'] // or whichever profiler you prefer, e.g. jfr or async
}

On a Ryzen 7 PRO 5750GE and JDK 17.0.5 we go from

Benchmark                                                    Mode  Cnt      Score      Error   Units
LongParserBench.parseLong                                   thrpt   25  21860.758 ± 1076.882   ops/s
LongParserBench.parseLong:·gc.alloc.rate                    thrpt   25   1139.800 ±   56.144  MB/sec
LongParserBench.parseLong:·gc.alloc.rate.norm               thrpt   25  57410.238 ±    0.043    B/op
LongParserBench.parseLong:·gc.churn.G1_Eden_Space           thrpt   25   1136.527 ±   59.318  MB/sec
LongParserBench.parseLong:·gc.churn.G1_Eden_Space.norm      thrpt   25  57240.652 ±  766.795    B/op
LongParserBench.parseLong:·gc.churn.G1_Survivor_Space       thrpt   25      0.006 ±    0.002  MB/sec
LongParserBench.parseLong:·gc.churn.G1_Survivor_Space.norm  thrpt   25      0.303 ±    0.079    B/op
LongParserBench.parseLong:·gc.count                         thrpt   25    504.000             counts
LongParserBench.parseLong:·gc.time                          thrpt   25    334.000                 ms

to

Benchmark                                                    Mode  Cnt      Score      Error   Units
LongParserBench.parseLong                                   thrpt   25  31230.896 ± 1532.563   ops/s
LongParserBench.parseLong:·gc.alloc.rate                    thrpt   25     14.523 ±    0.713  MB/sec
LongParserBench.parseLong:·gc.alloc.rate.norm               thrpt   25    512.025 ±    0.029    B/op
LongParserBench.parseLong:·gc.churn.G1_Eden_Space           thrpt   25     15.785 ±   19.352  MB/sec
LongParserBench.parseLong:·gc.churn.G1_Eden_Space.norm      thrpt   25    544.799 ±  668.200    B/op
LongParserBench.parseLong:·gc.churn.G1_Survivor_Space       thrpt   25     ≈ 10⁻⁴             MB/sec
LongParserBench.parseLong:·gc.churn.G1_Survivor_Space.norm  thrpt   25      0.005 ±    0.011    B/op
LongParserBench.parseLong:·gc.count                         thrpt   25      7.000             counts
LongParserBench.parseLong:·gc.time                          thrpt   25     10.000                 ms

We go from 21860.758 ops/s to 31230.896 ops/s.
We go from an allocation rate of 1139 MB/sec to 14 MB/sec.
We go from 504 GCs taking 334 ms to 7 GCs taking 10 ms.

I haven't yet looked at JDK 8 results; it's possible the differences are greater there due to the lack of compact strings.

@marschall (Contributor, Author)

Another thing that I'm concerned with is that with a custom long parser, we would need to have test coverage for a wide range of possible values.

NumberInput is already full of custom number parsing code, I refer you to the various #parseInt methods. There is also the custom BigDecimalParser. But I can extend the tests in NumberParsingTest.

NumberParsingTest already covered a lot of cases, I expanded them a bit.

*/
public static long parseLong19(char[] ch, int off, boolean negative)
{
// Note: caller must ensure length is [10, 18]
Member

can you remove this comment line which doesn't match this code?

@cowtowncoder (Member)

Thank you @marschall -- I'll merge this. May do minor tweaks but looks good & can go in 2.15.0.

@cowtowncoder cowtowncoder merged commit 3188b19 into FasterXML:2.15 Dec 20, 2022
@marschall (Contributor, Author)

Could NumberInput.parseLong be changed to call parseLong19 where appropriate - it already uses parseInt for small nums. Would it be simpler just to have parseLong parse all numbers the new way (if the performance was not worse)?

Right now method is quite particular expecting length of 10 - 18 characters and there'd be one more check (and branch). Not sure it would affect optimizer that much; I don't have strong opinion where to modify it and add checks or keep separate method. Separate method would do well to follow the same pattern tho; decoding 2 ints, using int multiplications is faster than doing 19 long multiplications. I remember benchmarking that difference years ago and it was non-trivial.

I had a look at this; it would require a few more changes. Currently #parseInt only handles ints with length 9; ints with length 10 are parsed as a long and then converted to an int. Therefore we can't call #parseInt twice to parse a long of length 19, only a long of length 18.

If we wanted to change this we would have to:

  • Introduce NumberInput#inIntRange
  • Change ParserBase#_parseIntValue() and ParserBase#_parseNumericValue(int) to call NumberInput#inIntRange for length 10 and based on this either call TextBuffer#contentsAsInt(boolean) or TextBuffer#contentsAsLong(boolean).
  • Change #parseInt to handle a length of up to 10
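A hypothetical NumberInput#inIntRange (this method does not exist in jackson-core; it is only the shape the first bullet above proposes, for 10-digit candidates) might look like:

```java
// Hypothetical sketch only: check whether 10 ASCII digits fit in an
// int, by comparing against the digits of Integer.MAX_VALUE (or of
// abs(Integer.MIN_VALUE) for negative values).
public class IntRangeCheck {
    private static final String MAX_INT_DIGITS = "2147483647"; // Integer.MAX_VALUE
    private static final String MIN_INT_DIGITS = "2147483648"; // abs(Integer.MIN_VALUE)

    // True if the 10 ASCII digits at ch[off..off+10) fit in an int.
    public static boolean inIntRange(char[] ch, int off, boolean negative) {
        String bound = negative ? MIN_INT_DIGITS : MAX_INT_DIGITS;
        for (int i = 0; i < 10; i++) {
            int diff = ch[off + i] - bound.charAt(i);
            if (diff != 0) {
                return diff < 0; // first differing digit decides
            }
        }
        return true; // exactly the boundary value: still representable
    }

    public static void main(String[] args) {
        if (!inIntRange("2147483647".toCharArray(), 0, false)) throw new AssertionError();
        if (inIntRange("2147483648".toCharArray(), 0, false)) throw new AssertionError();
        System.out.println("ok");
    }
}
```

ParserBase could then dispatch length-10 input to contentsAsInt or contentsAsLong based on this check, per the bullet list above.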

@cowtowncoder (Member)

I think we are ok for now; if anyone really wants to optimize things further, a new PR would be fine.
There's a fine line between handling lengths 10/19 optimally vs handling shorter ones -- for most use cases I suspect it is more important to have close(r)-to-optimal parsing for shorter numbers than for longer ones. That is: adding more checks (-> more branching in code) for different cases slows down existing cases.

3 participants