Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JsonStringEncoder should take CharSequence parameters #265

Closed
mikaelstaldal opened this issue Mar 22, 2016 · 22 comments
Closed

JsonStringEncoder should take CharSequence parameters #265

mikaelstaldal opened this issue Mar 22, 2016 · 22 comments
Milestone

Comments

@mikaelstaldal
Copy link
Contributor

It would be useful if the public methods in com.fasterxml.jackson.core.io.JsonStringEncoder declare its input parameter as CharSequence rather than String.

To me, it seems to be trivial to fix since the code only uses methods on the input parameter which are in CharSequence (length and charAt).

@cowtowncoder
Copy link
Member

I can have a look at this. The main concern would be backwards compatibility; change of type here would be gnarly binary-incompatibility (esp. since it's source compatible), but addition of an alternate method could work.

But beyond this, performance via interface is probably inferior, so the first thing encoder would most likely do would be to use toString(). At which point caller could already do that...

If I find time I can do a micro-benchmark to see if my guess is correct wrt speed of charAt().

@mikaelstaldal
Copy link
Contributor Author

This should only be done if it can be done without using toString() on the CharSequence, otherwise there would be no point. This point of this is performance, to avoid superfluous copying of the string.

I am quite sure it can be done without using toString() on the CharSequence.

@cowtowncoder
Copy link
Member

@mikaelstaldal Functionally speaking it can certainly be done, but the point is to keep performance high. Changing access from String.charAt() to go via CharSequence interface may impose significant overhead of its own unless JIT compiler can speculative inline accessors for call-points.

Jackson core itself has no use for sources other than char[] or String, so I assume this functionality would be for external usage.

Beyond performance test showing there is no measurable difference, what would be needed would be new method(s) to avoid backwards-compatibility breakage, and leaving existing method in place.

@mikaelstaldal
Copy link
Contributor Author

Yes, I request this for external use. Specifically to be able to pass in a StringBuilder without having to do toString() on it (which would copy its content). The goal is both to avoid superfluous copying of the string, and to avoid allocating a new object (and thus put pressure on the GC).

@cowtowncoder
Copy link
Member

@mikaelstaldal Makes sense. Actually I think that despite duplication of code that is unfortunate in general, perhaps just cut'n pasting method with new name, more generic type, would be the way to go here? I'd merge such a PR.

@mikaelstaldal
Copy link
Contributor Author

Does the method need to have a new name? Can't we rely on overloading?

@cowtowncoder
Copy link
Member

@mikaelstaldal Unfortunately no: this is changing method signature in bytecode, and then any code compiled against older version will fail on class loading phase. This is unfortunate as compiler would not have problem with it. At the same time, it goes around any perf implications as well.

So: source compatible, but binary incompatible version could break any old code that used the method. While it's not necessarily while used outside of jackson-core itself I'm sure some code does use it.

@mikaelstaldal
Copy link
Contributor Author

OK, updated my PR.

@cowtowncoder cowtowncoder added this to the 2.8.0 milestone Mar 30, 2016
cowtowncoder added a commit that referenced this issue Mar 30, 2016
@cowtowncoder
Copy link
Member

Fixed as per merged #266

@mikaelstaldal
Copy link
Contributor Author

Nice.

However, there is one problem left. Both the new and the old method will inevitably allocate objects due to the way TextBuffer works. What is the reason for using TextBuffer rather than StringBuilder?

@cowtowncoder
Copy link
Member

@mikaelstaldal Multiple reasons, from allocation strategy (StringBuilder always does double-up resizing, which is costly for longer content), to reuse (TextBuffer can recycle its underlying buffering using ThreadLocal) and more efficient building (direct sets on char[] instead of append calls).
In theory TextBuffer does not have to allocate a String either; it is possible to access content using different mechanism (like direct write from its buffer).

So it is in many ways just a customized StringBuilder crafted for specific uses Jackson has (althought its heritage is from Woodstox XML processor).

@mikaelstaldal
Copy link
Contributor Author

Would it be possible to avoid object allocations in each call to JsonStringEncoder.quoteAsString?

@cowtowncoder
Copy link
Member

@mikaelstaldal as far as I recall that should only really allocate String, shouldn't it? Encoder itself should be reused.

@mikaelstaldal
Copy link
Contributor Author

It does an allocation here:

    at com.fasterxml.jackson.core.util.TextBuffer.carr(TextBuffer.java:739)
    at com.fasterxml.jackson.core.util.TextBuffer.resultArray(TextBuffer.java:726)
    at com.fasterxml.jackson.core.util.TextBuffer.contentsAsArray(TextBuffer.java:367)
    at com.fasterxml.jackson.core.io.JsonStringEncoder.quoteCharSequenceAsString(JsonStringEncoder.java:211)

@cowtowncoder
Copy link
Member

@mikaelstaldal Yes, if you ask for contents as an exact size buffer you can do that instead of creating a String (which would also allocate an underlying char[] as well as String object).
StringBuilder also makes a copy of the underlying buffer (earlier JDKs did transfer original buffer, but newer ones not); but in case of TextBuffer, reuse, the aggregation buffer (assuming all content fits in one) does get reused, so there is one and only one buffer allocation per encoded String, not two.

So I am not quite sure how else to reduce allocations. The only remaining additional thing would be to add a method that would copy contents to caller-provided char[]. That could be a useful addition, and quite easy to add.

@mikaelstaldal
Copy link
Contributor Author

I found out an elegant way to do it: #268

@mikaelstaldal
Copy link
Contributor Author

I tried 2.8.0-SNAPSHOT, and it works fine for me.

@mikaelstaldal
Copy link
Contributor Author

When do you plan to release the next version of Jackson (with this included)?

@cowtowncoder
Copy link
Member

@mikaelstaldal that will take a while, probably couple of months. First release candidates might go out in late May or early June, which would follow loose "2 minor releases per year" schedule. In the meantime perhaps you can cut'n paste class in your source code, and remove when 2.8 is out?

@mikaelstaldal
Copy link
Contributor Author

Yes I could do that, given that there are no copyright issues. The project I am working on is Apache Log4j. If I understand it correctly, Jackson uses the same license as we do (Apache 2.0)?

@mikaelstaldal
Copy link
Contributor Author

I could put in a comment saying:

This method is borrowed from [Jackson](https://github.com/FasterXML/jackson-core).

@cowtowncoder
Copy link
Member

@mikaelstaldal correct, Jackson is Apache License 2.0, and that comment would work just fine as attribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants