Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default for TokenStreamFactory.Feature.INTERN_FIELD_NAMES to false #378

Closed
cowtowncoder opened this issue May 20, 2017 · 4 comments
Closed
Labels
Milestone

Comments

@cowtowncoder
Copy link
Member

cowtowncoder commented May 20, 2017

(note: 3.0 will split JsonFactory into TokenStreamFactory base for all codecs, and JsonFactory just or json -- features to be part of former)

Since there are many users concerned about downsides of interning JSON property field names (and similarly for other dataformats, as per #332, it may make sense to consider changing default setting for INTERN_FIELD_NAMES feature for Jackson 3.x.

As a background, the main measurable benefit of using String.intern() on field names (and ONLY once usually, first time a new symbol is encountered -- it is not an on-going cost) is that matching of JSON property name to Java logical property can usually be done with simple identity check.
This speeds up data-binding by ~10% for many cases (see jackson-benchmarks).

But doing this will not usually have benefits if:

  1. Databinding is not used (unless caller makes some use of identity comparisons)
  2. If number of symbols used is big (due to overhead by JVM for managing big number -- hundreds of thousands or millions -- of Strings off-heap)

To me it seems reasonable that for use case of ObjectMapper implicitly creating a JsonFactory (or similar factory for other formats) default should remain the same. But I have no objection to using different default to "direct" usage.
There are multiple ways to achieve this: one possibility would just be to make ObjectMapper explicitly set the default if it directly creates the factory; but not change setting if it is passed one.
And then JsonFactory base class can just default to intern()ing being disabled by default.

Other than this I am also open to possibility of changing defaults for databinding use case, if all major framework developers choose explicitly disabled settings as their baseline.
(authors/contributors of major frameworks like Spring Boot, Dropwizard, Restlet etc etc please send your notes to mailing list, or directly to developers -- I am not looking forward to "voting" here; but I will add condensed notes here for any input we receive).

@cowtowncoder
Copy link
Member Author

I have worked on a performance optimization for field name handling, to try to avoid dual-lookup (first to map from byte sequence to String, then from String to property object), and an interesting side benefit here is that this should remove benefit of String.intern() at streaming level.
(it may or may not make sense within jackson-databind, but set of symbols there is bounded).

Assuming this development (addition of new variant of nextFieldName(), FieldNameMatcher abstraction) succeeds as it should (I have implemented it for majority of format modules) I will change the default setting for 3.0 so that interning is disabled by default for JsonFactory (and other format parser/generator factories).

So in the end I think there is a solution that works nicely for all use cases.

@cowtowncoder cowtowncoder changed the title Consider using different default setting for JsonFactory.Feature.INTERN_FIELD_NAMES for different usage Change default for JsonFactory.Feature.INTERN_FIELD_NAMES to false for Jackson 3.x Nov 28, 2017
@IRus
Copy link

IRus commented Nov 29, 2017

You can write own string pool to implement what String.intern() do. So than you may benefit from identity check, and doesn't harm overall JVM performance.

@cowtowncoder
Copy link
Member Author

@IRus Jackson has (and has had) its canonicalizer(s) since first versions. This mostly helps in avoiding allocation overhead, but can not quite be used for identity checks for couple of reasons (separate one for byte- and char-backed sources, out of necessity; databind has no access; these are per JsonFactory, not global to avoid classloader problems and overflow).
It is true that technically it would be possible to implement global singleton (over per-factory one), but that is fortunately not necessary after changes. Same benefits (for databind at least) can be achieved by sort of per-nameset canonicalizer passed by databind (via deserializers) and used by parser.

plokhotnyuk added a commit to plokhotnyuk/jsoniter-scala that referenced this issue Nov 30, 2017
@cowtowncoder cowtowncoder changed the title Change default for JsonFactory.Feature.INTERN_FIELD_NAMES to false for Jackson 3.x Change default for TokenStreamFactory.Feature.INTERN_FIELD_NAMES to false Dec 19, 2017
@cowtowncoder cowtowncoder added this to the 3.0.0 milestone Dec 19, 2017
@cowtowncoder
Copy link
Member Author

After updating all codecs to implement nextFieldName(FieldNameMatcher), corresponding databind changes, intern()ing is now disabled by default.

And with other changes databind is typically 10-20% faster, despite not being able to use identity checks. So this ended up being win/win situation after all.

Thank you everyone who helped convince me this was worth the effort. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants