Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve static factory method generic type resolution logic #2895

Conversation

carterkozak
Copy link
Contributor

This issue impacts deserialization with generic types using
the immutables library. Immutables
transforms from Json specific implementations to the core immutable
implementation using a static @JsonCreator method.

In cases where the input delegate deserialized object shares a
common generic type with the return type, we may be able to share
some subset of the binding context.

@carterkozak
Copy link
Contributor Author

For context, see immutables/immutables#1228

@carterkozak
Copy link
Contributor Author

This regression appears to have been introduced by 1014493

@cowtowncoder
Copy link
Member

Unfortunately use case here is invalid: type parameters of static methods do NOT related to type parameters of the class, even if name happens to be same. So T here, in factory method, is not bound to anything, as per Java language definition.
Unfortunately older Jackson versions may have incorrectly exposed such imaginary binding.

This may be a huge issue for many libraries but I do not see how incorrect behavior should be resurrected.

@cowtowncoder
Copy link
Member

As per my notes on #2894 I am bit torn here. On one hand there are no tests suggesting it as valid use case, and the fact that Java language does not really suggest any relationship (static members are not part of the class, and hence neither can use type bindings)... but on the other hand, usage of this non-feature may be already widespread enough that fixing this at this point would have nasty consequences.

Also, although it is bit incompatible with Java type handling there may not be actual harm in doing that, at least if there is strict limitation that only single "vanilla" Type variable (with no bounds) would behave this way.
This because these (otherwise) local declarations may have other purpose too.

Challenge here would be to detect specific case(s) where such extra, Jackson-specific binding should be materialized.

And to do it in a way that does not break something else.

@carterkozak
Copy link
Contributor Author

And to do it in a way that does not break something else.

That's always the trick :-)

Based on some brief analysis, I do think it's possible to solve this problem, however I'm not familiar enough with the implementation to be confident that I can implement it reasonably well. I'll spend some time tomorrow attempting a proof of concept for the simple case.

If our requested type is Foo<A, B> which provides a static JsonCreator: @JsonCreator static <C, D> Foo<C, D> fromJson(Bar<C, D> value), I believe we can bind types A and B to Bar for deserialization. There are a lot of edge cases, but we should be able to detect those and fall back to Empty.

@cowtowncoder
Copy link
Member

cowtowncoder commented Oct 20, 2020

@carterkozak Ok, just to be very cafeful here: case of

@JsonCreator static <C, D> Foo<C, D> fromJson(Bar<C, D> value)

declares local unbounded type parameters C and D, which are essentially equivalent to

@JsonCreator static  fromJson(Bar<?, ?> value)

and I would not try to do anything fancy here. It would not support anything more advanced than
essentially "untyped" case

The problem case, I think, which could be supported, is this:

class Bean<T>  {
   @JsonCreator static <T> Bean<T>  fromJson(OtherType<T> value) ...
}

in which from JVM perspective we just have equivalent of

   @JsonCreator static Bean<?>  fromJson(OtherType<?> value) ...

but where we would want to magically bind type bindings from application from somewhere else, say,

class Container {
    public Bean<ContainedValue> value;
}

such that call to factory method would appear as if it was declared as

@JsonCreator static Bean<ContainedValue>  fromJson(OtherType<ContainedValue> value) ...

@carterkozak
Copy link
Contributor Author

Hi @cowtowncoder, I've pushed my proof of concept, it's not in a very clean state at the moment, but I want to collect early feedback before pushing ahead with it.

I've updated the test to make it more complex using multiple type variables and reversing the order in the static function to ensure I'm not relying on implicit details. There are definitely some potential sharp edges, I've tried to fall back to the current implementation if I see anything out of the ordinary.

The @JsonCreator static <C, D> Foo<C, D> fromJson(Bar<C, D> value) case is handled based on the expected result, if we're attempting to deserialize a Foo<String, Integer> we can bind String to C and Integer to D.

What do you think?

@cowtowncoder
Copy link
Member

TBH, I am not sure I am comfortable adding heuristic coupling of a concept that does not exist in Java, with complex implementation that I am not sure I understand. This looks like exact kind of feature I will curse the minute I get bug reports about, unfortunately.

At this point I think what I am interested in is figuring out not so much new cases but specific breaking cases (things that happened to be resolved) from earlier versions. But not trying to extend logic to create new meta-binding from parameterized classes into implied bindings for static methods.

@carterkozak
Copy link
Contributor Author

I agree that we should minimize risk and avoid magic, however I would counter that we've seen different failures across dozens of our internal services upgrading from Jackson 2.11.1 to 2.11.2 and 2.11.3 which all relate to generic type handling. The generic delegate deserialization model has worked for years in a way that seemed reasonable, but happened to be the result of a couple complex bugs working together. While this change absolutely adds maintenance complexity, it reduces complexity in the hundreds of consumers that rely on this functionality in Jackson allowing them to upgrade, while handling generic type bindings in the way they were already assumed to work. Adding specific handling for a subset of cases may prove to be more onerous if we fail to capture all cases in the first attempt, and creates danger for consumers when expectations based on language features aren't met.

I am not sure I am comfortable adding heuristic coupling of a concept that does not exist in Java, with complex implementation that I am not sure I understand

The concept isn't necessarily checked at runtime, however it exists and is validated at compile time. We're relying on the same data and functionality that allows List<String> values = Arrays.asList("a", "b"); to compile while List<String> str = Arrays.asList(1, 2); does not.

The current implementation I've provided is terrible code in its current state. It needs comments, better variable names, and to be broken into several separate methods. It cannot be merged as is, it exists as a proof of concept at this point, but I'd be happy to clean it up to make it easier to follow :-)

We've been forced to pin jackson libraries to 2.11.1 for the time being as we figure out a path forward, and I'm afraid of becoming unable to take security patches and critical bugfixes. I hope I haven't come across as overbearing in this comment, my intent is to provide context based on failures I've encountered internally so we can work together to produce a robust, supportable fix.

I really appreciate your work on Jackson and I can't thank you enough for supporting it! :-)

@carterkozak
Copy link
Contributor Author

I've cleaned up the implementation with more documentation and examples to make it easier to follow. The MethodGenericTypeResolverTest test cases provide a clear demonstration of the functionality.
I'd be happy to discuss the details at your convenience, let me know if there are additional test cases or documentation you'd like to see.

I've verified the change on a few internal projects and it resolves the problems we've encountered.

Happy Friday :-)

@carterkozak carterkozak changed the title Test demonstrating the generic type deserialization regression in 2.11.3 Fix static JsonCreator generic type deserialization regression in 2.11.3 Oct 26, 2020
@pkoenig10
Copy link

I just want to echo @carterkozak's sentiment here. The recent changes to generic handling have caused a handful of issues when upgrading to 2.11.2 and 2.11.3 and have made us wary of taking further Jackson upgrades, since we often don't discover these issues until after we've upgraded a number of our internal projects.

I also sympathize with the desire to avoid complex features that are prohibitively hard to understand and maintain. But for a foundational library such as Jackson, there's significant value in avoiding backwards incompatible behavior breaks to avoid eroding consumer confidence in taking Jackson upgrades, if nothing else.

I truly do appreciate all the work you've put into maintaining Jackson and hope that we can agree on a reasonable and satisfactory solution to this issue. I'm happy to help in any way I can.

@cowtowncoder
Copy link
Member

At this point I would like to go with a minimal change that fixes observed cases where earlier versions worked (with implicit bindings for static factory methods). So most useful would be (now failing) cases that exhibit previously passing cases, but do not extend functionality.

I also want to minimize implicit bindings that are not based on JDK's type parameter resolution -- which rules out some of compiler-side things that may lead users to assume existence of bindings which are only there when compiled from sources.

@pkoenig10 do you think you could try to come up with a minimized example of breakage? Ideally I'd get multiple examples just to make sure fix is general enough.

@cowtowncoder
Copy link
Member

And just to expand a little bit on my first comment: I think I know part that changed and where some fix is needed -- where static methods are given empty type bind context. But I want to be sure that I address the specific existing use case: no existing tests covered those and there are multiple possibilities (there are at least 2 things I can think of, and some of my comments are relevant for one but not the other):

  1. Locally (in static method) define bounds constraints: those should be supported. If I broke something here, those should definitely be fixed -- I now realize that assigning empty type resolution context may be wrong if that means dropping such bounds
  2. Type bindings declared for surrounding class, which do not bind static members. I assumed this was the problem earlier, hence comments. If there is existing code that assumes binding by name, those can be reintroduced for 2.11 (and 2.12), and if necessary some configuration options may be introduced

So, I would like minimal reproduction case(s) that show issues, as starting point. PR here may become useful that way but I am not really comfortable digging into implementation before understanding basics.

@carterkozak
Copy link
Contributor Author

Right, the local bounds case is the change that has impacted all the cases I'm aware of. When old versions passed the existing type resolution context, that was incorrect, however it happened to work most of the time because it's common to reuse type variable names:

class Foo<T> {
  static <T> Foo<T> get(T val) { return new Foo<T>(val); }
}

However it would produce incorrect results if the static method used a different name from the base type:

class Foo<T> {
  static <U> Foo<U> get(U val) { return new Foo<U>(val); }
}

I'm uncomfortable with a long term solution that brings back the failure behavior because it's counter-intuitive and likely to result in runtime failures when variable names change. If type variables don't match the original class, it can result in incorrect types and class cast exceptions. That said, it's better than dropping the bindings entirely.

Type bindings declared for surrounding class, which do not bind static members. I assumed this was the problem earlier, hence comments. If there is existing code that assumes binding by name, those can be reintroduced for 2.11 (and 2.12), and if necessary some configuration options may be introduced

I'm not aware of any uses that relied on type binding based on the encapsulating class. This would not make sense because the type system does not constrain one based on the other.

@cowtowncoder
Copy link
Member

@carterkozak ah, ok. If we are talking about local bounds, that makes sense and my comments were off by quite a bit. Apologies for jumping to conclusions.

And the solution would be to build correct type resolution context, considering local bounds, and not reusing surrounding classes bindings, ideally... unless there are new failures from that as well (I hope not but would want to know).
And not using empty one either.

Going back to the example then:

class Foo<T> {
  static <T> Foo<T> get(T val) { return new Foo<T>(val); }
}

In this case, T would just amount to ?, which approximates to java.lang.Object. Is the new problem then failure to resolve type variable and exception (which would be wrong and should be fixed)?

There are of course then cases where bounds exist, and need to be considered, but I wanted to start with the basics.

@carterkozak
Copy link
Contributor Author

carterkozak commented Oct 27, 2020

If we are talking about local bounds, that makes sense and my comments were off by quite a bit. Apologies for jumping to conclusions.

Not a problem, thanks for working through this with me :-)

In this case, T would just amount to ?, which approximates to java.lang.Object. Is the new problem then failure to resolve type variable and exception (which would be wrong and should be fixed)?

There are two cases:

  1. If we requested a Foo.class via objectMapper.readValue(input, Foo.class), we get a raw Foo equivalent to Foo<?> or Foo<Object> where the value is the default type (String for strings, LinkedHashMap for object, and so on). I don't believe this case is impacted and still produces equivalent data.
  2. If we requested a Foo<Long> via objectMapper.readValue(input, new TypeReference<Foo<Long>>() {}) or as a concretely typed field in another object[note1], for example, for input "1", deserialization would be identical to the first case, and the generic component would be a String. The failure most often manifests as a ClassCastException when reading the generic component e.g. Long value = myFoo.get();

note1: For example:

class Example {
  // Long must be bound to the Foo static JsonCreator type variables,
  // I believe this is equivalent to requesting new TypeReference<Foo<Long>>() {})
  private Foo<Long> value;
}

@cowtowncoder
Copy link
Member

cowtowncoder commented Oct 28, 2020

(replaced earlier comment)

Ok. I think I am getting to understand the issue, and if so:

  1. Earlier behavior was due to name aliasing (as per Fix type resolution for static methods (regression in 2.11.3) #2894), so my understanding was not completely incorrect
  2. Suggestion for fix here takes different approach, trying to match Factory creator method's signature (and specifically return type) with declaring type since that is logically reasonable assumption

so that (2) would be a cleaner way to achieve what (1) happened to allow.

So far so good, I think. I will still need to digest implementation details here, and see if it could be used as solution.

In the meantime I am starting to lean towards partial undo of changes for 2.11, since bigger change should go in a new minor version. So this may need to be rebased against 2.12 anyway. But first I'll need to come up with a test, probably just based on #2894 just for 2.11.4 patch.

@carterkozak
Copy link
Contributor Author

I've added this commit with a simple reproducer that passed in 2.11.1 and 2.11.2, but failed in 2.11.3 (without the fix this PR also provides): 892b99a

cowtowncoder added a commit that referenced this pull request Oct 30, 2020
cowtowncoder added a commit that referenced this pull request Oct 30, 2020
@cowtowncoder
Copy link
Member

@carterkozak Ok so here are comments I promised.

First, as per earlier notes, I'd like this to go against 2.12: more tests are fine for 2.11, but hoping to limit changes there to reduce chances of regression.

Now that I think I understand the premise of changes, I agree: given a parameterized type like T, and the fact that factory methods are expected to produce instances of declared type (T) -- implicit contract of factory methods -- it does make sense to derive assignment compatible type, and then use that to create type variable binding(s) for parameter(s) to use. Names need not match and it may well be cleaner for user to select different names.
This patch would fix the dependency from 2.11 and before.
So far so good.

Beyond this, it would be nice if generic type handling could be (mostly?) added in TypeFactory as that is where about all of it currently exists. TypeFactory is also what handles potential "promotion" of types like Scala collections, via type modifiers, something that might occasionally be necessary for types used with factory methods (probably not a big concern tbh).
At the same time if this handling is very specific for use case, and not generally useful, there isn't much point in exposing it via TypeFactory API. I'll try to re-read code again to see if I can think of ways to integrate this within TypeFactory.

Other than that I just want to make sure I also understand the implementation logic, beyond its intent. This will be necessary in possible event someone finds edge cases in handling. It's not about not trusting code but being able to modify it in future, if necessary (of course you'd be the best person to do that but sometimes original code contributors move on and I'll get to handle issues).

So. I think we this change makes sense, conceptually, and I just need to digest implementation aspects.

I hope this makes sense? We can definitely sync up on Gitter if and when it makes sense, too.

@cowtowncoder cowtowncoder added the hacktoberfest-accepted Tag for issues, PRs that are explicitly marked as accepted for Hacktoberfest label Oct 31, 2020
@carterkozak
Copy link
Contributor Author

carterkozak commented Nov 2, 2020

On my todo list to put together tests for 2.11 today. Do you prefer PRs into master followed by back-port, or PRs into the oldest release branch and forward-port?

it would be nice if generic type handling could be (mostly?) added in TypeFactory as that is where about all of it currently exists.

I agree that the TypeFactory sounds like the most reasonable place for this logic, however in my read through it looks like the class primarily handles mapping from Type to JavaType, where JavaType has no concept of type variables. I defer to your judgement on the structure of the code here as you're much more familiar with it than I am!

Other than that I just want to make sure I also understand the implementation logic, beyond its intent. This will be necessary in possible event someone finds edge cases in handling. It's not about not trusting code but being able to modify it in future, if necessary

If there are any pieces that are difficult to reason through, they're likely either incorrect, or I owe you inline documentation. I added test cases for many of the edge cases as well, but I'm sure there's room for more test coverage.

So. I think we this change makes sense, conceptually, and I just need to digest implementation aspects.
I hope this makes sense? We can definitely sync up on Gitter if and when it makes sense, too.

Of course, thanks for taking the time to review this :-)


import static org.assertj.core.api.Assertions.assertThat;

public class MethodGenericTypeResolverTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jackson tests typically extend BaseMapTest.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add a bigger not, but while this could be used in master (since 3.0 requires Java 8), it can't be used for 2.12.
Could just use AtomicReference as it behaves in many ways the same way, as far as Jackson is concerned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

public static class StubA {
private final String value;

@JsonCreator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest adding explicit (mode = JsonCreator.Mode.DELEGATING) here, as this is potentially ambiguous case (esp. as parameter name and field are both value) and could be detected as properties-based.

@cowtowncoder
Copy link
Member

Ok. Had one read over, ready to merge, maybe make some small changes but in general looks good.

Only one remaining thing: this would be needed against 2.12 -- I can merge 2.12 -> master.
Alternatively I can merge to master, cherry-pick, but either way one problem: java.util.Optional is Java 8 and so can not be used for 2.x, not even tests (we'll likely change this in 2.13 or so, but not yet).
I'd suggest just replacing with AtomicReference as it has similar behavior wrt Jackson, although that's up to you.

@carterkozak
Copy link
Contributor Author

I’ll make the changes shortly, thanks!

@carterkozak carterkozak changed the base branch from master to 2.12 November 3, 2020 21:43
@carterkozak carterkozak force-pushed the ckozak/test_reproducing_2_11_3_regression branch from 26699c8 to 2c0ad2d Compare November 3, 2020 22:20
@carterkozak
Copy link
Contributor Author

carterkozak commented Nov 3, 2020

I've squashed the commits and updated this PR into 2.12, incorporating your suggestions.

I had a bit of trouble because the 2.12 branch AnnotatedConstructor.toString uses Constructor.getParameterCount added in java 8. I can open another PR to remove that call if you like.
Edit: I've opened a separate PR #2918 to avoid sidetracking this change.

Previously TypeVariable handling worked in most cases due to
coincidence, in most cases the type variable names used in factory
methods and types use the same name, and order. However the way this
worked resulted in bugs in edge cases, and was relatively fragile
because it wasn't intended in the first place.

Now we attempt to map type variables from the requested type through
the factory function regardless of matching type variable names.
@carterkozak carterkozak force-pushed the ckozak/test_reproducing_2_11_3_regression branch from 2c0ad2d to a32b34b Compare November 3, 2020 22:41
@cowtowncoder cowtowncoder merged commit b619003 into FasterXML:2.12 Nov 7, 2020
@cowtowncoder cowtowncoder added this to the 2.12.0-rc2 milestone Nov 7, 2020
@cowtowncoder cowtowncoder changed the title Fix static JsonCreator generic type deserialization regression in 2.11.3 Improve static factory method generic type resolution logic Nov 7, 2020
cowtowncoder added a commit that referenced this pull request Nov 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hacktoberfest-accepted Tag for issues, PRs that are explicitly marked as accepted for Hacktoberfest
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants