nullness borderline cases - jspecify/jspecify GitHub Wiki
In base Java, especially before generic types, the trade-offs of choosing one type over another were quite stark. If your method needs to accept values up to 2^31 - 1, it can accept `int`, but if it needs to accept 2^31 as well, you're stuck using `long`. Java erased types are absolutely strict.
With generic types, things get a bit fuzzier. We can make judgment calls, and suppress warnings when the compiler isn't sure we're safe. For example, the implementation of `Collections.<T>emptySet()` can return the same `Set<Object>` instance every time, even though the types aren't actually assignable.
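This instance reuse is observable in practice with reference equality. A small demo (it relies on OpenJDK's implementation detail of sharing one immutable instance, not on any specified contract):

```java
import java.util.Collections;
import java.util.Set;

public class EmptySetDemo {
    public static void main(String[] args) {
        Set<String> strings = Collections.emptySet();
        Set<Integer> ints = Collections.emptySet();
        // In OpenJDK, both calls return the same static EMPTY_SET instance,
        // even though Set<String> and Set<Integer> aren't assignable to
        // each other. The implementation suppresses the unchecked warning.
        System.out.println((Object) strings == (Object) ints); // true
    }
}
```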
In our case, most of the time it will be easy to decide what nullness a type usage should have, just by considering whether it's good or bad for `null` to "pass through" that point. But there are many kinds of "borderline" cases where the answer isn't clear. If the tools were as strict as Java is with erased types, we'd have no choice: we'd simply have to choose nullable for every borderline case. But instead, we have the same flexibility we just described for generic types, so we're empowered to make the decision that works out best for users.
Here we have notes about how to handle situations where it is tough to tell which of `@Nullable` or `@NonNull` applies.
A `java.lang.reflect.Method` instance might represent a static method or an instance method. Only in the first case is `null` an acceptable receiver argument! Yet we have to annotate the method one way for all.

Interestingly, though, the `invoke` method never *has* to accept `null`. In the static case, it promises to ignore the receiver argument, and so it became conventional to pass `null`. But code can just as easily pass a dummy value like `"(ignored)"`. Given that, we can catch more bugs by leaving the parameter non-nullable.
(TODO: a better example of the general issue, that doesn't have this get-out-of-jail card.)
See PolyNull.
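A quick sketch of the situation in plain Java, showing that a static `invoke` works with either `null` or a dummy receiver, while an instance `invoke` genuinely needs a real object:

```java
import java.lang.reflect.Method;

public class InvokeDemo {
    public static void main(String[] args) throws Exception {
        // Static method: the receiver argument is ignored entirely.
        Method valueOf = Integer.class.getMethod("valueOf", String.class);
        // Passing null is conventional, but any dummy receiver works too:
        Object a = valueOf.invoke(null, "42");
        Object b = valueOf.invoke("(ignored)", "42");
        System.out.println(a + " " + b); // 42 42

        // Instance method: the receiver must be a real object;
        // a null receiver here would throw NullPointerException.
        Method length = String.class.getMethod("length");
        System.out.println(length.invoke("hello")); // 5
    }
}
```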
A simplified look at `AtomicReference`'s API:
```java
class AtomicReference<V extends @Nullable Object> {
  AtomicReference(V initialValue) {...}
  V get() {...}
  void set(V newValue) {...}
}
```
This says that we can:

- Create an `AtomicReference<String>`, where we'll have to pass a non-null string, and can only set non-null strings thereafter, and therefore values returned by `get` will be non-null too
- Create an `AtomicReference<@Nullable String>`, where we can pass null, and set null, but of course values returned by `get` will be nullable too
So far... this sounds perfect! Both ways of using the class make perfect sense.
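A sketch of both parameterizations in plain Java (the `@Nullable` annotation is elided in the code, since it has no runtime effect):

```java
import java.util.concurrent.atomic.AtomicReference;

public class TwoParameterizations {
    public static void main(String[] args) {
        // AtomicReference<String>: non-null values in, non-null values out.
        AtomicReference<String> nonNull = new AtomicReference<>("a");
        nonNull.set("b");
        System.out.println(nonNull.get()); // b

        // AtomicReference<@Nullable String> (annotation elided): null is
        // allowed in, so values returned by get() are nullable too.
        AtomicReference<String> nullable = new AtomicReference<>(null);
        nullable.set(null);
        System.out.println(nullable.get()); // null
    }
}
```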
Unfortunately, there's a second constructor, for creating an uninitialized atomic reference:
```java
class AtomicReference<V extends @Nullable Object> {
  AtomicReference(V initialValue) {...}
  AtomicReference() {...} // the value remains `null`!
  V get() {...}
  void set(V newValue) {...}
}
```
Ideally we'd like this no-arg `AtomicReference` constructor to refuse to create atomic references with a non-null type argument: `new AtomicReference<@Nullable String>()` should work, but `new AtomicReference<String>()` should not. But Java generics don't work this way, and JSpecify takes after Java generics.

That means that even for an `AtomicReference<String>`, `get` could still return `null`. Just avoiding the no-arg constructor makes you technically safe, but how would the analyzer know that? It seems we're forced to have `get` return `@Nullable V`. But then that erases most of the benefits of using an `AtomicReference</*non-null*/ String>` in the first place.
By including this constructor, `AtomicReference` has made itself vulnerable to the very same create-initialize gap that plagues object fields and array components. Like those cases, a bug results if, before the value is initialized, either the value is read or the instance escapes the context where it was created.
The logical way to avoid bugs here is to use initialization analysis -- whatever catches this bug for fields and arrays should be equally able to include this case as well. It doesn't make sense to saddle the nullness analysis with the responsibility for catching such bugs, because, as just discussed, it is incapable of doing so well.
Therefore, the original code should be considered acceptable as it was. Of course, using initialization analysis (that recognizes this case) is recommended!
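To make the gap concrete, here is a plain-Java sketch (annotations elided, since they have no runtime effect):

```java
import java.util.concurrent.atomic.AtomicReference;

public class AtomicRefGap {
    public static void main(String[] args) {
        // The no-arg constructor leaves the value null, even though the
        // type argument here is meant to be non-null:
        AtomicReference<String> ref = new AtomicReference<>();
        String s = ref.get();  // null leaks out of a supposedly non-null type
        System.out.println(s); // null

        // Avoiding the no-arg constructor closes the gap, which is what an
        // initialization analysis could verify:
        AtomicReference<String> safe = new AtomicReference<>("hello");
        System.out.println(safe.get()); // hello
    }
}
```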
Sometimes called the `Map.get` case.

TODO; for now see issue 71.
This is a method that:

- definitely throws upon `null` input
- but throws the same kind of exception as for other invalid inputs
`Integer.parseInt(String)` is a good example: it throws `NumberFormatException` whether the input is `null` or `"xyz"`.
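That claim is easy to verify: both kinds of bad input surface as the same exception type, so a caller catching `NumberFormatException` can't tell them apart.

```java
public class SameException {
    // Returns the simple name of the exception thrown, if any.
    static String exceptionFor(String input) {
        try {
            Integer.parseInt(input);
            return "no exception";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(exceptionFor("xyz")); // NumberFormatException
        System.out.println(exceptionFor(null));  // NumberFormatException
    }
}
```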
Now, JSpecify specifications have nothing to do with exception types. Since this method has no desire to receive `null`, the "correct" nullness for this parameter is non-null.
But annotating "correctly" doesn't always produce the best outcomes for users. It helps to look at the real-world consequences on users. This generally means balancing the overall harm from false positives against the overall harm from false negatives.
A parsing method has callers of two kinds: those with trusted input, and those with untrusted input. "Trusted" means the caller is sufficiently confident it will parse correctly. Logically that includes trusting it to be non-null. (Yes, the trust could be misplaced, and the code could propagate an unexpected unchecked exception, but this is no different from other cases of "programmer error".)
Setting aside nullness annotations/analysis, processing untrusted input looks like this:
```java
try {
  return Integer.parseInt(input);
} catch (NumberFormatException e) {
  return -1; // or whatever other handling (that isn't just rethrowing `e`)
}
```
... whereas processing trusted input looks like this:
```java
return Integer.parseInt(input);
```
Now what about nullness analysis? Assume that the parsing method is now `@NullMarked`. On the caller side, assume the analyzer considers the `input` expression to be nullable (because if non-null, the code above will both compile and run just fine).
Now we have two choices:
If we annotate the parameter non-null: in both the trusted and untrusted cases above, the line `return Integer.parseInt(input)` must change to either

```java
return Integer.parseInt(requireNonNull(input));
```

... or, to preserve the former behavior more precisely and avoid an unnecessary check:

```java
@SuppressWarnings("nullness")
int result = Integer.parseInt(input);
return result;
```

Neither of these changes anything appreciably at runtime, so the finding was a false positive.
If instead we annotate the parameter nullable: both original code snippets can continue to be used as-is. There is no finding and no false positive. What about false negatives? Well, the untrusted case will be handled just fine at runtime. But since `input` is a nullable expression, there's a good chance it does have some risk of being null, and we missed out on giving a signal to the "trusting" programmer that they might be a little too trusting in this case.
Considering the incidence and severity of all the false positives from the non-null parameter, and the incidence and severity of the false negatives from the nullable parameter, which seems more damaging overall? Answering that question should be a useful guide to which way you should annotate the method.
You own a library API. A particular type usage in that API's member signatures includes `null` only in exceptional scenarios that your clients should never find themselves in, according to the API's broader contract (whether or not that contract is enforced, or even enforceable).

In these cases, "something has already gone wrong" (it might not be this client's fault, but it doesn't matter). Notice this is not the "`Map.get` case".
Now if this were an analogous question of choosing a root type, there would be no freedom; javac forces us to use the more general type. But our case is much more like that of parameterized types; we too can suppress warnings, so we do have the flexibility to decide between the lesser of two evils:

- The type can be technically over-specific (non-null).
- The type can be over-general in practice (nullable).
The consequences of each option depend on how the type is being used:

- A provided type (where values flow out from the library to its clients)
- An accepted type (where values flow into the library from its clients)
- (I'll disregard "inout" types here, as they should be rare and they have much bigger problems.)
TODO: a good example of each case
Should you make this provided type:

- "technically over-specific"? (non-null)
  - General consequence: in the errant case, it can cause null pollution in the client. An immediate defensive null check would catch it, but nothing would prompt the user to do that.
  - Consequence to Kotlin client: in the errant case, at least that runtime null check will happen, so we "fail faster" and more consistently.
- "over-general in practice"? (nullable)
  - General consequence: well-behaved clients suffer "type dullening" (like what happens when calling a polynull method without polynull support). To reassert non-nullness they must suppress a finding, or make an unnecessary-feeling runtime check.
  - Consequence to Kotlin client: same, but the first remedy is impossible; happily, the second remedy is super easy (`!!`).
Should you make this accepted type:

- "technically over-specific"? (non-null)
  - General consequence: in the errant case, the client would have to suppress a finding in order to pass `null` to the library.
  - Consequence to Kotlin client: that's impossible, so there is simply no way to do this, short of altering the library jar!
- "over-general in practice"? (nullable)
  - General consequence: well-behaved clients may accidentally send `null`, which we can only hope is rebuffed at runtime.
  - Consequence to Kotlin client: same
Again, we can't pronounce one answer as always better than the other. But hopefully understanding these consequences will help you make your decision.
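As a hypothetical illustration of the provided-type trade-off (the `Config` class and its methods are invented for this sketch, and the nullable annotation is shown as a comment since it has no runtime effect):

```java
import java.util.Objects;

// Hypothetical library class: its name is null only in an "exceptional
// scenario" that the broader contract says clients should never hit.
class Config {
    private final String name; // null only if something already went wrong
    Config(String name) { this.name = name; }

    // Choice 1: "technically over-specific" (declared non-null).
    // In the errant case, null pollution reaches the client unannounced.
    String name() { return name; }

    // Choice 2: "over-general in practice" (declared nullable).
    // Well-behaved clients must now check or suppress on every call.
    /* @Nullable */ String nameOrNull() { return name; }
}

public class ProvidedTypeDemo {
    public static void main(String[] args) {
        Config config = new Config("prod");

        // With choice 1, callers just use the value:
        System.out.println(config.name().length()); // 4

        // With choice 2, callers must reassert non-nullness themselves,
        // e.g. with an unnecessary-feeling runtime check:
        String name = Objects.requireNonNull(config.nameOrNull());
        System.out.println(name.length()); // 4
    }
}
```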