local analysis - jspecify/jspecify GitHub Wiki

This page contains a broad discussion of two kinds of static program analysis. It does so in the context of nullness analysis, as that is JSpecify's first project, but the ideas should apply more generally. It is probably not essential reading for anyone.

Whole-program analysis

One way to look for bugs is to apply whole program analysis to an entire application at once. We always have the entire bytecode available, do we not? And every null value had to start out as an aconst_null instruction somewhere (mumble mumble native code, uninitialized fields/arrays), did it not? It would seem, then, to be a matter of tracing everywhere that value could flow until something tries to dereference it (invokevirtual, getfield, etc.).

The idea makes sense, and you can feasibly achieve sound analysis this way: no false negatives, making you safe at runtime. But there are some serious flaws. Suppose a problem is found, and the chain of code connecting the null literal to the dereference is reported. That chain could be pretty long! Someone in that chain is the guilty party, who stuck null where it wasn't expected, but who's to say who that was?

Plus we're highly vulnerable to false positives. It's very common to find a piece of code where null safely won't flow, yet it's infeasible to statically/mechanically prove that it can't flow. Some could be remedied with assert foo != null, but suppose 99 implementations of an interface method did that for their return value, and one didn't!

It would be nice if that interface could declare "returning null is not allowed" (by truly remarkable coincidence, exactly what annotations like those in JSpecify enable!). For one thing, when a problem is found, that would help with narrowing down the "guilty party" in the chain. But also... maybe the owner of that 100th implementation could have been alerted to fix the problem already, by their own toolchain, before whole-program analysis had to find it?

Local analysis

Interestingly, if nullness expectations are being adequately expressed in APIs, we don't necessarily need whole-program analysis. Local analysis becomes a viable option. Our errant subclass we just discussed can find its own problems... and projects using that API shouldn't have to dig into all its implementations themselves either.

In local analysis, the analyzer checks a set of files (a “library”) at a time. The library's dependents are unknown. Its dependencies are available, but in compiled form only, and are not analyzed, only used to read their annotations.

In short, when analyzing Library X, the analyzer:

  • Trusts annotations found in X's dependencies
  • Validates X’s own annotations, so they will be trustworthy for X's dependents

It's true that an application is still only as correct as the libraries making it up are. But at least each library can act autonomously. As more and more of them implement local analysis, the application becomes steadily safer and safer from bugs.

JSpecify stance

It would be nice if our annotations are helpful to whole-program analysis, but we have been more focused on local analysis.

When analyzing a source file, a tool will never know whether the file's dependencies were analyzed in the same way, different way, or not at all. If the dependency annotations being used are coming from overlay files, it's especially unlikely this actual verification happened. Or, even if a dependency class is well-validated, but it's not a final class, it still might have unanalyzed subtypes and the same risks might exist.

When you depend on other code, its implementation might always fail to meet the promises of its specification (including its annotations). But this isn't actually the worst problem in the world! Because when it does fail, it's generally possible to pinpoint blame. There is some mismatch either between an implementation and its own contract, or between a contract and a supercontract, or between a contract expressed in overlay files and the code it's trying to describe. Put simply, when you discover an implementation clashing with its specification, you're halfway to a remedy already: bring them into line.