Compiler cache update design - HaxeFoundation/haxe GitHub Wiki

Scenario

A module file changed. We learn about this either by checking the file mtime, or via the server/invalidate method of our JSON-RPC API.

The question we now want to answer is: Does the change affect any of the module's dependents?

Outcomes

Currently, there are two possible outcomes of a cache check:

  • (OK) The module is unchanged, so we can reuse it.
  • (INVALID) The module did change, so we cannot reuse it and cannot reuse any of its dependents either.

I would like to add a third outcome:

  • (BAD) We thought that the module was unchanged, but then it turned out that we were wrong.
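To make the three outcomes concrete, they can be modeled as a simple enumeration. This is a hypothetical Python sketch (the compiler itself is written in OCaml, and the type name here is illustrative, not the compiler's actual one):

```python
from enum import Enum, auto

class CacheCheck(Enum):
    """Possible results of checking a cached module."""
    OK = auto()       # module unchanged, reuse it and its dependents
    INVALID = auto()  # module changed, discard it and all its dependents
    BAD = auto()      # we assumed OK, but retyping later proved us wrong
```

The important distinction is that (INVALID) is decided up front, while (BAD) is only discovered after we have already committed to reusing the module.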

Heuristics

We can always parse a module file without any risk. This means that we can look at the parsed data and make assumptions. Such assumptions may include:

  • If any imports or usings changed (TODO: how to actually detect that?), we are very likely to have an invalid module. Raise (INVALID).
  • Pair module types and their fields with what we have in the cache. If we can't resolve something, raise (INVALID). (TODO: What if we lost something?)
  • If a field's pmax - pmin is the same, it is probably ok. Keep going.
  • Otherwise, if the field is inline, raise (INVALID).
  • Otherwise, check the field expression's pmin relative to the field's own pmin. If that offset changed, the change must come from a modifier or function argument (which is not part of the expression at this level), so we raise (INVALID). Otherwise, the only thing that probably changed is the expression itself, and we can keep going.
  • ... and some more checks
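The per-field heuristics above can be sketched as a small decision function. This is a hypothetical Python model, not the compiler's code: the Field record and its position names (pmin, pmax, expr_pmin) are illustrative stand-ins for the parsed AST data.

```python
from dataclasses import dataclass

@dataclass
class Field:
    name: str
    pmin: int        # start position of the field
    pmax: int        # end position of the field
    expr_pmin: int   # start position of the field's expression
    is_inline: bool

def check_field(old: Field, new: Field) -> str:
    """Apply the per-field heuristics; return 'INVALID' or 'KEEP_GOING'."""
    if (new.pmax - new.pmin) == (old.pmax - old.pmin):
        # Same length: the field itself is probably untouched.
        return "KEEP_GOING"
    if new.is_inline:
        # An inline field leaks its expression into dependents.
        return "INVALID"
    if (new.expr_pmin - new.pmin) != (old.expr_pmin - old.pmin):
        # The expression start moved relative to the field start, so a
        # modifier or function argument must have changed.
        return "INVALID"
    # Only the expression body itself changed; safe to keep going.
    return "KEEP_GOING"
```

Only if every field survives checks like these do we proceed to optimism mode.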

If we survive all of these heuristics, we enter optimism mode.

Optimism mode

Here we assume that the module didn't change. Among other things, this means that we set its m_dirty <- None, which allows all dependents to reuse it. The goal now is to retype the module while retaining the identity of anything that is referenced from the outside. In practice, this should mostly come down to updating tclass_field instances instead of recreating them.
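The identity-preserving update could look roughly like this. Again a hypothetical Python model: the mutable object stands in for a tclass_field, and copying fresh data into the cached instance stands in for the actual retyping.

```python
class ClassField:
    """Stand-in for tclass_field; dependents hold references to this object."""
    def __init__(self, name, type_, expr):
        self.cf_name = name
        self.cf_type = type_
        self.cf_expr = expr

def update_in_place(cached: ClassField, fresh: ClassField) -> ClassField:
    """Copy freshly typed data into the cached instance instead of
    replacing it, so every outside reference to `cached` stays valid."""
    cached.cf_type = fresh.cf_type
    cached.cf_expr = fresh.cf_expr
    return cached
```

Because dependents keep pointing at the same object, no reference held elsewhere in the compilation context ever goes stale.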

This kind of retyping will be very fast because we're optimists and reuse everything from the cache (unless it is invalid for other reasons). Because we update existing structures and don't recreate any, we don't run the risk of the infamous "Type name X is redefined from module X" error.

But what if it goes wrong?

Of course, it's very possible that our optimism was misplaced and that we actually did have a change that affects the typing of other modules. The first task is to detect this, which should mostly be a matter of checking whether something like a cf_type changed. To do so, we can remember the old values before retyping.
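Detecting misplaced optimism could then amount to snapshotting the externally visible data before retyping and comparing afterwards. A hypothetical sketch, where cf_type stands for whatever data dependents can observe:

```python
from dataclasses import dataclass

@dataclass
class FieldInfo:
    """Minimal stand-in for the externally visible part of a field."""
    cf_name: str
    cf_type: str

def snapshot(fields):
    """Remember the old externally visible values before retyping."""
    return {f.cf_name: f.cf_type for f in fields}

def went_bad(fields, before: dict) -> bool:
    """After retyping: did anything a dependent could observe change?"""
    return any(f.cf_type != before[f.cf_name] for f in fields)
```

If went_bad reports a change, the (BAD) outcome is triggered and recovery begins.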

If we do detect that, we end up in (BAD) territory. The challenge now is to discard everything that we typed based on our invalid assumptions. With proper data design, this means removing the (BAD) module and all its dependents (recursively) from the typer/common context, and then trying again. Logically, this should be equivalent to an (INVALID) state raised during the heuristics phase. Designing the context to support this might require a bit of work, but it should be possible.
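Given a dependency graph, discarding a (BAD) module and everything typed on top of it is a straightforward recursive walk. A hypothetical sketch, where `dependents` maps a module to the modules that depend on it and `context` is the set of modules currently in the typer context:

```python
def discard_bad(module: str, dependents: dict, context: set) -> None:
    """Remove `module` and, recursively, every module that depends on it
    from the typer context -- the same effect an (INVALID) raised during
    the heuristics phase would have had."""
    if module not in context:
        return  # already discarded via another dependency path
    context.discard(module)
    for dep in dependents.get(module, ()):
        discard_bad(dep, dependents, context)
```

After the walk, retyping starts again from a context that no longer contains any module built on the invalid assumption.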

However, as a first step, we can probably just try restarting the current request altogether with a fresh context, in which we mark the offending module in a way that doesn't allow for any retyping attempts.

Of course, we want to avoid this state as much as possible. To that end, we can track in our cache which modules caused it. During the heuristics phase we can then check whether the module in question has a tendency to go (BAD), and raise (INVALID) for it right away. That way we are never worse off than we are today.
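The tracking itself could be as simple as a per-module counter consulted during the heuristics phase. A hypothetical sketch (the function and variable names are illustrative):

```python
# How often optimism for a given module turned out to be misplaced.
bad_history: dict = {}

def record_bad(module: str) -> None:
    """Remember that this module went (BAD) after an optimistic retype."""
    bad_history[module] = bad_history.get(module, 0) + 1

def heuristic_outcome(module: str, tentative: str) -> str:
    """Downgrade a would-be optimistic result to INVALID for modules
    with a history of going (BAD); never worse than current behavior."""
    if tentative == "KEEP_GOING" and bad_history.get(module, 0) > 0:
        return "INVALID"
    return tentative
```

A module with no (BAD) history is unaffected, so the common case still benefits from optimism mode.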