Type inference - vilinski/nemerle GitHub Wiki

Table of Contents MSc thesis Bottom up type inference Free type variables Constraints Other sources of free type variables Deferral What exactly compiler does?

MSc thesis

The theoretical model of Nemerle is described in my MSc thesis. This page explains this from a more practical standpoint, though reading at least the first chapters of the thesis wouldn't hurt.

Bottom up type inference

Most languages, even plain good ol' C, support a form of bottom up type inference. It happens when you have an expression, let's say a function call f() and f's return type is int, then we know the type of f() is int. A more involved example would be Singleton of type T -> Set[T] (that is Singleton is a function taking element of type T and returning Set[T]). Then Singleton(f()) clearly has type Set[int]. This example is more interesting as the type of function call depends on types of arguments.

From this kind of reasoning there is only a single little step to allowing:

def s = Singleton (f ());

instead of:

Set<T> s = Singleton (f ());

That is omitting types of local variable declaration.

You can even do this in GCC compiling C++ (or C, but as it lacks templates it doesn't make much sense): <c>#define DEF(v,e) __typeof(e) v = e int main (void) { DEF(x, 21 + 21); return x; } </c>

Free type variables

The situation becomes more involved when there are type parameters that are used in return type but not in type of function parameters. For example consider MakeSet function returning Set[T] and taking no arguments.

def s = MakeSet ();

And what type does s have? It is Set[T] for some T. We do not yet know what kind of set will it be. But after adding next line with s usage:

def s = MakeSet ();
s.Add (42);

we know that T is int, because we have added element of type int to it. In fact we know this because Set[T].Add has type T -> void.

Such yet unknown types are called free type variables. They are free, that is yet-unbound, they are type variables so real types can be put in place of them later.

Nemerle type inference engine very often produces such type variables just to kill them a moment later by replacing them with a real type.

Note that type variables are not exactly type parameters used in method definition. The difference is that we need a fresh copy of all type parameters upon each call. And this copy is called free type variable. Otherwise we couldn't say:

def s1 = MakeSet ();
def s2 = MakeSet ();
s1.Add (42);
s2.Add ("42");

But with fresh copies used each time we have s1 : Set[T1] and s2 : Set[T2] and everything is OK.

Constraints

Let's have a look at the following classes:

class BaseClass {}
class Derived1 : BaseClass {}
class Derived2 : BaseClass {}

Now we create a set and add an element:

def s = MakeSet ();
s.Add (Derived1 ()); // add a new Derived1 instance

After this we should know, like in the int-example above, that s has type Set[Derived1]. But such conclusion would be premature, because we may want to:

s.Add (Derived2 ());

This is all OK, as long as we consider s to be of type Set[BaseClass]. Clearly if programmer adds Derived1 and later Derived2 then the intention was to have a set of BaseClass.

Therefore we cannot eagerly replace T with Derived1 upon first usage. Instead of this we just place a constrain that T have to be a supertype of Derived1. Next we place additional constrain that T is supertype of Derived2. Now because all supertypes of both Derived1 and Derived2 are also supertypes of BaseClass we can simplify these two constraints to a single one stating that T is a supertype of BaseClass.

Other sources of free type variables

There are other places where free type variables can be introduced. For example the null literal does it (in fact it additionally adds anti-value-type constraint), but probably the most interesting of such places are function parameters.

Nemerle supports type inference for parameters of local functions (that is functions defined and visible only inside some other function). We plan to extend this to private class members, there are no conceptual problems here, only implementation will be hard.

For example:

def foo (x) {
  x + 3
}

We first assign a free type variable as the type of x, and later infer, probably a bit too eagerly, that x has type int, because it is added to 3.

Deferral

There are some more problems with free type variables. For example:

def foo (x) {
  x + x
}

After seeing such a definition we do not know what type should x have. Would it be an int? A double? A string? This is because the addition operator is overloaded, it has more than one signature.

So we cannot type such a definition alone. But hey, who defines a function not to use it at all [1]? If we have this definition and usage like foo ("42") we know that x is string, therefore the + overload to be chosen is string * string -> string and therefore the return type of foo is also string.

Now, because we can assign only one type to x in the generated code, if after foo ("42") user writes foo (42.0), then we report an error. This is however not a problem in most circumstances.

Similar reasoning can be applied to:

def get_length (x) {
  x.Length
}

We cannot know what type does x.Length have, or even if it's valid at all until we know the type x. But we can tell it again by looking at the function usage get_length (array [1,2]).

What exactly compiler does?

Of course it is easy to say we look at, we can deduce, but the compiler requires somehow stricter algorithm. The basic idea here is that the compiler proceeds with type inference in the top-down, left-right order, until it sees something it cannot handle. Examples of such things include:

a member reference on an expression with yet unknown type
overloaded call, that still has more than one maximally specific solution
call to operator on operands with unknown types (because resolution rules for operators depend on types of operands)
a indexer reference on expression with unknown type
a situation when a macro decides that it doesn't yet have enough information (this is macro-specific, for example the foreach macro needs to known the type of the collection it is iterating over)

When such a situation is found, a special place-holder is put into the resulting tree and appropriate action is put into FIFO queue of delayed typings. This placeholder has a yet-unknown type, which can lead to more delayed typings, when it is used.

Once the entire typing process for a method is over, we look into our FIFO queue. We try to ask each action in it, to resolve itself. When it succeeds, it is removed from the queue, otherwise it is added again. When we process each of the elements initially in the queue, we check if the queue is empty. If it is, we have succeeded.

Otherwise, we check if we did resolve something, and if so, we start the queue iteration process again (because resolution of an expression can make resolution of some other expression possible).

If we didn't resolve anything, we ask the first element in the queue to explain to the user why did it fail. This also happens to be the first failed expression in the source program order, so the error reported is the top-most one.