CS 312 Lecture 21-22
Type inference and unification

In our SML programming we've been writing down types in function declarations. But if we leave the types off, the SML type checker is able to figure out what the type annotations should have been. This is called type inference or type reconstruction. In this lecture we'll see how it works. Notice that even a simple type checker does some amount of type inference: we don't have to write down a type on every expression, because the checker figures out many of those types itself. What may be surprising is that the core of SML can be type-checked without any type declarations at all.

- val f = fn z => z+2
val f = fn : int->int
- val ident = fn x => x
val ident = fn : 'a -> 'a
- let fun square z = z*z
  in
    fn f => fn x => fn y =>
     if f x y then f (square x) y
     else f x (f x y)
  end
val it = fn : (int->bool->bool)->int->bool->bool

To see how this works, we'll start with a simple type checker for an ML-like language and extend it to support type inference. There are several things to notice about this code:

Instead of a function that returns whether two types are equal, we compare two types using a function check_equal that raises an exception if they are not equal. This is fine because the type checker's only response to unequal types would be to raise an exception anyway.
The environment is polymorphic so that it can store different kinds of types. Just for fun, it is implemented in Env using closures, though any functional map implementation (e.g., red-black trees) would likely be at least as good.
The datatypes expr and decl give the language syntax, which includes both anonymous and recursive (named) functions.

<% ShowSMLFile("code/type-inference/type-checker.sml") %>
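As a rough illustration of the first two points above, here is a minimal sketch of an exception-raising equality check and a closure-based polymorphic environment. It assumes a hypothetical three-constructor type language (typ with Int, Bool, and Arrow) and hypothetical names TypeError, empty, and extend; it is not the code in type-checker.sml.

```sml
(* Hypothetical, minimal type language used only for this sketch. *)
datatype typ = Int | Bool | Arrow of typ * typ

exception TypeError of string

(* Returns () if t1 and t2 are the same type; otherwise raises TypeError. *)
fun check_equal (t1 : typ, t2 : typ) : unit =
  case (t1, t2) of
      (Int, Int) => ()
    | (Bool, Bool) => ()
    | (Arrow (a1, r1), Arrow (a2, r2)) =>
        (check_equal (a1, a2); check_equal (r1, r2))
    | _ => raise TypeError "type mismatch"

(* Hypothetical polymorphic environment: a closure mapping names to values. *)
type 'a env = string -> 'a

(* The empty environment rejects every lookup. *)
val empty : 'a env = fn x => raise TypeError ("unbound identifier " ^ x)

(* Extending an environment wraps the old closure in a new one. *)
fun extend (env : 'a env) (x : string) (v : 'a) : 'a env =
  fn y => if y = x then v else env y
```

Representing an environment as a function makes lookup just function application, at the cost of lookups that take time linear in the number of extensions.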

The key idea behind the unification-based type inference algorithm is to introduce type variables that stand in for types the algorithm hasn't figured out yet. When the checker encounters a variable whose type has not been declared (for example, one bound in a let expression), it adds that variable to the environment, bound to a fresh type variable. During type checking, type variables are solved for as necessary. This works because the only time the type checker above generates any constraints on types is when it compares two types for equality. So we can build a type inference algorithm by having the equality test also solve for type variables as needed to unify the two types being compared, that is, to make them equal.
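To make "bound to a type variable" concrete, here is a minimal sketch of how checking fn x => x might proceed. It assumes a hypothetical typ datatype whose type variables are numbered (TVar of int), a hypothetical counter-based generator fresh, and an association-list environment in place of the closures used in Env. The parameter x gets a fresh variable, the body's type is looked up from the environment, and the resulting type T0->T0 corresponds to the 'a -> 'a that SML reports for ident.

```sml
(* Hypothetical type representation: type variables are named by integers. *)
datatype typ = Int | Bool | Arrow of typ * typ | TVar of int

(* Counter used to generate type variables that have not been used before. *)
val counter = ref 0
fun fresh () : typ =
  let val n = !counter in counter := n + 1; TVar n end

(* Hypothetical environment: an association list from identifiers to types. *)
type env = (string * typ) list
fun lookup (env : env) (x : string) : typ =
  case List.find (fn (y, _) => y = x) env of
      SOME (_, t) => t
    | NONE => raise Fail ("unbound variable " ^ x)

(* Checking fn x => x: bind x to a fresh type variable, then type the body. *)
val tyIdent =
  let val tx = fresh ()
      val env = [("x", tx)]
  in Arrow (tx, lookup env "x") end
```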

For example, if we compare the two types T1->bool and (int->T3)->T2, where T1, T2, and T3 are type variables, we can make them equal by choosing T1=int->T3 and T2=bool. We can think of these two equations as a substitution that, when applied to both types being compared, makes them equal to one another. In this case, the result of applying the substitution to either type is (int->T3)->bool.
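A substitution like this one can be represented as a list of bindings and applied with a simple recursive traversal. The sketch below assumes a hypothetical typ datatype with string-named type variables (TVar "T1", and so on) and a hypothetical apply function; it checks that applying T1=int->T3 and T2=bool to both types yields (int->T3)->bool.

```sml
(* Hypothetical representation: type variables are named by strings. *)
datatype typ = Int | Bool | Arrow of typ * typ | TVar of string

(* A substitution is a list of bindings of type variables to types. *)
type subst = (string * typ) list

(* Replace each variable bound in s by its type; unbound variables are kept.
   Bindings are applied once, which suffices when no right-hand side
   mentions a variable that s itself binds (true for this example). *)
fun apply (s : subst) (t : typ) : typ =
  case t of
      Int => Int
    | Bool => Bool
    | Arrow (a, r) => Arrow (apply s a, apply s r)
    | TVar v =>
        (case List.find (fn (w, _) => w = v) s of
             SOME (_, t') => t'
           | NONE => t)

val left  = Arrow (TVar "T1", Bool)                    (* T1->bool      *)
val right = Arrow (Arrow (Int, TVar "T3"), TVar "T2")  (* (int->T3)->T2 *)
val s = [("T1", Arrow (Int, TVar "T3")), ("T2", Bool)]
```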

There are many substitutions that would make these two types equal, because we can add T3=t to the substitution for any type t and still unify the two types. But comparing these two types gives us no information about T3, so committing T3 to a particular type would be an arbitrary choice that could later cause a spurious type error. Therefore unification finds the weakest substitution that unifies the two types. A substitution is weaker than another if the stronger one can be described as applying the weaker substitution followed by some further non-trivial substitution. For example, any substitution of the form (T1=int->t, T2=bool, T3=t) can be achieved by first applying the substitution (T1=int->T3, T2=bool) and then the substitution (T3=t). The unification algorithm should find the weakest unifying substitution, here (T1=int->T3, T2=bool); this is usually called the most general unifier.
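One standard way to compute the weakest (most general) unifier is a substitution-threading algorithm in the style of Robinson's, sketched below. This is a swapped-in formulation for illustration, not the lecture's ref-cell implementation, and the names typ, subst, apply, occurs, unify, and Unify are hypothetical. It also includes an occurs check, which rejects equations such as a = a->int that no finite type can satisfy. Run on the example above, it binds T1 and T2 and leaves T3 free.

```sml
(* Hypothetical representation: type variables are named by strings. *)
datatype typ = Int | Bool | Arrow of typ * typ | TVar of string

exception Unify of string

type subst = (string * typ) list

(* Apply s to t, following bindings until no bound variable remains. *)
fun apply (s : subst) (t : typ) : typ =
  case t of
      Arrow (a, r) => Arrow (apply s a, apply s r)
    | TVar v =>
        (case List.find (fn (w, _) => w = v) s of
             SOME (_, t') => apply s t'
           | NONE => t)
    | _ => t

(* Does type variable v appear inside t? *)
fun occurs (v : string) (t : typ) : bool =
  case t of
      TVar w => v = w
    | Arrow (a, r) => occurs v a orelse occurs v r
    | _ => false

(* Extend s so that it also makes t1 and t2 equal, binding only what is forced. *)
fun unify (s : subst) (t1 : typ, t2 : typ) : subst =
  case (apply s t1, apply s t2) of
      (Int, Int) => s
    | (Bool, Bool) => s
    | (TVar v, TVar w) => if v = w then s else (v, TVar w) :: s
    | (TVar v, t) => if occurs v t then raise Unify "occurs check" else (v, t) :: s
    | (t, TVar v) => if occurs v t then raise Unify "occurs check" else (v, t) :: s
    | (Arrow (a1, r1), Arrow (a2, r2)) => unify (unify s (a1, a2)) (r1, r2)
    | _ => raise Unify "type constructor mismatch"

(* Usage: the T1->bool versus (int->T3)->T2 example. *)
val left  = Arrow (TVar "T1", Bool)
val right = Arrow (Arrow (Int, TVar "T3"), TVar "T2")
val mgu = unify [] (left, right)
```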

The downside of unification is that it can lead to confusing error messages when an expression is not well-typed. For example:

- fn z => let val (x,y)=z in z(x) end
stdIn:2.4-24.5 Error: operator is not a function [tycon mismatch]
  operator: 'Z * 'Y
  in expression:
    z x

The names 'Z and 'Y are how SML reports unsolved type variables in an error message. This message tells us that SML tried to unify the tuple type 'Z * 'Y with a function type and failed, as we would expect. Because SML hadn't yet figured out the types of the tuple's components, it reported them as type variables.

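A nice way to implement unification-based type inference is to represent each type variable as a ref cell. For an unsolved type variable, the cell holds NONE; once the variable is solved and set equal to some type t, the cell is updated to hold SOME(t). Because solving a variable is just an assignment to its cell, the solution becomes visible everywhere that variable occurs, without explicitly applying a substitution. Here is an implementation of type inference using that technique: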

<% ShowSMLFile("code/type-inference/type-inference.sml") %>
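For comparison with the implementation above, here is a minimal self-contained sketch of the ref-cell representation and a destructive unify. The constructors (including a Prod constructor for tuple types), the exception Unify, and the helpers fresh, resolve, and occurs are hypothetical names chosen for this sketch, and the occurs check is a standard addition not discussed in the text. The usage lines replay the T1->bool versus (int->T3)->T2 example; unifying a tuple type with a function type raises Unify, mirroring the SML error shown earlier.

```sml
(* Hypothetical representation: an unsolved variable's cell holds NONE,
   a solved one holds SOME t. *)
datatype typ = Int
             | Bool
             | Arrow of typ * typ
             | Prod of typ * typ
             | TVar of typ option ref

exception Unify

(* A new, unsolved type variable. *)
fun fresh () : typ = TVar (ref NONE)

(* Follow solved variables until reaching a constructor or an unsolved variable. *)
fun resolve (TVar (ref (SOME t))) = resolve t
  | resolve t = t

(* Does the unsolved variable with cell r appear inside t? *)
fun occurs (r : typ option ref) (t : typ) : bool =
  case resolve t of
      TVar r' => r = r'
    | Arrow (a, b) => occurs r a orelse occurs r b
    | Prod (a, b) => occurs r a orelse occurs r b
    | _ => false

(* Make t1 and t2 equal by assigning to the cells of unsolved variables. *)
fun unify (t1 : typ, t2 : typ) : unit =
  case (resolve t1, resolve t2) of
      (Int, Int) => ()
    | (Bool, Bool) => ()
    | (TVar r1, TVar r2) => if r1 = r2 then () else r1 := SOME (TVar r2)
    | (TVar r, t) => if occurs r t then raise Unify else r := SOME t
    | (t, TVar r) => if occurs r t then raise Unify else r := SOME t
    | (Arrow (a1, b1), Arrow (a2, b2)) => (unify (a1, a2); unify (b1, b2))
    | (Prod (a1, b1), Prod (a2, b2)) => (unify (a1, a2); unify (b1, b2))
    | _ => raise Unify

(* Usage: afterwards t1 resolves to int->t3, t2 to bool, and t3 stays unsolved. *)
val t1 = fresh ()
val t2 = fresh ()
val t3 = fresh ()
val () = unify (Arrow (t1, Bool), Arrow (Arrow (Int, t3), t2))
```

Because resolve follows chains of solved variables, a variable solved to another variable is handled the same way as one solved to a concrete type.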
