- val f = fn z => z+2
val f = fn : int->int
- val ident = fn x => x
val ident = fn : 'a -> 'a
- let fun square z = z*z in
    fn f => fn x => fn y =>
      if f x y then f (square x) y else f x (f x y)
  end
val it = fn : (int->bool->bool)->int->bool->bool

To see how this works, we'll start with a simple type checker for an ML-like language and extend it to support type inference. There are several things to notice about this code:

* Types are compared using check_equal, which raises an exception if the two types are not equal. This is okay because the type checker would always raise an exception if the types were unequal.
* The environment Env is implemented using closures, though any functional set implementation (e.g., red-black trees) would likely be at least as good.
* expr and decl give the language syntax, which includes both anonymous and recursive (named) functions.

The key idea behind the unification-based type inference algorithm is to
introduce type variables to stand in place of types that the algorithm
hasn't figured out yet. If an identifier whose type has not been declared is encountered (for example, in a
let expression), it is added to the environment, bound to a fresh type variable. During type
checking, type variables are solved for as necessary. This works because the only time that
the type checker above generates any constraints on types is when they are
compared for equality. We can build a type inference algorithm by having the
test for equality also simultaneously solve for type variables as needed to unify
the two types being compared: that is, make them equal.
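
As a rough illustration of binding an identifier of unknown type to a type variable, here is a minimal sketch. The names ty, TVar, env, extend, fresh, and declareUnknown are hypothetical and chosen only for this example; the environment is represented as a closure, but any map implementation would do.

```sml
(* Hypothetical type representation: TVar n is the n-th type variable. *)
datatype ty = Int | Bool | Arrow of ty * ty | TVar of int

(* Environment as a closure from identifier to its type, if bound. *)
type env = string -> ty option
val empty : env = fn _ => NONE
fun extend (e : env) (x : string, t : ty) : env =
  fn y => if y = x then SOME t else e y

(* Generate a type variable that has not been used before. *)
local
  val counter = ref 0
in
  fun fresh () : ty = (counter := !counter + 1; TVar (!counter))
end

(* Bind an identifier of unknown type to a fresh type variable. *)
fun declareUnknown (e : env) (x : string) : env = extend e (x, fresh ())

val e1 = declareUnknown empty "z"
val e2 = declareUnknown e1 "y"
```

Each undeclared identifier receives its own variable, so later equality tests can solve for them independently.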
For example, if we compare the two types T1->bool and (int->T3)->T2,
where T1, T2, and T3 are type variables, we can see that we can make these two
types equal by picking types T1=int->T3 and T2=bool.
We can think of these two equations as substitutions that, if applied to
the types being compared, make them equal to one another. In this case, the
result of applying the substitutions to both types is (int->T3)->bool.
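
The example above can be worked through mechanically. The following is a minimal sketch that represents a substitution as an explicit association list; the names ty, subst, apply, occurs, unify, and bind are assumptions made for this illustration and are not taken from the type checker discussed in these notes.

```sml
(* Hypothetical type representation with named type variables. *)
datatype ty = Int | Bool | Arrow of ty * ty | TVar of string

(* A substitution maps type-variable names to types. *)
type subst = (string * ty) list

exception Unify of string

(* Apply a substitution to a type, resolving solved variables recursively. *)
fun apply (s : subst) (t : ty) : ty =
  case t of
    Int => Int
  | Bool => Bool
  | Arrow (a, b) => Arrow (apply s a, apply s b)
  | TVar v =>
      (case List.find (fn (w, _) => w = v) s of
         SOME (_, t') => apply s t'
       | NONE => t)

(* Occurs check: does variable v appear in t?  Rules out T = T -> int. *)
fun occurs (v : string) (t : ty) : bool =
  case t of
    TVar w => v = w
  | Arrow (a, b) => occurs v a orelse occurs v b
  | _ => false

(* Extend s so that t1 and t2 become equal, or raise Unify. *)
fun unify (s : subst) (t1 : ty, t2 : ty) : subst =
  case (apply s t1, apply s t2) of
    (Int, Int) => s
  | (Bool, Bool) => s
  | (TVar v, TVar w) => if v = w then s else (v, TVar w) :: s
  | (TVar v, t) => bind s (v, t)
  | (t, TVar v) => bind s (v, t)
  | (Arrow (a1, b1), Arrow (a2, b2)) => unify (unify s (a1, a2)) (b1, b2)
  | _ => raise Unify "type mismatch"
and bind s (v, t) =
  if occurs v t then raise Unify "circular type" else (v, t) :: s

(* T1->bool against (int->T3)->T2 *)
val t1 = Arrow (TVar "T1", Bool)
val t2 = Arrow (Arrow (Int, TVar "T3"), TVar "T2")
val s = unify [] (t1, t2)
val unified = apply s t1   (* the same type as apply s t2 *)
```

Here s binds only T1 and T2, and applying it to either type gives (int->T3)->bool with T3 left as a variable.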
There are many substitutions that would make these two types equal, because
we can add T3=t to the substitution for any arbitrary
type t and still unify the two types. However, comparing these two
types doesn't give us any information about T3, so we wouldn't want to commit to a choice for it.
Therefore unification of the two types finds the weakest substitution that unifies the
two types. A substitution is weaker than another if the stronger substitution
can be described as applying the weaker substitution, followed by another
non-trivial substitution. For example, any substitution of the form (T1=int->t,
T2=bool,T3=t) can be achieved by first doing a substitution
(T1=int->T3, T2=bool) and then a substitution (T3=t).
The unification algorithm should find the weakest unifying substitution, often called the
most general unifier: (T1=int->T3, T2=bool).
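
To make the "weaker than" relation concrete, the sketch below checks one instance of it, taking t to be int. It assumes a hypothetical representation of types and of substitutions as association lists; the names weak, extra, strong, and weakThenExtra are made up for the example.

```sml
(* Hypothetical type representation with named type variables. *)
datatype ty = Int | Bool | Arrow of ty * ty | TVar of string
type subst = (string * ty) list

(* Apply a substitution in one pass: replace each variable that s binds. *)
fun apply (s : subst) (t : ty) : ty =
  case t of
    Arrow (a, b) => Arrow (apply s a, apply s b)
  | TVar v =>
      (case List.find (fn (w, _) => w = v) s of
         SOME (_, t') => t'
       | NONE => t)
  | _ => t

val weak   = [("T1", Arrow (Int, TVar "T3")), ("T2", Bool)]
val extra  = [("T3", Int)]
val strong = [("T1", Arrow (Int, Int)), ("T2", Bool), ("T3", Int)]

(* The weaker substitution followed by the extra one. *)
fun weakThenExtra t = apply extra (apply weak t)

val t1 = Arrow (TVar "T1", Bool)
val t2 = Arrow (Arrow (Int, TVar "T3"), TVar "T2")

(* Both comparisons hold: strong is weak followed by (T3=int). *)
val sameOnT1 = weakThenExtra t1 = apply strong t1
val sameOnT2 = weakThenExtra t2 = apply strong t2
```

The weaker substitution alone already unifies t1 and t2, which is why the algorithm has no reason to commit to a type for T3.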
The downside of unification is that it can lead to confusing error messages when an expression is not well-typed. For example:

- fn z => let val (x,y)=z in z(x) end
stdIn:2.4-24.5 Error: operator is not a function [tycon mismatch]
  operator: 'Z * 'Y
  in expression:
    z x

The types 'Z and 'Y are the way that SML reports an
unsolved type variable in an error message. This error message tells us that SML
tried to unify a tuple type 'Z*'Y against a function type and
failed (as we would expect). SML hadn't figured out what the types of the tuple
elements were, so it just reported the type variables instead.
A nice way to implement unification-based type inference is to represent type
variables using ref cells. For an unsolved type variable, the ref cell points to
NONE; once it is solved and set equal to some type t,
the cell is updated to point to SOME(t). Here is an implementation
of type inference using that technique: