Records and subtyping

The principle of subtyping is as follows. If t1 and t2 are two types such that t1 is a subtype of t2, written t1 <= t2, then the program can use a value of type t1 wherever it expects a value of type t2. If you regard types as describing sets of values, then subtyping describes a subset relation between sets of values.

You can express the notion of subtyping using a typing rule referred to as the "subsumption rule":

    TE |- e : t    t <= t'
    ----------------------
         TE |- e : t'

The rule says that if e has type t and t is a subtype of t', then e also has type t'. In other words, if the program expects a value of type t', I can pass in a value of type t and use the above rule to "convert" it to a value of type t'.

The subtyping relation is both reflexive and transitive. Reflexivity says that each type is a subtype of itself. Transitivity says that if t1 is a subtype of t2, and t2 is a subtype of t3, then t1 is a subtype of t3. Both properties are straightforward if you think of subtypes as subsets.

Consider records and record types. A record consists of a set of labeled fields, and its type lists the type of each field. For instance:

    type Point = { x:int, y:int }

is the type of a record with two fields, x and y, both of type int. And:

    type Point3D = { x:int, y:int, z:int }

is the type of a record with three fields, x, y, and z. Because Point3D contains all of the fields of Point, with the same types as in Point, it makes sense to say that Point3D is a subtype of Point: Point3D <= Point. The idea is that Point3D includes all of the features (i.e. fields) of Point; any piece of code that requires a Point will only refer to fields x and y, so it can use a Point3D instead.
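As an illustration of the record example, here is a minimal sketch in TypeScript, whose object types are compared structurally and follow the same intuition. This is an assumption of convenience: TypeScript is not the language of these notes, it has no separate int type (so int becomes number), and the fields are marked readonly to stay in the immutable setting. The function norm2 and the variable names are hypothetical.

```typescript
// The record types from the text, as TypeScript object types.
// readonly keeps us in the immutable setting discussed in the notes.
type Point = { readonly x: number; readonly y: number };
type Point3D = { readonly x: number; readonly y: number; readonly z: number };

// Code that requires a Point only ever touches fields x and y.
function norm2(p: Point): number {
  return p.x * p.x + p.y * p.y;
}

const p3: Point3D = { x: 3, y: 4, z: 12 };

// Subsumption: p3 : Point3D and Point3D <= Point, so p3 : Point.
// No conversion happens at run time; asPoint is the same object.
const asPoint: Point = p3;

console.log(norm2(p3));      // 25
console.log(norm2(asPoint)); // 25

// The reverse direction is rejected, since a Point may lack z:
// const bad: Point3D = asPoint; // error: property 'z' is missing
```

One TypeScript-specific wrinkle: assigning a fresh object literal such as { x: 3, y: 4, z: 12 } directly to a Point variable is flagged by an excess-property check. That check is a lint-like rule for literals, not part of the subtyping relation; assigning an existing Point3D value, as above, is accepted.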
The general subtyping rule for records with extra fields (often called width subtyping) can be written as follows:

    ---------------------------------------------------
    {l1:t1, ..., ln:tn, ln+1:tn+1} <= {l1:t1, ..., ln:tn}

The record type with more fields is the subtype, just as Point3D <= Point.

How about allowing the corresponding fields to be in a subtype relation? For instance, if A <= B and C <= D, is { x:A, y:C } a subtype of { x:B, y:D }? It turns out that this is safe only if the fields of the record are immutable, e.g., in a purely functional setting, or for the methods of classes, since those cannot be updated. With mutable fields, code holding the record at type { x:B, y:D } could store a B into field x, and code that still sees the same record at type { x:A, y:C } would then read a B where it expects an A (see also why subtyping of reference types is unsafe). The general rule (often called depth subtyping) is:

    t1 <= t1'   ...   tn <= tn'
    --------------------------------------------
    {l1:t1, ..., ln:tn} <= {l1:t1', ..., ln:tn'}

I can combine the above two rules into a single rule:

                  t1 <= t1'   ...   tn <= tn'
    -------------------------------------------------------------
    {l1:t1, ..., ln:tn, ..., lm:tm} <= {l1:t1', ..., ln:tn'}

where m >= n. Taking m = n gives the depth rule, and taking each ti' = ti gives the width rule.

How about sums (variant types)? If A and B are two types, is A a subtype of A + B, or vice versa? A piece of code that expects a value of type A + B can handle either an A or a B. But a piece of code that expects a value of type A may not work if a value of type A + B is passed in, since it may receive a B. Hence, it makes sense to say that A <= A + B.

Furthermore, if C <= D, is A + C a subtype of A + D, or vice versa? A piece of code that works for A + D accepts either A's or D's. So it also works for C's, since C is a subtype of D. Hence it works for A + C, and A + C <= A + D.

Suppose I write variants using an ML-like syntax:

    type variant = [l1 of t1 | ... | ln of tn]

where l1, ..., ln are the labels of the options and t1, ..., tn are their types. The subtyping rule is:

                     t1 <= t1'   ...   tn <= tn'
    ---------------------------------------------------------------------------
    [l1 of t1 | ... | ln of tn] <= [l1 of t1' | ... | ln of tn' | ln+1 of tn+1]

Note the direction: for records the subtype has more fields, while for variants the subtype has fewer options.

Let's look at function types now. Consider two function types t1 -> t2 and t1' -> t2'.
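The depth rule, the hazard with mutable fields, and the variant rule can all be observed in the same TypeScript setting, as a hedged sketch. The types Animal, Dog, Circle, Shape and the function area are hypothetical names chosen for illustration, with Dog <= Animal playing the role of A <= B. Variants are encoded as tagged unions, which is an assumption: TypeScript has no built-in labeled sums. Notably, TypeScript accepts covariant field subtyping even for mutable fields, so the unsafe case compiles and only misbehaves at run time.

```typescript
// Hypothetical record types with Dog <= Animal, standing in for A <= B.
type Animal = { readonly name: string };
type Dog = { readonly name: string; readonly breed: string };

// Depth subtyping on immutable records: {x: Dog, y: number} <= {x: Animal, y: number}.
type DogPair = { readonly x: Dog; readonly y: number };
type AnimalPair = { readonly x: Animal; readonly y: number };
const dp: DogPair = { x: { name: "Rex", breed: "collie" }, y: 1 };
const ap: AnimalPair = dp; // accepted and safe: nobody can overwrite x

// The same subtyping with a mutable field. TypeScript accepts it,
// but it is unsound: the problem shows up only at run time.
type DogBox = { x: Dog };
type AnimalBox = { x: Animal };
const dogBox: DogBox = { x: { name: "Rex", breed: "collie" } };
const animalBox: AnimalBox = dogBox;  // same object, viewed at the supertype
animalBox.x = { name: "Tom" };        // store a plain Animal into field x
const breed: string = dogBox.x.breed; // statically a string, actually undefined
console.log(ap.x.name, breed);        // prints: Rex undefined

// Variants as tagged unions: a type with fewer options is a subtype,
// [circle of number] <= [circle of number | square of number].
type Circle = { tag: "circle"; r: number };
type Shape = Circle | { tag: "square"; side: number };

function area(s: Shape): number {
  // Narrowing on the tag selects the case, like pattern matching.
  return s.tag === "circle" ? Math.PI * s.r * s.r : s.side * s.side;
}

const c: Circle = { tag: "circle", r: 1 };
console.log(area(c)); // a Circle is accepted where a Shape is expected
```

After the write through animalBox, dogBox.x has no breed field even though its static type promises one, which is exactly the failure that the immutability restriction rules out.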
What subtyping relations must t1, t2, t1', t2' satisfy for us to claim that t1' -> t2' <= t1 -> t2? Consider the function

    g = \f : t1 -> t2. \x : t1. f(x)

which has type (t1 -> t2) -> t1 -> t2. Now take h : t1' -> t2' such that t1' -> t2' <= t1 -> t2. This subtyping relation means that I can pass h as an argument to g and the code should work fine. Take a value v of type t1 and evaluate g h v, which reduces to h(v). So h is applied to a value of type t1. Since h only accepts arguments of type t1', this is safe only if every value of type t1 is also a value of type t1', so I must require t1 <= t1'. Furthermore, the result of g h v should have type t2, as indicated by the type of g. But the evaluation yields h(v), which has type t2', as indicated by the type of h. To conclude that this result always has type t2, I must require t2' <= t2. Putting the two pieces together, we get the subtyping rule for function types:

    t1 <= t1'    t2' <= t2
    ----------------------
    t1' -> t2' <= t1 -> t2

Note that the argument types are related in the premise in the opposite direction from the function types in the conclusion (t1 <= t1', while t1' -> t2' is the subtype), whereas the return types are related in the same direction (t2' <= t2). We say that function subtyping is contravariant in the parameter type and covariant in the return type.
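The argument about g and h can be replayed in TypeScript as a sketch, assuming the --strict compiler setting (which includes strictFunctionTypes). Here t1 = t2' = Point3D and t1' = t2 = Point, so both premises t1 <= t1' and t2' <= t2 hold. The names g, h and v mirror the text; the Point types are repeated from the record example, and result is a hypothetical name.

```typescript
type Point = { readonly x: number; readonly y: number };
type Point3D = { readonly x: number; readonly y: number; readonly z: number };

// g = \f : t1 -> t2. \x : t1. f(x), with t1 = Point3D and t2 = Point.
const g = (f: (a: Point3D) => Point) => (x: Point3D): Point => f(x);

// h : t1' -> t2' with t1' = Point and t2' = Point3D.
// t1 <= t1' (Point3D <= Point) and t2' <= t2 (Point3D <= Point),
// so (Point -> Point3D) <= (Point3D -> Point) and h may be passed to g.
const h = (p: Point): Point3D => ({ x: p.x, y: p.y, z: 0 });

const v: Point3D = { x: 1, y: 2, z: 3 };
const result: Point = g(h)(v); // h receives a Point3D and returns a Point3D
console.log(result);           // { x: 1, y: 2, z: 0 }

// Narrowing the parameter type breaks contravariance. With
// strictFunctionTypes enabled, the checker rejects
//   const k: (a: Point) => Point = (a: Point3D) => a;
// because k could then be called with a plain Point that has no z.
```

Without strictFunctionTypes, TypeScript compares function parameters bivariantly and would accept the commented-out assignment; even with it enabled, parameters of methods declared with method syntax are still compared bivariantly. The contravariant rule above is what the stricter setting enforces for function types.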