Records and subtyping

The principle of subtyping is as follows. If t1 and t2 are two types
such that t1 is a subtype of t2, written t1 <= t2, then the program
can use a value of type t1 whenever the program expects a value of
type t2. If you regard types as describing sets of values, then
subtyping describes a subset relation between sets of values.

You can express the notion of subtyping using a typing rule that is
referred to as the "subsumption rule":

  TE |- e : t  t <= t'
  --------------------
      TE |- e : t'

The rule says that if e has type t and t is a subtype of t', then e is
also of type t'. In other words, if the program expects a value of
type t', I can pass in a value of type t and use the above rule to
"convert" it to a value of type t'.

The subtyping relation is both reflexive and transitive. Reflexivity
says that each type is a subtype of itself. And transitivity says that
if t1 is a subtype of t2, and t2 is a subtype of t3, then t1 is a
subtype of t3. These properties are fairly straightforward if you
think of subtypes as subsets.

Consider records and record types. A record consists of a set of
labeled fields. Its type includes the types of the fields in the
record. For instance:

  type Point = { x:int, y:int }

is the type of a record with two fields, x and y, both of type
int. And:

  type Point3D = { x:int, y:int, z:int}

is the type of a record with three fields, x, y, and z. Because
Point3D contains all of the fields of Point, and those have the same
type as in Point, it makes sense to say that Point3D is a subtype of
Point: Point3D <= Point. The idea is that Point3D includes all of the
features (i.e. fields) of Point, hence any piece of code that requires
a Point will only refer to fields x and y, so it can use a Point3D
instead.
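
As a rough illustration, here is a minimal sketch in TypeScript, whose
object types are compared structurally in much the same way. The helper
manhattan and the sample values are hypothetical (they are not part of
these notes), and number stands in for int.

```typescript
// Point and Point3D mirror the record types in the text.
type Point = { x: number; y: number };
type Point3D = { x: number; y: number; z: number };

// Hypothetical helper: written against Point, so it only reads x and y.
function manhattan(p: Point): number {
  return Math.abs(p.x) + Math.abs(p.y);
}

const p3: Point3D = { x: 1, y: -2, z: 3 };

// Accepted because Point3D <= Point: the extra field z is never looked at.
const d: number = manhattan(p3);

// The reverse direction is rejected by the type checker, since a Point has no z:
// const bad: Point3D = { x: 1, y: 2 };
```

Here manhattan plays the role of "any piece of code that requires a
Point": it never mentions z, so handing it a Point3D is harmless.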

The general subtyping rule can be written as follows:

 -----------------------------------------------------
 {l1:t1, ..., ln:tn, ln+1:tn+1} <= {l1:t1, ..., ln:tn}

How about allowing the corresponding fields to be in a subtype
relation? For instance, if A <= B and C <= D, then is { x:A, y:C } a
subtype of { x:B, y:D }? It turns out that it is safe to do so only if
the fields of the record are immutable, e.g., in a purely functional
setting, or for methods of classes, since those cannot be updated. (To
see why this may be unsafe for mutable fields, see why subtyping of
reference types is unsafe.) The general rule is:

         t1 <= t1'  ... tn <= tn'
 --------------------------------------------
 {l1:t1, ..., ln:tn} <= {l1:t1', ..., ln:tn'}
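
The rule can be tried out with a small TypeScript sketch. The types
Segment and Segment3D and the helper width are hypothetical names made
up for this example; Point3D <= Point plays the role of A <= B and of
C <= D. The fields are marked readonly to match the immutability
assumption above. Note that TypeScript does not enforce that condition
itself (it accepts the same assignment for mutable fields), so readonly
here only documents the assumption.

```typescript
type Point = { readonly x: number; readonly y: number };
type Point3D = { readonly x: number; readonly y: number; readonly z: number };

// Two immutable records with the same labels, whose field types are
// related pointwise: Point3D <= Point for both start and end.
type Segment = { readonly start: Point; readonly end: Point };
type Segment3D = { readonly start: Point3D; readonly end: Point3D };

// Hypothetical helper written against the supertype Segment.
function width(s: Segment): number {
  return Math.abs(s.end.x - s.start.x);
}

const s3: Segment3D = {
  start: { x: 0, y: 0, z: 0 },
  end: { x: 4, y: 1, z: 2 },
};

// Accepted: Segment3D <= Segment by the rule above.
const w: number = width(s3);
```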
 
I can combine the above two rules into one single rule:

                t1 <= t1'  ... tn <= tn'
 --------------------------------------------------------
 {l1:t1, ..., ln:tn, ln+1:tn+1} <= {l1:t1', ..., ln:tn'}


How about sums (variant types)? If A and B are two types, is A a
subtype of A+B, or vice-versa? The intuition is that a piece of code
that expects a value of type A+B can use either an A or a B. But a
piece of code that expects a value of type A may not work if a value
of type A+B is passed in since it may get a value of type B. Hence, it
makes sense to say that A <= A + B.

Furthermore, if C <= D, then is A + C a subtype of A + D, or
vice-versa? Well, a piece of code that works for A + D can accept
either A's or D's. So it will work for C's, too, since C is a subtype
of D. Hence, it will work for A + C. Therefore, A + C <= A + D.
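
A small TypeScript sketch of the first claim, using an untagged union
in place of A + B. The choice A = number, B = string and the function
describe are assumptions made only for this example.

```typescript
// A = number, B = string; the union number | string plays the role of A + B.
function describe(v: number | string): string {
  // Code that expects A + B must be prepared for either alternative.
  return typeof v === "number" ? "number " + v : "string " + v;
}

const a: number = 7;

// Accepted: A <= A + B, so a value of type A can be passed in.
const s: string = describe(a);

// The reverse is rejected by the type checker: a value of type A + B
// might be a B, so it cannot be used where an A is required.
// function needsNumber(n: number): number { return n + 1; }
// const u: number | string = describe(a).length > 3 ? "x" : 1;
// needsNumber(u);
```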

Suppose I write variants using an ML-like syntax:

  type variant = [l1 of t1 | ... | ln of tn]

where l1, ..., ln are the labels of the options, and t1, ..., tn are
their types. The subtyping rule is then:

             t1 <= t1'  ... tn <= tn'
 ----------------------------------------------
 [l1 of t1 | ... | ln of tn] <= 
 [l1 of t1' | ... | ln of tn' | ln+1' of tn+1']
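
Labeled variants can be approximated in TypeScript with tagged object
unions. The sketch below is only an encoding chosen for illustration:
the labels at, none, and label, the types Small and Big, and the
function show are all hypothetical.

```typescript
type Point = { x: number; y: number };
type Point3D = { x: number; y: number; z: number };

// Encodes [at of Point3D | none of null].
type Small = { tag: "at"; value: Point3D } | { tag: "none"; value: null };

// Encodes [at of Point | none of null | label of string]:
// the same labels with supertypes for the payloads, plus one extra option.
type Big =
  | { tag: "at"; value: Point }
  | { tag: "none"; value: null }
  | { tag: "label"; value: string };

// Code written against the larger variant handles every option.
function show(v: Big): string {
  switch (v.tag) {
    case "at":
      return "at " + v.value.x + "," + v.value.y;
    case "none":
      return "none";
    case "label":
      return "label " + v.value;
  }
}

const small: Small = { tag: "at", value: { x: 1, y: 2, z: 3 } };

// Accepted: Small <= Big by the rule above.
const shown: string = show(small);
```

The function show never sees the label option when given a Small, which
is harmless; the opposite direction would leave label unhandled.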


Let's look at function types now. Consider two function types t1 ->
t2, and t1' -> t2'. What is the subtyping relation that t1, t2, t1',
t2' should satisfy to be able to claim that t1' -> t2' <= t1 -> t2?

Consider a piece of code "fun g = \f : t1 -> t2. \x : t1. f(x)". This
function has type: (t1 -> t2) -> t1 -> t2.

Now, let's take h : t1' -> t2' such that t1' -> t2' <= t1 -> t2.
Because of this subtyping relation, I can pass in h as an argument to
g and the code should work fine. Take a value v of type t1 and
evaluate "g h v = h(v)". Here h is applied to a value of type t1, but
h expects an argument of type t1'. For this application to be safe, I
must require that t1 <= t1'.

Furthermore, the result type of "g h v" should be t2, as indicated by
the type of g. But the evaluation yields h(v), which is of type t2',
as indicated by the type of h. To be able to conclude that this is
always of type t2, I must require that t2' <= t2.

Putting the two pieces together, I get the subtyping rule for function
types:

   t1 <= t1'  t2' <= t2
  ----------------------
  t1' -> t2' <= t1 -> t2

Note that the direction of the subtyping relation between primed and
non-primed types is reversed between the premise and the conclusion
for the argument type, and is preserved for the return type. We say
that function subtyping is contra-variant in the parameter, and
co-variant in the return type.
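
The argument can be replayed in TypeScript as a minimal sketch. The
instantiation t1 = Point3D, t2 = Point, t1' = Point, t2' = Point3D is an
assumption chosen for the example, and g is written uncurried for
brevity. The rejected direction mentioned in the comments relies on the
compiler's strictFunctionTypes option, which is part of strict mode.

```typescript
type Point = { x: number; y: number };
type Point3D = { x: number; y: number; z: number };

// g expects f : t1 -> t2 with t1 = Point3D and t2 = Point.
function g(f: (p: Point3D) => Point, v: Point3D): Point {
  return f(v);
}

// h : t1' -> t2' with t1' = Point and t2' = Point3D.
// Since Point3D <= Point, both premises hold: t1 <= t1' and t2' <= t2.
const h = (p: Point): Point3D => ({ x: p.x, y: p.y, z: 0 });

// Accepted: (Point -> Point3D) <= (Point3D -> Point).
const r: Point = g(h, { x: 1, y: 2, z: 3 });

// The opposite direction fails the premises. A function that reads z
// cannot stand in where a (p: Point) => Point3D is expected, because
// it might be applied to a Point that has no z:
// const k = (p: Point3D): Point => ({ x: p.z, y: p.z });
// const bad: (p: Point) => Point3D = k;
```

Inside g, h receives a Point3D although it only asked for a Point, and
its Point3D result is used only as a Point, which is exactly the
"g h v = h(v)" evaluation traced above.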