Typing Booleans and Conditionals

Let's add booleans and conditional expressions to our typed lambda
calculus. We need to extend the expressions with boolean constants,
disjunction, and conditionals, and the types with a boolean type:

e ::= ... | true | false | e1 \/ e2 | if e1 e2 e3

t ::= ... | bool

Now we have both booleans and integers in the language. Certain
constructs use integer operands, others use boolean operands. For
instance, an addition expects two integers, while an or operation
expects two booleans. If you look at the semantic rules, you'll see
that the program execution runs into an error when these conditions
are not met. We can rule out such errors using appropriate typing
rules. We'll take the set of typing rules for the simply typed lambda
calculus, and augment it with a few more rules for the new
constructs:

TE |- true : bool

TE |- false : bool


TE |- e1 : bool   TE |- e2 : bool
---------------------------------
      TE |- e1 \/ e2 : bool


TE |- e1 : bool   TE |- e2 : t   TE |- e3 : t
---------------------------------------------
             TE |- if e1 e2 e3 : t


The rule for if expressions is more interesting. It says that
condition e1 must be a boolean, and that expressions e2 and e3
(representing the two branches of the if) must have the same type. The
idea is that we need to statically identify the type t that the if
expression produces. The value of the if is the result of evaluating
e2 or e3, depending on which branch is taken. Since we reason
statically and don't know which branch will be taken, we must require
that each branch produces a value of this type t.
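
The four rules above can be read directly as a checking procedure. The
following is a minimal sketch in Python, not part of the calculus
itself: the tuple encoding of expressions, the names type_of and
IllTyped, and the use of the strings "int" and "bool" for types are all
assumptions made for illustration. The environment TE is omitted
because this fragment has no variables.

```python
class IllTyped(Exception):
    """Raised when no typing rule applies to an expression."""


def type_of(e):
    """Return "int" or "bool" for expression e, or raise IllTyped."""
    tag = e[0]
    if tag in ("true", "false"):          # TE |- true : bool, TE |- false : bool
        return "bool"
    if tag == "int":                      # integer literal
        return "int"
    if tag == "or":                       # both operands must be bool
        _, e1, e2 = e
        if type_of(e1) == "bool" and type_of(e2) == "bool":
            return "bool"
        raise IllTyped("operands of 'or' must both be bool")
    if tag == "if":
        _, e1, e2, e3 = e
        if type_of(e1) != "bool":         # the condition must be bool
            raise IllTyped("condition must be bool")
        t2, t3 = type_of(e2), type_of(e3)
        if t2 != t3:                      # branches must agree on one type t
            raise IllTyped("branches must have the same type")
        return t2
    raise IllTyped("unknown expression: %r" % (e,))


# if (true \/ false) 1 2
well_typed = ("if", ("or", ("true",), ("false",)), ("int", 1), ("int", 2))
# if true 1 false
ill_typed = ("if", ("true",), ("int", 1), ("false",))

print(type_of(well_typed))                # prints: int
try:
    type_of(ill_typed)
except IllTyped as err:
    print("rejected:", err)
```

The second example is rejected even though its condition is always
true at run time: the checker does not know which branch is taken, so
it insists that both branches have the same type.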

It can be shown that these rules are sound. That is, the execution of
each well-typed expression is error-free. The proof techniques are the
same as before, using type preservation and progress (Try it!).


Now let's add the fix-point operator:

  e ::= ... | fix e

The evaluation rule is fix e -> e (fix e). Because of the application
e (fix e), e must be a function of type t -> t' and "fix e" must be of
type t. To preserve types during evaluation, we need "fix e" and "e
(fix e)" to have the same type. So t = t'. Finally, we want fix to
model recursive functions, so "fix e" should be a function, say of
type t1 -> t2. The typing rule becomes:

  TE |- e : (t1->t2) -> (t1->t2)
  ------------------------------
      TE |- fix e : t1->t2
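
As a small illustration of this rule, the sketch below computes the
type of "fix e" from the type of e. It is only a sketch: the tuple
encoding ("fun", t1, t2) for the type t1 -> t2 and the names fun,
is_fun, type_of_fix and IllTyped are assumptions introduced here.

```python
class IllTyped(Exception):
    """Raised when the premise of the fix rule is not met."""


def fun(t1, t2):
    """Build the function type t1 -> t2."""
    return ("fun", t1, t2)


def is_fun(t):
    """True if t encodes a function type."""
    return isinstance(t, tuple) and len(t) == 3 and t[0] == "fun"


def type_of_fix(t_e):
    """Given the type of e, return the type of `fix e`, or raise IllTyped."""
    if not is_fun(t_e):
        raise IllTyped("e must be a function")
    _, arg, res = t_e
    if arg != res:                        # e : t -> t', and we need t = t'
        raise IllTyped("e must have type t -> t")
    if not is_fun(arg):                   # t itself must be t1 -> t2
        raise IllTyped("t must be a function type t1 -> t2")
    return arg                            # fix e : t1 -> t2


int_to_int = fun("int", "int")
# e : (int -> int) -> (int -> int), e.g. the functional behind factorial
print(type_of_fix(fun(int_to_int, int_to_int)))  # prints: ('fun', 'int', 'int')
```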


Product types


Let's add pairs and selection operators (fst, snd) to our
language. The type of each pair can be described using a _product
type_; this is the product of the types of the two components.

 e ::= ... | (e1,e2) | fst e | snd e
 
 t ::= ... | t1 x t2

The typing rules are fairly straightforward:


  TE |- e1 : t1   TE |- e2 : t2 
  -----------------------------
     TE |- (e1,e2) : t1 x t2


  TE |- e : t1 x t2
  -----------------
  TE |- fst e : t1


  TE |- e : t1 x t2
  -----------------
  TE |- snd e : t2


The rule for pairs introduces (or constructs) product types. The two
rules for fst and snd eliminate (or destruct) product types. The
argument e of fst or snd must be a pair (with a product type, as
indicated in the premise); each selector then yields the type of the
appropriate component.
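
The introduction and elimination rules for products can be sketched as
a small checker. This is an illustrative sketch only: the tuple
encoding of expressions, the ("prod", t1, t2) encoding of t1 x t2, the
dictionary used for TE, and the names prod, type_of and IllTyped are
all assumptions, not part of the calculus.

```python
class IllTyped(Exception):
    """Raised when no typing rule applies to an expression."""


def prod(t1, t2):
    """Build the product type t1 x t2."""
    return ("prod", t1, t2)


def type_of(te, e):
    """Type of expression e under type environment te (a dict), or raise."""
    tag = e[0]
    if tag == "int":
        return "int"
    if tag == "bool":
        return "bool"
    if tag == "var":
        if e[1] not in te:
            raise IllTyped("unbound variable " + e[1])
        return te[e[1]]
    if tag == "pair":                     # introduction: (e1,e2) : t1 x t2
        return prod(type_of(te, e[1]), type_of(te, e[2]))
    if tag in ("fst", "snd"):             # elimination: e must be t1 x t2
        t = type_of(te, e[1])
        if not (isinstance(t, tuple) and t[0] == "prod"):
            raise IllTyped(tag + " expects a pair")
        return t[1] if tag == "fst" else t[2]
    raise IllTyped("unknown expression: %r" % (e,))


te = {"p": prod("int", "bool")}
print(type_of(te, ("snd", ("var", "p"))))                           # prints: bool
print(type_of({}, ("fst", ("pair", ("int", 1), ("bool", True)))))   # prints: int
```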

In practice, languages such as ML support n-tuples,
e.g. (e1,..,en). The selector operator is #k, and chooses the k-th
component of its argument tuple. That is, #k (e1,..,ek,..,en) = ek.
Therefore fst is equivalent to #1, and snd is equivalent to #2.

Structures and records are similar to tuples, but use labels to refer
to their fields. In contrast, tuples refer to their components using
integer indices. Record types are product types with labeled
components.

For instance, in ML we can declare records using curly
brackets, and use #f to select field f of a record. Because this is a
functional setting, tuples and records are immutable -- their fields
cannot be updated.

 type Point = {x:int, y:int, z:int}

 val p : Point = {x=1, y=3, z=2}
 val n : int = #y p

In Pascal (and derived languages, such as Modula-3 or Ada), records
are declared with a special keyword. Selection is done with a field
access operation "e.f". Of course, in the imperative setting, record
fields can be updated, using assignments "e1.f := e2".

 TYPE Point = RECORD
        x, y, z : INTEGER
      END;

 VAR p : Point := Point{x := 1, y := 3, z := 2};
 VAR n : INTEGER := p.y;

In C or C++, records are declared with the keyword struct, but are
otherwise similar to those in Pascal:

 struct Point { 
   int x, y, z;
 };

 struct Point p = {1, 3, 2};
 int n = p.y;

These structure initializers don't use field names, but rely on the
order in which fields occur in the structure declaration. (C99 and
later also accept designated initializers, such as
{.x = 1, .y = 3, .z = 2}.)


Sum types

Now suppose we want to add a construct similar to variant records or
ML datatypes. If a type t is a variant of two types t1 and t2, then a
variable of type t can contain at run-time either a value of type t1
or a value of type t2. Such a type is referred to as a _sum type_ t1 +
t2. We also need a construct similar to a case or a switch, to examine
a variable and perform different actions based on its actual type. We
can add such constructs in our language as follows:

  e ::= ... | inl[t1+t2] e | inr[t1+t2] e |
        case e1 of inl(x) => e2 | inr(y) => e3

Inl takes a value of type t1 and maps it to a value of variant type
t1+t2. Inr takes a value of type t2 and maps it to a value of type
t1+t2. (Note: inl/inr are injections from t1/t2 into t1+t2, hence
their names; "l" and "r" refer to "left" and "right", respectively.)
The case construct takes a variant expression e1. If e1 evaluates to
a left injection, wrapping a value of type t1, then case binds that
wrapped value to x and evaluates e2; if it evaluates to a right
injection, wrapping a value of type t2, it binds that value to y and
evaluates e3. The typing rules are:


         TE |- e : t1 
  ----------------------------
  TE |- inl[t1+t2] e : t1 + t2


         TE |- e : t2 
  ----------------------------
  TE |- inr[t1+t2] e : t1 + t2


  TE |- e1 : t1+t2  TE[x->t1] |- e2 : t  TE[y->t2] |- e3 : t
  ----------------------------------------------------------
      TE |- case e1 of inl(x) => e2 | inr(y) => e3 : t


Note the duality between product and sum types. For products, there is
one introduction rule, and two elimination rules. For sums, there are
two introduction rules, and one elimination rule.
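
The three rules for sums can also be sketched as a checker. As before,
this is only an illustration: the tuple encoding of expressions, the
("sum", t1, t2) encoding of t1 + t2, the dictionary used for TE, and
the names sum_type, is_sum, type_of and IllTyped are assumptions. Note
how the case rule extends TE with x in one branch and y in the other.

```python
class IllTyped(Exception):
    """Raised when no typing rule applies to an expression."""


def sum_type(t1, t2):
    """Build the sum type t1 + t2."""
    return ("sum", t1, t2)


def is_sum(t):
    return isinstance(t, tuple) and len(t) == 3 and t[0] == "sum"


def type_of(te, e):
    """Type of expression e under type environment te (a dict), or raise."""
    tag = e[0]
    if tag == "int":
        return "int"
    if tag == "bool":
        return "bool"
    if tag == "var":
        if e[1] not in te:
            raise IllTyped("unbound variable " + e[1])
        return te[e[1]]
    if tag in ("inl", "inr"):             # introduction: inl[t1+t2] e, inr[t1+t2] e
        _, annot, body = e
        if not is_sum(annot):
            raise IllTyped("annotation must be a sum type")
        expected = annot[1] if tag == "inl" else annot[2]
        if type_of(te, body) != expected:
            raise IllTyped(tag + " applied to a value of the wrong type")
        return annot
    if tag == "case":                     # elimination
        _, e1, x, e2, y, e3 = e
        t = type_of(te, e1)
        if not is_sum(t):
            raise IllTyped("case expects a sum type")
        t_left = type_of({**te, x: t[1]}, e2)    # TE[x->t1] |- e2 : t
        t_right = type_of({**te, y: t[2]}, e3)   # TE[y->t2] |- e3 : t
        if t_left != t_right:
            raise IllTyped("case branches must have the same type")
        return t_left
    raise IllTyped("unknown expression: %r" % (e,))


int_or_bool = sum_type("int", "bool")
v = ("inl", int_or_bool, ("int", 100))
# case v of inl(x) => x | inr(y) => 0
print(type_of({}, ("case", v, "x", ("var", "x"), "y", ("int", 0))))  # prints: int
```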


As mentioned above, in ML these correspond to datatypes. The
difference is that in a datatype component types are labeled with
names. Consider a type that describes integers or booleans:

  datatype IntOrBool = I of int | B of bool;

  let val e = if (read_input()) then I(100) else B(true);
  in
    case e of 
         I(x) => print ("int value = " ^ Int.toString(x))
       | B(y) => print ("bool value = " ^ Bool.toString(y))
  end

We can map this example to the inl/inr constructs and sum types as
follows. Type t1 is int, t2 is bool, t1+t2 is IntOrBool. Then data
constructors correspond to injection functions: I(100) is
inl[int+bool] 100, and B(true) is inr[int+bool] true. Note that the
[t1+t2] annotation is encoded in the constructor name. Finally,
the ML case corresponds to "case (e) of inl(x) => ... | inr(y) =>
...".

In the Pascal family of languages (except Modula-3), sum types occur
in the form of variant records.  They use a tag to discriminate
between the different possible values in the variant. The above
example can be written as follows:

  TYPE IntOrBool = RECORD    
         CASE kind : 1..2 OF
         1 : (I : INTEGER);
         2 : (B : BOOLEAN)
         END;

  VAR e : IntOrBool;

  IF read_input() 
  THEN BEGIN e.kind := 1; e.I := 100 END
  ELSE BEGIN e.kind := 2; e.B := true; END;

  CASE e.kind OF
     1 : BEGIN Write('integer value = '); Writeln(e.I) END;
     2 : BEGIN Write('boolean value = '); Writeln(e.B) END
  END


In C and C++, unions are the equivalent of variants. For instance, the
following union contains either an integer or a boolean:

  union IntOrBool { 
     int I; 
     bool B;    /* in C, bool requires <stdbool.h> */
  };

Usually, such constructs are placed in an enclosing structure
containing a tag that identifies which kind of value the union
contains, similar to the tag in Pascal variant records:

  struct TaggedIntOrBool {
     int kind;    /* 1: I is valid, 2: B is valid */
     union {      /* anonymous union (C11, C++), so e.I and e.B work */
       int I; 
       bool B;
     };
  };

  struct TaggedIntOrBool e;

  if (read_input()) 
       { e.kind = 1; e.I = 100; }
  else { e.kind = 2; e.B = true; }

  switch (e.kind) {
    case 1: printf("integer value = %d", e.I); break;
    case 2: printf("boolean value = %s", e.B ? "true" : "false"); break;
  }

However, unlike the type-safe ML datatypes and case constructs,
variant records and unions are type-unsafe: it is the user's
responsibility to update tags manually, to enforce consistency between
tags and values in the variant, and to check the tag before reading
values out. In fact, the implementation of ML datatypes also uses
tags, but it does all this work automatically, and the type system
ensures that no type violations will occur at run-time.
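
To make the tag discipline concrete, here is a minimal sketch of a
tagged representation in which the tag is set by the constructor and
checked by the only operation that reads the payload. The names Tagged,
inl, inr, case and describe are hypothetical, and since Python is
dynamically typed this illustrates only the run-time bookkeeping, not
the static guarantee that the ML type system provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value of a sum type: a tag plus the wrapped payload."""
    tag: str          # "inl" or "inr", set once by the constructor functions
    payload: object


def inl(v):
    """Left injection: the tag is set here, never by the user."""
    return Tagged("inl", v)


def inr(v):
    """Right injection."""
    return Tagged("inr", v)


def case(v, on_left, on_right):
    """Inspect the tag and pass the payload to the matching branch only."""
    if v.tag == "inl":
        return on_left(v.payload)
    if v.tag == "inr":
        return on_right(v.payload)
    raise ValueError("corrupt tag: %r" % (v.tag,))


def describe(v):
    """Analogue of the ML case over IntOrBool shown earlier."""
    return case(v,
                lambda x: "int value = " + str(x),
                lambda y: "bool value = " + str(y).lower())


print(describe(inl(100)))     # prints: int value = 100
print(describe(inr(True)))    # prints: bool value = true
```

Because the payload is only ever reached through case, the tag and the
value cannot get out of sync the way e.kind and e.I can in the C
example.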