Typing Booleans and Conditionals

Let's add booleans and conditional expressions to our typed lambda calculus. We extend the expressions with boolean constants, disjunction, and conditionals, and the types with a boolean type:

    e ::= ... | true | false | e1 \/ e2 | if e1 e2 e3
    t ::= ... | bool

Now we have both booleans and integers in the language. Some constructs take integer operands, others take boolean operands. For instance, an addition expects two integers, while an or operation expects two booleans. If you look at the semantic rules, you'll see that program execution runs into an error when these conditions are not met. We can rule out such errors using appropriate typing rules. We take the typing rules for the simply typed lambda calculus and augment them with rules for the new constructs:

    TE |- true : bool

    TE |- false : bool

    TE |- e1 : bool    TE |- e2 : bool
    ----------------------------------
          TE |- e1 \/ e2 : bool

    TE |- e1 : bool    TE |- e2 : t    TE |- e3 : t
    -----------------------------------------------
               TE |- if e1 e2 e3 : t

The rule for if expressions is more interesting. It says that the condition e1 must be a boolean, and that expressions e2 and e3 (representing the two branches of the if) must have the same type. The idea is that we need to statically identify the type t of the value that the if expression produces. This value is the result of evaluating e2 or e3, depending on which branch is taken. Since we reason statically and don't know which branch will be taken, we must require that each branch produces a value of this type t.

It can be shown that these rules are sound. That is, the execution of each well-typed expression is error-free. The proof techniques are the same as before, using type preservation and progress (Try it!).

Now let's add the fix-point operator:

    e ::= ... | fix e

The evaluation rule is fix e -> e (fix e). Because of the application e (fix e), e must be a function of type t -> t' and "fix e" must be of type t.
To preserve types during evaluation, we need "fix e" and "e (fix e)" to have the same type, so t = t'. Finally, we want fix to model recursive functions, so "fix e" should be a function, say of type t1 -> t2. The typing rule becomes:

    TE |- e : (t1->t2) -> (t1->t2)
    ------------------------------
        TE |- fix e : t1->t2

Product types

Let's add pairs and selection operators (fst, snd) to our language. The type of each pair can be described using a _product type_; this is the product of the types of the two components.

    e ::= ... | (e1,e2) | fst e | snd e
    t ::= ... | t1 x t2

The typing rules are fairly straightforward:

    TE |- e1 : t1    TE |- e2 : t2
    ------------------------------
       TE |- (e1,e2) : t1 x t2

    TE |- e : t1 x t2
    -----------------
    TE |- fst e : t1

    TE |- e : t1 x t2
    -----------------
    TE |- snd e : t2

The rule for pairs introduces (or constructs) product types. The following two rules eliminate (or destruct) product types. The argument e of fst or snd must be a pair (with a product type, as indicated in the premise); each selector then yields the type of the corresponding component.

In practice, languages such as ML support n-tuples, e.g. (e1,..,en). The selector operator is #k, and chooses the k-th component of its argument tuple. That is, #k (e1,..,ek,..,en) = ek. Therefore fst is equivalent to #1, and snd is equivalent to #2.

Structures and records are similar to tuples, but use labels to refer to their fields. In contrast, tuples refer to their components using integer indices. Record types are product types with labeled components. For instance, in ML we can declare records using curly brackets, and use #f to select field f of a record. Because this is a functional setting, tuples and records are immutable: their fields cannot be updated.

    type Point = {x:int, y:int, z:int}
    val p : Point = {x=1, y=3, z=2}
    val n : int = #y p

In Pascal (and derived languages, such as Modula-3 or Ada), records are declared with a special keyword.
Selection is done with a field access operation "e.f". Of course, in the imperative setting, record fields can be updated, using assignments "e1.f := e2".

    TYPE Point = RECORD x, y, z : INTEGER END;
    VAR p : Point := (x:1, y:3, z:2);
    VAR n : INTEGER := p.y;

In C or C++, records are declared with the keyword struct, but are otherwise similar to those in Pascal:

    struct Point { int x, y, z; };
    struct Point p = {1, 3, 2};
    int n = p.y;

The initializer shown here doesn't use field names, but relies on the order in which fields occur in the structure declaration. (C99 and later also accept designated initializers such as {.x = 1, .y = 3, .z = 2}.)

Sum types

Now suppose we want to add a construct similar to variant records or ML datatypes. If a type t is a variant of two types t1 and t2, then a variable of type t can contain at run-time either a value of type t1 or a value of type t2. Such a type is referred to as a _sum type_ t1 + t2. We also need a construct similar to a case or a switch, to examine a variable and perform different actions based on which kind of value it actually holds. We can add such constructs to our language as follows:

    e ::= ... | inl[t1+t2] e | inr[t1+t2] e
            | case e1 of inl(x) => e2 | inr(y) => e3

Inl takes a value of type t1 and maps it to a value of variant type t1+t2. Inr takes a value of type t2 and maps it to a value of type t1+t2. (Note: inl/inr are injective functions from t1/t2 to t1+t2, hence their name; "l" and "r" refer to "left" and "right", respectively). The case construct takes a variant value e1. If this value was built with inl, the case binds x to the underlying value of type t1 and evaluates e2; if it was built with inr, it binds y to the underlying value of type t2 and evaluates e3. The typing rules are:

            TE |- e : t1
    ----------------------------
    TE |- inl[t1+t2] e : t1 + t2

            TE |- e : t2
    ----------------------------
    TE |- inr[t1+t2] e : t1 + t2

    TE |- e1 : t1+t2    TE[x->t1] |- e2 : t    TE[y->t2] |- e3 : t
    --------------------------------------------------------------
         TE |- case e1 of inl(x) => e2 | inr(y) => e3 : t

Note the duality between product and sum types.
For products, there is one introduction rule and two elimination rules. For sums, there are two introduction rules and one elimination rule.

As mentioned above, in ML sum types correspond to datatypes. The difference is that in a datatype the component types are labeled with constructor names. Consider a type that describes integers or booleans:

    datatype IntOrBool = I of int | B of bool;

    let val e = if (read_input()) then I(100) else B(true)
    in case e of
           I(x) => print ("int value = " ^ Int.toString(x))
         | B(y) => print ("bool value = " ^ Bool.toString(y))
    end

We can map this example to the inl/inr constructs and sum types as follows. Type t1 is int, t2 is bool, and t1+t2 is IntOrBool. The data constructors then correspond to the injection functions: I(100) is inl[int+bool] 100, and B(true) is inr[int+bool] true. Note that the [t1+t2] annotation is encoded in the constructor name, since each constructor belongs to exactly one datatype. Finally, the ML case corresponds to "case (e) of inl(x) => ... | inr(y) => ...".

In the Pascal family of languages (except Modula-3), sum types occur in the form of variant records. They use a tag to discriminate between the different possible values in the variant. The above example can be written as follows:

    TYPE IntOrBool = RECORD
        CASE kind : 1..2 OF
            1 : (I : INTEGER);
            2 : (B : BOOLEAN)
    END;

    VAR e : IntOrBool;

    IF read_input() THEN BEGIN e.kind := 1; e.I := 100 END
    ELSE BEGIN e.kind := 2; e.B := true END;

    CASE e.kind OF
        1 : BEGIN Write('integer value = '); Writeln(e.I) END;
        2 : BEGIN Write('boolean value = '); Writeln(e.B) END
    END

In C and C++, unions are the equivalent of variants.
For instance, the following union contains either an integer or a boolean:

    union IntOrBool { int I; bool B; };

Usually, such constructs are placed in an enclosing structure containing a tag that identifies which kind of value the union contains, similar to the tag in Pascal variant records:

    struct TaggedIntOrBool {
        int kind;
        union { int I; bool B; };   /* anonymous union (C11, C++) */
    };

    struct TaggedIntOrBool e;

    if (read_input()) { e.kind = 1; e.I = 100; }
    else { e.kind = 2; e.B = true; }

    switch (e.kind) {
        case 1: printf("integer value = %d", e.I); break;
        case 2: printf("boolean value = %s", e.B ? "true" : "false"); break;
    }

However, unlike the type-safe ML datatypes and case constructs, variant records and unions are type-unsafe: it is the user's responsibility to update tags manually, to enforce consistency between tags and values in the variant, and to check the tag before reading values out. In fact, implementations of ML datatypes also use tags, but the compiler maintains them automatically, and the type system ensures that no type violations will occur at run-time.
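
One way to reduce this risk in C is to route every construction and every read of the union through a few helper functions, so that the tag and the payload are always set together and the tag is always checked before a read. The following is only a minimal sketch under that assumption, not an idiom taken from any library; the names Kind, inl_int, inr_bool, case_of, int_branch and bool_branch are hypothetical and chosen to mirror the inl/inr/case constructs above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tag type: says which member of the union is valid. */
enum Kind { KIND_INT, KIND_BOOL };

struct TaggedIntOrBool {
    enum Kind kind;
    union {
        int  I;
        bool B;
    } u;
};

/* Injection corresponding to inl[int+bool]: sets tag and payload together. */
static struct TaggedIntOrBool inl_int(int v) {
    struct TaggedIntOrBool e;
    e.kind = KIND_INT;
    e.u.I = v;
    return e;
}

/* Injection corresponding to inr[int+bool]. */
static struct TaggedIntOrBool inr_bool(bool v) {
    struct TaggedIntOrBool e;
    e.kind = KIND_BOOL;
    e.u.B = v;
    return e;
}

/* Elimination corresponding to case: inspects the tag, then passes the
   matching payload to exactly one branch. Both branches return int,
   just as both arms of a case must have the same type t. */
static int case_of(struct TaggedIntOrBool e,
                   int (*on_int)(int),
                   int (*on_bool)(bool)) {
    switch (e.kind) {
    case KIND_INT:  return on_int(e.u.I);
    case KIND_BOOL: return on_bool(e.u.B);
    }
    assert(0 && "corrupt tag");   /* unreachable if only the helpers are used */
    return -1;
}

/* Example branches (hypothetical): map either payload to an int. */
static int int_branch(int x)   { return x; }
static int bool_branch(bool b) { return b ? 1 : 0; }
```

For example, case_of(inl_int(100), int_branch, bool_branch) dispatches to int_branch, while case_of(inr_bool(true), int_branch, bool_branch) dispatches to bool_branch. Nothing prevents other code from assigning to e.kind or e.u directly, so the consistency holds by convention only, whereas in ML it is enforced by the type system.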