Polymorphism

Polymorphism refers to the ability of a piece of code to operate on values of different types. Polymorphism applies to various language constructs, including functions, datatypes, objects, and modules. For instance, a polymorphic function is one that can be invoked with different kinds of arguments, and a polymorphic datatype is one that contains elements of unspecified types.

There are several kinds of polymorphism:

- parametric polymorphism: the code is written without knowledge of the actual type of its arguments and operates on any kind of argument. Examples include polymorphic functions in ML and generics in Java 1.5.
- subtyping polymorphism: the code works on values whose type may be any subtype of a known type.
- ad-hoc polymorphism: this usually refers to code that appears to be polymorphic to the programmer, although the actual implementation is not. A typical example is overloading, that is, using the same function name for functions with different kinds of parameters. Although the name looks like a polymorphic function to the code that uses it, there are actually multiple function implementations (none of them polymorphic) and the compiler invokes the appropriate one. Templates in C++ can be considered in this category, as the compiler instantiates them rather than using one single polymorphic version.

Parametric polymorphism

With parametric polymorphism, types are "parameterized" on unknown types, i.e. defined in terms of unknown types. Consider the identity function: id = \x . x. Type inference in ML (or the inference from last lecture) yields a type of the form 'a -> 'a, where 'a is an unknown type. Here, the type of id is parameterized on the unknown type 'a. To make explicit the fact that 'a can be any type, we can write the type of id as: \forall 'a . 'a -> 'a. To apply id to an integer, we have to instantiate 'a to int. To apply id to a boolean, we must instantiate 'a to bool.
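The instantiation of id at two different types can be approximated in Java with a generic method. This is only a sketch: the names PolyId and id are chosen here for illustration, and Java type parameters range over reference types, so int and bool are stood in for by Integer and Boolean.

```java
// Illustrative sketch; PolyId is a hypothetical class name, not from the notes.
class PolyId {
    // The type parameter <T> plays the role of 'a in \forall 'a . 'a -> 'a:
    // the body is written without knowing what T is.
    static <T> T id(T x) {
        return x;
    }

    public static void main(String[] args) {
        // Instantiate T to Integer (standing in for int).
        Integer n = PolyId.<Integer>id(2);
        // Instantiate T to Boolean (standing in for bool).
        Boolean b = PolyId.<Boolean>id(Boolean.TRUE);
        System.out.println(n + " " + b);
    }
}
```

The same method body serves both calls; only the type written in angle brackets changes.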
To study parametric polymorphism, we will extend the typed lambda calculus with constructs that describe polymorphic functions and types. This augmented calculus is known as the polymorphic lambda calculus, or "System F":

  (expr)    e ::= n | x | \x : t . e | e1 e2 | /\ 'a . e | e [t]
  (types)   t ::= int | t1 -> t2 | 'a | \forall 'a . t
  (values)  v ::= n | \x : t . e | /\ 'a . e

The expression /\ 'a . e is a type abstraction: it takes type 'a as a parameter, and yields the value of e given that type. The expression e [t] is a type instantiation: it instantiates the polymorphic type of e with type t. Note that instantiation does not require the program to keep run-time type information, or to perform type checks at run-time; it is rather a way to statically check type safety in the presence of polymorphism. Finally, types include polymorphic types \forall 'a . t, which universally quantify occurrences of 'a in t over all possible types.

In this language, the polymorphic identity function is written as:

  poly_id = /\ 'a . \x : 'a . x

And we can apply this function to integers via type instantiation:

  poly_id[int] is \x : int . x, with type int -> int

The evaluation rules for the polymorphic system are the same as in the typed lambda calculus, augmented with new rules for evaluating the new constructs:

      e -> e'
  -------------
  e[t] -> e'[t]

  (/\ 'a . e) [t] -> e[t/'a]

Type checking expressions is slightly different than before. Besides the type environment E, we also need to keep track of a set D of type variables 'a. The reason is to ensure that each type variable 'a is bound by an enclosing /\ 'a abstraction. The typing judgments are of the form D, E |- e : t. The rules are as follows:

  D, E |- n : int

     E(x) = t
  --------------
  D, E |- x : t

  D, E |- e1 : t -> t'    D, E |- e2 : t
  --------------------------------------
           D, E |- e1 e2 : t'

     D, E[x -> t] |- e : t'
  ----------------------------   FTV(t) subset D
  D, E |- \x : t . e : t -> t'

In the last rule, FTV(t) represents the set of free type variables of t. The rule ensures that all free type variables of t are bound by enclosing /\'s. The remaining rules are:

        D U {'a}, E |- e : t
  ----------------------------------   'a not in D
  D, E |- /\ 'a . e : \forall 'a . t

  D, E |- e : \forall 'a . t'
  ---------------------------   FTV(t) subset D
   D, E |- e[t] : t'[t/'a]

The side condition in the rule for type abstractions forbids shadowing type parameters (what can go wrong if we omit this condition?). The rule for type instantiations requires that the free type variables of t are all bound.

With these rules, we can check that (/\ 'a . \x : 'a . x) [int] 2 is well-typed. We can also apply poly_id to itself, after the appropriate type instantiation:

  (poly_id[\forall 'a . 'a -> 'a]) poly_id
    -> (\x : \forall 'a . 'a -> 'a . x) poly_id
    -> poly_id

In real languages such as ML, programmers don't have to annotate their programs with things like /\ 'a . e or e[t]. Both are automatically inferred by the compiler (although the user can specify the former if they wish). For instance, we can write "fun f x = (x,x)", and have ML figure out that we mean "fun f (x : 'a) : 'a * 'a = (x, x)". Or we can write the latter directly and ML will type-check it. Conceptually, the system inserts a type abstraction /\ 'a for each type variable 'a around the outermost enclosing expression. Then, if we want to apply f to an integer, we don't have to make the instantiation explicit and write "f[int] 2", but we can directly write "f 2" and have the system infer the "[int]".

In Java 1.5, generics provide support for parametric polymorphism. For instance, we can write a class that is parameterized on an unknown reference type T:

  class Pair<T> {
    T x, y;
    Pair(T x, T y) { this.x = x; this.y = y; }
    T fst(Pair<T> p) { this.x = p.x; return p.x; }
  }

This is a class that contains a pair of two elements of an unknown, but identical, type T.
The parameterization /\ T is implicit around the class declaration. Since Java 1.5 does not infer the type arguments of a generic class, type instantiations must be written explicitly. Type instantiations are done by writing the actual type in angle brackets:

  Pair<Boolean> p;
  p = new Pair<Boolean>(Boolean.TRUE, Boolean.FALSE);
  Boolean x = p.fst(p);
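To see the instantiation at work, the Pair class can be placed in a small self-contained program. The following is a sketch rather than part of the original notes: PairDemo and the String instantiation are added here for illustration, and the type parameter of Pair is written out in angle brackets as the text describes.

```java
// PairDemo is a hypothetical driver class added for illustration.
class PairDemo {
    public static void main(String[] args) {
        // Instantiate T to Boolean, as in the notes.
        Pair<Boolean> p = new Pair<Boolean>(Boolean.TRUE, Boolean.FALSE);
        Boolean x = p.fst(p);

        // The same class instantiated at a different type (an added example).
        Pair<String> q = new Pair<String>("a", "b");
        Pair<String> r = new Pair<String>("c", "d");
        String s = q.fst(r); // also overwrites q.x with r.x

        // Rejected at compile time: a Pair<String> is not a Pair<Boolean>.
        // p.fst(q);

        System.out.println(x + " " + s + " " + q.x);
    }
}

// The Pair class from the notes, with its type parameter T made explicit.
class Pair<T> {
    T x, y;

    Pair(T x, T y) { this.x = x; this.y = y; }

    // Copies the first component of p into this pair and returns it.
    T fst(Pair<T> p) { this.x = p.x; return p.x; }
}
```

The commented-out call shows the static check that instantiation buys: mixing the two instantiations is a type error caught by the compiler, with no run-time test involved.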