CS412/413 Spring 2000 Introduction to Compilers and Translators

Static Semantics for Iota


NOTE: This page uses some dynamic fonts for some of the math symbols.. If the turnstile symbol (`) looks more like a back tick (`) than this (|-), try downloading and installing the TrueType TeX math symbols font. Make sure dynamic fonts are turned on on your browser. Netscape users on Linux may experience problems, apparently because dynamic fonts are not implemented correctly.

This handout presents static semantics for the Iota programming language. The static semantics mirror the syntactic specifications given in that definition; the rules given here can be applied straightforwardly to the result of parsing using those specifications. The static semantics capture the essence of type-checking Iota. Certain details are ignored, such as the interpretation of type expressions, which is straightforward in Iota in any case.

The goal of the type checker is to determine whether a module M is well-typed. This judgment is written as ` M, where M is the text of the module. To check a module, we first construct an environment in which all of the module "uses" definitions and the components of the module themselves appear. This environment construction is performed by the function module-env, which is defined below. Suppose that U represents a single "use" declaration (of the form id.id or id = id.id) and D represents a single definition within a module. A module is type-safe if each of its definitions Di is type-safe in the module environment:

M = uses U1...Un D1...Dm

module-env({U1, ..., Un, D1, ..., Dm}) ` Di defn (i21..m)


            

` M

In this rule, the word  defn is placed after the variable D to distinguish the judgment that a definition is type-safe from other judgments that the static checker makes: the judgement A ` D  defn means that the definition D is type-safe in the environment A.

The Iota type system has two special types, none and unit, which do not appear in the syntax of the language.  The none type is the type of expressions or statements which do not return control to their enclosing context. The unit type is the type of expressions or statements which return control to the enclosing context, but produce no value.

The function module-env performs the gathering of information about variables and functions that are visible at the top level, and can be defined recursively. Here, the function interface-type(i, n) finds the type of the identifier n in the interface named i. Assuming that interface-env is a function that constructs the environment for the interface i, T = interface-type(i, n) means n : T 2 interface-env(i). The symbol table interface-env(i) is easily computed in a manner similar to module-env.

module-env({id1.id2 , U2, ..., Un, D1, ..., Dm}) =
        { id2 : interface-type(id1, id2) } + module-env({U2, ..., Un, D1, ..., Dm})
module-env({id1 = id2.id3 , U2, ..., Un, D1, ..., Dm}) =
        { id1 : interface-type(id2, id3) } + module-env({U2, ..., Un, D1, ..., Dm})
module-env({id1 : T1, D2, ..., Dm}) =
        { id1 T1} + module-env({D2, ..., Dm})
module-env({id0(id1 : T1, id2 : T2, ..., idn : Tn) = E, D2, ..., Dm}) =
        { id0 : T1 × T2 × ... × Tn!unit } + module-env({D2, ..., Dm})
module-env({id0(id1 : T1, id2 : T2,...,idn : Tn) : T = E, D2, ..., Dm}) =
        { id0 : T1 × T2 × ... × Tn!T} + module-env({D2, ..., Dm})
module-env({}) = {}

There are two kinds of definitions in a module: variable definitions and function definitions. Variable definitions in the module are automatically considered to be safe: A ` id : T defn. Function definitions are less trivial. The rule for checking functions that return or do not return a value is as follows:

A + {a1 : T1, ..., an : Tn} + {return : Tr} ` E: Tr


A ` id(a1 : T1, ..., an : Tn) : Tr = E defn 


A + {a1 : T1, ..., an : Tn} + {return : unit} ` E: Tr


A` id(a1 : T1, ..., an : Tn) = E defn 

These rules say that to check the type safety of a function definition, we check that the the body of the function, E, has the expected type Tr that is the declared return type of the function. In the case of a function that does not return a value, the expression E may have any type Tr, since its value, if any, is discarded. The type of the expression E is determined in an environment that includes all of the argument variables ai of the function, bound to their corresponding types Ti, which is necessary because E may mention these variables. In order to permit a function to end with a return statement, we have the following rule as well:


A + {a1 : T1, ..., an : Tn} + {return : unit} ` E : none


A` id(a1 : T1, ..., an : Tn) = E defn 

Note that in the preceding rules, environments are extended by using an operator +. The addition of two environments is the union of those environments as long as the two environments do not contain any definitions with the same name. If two environments contain definitions with the same name, the addition of the environments is undefined. This approach differs from that in Appel, where the second environment takes precedence.

To support return statements, the environment contains a binding for the pseudo-identifier return; this binding keeps track of the return type of the function and is used to check return statements that occur within the function.  Notice the use of the special type unit as the return type of a function which returns no value. You can think of unit as a type that has only one value, which represents normal evaluation of an expression.

Type-Checking Expressions

As in lecture, the judgment for type-checking both expressions and statements will have the same form: A ` E : T and A ` S : T, respectively. Here, E and S represent an arbitrary expression or statement, and T represents the type of the expression or statement's value.

Judgments for literal expressions are trivial:

A ` string_constant : string
A ` integer_constant : int
A ` true : bool
A ` false : bool

Inference rules for the various binary and unary operations also are straightforward. The rules for the operators -, /, *, and % are like the first rule for +.

A ` E1 : int (There are similar rules for -, *, /, %)
A ` E2 : int

A ` E1 + E2 : int

 

A ` E1 : string
A ` E2 : string

A ` E1 + E2 : string

 

A ` E1 : bool The rule for the | operator is similar
A ` E2 : bool

A ` E1 & E2 : bool

 

A ` E1 : T
A ` E2 : T

A ` E1 == E2 : bool

 

A ` E1 : int

There are similar rules for the operator >  and for type string as the type of E1 and E2

A ` E2 : int

A ` E1 < E2 : bool

 

A ` E : bool           A ` E : int


A ` ! E : bool A ` - E : int
  
A ` E : string           A ` E : array[T]


A ` length E : int A ` length E : int

Identifiers representing variables are type-checked simply by looking them up in the environment. Array indexes representing variables also are straightforward.

A ` E1 : array[T] A ` E1 : string
id : T  2 A A ` E2 : int A ` E2 : int



A ` id : T A ` E1 [ E2 ] : T A ` E1 [ E2 ] : int

There is also a special array constructor expression in Iota (this was omitted from the semantics before Prelim 1):

A ` E1 : int

A ` E2 : T


A ` new T [ E1 ] ( E2 ) : array[T]

A function call is checked by ensuring that the types of each of the actual expressions supplied as arguments are the same as the declared types of the formal arguments. Note that this rule looks in the environment to find the signature of the function named id.

id : T1 × T2 × ... × Tn! T  2 A
A ` Ei : Ti    (i 2 1..n)

A ` idE1,..., En )  : T

Type-Checking Statements

In the following rules, the letter S will be used to denote a single statement. First, let us consider rules associated with the lists of statements. The type of an empty list of statements is unit:

A ` ( ) : unit

The type of a statement list is the same as the type of the final statement. Variable declarations in the sequence apply to all subsequent statements in the block.

A ` S : T


A ` ( S ) : T

 

A ` S1 : T1

A ` S1 : T1

S1 is not a declaration

S1 = id : T [ = E ]

A ` ( S2 ; ... ; Sn ) : Tn

A + { id : T } ` ( S2 ... Sn ) : Tn


A ` ( S1 ; ... ; Sn ) : Tn

A ` ( S1 ; ... ; Sn ) : Tn

Now we will consider type-checking some more interesting statements.  A statement that is a variable declaration has type unit if the variable is not initialized by the declaration, or the type (and value) of the variable if it is initialized:

A ` E : T

A ` id : T : unit A ` id : T = E    : T

There are two kinds of assignment statements: assignments to ordinary variables, and assignments to array elements:

T not a function type A ` E1 : array[T]
id : T  2 A A ` E2 : int
A ` E : T A ` E3 : T


A ` id = E : T A ` E1 [ E2 ] = E3 : T

Now let us consider Iota's statements for control flow. The if statement requires that its two arms (if present) have the same type.

A ` E : bool
A ` E : bool A `  S1 : T
A ` S : T A `  S2 : T


A ` if ( E ) S : unit A ` if ( E ) S1 else S2 T

If the two arms have different types, the whole if statement has type unit:

A ` E : bool
A ` S1 : T1
A ` S2 : T2
T1 not equal to T2

A ` if ( E S1 else S2unit

An alternative strategy for formalizing the type-checking of if statements is to consider all well-typed statements to have type unit in addition to what we would consider to be their principal types. With this formalization, we do not need the previous rule for arms which have different types, because  the following general inference rule ensures that both statements S1 and S2 have at least the type unit in common:

A ` S : T

A ` S : unit

Using this rule, an if statement always can be given type unit regardless of the types of the two arms. Note that with this rule, it is possible for a statement to be given two types: its principal type, and the type unit. The principal type for an expression or statement is its most specific type; in a language in which an expression can have more than one type, it is important that it have a principal type. In this case, the ambiguity about its type, because a statement or expression is well-typed as long as there is any way to prove it well-typed. Practically it is not a problem, either, because the type checker can always determine the principal type of a statement and then relax its type to some other type (i.e., unit) if that is what the containing context demands.

A while statement always has the type unit, and requires that its test expression has type bool. If the while statement has the loop condition true, then it can never terminate normally and may be given the type none. (This also allows a function to end with an infinite loop.)

A ` E : bool
A ` S : T A ` S : T

(while)
(infinite loop)
A ` while ( E ) : unit A ` while ( true ) : none

A return statement is checked by ensuring that the return type matches the declared return type of the containing function; this declared return type is stored in the symbol table under the identifier return. A return statement has the type none rather than the type unit, because this statement transfers control to a location other than the context in which it occurs. If Iota supported a break statement, it would be typed similarly. Note that this means that statements such as int x = (return "hello") will not type-check even though they cannot generate dynamic type errors; thus, the rules are overly conservative.

A ` E : T
T is not unit
return : unit 2 A return : T  2 A


A ` return : none A ` return E : none