NOTE: This page uses some dynamic fonts for some of the math symbols.. If the turnstile symbol (`) looks more like a back tick (`) than this (|-), try downloading and installing the TrueType TeX math symbols font. Make sure dynamic fonts are turned on on your browser. Netscape users on Linux may experience problems, apparently because dynamic fonts are not implemented correctly.
This handout presents static semantics for the Iota programming language. The static semantics mirror the syntactic specifications given in that definition; the rules given here can be applied straightforwardly to the result of parsing using those specifications. The static semantics capture the essence of type-checking Iota. Certain details are ignored, such as the interpretation of type expressions, which is straightforward in Iota in any case.
The goal of the type checker is to determine whether a module M is well-typed. This judgment is written as ` M, where M is the text of the module. To check a module, we first construct an environment in which all of the module "uses" definitions and the components of the module themselves appear. This environment construction is performed by the function module-env, which is defined below. Suppose that U represents a single "use" declaration (of the form id.id or id = id.id) and D represents a single definition within a module. A module is type-safe if each of its definitions Di is type-safe in the module environment:
M = |
|
module-env({U1, ..., Un, D1, ..., Dm}) ` Di defn (i21..m) |
|
|
|
` M |
In this rule, the word defn is placed after the variable D to distinguish the judgment that a definition is type-safe from other judgments that the static checker makes: the judgement A ` D defn means that the definition D is type-safe in the environment A.
The Iota type system has two special types, none and unit, which do not appear in the syntax of the language. The none type is the type of expressions or statements which do not return control to their enclosing context. The unit type is the type of expressions or statements which return control to the enclosing context, but produce no value.
The function module-env performs the gathering of information about variables and functions that are visible at the top level, and can be defined recursively. Here, the function interface-type(i, n) finds the type of the identifier n in the interface named i. Assuming that interface-env is a function that constructs the environment for the interface i, T = interface-type(i, n) means n : T 2 interface-env(i). The symbol table interface-env(i) is easily computed in a manner similar to module-env.
module-env({id1.id2
, U2, ..., Un, D1,
..., Dm})
=
{ id2 : interface-type(id1,
id2) } + module-env({U2, ..., Un,
D1, ..., Dm})
module-env({id1
= id2.id3 ,
U2, ..., Un, D1, ..., Dm})
=
{ id1 : interface-type(id2,
id3) } + module-env({U2, ..., Un,
D1, ..., Dm})
module-env({id1
: T1, D2, ..., Dm})
=
{ id1 : T1}
+ module-env({D2, ..., Dm})
module-env({id0(id1
:
T1, id2 :
T2, ..., idn : Tn) =
E,
D2, ..., Dm})
=
{
id0 : T1 × T2
× ... × Tn!unit } + module-env({D2,
..., Dm})
module-env({id0(id1
:
T1, id2 :
T2,...,idn : Tn) :
T = E,
D2, ..., Dm})
=
{
id0 : T1 × T2
× ... × Tn!T} +
module-env({D2, ..., Dm})
module-env({})
= {}
There are two kinds of definitions in a module: variable definitions and function definitions. Variable definitions in the module are automatically considered to be safe: A ` id : T defn. Function definitions are less trivial. The rule for checking functions that return or do not return a value is as follows:
A + {a1 : T1, ..., an : Tn} + {return : Tr} ` E: Tr |
|
A ` id(a1 : T1, ..., an : Tn) : Tr = E defn |
|
|
A` id(a1 : T1, ..., an : Tn) = E defn |
These rules say that to check the type safety of a function definition, we check that the the body of the function, E, has the expected type Tr that is the declared return type of the function. In the case of a function that does not return a value, the expression E may have any type Tr, since its value, if any, is discarded. The type of the expression E is determined in an environment that includes all of the argument variables ai of the function, bound to their corresponding types Ti, which is necessary because E may mention these variables. In order to permit a function to end with a return statement, we have the following rule as well:
|
|
A` id(a1 : T1, ..., an : Tn) = E defn |
Note that in the preceding rules, environments are extended by using an operator +. The addition of two environments is the union of those environments as long as the two environments do not contain any definitions with the same name. If two environments contain definitions with the same name, the addition of the environments is undefined. This approach differs from that in Appel, where the second environment takes precedence.
To support return
statements, the environment contains a binding
for the pseudo-identifier return; this binding keeps track of the return type of the function and is used to check return
statements that occur within the function. Notice the use of the
special type unit as the return type of a function which returns no value.
You can think of unit as a type that has only one value, which represents
normal evaluation of an expression.
As in lecture, the judgment for type-checking both expressions and statements will have the same form: A ` E : T and A ` S : T, respectively. Here, E and S represent an arbitrary expression or statement, and T represents the type of the expression or statement's value.
Judgments for literal expressions are trivial:
A ` string_constant :
string
A ` integer_constant :
int
A ` true
:
bool
A ` false
: bool
Inference rules for the various binary and unary operations also
are straightforward. The rules for the operators -
,
/
, *
, and %
are like the first rule for +.
A ` E1 : int | (There are similar rules for - ,
* , /, %) |
A ` E2 : int | |
| |
A ` E1 +
E2 : int
|
A ` E1 : string |
A ` E2 : string |
|
A ` E1 +
E2 :
string
|
A ` E1 : bool | The rule for the | operator is similar |
A ` E2 : bool | |
| |
A ` E1 &
E2 :
bool
|
A ` E1 : T |
A ` E2 : T |
|
A ` E1 ==
E2 :
bool
|
A ` E1 : int |
There are similar rules for the operator
|
A ` E2 : int | |
| |
A ` E1 <
E2 : bool
|
A ` E : bool | A ` E : int | |
|
| |
A ` !
E : bool |
A ` -
E : int | |
A ` E : string | A ` E : array[T] | |
|
| |
A ` length
E : int |
A ` length
E :
int |
Identifiers representing variables are type-checked simply by looking them up in the environment. Array indexes representing variables also are straightforward.
A ` E1 : array[T] | A ` E1 : string | |||
id : T 2 A | A ` E2 : int | A ` E2 : int | ||
|
|
| ||
A ` id : T | A ` E1 [ E2 ] : T | A ` E1 [ E2 ] : int |
There is also a special array constructor expression in Iota (this was omitted from the semantics before Prelim 1):
A ` E1 : int |
A ` E2 : T |
|
A `
new T [ E1 ] ( E2
) : array[T] |
A function call is checked by ensuring that the types of each of the actual expressions supplied as arguments are the same as the declared types of the formal arguments. Note that this rule looks in the environment to find the signature of the function named id.
id : T1 × T2 × ... × Tn! T 2 A |
A ` Ei : Ti (i 2 1..n) |
|
A ` id ( E1,..., En ) : T |
In the following rules, the letter S will be used to denote a single statement. First, let us consider rules associated with the lists of statements. The type of an empty list of statements is unit:
A ` ( ) : unit
The type of a statement list is the same as the type of the final statement. Variable declarations in the sequence apply to all subsequent statements in the block.
A ` S : T |
|
A ` ( S ) : T
|
A ` S1 : T1 |
A ` S1 : T1 | |
S1 is not a declaration |
S1 = id : T [ = E ] | |
A ` ( S2 ; ... ; Sn ) : Tn |
A + { id : T } ` ( S2 ... Sn ) : Tn | |
|
| |
A ` ( S1 ; ... ; Sn ) : Tn |
A ` ( S1 ; ... ; Sn ) : Tn |
Now we will consider type-checking some more interesting statements. A statement that is a variable declaration has type unit if the variable is not initialized by the declaration, or the type (and value) of the variable if it is initialized:
A ` E : T | ||
| ||
A ` id : T : unit | A ` id : T = E : T |
There are two kinds of assignment statements: assignments to ordinary variables, and assignments to array elements:
T not a function type | A ` E1 : array[T] | |
id : T 2 A | A ` E2 : int | |
A ` E : T | A ` E3 : T | |
|
| |
A ` id = E : T | A ` E1 [ E2 ] = E3 : T |
Now let us consider Iota's statements for control flow. The
if
statement requires that its two arms (if present) have the same
type.
A ` E : bool | ||
A ` E : bool | A ` S1 : T | |
A ` S : T | A ` S2 : T | |
|
| |
A ` if (
E ) S : unit |
A ` if (
E ) S1 else
S2 : T |
If the two arms have different types, the whole if
statement has
type unit:
A ` E : bool |
A ` S1 : T1 |
A ` S2 : T2 |
T1 not equal to T2 |
|
A ` if (
E ) S1 else
S2 : unit |
An alternative strategy for formalizing the type-checking of if
statements is to consider all well-typed statements to have type unit
in addition to what we would consider to be their
principal types. With this formalization, we do not need the previous
rule for arms which have different types, because the following general
inference rule ensures that both statements S1 and S2 have
at least the type unit in common:
A ` S : T |
|
A ` S : unit |
Using this rule, an if
statement always can be given type unit
regardless of the types of the two arms. Note that with
this rule, it is possible for a statement to be given two types: its principal
type, and the type unit. The principal type for an expression
or statement is its most specific type; in a language in which an expression can
have more than one type, it is important that it have a principal type. In this
case, the ambiguity about its type, because a statement or expression is well-typed as long as there is
any way to prove it well-typed. Practically it is not a problem,
either, because the type checker can always determine the principal type of a statement
and then relax its type to some other type (i.e., unit) if that is what the containing
context demands.
A while
statement always has the type unit, and
requires that its test expression has type bool. If the while
statement has the loop condition true
, then it can never terminate
normally and may be given the type none. (This also allows a function to
end with an infinite loop.)
A ` E : bool | ||||
A ` S : T | A ` S : T | |||
|
(while) |
|
(infinite loop) | |
A `
while ( E ) S : unit |
A `
while ( true ) S : none |
A return
statement is checked by ensuring that the return type
matches the declared return type of the containing function; this declared
return type is stored in the symbol table under the identifier
return. A return
statement has the type none
rather than the type unit, because this statement transfers control to a
location other than the context in which it occurs. If Iota supported a break
statement, it would be typed similarly. Note that this means that statements
such as int x = (return "hello")
will not type-check even
though they cannot generate dynamic type errors; thus, the rules are overly
conservative.
A ` E : T | ||
T is not unit | ||
return : unit 2 A | return : T 2 A | |
|
| |
A ` return :
none |
A ` return
E : none |