CS 412/413 Spring 2001 Introduction to Compilers

Iota⁺ Language Definition

The Iota⁺ language extends Iota by adding support for object-oriented programming. It is backwards compatible with Iota. As with the Iota language specification given earlier, if you see ambiguity in this specification, you have the flexibility to resolve it in the manner you wish as long as you can justify the reasonableness of your decision. However, you should contact the course staff to resolve ambiguities.

At the high level, Iota and Iota⁺ are similar: a program is composed of some number of modules that are described by interfaces. However, both modules and interfaces support additional kinds of declarations. New expression and statement forms also are supported in Iota⁺. Rather than describe the entire Iota⁺ language from scratch, this document describes only the additions and changes to the Iota language.

We also have a collection of Iota⁺ example programs.

Language grammar and overview

We will take the same approach as in the earlier language definition: grammatical productions are intermixed with descriptions of the new language features they correspond to.

Modules and definitions

As in Iota, a module implementation file starts with a declaration of what external components it accesses (uses), then continues with a series of definitions. uses declarations are needed for all components whose names occur in the module, including interface types and constants. The syntax for a uses declaration is the same as in Iota.

A module contains one or more definitions of variables or functions. The definitions may not be separated by semicolons. All top-level definitions are visible throughout the module; there is no need for forward declarations. Two definitions may not use the same name; nor may they conflict with any identifier defined by a uses declaration.

In addition to definitions of functions and variables, as in Iota, new kinds of definitions are supported: interface type definitions, class definitions, and constant definitions. In addition, variable definitions are extended to support initialization expressions that are evaluated when the program starts. The initialization expressions within a module are evaluated in the order in which they occur within the module. When a program contains more than one module, the initialization expressions of the modules are evaluated in some order that is specified at link time. However, all of the initialization expressions of one module in the order are evaluated before the initialization expressions of the next module in the order are begun.

defn ::= formal [ = expr ]
    |    fn_header = expr
    |    id = constant
    |    interface-type
    |    class

Interfaces and declarations

The syntax of an interface also is extended to support new kinds of declarations. In addition to variable and function declaration, an interface supports interface type declarations and constant definitions.

decl ::= formal
    |    fn_header
    |    interface-type
    |    id = constant

fn_header ::= id ( [ formals ] ) [ : type ]

An interface also may begin with a series of uses declarations, which were not permitted in Iota.

interface ::= [ uses ] decl^*

These declarations are useful because they allow an interface to talk about interface types and symbolic constants defined in the interfaces of other modules. Each compilation unit in the language -- module or interface -- may have a set of uses declarations. The uses declarations of other compilation units are not in force in a compilation unit unless it explicitly contains these declarations. For example, if a module uses an interface A that uses an interface B, identifiers in B are not automatically in scope in the module. The only exception to this rule is that the identifiers introduced by uses declarations of the interface to a module are in scope in the module as well, just as the interface types and constants defined by the module's interface are in scope. This rule ensures that a module is consistent with its own interface.

Implicit reference. A module must declare that it uses any identifier that it refers to by name. However, it need not declare that it uses a type that arises as the type of some expression, unless that type is named explicitly. This situation can occur with values of interface types. For example, consider the following interface named A.ii:

interface I {
    clone(): I
}
make() : I
consume(x: I)

Suppose a module contains a uses declaration

uses A.make,
A.consume

, but does not explicitly use A.I. Within the module, it will not be legal to name the type A.I. It will be legal to write the expression consume(make().clone()) -- even though type-checking that expression requires that the compiler understand the type A.I -- because the expression does not name the type A.I explicitly.

Cycles. The uses declarations in all the interfaces in a program define a dependency graph among these interfaces, in the sense that one interface depends on another if it uses an identifier from the second. It is a static error for the graph of interface dependencies in the program to contain a cycle. For example, two interfaces may not each contain a uses declaration that mentions an identifier in the other interface.

Reimporting. The identifiers introduced by a uses declaration in one interface may also be referenced by a uses declaration in another interface. For example, if interface foo contains a declaration uses bar.x, bz = bar.z, then another interface baz can import the identifiers foo.x and foo.bz by including the declaration uses foo.x, foo.bz. If an interface type is imported under multiple names, the compiler treats the different names as denoting the same type.

Constants. Both interfaces and modules may contain definitions of constants. When an identifier is declared to have a constant value, the compiler treats that identifier everywhere as representing that value, exactly as if the constant expression were used in place of the identifier. Constants only may be integers, booleans, strings, or the new constant null. The type of the identifier is not declared explicitly; it is determined from the constant. Note that it is illegal to assign to a constant identifiers!

Interface type and class declarations

In addition to the built-in types supported by Iota, Iota⁺ supports two new kinds of types: class types and interface types. Both class types and interface types are denoted by using their declared names -- or in the case of interface types that are referenced through a uses declaration, by the name introduced in the declaration. An interface type defined in an interface file may be referenced by both interface files and module files. A class or interface type may be used anywhere that a built-in type may be used; in particular, it may be used as a variable type, a function or method argument type, a return type, and as a parameter to an array type.

type ::= id
    |    object
    |    int
    |    string
    |    bool
    |    array [ type ]

A class or interface type supports various operations that are described in the declaration of the type. The compiler permits exactly the operations described in the type declaration and the declarations of any supertypes of the type. An interface type declaration permits only the declaration of object methods:

interface-type ::= interface id [ extends id ] { fn_header^* }

The method declarations look exactly like function declarations at the top level of an interface; however, they are methods rather than functions. The object of the interface type is implicitly an argument to the method.

An interface type I optionally may declare that it extends another interface type I'. In this case, the signature of each method of I must conform to the signature of the method of the same name in I' (if any), and similarly in every other interface that it transitively extends. Unlike in Java, different methods of interface and class types never have the same name; method overloading is not permitted. If a subtype I of the interface type I' has a method m with the same name as some method m' declared in I', then the signature of m must conform to the signature of m'. Conformance requires that the number of arguments in m and m' must be the same. In addition, the type of each argument in m must be the same as, or a supertype of, the argument in the corresponding position in m'. The names of arguments need not be identical in the two interface types. The return type of m must the same as or a subtype of the return type of m'. If m' has no return type, then m may declare any return type or simply not declare a return type.

Just as an interface type declaration is similar to an entire Iota interface, a class definition is similar to an entire Iota module. A class definition may contain definitions of fields (instance variables) and methods. Unlike in Java, fields and methods do not have visibility modifiers such as public and private; such modifiers are unnecessary because interfaces determine visibility. Components of a module that are declared in the module's interface are public, and other components are automatically private. Note that unlike in Java, classes do not have final fields or methods, nor are there static fields or methods. In Iota⁺, the role of static fields and methods is fulfilled by module variables and functions, which are not attached to a specific class.

class ::= class id [ implements id ] { classdefn^* }
classdefn ::= formal
| fn_header = expr

When class C declares that it implements an interface type I, it must implement every method that I declares. The signatures of class methods must conform to the signatures of the interface that it extends, transitively. This checking is performed in exactly the manner described above for interface subtype declarations. Neither a class nor an interface may introduce a new member (field or method) with the same name as a member of a supertype. However, the signature of a supertype method may be refined as described above. Field types may not be refined since fields are mutable.

The type system and subtyping

The type system of Iota⁺ is more complex than that of Iota. In addition to the new class types and interface types that may be declared by programmers and the type unit, we introduce several new types: none, Null, object, and bool _ object. The type none is used to represent the type of statements or expressions that cannot terminate normally. The type Null is the type of the expression null—the null object reference. The type object is the supertype of all class and interface types, and of string and array types. The type bool _ object represents a value that is either of type bool or object.

The subtype relation on types is a preorder, shown in the following diagram:

All types in the Iota⁺ type system except none are subtypes of unit. This subtype relation makes sense because a containing context expecting a unit will discard whatever value it receives; therefore, every value may be used as a unit. As in Java, null references are considered to be members of every object type. Therefore, the type Null is a subtype of every every class or interface type, and also a subtype of the built-in types string and array[T]. However, it is not a subtype of the built-in types bool and int. The new type none represents non-termination or termination that does not return control to its containing context. It is not a subtype of any other type, because it does not provide a value. (It would be sound to consider none a subtype of all other types, including unit; however, it is not considered a subtype in Iota⁺ in order to allow the compiler to catch some programming errors.) For example, the return statement is given the type none, since it does not return control to its containing context. Consider the following statement:

x: int = if (b) ( return "message" ) else 0;

This statement cannot result in a run-time type error, because no value ever will be assigned to x if the first arm is executed. It is legal in Iota⁺ even though it is illegal in Iota.

Iota⁺ allows expressions of the type object to be used in conditional expressions. We thus need an additional type for expressions that can be either bool or object. This type is written as bool _ object.

Note that Iota⁺ gives no way for the programmer to talk explicitly about the types unit, bool _ object, Null, and none. These types are part of the type system but may not be used in declarations. Therefore, a variable never has one of these types. Their names are not keywords or reserved words and may be used as legal identifiers. However, the type object may be denoted in a program using the reserved identifier object, just like the types int and bool.

Method definitions

A method is defined as an expression that is evaluated at the time that it is called, initializing the variables of the formal arguments with the actual values passed. As in Java, when a value of class type or interface type is passed, the called method receives references to the same objects that the calling code passed. A method body differs from a function body in that the additional implicit argument this, representing the receiver object, is in scope within the method body. Unlike the other arguments, this is not an assignable variable.

The syntax for formal arguments of both functions and methods is extended to allow a more compact denotation of multiple formal arguments in a row that share the same type. For example, the argument list x,y: int, z: bool is equivalent to the argument list x:int, y:int, z:bool.

formals ::= formals , formal
| formal
formal ::= ids : type

ids ::= id | ids , id

Expressions

Iota⁺ adds several new kinds of expressions to Iota, with the following forms:

primary ::=    ...
     |    this
     |    primary . function_call
     |    cast ( expr , type )
     |    new type

variable ::=    ...
    |    primary . id

constructor ::= new type [ expr ] ( expr )

binary_op ::=    ...     |     <=      |    >=     |     !=

constant ::=     ...
         |    null

The expression this may be used within method bodies, as described earlier, to denote the receiver object. In an ordinary function body, it is a static error to use this expression. Note that the identifier this is a keyword and cannot be used as the name of an ordinary variable.

The expression new C creates a new object of the class C, where C is the name of the class. When a new object of a class is created, all of its fields are initialized to their default values. For all types that are supertypes of Null, the default value of the type is null. For integers, the default value is 0, and for booleans, it is false. Unlike in Java, classes have no constructors. By convention, classes declare a method init that performs proper initialization of an object, establishing necessary representation invariants. The pretty-printer sample code above contains some examples of this convention.

The dot operator (.) is used to indicate an access to one of the members of an object, either a field or a method. These accesses operate as in Java. The static type of the left-hand-side of the dot operator (the receiver) is used to determine whether the field access or method invocation is legal. A field access can be legal only if the type of the receiver is a class type, because interface types have no fields. In the case of a method invocation, the receiver type may be either a class type or an interface type. The arguments passed to the method must agree with the types of formal arguments of the method, as declared in the static type of the receiver, or, if the static type of the method does not declare the method, in its closest (lowest) supertype that declares the method.

Within a method, the fields and methods of the receiver object are automatically placed in scope and can be accessed as if they were module variables or functions. For example, if the class of the receiver object contains a field named a, then the expressions this.a and a are equivalent within that method body. No scoping conflict between fields and methods of the object and other identifiers is permitted. The names of fields of a class must differ from the names of all visible variables, functions, classes and interface types.

The binary operators are extended to complete the usual set of six relational operators, with the usual semantics. Note that booleans can be compared using the relational operators, and that false is considered to be less than true. Strings are compared lexicographically, unlike in Java. Objects of array, class, and interface types only support the == and != relational operators, which implement pointer equality: two objects are considered equal only if they are exactly the same object. Two values may be tested for equality or inequality even if they do not have exactly the same static type. Two values may be compared as long as they have a common supertype that is neither unit or none or bool _ object. This rule allows a value of static type object to be compared with a value of static type string; in this case, the two values are equal only if they are exactly the same object. This implies that null is not equal to any string. Also, we define null to be equal to itself.

Thus, two objects that turn out to be strings at run time may be compared lexicographically by comparing them at the type string; their pointers may be compared by comparing them at the type object, as in the following example:

s1: string = "hello"; o1: object = s1;
s2: string = "hello"; o2: object = s2;
s1 == s2; // true
o1 == o2; // false

The cast operation provides a way to change the type of an object reference. It takes two arguments: an expression and a new type. The first argument is the expression whose value is to be cast; the second argument is the desired type of the cast expression. This expression requires that both the static type of the expression being cast and the desired target type are subtypes of object. The validity of the cast expression is checked at runtime: invalid casts evaluate to the null object. Note that since string and array[T] are subtypes of object it is possible to write casts involving these types; for implementation reasons, these casts are treated somewhat differently from other casts. It is legal to cast successfully from an object whose actual (run-time) type is a string or array type to that same actual type. However, it is an unchecked run-time error for a cast to or from a string or array type to fail: the behavior of the program is unspecified. That is, it is an error to cast to string or array[T] if the object's actual type is not that type. It is also an unchecked run-time error to cast a string or array object to any type other than its run-time type.

The new constant expression null returns the null object, whose type is Null, as described earlier. Since Null is a subtype of every object type, the null object is a member of every object type. The effect of this membership is that every access to an object member (field or method) must check first whether the object is the null object. If so, the program halts immediately with an appropriate error message. This check also is required for all string and array operations, such as array and string index expressions and string concatenation. Array and string index expressions are also required in Iota⁺ to be checked at run time to ensure that the index is not out of bounds.

In certain contexts where a value of type bool is expected in Iota, a value of any subtype of object may be used instead. The value is automatically treated as a boolean in the following way: null is treated as false, and all other values are treated as true. The contexts where an automatic conversion is performed are the following: the test expressions of if and while statements, and the arguments to the logical operators !, & and |. For example, the following two if statements are equivalent:

if (o != null) ...
if (o) ...

Note that there is no automatic conversion from integers to booleans, as there is in the C language.

Statements

Iota⁺ adds a few more statement forms to Iota. The new increment and decrement statements increase or decrease the value of an integer variable by one. Unlike in C and Java, the value of the statement is the result of the increment. The statement v++ is not precisely equivalent to the statement v=v+1, however, because the location of the variable v is computed only once. For example, the statement a[(i++)]++ is equivalent to (i++; a[i]++) because the index expression (i++) is evaluated just once.

stmt ::= ...
     |    variable ++
     |    variable --
     |    break

The new statement break terminates the closest enclosing while statement, just as in Java. A break statement may occur only within a while statement.

The syntax of assignment is extended to permit multiple assignment within a single statement. An assignment may take the form a = b = c = E, where a-c are variables, and E is the expression to be assigned to each of the variables. The production stmt ::= variable = expr is replaced by a new production stmt ::= assignment. The non-terminal assignment is defined as follows:

assignment ::= variable = expr
| variable = assignment

An assignment contains a sequence of variables , each of which is to be assigned the value returned by the expression. Note that assignment is right-associative: first, the expression is evaluated, then each of the expressions denoting variables are evaluated in turn from right to left, with each assignment completed before the evaluation of the previous variable.

In Iota, a local variable declaration may include an initializing expression. In Iota⁺, multiple local variables may be declared in the same statement, because the production stmt ::= id : type [ = expr ] is replaced by the production stmt ::= var_decl, with var_decl defined as follows:

var_decl ::= formal = expr
| formal = assignment
| formal

For example, the statement

i,j:
int = 1

introduces two variables i and j, and initializes them both to 1. (Note that this semantics differs from C and Java.) The right-hand side expression is evaluated only once, not once for each variable being initialized. The right-hand side expression may be an assignment itself, as in the statement i,j: int = k = m = 0.

Certain statements in Iota⁺ have the type none: break and return statements. Certain kinds of while statements are also given this type: statements of the form while (true) S, where S does not contain any break statements that exit the loop. In each of these cases, the statement following the statement in question would not receive control, and therefore the statement is given the type none. Other while statements are given the type unit, just as in Iota.

If an if statement has two arms, one with some type T and other with the type none, the type of the statement is the type T. For example, the following statements have the type bool and none, respectively.

if (b) return "Hello"
else false;

if (b) return 0
else   return 1

(One way to interpret this rule is that in this particular context, none is considered to be a subtype of all other types.) To help the Iota⁺ programmer avoid programming mistakes, none is not ordinarily considered a subtype of other types; in particular, a statement whose type is none may not be followed by any other statement, since that following statement could not be executed. In other words, if such a statement occurs within a block of statements, it must occur at the end of the block.

This typing has the advantage that it permits functions to end with return statements, making programming in Iota⁺ less confusing for programmers accustomed to C or Java. In addition, the body of a function may have the type none in addition to its declared type; this is the other context in which none is considered a subtype of all types T. For example, the following function definitions are both legal in Iota⁺, and equivalent:

inc1(x: int): int = ( return x+1 )
inc2(x: int): int = x+1

The first is legal in Iota⁺ because the type of the function body is none, and the return statement also returns an expression of the declared type int.

Keywords

Iota⁺ introduces a number of new keywords that have been mentioned in this document. The keywords of the language include all the keywords of Iota (int, string, bool, array, uses, if, else, while, return, new, length, true and false), and the new keywords mentioned here ( break, cast, class, extends, implements, interface, null, object, this ).

Iota⁺ also adds new multi-character non-alphabetic tokens, such as ++, --, <=, >=, !=.

Summary of new features

The following list summarizes all the new features of Iota⁺, for convenience in checking whether the compiler properly implements the language.

uses in interface with acyclic constraint
Interface type definitions in interface file
Class definitions in module file
Initialization expressions for module variables
Constant definitions in both interfaces and modules
Declaration of fields and methods within a class
Declaration of methods within an interface
extends clause for interfaces types
implements clause for classes
Full contra/covariant subtyping for methods
New none, object, Null, and bool _ object types
null constant/null checking
new C expression
Default values for all fields and variables: 0, false, or null
Convention of init() method
this expression
Array and string index bounds checking
cast expression
Use of object types as booleans by automatic conversion
Multiple variables with same type in formals list
Operator "." to access fields or methods
Fields and methods of receiver object in scope within methods
Complete relational operators
Increment/decrement statements
Break statement
Multiple assignment (a = b = c)
Typing of return and break as none
Typing of while(true) S as none
No statement following statements typed as none

Standard libraries

Iota⁺ provides the same two standard modules for programming: io and conv. However, the io module is extended with some additional functionality. It also adds new interfaces for input and output stream objects, and provides objects stdin and stdout that may be used to read and write data from and to the standard input and output device. The new declarations in the interface are as follows:

interface OutputStream {
    print(s: string)
        // output the string s
    putc(c: int)
        // output the character whose ASCII value is c
    flush()
        // make sure any pending output is flushed
}

interface InputStream {
    readln(): string
        // read a line from the input
    getc(): int
        // read a single character from the input
    eof(): bool
        // return whether the end of file has been reached on the input
}

stdout: OutputStream
stdin: InputStream

In addition, a new file module is provided for file I/O:

uses io.InputStream, io.OutputStream

interface FileOutput extends OutputStream {
    close()
}
interface FileInput extends InputStream {
    close()
}

read(filename: string): FileInput
  // Open the named file for reading.  Return null on failure.
create(filename: string): FileOutput
  // If the file does not exist, create it and return a FileOutput that
  // will write to the file.  If the file cannot be created, return null.
append(filename: string): FileOutput
  // If the file can be opened, return a FileOutput that appends to it.
  // Otherwise, return null.
remove(filename: string): bool
  // Remove the file and return true if successful

The following modules have also been made available: