CS412/413 Spring 2000 Introduction to Compilers and Translators

Iota+ Language Definition


In the fifth programming assignment, we will be extending our Iota compilers to support a more sophisticated language called Iota+.  This language extends Iota by adding support for object-oriented programming.  It is backwards compatible with Iota.  As with the Iota language specification given earlier, if you see ambiguity in this specification, you have the flexibility to resolve it in the manner you wish as long as you can justify the reasonableness of your decision.  However, you should contact the course staff to resolve ambiguities.

At the high level, Iota and Iota+ are similar: a program is composed of some number of modules that are described by interfaces.  However, both modules and interfaces support additional kinds of declarations.  New expression and statement forms also are supported in Iota+.  Rather than describe the entire Iota+ language from scratch, this document describes only the additions and changes to the Iota language.

Examples

PrettyPrinter (180 lines)
New io and file interfaces

Language grammar and overview

We will take the same approach as in the earlier language definition: grammatical productions are intermixed with descriptions of the new language features they correspond to.

Modules and definitions

As in Iota, a module implementation file starts with a declaration of what external components it accesses (uses), then continues with a series of definitions.  uses declarations are needed for all components referenced from external interface files, including interface types and constants.  The syntax for a uses declaration is the same as in Iota.

A module contains one or more definitions of variables or functions. The definitions may not be separated by semicolons.  All top-level definitions are visible throughout the module; there is no need for forward declarations.  Two definitions may not use the same name; nor may they conflict with any identifier defined by a uses declaration.

In addition to definitions of functions and variables, as in Iota, new kinds of definitions are supported: interface type definitions, class definitions, and constant definitions.  In addition, variable definitions are extended to support initialization expressions that are evaluated when the program starts.  The initialization expressions within a module are evaluated in the order in which they occur within the module.  When a program contains more than one module, the initialization expressions of the modules are evaluated in some order that is specified at link time.  However, all of the initialization expressions of one module in the order are evaluated before the initialization expressions of the next module in the order are begun.

defn ::= formal [ = expr ]
    |    fn_header = expr
    |    id = constant
    |    interface-type
    |    class

Interfaces and declarations

The syntax of an interface also is extended to support new kinds of declarations.  In addition to variable and function declaration, an interface supports interface type declarations and constant definitions.

decl ::= formal
    |    fn_header
    |    interface-type
    |    id = constant

fn_header ::= id ( [ formals ] ) [ : type ]

An interface also may begin with a series of uses declarations, which were not permitted in Iota.

interface ::= [ uses ] decl*

These declarations are useful because they allow an interface to talk about interface types and symbolic constants defined in the interfaces of other modules.  Each compilation unit in the language -- module or interface -- may have a set of uses declarations.  The uses declarations of other compilation units are not in force in a compilation unit unless it explicitly contains these declarations.  The only exception to this rule is that the identifiers introduced by uses declarations of the interface to a module are in scope in the body of the module as well, just as the interface types and constants defined by the module's interface are in scope.  This rule ensures that a module is consistent with its own interface.  

The uses declarations in all the interfaces in a program define a dependency graph among these interfaces, in the sense that one interface depends on another if it uses an identifier from the second.  It is a static error for the graph of interface dependencies in the program to contain a cycle.  For example, two interfaces may not each contain a uses declaration that mentions an identifier in the other interface. (Implementation of this rule is optional.)

The identifiers introduced by a uses declaration in one interface may also be referenced by a uses declaration in another interface. (This is optional for Spring 2000.) For example, if interface foo contains a declaration uses bar.x, bz = bar.z, then another interface baz can import the identifiers foo.x and foo.bz by including the declaration uses foo.x, foo.bz.

Both interfaces and modules may contain definitions of constants.  When an identifier is declared to have a constant value, the compiler treats that identifier everywhere as representing that value, exactly as if the constant expression were used in place of the identifier.  Constants only may be integers, booleans, strings, or the new constant null. The type of the identifier is not declared explicitly; it is determined from the constant. Note that constant identifiers may not be assigned to!

Interface type and class declarations

In addition to the built-in types supported by Iota, Iota+ supports two new kinds of types: class types and interface types.  Both class types and interface types are denoted by using their declared names, or in the case of interface types referenced through a uses declaration, the appropriate name introduced by the declaration.  An interface type defined in an interface file may be referenced by both interface files and module files; however, a uses declaration is necessary if the referring module is different from the defining module.  A class or interface type may be used anywhere that a built-in type may be used; in particular, it may be used as a variable type, a function or method argument type, a return type, and as a parameter to an array type.

type ::= id
    |    object
    |    int
    |    string
    |    bool
    |    array [ type ]

A class or interface type supports various operations that are described in the declaration of the type.  The compiler permits exactly the operations described in the type declaration and the declarations of any supertypes of the type.  An interface type declaration permits only the declaration of object methods:

interface-type ::= interface id [ extends id ] { fn_header* }

The method declarations look exactly like function declarations at the top level of an interface; however, they are methods rather than functions.  The object of the interface type is implicitly an argument to the method.

An interface type I optionally may declare that it extends another interface type I'.  In this case, the signature of each method of I must conform to the signature of the method of the same name in I' (if any), and similarly in every other interface that it transitively extends.  Unlike in Java, different methods of interface and class types never have the same name; method overloading is not permitted. If a subtype I of the interface type I' has a method m with the same name as some method m' declared in I', then the signature of m must conform to the signature of m'.  Conformance requires that the number of arguments in m and m' must be the same.  In addition, the type of each argument in m must be the same as, or a supertype of, the argument in the corresponding position in m'.  The names of arguments need not be identical in the two interface types.  The return type of m must the same as or a subtype of the return type of m'.  If m' has no return type, then m may declare any return type or simply not declare a return type.

Just as an interface type declaration is similar to an entire Iota interface, a class definition is similar to an entire Iota module.  A class definition may contain definitions of fields (instance variables) and methods.  Unlike in Java, fields and methods do not have visibility modifiers such as public and private; such modifiers are unnecessary because interfaces determine visibility.  Components of a module that are declared in the module's interface are public, and other components are automatically private.  Note that unlike in Java, classes do not have final fields or methods; nor are there static fields or methods.  In Iota+, the role of static fields and methods is fulfilled by module variables and functions, which are not attached to a specific class.

class ::= class id [ implements id ] { classdefn* }
classdefn ::= formal
     |    fn_header = expr

When class C declares that it implements an interface type I, it must implement every method that I declares.  The signatures of class methods must conform to the signatures of the interface that it extends, transitively.  This checking is performed in exactly the manner described above for interface subtype declarations. Neither a class nor an interface may introduce a new member (field or method) with the same name as a member of a supertype.  However, the signature of a supertype method may be refined as described above.

The type system and subtyping

The type system of Iota+ is more complex than that of Iota.  In addition to the new class types and interface types that may be declared by programmers and the pseudo-types unit and none, we introduce the type Null to represent the type of the expression null, representing a null object reference.  The new type object is the supertype of all class and interface types, and of string and array types. As in Iota, the pseudo-type none is used to represent the type of statements or expressions that cannot terminate normally.

All types in the Iota+ type system are considered formally to be subtypes of unit.  This subtype relation makes sense because a containing context expecting a unit will discard whatever value it receives; therefore, every value type may be used as a unit.

As in Java, null references are considered to be members of every object type. Therefore, the type Null is considered to be a subtype of every class or interface type, and also a subtype of the built-in types string and array[T].  However, it is not a subtype of the built-in types bool and int.

The pseudo-type none represents non-termination, and is considered to be the subtype of every other type in the type system, including Null.  It is the bottom of the pre-order (actually, lattice) on types defined by the subtype relation.  The top of the pre-order is the type unit.  The type none is the universal subtype because an expression that does not terminate normally will never produce a value that its containing context does not expect.  For example, the return statement is given the type none, since it does not terminate with an ordinary value.  Consider the following statement:

x: int = ( return "foo" )

This statement cannot result in a run-time type error, because no value ever will be assigned to x.

Iota+ allows expressions of the object type to be used in conditional expressions.  We thus need an additional pseudo-type for expressions that can be either bool or object.  This pseudo-type will be denoted bool _ object.

Note that Iota+ gives no way for the programmer to talk about the types unit, bool _ object, Null, and none.  These types are part of the type system but may not be used in declarations.  Their names are not keywords or reserved words and can thus potentially be used as legal identifiers in an Iota+ program.  However, the type object may be denoted in a program using the keyword object, just as with the types int and bool.

The subtype hierarchy of Iota+ just described is diagrammed as follows:

Method definition

A method is defined as an expression that is evaluated at the time that the function is called, initializing the variables of the formal arguments with the actual values passed. As in Java, when a value of class type or interface type is passed, the called method receives references to the same objects that the calling code passed.  A method body differs from a function body in that the additional implicit argument this, representing the receiver object, is in scope within the method body.  Unlike the other arguments, this is not an assignable variable.

The syntax for formal arguments of both functions and methods is extended to allow a more compact denotation of multiple formals arguments in a row sharing the same type.  For example, the argument list x,y: int, z: bool is equivalent to the argument list x:int, y:int, z:bool.

formals ::= formals , formal 
     |             formal
formal ::= ids : type

ids ::= id     |    ids , id

Expressions

Iota+ adds several new kinds of expressions to Iota.

Expressions may take one of several new forms.

expr ::= ...
     |  new type

primary ::= ... 
     |    this
     |    primary . function_call
     |    cast ( expr , type )

variable ::= ...
    |    primary . id

constructor ::= new type [ expr ] ( expr )

The expression this may be used within method bodies, as described earlier, to denote the receiver object.  In an ordinary function body, it is a static error to use this expression.  Note that the identifier this is a keyword and cannot be used as the name of an ordinary variable.

The expression new C creates a new object of the class C, where C is the name of the class.  When a new object of a class is created, all of its fields are initialized to their default values.  For all types that are supertypes of Null, the default value of the type is null.  For integers, the default value is 0, and for booleans, it is false.  Unlike in Java, classes have no constructors. By convention, classes declare a method init that performs proper initialization of an object, establishing necessary representation invariants.  The pretty-printer sample code above contains some examples of this convention.

The dot operator (.) is used to indicate an access to one of the members of an object, either a field or a method.  These accesses operate as in Java.  The static type of the left-hand-side of the dot operator (the receiver) is used to determine whether the field access or method invocation is legal.  A field access can be legal only if the type of the receiver is a class type, because interface types have no fields.  In the case of a method invocation, the receiver type may be either a class type or an interface type.  The arguments passed to the method must agree with the types of formal arguments of the method, as declared in the static type of the receiver, or, if the static type of the method does not declare the method, in its closest (lowest) supertype that declares the method.

Within a method, the fields and methods of the receiver object are automatically placed in scope and can be accessed as if they were module variables or functions.  For example, if the class of the receiver object contains a field named a, then the expressions this.a and a are equivalent within that method body. No scoping conflict between fields and methods of the object and other identifiers is permitted.  The names of fields of a class must differ from the names of all visible variables, functions, classes and interface types.

The binary operators are extended to complete the usual set of six relational operators, with the usual semantics.  Note that booleans can be compared using the relational operators, and that false is consider to be less than true. Strings are compared lexicographically, unlike in Java (This is true in Iota as well). Objects of array, class, and interface types only support the == and != relational operators, which implement pointer equality: two objects are considered equal only if they are exactly the same object.  Two values may be tested for equality or inequality even if they do not have exactly the same static type.  Two values may be compared as long as they have a common supertype that is neither unit or none or bool _ object.  This rule allows a value of static type object to be compared with a value of static type string; in this case, the two values are equal only if they are exactly the same object.

The pseudo-function cast provides a way to change the type of an object reference.  It takes two arguments: an expression and a new type.  The first argument is the expression whose value is to be cast; the second argument is the desired type of the cast expression.  This expression requires that both the static type of the expression being cast and the desired target type are subtypes of object.  The validity of the cast expression is checked at runtime: invalid casts evaluate to the null object.

binary_op ::= ...     |     <=      |    >=     |     !=

The new constant expression null returns the null object, whose type is Null, as described earlier.  Since Null is a subtype of every object type, the null object is a member of every object type.  The effect of this membership is that every access to an object member (field or method) must check first whether the object is the null object.  If so, the program halts immediately with an appropriate error message.  This check also is required for all string and array operations, such as array and string index expressions and string concatenation.

constant ::= ...
         |    null

Array and string index expressions are required in Iota+ to be checked at run time to ensure that the index is not out of bounds.

In certain contexts where a value of type bool is expected in Iota, a value of any subtype of object may be used instead.  The value is automatically converted to a boolean in the following way: null is converted to false, and all other values are converted to true.  The contexts where an automatic conversion is performed are the following: the test expressions of if and while statements, and the arguments to the logical operators !, & and |.  For example, the following two if statements are equivalent:

if (o != null) ...
if (o) ...

Note that there is no automatic conversion from integers to booleans, as in the C language.

Statements

Iota+ adds a few more statement forms to Iota.  The new increment and decrement statements increase or decrease the value of an integer variable by one. Unlike in C and Java, the value of the statement is the result of the increment. The statement v++ is not precisely equivalent to the statement v=v+1, however, because the location of the variable v is computed only once. For example, the statement a[(i++)]++ is equivalent to (i++; a[i]++) because the index expression (i++) is evaluated just once.

stmt ::= ... 
     |    variable ++
     |    variable --
     |    break

The new statement break terminates the closest enclosing while statement, just as in Java.  A break statement may occur only within a while statement.

The syntax of assignment is extended to permit multiple assignment within a single statement.  An assignment may take the form a = b = c = E, where a-c are variables, and E is the expression to be assigned to each of the variables.  The production stmt ::= variable = expr is replaced by a new production stmt ::= assignment.  The non-terminal assignment is defined as follows:

assignment ::= variable = expr
     |    formal = expr
     |    variable = assignment
     |    formal = assignment

Note that assignment is a right-associative operator: first, the expression is evaluated, then each of the expressions denoting variables are evaluated in turn from right to left, with each assignment completed before the evaluation of the previous variable.

In Iota, a local variable declaration may include an initializing expression.  In   Iota+, multiple local variables may be declared in the same statement, because of the redefinition of the formal non-terminal.  For example, the statement i,j: int = 1 introduces two variables i and j, and initializes them both to 1.  (Note that this differs from C and Java.) The right-hand side expression is evaluated only once, not once for each variable being initialized. The right-hand side expression may be an assignment itself, and the declaration and initialization of new variables may serve as assignments as well. Note that a statement may still consist of a variable declaration without initialization, as in Iota: the production stmt ::= id : type [ = expr ] is replaced by the production stmt ::= formal

Certain statements in Iota+ have the type none: break statements and return statements.  In addition, an if statement produces the type none if both of its arms have that type, as in the following example:

if (b) return 0
else return 1

However, the type of this if statement results simply from the application of the usual rule for checking if statements statically.

A statement whose type is none may not be followed by any other statement, since that following statement could not be executed.  In other words, if such a statement occurs within a block of statements, it must occur at the end of the block.

This typing has the advantage that it permits functions to end with return statements, which avoids confusing programmers used to C or Java syntax.  For example, the function

inc(x: int): int = ( return x+1 )

is legal because the type of the function body is none, which is a subtype of int, and the return statement returns the proper type.

Keywords

Iota+ introduces a number of new keywords that have been mentioned in this document.  The keywords of the language include all the keywords of Iota ( array, bool, false, if, int, length, return, string, true, uses, while), and the new keywords mentioned here ( break, cast, new, null, object, this ).

Note also that Iota+ adds new multi-character non-alphabetic tokens, such as ++, --, <=, >=, !=.

Summary of new features

The following list summarizes all the new features of Iota+, for convenience in checking whether the compiler properly implements the language.

Standard libraries

Iota+ provides the same two standard modules for programming: io and conv.  However, the io module is extended with some additional functionality.  It also adds new interfaces for input and output stream objects, and provides objects stdin and stdout that may be used to read and write data from and to the standard input and output device.  The new declarations in the interface are as follows:

interface OutputStream {
    print(s: string)
        // output the string s
    putc(c: int)
        // output the character whose ASCII value is c
    flush()
        // make sure any pending output is flushed
}

interface InputStream {
    readln(): string
        // read a line from the input
    getc(): int
        // read a single character from the input
    eof(): bool
        // return whether the end of file has been reached on the input
}

stdout: OutputStream
stdin: InputStream

In addition, a new file module is provided for file I/O:

uses io.InputStream, io.OutputStream

interface FileOutput extends OutputStream {
    close()
}
interface FileInput extends InputStream {
    close()
}

read(filename: string): FileInput
  // Open the named file for reading.  Return null on failure.
create(filename: string): FileOutput
  // If the file does not exist, create it and return a FileOutput that
  // will write to the file.  If the file cannot be created, return null.
append(filename: string): FileOutput
  // If the file can be opened, return a FileOutput that appends to it.
  // Otherwise, return null.
remove(filename: string): bool
  // Remove the file and return true if successful