io.ta (n) 1: the 9th letter of the Greek alphabet (i) 2: an infinitesimal particle or amount
Iota is a simple language for which we will be writing compilers in CS412/413. The letter iota is the Greek equivalent of the letter "I"; you can think of this language as "J--". This language definition will be updated periodically as the need for clarifications or corrections arises. If you see ambiguity in this specification, you have the flexibility to resolve it in the manner you wish as long as you can justify the reasonableness of your decision. You also may contact the course staff to resolve ambiguities. Later in the course, some extensions will be added to Iota that will make it into an object-oriented language a little like the Java programming language; the extended version of Iota is known as Iota+.
A program in Iota is composed of some number of modules. A module collects together some functions and state in the form of variables. Components of a module may be used from other modules if they are exported in the module's interface. A module is defined in two files: an implementation file and an interface file. The implementation file provides all the actual code associated with the module, and the interface file declares which parts of the module are accessible from other modules. The name of a module is determined by the name of the file containing its definition. A module named M has an implementation file named M.im and an interface file named M.ii.
A module must define everything that its interface declares, but it may also define additional variables and functions that are not accessible from outside the module.
fibonacci(x: int): int = ( if (x < 2) 1 else fibonacci(x - 1) + fibonacci(x - 2) ) square(x: int):int = x*xAn interface for the same module might look as follows:
fibonacci(x: int): int /* compute the x'th Fibonacci number */ square(x: int):int /* compute the square of x */Here is some Iota code to find the greatest common divisor of two numbers:
gcd(x: int, y: int): int = ( while (!(x == 0)) ( if (!(x < y)) x = x % y else ( // swap x and y temp:int = x; x = y; y = temp; ) ); y )Code for a quicksort routine and a Hello World program is also given at the end of this document. Examples of interfaces are presented in the description of the standard Iota libraries.
The non-terminal uses represents a declaration that the code of this module will use an external component. There are two kinds of use declarations. A use of the form M.N, where M and N are identifiers, means that the identifier N in module M may be used in this module under the name N. A use of the form V = M.N means that the identifier N in module M may be used in this module under the name V. All identifiers are simple identifiers; qualified identifiers of the form M.N may appear only inmodule ::= [ uses ] defn*
uses ::=uses
use ( , use )*
use ::= id . id | id = id . id
uses
declarations. The latter form is helpful in avoiding name
conflicts between different modules. Note that to understand what kind
of component N is, the interface file for the other module M
is checked.
A module contains one or more definitions of variables or functions. The definitions may not be separated by semicolons. All top-level definitions are visible throughout the module; there is no need for forward declarations. Two definitions may not use the same name; nor may they conflict with any identifier defined by a use declaration.
A definition may define two kinds of functions: functions that return a value, and functions that do not. Functions of the former sort are indicated by writing the type of their return value after the closing parenthesis of the formal arguments.
defn ::= id : type
| id ( [ formals ] ) [ : type ] = expr
int
,
string
,
and bool
. Also, for any type T,
the type
array[
T]
represents
an array of that type. So we can write the types array[bool]
and array[array[string]]
.
No record (struct) or object types are supported; they will appear in Iota+.
As in Java, the built-in types int
, string
, and
bool
are all immutable. A variable of this type can be reassigned
to a different value of the type, but the value itself cannot be modified.
Arrays, as in Java, are mutable values. Array elements can be assigned
new values, changing the contents of the array. The length of an array
is not mutable, just as with Java.
A formal declares an argument to a function and its type. A function may be called only if the actual arguments to the function have a type that is compatible with (in Iota, the same as) the type of the formal arguments.type ::=
int
|bool
|string
|array
[ type ]
formals ::= formals , formal | formal
formal ::= id : type
interface ::= decl*
decl ::= id : type
| id ( [ formals ] ) [ : type ]
Expressions may take one of several forms:
There are three kinds of constants in Iota: integer constants, string constants, and boolean constants. The forms of integers and strings are described below in Lexical considerations. The non-terminal primary represents an expression with high precedence -- an expression that can be used as the left-hand-side of an array index. These expressions include variable names, array index expressions, and function call expressions (which are described below).expr ::= constant
| primary
| expr binary_op expr
| unary_op expr
| constructor
constant ::= string
| integer
|true
|false
primary ::= variable
| function_call
| stmt_expr
variable ::= id
| primary [ expr ]
Operators in Iota may be binary operators such as the operator +, which
are written in the usual form, e.g. "1+2
". There are also unary
operators such as negation (e.g., -2
) and logical negation (e.g.,
!done
).
Operators have the same precedence and associativity that they do in Java,
and unless specified, the same meaning.
The operator + may be be used to add two integers or to concatenate two strings. The operators -, *, /, and % operate only on integers. A divisor or modulus of zero is checked, and results in the program terminating. The operators & and | operate only on booleans; they are short-circuit operators as in Java. The operator == operates on any two values of the same type. For strings, it tests whether the two strings contain exactly the same characters, which is different from Java. For arrays, the operatorbinary_op ::=
+
|-
| * | / | % | & | | |==
|<
|>
unary_op ::= - | ! | length
==
tests whether they
are the same array object (pointer equality). The <
and >
operators may be used to compare integers in the ordinary way; it may also be
used to compare strings to determine their lexicographic ordering. The unary operator length
reports the length of arrays and strings, as an integer.
Brackets may be used to index both arrays and strings, extracting either the component of the array with that index, or the ASCII code of the character with that index, respectively. Both arrays and strings are zero-indexed, so the largest index that may be used is one less than the length of the array or string. Out-of-bounds errors are checked for and terminate the program with an error if encountered.
constructor ::=
new
type [ expr ] ( expr )
A new array object is created by an array constructor expression. The keyword
new
is
followed by element type, the number of elements in the array, and the
expression to initialize the elements with. For example, the expression
(new
int[5](1)) creates an array[int] containing 5 elements, which
all are initialized to contain the integer 1.
An important feature of array constructors is that the expression that
denotes the value to be stored into the array elements behaves as if it is
evaluated separately for every distinct array index. This semantics would make
no difference for the example above; however, it matters in the case of arrays
of arrays. The following expression produces an array of arrays of
integers—effectively a matrix—and initializes every element to 0. It is important that
the expression new int[2](0)
be evaluated twice so that there are
actually four separate integer cells in the final data structure.
a: array[array[int]] = new array[int][2](new
int[2](0));
A function call looks the same as in Java. The optional list of expressions is separated by commas. The types of the expressions must match the declared types of the corresponding formal arguments in the interface of the module that the identifier comes from.
function_call ::= id ( [ exprs ] )
exprs ::= exprs , expr
| expr
The possible statement forms are shown below. A list of statements in parentheses can be used as a statement, as can a simple (non-parenthesized) expression. The value of this statement is the value of the expression.stmt_expr ::= ( [ stmt ( ; stmt )* ] [ ; ] )
A new variable may be introduced by a statement, as in the third production. As in Java, this variable remains in scope in subsequent statements until the closing brace of the tightest enclosing pair of braces that surround this statement. The new variable may optionally be initialized with an assignment to an expression. If not, the value of the variable is undefined until its first assignment. An assignment to an existing variable (which may be an array index) may be used as a statement in Iota, but not as an expression (unless it is placed in braces). The value of an assignment or variable initialization is the assigned value.
An if statement evaluates the initial expression and executes the following statement if the expression evaluates to true. The expression must have type bool. If the expression evaluates to false, the else clause is evaluated if present. The statement has a value only if it has an else clause, both possible statements have values, and their type is the same -- this type is then the type of the whole if statement. Thus, a Java statement such asstmt ::= expr
| id: type [ = expr ]
| variable = expr
| if ( expr ) stmt [ else stmt ]
| while ( expr ) stmt
| return
| return expr
x = (x < y) ? x : y;can be expressed in Iota as
x = (if (x < y) x else y)A while statement operates as in Java. It has no value. The expression tested must have boolean type.
A return statement may either return a value or not. A return
statement in a function with no return type may not return a value; a return
statement in a function with a return type must return a value. A return
statement has no value itself.
Since the scope of each variable temp is limited to its respective block of statements, the variables may have the same name.( temp: int = x1; x1 = y1; y1 = temp ) ( temp: int = x2; x2 = y2; y2 = temp )
In the case of an identifier that is imported through the "uses id = id . id" syntax, only the left-hand side of the assignment is considered to be the globally visible name; the identifiers on the right-hand-side are not globally visible and therefore may be the same as other identifiers in the module. The reason for the id = id . id syntax is to allow external identifiers to be renamed in cases where they might otherwise create a conflict.
Integer literals (integer, above) are either 0 or a digit in the range 1..9, followed by a sequence of digits in the range 0..9. Negative integers are expressed as a unary negation operator followed by a positive integer constant. Integers in Iota have the same range as in Java. The largest integer literal is 2147483648 (231). All integer literals from 0 to 2147483647 may appear anywhere an integer literal may appear, but the literal 2147483648 may appear only as the operand of the unary negation operator -.
String literals (string, above) are largely as defined for Tiger (Appel,
p. 526), with a couple of exceptions. A new escape sequence \N
represents
the line separator character sequence for the system on which the compiler is
run. This sequence can be obtained by evaluating System.getProperty("line.separator")
;
on Windows machines it stands for the two ASCII characters CR (13) and LF (10)
in that order. As
defined in Appel, the escape sequence \^
c, where c
is an arbitrary character, stands for a control character. To clarify this
definition, c may be any character whose ASCII code is in the range
64..126; the escape sequence stands for the control character whose ASCII code
is in the range 0..31 and is equal to the ASCII code of c, modulo 32.
Boolean literals must be either the keyword true or the keyword false.
Identifiers must begin with an alphabetic character. Following the initial
alphabetic character may be any sequence of alphabetic characters, numeric
characters, or the underscore character ( _ ). Uppercase and lowercase
alphabetic characters are both considered alphabetic and are distinguished,
so x
and X
are different identifiers.
Keywords in the language (int,
string, bool,
array,
uses,
if,
else,
while,
return,
new,
length,
true
and false) may not be used as identifiers.
main(args: array[string]): intIota requires that a function named main in any module must have exactly this signature. When the Iota program is run, this function will be evaluated, passing any command-line arguments to the argument variables args. The expression (length args) can be used to determine the number of arguments. The return status of the program is defined by the return value of main. When a set of modules are combined to form a program, there may be only one function named main among all the modules.
print(s: string) // output the string s to the standard output device printi(i: int) // output the integer i as a formatted integer to standard // output. Equivalent to print(itos(i)) putc(c: int) // output the character whose ASCII value is c to standard output readln( ): string // read a line from the standard input device getc( ): int // read a single character from standard input eof(): bool // return whether the end of file has been reached on standard input
itos(i: int): string // Produce a formatted version of i. stoi(s: string, err: int): int // Parse s as a formatted integer. // Return err if the string could not be parsed. itoc(i: int): string // Produce a string containing the single character whose ASCII code is i. atos(a: array[int]): string // Produce a string containing the characters whose ASCII codes are in a, // in proper sequence stoa(s: string): array[int] // produce an array containing the ASCII codes of all the characters in s, // in proper sequence
quicksort.ii:
quicksort(a: array[int], low: int, high: int) /* Put the array elements a[low]..a[high] into ascending order. Requires that low and high be valid array indices and that low <= high. */
quicksort.im:
quicksort(a: array[int], low: int, high: int) = ( if (!(low < high)) return; mid: int = partition(a, low, high); quicksort(a, low, mid); quicksort(a, mid + 1, high) ) partition(a: array[int], low: int, high: int): int = ( /* Reorder the elements in a into two contiguous groups. If ret is the return value, then the first group is the elements in low..ret, and the second group is the elements in ret+1..high. Each element in the second group will be at least as large as every element in the first group. */ x: int = a[low]; i: int = low - 1; j: int = high + 1; while (true) ( while (a[(j = j - 1)] > x) (); while (a[(i = i + 1)] < x) (); if (i < j) ( temp: int = a[i]; a[i] = a[j]; a[j] = temp; ) else return j ); 0 // dummy value )
uses io.print main (args: array[string]) : int = ( print ("Hello World!\N"); 0 )