CS412/413
Introduction to Compilers and Translators
Spring 2000
Cornell University Computer Science Department

Programming Assignment 5: Iota+


Due Monday, April 24


In this programming assignment, you will build a static checker and code generator for the language Iota+, which is defined in the handout http://www.cs.cornell.edu/courses/cs412/2000sp/iota/iota+.html.  Your program will determine whether the input source file is a legal module, in conjunction with any interface files that it depends on, and report any lexical, syntactic, or semantic errors in the module or interfaces.  If the module is legal, it will be compiled to assembly code in the same fashion as for Programming Assignment 4.  We expect you to build upon the code that you wrote for Programming Assignments 1-4, and to fix any problems with your code generation that interferes with code generation for correct Iota+ programs.

To help you complete this assignment in a timely fashion, we are providing parsing code for you to base your development on.  We recommend but do not require that you implement the programming assignment using the provided code, which includes  lexical analysis and most of the syntactic phase.  The parsing code is written using Java CUP, an LR parser generator, which will make extending the grammar much simpler than it would be in a recursive-descent parser.  To use this code, you will need to integrate your implementation with the provided code and do the following pieces of work:

Few, if any, modifications to assembly code generation should be required, unless there are bugs in your assembly code generation that need to be fixed.  We still do not ask that register allocation be implemented, for example.

Compiler requirements

The name of the compiler class and its syntax for invocation remain the same as in Programming Assignment 4. You are still required to implement the -dump_ast, -dump_ir, -dump_cir, and -dump_aa options from previous programming assignments

Global variable initialization is now supported, which leads to the question: where does the initialization code go?  Initialization of the global variables in each module will be performed by a special function that is invoked when the program starts.  We have provided some extra support to make this feasible:

  1. You need to download the new version of the Iota Standard Library, and a new jar file: Iota0.jar. The jar file contains the program Iota.Prelinker that will take object file names, extract the module name from them and generate assembler code for a function named _iota__init, which simply calls each of the modules' _foo$init functions.  The Iota Standard Library main function will now call iota__init before calling your iota__main.
  2. In each module foo.mod, your compiler will generate a function named _foo$init that will perform all global variable initialization.  It should look like the following:
                  PUBLIC _foo$init
                  _foo$init PROC NEAR
                      push ebp
                      mov ebp, esp
                      ; global initialization code here
                      pop ebp
                      ret
                  _foo$init ENDP
    

    We are not worried about any naming clashes because the '$' character, while invalid as an Iota identifier character, is valid as a MASM identifier character.

  3. You will link your programs in the usual way.  The ilink script has been modified to invoke Iota.Prelinker before linking your modules together, to ensure that iota__init is generated properly each time.

Runtime support

The object file iota.lib contains the code for the standard modules io, file and conv that are defined in the language specification.  This object file has been extended to include support for the new io objects described in the io and file interfaces.

Object representation

Iota+ adds class types to the set of types supported.  To allow standardization, we have selected a mandatory representation for values of class type, which is the format described in class for class hierarchies with single inheritance.  An example of this representation is depicted below.  An object begins with a single word pointing to the dispatch vector for the object's class.  The fields of the object follow in the order they are declared in the class.  All fields, even booleans, take up a single 4-byte word.  The dispatch vector contains pointers to the code for each of the methods of the object.  The methods include not only the methods explicitly declared in the class of the object, but also methods from each of the interface types, if any, that the class declares.  The methods are numbered starting from zero, with each method having an index one greater than the previous one in the same class or interface type.  The first method in a class or interface has an index one greater than the last method in the immediate supertype, or zero if there is no immediate supertype.

[Object Layout Image]

For example, consider the following class and interface declarations, assuming that sqrt and atan2 are defined in some appropriate fashion:

       interface Point {
           x() : int
           y() : int
       }
       class Cartesian implements Point {
           x_, y_ : int
           x(): int = x_
           y(): int = y_
           r(): int = sqrt(x*x + y*y)
           theta(): int = atan2(y, x)
       }
These definitions result in an object of type Cartesian having the memory layout shown in the picture above.

The special value null may be used as a value of any object type.  It is represented by a word containing the address 0.  To create storage for a real object, it will be necessary to call the routine iota__malloc, passing the size of the object in bytes.  This routine returns the address of the allocated storage.  The fields of the object then can be zeroed to give them all their default values, and the pointer to the dispatch vector must be installed.  Since all objects of a particular class share the same dispatch vector, there is no need to allocate space for the dispatch vector dynamically.  The dispatch vector is global storage and should be placed in the data segment along with other global variables.

Methods are implemented in a manner similar to functions.  They take an additional first argument that is the object this, on which the method was invoked.  Another difference is that method names are mangled differently.  In the generated assembly code, the label beginning the code of a method m in a class C is C__m.  Methods are not exported as public identifiers, so the name of the containing module does not need to be part of the mangled name.

The special cast expression performs a dynamic cast of an expression to a new object type.  The Iota+ language definition requires the cast be checked for validity, similarly to Java.  If the check fails, the cast operation should evaluate to null.  This check can be implemented in a number of ways.  For example, when casting to a class type, the dispatch vector pointer can be compared against the dispatch vector of the class.  Implementing a cast to an interface type is more complex.

Extra credit

Your compiler must implement the cast expression but it is not required to return null if the cast fails.  If the cast fails, the behavior is unspecified.  In other words, you may pass through the pointer blindly without performing the runtime test.  You may implement the cast expression as specified above for 5% extra credit.  To efficiently implement cast for interface types you may have to change the object layout and/or add methods to the dispatch vector for all objects of class type.  If you do this, you should also modify the runtime library, iota.lib, so that any object it constructs is compatible with your changes. The source code for iota.lib may be found on in the course directory on goose.

Unlike in Programming Assignments 3 and 4, you are required to implement array and string bounds checking, string comparisons, and short-circuit boolean operations.

Partial implementation

The supplied code that partially implements Programming Assignment 5 is available in Windows and Unix formats (the only difference being the line separator characters in the source files).  Please read the file README-pa5.txt for information about this code.  We will make known any updates to this code.

What to turn in

Turn in on paper:

Turn in electronically:

Electronic submission instructions

Your electronic submission is expected at the same time as your written submission: at the beginning of class on the due date.  Electronic submissions after 10AM will be considered a day late.  To submit Programming Assignment 5, please drop your files in \\goose\courses\cs412-sp00\grpX\pa5, where grpX is your group identifier.  Please organize your top-level directory structure as follows :

Note: Failure to submit your assignment in the proper format may result in deductions from your grade.