CS 412/413
Introduction to Compilers
Spring 2002

Assignment 5: Code Generation

due: Monday,  April 22, in class


In this programming assignment, you will build a code generator for the IC language. At the end of this assignment you will have a working optimizing compiler. We expect you to build upon the code that you wrote for previous assignments.

Assignment Description

Your program will have a main class, IC.Compiler, that compiles IC programs to Pentium assembly code. When invoked from the command line, the compiler expects to receive a single argument that is the name of a source file containing a module specification, with an extension of ".im". The compiler will parse and type-check the program as in programming assignments 1 and 2, it will convert the program to low IR as in programming assignment 3, and will perform optimizations from assignment 4, as specified by the command line arguments. It will then perform the following steps: 

  1. Generate abstract Pentium assembly code

  2. Perform register allocation to convert the abstract assembly to real assembly

  3. Write the resulting assembly code to a ".s" file 


When invoked on a source code with the command line "java IC.Compiler file.im", the compiler will generate an assembly file "file.s" that can be assembled using the "as" assembler (gnu assembler) to produce an object file "file.o". We are supplying a library file "libic.a" that implements the io and conv interfaces described in the language definition and other utility functions that you will need for code generation.

Your compiler must support a number of dump options for debugging purposes. It must continue to support the all the command line arguments from the previous assignments, and support the following new command line arguments: 

  1. -dump_aa option prints the abstract assembly for each function (separately), along with the names of the function being printed. 
  2. -regalloc to perform register allocation by graph coloring; without this flag, the compiler will perform trivial register allocation which spills all the variables to the stack;
  3. -dump_ra option prints the register assignments for each function, in case the -regalloc option was also specified. The output for each function must print one line for each register, specifying which variables were allocated to that register, or that no variable is allocated to the register.

Invoking the assembler and linker

Your compiler will produce assembly files (with an .s extension) from correspondingly named files with a .im extension. You can then use the assembler to convert these assembly files into object files (.o files) and the linker to convert a set of such object files into an executable file (.exe extension).

To invoke the assembler, the following command line is used for an assembly file "file.s":

as -o file.o file.s

Supposing that this module is the only module in the program, it can be linked using the gnu linker "ld",  as follows:

ld -o file.exe file.o /lib/crt0.o libic.a -lcygwin -lkernel32

The library file libic.a is a collection of .o files bundled together, containing the code for the standard modules io and conv that are defined in the language specification, along with run-time support for garbage collection.

Abstract Assembly and Runtime Support

You will start with the low-level IR representation of the program. You will then build a DAG representation for the computation in each basic block and next use it to generate abstract assembly code. For this, use the Maximal Munch algorithm to perform tiling over the DAG structure. You will be generating 32-bit Pentium code, so all variables and registers will take a full 32 bits of storage. Although it would be possible to generate code that stores characters and booleans more compactly, these data types will be stored in 32-bit words as well. 

Your generated assembly code must provide the appropriate support for stack frames, as discussed  in class. You must generate the appropriate code before invoking a function, at the beginning of the invoked function (prologue), at the return points (epilogue), and in the caller, after the call. In the generated code, you will use comments to indicate the instruction sequences for each of these fragments. Function names in the the assembly code must be preceded by an underscore (as a minimal scheme for name mangling). For instance, procedure foo will be invoked with a "call _foo" assembly instruction. The main function of the program has to be translated into a function _ic_main in the generated assembly code. For the allocation routine and to the string concatenation and string comparison routines you will use two underscores, as in the previous assignments (e.g. __allocate).

The various registers are by convention considered to be caller-save or callee-save as follows: eax, ecx, and edx are caller-save, and ebx, esi, edi, ebp, esp are callee-save. This means that a function can freely modify the registers eax, ecx, and edx, but must assume that their contents have been destroyed if it in turns calls a function. And it may call another function and know that the callee-save registers have not been modified; however, if it modifies these registers itself, it must restore them to their original values before returning. Your register allocation algorithm must determine which of these registers need to be saved. Use comments in the generated assembly code to indicate the instructions that save the registers in the caller or in the callee.

Arrays and strings will be stored in the heap; to create new arrays and strings, you will use the __allocate routine, as required in the previous assignments. String constants and global variables will be allocated statically in the data segment. You will also provide run-time support for array bounds checks and null pointer checks. You will modify your low IR translation  to insert these checks before each array access. You must also ensure that array references are initialized to null values.

Register Allocation

Your compiler will implement the graph-coloring register allocation algorithm described in the Tiger Book, except register coalescing and removal of redundant mov instructions. Rather than being allocated immediately to stack locations, local variables are allocated to temporaries that are spilled to stack locations only if necessary. In this part of the assignment you will use results of live variable analysis implemented in the previous project.

You will use the result of register allocation to deal properly with callee-save and caller-save registers. If any callee-save registers are used during a function, they must be saved on the stack and restored from it before the function returns. Similarly, the content of caller-save registers must be assumed to be destroyed by all function calls. The register allocation algorithm will cause this spilling automatically if the def and use for call instructions are set up properly.


What to turn in

Turn in on paper:

Turn in electronically:

Electronic submission instructions

Place your files in \\goose\courses\cs412-sp02\grpX\pa5, where grpX is your group identifier.  Use the same directory structure for the electronic submission as in the previous assignments : 
1) src\ - all of your source code and class files;
2) doc\ - documentation, including your write-up and a README.TXT containing information on how to compile and run your project;
3) test\ - test cases you used in testing your project.