In this programming assignment, you will build a code generator for the IC language. At the end of this assignment you will have a working optimizing compiler. We expect you to build upon the code that you wrote for previous assignments.
Your program will have a main class,
IC.Compiler, that compiles
IC programs to Pentium assembly code. When invoked from the command line, the compiler
expects to receive a single argument that is the name of a source file containing a module specification, with an extension of
.im". The compiler will
parse and type-check the program as in programming assignments 1 and 2, it will convert the program to
low IR as in programming assignment 3, and will perform optimizations from
assignment 4, as
specified by the command line arguments. It will then perform the following steps:
Generate abstract Pentium assembly code
Perform register allocation to convert the abstract assembly to real assembly
Write the resulting assembly code to a "
When invoked on a source code with the command line "
IC.Compiler file.im", the compiler will generate an assembly file
file.s" that can be assembled using the "
assembler (gnu assembler) to produce an object file "
file.o". We are supplying a library
libic.a" that implements the
conv interfaces described in the language
definition and other utility functions that you will need for code generation.
Your compiler must support a number of dump options for debugging purposes. It must continue to support the all the command line arguments from the previous assignments, and support the following new command line arguments:
-dump_aaoption prints the abstract assembly for each function (separately), along with the names of the function being printed.
-regallocto perform register allocation by graph coloring; without this flag, the compiler will perform trivial register allocation which spills all the variables to the stack;
-dump_raoption prints the register assignments for each function, in case the
-regallocoption was also specified. The output for each function must print one line for each register, specifying which variables were allocated to that register, or that no variable is allocated to the register.
Your compiler will produce assembly files (with an .s extension) from correspondingly named files with a
.im extension. You can then use the assembler to convert these assembly files into object files
(.o files) and the linker to convert a set of such object files into an executable file (.exe extension).
To invoke the assembler, the following command line is used for an assembly file "
as -o file.o file.s
Supposing that this module is the only module in the program, it can be linked using the gnu linker "
ld", as follows:
ld -o file.exe file.o /lib/crt0.o libic.a -lcygwin -lkernel32
The library file
libic.a is a collection of
.o files bundled together, containing the code for the standard modules
conv that are defined in the language specification, along with run-time support for garbage collection.
You will start with the low-level IR representation of the program. You will
then build a DAG representation for the computation in each basic block and next
use it to generate abstract assembly code. For this, use the Maximal Munch
algorithm to perform tiling over the DAG structure. You will be generating 32-bit Pentium code, so all variables and registers will take a full 32 bits of storage. Although it would
be possible to generate code that stores characters and booleans more compactly, these data types will be stored in 32-bit
words as well.
Your generated assembly code must provide the appropriate support for stack frames, as discussed in class. You must generate the appropriate code before invoking a function, at the beginning of the invoked function (prologue), at the return points (epilogue), and in the caller, after the call. In the generated code, you will use comments to indicate the instruction sequences for each of these fragments. Function names in the the assembly code must be preceded by an underscore (as a minimal scheme for name mangling). For instance, procedure
foo will be invoked with a "
assembly instruction. The main function of the program has to be translated into
_ic_main in the generated assembly code. For the
allocation routine and to the string concatenation and string comparison
routines you will use two underscores, as in the previous assignments (e.g.
The various registers are by convention considered to be caller-save or callee-save as follows: eax, ecx, and edx are caller-save, and ebx, esi, edi, ebp, esp are callee-save. This means that a function can freely modify the registers eax, ecx, and edx, but must assume that their contents have been destroyed if it in turns calls a function. And it may call another function and know that the callee-save registers have not been modified; however, if it modifies these registers itself, it must restore them to their original values before returning. Your register allocation algorithm must determine which of these registers need to be saved. Use comments in the generated assembly code to indicate the instructions that save the registers in the caller or in the callee.
Arrays and strings will be stored in the heap; to create new arrays and strings, you will use the __allocate routine, as required in the previous assignments. String constants and global variables will be allocated statically in the data segment. You will also provide run-time support for array bounds checks and null pointer checks. You will modify your low IR translation to insert these checks before each array access. You must also ensure that array references are initialized to null values.
Your compiler will implement the graph-coloring register allocation algorithm described in the Tiger Book, except register coalescing and removal of redundant mov instructions. Rather than being allocated immediately to stack locations, local variables are allocated to temporaries that are spilled to stack locations only if necessary. In this part of the assignment you will use results of live variable analysis implemented in the previous project.
You will use the result of register allocation to deal properly with callee-save and caller-save registers. If any callee-save registers are
used during a function, they must be saved on the stack and restored from it before the function returns. Similarly, the
content of caller-save registers must be assumed to be destroyed by all function calls. The register allocation algorithm will
cause this spilling automatically if the def and use for call instructions are set up properly.
your files in
is your group identifier. Use the same directory structure for the electronic
submission as in the previous assignments :
src\ - all of your source code and class files;
doc\ - documentation, including your write-up and a
containing information on how to compile and run your project;
test\ - test cases you used in testing your