Due Wednesday, April 4
In this programming assignment, you will build a simple code generator for the language Iota. At the end of this assignment, you will have a working compiler for most of the Iota language. We expect you to build upon the code that you wrote for Programming Assignments 1-3, and to fix any problems with your lexical analysis, parsing, or semantic analysis that interfere with code generation for correct Iota programs.
Your program will have a main class, Iota.Compiler, that compiles Iota code to assembly code. When invoked from the command line, the compiler will implement at least the following minimal specification: it should expect to receive a single argument that is the name of a source file containing a module specification, with an extension of ".im". The compiler will parse and type-check the program as in programming assignments 1 and 2, and convert the program to canonical IR as in programming assignment 3. It will then perform the following steps:
.asm
" file.obj
"
file.When invoked on a source code with the command line "java Iota.Compiler
foo.im
", the compiler
will generate an assembly file "foo.asm
" that
can be assembled using MASM to produce an object file
"foo.obj
". We are supplying a library file
"iota.lib
" that implements the io
and conv
interfaces described in the language definition,
as well as some other useful utility functions that you will need for
code generation.
Your compiler must support a number of dump options for debugging purposes:
-dump_aa
option prints the abstract assembly for each
function (separately), along with the names of the function being printed.You may choose to add additional options not mentioned here as long as they do not interfere with the required functionality. For example, your compiler might accept more than one source file on the command line or support more debugging options than those listed. You may allow multiple dump options to be specified; we require support for only a single dump option on a particular command invocation. Describe any modifications or extensions to this specification in your documentation.
Your compiler will produce assembly files (with an
.asm
extension) from correspondingly named files with a
.im
extension. You then can use the assembler to convert
these assembly files into object files (.obj
files) and
the linker to convert a set of such object files into an executable
file (.exe
extension).
To invoke the assembler, the following command line is used for an
assembly file foo.asm
:
ml /c /Cp /coff /Zd /Zi foo.asm
Supposing that this module is the only module in the program, it can be linked as follows:
link /pdb:none /debug /debugtype:coff foo.obj iotaextra.lib iota.lib
The library file iota.lib is a collection of
.obj files bundled together, containing the code for the
standard modules io
and conv
that are
defined in the language specification, along with run-time support for
garbage collection and a few useful utility functions that can be
called from your generated code. The file iotaextra.lib is an
additional library that you may modify to include additional
utility functions.
The program ml
can be found in the directory \\goose\courses\cs412-sp01\masm611.
The program link
is from Microsoft Visual Studio VC++ 98
(C:\Program Files\Microsoft Visual Studio\VC98\Bin\ in the
undergraduate lab). However, to simplify the assembly and linking
process, we are providing scripts
asm.bat and ilink.bat that package up these
operations for you, so that the lines above can be typed simply as
asm foo ilink foo
The first argument to ilink
is the name of the module
containing the program's main
function. More module names
can be specified on the command line following this first module,
causing the linker to use the corresponding object files.
Examples of how to compile various Iota constructs are given in a companion handout, Pentium Code Samples. These examples show the extra assembler directives necessary to instruct the assembler to generate 32-bit code, and assembly code for some of the example programs found in the Iota language definition. You can also work backward from this assembly code to help figure out what IR to generate for various programming constructs. You will also find the Intel Architecture Software Developer Manuals, volumes 1 and 2, to be helpful.
You will be generating 32-bit Pentium code, so all variables and registers will take a full 32 bits of storage. Although it would be possible to generate code that stores characters and booleans more compactly, these data types will be stored in 32-bit words as well. All stack locations and dynamically allocated objects will be aligned to 32-bit boundaries; the address of a variable or dynamically allocated object always will be divisible by four.
The library file iota.lib
contains the code for the
standard modules io
and conv
that are
defined in the language specification. It also contains several
additional routines that are useful when generating code. These
routines are described in the Pentium Code Samples
handout.
Arrays and strings will be stored in memory in the manner described in class.
String constants will be allocated statically in the data segment. To
create new arrays and strings, you will use the newarray
and strconcat
routines described in the Pentium Code
Samples handout. These routines allocate heap storage for arrays and
strings. The memory layout of arrays and strings is as
described in the Programming Assignment 3 handout.
The other utility functions described in the PA3 handout will also be
available to you. You may also define your own runtime support routines in C.You should be aware that a call to a runtime
routine will likely be more expensive than the IR code that it replaces, even if
that IR code is fairly complicated. Be sparing with the number of routines you
introduce, because frequent calls to runtime routines will likely degrade performance of the code
your compiler emits. If you take this approach, you must submit the C source code that implements
your routines with this assignment. You must also put an extended
iotaextra.lib
file in your top-level directory that
includes all the added functions in addition to the ones we
supplied. Visual Studio, gcc, or any other compiler
or language you like may be used to produce the
iotaextra.lib
file. A skeleton iotaextra.lib
with a Visual Studio workspace is located at \\goose\courses\cs412-sp01\iotalib\iotaextra\.
The canonical IR trees generated in Programming Assignment 3 will be converted to assembly code using tiling and the simple stack-frame register allocation scheme described in lecture. When the canonical IR code for f(x:int, y:int):int = g(y) + y:
MOVE(TEMP(t1), CALL(NAME(g), MEM(FP + 12))) MOVE(RV, MEM(FP + 12) + TEMP(t1))is converted to abstract assembly by tiling, the following code might result:
push [ebp + 12]
call g
mov t1, eax
mov t2, t1
add t2, [ebp + 12]
mov eax, t2
After the register allocation (t1
= t2
= eax
) and
elimination of unnecessary mov instructions, we might obtain the following
compact code:
push [ebp + 12]
call g
add eax, [ebp + 12]
However, for this programming assignment you will not make this final step. Instead,
all temporaries will be allocated on the stack. The caller-save registers eax
, ecx
,
and edx
will be used for all instructions involving temporaries. Before the instruction, one
or two of these registers will be loaded from the appropriate stack location. After the
instruction, if necessary, the stack location will be updated from the corresponding
register. This register allocation strategy is not particularly efficient, but it is
simple and will result in working code. You will need to compute how many
temporaries have to be stored on the stack and adjust the stack pointer in the
function prologue accordingly. In this example, suppose that we assign the temporary t1
to the
location [fp-4]
, and the temporary t2
to the location [fp-8]
.
The resulting code will look like the following:
push [ebp + 12]
call g
mov [ebp-4], eax ; mov t1, eax
mov eax, [fp-4]
mov [ebp-8], eax ; mov t2, t1
mov eax, [ebp-8]
add eax, [ebp + 12] ; add t2, [fp + 12]
mov [ebp-8], eax
mov eax, [ebp-8] ; mov eax, t2
Obviously there are many opportunities for even simple-minded optimization in this code, but the goal here is just to generate working code. We recommend not trying to optimize this code, because the register allocation optimizations that you will implement in Programming Assignment 6 will do a better job than any ad-hoc peephole optimizer. In addition to temporaries, local variables and function arguments should be allocated on the stack relative to the frame pointer register, as discussed in lecture.
An Iota program may comprise several Iota modules that must be linked together. When
you generate code for an Iota module, you will generate assembler PUBLIC
directives that
expose publicly visible symbols in the Iota program so that the linker can link external
references from other modules to these global variables or functions. The public variables
and functions in the module are, of course, those declared in the interface file of the
module.
Global variables and function names in Iota are translated to assembler names using the name mangling scheme described in the Programming Assignment 3 handout.
Certain assembly instructions are generated at the beginning and end of every
function; these instructions are known as the prologue and epilogue,
respectively. The prologue and epilogue are not explicitly represented in the IR
for the function body. In the prologue, the current frame pointer (the ebp
register) is pushed onto the stack, and the stack pointer is adjusted to make
room for the local variables and temporaries used by the function. In the
epilogue, the return value is placed in the eax
register, the stack
pointer is restored to point to the control link, the previous frame pointer is
popped off the stack, and the function returns with a ret
instruction.
Although this should not be an issue for this programming assignment, the various
general-purpose registers on the Pentium architecture are by convention considered to be
caller-save or callee-save as follows: eax
, ecx
, and edx
are caller-save, and
ebx
, esi
, edi
, ebp
, esp
are
callee-save. This means that a function can freely modify the registers eax
,
ecx
, and edx
, but must
assume that their contents have been destroyed if it in turns calls a function.
Conversely, it may call another function and know that the callee-save registers have not
been modified; however, if it modifies these registers itself, it must restore them to
their original values before returning. Saving callee-save registers is usually done by
adding appropriate push
instructions to the function prologue, and
corresponding pop
instructions to the function epilogue. Caller-save
registers can be saved and restored around every function call; before the arguments
are pushed onto the stack, any caller-save registers containing useful data are
stored on the stack. In the code you are generating, the caller-save registers
should not need to be saved at function calls because caller-save registers are
only used to shuttle data off of and onto the stack. As we'll see later in the
course, there is no need to save caller-save registers explicitly, because the
register allocation process will automatically cause caller-save registers to be
spilled to the stack.
Your electronic submission is expected at the same time as your written
submission: at the beginning of class on the due date. Electronic submissions
after 10AM will be considered a day late. To submit Programming Assignment
4, please drop your files in \\goose\courses\cs412-sp01\grpX\pa4
,
where grpX
is your group identifier. Please organize your
top-level directory structure as follows :
src\
- all of your source code and class files. For example,
we expect to find a class file for your main program in src\Iota\Compiler.class,
and a companion .java file in the same directory.doc\
- documentation, including your write-up and a README.TXT
containing information on how to compile and run your project, a description
of the class hierarchy in your src\
directory, brief
descriptions of the major classes, any known bugs, and any other information
that we might find useful when grading your assignment.test\
- any test cases you used in testing your project.Note: Failure to submit your assignment in the proper format may result in deductions from your grade.