In this programming assignment, you will implement the the syntax and semantic analysis phases for IC. The latest version of the IC language definition may be found at http://www.cs.cornell.edu/courses/cs412/2002sp/ic/ic.html. Your program will determine whether a source file defines a legal interface or module. We expect you to build upon the code that you wrote for Programming Assignment 1.
The syntax and semantic analysis phase of the compiler will have a main class, IC.SynSemTest. When invoked from the command line, this checker will implement at least the following minimal specification:
SyntaxException
class for syntax errors and a SemanticException
for semantic errors. Both of these exceptions should implement the following
interface which indicates where the error occurred. The interface also
provides more information about the cause of the error in an additional
error message:
interface ErrorInfo {
String fileName();
int lineNumber();
String errorMessage();
}
For this assignment, you will use the CUP automatic parser generator available at http://www.cs.princeton.edu/~appel/modern/java/CUP. You should use CUP only to build the AST representation. Your program should perform the semantic analysis only after the AST has been constructed.
You will use the grammar outlined on the IC language description page as a starting point. As presented, the grammar is ambiguous -- it doesn't specify the precedence or the associativity of several operators. You have to write a LALR(1) grammar for IC which satisfies the correct precedence of operators and which has all binary operators left-associative.
Note: for this assignment you are not allowed to use operator precedence or associativity CUP directives. The operator associativity and precedence must be explicitly encoded in the grammar productions for the language.
You also have to design a hierarchy of the abstract syntax tree (AST) nodes for the IC language. When the input program is syntactically correct, your checker will produce a corresponding AST representation of the program. The abstract syntax tree is the interface between the syntactic and semantic analysis phases, and designing it carefully before starting to write code will result in much less work.
The abstract syntax tree should be constructed so that binary operators are left associative. The nodes of the abstract syntax tree will not necessarily correspond to the non-terminals of your grammar. Use the grammar in the IC specification as a guideline for constructing the AST.
Your checker must support a "-dump_ast" option. When invoked with this option and the name of a module file (e.g., java IC.SynSemTest -dump_ast test.ic), the checker will print to System.out a textual representation of the abstract syntax tree for the module it has checked. Each node in the AST will be printed on a separate line; the information for each node should include a (unique) numeric identifier of the node, the name of the class representing the node, and the list of identifiers representing the children AST nodes.
You then have to design a hierarchy of symbol tables of IC. You will fill the symbol tables in only after the AST has been constructed. Your design should allow accessing each symbol table from the AST node representing the corresponding scope (module, procedure, statement block). Any errors which occur during symbol table construction (such as multiply declared identifiers) are considered semantic errors.
Your checker must support a "-dump_symtab" option. When invoked with this option and the name of a module file (e.g., java IC.SynSemTest -dump_symtab test.ic), the checker will print to System.out a textual representation of the abstract symbol table for the module it has checked. Each entry in the symbol table will be printed on a separate line; the information for each node should include a the name of the identifier, its kind (variable, function, etc), and its type. Each symbol table will have a unique identifier, and the output will use these identifiers to show the children of each symbol table. Different symbol tables should be separated by blank lines.
After the AST and the symbol tables are constructed, your checker will analyze the program to ensure that it contains no other semantic errors. As part of this analysis, the checker may need to read interface files (extension ".ii") to learn the signatures of operations from other modules. Interface files are read in when uses declarations are processed during semantic checking. The checker must throw an exception for any lexical, syntactic, or semantic errors found in the interface files. The checker should look for interface files in the same directory as the module file.
Semantic errors include the following categories: type errors, undefined
identifiers, overlaps of identifier scope (see the language
definition). Your checker must report these categories of errors. It is not
required to report more than one error; it may terminate after the first
lexical, syntactic, or semantic error it encounters.
Your electronic submission is expected at the same time as your written
submission: at the beginning of class on the due date.
Electronic submissions after 10AM will be considered a day
late. Place
your files in \\goose\courses\cs412-sp02\grpX\pa2
, where grpX
is your group identifier. Use the same directory structure for the electronic
submission as in the previous assignment : 1) src\
- all of your source code and class
files, 2) doc\
- documentation, including your write-up and a README.TXT
containing information on how to compile and run your project, a description
of the class hierarchy of your source code, brief
descriptions of the major classes, any known bugs, and 3) test\
- any test cases you used in testing your
project.