CS 412/413
Introduction to Compilers
Spring 2002

Assignment 2: Syntax and Semantic Analysis

due: Wednesday,  February 27, in class


In this programming assignment, you will implement the the syntax and semantic analysis phases for IC. The latest version of the IC language definition may be found at http://www.cs.cornell.edu/courses/cs412/2002sp/ic/ic.html. Your program will determine whether a source file defines a legal interface or module. We expect you to build upon the code that you wrote for Programming Assignment 1.

Assignment Description

The syntax and semantic analysis phase of the compiler will have a main class, IC.SynSemTest. When invoked from the command line, this checker will implement at least the following minimal specification: 

interface ErrorInfo {
    String fileName();
    int    lineNumber();
    String errorMessage();
}

The IC grammar and CUP

For this assignment, you will use the CUP automatic parser generator available at http://www.cs.princeton.edu/~appel/modern/java/CUP. You should use CUP only to build the AST representation. Your program  should perform the semantic analysis only after the AST has been constructed. 

You will use the grammar outlined on the IC language description page as a starting point. As presented, the grammar is ambiguous -- it doesn't specify the precedence or the associativity of several operators. You have to write a LALR(1) grammar for IC which satisfies the correct precedence of operators and which has all binary operators left-associative.

Note: for this assignment you are not allowed to use operator precedence or associativity CUP directives. The operator associativity and precedence must be explicitly encoded in the grammar productions for the language. 

AST construction

You also have to design a hierarchy of the abstract syntax tree (AST) nodes for the IC language. When the input program is syntactically correct, your checker will produce a corresponding AST representation of the program. The abstract syntax tree is the interface between the syntactic and semantic analysis phases, and designing it carefully before starting to write code will result in much less work.

The abstract syntax tree should be constructed so that binary operators are left associative.  The nodes of the abstract syntax tree will not necessarily correspond to the non-terminals of your grammar.  Use the grammar in the IC specification as a guideline for constructing the AST.  

Your checker must support a "-dump_ast" option. When invoked with this option and the name of a module file (e.g., java IC.SynSemTest -dump_ast test.ic), the checker will print to System.out a textual representation of the abstract syntax tree for the module it has checked.  Each node in the AST will be printed on a separate line; the information for each node should include a (unique) numeric identifier of the node, the name of the class representing the node, and the list of identifiers representing the children AST nodes. 

Symbol Table construction

You then have to design a hierarchy of symbol tables of IC. You will fill the symbol tables in only after the AST has been constructed. Your design should allow accessing each symbol table from the AST node representing the corresponding scope  (module, procedure, statement block). Any errors which occur during symbol table construction (such as multiply declared identifiers) are considered semantic errors.

Your  checker must support a "-dump_symtab" option. When invoked with this option and the name of a module file (e.g., java IC.SynSemTest -dump_symtab test.ic), the checker will print to System.out a textual representation of the abstract symbol table for the module it has checked.  Each entry in the symbol table will be printed on a separate line; the information for each node should include a the name of the identifier, its kind (variable, function, etc), and its type. Each symbol table will have a unique identifier, and the output will use these identifiers to show the children of each symbol table. Different symbol tables should be separated by blank lines. 

Semantic analysis

After the AST and the symbol tables are constructed, your checker will analyze the program to ensure that it contains no other semantic errors.  As part of this analysis, the checker may need to read interface files (extension ".ii") to learn the signatures of operations from other modules. Interface files are read in when uses declarations are processed during semantic checking. The checker must throw an exception for any lexical, syntactic, or semantic errors found in the interface files. The checker should look for interface files in the same directory as the module file.

Semantic errors include the following categories: type errors, undefined identifiers, overlaps of identifier scope (see the language definition). Your checker must report these categories of errors. It is not required to report more than one error; it may terminate after the first lexical, syntactic, or semantic error it encounters.


What to turn in

Turn in on paper:

Turn in electronically:

Electronic submission instructions

Your electronic submission is expected at the same time as your written submission: at the beginning of class on the due date.  Electronic submissions after 10AM will be considered a day late. Place your files in \\goose\courses\cs412-sp02\grpX\pa2, where grpX is your group identifier.  Use the same directory structure for the electronic submission as in the previous assignment : 1) src\ - all of your source code and class files, 2) doc\ - documentation, including your write-up and a README.TXT containing information on how to compile and run your project, a description of the class hierarchy of your source code, brief descriptions of the major classes, any known bugs, and 3) test\ - any test cases you used in testing your project.