CS 412/413
Introduction to Compilers
Spring 2002

Programming Assignment 1: The Lexical Analyzer

due: Friday, February 8, in class


Assignment Description

In this programming assignment, you will implement the lexical analyzer for the IC language, which is defined on the web page at http://www.cs.cornell.edu/courses/cs412/2002sp/ic/ic.html. You will build the lexer using the JLex lexical analyzer generator. Detailed documentation and examples for JLex can be found on the JLex web page at: http://www.CS.Princeton.EDU/~appel/modern/java/JLex.

What to implement

As part of this assignment, you have to implement the following classes:

interface TokenInfo {
    void printToken(OutputStream o) throws IOException;
        // Print a human-readable representation of this token on the
        // output stream o. The representation has the form
        // <token-type, matching-string, line-number>.

    int lineNumber();
        // Return the number of the line that this token came from.
}
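As an illustration of the interface above, here is a minimal sketch of one way a token class might implement it. The names TokenSketch, SimpleToken, and render are hypothetical and used only for this example, and the interface is restated inside the wrapper class so that the block compiles on its own. It assumes the token type is kept as a plain string and that the output is ASCII; your own design may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class TokenSketch {
    // Restated from the handout so the example is self-contained.
    interface TokenInfo {
        void printToken(OutputStream o) throws IOException;
        int lineNumber();
    }

    // Hypothetical token class: type and matched text are plain strings.
    static final class SimpleToken implements TokenInfo {
        private final String type;
        private final String text;
        private final int line;

        SimpleToken(String type, String text, int line) {
            this.type = type;
            this.text = text;
            this.line = line;
        }

        // Writes <token-type, matching-string, line-number> to o.
        @Override
        public void printToken(OutputStream o) throws IOException {
            String s = "<" + type + ", " + text + ", " + line + ">";
            o.write(s.getBytes(StandardCharsets.US_ASCII));
        }

        @Override
        public int lineNumber() {
            return line;
        }
    }

    // Hypothetical helper: captures what printToken writes as a String.
    static String render(TokenInfo t) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            t.printToken(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(render(new SimpleToken("ID", "foo", 3)));
    }
}
```

Capturing the output in a byte buffer, as render does, is one convenient way to compare printed tokens against expected strings in a test suite.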

For string literals, the printToken method should translate each escape sequence "\ddd" representing a printable character into the character itself, e.g., "\065B\067" should be printed as "ABC". It should also translate non-printable characters into a suitable canonical form, such as the "\ddd" or "\^c" escape sequences.
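The translation described above might be sketched as follows. This is only an illustration under stated assumptions: "ddd" is read as three decimal digits (which is what the "\065B\067" example implies), "printable" is taken to mean ASCII codes 32 through 126, and every non-printable character is written in the "\ddd" form. The class and method names are hypothetical, and the IC language definition remains the authority on which escapes are legal.

```java
final class EscapeSketch {
    private EscapeSketch() {}

    // Assumption: printable means ASCII 32..126.
    static boolean isPrintable(int code) {
        return code >= 32 && code <= 126;
    }

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Translates the body of a string literal (without the quotes)
    // into the canonical form used when printing the token.
    static String canonical(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            int code = c;
            int consumed = 1;
            // A backslash followed by exactly three decimal digits is
            // treated as the escape sequence \ddd.
            if (c == '\\' && i + 3 < body.length()
                    && isAsciiDigit(body.charAt(i + 1))
                    && isAsciiDigit(body.charAt(i + 2))
                    && isAsciiDigit(body.charAt(i + 3))) {
                code = Integer.parseInt(body.substring(i + 1, i + 4));
                consumed = 4;
            }
            if (isPrintable(code)) {
                out.append((char) code);
            } else {
                // Non-printable: emit the three-digit decimal escape.
                out.append(String.format("\\%03d", code));
            }
            i += consumed;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(canonical("\\065B\\067")); // prints ABC
    }
}
```

Note that the sketch does not check whether an escape is legal in IC; that check belongs in the lexer rules themselves.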

class Lexer {
    Lexer(InputStream i);
        // Create a lexer that reads characters from the input stream i

    Token getToken() throws LexicalError;
        // Return the next language token on the input stream. Returns
        // a token representing the end of file as the last token.
}
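To show how a testbed might drive this interface, the sketch below calls getToken repeatedly until the end-of-file token arrives and reports a lexical error if one is thrown. Everything in it other than the getToken loop is a stand-in invented for the example: StubLexer simply splits the input on whitespace and rejects words containing '@', its constructor may throw IOException, and the Token and LexicalError classes are bare placeholders. A real lexer would be generated by JLex from the IC token definitions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class LexTestSketch {
    static final class LexicalError extends Exception {
        LexicalError(String message) { super(message); }
    }

    static final class Token {
        final String type;
        final String text;
        final int line;
        Token(String type, String text, int line) {
            this.type = type; this.text = text; this.line = line;
        }
        @Override public String toString() {
            return "<" + type + ", " + text + ", " + line + ">";
        }
    }

    // Stand-in lexer: whitespace-separated words only, NOT the IC rules.
    static final class StubLexer {
        private final Deque<Token> pending = new ArrayDeque<>();
        private final int lastLine;

        StubLexer(InputStream i) throws IOException {
            String src = new String(i.readAllBytes(), StandardCharsets.US_ASCII);
            String[] lines = src.split("\n", -1);
            for (int n = 0; n < lines.length; n++) {
                for (String w : lines[n].trim().split("\\s+")) {
                    if (!w.isEmpty()) pending.add(new Token("WORD", w, n + 1));
                }
            }
            lastLine = lines.length;
        }

        Token getToken() throws LexicalError {
            Token t = pending.poll();
            if (t == null) return new Token("EOF", "", lastLine);
            if (t.text.contains("@")) {
                throw new LexicalError("line " + t.line + ": illegal character in '" + t.text + "'");
            }
            return t;
        }
    }

    // Driver loop: collect one line of output per token, stop at EOF or error.
    static List<String> run(String source) {
        List<String> out = new ArrayList<>();
        try {
            StubLexer lexer = new StubLexer(
                new ByteArrayInputStream(source.getBytes(StandardCharsets.US_ASCII)));
            Token t;
            do {
                t = lexer.getToken();
                out.add(t.toString());
            } while (!t.type.equals("EOF"));
        } catch (LexicalError e) {
            out.add("error: " + e.getMessage());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        run("a b\nc").forEach(System.out::println);
    }
}
```

Returning the output as a list of strings, rather than printing directly, makes it easy to compare a run against an expected token listing.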

Code Structure: All of the classes you write should be in or under the package IC, so the Lexer class will be IC.Lexer, the testbed will be IC.LexTest, etc.

Testing the lexer: We expect you to perform your own testing of the lexer. You should develop a thorough test suite that exercises every legal token and as many lexical errors as you can think of. We will test your lexer against our own test cases -- including programs that are lexically correct, and also programs that contain lexical errors.

Other tools: We recommend that you start using CVS, a useful tool for managing concurrent code development by several people. Such a tool will become more useful in the following assignments, which will be significantly larger than this first assignment. You should therefore use this assignment as a chance to set up your code production and testing process. You may also consider automating this process using makefiles, shell scripts, or other similar tools.


What to turn in

As in any other large program, much of the value in a compiler is in how easily it can be maintained. For this reason, a high value is placed here on both clarity and brevity -- both in documentation and code.

Turn in on paper:

Turn in electronically:

Electronic submission instructions

Your electronic submission is expected at the same time as your written submission: at the beginning of class on the due date. Electronic submissions after 10 AM will be considered a day late. Place your files in \\goose\courses\cs412-sp02\grpX\pa1, where grpX is your group identifier. Please organize your top-level directory structure as follows:

Note: Failure to submit your assignment in the proper format may result in deductions from your grade.