In this programming assignment, you will implement the lexical analyzer for the IC language, which is defined on the web page at http://www.cs.cornell.edu/courses/cs412/2002sp/ic/ic.html. You will build the lexer using the JLex lexical analyzer generator. Detailed documentation and examples for JLex can be found on the JLex web page at http://www.CS.Princeton.EDU/~appel/modern/java/JLex.
As part of this assignment, you have to implement the following classes:
The Token class should at least implement the following interface:
interface TokenInfo {
    void printToken(OutputStream o) throws IOException;
    // Print a human-readable representation of this token on the
    // output stream o. The representation has the form
    // <token-type, matching-string, line-number>.
    int lineNumber();
    // Return the number of the line that this token came from.
}
For string literals, the printToken method should translate each escape sequence "\ddd" representing a printable character into the character itself, e.g., "\065B\067" should be printed as "ABC". It should also translate non-printable characters into a suitable canonical form, such as the "\ddd" or "\^c" escape sequences.
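The translation rule for string literals can be sketched as follows. This is only an illustration under stated assumptions, not the required implementation: the class name StringLiteralPrinter and the method canonicalize are hypothetical, "ddd" is read as a decimal code (as the "\065B\067" example suggests), "printable" is taken to mean ASCII 32 through 126, and non-printable characters are always written back in the "\ddd" form. Other escapes that the IC specification may define (for example an escaped backslash) are not handled here; in a real lexer each escape would be recognized by its own JLex rule.

```java
import java.util.Locale;

// Hypothetical helper: rewrites the body of a string literal into a canonical printable form.
final class StringLiteralPrinter {
    // Assumption: printable means 7-bit ASCII from space (32) through tilde (126).
    private static boolean isPrintable(int code) {
        return code >= 32 && code <= 126;
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Appends a character as itself if printable, otherwise as a decimal \ddd escape.
    private static void appendCanonical(StringBuilder out, int code) {
        if (isPrintable(code)) {
            out.append((char) code);
        } else {
            out.append(String.format(Locale.ROOT, "\\%03d", code));
        }
    }

    static String canonicalize(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            // A \ddd escape is a backslash followed by exactly three decimal digits.
            boolean isDddEscape = c == '\\'
                    && i + 3 < raw.length()
                    && isAsciiDigit(raw.charAt(i + 1))
                    && isAsciiDigit(raw.charAt(i + 2))
                    && isAsciiDigit(raw.charAt(i + 3));
            if (isDddEscape) {
                int code = Integer.parseInt(raw.substring(i + 1, i + 4));
                appendCanonical(out, code);
                i += 4;
            } else {
                appendCanonical(out, c);
                i++;
            }
        }
        return out.toString();
    }
}
```

With this sketch, the literal body \065B\067 is rendered as ABC, while an embedded tab character is rendered as \009.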
The LexicalError class should also implement the TokenInfo interface, but you should use a more appropriate output format to report errors.
The Lexer class should at least implement the two methods below:
class Lexer {
    Lexer(InputStream i);
    // Create a lexer that reads characters from the input stream i.
    Token getToken() throws LexicalError;
    // Return the next language token on the input stream. Returns
    // a token representing the end of file as the last token.
}
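One possible shape for the Token and LexicalError classes is sketched below. Because getToken is declared to throw LexicalError, that class has to be a Throwable as well as a TokenInfo; here it extends Exception. Everything beyond the TokenInfo interface is an assumption made for illustration: the field names, the use of a plain String for the token type, the trailing newline, and the "line N: lexical error: ..." message format are not prescribed by the assignment.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

interface TokenInfo {
    void printToken(OutputStream o) throws IOException;
    int lineNumber();
}

class Token implements TokenInfo {
    private final String type;  // hypothetical: token type name, e.g. "ID"
    private final String text;  // the matching string
    private final int line;     // line the token came from

    Token(String type, String text, int line) {
        this.type = type;
        this.text = text;
        this.line = line;
    }

    // Prints <token-type, matching-string, line-number> followed by a newline.
    public void printToken(OutputStream o) throws IOException {
        String s = "<" + type + ", " + text + ", " + line + ">\n";
        o.write(s.getBytes(StandardCharsets.UTF_8));
    }

    public int lineNumber() {
        return line;
    }
}

// Extends Exception so that Lexer.getToken() can throw it.
class LexicalError extends Exception implements TokenInfo {
    private final int line;

    LexicalError(String message, int line) {
        super(message);
        this.line = line;
    }

    // Assumed error format; the assignment only asks for something more suitable than <...>.
    public void printToken(OutputStream o) throws IOException {
        String s = "line " + line + ": lexical error: " + getMessage() + "\n";
        o.write(s.getBytes(StandardCharsets.UTF_8));
    }

    public int lineNumber() {
        return line;
    }
}
```

Under these assumptions, new Token("ID", "foo", 3).printToken(System.out) writes <ID, foo, 3>.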
The LexTest class is a testbed that works as follows. It takes a single filename as an argument, reads that file, breaks it into tokens, and uses the Token.printToken method to dump a representation of the file as a series of tokens. When it encounters a lexical error, it prints an error message that includes the line number on which the error occurred. It must always report the first lexical error in the file (but not necessarily the subsequent ones).
Code Structure: All of the classes you write should be in or under the package IC, so the Lexer class will be IC.Lexer, the testbed will be IC.LexTest, etc.
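The testbed's control flow can be sketched as a simple loop: request tokens until the end-of-file token arrives, and stop at the first LexicalError. The sketch below is only illustrative. So that it compiles on its own, it uses minimal stand-ins for Token, LexicalError, and Lexer; the stand-in Lexer merely splits on whitespace and treats '#' as illegal, whereas the real one would be generated by JLex. The run helper, the isEof method, the exit codes, and the choice to print the end-of-file token are all assumptions, and the "package IC;" declaration is omitted to keep everything in a single file.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.ArrayDeque;
import java.util.Scanner;

// Stand-in token; a real Token would implement TokenInfo.
class Token {
    final String type, text;
    final int line;
    Token(String type, String text, int line) { this.type = type; this.text = text; this.line = line; }
    boolean isEof() { return type.equals("EOF"); }  // hypothetical helper
    void printToken(OutputStream o) throws IOException {
        o.write(("<" + type + ", " + text + ", " + line + ">\n").getBytes());
    }
}

// Stand-in error carrying the line number.
class LexicalError extends Exception {
    final int line;
    LexicalError(String message, int line) { super(message); this.line = line; }
}

// Stand-in lexer: whitespace-separated words, '#' is illegal. Not a JLex-generated lexer.
class Lexer {
    private final Scanner lines;
    private final ArrayDeque<String> words = new ArrayDeque<>();
    private int line = 0;

    Lexer(InputStream i) { lines = new Scanner(i); }

    Token getToken() throws LexicalError {
        while (words.isEmpty()) {
            if (!lines.hasNextLine()) return new Token("EOF", "", line);
            line++;
            for (String w : lines.nextLine().trim().split("\\s+")) {
                if (!w.isEmpty()) words.add(w);
            }
        }
        String w = words.remove();
        if (w.contains("#")) throw new LexicalError("illegal character '#'", line);
        return new Token("WORD", w, line);
    }
}

class LexTest {
    // Dumps tokens to out; returns 0 on success, 1 after reporting the first lexical error.
    static int run(InputStream in, OutputStream out, PrintStream err) throws IOException {
        Lexer lexer = new Lexer(in);
        try {
            while (true) {
                Token t = lexer.getToken();
                t.printToken(out);          // assumption: the EOF token is printed too
                if (t.isEof()) return 0;
            }
        } catch (LexicalError e) {
            err.println("line " + e.line + ": lexical error: " + e.getMessage());
            return 1;                       // stop at the first error
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java IC.LexTest <file>");
            System.exit(2);
        }
        int status;
        try (InputStream in = new FileInputStream(args[0])) {
            status = run(in, System.out, System.err);
        }
        System.exit(status);
    }
}
```

Keeping the loop in a method that takes streams, rather than directly in main, makes it easy to drive the testbed from an automated test suite with in-memory inputs.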
Testing the lexer: We expect you to perform your own testing of the lexer. You should develop a thorough test suite that tests all legal tokens and as many lexical errors as you can think of. We will test your lexer against our own test cases -- including programs that are lexically correct, and also programs that contain lexical errors.
Other tools: We recommend that you start using CVS, a version control system that is useful for managing concurrent code development by several people. Such a tool will become more useful in the following assignments, which will be significantly larger than this first assignment. You should therefore use this assignment as a chance to set up your code development and testing process. You may also consider automating this process with makefiles, shell scripts, or other similar tools.
As in any other large program, much of the value in a compiler is in how easily it can be maintained. For this reason, a high value is placed here on both clarity and brevity -- both in documentation and code.
Your electronic submission is expected at the same time as your written submission: at the beginning of class on the due date. Electronic submissions after 10AM will be considered a day late. Place your files in \\goose\courses\cs412-sp02\grpX\pa1, where grpX is your group identifier. Please organize your top-level directory structure as follows:
src\ - all of your source code and class files.
doc\ - documentation, including your write-up and a README.TXT containing information on how to compile and run your project, a description of the class hierarchy in your src\ directory, brief descriptions of the major classes, any known bugs, and any other information that we might find useful when grading your assignment.
test\ - any test cases you used in testing your project.
Note: Failure to submit your assignment in the proper format may result in deductions from your grade.