CS 412/413
Introduction to Compilers
Spring 2001

Homework 1: Lexical Analysis

due: Wednesday, January 31, in class

When writing regular expressions, you may use the syntax [^chars] to indicate the possibility of any character except the characters listed in chars.
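
As an illustration of the [^chars] notation, here is a minimal sketch using Python's re module, which supports the same negated character class syntax. The helper names first_non_digit and leading_unquoted are hypothetical and exist only for this example. Note that in Python a negated class also matches newline characters.

```python
import re

# [^0-9] matches any single character that is not a decimal digit.
non_digit = re.compile(r"[^0-9]")

# [^"]* matches a (possibly empty) run of characters containing no double quote.
unquoted_run = re.compile(r'[^"]*')


def first_non_digit(text):
    """Return the first character of text that is not a digit, or None."""
    match = non_digit.search(text)
    return match.group(0) if match else None


def leading_unquoted(text):
    """Return the longest prefix of text that contains no double quote."""
    return unquoted_run.match(text).group(0)
```

For example, first_non_digit("123a45") picks out the "a", and leading_unquoted applied to abc"def stops just before the quote.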

  1. Write a regular expression for http URLs. An http URL consists of four parts: the protocol (http://), the DNS name or the IP address of a host, an optional port number, and the pathname for a file. For simplicity, let's assume:
  2. A comment in the C language begins with the two-character sequence "/*", followed by the body of the comment, and then the sequence "*/". The body of the comment may not contain the sequence "*/", although it may contain the "*" and "/" characters. Write a regular expression for each of the following, or explain why no regular expression defines it.
    1. exactly C comments.
    2. C comments, permitting "*/" in the body as long as it appears inside quotes.
    3. C comments, permitting nested comments.
  3. Appel, problem 2.3
  4. Appel, problem 2.5 (a,b)
  5. Appel, problem 2.6
  6. Appel, problem 2.9
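
To test candidate answers to problem 2(1), it may help to have a reference check that follows the definition of a C comment directly. The sketch below is not a regular expression and is not a solution; it restates the definition from problem 2 as a string test. The function name is_c_comment is hypothetical.

```python
def is_c_comment(text):
    """Return True if text is exactly one C comment as defined in problem 2."""
    # The opening "/*" and closing "*/" may not overlap, so the shortest
    # comment is "/**/" (four characters).
    if len(text) < 4:
        return False
    if not (text.startswith("/*") and text.endswith("*/")):
        return False
    # The body is everything between the two delimiters; it may contain
    # "*" and "/" individually, but not the sequence "*/".
    body = text[2:-2]
    return "*/" not in body
```

A regular expression for 2(1) should accept exactly the strings for which this check returns True, for example "/* hi */" and "/***/", and reject strings such as "/*/" and "/* a */ b */".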