CS 333: Assignment #2

Lexical Analysis

The first thing we have to do with any programming language is build a lexical analyzer that converts a text string into a series of tokens. There are lots of tools for handling this. For this assignment, you'll make use of the flex lexical analysis tool and build a few different parsers.


Flex is an open source lexical analysis tool. The way it works is you first create an input file that defines symbols and production rules that describe how to parse a text file. Then you run flex and it produces a C file. You can then compile the C file and run it on some input.

As an example, consider the following hello world program for Flex. The line beginning with blah defines a rule that says to replace instances of blah with hello world.

blah    printf("hello world");

int main(int argc, char *argv[]) {

If you type the above into a file called replace.yy and then execute the following commands, you get the resulting behavior.

bash-3.2$ flex replace.yy
bash-3.2$ gcc -o repl lex.yy.c -lfl
bash-3.2$ echo "blah and another blah" | ./repl
hello world and another hello world

The documentation for flex provides many more examples and a description of the syntax.

Note that flex is based on regular expressions, which are similar to the extended BNF notation we've gone over in class. There is lots of documentation for regular expressiones, and they can be very useful in many situations.

  1. Using flex, make a program called encode that takes any character in a-z or A-Z and shifts it 13 spaces forward in the alphabet, with wraparound from z back to a. Run it on this file. You can test that it is working correctly by running the output of encode through encode again. The result should be the original document.
  2. Using flex, make a program that reads in a text file and tells you not only the number of rows and characters, but also how many of each vowel [a, e, i, o, u] are in the file.
  3. Make a program that strips an html file of all tags and comments. It should also strip the file of all whitespace except for a blank line wherever there used to be a <p> tag. Test it on this web page. Here is a downloadable version.

    Note that this is challenging to do in a way that make the output reasonably formatted. As an extension, you can make your program do more sophisticated things like replace &gt; with >.

  4. Implement a parser for Clite. For now, the output of your program should just be a sequence of strings, one per line. Please use the following strings.
    Integer-<value>A sequence of one or more digits.
    Float-<value>An integer, a dot, and another integer.
    Keyword-<word>if | else | while | for | int | float
    Identifier-<name>Legal names/identifiers.
    Assignment =
    Comparison-<symbol>== | < | > | <= | >=
    Operator-<symbol>+ | - | * | /
    Open-bracket {
    Close-bracket }
    Open-paren (
    Close-paren )

    Ideally, your program should also ignore comments. You can test your program on this example. You should get something like this result.



The writeup for each project should be a brief summary of what you did along with some code examples, terminal output, or screen shots, depending upon the assignment. Please organize the writeup as follows.

  1. Title of the project and your name
  2. An abstract describing what you did in 200 words or less.
  3. A brief description of code you wrote.
  4. Results.
  5. A brief description of what you learned.


Make your writeup for the project a wiki page in your personal space. If you have questions about making a wiki page, stop by my office or ask in class.

Once you have written up your assignment, give the page the label:


You can give any page a label when you're editing it using the label field at the bottom of the page.

Do not put code on your writeup page or anywhere it can be publicly accessed. To hand in code, attach it to an email and send it to the prof.