Title image Spring 2018

Due: Wednesday 22 February 2018

Part I: Lexical Analysis

The first thing we have to do with any programming language is build a lexical analyzer that converts a text string into a series of tokens. There are lots of tools for handling this. For this assignment, you'll make use of the Flex lexical analysis tool and build a few different parsers.


Flex is an open source lexical analysis tool. The way it works is you first create an input file that defines symbols and production rules that describe how to parse a text file. Then you run Flex and it produces a C file. You can then compile the C file and run it on some input.

As an example, consider the following Hello World program for Flex. The line beginning with blah defines a rule that says to replace instances of blah with hello world.

			 * Hello World: replace "blah" with "hello world"
			 blah    printf("hello world");
			 int main ( int argc, char *argv[] ) {

If you type the above into a file called replace.yy and then execute the following commands, you get the resulting behavior.

$ flex replace.yy
$ gcc -o repl lex.yy.c -ll
$ echo "blah and another blah" | ./repl
hello world and another hello world

The documentation for flex provides many more examples and a description of the syntax.

Note that flex is based on regular expressions, which are similar to the extended BNF notation we've gone over in class. There is lots of documentation for regular expressions, and they can be very useful in many situations.

  1. Using flex, make a program called encode that takes any character in a-z or A-Z and shifts it 13 spaces forward in the alphabet, with wraparound from z back to a. Run it on a text file of your choice. You can test that it is working correctly by running the output of encode through encode again. The result should be the original document. Make use of the fact that you can get the ASCII value of any character by wrapping it in single quotes (e.g. 'a' is the value 97). You can treat it as an integer and do math.

  2. Using Flex, make a program that reads in a text file and tells you not only the number of rows and characters, but also how many of each vowel [a, e, i, o, u] are in the file.

  3. Make a program that strips an html file of all tags and comments. It should also strip the file of all whitespace except for a blank line wherever there used to be a <p> tag. Test it on this example first. Then try testing it on this web page.

    Note that this is challenging to do in a way that make the output reasonably formatted. As an extension, you can make your program do more sophisticated things like replace &gt; with >.

  4. Implement a parser for Clite. The grammar is available on page 16 of the course lecture notes. For now, the output of your program should just be a sequence of strings, one per line. Please use the following strings.

    Integer-<value> A sequence of one or more digits.
    Float-<value> An integer, a dot, and another integer.
    Keyword-<word> if | else | while | for | int | float
    Identifier-<name> Legal names/identifiers.
    Assignment =
    Comparison-<symbol> == | < | > | <= | >=
    Operator-<symbol> + | - | * | /
    Open-bracket {
    Close-bracket }
    Open-paren (
    Close-paren )

    Ideally, your program should also ignore comments. You can test your program on this example. You should get something like this result.

Part II: Selected Langauges

Select two languages you want to learn this semester. Choose one from list A and one from lists B or C on the Languages page. You may also ask permission to select languages not listed on the page. You may work with a partner for the non-C language tasks. You and your partner need to choose the same languages. You will hand in your C work separately, but have a single wiki and handin for your non-C language assignments.


  1. Organize your wiki pages. There should be a top-level page for the course from which a user can navigate to each language (C and your two chosen languages). Each language page should have a link to each project page. If you are working with a partner, there should be only one page for each of your non-C languages, but you both should have the structure in place to navigate to it. In other words, the grader should be able to get to all the pages of your projects from your personal wiki space.

    Make sure there is a link to your project 1 on your top level page for the course.

    Having a properly organized wiki space is worth 3 points on this project.

  2. For each of the two languages, create a wiki page for Project 2.
  3. The first section of each wiki page should list a set of language resources. Spend some time searching for useful resources, documentation, and formal descriptions of the language. Organize them in a useful way and annotate the links (e.g. "I found this site useful for explaining the concept of ..."). Think of it as your reference manual for the rest of the semester, so make it useful for you.
  4. Write a "hello world" type of program for each language. Do something more interesting than printing hello world (i.e. don't just copy and paste a hello world example and do something more than print text). A snapshot of your program and its outputs are the second section of the wiki page.


Extensions are a way for you to customize the assignment and learn something of interest to you. Pick a task and dig deeper. Pick a different language and learn something about it. Do something interesting. The following are possible extensions, but you can choose your own. Extensions can be undertaken individually or jointly.

  1. Get comments working properly in your parser for Clite. Be sure to try out a variety of examples.
  2. Make the encoder more complex.
  3. Make the parser for analyzing a document to some more interesting things.
  4. A perfectly acceptable extension for any assignment going forward is to do a third language. The third language should have its own subdirectory. Under this subdirectory, there should be a set of links for all projects where you inluded this language.


The submission of this project has three components:

  1. Code: Your source code for the tasks of part I and the "hello world" programs of your two selected languages should be submitted to the fileserver in a project2 directory in Private. If you pick a third language as your extension, its "hello world" program should also be submitted to the fileserver. Each task of part I should have a .yy file and a, .c file. It should be clear which files correspond to which tasks. Note that the quality of your comments (in all of your code files) counts toward your grade.
  2. README: Submit a README file to the fileserver for your parsers (flex tasks). The README file can be a .txt file. It should be well-organized such that readers can easily understand the usage and the outputs of each parser. In addition, any known bugs of each parser should be indicated in the README file.
  3. Wiki Report: The C report for this project should be a 1-paragraph abstract explaining what you did in your own language.
  4. Wiki Report: The non-C language pages for this project should have the following elements.
    • Title of the project and your name
    • Resources for the selected languages
    • The snapshot/code of the "hello world" programs
    • The outputs of the "hello world" programs

    To check whether you've made your write-up well organized, you may ask yourself the following questions:

    • Is there a main page that indicates the main purpose of the page collection (e.g. this is a set of pages describing how a particular language implements standard programming language features)?
    • Does the main page have a title (e.g. Ying Li CS 333, Ying’s Explorations in R, Ying’s Notes in JS)?
    • On the main page, are there clearly organized links to supporting pages?
    • Does each supporting page have a title that indicates its role in the set of pages?

    Here is a recommended layout.

    	Ying Li CS333
    	|_Ying’s Explorations in R
    	| |
    	| |_P1
    	| |_P2
    	| |_...
    	|_Ying’s Notes in JS
    	| |
    	| |_P1
    	| |_P2
    	| |_...

Please note that it is your responsibility to explicitly indicate the extensions you have undertaken. If you complete extensions for the parsers, please indicate explicitly in the code and README file. If you complete extensions in selected languages, please indicate this explicitly in your wiki report. Because extensions can be individual or joint, you need to indicate this along with your description of the extension. If no indication is provided, the grader will assume the extensions were completed jointly.