CS232 Project 8: Assembler

The purpose of this project is to write a 2-pass assembler that converts a mnemonic assembly language into the machine code for the CPU you designed earlier in this sequence of projects.

This is the third part of three coordinated projects. You should demonstrate the functionality of all three components at the end of this project.


The overall task is to create a two-pass assembler for your CPU. An assembler converts a set of instructions written in a simple mnemonic language into machine code appropriate for loading onto your machine. In this case, you will generate an MIF file that can be read by Quartus when compiling your CPU.

You are free to use whatever language you like to write the assembler. My example code will be provided in Python, whose string handling makes it fairly simple to use for this task.

Reference the CPU Design when building your assembler.

The assembly language you need to support is given in the assembly language design document.

  1. Download the assembler template. It contains a function that reads and tokenizes a file. It also contains Python functions for converting a decimal number into an 8-bit unsigned or 2's complement binary string.

    Tokenizing a file means the function reads through the file and creates a token for each individual element of the file, separating numbers and symbols into individual strings. The tokenizer also converts all characters to lower case, which makes the assembler case insensitive. The output of the tokenize function is a list of lists. Each sub-list corresponds to a single line in the file and contains the individual strings for that line.

    For example, the file:

    start:
    movei 8 RA
    movei 8 RB

    becomes the list:

    [ ['start:'], ['movei', '8', 'ra'], ['movei', '8', 'rb'] ]
  2. Symbols (also called labels) in assembly language are used by branching operations. Rather than requiring the programmer to know the address of each instruction, the programmer can place symbols within the code and use those symbols as targets for branch instructions.

    The first pass of an assembler needs to figure out what line number corresponds to each symbol. In the second pass, the assembler generates the machine code for each instruction and fills in the address values for branch instructions from the symbol table.

    If the assembly language allowed symbols to be attached to locations in the data RAM, the first pass would have to calculate those addresses as well.

    The output of the first pass through the code should be a dictionary with the symbols as the keys and their instruction addresses as the values. Count only lines that contain actual machine instructions: a line holding only a label does not occupy memory, so the label takes the address of the instruction that follows it. You may want your assembler to detect duplicate symbols and report an error.

    The first pass can also remove lines with labels from the tokens, since they are no longer necessary.

  3. The second pass should take in the tokens and the label dictionary and generate the set of machine instructions. The first token in each line should be the instruction. The interpretation of the rest of the tokens on each line is instruction dependent.

    The output of the second pass should be a list of machine instructions, where each instruction is a string representing the 16-bit binary instruction.

  4. Create a main function that opens a file, tokenizes it, runs the two passes, and then prints out the machine instructions in an MIF format appropriate for use by Quartus.

    Note that the Python 3 expression

    print("%02X : %s;" % (line, instr))

    prints the integer in the variable line as a zero-padded, two-digit, uppercase hexadecimal number, followed by " : ", the string in the variable instr, and a semicolon. The same line also works in Python 2.

  5. Write the Fibonacci program from last week in assembly. Assemble it and demonstrate that it works in simulation.
  6. Write the recursive factorial program in assembly and demonstrate that it works for every input whose result fits in the range of numbers your CPU can represent.
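The template already supplies the helpers from step 1; the following is only a minimal sketch of what they do, useful for checking your understanding or for writing the assembler in another language. It assumes tokens are separated by whitespace or commas and that `#` starts a comment, which the template may not do. The names `tokenize`, `to_unsigned8`, and `to_twos8` are hypothetical and need not match the template's functions.

```python
import re


def tokenize(lines):
    """Split each non-blank line into lower-case tokens.

    Assumption (hypothetical rules): whitespace or commas separate tokens and
    '#' starts a comment. The course template may tokenize differently.
    """
    tokens = []
    for line in lines:
        line = line.split('#', 1)[0].lower()     # strip comment, ignore case
        words = [w for w in re.split(r'[\s,]+', line.strip()) if w]
        if words:                                 # skip blank/comment-only lines
            tokens.append(words)
    return tokens


def to_unsigned8(value):
    """Return value (0..255) as an 8-character binary string."""
    if not 0 <= value <= 255:
        raise ValueError("%d does not fit in 8 unsigned bits" % value)
    return format(value, '08b')


def to_twos8(value):
    """Return value (-128..127) as an 8-character two's complement string."""
    if not -128 <= value <= 127:
        raise ValueError("%d does not fit in 8-bit two's complement" % value)
    # Masking with 0xFF maps a negative number onto its two's complement bits.
    return format(value & 0xFF, '08b')


print(tokenize(["start:", "movei 8 RA", "movei 8 RB"]))
# [['start:'], ['movei', '8', 'ra'], ['movei', '8', 'rb']]
print(to_unsigned8(8), to_twos8(-1))   # 00001000 11111111
```

Because `tokenize` accepts any iterable of strings, you can pass it either a list of lines or an open file object.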

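The first pass in step 2 can be sketched as follows, assuming a label is a single token ending in a colon on a line by itself (like `start:` in the tokenizer example) and that branch instructions name the label without the colon. The function name `pass1` is hypothetical. It returns the symbol table together with the token list minus the label lines, and each label maps to the index of the next machine instruction, so label-only lines do not advance the count.

```python
def pass1(tokens):
    """First pass: build the symbol table and drop label-only lines.

    Assumption: a label is a single token ending in ':' on its own line.
    Returns (symbols, instructions); symbols maps each label (without the
    colon) to the address of the next machine instruction.
    """
    symbols = {}
    instructions = []
    for line in tokens:
        if len(line) == 1 and line[0].endswith(':'):
            label = line[0][:-1]
            if label in symbols:
                raise ValueError("duplicate label: %s" % label)
            # Labels do not occupy memory, so they take the address of the
            # instruction that follows them.
            symbols[label] = len(instructions)
        else:
            instructions.append(line)
    return symbols, instructions


tokens = [['start:'], ['movei', '8', 'ra'], ['loop:'],
          ['add', 'ra', 'rb', 'rc'], ['bra', 'loop']]
symbols, instructions = pass1(tokens)
print(symbols)             # {'start': 0, 'loop': 1}
print(len(instructions))   # 3
```

Raising an exception on a duplicate label stops assembly early, before the second pass can silently branch to the wrong address.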

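One way to organize the second pass from step 3 is a single encoder that dispatches on the mnemonic in the first token. The sketch below is NOT your CPU's instruction set: the register codes, the `movei`/`add`/`bra` bit layouts, and the helper names `imm8`, `encode`, and `pass2` are all made up to show the structure. Replace every field with the encodings from your CPU design and assembly language documents.

```python
# HYPOTHETICAL encodings for illustration only. Replace the register codes
# and every bit layout below with those from your own CPU design.
REGISTERS = {'ra': '000', 'rb': '001', 'rc': '010', 'rd': '011'}


def imm8(text):
    """Encode a decimal immediate as 8 bits (signed or unsigned accepted)."""
    value = int(text)
    if not -128 <= value <= 255:
        raise ValueError("immediate out of range: %s" % text)
    return format(value & 0xFF, '08b')


def encode(line, symbols):
    """Encode one tokenized instruction as a 16-character binary string."""
    op, args = line[0], line[1:]
    if op == 'movei':   # movei <imm> <dest>: made-up layout 11110 iiiiiiii ddd
        return '11110' + imm8(args[0]) + REGISTERS[args[1]]
    if op == 'add':     # add <srcA> <srcB> <dest>: made-up layout
        return ('1000' + REGISTERS[args[0]] + REGISTERS[args[1]]
                + '000' + REGISTERS[args[2]])
    if op == 'bra':     # bra <label>: address filled in from the symbol table
        if args[0] not in symbols:
            raise ValueError("undefined label: %s" % args[0])
        return '00100000' + format(symbols[args[0]], '08b')
    raise ValueError("unknown instruction: %s" % op)


def pass2(instructions, symbols):
    """Second pass: return a list of 16-bit binary instruction strings."""
    machine = []
    for line in instructions:
        word = encode(line, symbols)
        if len(word) != 16:   # catches field-width mistakes in the encoders
            raise ValueError("bad encoding for %s: %s" % (line, word))
        machine.append(word)
    return machine


print(pass2([['movei', '8', 'ra'], ['bra', 'loop']], {'loop': 1}))
# ['1111000001000000', '0010000000000001']
```

Keeping the 16-bit length check in the loop makes an off-by-one field width fail loudly instead of producing a corrupt MIF file.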

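For step 4, a Quartus MIF file is a header giving DEPTH, WIDTH, ADDRESS_RADIX, and DATA_RADIX, followed by `address : value;` lines between `CONTENT BEGIN` and `END;`. This sketch assumes a 256-word instruction memory with 16-bit words and fills unused addresses with zeros using a MIF address range; adjust `depth` to match your ROM. `to_mif` is a hypothetical helper that returns the text rather than printing it, so you can print it or write it to a file.

```python
def to_mif(machine, depth=256, width=16):
    """Return the contents of a Quartus .mif file as a string.

    Assumptions: a 256-word memory by default, binary data strings of
    `width` bits, and unused addresses filled with zeros.
    """
    if len(machine) > depth:
        raise ValueError("program has %d words but memory holds %d"
                         % (len(machine), depth))
    lines = [
        "DEPTH = %d;" % depth,
        "WIDTH = %d;" % width,
        "ADDRESS_RADIX = HEX;",
        "DATA_RADIX = BIN;",
        "CONTENT",
        "BEGIN",
    ]
    for addr, instr in enumerate(machine):
        lines.append("%02X : %s;" % (addr, instr))
    if len(machine) < depth:
        # A single [start..end] range entry zero-fills the rest of memory.
        lines.append("[%02X..%02X] : %s;" % (len(machine), depth - 1, '0' * width))
    lines.append("END;")
    return "\n".join(lines) + "\n"


print(to_mif(['1111000001000000', '0010000000000001']), end='')
```

A main function would then tokenize the input file, run both passes, and print the string this helper returns; redirecting that output to a file gives a .mif file Quartus can load.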
Create a wiki page with your writeup. For each task, write a short description of the task, in your own words.


Give your wiki page the label cs232s14project8.

Put your Python files in a folder named Proj8 in your private subdirectory on the CS232 server. If you wrote any new VHDL or MIF files, you are welcome to put them on the server as well.