Fall 2019

Project 8: Assembler

The purpose of this project is to write a 2-pass assembler that converts a mnemonic assembly language into the machine code for the CPU you built in project 7. This is the third part of three coordinated projects.


The overall task is to create a two-pass assembler for your CPU. An assembler converts a set of instructions written in a simple mnemonic language into machine code appropriate for loading onto your machine. In this case, you will generate an MIF file for the ROM that can be read by Quartus when compiling your CPU. An advanced assembler may also generate a separate MIF for the RAM in case there are variables defined in the assembly code that need to be preloaded into memory.

You are free to use whatever language you like to write the assembler. My example code is provided in Python, and its string handling capabilities make it fairly simple to use for this task. Dictionaries also come in very handy (hint).

Reference the CPU Design when building your assembler.

The assembly language you need to support is given in the assembly language design document.

  1. Setup and Context

    Download the assembler template. It contains a function that reads and tokenizes a file. A token is a string separated by spaces from its neighbors. It also contains Python functions for converting a decimal number into an 8-bit unsigned or two's complement binary string.
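As a rough illustration of what such conversion helpers might look like, here is a minimal sketch. The names `to_unsigned_bin` and `to_twos_complement_bin` are hypothetical, and the functions in the template may differ in name and behavior.

```python
def to_unsigned_bin(value, bits=8):
    """Return value as an unsigned binary string of the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError("%d does not fit in %d unsigned bits" % (value, bits))
    return format(value, "0%db" % bits)


def to_twos_complement_bin(value, bits=8):
    """Return value as a two's complement binary string of the given width."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("%d does not fit in %d signed bits" % (value, bits))
    # Masking maps a negative value onto its two's complement bit pattern.
    return format(value & ((1 << bits) - 1), "0%db" % bits)
```

For example, `to_unsigned_bin(8)` gives `'00001000'` and `to_twos_complement_bin(-1)` gives `'11111111'`.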

    Tokenizing a file means the function reads through the file and creates a token for each individual element of the file, separating numbers and symbols into individual strings. The tokenizer also converts all characters to lower case, which makes the assembler case insensitive. The output of the tokenize function is a list of lists. Each sub-list corresponds to a single line in the file and contains the individual strings for that line.

    For example, the file:

    start:
    movei 8 RA
    movei 8 RB

    becomes the list:

    [ ['start:'], ['movei', '8', 'ra'], ['movei', '8', 'rb'] ]
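
The tokenizing behavior described above can be sketched as follows. This is not the template's code: `tokenize_text` is a hypothetical name, it works on a string rather than a file so that it is self-contained, and skipping blank lines is an assumption.

```python
def tokenize_text(text):
    """Split assembly source into a list of per-line token lists."""
    tokens = []
    for line in text.splitlines():
        words = line.lower().split()  # lower-case, then split on whitespace
        if words:                     # assumption: blank lines are skipped
            tokens.append(words)
    return tokens


source = "start:\nmovei 8 RA\nmovei 8 RB\n"
print(tokenize_text(source))
# prints [['start:'], ['movei', '8', 'ra'], ['movei', '8', 'rb']]
```
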
  2. Symbols and the First Pass

    Symbols in assembly language are used by branching operations. Rather than require the programmer to know the address of each instruction, the programmer can place symbols within the code and use those symbols as targets for branch instructions.

    The first pass of an assembler needs to figure out what line number corresponds to each symbol. In the second pass, the assembler generates the machine code for each instruction and fills in the address values for branch instructions from the symbol table.

    If the assembly language allowed symbols to be attached to locations in the data RAM, the first pass would have to calculate those addresses as well. It would, for example, put the first such symbol at location 0, the second at location 1, and so on.

    The output of the first pass through the code should be a dictionary with the symbols as the keys and the line numbers as the values. Count only lines that contain actual machine instructions, so that each symbol maps to the address of the instruction that follows it; label lines do not advance the count. You may want your assembler to detect duplicate symbols and report an error.

    The first pass can also remove lines with labels from the tokens, since they are no longer necessary.
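
One possible shape for the first pass is sketched below. It assumes that a label sits alone on its line and ends with a colon, as in the `start:` example; check the assembly language design document for the actual rules. The name `pass1` is hypothetical.

```python
def pass1(tokens):
    """Return (instructions, labels) with label lines removed."""
    labels = {}
    instructions = []
    for line in tokens:
        first = line[0]
        if first.endswith(":"):        # assumption: labels end with ':'
            name = first[:-1]
            if name in labels:
                raise ValueError("duplicate symbol: " + name)
            # The label refers to the next instruction that will be emitted.
            labels[name] = len(instructions)
        else:
            instructions.append(line)
    return instructions, labels
```

Because only instruction lines are appended, `len(instructions)` is always the address of the next machine instruction, which is the value a label should take.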

  3. Instructions and the Second Pass

    The second pass should take in the tokens and the label dictionary and generate the set of machine instructions. The first token in each line should be the instruction. The interpretation of the rest of the tokens on each line is instruction dependent.

    The output of the second pass should be a list of machine instructions, where each instruction is a string of 1s and 0s representing the 16-bit binary instruction.
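
A minimal sketch of the second pass follows. The bit patterns, the field layouts, the register codes, and the `bra` mnemonic are all placeholders invented for illustration. Take the real encodings from the CPU design and the assembly language design document.

```python
# Placeholder register codes; NOT the real encoding.
REGISTERS = {"ra": "000", "rb": "001"}


def pass2(instructions, labels):
    """Translate token lists into 16-bit binary strings."""
    machine = []
    for line in instructions:
        op = line[0]
        if op == "movei":
            # Assumed layout: 5-bit opcode, 8-bit immediate, 3-bit destination.
            imm = format(int(line[1]) & 0xFF, "08b")
            word = "11111" + imm + REGISTERS[line[2]]
        elif op == "bra":
            # Assumed layout: 8-bit opcode, 8-bit target address
            # looked up in the symbol table from the first pass.
            word = "00100000" + format(labels[line[1]], "08b")
        else:
            raise ValueError("unknown instruction: " + op)
        machine.append(word)
    return machine
```

A dictionary that maps each mnemonic to a small encoding function would scale better than a long `if`/`elif` chain once the full instruction set is supported.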

  4. The main function

    Create a main function that opens a file, tokenizes it, runs the two passes, and then prints out the machine instructions in an MIF format appropriate for use by Quartus.

    Note that the Python statement

    print("%02X : %s;" % (line, instr))

    will print out the numeric value in the variable line as a 2-digit, zero-padded, upper-case hex number and then replace the %s with the string in the variable instr.
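
Putting that line in context, a sketch of an MIF writer might look like the following. The function name `format_mif` is hypothetical, and the default depth of 256 words is an assumption based on the 2-digit hex address; use the size of the ROM in your own design.

```python
def format_mif(machine, depth=256, width=16):
    """Return the text of an MIF file for a list of binary instruction strings."""
    lines = [
        "DEPTH = %d;" % depth,      # number of words in the ROM (assumed)
        "WIDTH = %d;" % width,      # bits per word
        "ADDRESS_RADIX = HEX;",
        "DATA_RADIX = BIN;",
        "CONTENT",
        "BEGIN",
    ]
    for address, instr in enumerate(machine):
        lines.append("%02X : %s;" % (address, instr))
    lines.append("END;")
    return "\n".join(lines)


print(format_mif(["1111100001000000", "0010000000000000"]))
```
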

  5. Write a Fibonacci sequence program in assembly

    Write the Fibonacci program from last week in assembly. Assemble it and demonstrate that it works in simulation.

  6. Write a recursive program in assembly

    Write a recursive program that sums numbers from 1 to N in assembly and demonstrate that it works, up to the range of the numbers that can be represented. A recursive program must use CALL and RETURN instructions.
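
When checking the simulation, a small reference model in Python can help. The sketch below assumes 8-bit data values, which may not match your CPU. It mirrors the recursive structure and reports the largest N whose sum still fits.

```python
def sum_to(n):
    """Recursively sum 1..n, mirroring the CALL/RETURN structure."""
    if n == 0:
        return 0
    return n + sum_to(n - 1)


def largest_n(limit):
    """Return the largest n whose sum 1..n does not exceed limit."""
    n = 0
    while sum_to(n + 1) <= limit:
        n += 1
    return n


print(largest_n(255))  # 8-bit unsigned: prints 22
print(largest_n(127))  # 8-bit two's complement: prints 15
```
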



Create a wiki page with your report. For each task, write a short description of the task, in your own words.


Give your wiki page the label cs232f19project8.

Put your VHDL, Python, and assembly files in a zip file in your private subdirectory on the Courses server. If you have any issues with the server, try using vpn.colby.edu in a browser.