Spring 2017

Machine Learning

Due Monday 17 April 2017

The goal of this week's project is to build two simple classifiers that can be trained from data. In particular, you will implement a Naive Bayes classifier and a K-nearest-neighbor (KNN) classifier. Once they are working, build some tools for evaluating their outputs and use your visualization app to look at the results.


  1. Write the two functions in the Classifier parent class for creating and printing a confusion matrix. The confusion_matrix method should build a numpy matrix showing, for each true category, the number of data points classified as each output category. The confusion_matrix_str method should convert that matrix into a string that prints as a neatly aligned table.
  2. Write a Python function, probably in a new file, that does the following:
    1. Reads in a training set and its category labels, possibly from a separate file.
    2. Reads in a test set and its category labels, possibly from a separate file.
    3. Builds a classifier using the training set.
    4. Classifies the training set and prints out a confusion matrix.
    5. Classifies the test set and prints out a confusion matrix.
    6. Writes out a new CSV data file with the test set data and the categories as an extra column. Your visualization application should be able to read this file and plot it with the categories as colors.

    You will want to be able to use either the Naive Bayes or the KNN classifier for this task. You can create two files, or you can let the user select one or both classifiers from the command line.

  3. Run the above code on the original Activity Recognition data set. Then run it again on the PCA-transformed version of the data set. Include the confusion matrices in your writeup and note any significant differences.
  4. Plot the activity recognition data set using the first three PCA axes and use color to show the output labels of the classifier. Include this image in your writeup.
  5. Repeat the above two exercises on a data set of your choice other than the Iris and Activity Recognition data sets.
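
As a rough illustration of task 1, here is a minimal sketch of the two confusion matrix routines, written as standalone functions rather than as methods of the Classifier class. It assumes the categories are already encoded as integers 0..C-1 and puts true categories on the rows and assigned categories on the columns; the argument names (`true_cats`, `class_cats`, `num_cats`) and the output layout are illustrative choices, not part of the assignment specification.

```python
import numpy as np


def confusion_matrix(true_cats, class_cats, num_cats=None):
    """Count how often each true category (row) was assigned each output
    category (column). Assumes integer category codes 0..C-1."""
    true_cats = np.asarray(true_cats, dtype=int).ravel()
    class_cats = np.asarray(class_cats, dtype=int).ravel()
    if num_cats is None:
        num_cats = int(max(true_cats.max(), class_cats.max())) + 1
    cmtx = np.zeros((num_cats, num_cats), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats
    np.add.at(cmtx, (true_cats, class_cats), 1)
    return cmtx


def confusion_matrix_str(cmtx):
    """Format the matrix as an aligned text table with row/column labels."""
    n = cmtx.shape[0]
    lines = ["Confusion matrix (rows: actual, columns: classified)"]
    lines.append(" " * 10 + "".join("%10s" % ("Cat %d" % j) for j in range(n)))
    for i in range(n):
        counts = "".join("%10d" % cmtx[i, j] for j in range(n))
        lines.append("%-10s" % ("Cat %d" % i) + counts)
    return "\n".join(lines)


if __name__ == "__main__":
    actual = [0, 0, 1, 1, 2]
    assigned = [0, 1, 1, 1, 2]
    print(confusion_matrix_str(confusion_matrix(actual, assigned)))
```

Correct classifications land on the diagonal, so a well-performing classifier produces a matrix whose off-diagonal entries are small.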


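For step 6 of task 2, one possible way to write the test data with the classifier's output as an extra column is sketched below. The function name `write_with_categories`, its parameters, and the single header row are assumptions made for illustration; adapt the header layout to whatever format your own visualization application reads.

```python
import csv

import numpy as np


def write_with_categories(filename, headers, data, categories,
                          cat_header="category"):
    """Write a CSV file with one header row, the data columns, and the
    assigned category appended as the final column (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    categories = np.asarray(categories, dtype=int).ravel()
    if data.shape[0] != categories.shape[0]:
        raise ValueError("data and categories must have the same number of rows")
    with open(filename, "w", newline="") as fp:
        writer = csv.writer(fp)
        writer.writerow(list(headers) + [cat_header])
        for row, cat in zip(data, categories):
            # tolist() converts numpy scalars to plain Python floats
            writer.writerow(row.tolist() + [int(cat)])
```

Keeping the category in its own named column lets the plotting code treat it like any other column when mapping values to colors.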

Make a wiki page for the project report.


Once you have written up your assignment, give the page the label:


Put your code on the handin server in a project8 directory in your private subdirectory.