CS 365: Assignment #3

Assignment 3: Real-time 2D Object Recognition

Due 15 March 2016 (2 1/2 weeks)


This project is about real-time 2D object recognition. The goal is to be able to place objects on a white surface and have the computer identify the objects in a translation, scale, and rotation invariant manner from a camera looking straight down. The computer should be displaying live video from the camera with the identified objects clearly marked.


There will be a few different small stages in the robot lab that you can use. You are also free to make one of your own (we have lots of cardboard boxes that should work fine). The concept of the stage is simple: have the camera pointing straight down at a clean, white surface on which you can place objects. Try to avoid strong shadows.

Your system needs to be able to differentiate at least 10 objects. Five of them will be provided; you get to choose the other five.


  1. Using the video framework from the first project (if you wish), start building your object recognition (OR) system by implementing a thresholding algorithm of some type that separates an object from the background. Give your system the ability to display the thresholded video (remember, you can create multiple output windows). Test it on the complete set of objects to be recognized to make sure this step is working well. The objects to be recognized are either dark or fairly saturated in color.
  2. The next step is to run connected components analysis on the thresholded image to get regions. You may need to do some morphological processing on the thresholded image to get rid of spurious regions and fill in holes in your objects. Give your system the ability to display the regions it finds. A good extension is to enable recognition of multiple objects simultaneously.

    If you are running OpenCV 3.1, there is a connected components function. If you are using the dwarves, you will need to either write your own or use this function with this include file.

  3. Write a function that computes a set of features for a specified region given a region map and a region ID. You probably want to make these features translation, scale, and rotation invariant. Give your system the ability to display at least one feature in real time on the video output. Then you can easily test whether the feature is translation, scale, and rotation invariant by moving the object around and watching the feature value. Start with just 2-3 features and add more later.
  4. Enable your system to collect feature vectors from objects, attach labels, and store them in an object DB (e.g. a file). In other words, your system needs to have a training mode that enables you to collect the feature vectors of known objects and store them for later use in classifying unknown objects. You may want to implement this as a response to a key press: when the user types an N, for example, have the system prompt the user for a name/label and then store the feature vector for the current object along with its label into a file.
  5. Enable your system to classify a new feature vector using the known objects database and a scaled Euclidean distance metric, in which the difference in each feature is divided by that feature's standard deviation: [ (x_1 - x_2) / stdev_x ]. Label the unknown object according to the closest matching feature vector in the object DB. Have your system indicate the label of the object on the output video stream. An extension is to detect when an unknown object (something not in the object DB) is in the video stream.
  6. Implement a different classifier system of your choice. For example, implement K-Nearest Neighbor (KNN) matching with K > 1. Note that KNN matching requires multiple training examples for each object.
  7. Evaluate the performance of your system.

    One possible approach is the following. Take ten objects, putting one in view at a time. Move each object to five different locations/orientations. Keep track of how the system identifies the object at each location, giving you 50 evaluations. Build a confusion matrix of the results showing true labels versus classified labels and include that in your report.
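Task 3 above asks for features that do not change when an object is moved, scaled, or rotated. The following is a minimal sketch of one way to get such features from a region map, using second-order central moments. The function name `region_features`, the list-of-lists region map, and the choice of these two particular features are assumptions made for illustration; they are not required by the assignment, and in a real system you would likely work on OpenCV image types instead.

```python
import math

def region_features(region_map, region_id):
    """Return (hu1, axis_ratio) for the pixels labeled region_id.

    region_map is assumed to be a list of rows of integer region labels.
    """
    pixels = [(x, y)
              for y, row in enumerate(region_map)
              for x, label in enumerate(row)
              if label == region_id]
    if not pixels:
        raise ValueError("region_id not present in region_map")

    area = len(pixels)                      # zeroth moment m00
    cx = sum(x for x, _ in pixels) / area   # centroid removes translation
    cy = sum(y for _, y in pixels) / area

    # Second-order central moments.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)

    # Normalizing by m00^2 removes scale (approximately, on a pixel grid).
    eta20 = mu20 / area ** 2
    eta02 = mu02 / area ** 2
    eta11 = mu11 / area ** 2

    # First Hu moment: the trace of the moment matrix, unchanged by rotation.
    hu1 = eta20 + eta02

    # Eigenvalues of [[eta20, eta11], [eta11, eta02]] give the spread along
    # the major and minor axes; their ratio is also unchanged by rotation.
    spread = math.sqrt((eta20 - eta02) ** 2 + 4 * eta11 ** 2)
    major = (hu1 + spread) / 2
    minor = (hu1 - spread) / 2
    axis_ratio = minor / major if major > 0 else 1.0
    return hu1, axis_ratio
```

Displaying one of these values on the live video while you slide and turn an object is a quick way to check how stable it really is; expect small changes from pixel quantization and thresholding noise.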



For this project, make a wiki page that begins by explaining your overall pipeline for OR. Your audience is your fellow CS majors not in the course.

Explain the features you used for the OR process, how you computed them, and the classification methods you tested. Your audience for this section is other students in the course.
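As a reference point for the classification part, here is a minimal sketch of nearest-neighbor matching with the scaled Euclidean distance from task 5, where each feature difference is divided by that feature's standard deviation. The function names, the `(label, vector)` layout of the object DB, the choice to compute the standard deviation over all training vectors, and the example numbers are all assumptions made for illustration.

```python
import math
import statistics

def feature_stdevs(training):
    """Per-feature standard deviation over all training vectors.

    training is assumed to be a list of (label, feature_vector) pairs.
    """
    columns = zip(*(vec for _, vec in training))
    # Fall back to 1.0 when a feature has no spread, to avoid dividing by zero.
    return [statistics.pstdev(col) or 1.0 for col in columns]

def scaled_distance(a, b, stdevs):
    """Euclidean distance after dividing each difference by its stdev."""
    return math.sqrt(sum(((x - y) / s) ** 2
                         for x, y, s in zip(a, b, stdevs)))

def classify(unknown, training, stdevs=None):
    """Return the label of the closest training vector."""
    if stdevs is None:
        stdevs = feature_stdevs(training)
    label, _ = min(training,
                   key=lambda item: scaled_distance(unknown, item[1], stdevs))
    return label

# Made-up object DB: feature 0 is a ratio, feature 1 is on a much larger scale.
example_db = [
    ("pen",  [0.10, 1000.0]),
    ("key",  [0.90, 1100.0]),
    ("coin", [0.50, 5000.0]),
]
example_label = classify([0.85, 1000.0], example_db)
```

The scaling matters when features live on very different numeric ranges: without it, the feature with the largest raw values dominates the distance regardless of how informative it is.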

Your project report should include the images of 5 objects of your choice from your training set and then images of the same 5 objects being recognized by the system.

The final piece of your report should provide a quantitative metric that describes the performance of the system on the OR task.
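One candidate metric is overall accuracy taken from the confusion matrix described in task 7. The sketch below assumes you have recorded each evaluation as a `(true_label, classified_label)` pair; the function names and the sample trials are hypothetical and only show the bookkeeping.

```python
def confusion_matrix(trials, labels):
    """Rows are true labels, columns are classified labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, classified in trials:
        matrix[index[true_label]][index[classified]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of trials on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

# Made-up trials for two objects, two placements each.
sample_labels = ["pen", "key"]
sample_trials = [("pen", "pen"), ("pen", "key"),
                 ("key", "key"), ("key", "key")]
sample_matrix = confusion_matrix(sample_trials, sample_labels)
sample_accuracy = accuracy(sample_matrix)
```

Reporting the full matrix alongside the single accuracy number is usually more informative, since the off-diagonal entries show which pairs of objects the system confuses.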

If you did any extensions, describe the algorithms and show at least one example for each extension.

Give your wiki page the label: cs365s16project03


Put a zip file or tar file of your code in your Private Courses handin directory.