CS 336: Projects

Drawing a Julia Set with Pthreads (due Sat Oct. 18)

To draw a Julia set, we compute the color for each pixel according to the function z(n+1) = z(n) + c. In this assignment you will be developing multiple parallel versions of the Julia set drawing problem and running them on the Natural Sciences Computer Cluster (NSCC). An important part of the project is analysis of the effectiveness of the different data distribution algorithms.

There is a helpful webpage about Julia sets that contains an analysis of the Julia set we are creating at http://aleph0.clarku.edu/~djoyce/julia/. I highly recommend the introduction to Julia and Mandelbrot sets.

Logging onto NSCC

You all have accounts on the Natural Sciences Compute Cluster (NSCC) in order to run code for CS 336. The node we will be directed to has 8 processors, with each pair of processors sharing a 6MB L2 cache. There is no queueing software installed, so it is up to you to be careful not to hog the machine nor to clobber other people's jobs.

My advice is to develop and test your code on your machine first. You can even use 8 threads to simulate the use of 8 processors - there just won't be a speedup. Once you are reasonably happy with your code, use an SFTP or SCP server (I use SFTP on Fugu) to move it (and its makefile) to the network server (colby0.colby.edu). That puts the code into a directory that NSCC can access.

To run your code on NSCC, you must log onto it using SSH. In a Terminal window, type ssh nscc.colby.edu. The system is set up to automatically place you in your home directory associated with the colby0 file server. Tho see who else is logged in to the machine, type who on the command line. To see who is running jobs on the machine, type top (then, to quit it, type q). If none of your classmates are actively using it, then it should list FahCore_a2.exe as the process taking up most of the CPU time for each processor. This is code Randy is running in the background. It has a low priority and will yield to any processes you start.

If you have any problems using this computer, let me know AS SOON AS POSSIBLE!

What to Do

  1. Create a directory structure for your project with a root directory named proj1 and daughter directories src, obj, and bin. Download ppmIO.h, ppmIO.c, julia.c, and Makefile into the src directory.
  2. Do a sanity check - compile (make julia) and run (../bin/julia) it on NSCC. It should generate julia_seq.ppm using the sequential code. To view the result, convert it to a JPEG using the command line convert julia_seq.ppm julia_seq.jpg, download it to your machine, and open it. It should look like the Julia set we saw in class.
  3. Familiarize yourself with the sequential version of the code. In particular, note
    • The image is designed to contain the complex plane with the real values between -1.5 and 1.5 and the imaginary values between -1 and 1.
    • The code computes one grayscale value for each pixel. To do so, it iterates the function z(n+1) = z(n) - 0.75. z here indicates a complex number, but the code considers the real and imaginary parts separately using x for the real part and y for the imaginary part.
    • The main tuning knobs for this problem are the num_rows and num_cols global variables which determine the resolution of the image. I suggest you increase them to 2000 each once you have your parallel code working.
  4. Add support for a parallel version that uses static allocation of work. I suggest you copy julia.c into a file julia_static.c. Implement three forms - one that assigns different rows to different processors, one that uses cyclic allocation, and one that uses block cyclic allocation (you can determine the block size).
  5. Add support for a parallel version that uses dynamic allocation of work. I suggest you copy julia.c into a new file julia_dynamic.c for the new implementation. Implement a work queue like that shown in class for the Collatz conjecture. Recall that WORK_CHUNK is the macro that determines how many pixels at a time are assigned to a thread. This is an important tuning knob.
  6. Thoroughly test your code.
  7. Compile and run the working code on NSCC and then analyze the speed-up for all confiugrations for at least three different problem sizes (and block sizes and WORK_CHUNK values). There is code in sequential version to report the time it takes to run, and you should add similar code to the parallel version. For each data decomposition and block or WORK_CHUNK size, record the times and graph them against the number of threads. Also, record the runtime for the sequential code. Choose your problem sizes carefully - be sure the problem size is large enough to cause the 8-threaded version to run at least 4 seconds. Does changing the problem size affect the speed-up (i.e. does the ratio between the running time for the 1-threaded version to the 8-threaded version change from problem size to problem size? If so, can you make a conjecture as to why?)
  8. Write-up your results. The report should include your graphs and text describing the data in the graphs and answers to the questions above. In addition, discuss whether or not the speed-up is what you would expect theoretically. Is the parallel version with 1 thread faster, slower, or the same as the sequential version? Why? Which data decomposition provides the greatest speedup? Which data decomposition provides the smallest speedup? Why? Spend a few paragraphs discussing the merits and drawbacks of the data decompositions you used.

What to Turn In

Turn in your code and a short report by emailing me. Put "CS336" in your subject line so that my mail filters will catch it and I will notice it right away. For an example report, see the HW5 Example Write-up.



Back to Projects Page