CS 336: Project #4

### Project 4: Counting primes faster with pthreads

On NSCC, create a proj04 directory.

In this project, you will be exploring different methods of counting the prime numbers between 1 and N. You will use 8 threads, and each will be given a range in which to count primes. The question is: where do you store your counter? Do you use a local variable? Do you use a global variable?

1. #### Sequential

Download the sequential solution, prime_search_seq.c, read it, write a Makefile for it, compile it, run it, and embrace it.

2. #### Parallel

Each thread counts a section of the array (the first thread gets numbers 1 through N/8, the second threads gets numbers N/8+1 through 2*N/8, etc.) - it is alright to mandate that N be a multiple of 8 so that it divides evenly among the threads. It doesn't matter if N is a global variable, constant, or command line input. I have found that the value N=800000 is a good one for my laptop (it takes long enough for me to time it, but not too long).

To test the Pthreads version, just compare it to the sequential version.

You will also need to add code to time these programs. Time them from before the pthreads are created to after they have all been joined. To do this, copy my_timing.h and my_timing.c from the fish schooling code into your proj04 directory. Be sure to link my_timing.o into your executable.

1. #### Global Counter (Scalar)

Use a global variable to keep track of the total count of prime numbers. Since all threads will be reading and writing to it like crazy, we need to protect it with a mutex lock. So, make a global variable mutex lock. Every time a thread sees a prime number, it locks the lock, updates the counter, and then unlocks the lock.

2. #### Global Counter (Array), Global Scalar Update

Use a global variable that is an array of counters - one counter for each thread. When a thread sees a prime number, it updates its entry in the global array. Before the thread exits, it should update a mutex-protected global scalar variable.

3. #### Global Counter (Array), Global Array Update

Use a global variable that is an array of counters - one counter for each thread. When a thread sees a prime number, it updates its entry in the global array. The main code should be responsible for summing the entries in this array. I do this after each thread is joined.

4. #### Local Counter, Global Scalar Update

Use a local variable to keep track of an individual thread's prime number count. Then, before the thread exits, it should update a mutex-protected global scalar value.

5. #### Local Counter, Global Array Update

Use a local variable to keep track of an individual thread's prime number count. Then, before the thread exits, place the value in its slot in a global array. The main code should be responsible for summing the entries in this array. I do this after each thread is joined.

3. Learn about false sharing and prevent it. Chapter 7 in Introduction to Parallel Computing talks about Pthreads. They have a nice description of false sharing. Read this and then try to prevent false sharing in the global array version of this problem. To do so, use an array of padded ints instead of an array of ints. Here is my definition of a padded int.
```typedef struct {
int value;
```
When you time your solution, you will likely fail to see an improved runtime. What is your theory as to why we don't observe an improvement?

### Extensions

• The code to determine whether or not a number is prime should take longer to run on large numbers than on small numbers. As a consequence, it may not be a good idea to assign the first N/8 numbers to the first thread, the second N/8 numbers to the second thread, and so on. The eighth thread may take significantly longer to run than the first thread.
• First, time the threads separately and determine whether or not there is a load imbalance (i.e. the final thread takes longer than the first). (Hint: Stephanie found there was one.)
• Then, develop a scheme to rebalance the load (for any N -- not just a specific one). One scheme to consider is dynamic load assignment. Have a global variable (protected by a mutex) that increases by some number much smaller than N (so that this number represents the beginning or end of a subrange of numbers to check). When a thread is ready to work, it grabs the chunk and updates the global variable so that the next thread will take the next chunk. Threads finish when there are no more chunks.
• Finally, evaluate and tweak your scheme to demonstrate its superiority to the originally mandated static allocation of work.

### Writeup and Handin

To hand in your project, you will gather all of the necessary files into a proj04 directory:

1. Create a file named README.txt for your project write-up. This week, I want you to collect timing information from all of your runs and then analyze the results. The analysis is a significant part of the project, so don't leave it until the last minute, when you are too tired to think clearly. Here's the data I want you to collect: the number of seconds it takes to run each version of the program. You should record this 5 times for each program. Drop the min and max values from these 5 and then report the mean of the middle 3. What conclusions can you draw? What can you say about the roles of mutex locks for this problem? If you did the extension, then talk about false sharing as well. Which version of the code is the one you would most like to write? Which would you most like to run?
2. You should hand in all code necessary to run your solutions. Place all necessary .h, .c, and Makefile files in the proj04 directory. Stephanie will probably want to compile and run the code. It should be possible to do so without looking for any more files.

Compress it and email it to Stephanie.