Project 1: What is Data?
The purpose of this project is to give you a chance to break down simple problems into precise steps that solve the problem. In computer science speak, you will be developing algorithms. In this case you will write your algorithms as sequences of Python commands that calculate a value or otherwise manipulate data.
- If you haven't already set yourself up for working
on the project, then do so now.
- Mount your directory on the Personal server.
- Open the Terminal and navigate to your Project1 directory on the Personal server.
- Open TextWrangler. If you want to look at any of the files you have already created, then open those files.
- Working with numbers: The first part of the project
is to continue the example from lab of adding three numbers. You will
write six different versions of the code (you have already written
one) that each get increasingly sophisticated.
Version 2: Run your addthree.py file from lab. It should be
clear that Python is executing integer math (no decimals in the
answer). Note that we did not include any decimals in our code,
either, so Python automatically assumed that we wanted it to do
integer math.
For version 2, copy your version 1 lines of code and paste them below version 1. Change the first print statement to 'version 2', and then add a decimal point and a zero to each number in the code. Now you are telling Python that each of the numbers is a floating point number, so it should do floating point math.
Run addthree.py again and see if it gives you a different answer than version 1.
Testing your code: Sometimes your code doesn't work. The first thing to do is to carefully read any error message. It should direct you to the line of the file that failed to work. Some common errors include the following.
- Syntax error: Forgetting the colon at the end of a function definition line.
- Syntax error: Forgetting to pair all parentheses (Python really hates it when you have more left/open parentheses than right/close parentheses).
- Typo: Misspelling a Python command or function name.
- Tabbing/White Space error: Having
inconsistent tabbing. All code should be lined up
carefully. The main code should have no spaces or tabs at the
beginning of the line. The code "inside" a function
definition should be tabbed in once, with all lines tabbed in
the same amount.
Note that Python considers tabs and white space to be different things, which is sometimes hard to debug because all of your code looks correct. If you think your code is correct, then select all of your code (cmd-A) and then choose Text::Entab in TextWrangler to convert all of your white space to tabs or Text::Detab to convert all of your white space to spaces. That will generally correct the problem if the error is mis-matched white space.
Version 3: Copy your version 2 code and paste it at the bottom
of the file. Change the comment and first print statement to
'version 3'. Then remove the .0 from all of the numbers except
the 3. Leave the 3.0 alone.
Run addthree.py again and see what answer it gives you.
The lesson from version 3 is that when Python does a computation, it uses the most flexible representation present in the mathematical expression to hold the result.
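A minimal sketch of this rule, using the same three numbers. The variable names are illustrative and are not part of addthree.py. The sketch checks only the type of each sum, because addition behaves the same way in Python 2 and Python 3, while dividing two integers does not.

```python
# Three integers: the sum is stored as an integer.
int_sum = 42 + 21 + 5

# One floating point operand is enough to make the whole sum a float.
mixed_sum = 42 + 21 + 5.0

print(type(int_sum).__name__)    # int
print(type(mixed_sum).__name__)  # float

# Dividing the float sum keeps the decimals.
print(mixed_sum / 3)
```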
Version 4: If we wanted to use the program to add three
different numbers, how many places do we have to change
the code? Because we have to change all instances of a number
everywhere in the code, we have to change each number in two
different places. From a coding point of view, that is a bad
idea. It is inefficient (takes too much time and effort) and
prone to create errors in the code (lots of opportunities to
type the wrong thing).
To make our code more efficient, start a version 4 at the bottom of the file. Make three assignments, assigning to the variable a the number 42, assigning to variable b the number 21, and assigning to variable c the number 5. The equals sign is the assignment operator and it copies the information on the right side of the assignment to the variable on the left side of the assignment. So to assign to the variable a the number 42 you would write the following code:
a = 42
Another way of describing that statement is to say that a gets the value 42.
After you have made the three assignments, you can use the variables a, b, and c in mathematical expressions, just like we used 42, 21, and 5 in the prior versions.
Have Python print out the sum of the three variables. Just change the expression 42 + 21 + 5 in version 1 to a + b + c for version 4. Do the same for the average, using the expression (a + b + c) and then dividing that expression by 3.0.
Make sure your code prints out the same set of values as version 3.
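One possible shape for version 4, offered as a sketch rather than the expected answer. The names total, average, and wrong are illustrative. The last lines show why the parentheses around a + b + c matter: without them, only c is divided by 3.0.

```python
# Each number now appears in exactly one place.
a = 42
b = 21
c = 5

total = a + b + c
average = (a + b + c) / 3.0

print(total)
print(average)

# Without the parentheses, division happens first, so only c is divided.
wrong = a + b + c / 3.0
print(wrong)
```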
Version 5: In version 4, if we want to add a different
set of three numbers, we still have to edit the code. Version 5
will let us enter the numbers on the terminal when we run the
program, letting us add any three numbers without changing the
code.
Copy version 4 to the bottom of the file. But instead of assigning numbers to the variables a, b, and c, assign the result of calling the raw_input function. Your first assignment might look like the following. Modify the other two to assign b and c.
a = raw_input("Enter first number :")
The raw_input function prints out the prompt and then waits for the user to type something and hit the return key. In this example, whatever the user types after the prompt is stored in the variable a.
What happens when you run this program? Is Python happy with the last line of code?
The basic problem with this program is that raw_input returns a string of characters, not a number. Therefore, we have to convert the string into a number before we can add and divide the numbers properly. This is a process called casting. The following assignment converts the string in variable a into an integer and then assigns the integer back to the variable a, overwriting its old contents.
a = int(a)
Modify version five so that it converts the three variables to integer values before executing the sum or average expressions.
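The sketch below illustrates why the cast is needed. The string literals stand in for whatever the user might type at the three prompts, so the example runs without waiting for input. The name joined is illustrative.

```python
# Stand-ins for the text a user might type at the three prompts.
a = "42"
b = "21"
c = "5"

# With strings, + glues the text together instead of adding numbers.
joined = a + b + c
print(joined)  # 42215

# Cast each string to an integer, overwriting the old contents.
a = int(a)
b = int(b)
c = int(c)

total = a + b + c
print(total)  # 68
```

Dividing the joined string by 3.0 is not defined at all, which is why the uncast version stops with an error on the average line.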
Note that you can find out more about a function like raw_input by using the Python help function. If you start Python in a terminal (type python and hit return), then you can ask Python for help about functions and types. If you type
help(raw_input)
then Python will give you more information about the function, including the fact that it returns a string. You can type 'q' to get out of the help function. Likewise, if you type:
help(int)
then Python gives you a lot of information about the int type, including that it will convert a number or string to an integer. Again, type 'q' to exit the help viewer. To exit the Python interpreter, you can use control-d, or you can type exit().
Version 6: For version 6, create a new file. Call it
three.py. Put your name, a date, and the project at the top in
comments. After that, import the sys package by writing the
following line of code.
import sys
Importing packages is something we do a lot when using Python, because there are lots of packages people have written that do useful things. The sys package gives us access to the program's standard input and output, which is how we will read information sent from the Terminal.
Copy version 5 into the new file and call it version 6. Instead of using raw_input to get information from the Terminal, however, use the function sys.stdin.readline().
a = sys.stdin.readline()
Do the same for variables b and c. You will still need to cast the variables to integers, just as in version 5.
To run version six, we're going to use a file named threenumbers.txt that contains three numbers, each on a different line. Download it into your Project1 directory. You are going to use the cat terminal command to dump the contents of the file and then you will pipe the contents to your three.py program. The terminal command to do this is as follows.
cat threenumbers.txt | python three.py
The vertical bar is called a pipe in Unix terminology, and it sends the output of one program to the input of the next. So the command above dumps the contents of the threenumbers.txt file into the input stream of three.py. When you run the program, you should get the same output as the prior versions. To change what numbers you want to sum, however, you just need to change the input file, not the Python code.
One last change to this version. Make it so it can read in floating point numbers (type float) instead of just integers (type int). Figure out how to cast something to type float instead of type int.
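A sketch of what happens to each piped line. Here io.StringIO is a stand-in for sys.stdin so the example runs without a pipe, and the three values are made up, since the contents of threenumbers.txt are not shown here. Each call to readline returns one line including its trailing newline, and the cast ignores that surrounding whitespace.

```python
import io

# Stand-in for piped input; in three.py this would be sys.stdin.
# The u prefix keeps the literal usable under Python 2 as well.
stream = io.StringIO(u"42.5\n21\n5.25\n")

first_line = stream.readline()
print(repr(first_line))  # the trailing newline is still attached

# The cast tolerates the newline, so no extra cleanup is needed.
a = float(first_line)
b = float(stream.readline())
c = float(stream.readline())

total = a + b + c
print(total)
print(total / 3)
```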
- Unix Tools: For the second main task, you will explore some
standard Unix tools for requesting data from a web page and then
extracting information from that data stream. The source web page we
will be using is the Goldie Buoy Data from Great Pond.
This page contains a comma-separated (CSV) file with data every 15 minutes from May through July 30th, 2016. The data include information about the buoy, information about temperature, information about how much chlorophyll is in the water, and information about how much visible light is available.
To see the contents of this web page, use the following Terminal command.
curl http://schupflab.labs.keyes.colby.edu/buoy/Goldie2016.csv
To be able to scroll through the data, pipe the output of curl to the program less. You can type q at any time to exit less.
curl http://schupflab.labs.keyes.colby.edu/buoy/Goldie2016.csv | less
To select data from a particular date, you can pipe the curl output to a program called grep that searches for lines that contain a particular string. For example, the following finds all of the data from July 4th, 2016.
curl http://schupflab.labs.keyes.colby.edu/buoy/Goldie2016.csv | grep 07/04/2016
In a CSV file, all of the different fields are separated by commas. Another useful Unix command is cut which allows us to chop up lines of text into fields using a specified separator. To find out more about the cut command, you can use man cut on the terminal, or you can take a look at this overview of cut. Field 11 happens to be the 1m temperature measurement. So we could get the 1m temperature measurement for July 4th, 2016 using the following.
curl http://schupflab.labs.keyes.colby.edu/buoy/Goldie2016.csv | grep 07/04/2016 | cut -d ',' -f 11
Now you have a single stream of numbers coming from the buoy data. The diagram below shows the whole process.
Try piping the output of the last command to your python program that adds and averages three values. To do that, type the line above, then add the pipe symbol, then add python three.py. Does your answer seem reasonable? It should sum and average the first three numbers from the buoy.
Take a screenshot of the Terminal with the command and output. You should include this in your wiki page write-up as an image to demonstrate the correctness of your code. This is required image 1.
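To make the grep and cut steps concrete, here is a rough Python equivalent of what each one does to a line of text. The rows are invented for illustration and do not reproduce the real column layout or values of the buoy file. Only the position of field 11 is taken from the text above.

```python
# Made-up rows: a date field, nine placeholder fields, a temperature in
# field 11, and one more placeholder. Not real buoy data.
rows = [
    "07/03/2016 23:45,b,c,d,e,f,g,h,i,j,22.1,l",
    "07/04/2016 00:00,b,c,d,e,f,g,h,i,j,22.4,l",
    "07/04/2016 00:15,b,c,d,e,f,g,h,i,j,22.6,l",
]

# Like grep 07/04/2016: keep only the lines that contain the string.
matching = [row for row in rows if "07/04/2016" in row]

# Like cut -d ',' -f 11: split on commas and keep field 11.
# cut counts fields from 1, Python counts from 0, so field 11 is index 10.
temps = [row.split(",")[10] for row in matching]

print(temps)
```

Note that the extracted values are still strings, just like the lines your program reads from its input.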
Writing a function: An important concept in programming is the idea of
a function. A function is a set of instructions with a name. A
function will sometimes take one or more inputs, and sometimes it will
have one or more return values. When a program executes a function it
stops what it is doing, executes the function, then goes back to where
it was in the code.
Right now we're using the expression (a + b + c) twice in our code, when what we want is the sum of three numbers. To explore how to write a function, let's edit our file three.py.
To define a function, we use the keyword def followed by the name of the function. You can call a function whatever you like, so long as it starts with a letter and contains letters, numbers, or the underscore _ character. Go ahead and define a function called sum3.
def sum3(x, y, z):
The parameters x, y, and z in parentheses tell Python that the function takes three arguments and the colon tells Python to begin a block of code. A block of code must be indented relative to its parent, and the end of the indentation indicates the end of the block.
On the first line of the function, indented relative to the def statement, assign to sum the expression x + y + z. This puts the sum of x, y, and z into the variable sum.
The second and last line of the function is a return statement. In order to use the value that is in the variable sum outside of the function sum3, we have to return the value, which means moving data. To do that, we need to return sum as the last line of the function. A function ends when it hits a return statement.
Once you have finished the function, the rest of your code is similar to the prior code. Put a comment after your function that says:
# main code
Then everywhere you have the expression a + b + c, you can replace it with the function call sum3(a, b, c). The values in a, b, and c will get copied into x, y, and z inside the sum3 function, and their sum will replace the function call in the original expressions.
Test your program. You can pipe the contents of threenumbers.txt to it, or alternatively, you can use the Unix commands to access the buoy data and pipe it to python three.py. Is your answer still reasonable? If not, then examine your code carefully to determine what went wrong and fix the problem.
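Put together, the function and a call to it might look like the sketch below. The numbers are assigned directly instead of being read from the input so that the example runs on its own.

```python
def sum3(x, y, z):
    # sum is a variable that exists only inside this function.
    sum = x + y + z
    # Hand the value back to whoever called sum3.
    return sum

# main code
a = 42
b = 21
c = 5

# The call sum3(a, b, c) is replaced by the value the function returns.
print(sum3(a, b, c))
print(sum3(a, b, c) / 3.0)
```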
- Adding All the Numbers: the last coding task is to
write a program that adds all of the numbers coming from the standard
input and outputs the average. Since we don't know how many numbers
there will be, we will have to use a simple loop and keep going until
the numbers run out. That also means we have to count how many
numbers are in the input.
Create a new file addlots.py. Put your name and a date at the top. Your main program will have three parts: initialization, a loop, and the final calculations and print commands. Each of the following comments corresponds to one line of code.
# import sys
# assign to sum the value 0.0
# assign to count the value 0
# assign to nextval the result of calling sys.stdin.readline()
# while nextval.strip() != '':
    # assign to sum the value of sum plus the result of casting nextval to a float.
    # assign to count the result of count + 1
    # assign to nextval the result of calling sys.stdin.readline()
# print an appropriate string and the value of count
# print an appropriate string and the value of sum / count
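The loop ends because readline returns an empty string once the input runs out, so nextval.strip() eventually equals ''. The sketch below shows that pattern wrapped in a hypothetical helper named average_stream, which is not part of the assignment. It reads from a stand-in stream with made-up values instead of sys.stdin, so it runs without a pipe.

```python
import io

def average_stream(stream):
    # Hypothetical helper: same steps as the comments above, but reading
    # from whatever stream is passed in.
    sum = 0.0
    count = 0
    nextval = stream.readline()
    while nextval.strip() != '':
        sum = sum + float(nextval)
        count = count + 1
        nextval = stream.readline()
    # Assumes at least one number was read; otherwise this divides by zero.
    return count, sum / count

# Stand-in for piped input, with made-up values.
fake_input = io.StringIO(u"1.0\n2.0\n6.0\n")
count, average = average_stream(fake_input)
print(count)
print(average)
```

Because the test uses strip, a blank line in the middle of the input would also end the loop early.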
Using the final Unix command from above that grabs all of the 1m temperature values from July 4th, pipe that to your program and see what you get. You should get an N of 96 and an average of 22.556 (it might print more decimal places). Take a screenshot of the Terminal with this output. This is required image 2.
The final task is to compare the average temperature at 1m on the 4th day of the month in May, June, and July 2016. Alternatively, you can analyze the 2015 data from May through September at the link:
Report your results as part of your writeup. You can do this by changing the argument to the grep function on the terminal.
Take a screenshot of the Terminal with this output or copy-paste the text. This is required "image" 3.
If you have a problem with files looking as though they have just one line (the last one), then check out these instructions for fixing the problem
Each assignment will have a set of suggested extensions. The required tasks constitute about 85% of the assignment, and if you do only the required tasks and do them well you will earn a B+. To earn a higher grade, you need to undertake one or more extensions. The difficulty and quality of the extension or extensions will determine your final grade for the assignment. One complex extension, done well, or 2-3 simple extensions are typical.
The following are a few suggestions on things you can do as extensions to this assignment. You are free to choose other extensions.
- Test your ability to change which field of the file your code is reading and compute average temperatures at other depths.
- Pick a different data source and show that you can also get it to work. It's probably best to stick with CSV formatted sources.
- Figure out how to compute the minimum and maximum temperature from the data stream. Print out those values along with the average and count.
- Calculate the standard deviation of temperature over the course of a day. This can be tricky, but it's possible to compute the standard deviation online, that is, in a single pass without reading over the data twice.
As you can check both the 2015 and 2016 data, see which summer was
warmer. You can also access data from a second buoy that has all
of the 2016 data through the current day. The 1m temperature is
field 9 in this data.
LEA buoy: http://schupflab.labs.keyes.colby.edu/buoy/3100_iSIC.csv
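For the standard deviation extension, one well-known single-pass approach is Welford's algorithm, sketched below. This is one option and not necessarily the approach the course has in mind. The function name running_stats and the sample values are illustrative. It computes the population standard deviation, which divides by the count and not by the count minus one.

```python
import math

def running_stats(values):
    # Welford's algorithm: update the mean and the sum of squared
    # differences from the mean one value at a time.
    count = 0
    mean = 0.0
    m2 = 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    # Population variance; report 0.0 when there is no data.
    variance = m2 / count if count > 0 else 0.0
    return count, mean, math.sqrt(variance)

# Made-up values, not buoy data.
count, mean, std = running_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(count)
print(mean)
print(std)
```

The same loop could also track the smallest and largest value seen so far, which would cover the minimum and maximum extension.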
Turn in your code
You will turn in your code (all files ending with .py) by putting it in a directory in the Courses server. On the Courses server, you should have access to a directory called CS152, and within that, a directory with your user name. Within this directory is a directory named private. Files that you put into that private directory you can edit, read, and write, and the professor can edit, read, and write, but no one else. To hand in your code and other materials, you will create a new directory, such as Project1, and then copy your code into the project directory for that week. Note: This directory will not be available during lab, but will become available during the week before the projects are due.
As with the Personal server, there are two ways to mount the appropriate directory.
- Option 1: Load the root server directory and navigate to your directory.
You can mount the Colby fileserver root directory by going to the Finder and typing cmd-K, or selecting 'Connect To Server...' from the Go menu. It will bring up a dialog box, into which you want to enter the following.
Mac: smb://filer.colby.edu/Courses
Windows: \\filer.colby.edu\Courses
Then click on the CS152 directory, and then your hand-in directory (it will have your username as its name).
- Option 2: Mount your directory directly.
You can mount your personal directory explicitly using the following path in the 'Connect To Server...' dialog.
Turn in your code by copying your entire Project1 directory from your Personal server to the Courses server. The easiest way to do this is to drag and drop the folder from one Finder (one open to Personal) to another (one open to Courses).
Write about the project on the wiki
In lab, you made a new wiki page for your assignment. Put the label cs152f16project1 in the label field on the bottom of the page. But give the page a meaningful title (e.g. Milo's Project 1).
Next, expand on the wiki page you began in lab. In general, your intended audience for your write-up is your peers not in the class. Your goal should be to be able to use it to explain to friends what you accomplished in this project and to give them a sense of how you did it. Follow the outline below.
- A brief summary of the project, in your own words. This should be no more than a few sentences. Give the reader context and identify the key purpose of the assignment.
- A description of your solution to the tasks, including any text output or images you created (including the three required images mentioned above). This should be a description of the form and functionality of your final code. Note any unique computational solutions you developed or any insights you gained from your code's output. You may want to incorporate code snippets in your description to point out relevant features. Code snippets should be small segments of code--usually less than a whole function--that demonstrate a particular concept. If you find yourself including more than 5-10 lines of code, it's probably not a snippet.
- A description of any extensions you undertook, including text output or images demonstrating those extensions. If you added any modules, functions, or other design components, note their structure and the algorithms you used.
- A brief description (1-3 sentences) of what you learned. Think about the answer to this question in terms of the stated purpose of the project. What are some specific things you had to learn or discover in order to complete the project?
- A list of people you worked with, including TAs and professors. Include in that list anyone whose code you may have seen, such as those of friends who have taken the course in a previous semester.
- Double-check the label. When you created the page, you should have added the label cs152f16project1. Make sure it is there.