Due: Monday, April 24, 2017, 11:59 pm
The main purpose of this project is to give you an opportunity to use your priority queue to determine which words are the most frequently used in Reddit posts. A second purpose is to give you the opportunity to find trends in the usage of particular words over 8 years of Reddit posts.
Use a PQHeap<KeyValuePair<String,Integer>> to store the pairs and retrieve them in order from highest count to lowest count (i.e., the Comparator will need to operate on KeyValuePairs, but use the Value to do the comparison).
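As a rough illustration of a Comparator that operates on pairs but compares only their values, here is a minimal sketch. It makes several assumptions: the KeyValuePair class below is a hypothetical stand-in for the course's class (its getKey/getValue method names are guesses), CountComparator and CountOrderDemo are made-up names, and java.util.PriorityQueue stands in for your own PQHeap, whose interface may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for the course's KeyValuePair class (assumption).
class KeyValuePair<K, V> {
    private final K key;
    private final V value;

    KeyValuePair(K key, V value) {
        this.key = key;
        this.value = value;
    }

    K getKey() { return key; }
    V getValue() { return value; }
}

// Compares two pairs by their Integer value only; the key is ignored.
class CountComparator implements Comparator<KeyValuePair<String, Integer>> {
    @Override
    public int compare(KeyValuePair<String, Integer> a, KeyValuePair<String, Integer> b) {
        return Integer.compare(a.getValue(), b.getValue());
    }
}

class CountOrderDemo {
    // Returns the keys ordered from highest count to lowest count.
    static List<String> keysByDescendingCount(List<KeyValuePair<String, Integer>> pairs) {
        // java.util.PriorityQueue removes the smallest element first, so the
        // comparator is reversed here to get the largest count first.
        // Your own PQHeap may already remove the largest element first.
        PriorityQueue<KeyValuePair<String, Integer>> pq =
                new PriorityQueue<>(new CountComparator().reversed());
        pq.addAll(pairs);

        List<String> keys = new ArrayList<>();
        while (!pq.isEmpty()) {
            keys.add(pq.poll().getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        List<KeyValuePair<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new KeyValuePair<>("of", 3));
        pairs.add(new KeyValuePair<>("zebra", 1));
        pairs.add(new KeyValuePair<>("the", 5));
        System.out.println(keysByDescendingCount(pairs)); // prints [the, of, zebra]
    }
}
```

Keeping the comparison in its own Comparator class means the heap itself never needs to know that it is storing word counts.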
Test your class using counts_short.txt. Debug it until it works perfectly.
Run your code on at least one set of Reddit comments and report the 10 most frequent words.
Report the frequency for each word in each file. Note that it is important to report the frequency, rather than the count, because each file has a different number of words.
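To make the difference between count and frequency concrete, the following minimal sketch divides a word's count by the total number of words in its file. The class name FrequencyDemo and the numbers in main are invented for illustration; how you obtain the total word count depends on your own word-count code.

```java
class FrequencyDemo {
    // Frequency = occurrences of the word / total number of words in the file.
    static double frequency(int count, long totalWords) {
        if (totalWords <= 0) {
            throw new IllegalArgumentException("totalWords must be positive");
        }
        // Cast before dividing so that integer division does not truncate to 0.
        return (double) count / totalWords;
    }

    public static void main(String[] args) {
        // Made-up numbers: the same count in files of different sizes
        // gives very different frequencies.
        System.out.println(frequency(50, 1000L));
        System.out.println(frequency(50, 100000L));
    }
}
```

With these made-up numbers, the same count of 50 corresponds to a frequency one hundred times larger in the smaller file, which is why counts from different files are not directly comparable.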
For example, you could design your main program to require inputs formatted according to this usage statement:
USAGE: java FindTrends <WordCountBaseFilename> <WordCountNumberBegin> <WordCountNumberEnd> <interestingWord1> <interestingWord2> ...
Here, <WordCountBaseFilename> contains the text part of the name of each WordCount file you want to analyze, <WordCountNumberBegin> and <WordCountNumberEnd> are the first and last numbers in the range of word files you want to analyze, and <interestingWord1> <interestingWord2> ... is the list of words you want to analyze.
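One possible way to turn arguments of this form into a list of file names and a list of words is sketched below. The class name TrendArgs and the suffix parameter are hypothetical; in particular, the ".txt" extension used in main is an assumption, so adjust it to match the actual word-count file names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that splits the command-line arguments into
// the word-count file names and the words to analyze.
class TrendArgs {
    final List<String> filenames;
    final List<String> words;

    private TrendArgs(List<String> filenames, List<String> words) {
        this.filenames = filenames;
        this.words = words;
    }

    // suffix is whatever follows the number in each file name (assumed, e.g. ".txt").
    static TrendArgs parse(String[] args, String suffix) {
        if (args.length < 4) {
            throw new IllegalArgumentException(
                "USAGE: java FindTrends <WordCountBaseFilename> <WordCountNumberBegin> "
                + "<WordCountNumberEnd> <interestingWord1> <interestingWord2> ...");
        }
        String base = args[0];
        int begin = Integer.parseInt(args[1]);
        int end = Integer.parseInt(args[2]);
        if (begin > end) {
            throw new IllegalArgumentException("begin must not exceed end");
        }

        // One file name per number in the inclusive range [begin, end].
        List<String> filenames = new ArrayList<>();
        for (int n = begin; n <= end; n++) {
            filenames.add(base + n + suffix);
        }

        // Everything after the first three arguments is a word to analyze.
        List<String> words = new ArrayList<>(Arrays.asList(args).subList(3, args.length));
        return new TrendArgs(filenames, words);
    }

    public static void main(String[] args) {
        TrendArgs parsed = parse(args, ".txt"); // ".txt" is an assumed extension
        System.out.println(parsed.filenames);
        System.out.println(parsed.words);
    }
}
```

Validating the argument count up front lets the program print the usage statement instead of failing later with an index error.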
To generate a graph like the one in the screenshot, you can use the following command, which produces similar output.
java -Xmx512m FindTrends ../proj07/counts_reddit_comments_ 2008 2015 snapchat uber tesla microsoft apple yahoo
Then copy and paste the output into an Excel spreadsheet, add years as column headers, and plot the results as lines. Sometimes you may need to press the "Switch Plot" button to switch the rows and columns.
Call FindTrends with one or more lists of approximately 6-10 words. Choose a theme for the words that you think may trend over 8 years, keeping in mind that the comments were all collected during the month of May. It is a good idea to use words that are not particularly common, such as proper names. Here are some lists that you may use:
But you are encouraged to develop your own list.
Generate a line graph with your results and include it in your write-up. Your write-up should also include an analysis of the output. Are these trends expected or unexpected? What events at the time could explain the trends?
Make your writeup for the project a wiki page in your personal space. If you have questions about making a wiki page, stop by my office or ask in lab.
Once you have written up your assignment, give the page the label:
You can give any wiki page a label using the label field at the bottom of the page. The label is different from the title.
Do not put code on your writeup page or anywhere it can be publicly accessed. To hand in code, put it in your folder on the Courses fileserver. Create a directory for each project inside the private folder inside your username folder.
© 2017 Ying Li.