Big Data on Reddit
This project looks at how a subreddit reacts to different posts, which makes it possible to gauge the general attitude of a large group of people toward a subject passively, without asking them. The same data can show which sites tend to be posted for negative content versus positive content, and what the general mood of a subreddit is at any given time. A news site could use this to see not only which subjects provoke the largest response from users but also the best time to publish them. Used well, the data could tell a news site when, what, and where to publish a given article for the largest viewership. Because Reddit is a news aggregator with a large and active userbase, the results can also be extrapolated to the Reddit population as a whole using Reddit's demographic data. That would allow the attitude of a large population toward breaking news to be estimated quickly, reducing the need to poll and sample opinions as often.
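As a rough illustration of the "which sites are posted for negative versus positive content" idea, here is a minimal Python sketch that averages sentiment per linked site. It is not the project's actual code: the function name mean_sentiment_by_domain, the (url, score) input shape, the -1 to 1 score scale, and the sample URLs and scores are all assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean
from urllib.parse import urlparse


def mean_sentiment_by_domain(posts):
    """Average sentiment per linked site.

    posts: iterable of (url, score) pairs. The score scale (-1 negative,
    +1 positive) is an assumption; use whatever the NLP tool returns.
    """
    scores = defaultdict(list)
    for url, score in posts:
        domain = urlparse(url).netloc.lower()
        # Treat www.example.com and example.com as the same site.
        if domain.startswith("www."):
            domain = domain[4:]
        scores[domain].append(score)
    return {domain: mean(values) for domain, values in scores.items()}


# Made-up placeholder data, not real results.
sample_posts = [
    ("https://www.example-news.com/a", 0.5),
    ("https://example-news.com/b", -0.25),
    ("https://other-paper.example/c", -0.5),
]
by_domain = mean_sentiment_by_domain(sample_posts)
```

The same grouping could be keyed on the hour a post was made instead of its domain to get the mood of a subreddit over time.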
The goal is to later add a database that saves the sentiment of every post, and to move to a better NLP tool so that more comments can be run through at a time. This would give a better sample and allow more analysis, such as finding the key words in a post title that lead to more exposure. It would also let me cross-reference the sentiment of a post with the newspaper it links to across a large sample, and so determine the overall sentiment that a newspaper displays on certain subjects.
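One possible shape for that database, sketched with Python's built-in sqlite3 module. Nothing here exists in the project yet: the table name post_sentiment, its columns, and the helper functions are hypothetical, and the rows are placeholders rather than real posts.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the sentiment store. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS post_sentiment (
               post_id   TEXT PRIMARY KEY,
               subreddit TEXT NOT NULL,
               domain    TEXT NOT NULL,
               title     TEXT NOT NULL,
               sentiment REAL NOT NULL
           )"""
    )
    return conn


def save_post(conn, post_id, subreddit, domain, title, sentiment):
    """Insert a post's sentiment, overwriting any earlier row for that post."""
    conn.execute(
        "INSERT OR REPLACE INTO post_sentiment VALUES (?, ?, ?, ?, ?)",
        (post_id, subreddit, domain, title, sentiment),
    )
    conn.commit()


def sentiment_by_domain(conn):
    """Cross-reference sentiment with the linked site: (domain, average, count)."""
    return conn.execute(
        "SELECT domain, AVG(sentiment), COUNT(*) FROM post_sentiment "
        "GROUP BY domain ORDER BY domain"
    ).fetchall()


# Placeholder rows, not real data.
conn = open_store()
save_post(conn, "p1", "news", "example-news.com", "Placeholder title one", 0.5)
save_post(conn, "p2", "news", "example-news.com", "Placeholder title two", -0.5)
save_post(conn, "p3", "worldnews", "other-paper.example", "Placeholder title three", 0.25)
rows = sentiment_by_domain(conn)
```

Keeping the title in the same row means the key-word analysis could later run over stored posts without fetching them again.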
In all, this is a small project. Given time to build a good API and a larger budget for access to better APIs (the ones I'm using now are slow because they're free), it could let anyone scrape Reddit for the data they need to optimize their releases.