• Note: see images for presentation slides (I like to make figures/explanations/documentation).

The slides seem to show up out of order on this site, so I also put a .pdf and .pptx on my GitHub for convenience.

Inspiration

Reddit isn't just cat memes - there is a hugely active political community that is rich in information about the public's attitudes and feelings towards political topics and candidates. A simple data mining and sentiment analysis approach should be able to capture some interesting information.

What it does

This was a data science project, so it involved me building scripts to grab a lot of data, then cleaning and visualising the data. I made some interactive plots in ggplot/plotly and outlined some interesting trends that we can see from analysing the content and emotional valence of the top 1000 posts in the previous year (see presentation slides/pdf).

How I built it

I used a Reddit API wrapper library called PRAW to gather the 1000 highest-ranked posts under the "top" and "controversial" sorts from the last 12 months, together with the top comments on each post. I also have information on the upvote ratio, total post score, whether the top comment was gilded, etc. Lots of data (but annoyingly, limited to n=1000 because after that I get various HTTP errors from PRAW/urllib).
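A minimal sketch of what that collection step could look like, not the actual script. The subreddit name ("politics"), the credential placeholders, the output filename, and every function and field name here are assumptions made for illustration; only the PRAW listing calls (`top`, `controversial` with `time_filter="year"`) and the submission/comment attributes are standard PRAW. It also assumes up to 1000 posts per listing. Reddit listings are generally capped at around 1000 items, which may be related to the limit mentioned above.

```python
import csv
from typing import Any, Dict, List

# Column names are hypothetical; they mirror the fields described in the text.
FIELDS = [
    "listing", "id", "title", "score", "upvote_ratio", "num_comments",
    "top_comment", "top_comment_score", "top_comment_gilded",
]


def submission_to_row(submission: Any, listing: str) -> Dict[str, Any]:
    """Flatten one PRAW Submission (or a look-alike object) into a dict."""
    submission.comment_sort = "top"            # request highest-scored comments first
    submission.comments.replace_more(limit=0)  # drop "load more comments" stubs
    top_level = list(submission.comments)
    top = top_level[0] if top_level else None
    return {
        "listing": listing,
        "id": submission.id,
        "title": submission.title,
        "score": submission.score,
        "upvote_ratio": submission.upvote_ratio,
        "num_comments": submission.num_comments,
        "top_comment": top.body if top else "",
        "top_comment_score": top.score if top else None,
        "top_comment_gilded": bool(getattr(top, "gilded", 0)) if top else False,
    }


def collect(subreddit: Any, limit: int = 1000) -> List[Dict[str, Any]]:
    """Pull the past year's 'top' and 'controversial' listings from a subreddit."""
    listings = {
        "top": subreddit.top(time_filter="year", limit=limit),
        "controversial": subreddit.controversial(time_filter="year", limit=limit),
    }
    rows = []
    for name, posts in listings.items():
        for submission in posts:
            rows.append(submission_to_row(submission, name))
    return rows


def write_csv(rows: List[Dict[str, Any]], path: str) -> None:
    """Save the rows as a CSV file that R can read with read.csv()."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def main() -> None:
    import praw  # third-party: pip install praw

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",          # placeholder
        client_secret="YOUR_CLIENT_SECRET",  # placeholder
        user_agent="political-emotions-sketch",
    )
    # "politics" is an assumed subreddit; the write-up does not name one.
    write_csv(collect(reddit.subreddit("politics")), "reddit_politics.csv")
```

Calling `main()` after filling in real credentials would write one row per post. Keeping `submission_to_row` separate from the network code makes it easy to try the flattening logic on stand-in objects first.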

Data was then imported into R/RStudio, where I used a set of data analysis and visualisation tools that I'm already comfortable with. Microsoft Cognitive Services was used to tag sentence topics and score emotions from 0 to 1, but I also used outside data to get a more fine-grained view of emotion.
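The write-up does not say what the outside emotion data was. One common option is a word-emotion lexicon, and the sketch below assumes that approach purely for illustration. The tiny `TOY_LEXICON`, the emotion labels, and the function name are all made up here; a real lexicon would be loaded from a file. It is written in Python to match the PRAW step, whereas the actual analysis was done in R.

```python
import re
from collections import Counter
from typing import Dict, Set

# Toy lexicon for illustration only: word -> emotions it is associated with.
TOY_LEXICON: Dict[str, Set[str]] = {
    "angry": {"anger"},
    "furious": {"anger"},
    "afraid": {"fear"},
    "hope": {"joy", "trust"},
    "great": {"joy"},
}


def emotion_profile(text: str, lexicon: Dict[str, Set[str]] = TOY_LEXICON) -> Dict[str, float]:
    """Return, per emotion, the share of tokens in `text` tagged with it (0 to 1)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {}
    counts: Counter = Counter()
    for token in tokens:
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return {emotion: n / len(tokens) for emotion, n in counts.items()}
```

Running `emotion_profile` over each post title or top comment would give one score per emotion per text, which could then be averaged by topic or candidate alongside the 0 to 1 scores from the API.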

Challenges I ran into

Lots of new tech O_O. Issues with request timeouts, and lots of trouble with Pusher. Still, happy I got something off the ground.

Accomplishments that I'm proud of

Never mined Reddit before, but have been wanting to since I'm a total Reddit fanboy.

What I learned

Lots about Reddit's internal structure, and more generally how to mine Reddit data :)

What's next for Data Mining Political Emotions on Reddit

Patching up the existing analyses, and extending into mapping the linguistic structures by candidate (maybe a model could identify Trump's distinctive staccato, rambling style?).

I also think that doing emotion detection in the faces of political leaders in photos of political meetings could be insightful - if one leader is looking happy and another is annoyed, this could be a predictor of a bad meeting, which could have potentially worldwide significance.
