Data Mining Political Emotions on Reddit

Everyone knows reddit :)
Just a source of cats, useful bananas and memesters?
Slide1 - Mining reddit for information on how people felt about Trump, Sanders, and Clinton over the last 12 months
Overview of the project methods :)
Wow, that's a lot of Trump!
Some interesting trends in posts by month, upvote ratios, popularity ('score') and Trump-ness
Reddit actually has a really active political community. Clearly there is a lot of information that we can gain from analysing r/politics
As a rough and ready measure of emotions, I mapped a known set of word-emotion pairs onto the language of r/politics submissions
Not too interesting yet..
Quite a lot of emotional separation by Trump-ness
Future work! Ironing out the messy code for the previous analyses aaaand..
What a "surprising" result, heheh.
Hillary posts don't seem to show the same separation
Seeing if The Donald uses recognisably different sentence structures. But maybe, beyond comprehension? :)
Doing emotion detection on photos of world leaders meeting, to predict political outcomes
Azure for sentence topic tagging and emotion tagging
Methods appendix for the keenos
Jupyter notebook + Python and a Reddit API wrapper used to mine Reddit data
Output is a pandas DataFrame built from a .json dict
Pusher attempt was not successful :/
R code in RStudio (<3)
Azure as before

Note: see images for presentation slides (I like to make figures/explanations/documentation).

The slides seem to show up out of order on this site, so I also put a .pdf and a .pptx on my GitHub for convenience.

Inspiration

Reddit isn't just cat memes - there is a hugely active political community that is rich in information about the public's attitudes and feelings towards political topics and candidates. A simple data mining and sentiment analysis approach should be able to capture some interesting information.

What it does

This was a data science project: I built scripts to grab a lot of data, then cleaned and visualised it. I made some interactive plots in ggplot/plotly and outlined some interesting trends that come out of analysing the content and emotional valence of the top 1000 posts from the previous year (see presentation slides/pdf).

How I built it

I used a Reddit API wrapper library called PRAW to gather the 1000 highest-ranked posts under both the "top" and "controversial" sortings on r/politics for the last 12 months, together with the top comments on each post. I also have information on the upvote ratio, total post score, whether the top comment was gilded, etc. Lots of data (but annoyingly, limited to n=1000 because after that I get various HTTP errors from PRAW/urllib).
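
A minimal Python sketch of this gathering step might look like the following. It illustrates the approach under assumptions and is not the original notebook code: the function names `to_row` and `fetch_rows`, the column names, and the choice to keep only the first top-sorted comment are all made up for the example, and the `gilded` comment attribute is read defensively because it may not be present on every comment.

```python
def to_row(submission, top_comment, listing):
    """Flatten one submission and its top comment into a plain dict.

    `listing` records which sorting ("top" or "controversial") the post came from.
    Column names are illustrative, not taken from the original project.
    """
    return {
        "id": submission.id,
        "listing": listing,
        "title": submission.title,
        "score": submission.score,
        "upvote_ratio": submission.upvote_ratio,
        "created_utc": submission.created_utc,
        "top_comment": top_comment.body if top_comment is not None else None,
        # Assumption: `gilded` holds a count of awards; default to 0 if missing.
        "top_comment_gilded": (
            bool(getattr(top_comment, "gilded", 0)) if top_comment is not None else None
        ),
    }


def fetch_rows(client_id, client_secret, user_agent, limit=1000):
    """Collect rows for the top and controversial r/politics posts of the last year."""
    import praw  # third-party; imported here so to_row can be used without it

    reddit = praw.Reddit(
        client_id=client_id, client_secret=client_secret, user_agent=user_agent
    )
    subreddit = reddit.subreddit("politics")
    rows = []
    for listing in ("top", "controversial"):
        fetch = getattr(subreddit, listing)  # subreddit.top / subreddit.controversial
        for submission in fetch(time_filter="year", limit=limit):
            submission.comment_sort = "top"  # must be set before comments load
            submission.comments.replace_more(limit=0)  # drop "load more" stubs
            top_comment = next(iter(submission.comments), None)
            rows.append(to_row(submission, top_comment, listing))
    return rows
```

The resulting list of dicts can be passed to `pandas.DataFrame(rows)` and written out for the R side of the analysis. The credentials passed to `fetch_rows` are placeholders that have to come from a registered Reddit app.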

The data was then imported into R/RStudio, where I used a set of data analysis and visualisation tools that I'm already comfortable with. Microsoft Cognitive Services (on Azure) was used to tag sentence topics and score emotions from 0 to 1, but I also used an outside set of word-emotion pairs to get a finer-grained view of emotion.
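
The word-emotion mapping is simple enough to sketch in a few lines. The example below is in Python for consistency with the mining code, although the actual analysis was done in R, and it is a rough illustration only: `TOY_LEXICON` is a made-up four-word lexicon standing in for the real word-emotion list, and `emotion_counts` and `emotion_shares` are hypothetical helper names.

```python
import re
from collections import Counter

# Toy lexicon invented for this example; the project used an external
# word-emotion list that is not reproduced here.
TOY_LEXICON = {
    "win": {"joy", "anticipation"},
    "attack": {"anger", "fear"},
    "scandal": {"anger", "disgust"},
    "hope": {"joy", "trust"},
}


def emotion_counts(text, lexicon=TOY_LEXICON):
    """Count how often each emotion is triggered by words in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for emotion in lexicon.get(word, ()):
            counts[emotion] += 1
    return counts


def emotion_shares(text, lexicon=TOY_LEXICON):
    """Normalise the counts so each emotion is a share of all emotion hits."""
    counts = emotion_counts(text, lexicon)
    total = sum(counts.values())
    if total == 0:
        return {}  # no lexicon words found in the text
    return {emotion: n / total for emotion, n in counts.items()}
```

Applied to every post title or comment, the per-emotion shares can then be compared across groups of posts, for example by candidate. Matching on exact lowercase words is the crudest possible choice; stemming or lemmatising first would catch more lexicon hits.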

Challenges I ran into

Lots of new tech O_O. Issues with request timeouts, and lots of trouble with Pusher. Still, happy I got something off the ground.

Accomplishments that I'm proud of

I had never mined Reddit before, but I've been wanting to for a while since I'm a total Reddit fanboy.

What I learned

Lots about Reddit's internal structure, and more generally how to mine Reddit data :)

What's next for Data Mining Political Emotions on Reddit

Patching up the existing analyses, and extending them to map linguistic structures by candidate (maybe a model could identify Trump's distinctive staccato, rambling style?).

I also think that doing emotion detection in the faces of political leaders in photos of political meetings could be insightful - if one leader is looking happy and another is annoyed, this could be a predictor of a bad meeting, which could have potentially worldwide significance.