- Everyone knows reddit :)
- Just a source of cats, useful bananas and memesters?
- Slide 1: Mining reddit for information on how people felt about Trump, Sanders, and Clinton over the last 12 months
- Overview of the project methods :)
- Wow, that's a lot of Trump!
- Some interest trends in posts by month, upvote ratios, popularity ('score') and Trump-ness
- Reddit actually has a really active political community. Clearly there is a lot of information we can gain from analysing r/politics
- As a rough-and-ready measure of emotion, I mapped a known set of word-emotion pairs onto the language of r/politics submissions
- Not too interesting yet...
- Quite a lot of emotional separation by Trump-ness
- Future work! Ironing out the messy code for the previous analyses aaaand...
- What a "surprising" result, heheh.
- Hillary posts don't show the same separation
- Seeing if The Donald uses recognisably different sentence structures. Or maybe they're beyond comprehension? :)
- Doing emotion detection on photos of world leaders meeting, to predict political outcomes
- Azure for sentence topic tagging and emotion tagging
- Methods appendix for the keenos
- Jupyter notebook + Python and a Reddit API wrapper used to mine Reddit data
- Output is a pandas DataFrame built from a .json dict
- Pusher attempt was not successful :/
- R code in RStudio (<3)
- Azure as before
Note: see the images for the presentation slides (I like to make figures/explanations/documentation).
The slides seem to show up out of order on this site, so I also put a .pdf and a .pptx on my GitHub for convenience.
Inspiration
Reddit isn't just cat memes - there is a hugely active political community that is rich in information about the public's attitudes and feelings towards political topics and candidates. A simple data mining and sentiment analysis approach should be able to capture some interesting information.
What it does
This was a data science project, so it involved writing scripts to grab a lot of data, then cleaning and visualising it. I made some interactive plots with ggplot2/plotly and outlined some interesting trends visible in the content and emotional valence of the top 1000 posts from the previous year (see presentation slides/pdf).
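To make "trends by month" concrete, here is a minimal Python sketch (the original analysis was done in R, so this is not the author's code) that groups post records by calendar month in UTC and averages their score and upvote ratio. It assumes each record is a dict with `created_utc` (Unix seconds), `score` and `upvote_ratio` keys, named after the corresponding PRAW Submission attributes; the function name `monthly_trends` and the sample records are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def monthly_trends(posts):
    """Group post records by calendar month (UTC) and summarise each month.

    Assumes each record is a dict with 'created_utc', 'score' and
    'upvote_ratio' keys (hypothetical field names mirroring PRAW attributes).
    """
    by_month = defaultdict(list)
    for post in posts:
        ts = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
        by_month[ts.strftime("%Y-%m")].append(post)

    summary = {}
    for month in sorted(by_month):  # "YYYY-MM" strings sort chronologically
        group = by_month[month]
        summary[month] = {
            "n_posts": len(group),
            "mean_score": mean(p["score"] for p in group),
            "mean_upvote_ratio": mean(p["upvote_ratio"] for p in group),
        }
    return summary


# Made-up sample records, not real r/politics data.
posts = [
    {"created_utc": 1451606400, "score": 100, "upvote_ratio": 0.9},  # 2016-01-01
    {"created_utc": 1452000000, "score": 50, "upvote_ratio": 0.7},   # 2016-01-05
    {"created_utc": 1454284800, "score": 10, "upvote_ratio": 0.5},   # 2016-02-01
]
for month, stats in monthly_trends(posts).items():
    print(month, stats)
```

Keeping the month key as a `YYYY-MM` string makes the output easy to plot in order or to hand to a plotting library as a categorical axis.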
How I built it
I used a Reddit API wrapper library called PRAW to gather the 1000 most "top" and "controversial" posts from the last 12 months, together with the top comments on each post. I also collected the upvote ratio, total post score, whether the top comment was gilded, etc. Lots of data (but, annoyingly, limited to n=1000, because beyond that I got various HTTP errors from PRAW/urllib).
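A minimal sketch of this collection step, assuming PRAW 7-style calls (`reddit.subreddit(...).top(time_filter="year", limit=...)`, `.controversial(...)`, `submission.comment_sort` and `submission.comments.replace_more(limit=0)`). The function names `submission_to_record` and `fetch_posts` and the chosen fields are hypothetical, not the author's notebook. The demo at the end uses a stand-in object so it runs without credentials or network access.

```python
import json
from types import SimpleNamespace


def submission_to_record(submission, listing):
    """Flatten a PRAW-style Submission (plus its top comment) into a plain dict."""
    comments = submission.comments
    top = comments[0] if len(comments) > 0 else None
    return {
        "id": submission.id,
        "listing": listing,  # "top" or "controversial"
        "title": submission.title,
        "score": submission.score,
        "upvote_ratio": submission.upvote_ratio,
        "num_comments": submission.num_comments,
        "created_utc": submission.created_utc,
        "top_comment": top.body if top is not None else None,
        # 'gilded' is read defensively in case the attribute is missing.
        "top_comment_gilded": bool(getattr(top, "gilded", 0)) if top is not None else False,
    }


def fetch_posts(reddit, subreddit="politics", limit=1000):
    """Collect the past year's top and controversial posts from a praw.Reddit instance.

    Live usage (needs praw and API credentials):
        fetch_posts(praw.Reddit(client_id=..., client_secret=..., user_agent=...))
    """
    sub = reddit.subreddit(subreddit)
    records = []
    for name, listing in (("top", sub.top), ("controversial", sub.controversial)):
        for submission in listing(time_filter="year", limit=limit):
            submission.comment_sort = "top"  # set before comments are first accessed
            submission.comments.replace_more(limit=0)  # drop "load more comments" stubs
            records.append(submission_to_record(submission, name))
    return records


# Offline demo: a stand-in object instead of a live Submission.
fake = SimpleNamespace(
    id="abc123", title="Example post", score=4200, upvote_ratio=0.87,
    num_comments=3, created_utc=1451606400.0,
    comments=[SimpleNamespace(body="First!", gilded=1)],
)
demo_record = submission_to_record(fake, "top")
print(json.dumps(demo_record, indent=2))
```

Passing the resulting list of dicts to `pandas.DataFrame` gives one row per post, which matches the dataframe-from-.json-dict output described in the methods slides.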
The data was then imported into R/RStudio, where I used a set of data analysis and visualisation tools that I'm already comfortable with. Microsoft Cognitive Services was used to tag sentence topics and score emotions from 0 to 1, but I also used an external set of word-emotion pairs to get a more fine-grained view of emotion.
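The word-emotion mapping can be sketched roughly as follows. This is a Python illustration of lexicon lookup, not the author's R code, and the tiny `LEXICON` below is hypothetical; a real word-emotion lexicon (the NRC Emotion Lexicon is one published example, though which one was used isn't stated here) maps thousands of words to emotion labels.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon: word -> set of emotion labels.
LEXICON = {
    "win": {"joy", "anticipation"},
    "great": {"joy", "trust"},
    "fraud": {"anger", "disgust"},
    "rigged": {"anger"},
    "fear": {"fear"},
}


def emotion_profile(text, lexicon=LEXICON):
    """Count lexicon emotions in text, normalised by the number of tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    n = len(tokens)
    return {emotion: c / n for emotion, c in counts.items()} if n else {}


print(emotion_profile("Rigged! Total fraud, but we will win."))
```

Dividing by token count keeps long and short submission titles comparable; raw counts would otherwise make longer posts look more "emotional".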
Challenges I ran into
Lots of new tech O_O. Issues with request timeouts, and lots of trouble with Pusher. Still, happy I got something off the ground.
Accomplishments that I'm proud of
Never mined Reddit before, but have been wanting to since I'm a total Reddit fanboy.
What I learned
Lots about reddit's internal structure, and more generally how to mine reddit data :)
What's next for Data Mining Political Emotions on Reddit
Patching up the existing analyses, and extending into mapping linguistic structures by candidate (maybe a model could identify Trump's distinctive staccato, rambling style?).
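As a starting point for that kind of model, a few crude sentence-structure features could be computed per post and fed to a classifier. The sketch below is hypothetical (the function name, the feature set and the 4-word "short sentence" threshold are all assumptions), and splitting sentences on `.`, `!` and `?` is a deliberately naive choice.

```python
import re
from statistics import mean


def style_features(text):
    """Crude sentence-structure features for comparing writing styles."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words_per_sentence = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "n_sentences": len(sentences),
        "mean_sentence_len": mean(words_per_sentence) if sentences else 0.0,
        # Share of "staccato" sentences with 4 words or fewer (arbitrary cutoff).
        "short_sentence_share": (
            sum(n <= 4 for n in words_per_sentence) / len(sentences) if sentences else 0.0
        ),
        # Distinct words / total words: lower values suggest more repetition.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
    }


print(style_features("Sad. Very sad. We will fix it, believe me."))
```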
I also think that doing emotion detection on the faces of political leaders in photos of political meetings could be insightful: if one leader looks happy and another looks annoyed, this could predict a bad meeting, which could have worldwide significance.