We wanted to help identify and prevent hate speech online.
What it does
Our code crawls Reddit communities (subreddits), searching for keywords. It then runs each match through a natural language processor to score how positive or negative the comment or post title is. By gathering this data, we can map relationships between users, subreddits, and subjects to identify patterns in hate speech.
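As a rough illustration of the matching and mapping steps, here is a minimal sketch in Python. It assumes each comment or post title has already been given a sentiment score between -1.0 and 1.0. The keyword list, the record fields, and the function names `find_keywords` and `aggregate` are all hypothetical and are not taken from our actual code.

```python
import re
from collections import defaultdict

# Hypothetical placeholder terms; the real keyword list is not shown here.
KEYWORDS = {"exampleterm", "otherterm"}


def find_keywords(text, keywords=KEYWORDS):
    """Return the set of keywords that appear as whole words in text."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return words & set(keywords)


def aggregate(records):
    """Average sentiment per (user, subreddit, keyword) triple.

    Each record is a dict with 'user', 'subreddit', 'text', and 'score',
    where score is a sentiment value assumed to lie in [-1.0, 1.0].
    Records that match no keyword are ignored.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [score sum, count]
    for rec in records:
        for kw in find_keywords(rec["text"]):
            entry = totals[(rec["user"], rec["subreddit"], kw)]
            entry[0] += rec["score"]
            entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

Grouping by the full triple keeps the user, subreddit, and subject dimensions separate, so any one of them can be summed out later when looking for patterns.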
How we built it
We used Python to gather data from Reddit, Google's Cloud Natural Language API to perform sentiment analysis, MySQL to store the data, and Neo4j to display the data.
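To give a sense of how stored rows might be handed to the graph database, here is a hedged sketch that pairs a parameterized Cypher query with one parameter set per row. The node labels `User` and `Subreddit`, the relationship type `POSTED_IN`, the row layout, and the helper names are assumptions made for illustration; our actual schema may differ.

```python
# Hypothetical graph schema: the labels User and Subreddit and the
# relationship type POSTED_IN are illustrative, not taken from the project.
MERGE_QUERY = (
    "MERGE (u:User {name: $user}) "
    "MERGE (s:Subreddit {name: $subreddit}) "
    "MERGE (u)-[r:POSTED_IN]->(s) "
    "SET r.avg_sentiment = $score"
)


def to_graph_params(row):
    """Turn one stored row (user, subreddit, score) into query parameters."""
    user, subreddit, score = row
    # Sentiment scores are assumed to lie in [-1.0, 1.0].
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"sentiment score out of range: {score}")
    return {"user": user, "subreddit": subreddit, "score": float(score)}


def build_statements(rows):
    """Pair the parameterized query with the parameters for each row."""
    return [(MERGE_QUERY, to_graph_params(row)) for row in rows]
```

Using `MERGE` rather than `CREATE` means re-running the load should not duplicate users or subreddits. Each `(query, params)` pair could then be passed to a Neo4j driver session for execution.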
Challenges we ran into
Because we used Google's implementation of natural language processing, we were unable to tweak the analysis to better match our goals.
Accomplishments that we're proud of
We were able to complete the entire product we set out to build. We did this by controlling the scope and complexity of our project.
What we learned
We learned how difficult it is to analyse language and to pull meaningful relationships out of large data sets.
What's next for Reddit Hate Speech NLP
We hope to tune the NLP code to gather more relevant comments and posts. We would also like to expand beyond Reddit and track sentiment changes over time.