Reddit Hate Speech NLP

Inspiration

We wanted to help identify and prevent hate speech.

What it does

Our code crawls Reddit communities (subreddits) searching for keywords. It then runs each match through a natural language processor to score how positive or negative the comment or post title is. By gathering this data, we can map relationships between users, subreddits, and subjects to identify patterns in hate speech.
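The keyword search step could look roughly like the sketch below. It is an illustration only, not our actual crawler: the keyword list, the record fields (`author`, `subreddit`, `text`), and the function names are all hypothetical, and the sample records stand in for data that would come from Reddit.

```python
import re

# Hypothetical placeholder terms; the real keyword list is not shown here.
KEYWORDS = ["exampleterm", "another term"]

def build_pattern(keywords):
    """Compile one case-insensitive regex matching any keyword as a whole word."""
    escaped = (re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)

def find_matches(items, pattern):
    """Return the items whose text contains at least one keyword.

    Each item is a dict with assumed keys: author, subreddit, text.
    """
    return [item for item in items if pattern.search(item["text"])]

# Stand-in records for comments or post titles pulled from Reddit.
sample = [
    {"author": "user_a", "subreddit": "sub_one", "text": "This has ExampleTerm in it."},
    {"author": "user_b", "subreddit": "sub_two", "text": "Nothing of interest here."},
    {"author": "user_c", "subreddit": "sub_two", "text": "Plural exampleterms should not match."},
]

matches = find_matches(sample, build_pattern(KEYWORDS))
```

Matching on word boundaries keeps a keyword from firing inside a longer, unrelated word, at the cost of missing plurals and misspellings. Only the matched records would be passed on to sentiment analysis.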

How we built it

We used Python to gather data from Reddit, the Google Cloud Natural Language API to perform sentiment analysis, MySQL to store the data, and Neo4j to display the data.
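As a rough sketch of the step between sentiment analysis and the graph view, the snippet below collapses scored records into one user-to-subreddit edge each, the kind of relationship a graph database can display. It assumes each record already carries a sentiment score in the range -1.0 to 1.0, which is the range the Google API reports. The field names, the function names, and the 0.25 cut-off are our own illustrative assumptions, and no API, MySQL, or Neo4j calls are made.

```python
from collections import defaultdict

def label_sentiment(score, threshold=0.25):
    """Bucket a score in [-1.0, 1.0]; the 0.25 cut-off is an arbitrary assumption."""
    if score <= -threshold:
        return "negative"
    if score >= threshold:
        return "positive"
    return "neutral"

def build_edges(records):
    """Collapse scored records into one (author, subreddit) edge each.

    Every edge carries the number of matches, the mean score, and a label.
    Records are dicts with assumed keys: author, subreddit, score.
    """
    totals = defaultdict(lambda: [0, 0.0])  # key -> [count, score sum]
    for rec in records:
        entry = totals[(rec["author"], rec["subreddit"])]
        entry[0] += 1
        entry[1] += rec["score"]
    return {
        key: {
            "count": count,
            "mean_score": total / count,
            "label": label_sentiment(total / count),
        }
        for key, (count, total) in totals.items()
    }

# Stand-in scored records; real scores would come from the sentiment API.
records = [
    {"author": "user_a", "subreddit": "sub_one", "score": -0.8},
    {"author": "user_a", "subreddit": "sub_one", "score": -0.4},
    {"author": "user_b", "subreddit": "sub_one", "score": 0.5},
]

edges = build_edges(records)
```

Each resulting edge could then be written to the graph as a relationship between a user node and a subreddit node, with the count and mean score stored as properties.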

Challenges we ran into

Because we used Google's implementation for natural language processing, we were unable to tune the analysis to better match our goals.

Accomplishments that we're proud of

We completed the entire product we set out to build by controlling the scope and complexity of our project.

What we learned

We learned how difficult it is to analyse language and to pull meaningful relationships out of large data sets.

What's next for Reddit Hate Speech NLP

We hope to tune the NLP code to gather more relevant comments and posts. We would also like to expand beyond Reddit and to track sentiment changes over time.
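One possible way to track sentiment over time, which we have not built, is to group scored records by calendar month and average the scores. The sketch below assumes each record has a Unix timestamp in a field named `created_utc` and a `score` field; both names and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

def monthly_mean_sentiment(records):
    """Group records by UTC calendar month and average their scores.

    Records are dicts with assumed keys: created_utc (Unix seconds), score.
    Returns a dict mapping "YYYY-MM" to the mean score, in month order.
    """
    buckets = defaultdict(list)
    for rec in records:
        when = datetime.fromtimestamp(rec["created_utc"], tz=timezone.utc)
        buckets[when.strftime("%Y-%m")].append(rec["score"])
    return {
        month: sum(scores) / len(scores)
        for month, scores in sorted(buckets.items())
    }

# Stand-in records; the timestamps are illustrative.
records = [
    {"created_utc": 1483228800, "score": -0.5},  # 2017-01-01 00:00 UTC
    {"created_utc": 1484438400, "score": -0.1},  # 2017-01-15 00:00 UTC
    {"created_utc": 1485907200, "score": 0.2},   # 2017-02-01 00:00 UTC
]

trend = monthly_mean_sentiment(records)
```

Running the same grouping per subreddit or per keyword would show whether sentiment around a subject is drifting, rather than only giving a single snapshot.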