Stop the Hate

Inspiration

All of our members have used the internet for a long time and have encountered various forms of hate speech, much of it targeted at Asians, Pacific Islanders, and Asian Americans. Drawing on the prompts given and our personal experiences, we thought we could help combat anti-Asian hate speech with a crowdsourced machine learning model, similar to Google reCAPTCHA.

What it does

Our project uses a Python server running an NLP machine learning model to detect hate speech. We use an SQLite database to store the data used to train the model, which we initially trained on an existing dataset. We then developed a Google Chrome extension that lets users submit hate speech they come across on the internet by highlighting the text, right-clicking, and selecting the report feature. The extension connects to the Python server, which adds the input to our database. Currently, the model retrains after every 10 new inputs. We also developed a web scraper and a website to display how well the model detects hate speech.
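The report-and-retrain loop described above can be sketched roughly as follows. This is a minimal illustration, not our actual server code: the `reports` table, the `add_report` function, and the `retrain` callback are hypothetical names, and the sketch assumes the retrain trigger is based on the total number of stored reports.

```python
import sqlite3

RETRAIN_EVERY = 10  # from the write-up: retrain after every 10 new inputs


def init_db(conn):
    # Hypothetical schema: one row per reported snippet.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reports ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )


def add_report(conn, text, retrain):
    """Store one reported snippet and retrain on every 10th stored row.

    `retrain` is an assumed callback that receives all stored texts.
    Returns True when a retrain was triggered.
    """
    conn.execute("INSERT INTO reports (text) VALUES (?)", (text,))
    conn.commit()
    (count,) = conn.execute("SELECT COUNT(*) FROM reports").fetchone()
    if count % RETRAIN_EVERY == 0:
        rows = conn.execute("SELECT text FROM reports").fetchall()
        retrain([row[0] for row in rows])
        return True
    return False
```

In a real deployment `retrain` would refit the NLP model; here it can be any function, which makes the trigger logic easy to check on its own.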

How we built it

We built the NLP model with scikit-learn and NLTK. The Chrome extension is built with JavaScript and uses message passing to exchange data. The server is built with Flask, using blueprints.
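To give a sense of what a scikit-learn text classifier looks like, here is a minimal sketch. The choice of TF-IDF features with logistic regression is an assumption made for illustration (this write-up does not specify the exact model), the training sentences are toy placeholders rather than our dataset, and `train` and `hate_score` are hypothetical names.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy placeholder data (1 = hateful, 0 = benign); not the project's dataset.
TEXTS = [
    "i hate those people",
    "those people are disgusting",
    "go back where you came from",
    "lovely weather this morning",
    "thanks for the helpful answer",
    "this recipe was great",
]
LABELS = [1, 1, 1, 0, 0, 0]


def train(texts, labels):
    """Fit a TF-IDF + logistic regression pipeline (an assumed model choice)."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    return model


def hate_score(model, text):
    """Return the predicted probability of class 1 (hateful) for one text."""
    # predict_proba columns follow model.classes_, which is [0, 1] here.
    return float(model.predict_proba([text])[0][1])


model = train(TEXTS, LABELS)
```

Calling `hate_score(model, some_text)` returns a value between 0 and 1, which is one plausible way to produce the kind of hate score shown on the website.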

Challenges we ran into

We initially ran into a problem with the web scraper, which was originally developed in JavaScript. We realized that it would have trouble communicating with the model, server, and website in certain use cases. Rewriting the web scraper in Python proved more efficient, because it could then communicate with the server more easily and directly.
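As an illustration of how little is needed for the text-extraction part of a scraper in Python, the sketch below pulls visible text out of an HTML string using only the standard library. It is not our actual scraper: `TextExtractor` and `extract_text` are hypothetical names, and fetching pages over the network is left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the non-empty text fragments found in an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Because the extracted fragments are ordinary Python strings, they can be passed straight to a model living in the same process, which is the kind of direct communication the rewrite was after.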

Accomplishments that we're proud of

We are proud of getting the NLP model to work, and of getting it to read data from the web scraper, assign a hate score to that data, and push the result to the website. We are also proud of getting the server to run properly: it stays up and running, and it updates the database with inputs from a working Chrome extension.

What we learned

We learned how to use an NLP model, how to prevent SQL injection with SQLite, how to style pages using CSS and jQuery, and how to work with callbacks in JavaScript.
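The usual way to prevent SQL injection with SQLite in Python is to pass user input through `?` placeholders instead of building the SQL string by hand. The sketch below shows the idea; the `reports` table and `save_report` function are hypothetical and not taken from our code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")


def save_report(conn, text):
    """Insert user-submitted text using a parameterized query."""
    # The ? placeholder binds `text` as a value, so SQLite never
    # interprets any part of it as SQL.
    conn.execute("INSERT INTO reports (text) VALUES (?)", (text,))
    conn.commit()


# Input crafted to break out of a naively concatenated INSERT statement.
malicious = "x'); DROP TABLE reports; --"
save_report(conn, malicious)

# The table still exists and holds the input as plain text.
stored = conn.execute("SELECT text FROM reports").fetchall()
```

This matters for a project like ours because the reported text comes from arbitrary web pages and users, so it should always be treated as untrusted.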

What's next for "Stop the Hate"

Cleaning up the website, handling data poisoning, adding a route to manually clean the database, multithreading long NLP operations to prevent stalling, and making a bot that automatically reports hate speech on social media websites.
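One possible shape for the multithreading item is to hand retraining to a worker thread so the request handler can return immediately. The sketch below is only an idea for future work under that assumption; `BackgroundRetrainer` and its methods are hypothetical names.

```python
import threading


class BackgroundRetrainer:
    """Run at most one retraining job at a time on a worker thread."""

    def __init__(self, retrain):
        self._retrain = retrain  # assumed callable that refits the model
        self._lock = threading.Lock()
        self._thread = None

    def request(self):
        """Start a retrain unless one is already running.

        Returns True if a new job was started, False otherwise.
        """
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                return False
            self._thread = threading.Thread(target=self._retrain, daemon=True)
            self._thread.start()
            return True

    def wait(self, timeout=None):
        """Block until the current job, if any, finishes."""
        with self._lock:
            thread = self._thread
        if thread is not None:
            thread.join(timeout)
```

A thread keeps the server responsive while it waits on the job, but because of CPython's global interpreter lock, CPU-heavy training may still slow other requests; a separate process or task queue might be a better fit if that turns out to be the case.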