CounterGambit

Screenshot of Saturday Night Workings :)
Confidence of hate speech and non hate speech with "I love you"

Link to our App!

http://countergambit.herokuapp.com/

Inspiration

In 2020, a popular YouTube chess channel was blocked for hate speech. Its videos used terms like “black”, “white”, “attack”, and “capture” in the context of chess, but an AI system that did not understand that context mistakenly flagged them. Inspired by the paper “Are Chess Discussions Racist? An Adversarial Hate Speech Data Set” (Sarkar & KhudaBukhsh), we retrained an existing BERT-based hate speech detection model to improve its accuracy on chess comments.

What it does

We now have a model that correctly identifies chess comments as non-hate speech. It can be applied (or reapplied) to YouTube comments, Twitter feeds, and more, protecting the freedoms of law-abiding citizens while keeping them safe from harmful speech.

As people who build these models and understand what happens behind the scenes, we want to improve the ethics of AI. This work extends to other debiasing efforts in the field, especially for minorities, who are better served when models represent the world as it should be. Each small step we take to help our community has a big impact on our future. Today we took a small step for the chess community, but it has a meaningful impact on how AI treats the world.

How we built it

We started from a prebuilt BERT model for hate speech detection from Hugging Face. We found multiple hate speech data sets, combined them with correctly labeled (inoffensive) chess comments, and retrained the model to measure the improvement.
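The data-combination step described above could be sketched roughly as follows. This is a minimal illustration, not the team's actual code: it assumes each hate speech data set is a list of (text, label) pairs, that 1 means hate speech and 0 means non-hate speech, and that the function name `build_training_set` and the sample rows are made up for the example.

```python
import random

HATE, NOT_HATE = 1, 0  # assumed label convention


def build_training_set(hate_speech_sets, chess_comments, seed=0):
    """Merge labeled hate speech data sets with chess comments labeled non-hate."""
    merged = {}
    # Keep the first label seen for each distinct text so that duplicates
    # across data sets are not counted twice.
    for dataset in hate_speech_sets:
        for text, label in dataset:
            merged.setdefault(text.strip(), label)
    # Chess comments are known to be inoffensive, so force the non-hate label
    # even if a source data set had flagged the same text.
    for text in chess_comments:
        merged[text.strip()] = NOT_HATE
    examples = list(merged.items())
    random.Random(seed).shuffle(examples)  # deterministic shuffle
    return examples


# Hypothetical sample rows, for illustration only.
set_a = [("placeholder abusive comment", HATE), ("have a nice day", NOT_HATE)]
set_b = [("have a nice day", NOT_HATE), ("white attacks the black king", HATE)]
chess = ["white attacks the black king", "black captures the white bishop"]
examples = build_training_set([set_a, set_b], chess)
```

The chess comments are applied last on purpose, so a chess sentence that another data set had mislabeled as hateful ends up with the corrected label.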

Challenges we ran into

Getting the model to run on the GPU instead of the CPU
Tuning different hyperparameters
Running out of storage and RAM on our devices
Figuring out how to translate model output into API calls
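For the last challenge, one possible approach is sketched here. A sequence classifier such as BERT emits one raw score (logit) per class, and a softmax turns those scores into confidences that an API can return. The sketch uses only the standard library; the label order, the JSON field names, the function `to_api_response`, and the example logits are assumptions for illustration, not the project's actual API.

```python
import json
import math

LABELS = ("non_hate", "hate")  # assumed order of the model's output classes


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_api_response(text, logits):
    """Build a JSON string with a confidence per label and the top label."""
    confidences = dict(zip(LABELS, softmax(logits)))
    prediction = max(confidences, key=confidences.get)
    return json.dumps(
        {"text": text, "prediction": prediction, "confidences": confidences}
    )


# Made-up logits standing in for real model output.
response = json.loads(to_api_response("I love you", [2.0, -1.0]))
```

Returning both confidences rather than only the top label lets a frontend chart them side by side.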

Accomplishments that we're proud of

As a team meeting each other for the first time, we worked well together and trusted each other's work
We are proud of getting the model to work, since only one person on the team had taken an ML class before

What we learned

NLP basics (train, validation, test sets are different ahh!)
Data visualization tools (TensorBoard, Chart.js)
Training a model takes an ~e x t r e m e l y~ long time to run...
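On the first point above: the training set fits the model, the validation set guides hyperparameter tuning, and the test set is held out for the final evaluation, so the three must not overlap. A minimal sketch of such a split follows. The 80/10/10 fractions and the function name `split_dataset` are assumptions, not the values the team used.

```python
import random


def split_dataset(examples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once, then cut into disjoint train/validation/test lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # deterministic shuffle
    n_val = int(len(items) * val_frac)
    n_test = int(len(items) * test_frac)
    val = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]
    return train, val, test


# Integers stand in for labeled comments here.
train, val, test = split_dataset(range(100))
```

Because all three lists are slices of one shuffled list, no example can appear in more than one of them.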

What's next for CounterGambit

Training on more examples for better accuracy
Tuning our hyperparameters
Switching to Google Cloud's AutoML instead of a Compute Engine instance for training and deploying the model

Updates

The API now runs fully in the cloud! We collected chess comments with Google Cloud's YouTube API, trained the model on a Compute Engine instance, and now host an HTTP/SSL server on the Compute Engine instance. The backend exposes API calls to the machine learning model, so the frontend can be designed completely independently.
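A backend of this shape could look roughly like the sketch below, written with Python's standard `http.server` module. It is an illustration under assumptions: the `/predict` path, the JSON fields, and the `classify` stub (which returns fixed dummy confidences instead of calling a model) are hypothetical, and TLS is left out for brevity.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(text):
    """Placeholder for the real model call; returns fixed dummy confidences."""
    return {"non_hate": 0.5, "hate": 0.5}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":  # hypothetical endpoint name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(classify(payload.get("text", ""))).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def demo_request():
    """Start the server on a free local port, send one request, shut down."""
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    request = urllib.request.Request(
        f"http://127.0.0.1:{server.server_port}/predict",
        data=json.dumps({"text": "I love you"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as resp:
            return json.loads(resp.read())
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo_request()` exercises the endpoint end to end. Since the frontend only depends on the JSON contract, it can be swapped out without touching the model code.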

Resources and References

- “Are Chess Discussions Racist? An Adversarial Hate Speech Data Set” (Sarkar & KhudaBukhsh): https://arxiv.org/pdf/2011.10280.pdf
- Chess Speech Data Set: https://github.com/styx97/chess_racism
- Prebuilt hate speech model: https://huggingface.co/Hate-speech-CNERG/dehatebert-mono-english

Built With

css
cuda
html
javascript
python
pytorch
react
tensorboard
tensorflow

Submitted to

HackUIowa
- Winner Best Data Visualization Award by Leepfrog
Hacktech 2021

Created by

I helped set up TensorBoard to visualize and track our model as I trained it on two different data sets: one with chess comments and one without.

Christine Lam
CS Major @ Wellesley College 2022
Worked on setting up the BERT model in the backend and modifying it to accept the new chess data set.

Lauren Mangibin
I worked on preprocessing the data, helping design the machine learning model, and running an HTTP server on Google Cloud to host the backend and run the deployed model.

Andrew Mascillaro
ECE Major at Olin College of Engineering
Brandon Samaroo