Our team was inspired by the machine learning we had studied and wanted to do something cool with it.

What it does

Our code identifies hate speech in a Twitter-style corpus. As an example, we used Donald Trump's tweets.

How we built it

We wrote the code in Python, using its data analysis libraries (NumPy, pandas, scikit-learn) and gensim's implementation of the Word2Vec algorithm. We ran our code on Croc Cloud.
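Word2Vec gives one vector per word, but a classifier needs one fixed-length vector per tweet. The write-up does not say how word vectors were combined, so here is a minimal sketch assuming the common approach of averaging the vectors of a tweet's known words. To keep it self-contained, a tiny hand-written dictionary (`WORD_VECTORS`, hypothetical values) stands in for a model trained with gensim (in gensim 4.x, e.g. `Word2Vec(sentences, vector_size=100)`, with vectors looked up as `model.wv[word]`). The whitespace tokenizer is also a simplifying assumption.

```python
from typing import Dict, List

# Hypothetical 3-dimensional word vectors standing in for a trained Word2Vec
# model. With gensim 4.x these would be looked up as model.wv[word].
WORD_VECTORS: Dict[str, List[float]] = {
    "great": [0.9, 0.1, 0.0],
    "people": [0.2, 0.8, 0.1],
    "losers": [-0.7, 0.2, 0.9],
    "total": [0.0, 0.3, 0.4],
}


def tweet_vector(tweet: str, vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of the tweet's in-vocabulary tokens.

    Tokens missing from the vocabulary are skipped; a tweet with no known
    tokens maps to the zero vector, so every tweet gets a fixed-length feature.
    """
    # Naive lowercase + whitespace tokenization (an assumption for this sketch).
    tokens = [tok for tok in tweet.lower().split() if tok in vectors]
    if not tokens:
        return [0.0] * dim
    # Component-wise mean over the known tokens.
    return [sum(vectors[tok][i] for tok in tokens) / len(tokens) for i in range(dim)]


if __name__ == "__main__":
    print(tweet_vector("Total losers", WORD_VECTORS, dim=3))  # approximately [-0.35, 0.25, 0.65]
```

The resulting per-tweet vectors could then be stacked into a feature matrix for a scikit-learn classifier such as `LogisticRegression`; the write-up does not name the classifier actually used.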

Challenges we ran into

We realised that recognising hate speech can be difficult even for humans, so we started our work by discussing the structure of hate speech and how to operationalise it. The most difficult part was tagging Trump's tweets according to their intensity. Our model also needed features to train on, so we manually analysed the tweets, trying to determine the primary offense of each message.
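The manual tagging described above can be captured in a small data structure. This is a sketch under stated assumptions: the write-up does not give the tagging scheme, so the three-level `Intensity` scale, the `TaggedTweet` record, and the `to_training_set` helper that collapses intensities into binary labels at a chosen threshold are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Sequence, Tuple


class Intensity(IntEnum):
    """Hypothetical ordinal scale for how hateful a tweet is."""
    NEUTRAL = 0
    OFFENSIVE = 1
    HATEFUL = 2


@dataclass(frozen=True)
class TaggedTweet:
    text: str
    intensity: Intensity
    primary_offense: str  # annotator's note on whom/what the tweet attacks; "" if none


def to_training_set(
    tweets: Sequence[TaggedTweet], threshold: Intensity = Intensity.OFFENSIVE
) -> Tuple[List[str], List[int]]:
    """Turn manual tags into (texts, labels); a label is 1 if intensity >= threshold."""
    texts = [t.text for t in tweets]
    labels = [int(t.intensity >= threshold) for t in tweets]
    return texts, labels


if __name__ == "__main__":
    # Hypothetical placeholder tweets, not taken from the real corpus.
    corpus = [
        TaggedTweet("example tweet A", Intensity.NEUTRAL, ""),
        TaggedTweet("example tweet B", Intensity.HATEFUL, "a group of people"),
    ]
    print(to_training_set(corpus))  # (['example tweet A', 'example tweet B'], [0, 1])
```

Keeping the ordinal intensity in the record, rather than only a binary label, means the threshold can be changed later (or the intensities used directly as multi-class labels) without re-tagging the tweets.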

Accomplishments that we're proud of

We managed to use and manipulate word vectors for NLP. We are also glad that our model achieves 70-78 percent accuracy.
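Accuracy is the share of tweets whose predicted label matches the manual tag. The write-up does not say how the 70-78 percent range was measured; one common way to obtain such a range is to compute accuracy on each fold of a k-fold cross-validation and report the lowest and highest values. The stdlib-only sketch below assumes that setup (scikit-learn's `cross_val_score` offers a ready-made version for its estimators); `accuracy` and `k_fold_splits` are illustrative helpers, not the project's code.

```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the manual tags."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_splits(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds.

    Returns one (train_indices, test_indices) pair per fold; every index
    appears in exactly one test fold. Shuffle the data first if it is ordered.
    """
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    splits = []
    start = 0
    for fold in range(k):
        # The first n % k folds get one extra item so all n indices are used.
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


if __name__ == "__main__":
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
    print([len(test) for _, test in k_fold_splits(10, 3)])  # [4, 3, 3]
```

Training on each `train` split, scoring on the matching `test` split, and taking the minimum and maximum of the per-fold accuracies yields a range rather than a single number.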

What we learned

Using Word2Vec in this project taught us how it works in practice. Our background was mostly in numeric data, and now we have an idea of how to process natural language.

What's next for N-dimensional Hatred

In principle, the algorithm could be used to detect hate speech in the media and help prevent the spread of hostility. It could be improved by enlarging the corpus of tagged speech.
