Our team was inspired by the machine learning we had studied and wanted to do something cool with it.
What it does
Our code identifies hate speech in a Twitter-style corpus. As an example, we used Donald Trump's tweets.
How we built it
We wrote the code in Python, using its data analysis libraries (NumPy, pandas, scikit-learn). For word embeddings we used gensim, which implements the Word2Vec algorithm. We ran our code on Croc Cloud.
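To feed tweets to a classifier, each tweet has to become a single vector. Below is a minimal sketch of one common way to do that, assuming tweet vectors are the average of their words' Word2Vec vectors (an assumption, not necessarily what we did). `TOY_VECTORS`, `tokenize` and `tweet_vector` are hypothetical stand-ins. With gensim 4, real vectors would come from a model trained with something like `Word2Vec(sentences, vector_size=100, window=5, min_count=1)` and be looked up with `model.wv[token]`.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embeddings standing in for a trained gensim model's vectors.
TOY_VECTORS: Dict[str, Vector] = {
    "great": [0.9, 0.1, 0.0],
    "people": [0.2, 0.5, 0.1],
    "losers": [-0.8, 0.3, 0.6],
    "total": [0.0, 0.2, 0.4],
}

PUNCTUATION = ".,!?\"'"


def tokenize(text: str) -> List[str]:
    """Split on whitespace, strip simple punctuation, lowercase, drop empty tokens."""
    tokens = (tok.strip(PUNCTUATION).lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def tweet_vector(text: str, vectors: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of in-vocabulary tokens; return a zero vector if none are known."""
    known = [vectors[tok] for tok in tokenize(text) if tok in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the i-th component of every word vector together.
    return [sum(component) / len(known) for component in zip(*known)]


if __name__ == "__main__":
    print(tweet_vector("Total losers!", TOY_VECTORS, dim=3))  # [-0.4, 0.25, 0.5]
```

Averaging ignores word order, but it gives every tweet a fixed-length vector regardless of its length, which is what most scikit-learn classifiers expect.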
Challenges we ran into
We realised that recognising hate speech can be difficult even for humans, so we began our work with a discussion of the structure of hate speech and how to operationalise it. The most difficult part was tagging Trump's tweets according to their intensity. Our model also needed features to train on, so we analysed tweets manually, trying to identify the primary offense in each message.
Accomplishments that we're proud of
We managed to use and manipulate word vectors for NLP. We are also glad that our model achieves 70 to 78 percent accuracy.
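Accuracy is the share of tweets whose predicted label matches the manual tag. Below is a minimal sketch of how a range of accuracies, rather than a single number, can be computed. It swaps in a simple nearest-centroid classifier on tweet vectors (an assumption; scikit-learn offers the same idea as `sklearn.neighbors.NearestCentroid`, and `sklearn.model_selection.cross_val_score` automates the folds) and reports one accuracy per fold of k-fold cross-validation. All function names and the toy data are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def fit_nearest_centroid(X: List[Vector], y: List[Hashable]) -> Dict[Hashable, List[float]]:
    """Group training vectors by label and store one centroid per label."""
    groups: Dict[Hashable, List[Vector]] = {}
    for vec, label in zip(X, y):
        groups.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in groups.items()}


def predict(model: Dict[Hashable, List[float]], vec: Vector) -> Hashable:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(vec, model[label]))


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the manual tags."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def kfold_accuracies(X: List[Vector], y: List[Hashable], k: int) -> List[float]:
    """Train on k-1 folds, test on the remaining fold; return one accuracy per fold.

    Fold i holds every k-th example starting at index i (no shuffling).
    """
    n = len(X)
    scores = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model = fit_nearest_centroid([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        scores.append(accuracy([y[i] for i in test_idx], preds))
    return scores


if __name__ == "__main__":
    # Toy, perfectly separable data; real tweet vectors would be far noisier.
    X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1], [5.0, 5.1]]
    y = ["neutral", "neutral", "hate", "hate", "neutral", "hate"]
    scores = kfold_accuracies(X, y, k=2)
    print(min(scores), max(scores))  # 1.0 1.0
```

Reporting the minimum and maximum fold accuracy shows how much the score depends on which tweets end up in the test set, which matters for a small, manually tagged corpus.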
What we learned
Using Word2Vec in the project taught us how it works in practice. Our background was mostly in numeric data, and now we have an idea of how to process natural language.
What's next for N-dimensional Hatred
In theory, the algorithm could be used to detect hate speech in the media and help prevent the spread of hostility. It can be improved by enlarging the corpus of tagged speech.