TweetHawk

Uses NLP and supervised learning to identify harassing tweets, then plots them by location with the Google Maps API

Over the last few years, trolling and harassment have become a major problem on the internet. Most large social networks have to deal with this concern, since harassment harms users and the community and reflects badly on the brand. We have personally seen people we care about become victims of online harassment, and we want to help stop this verbal violence.

We decided to work with Twitter because most of our friends were targeted there, and we hope to stop this from happening to other people. Our first step was to select a location and gather tweets from that area. Then, in the back end, we used our Harassment Analysis Model (HAM) to identify troubling tweets and link them to their authors.
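The flow described above (pick a location, keep tweets from that area, classify them, link flagged tweets to authors) can be sketched roughly as follows. This is only an illustration, not the project's actual back end: the tweet dictionary fields (`author`, `text`, `lat`, `lon`), the circular search area, the `flag_tweets_near` helper, and the keyword-based stand-in for HAM are all assumptions made for the example.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def flag_tweets_near(tweets, center, radius_km, is_harassing):
    """Group tweets that are inside the search circle AND flagged, keyed by author."""
    flagged = defaultdict(list)
    for tweet in tweets:
        distance = haversine_km(center[0], center[1], tweet["lat"], tweet["lon"])
        if distance <= radius_km and is_harassing(tweet["text"]):
            flagged[tweet["author"]].append(tweet)
    return dict(flagged)


# Hypothetical sample data; real tweets would come from the Twitter API.
SAMPLE_TWEETS = [
    {"author": "user_a", "text": "you are an idiot", "lat": 37.78, "lon": -122.41},
    {"author": "user_b", "text": "have a nice day", "lat": 37.77, "lon": -122.42},
    {"author": "user_c", "text": "what an idiot", "lat": 40.7128, "lon": -74.0060},
]


def keyword_classifier(text):
    """Placeholder for HAM: flags any tweet containing the word 'idiot'."""
    return "idiot" in text.lower()


SAN_FRANCISCO = (37.7749, -122.4194)
flagged = flag_tweets_near(SAMPLE_TWEETS, SAN_FRANCISCO, 10.0, keyword_classifier)
```

In this sketch only tweets that are both inside the circle and flagged by the classifier are kept, and the resulting coordinates could then be passed to a map front end as markers.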

HAM was trained with supervised learning on labeled data from HackHarassment. The neural network consisted of a single binary perceptron trained with hinge loss to reduce false positives. The features originally consisted of unigrams; bigrams were added later to improve performance.
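To make the model description concrete, here is a minimal pure-Python sketch of a linear classifier over unigram and bigram counts, trained with a hinge-loss update. It is not the HAM code: the whitespace tokenizer, learning rate, epoch count, function names, and toy training sentences are all assumptions chosen for illustration, and the real model was trained on the HackHarassment data.

```python
from collections import defaultdict


def ngram_features(text):
    """Count unigrams and bigrams in a lowercased, whitespace-tokenized tweet."""
    tokens = text.lower().split()
    feats = defaultdict(float)
    for tok in tokens:
        feats[tok] += 1.0
    for first, second in zip(tokens, tokens[1:]):
        feats[first + " " + second] += 1.0
    return feats


def score(weights, bias, feats):
    """Linear score: bias plus the weighted sum of feature counts."""
    return bias + sum(weights.get(name, 0.0) * value for name, value in feats.items())


def train_hinge(examples, epochs=20, lr=0.1):
    """Train on (text, label) pairs, label +1 (harassing) or -1 (benign)."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = ngram_features(text)
            # Hinge loss is max(0, 1 - label * score); its subgradient is
            # non-zero only while the margin label * score is below 1.
            if label * score(weights, bias, feats) < 1.0:
                for name, value in feats.items():
                    weights[name] += lr * label * value
                bias += lr * label
    return weights, bias


def predict(weights, bias, text):
    """Return +1 for harassing, -1 for benign."""
    return 1 if score(weights, bias, ngram_features(text)) > 0 else -1


# Toy examples invented for this sketch, not taken from any real dataset.
TOY_DATA = [
    ("you are an idiot", 1),
    ("nobody likes you loser", 1),
    ("you are a great friend", -1),
    ("have a nice day", -1),
]

weights, bias = train_hinge(TOY_DATA)
```

Unlike the classic perceptron rule, which updates only on misclassified examples, the hinge update also fires on correctly classified examples that sit inside the margin, which pushes the decision boundary away from both classes. Bigrams such as "an idiot" let the model weight short phrases separately from their individual words.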