Twitter lets people express their opinions quickly and concisely, but it can also be used maliciously: users can be verbally or sexually harassed. We want to combat this problem by recognizing individual tweets that are harmful. This matters most when a tweet carries no hashtag, as is the case for most non-celebrities, because a trend of abuse is then harder to spot. With Spotlight, we hope to find these trends and classify tweets as malicious.
What it does
Spotlight takes the URL of a tweet, retrieves the tweet's text through the Twitter API, and classifies that text as positive, negative, or neutral. The program outputs the probability, expressed as a percentage, that the text is negative.
Through sentiment analysis, Spotlight recognizes harmful language and classifies the tweet as potentially malicious and "at risk" of being flagged.
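As a rough illustration of the first step, the tweet ID can be pulled out of the URL before asking the Twitter API for the text. This is a minimal sketch, not Spotlight's actual code: the function name `parse_tweet_url` and the assumed URL shape (`https://twitter.com/<handle>/status/<id>`) are our own assumptions, and the API call itself is left out.

```python
import re
from urllib.parse import urlparse

# Assumed URL shape: https://twitter.com/<handle>/status/<numeric id>
_STATUS_PATH = re.compile(r"^/([A-Za-z0-9_]{1,15})/status(?:es)?/(\d+)/?$")
_TWITTER_HOSTS = {"twitter.com", "mobile.twitter.com"}


def parse_tweet_url(url):
    """Return (handle, tweet_id) for a tweet URL, or None if it does not match.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host not in _TWITTER_HOSTS:
        return None
    # urlparse keeps the query string separate, so "?s=20" does not affect the path.
    match = _STATUS_PATH.match(parsed.path)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    print(parse_tweet_url("https://twitter.com/jack/status/20"))
```

The numeric ID returned here is what a tweet lookup request would need; anything that is not a recognizable tweet URL yields `None` so the caller can reject it early.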
How we built it
We used machine learning to train Spotlight. Starting from a database of tweets, we manually classified a series of tweets as positive, negative, or neutral. From these labeled tweets, Spotlight built a word frequency distribution for each of the three classes. Using these frequencies, it assigns a new tweet a percent probability of "negativity" or "maliciousness." The word frequency distributions are updated with every tweet a user submits, so they should become more accurate with every entry.
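The approach above can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the exact Spotlight code: it scores tweets in the style of naive Bayes with add-one smoothing, which is one common way to turn per-class word frequencies into a probability. The class name `FrequencyClassifier`, the tokenizer, and the `update` method's explicit label are all hypothetical.

```python
import math
import re
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def tokenize(text):
    """Lowercase the text, drop @handles, and split into word tokens."""
    text = re.sub(r"@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


class FrequencyClassifier:
    """Per-class word frequency counts with naive-Bayes-style scoring (assumed)."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.tweet_counts = Counter()

    def update(self, text, label):
        """Add one labeled tweet to the frequency distribution of its class."""
        self.word_counts[label].update(tokenize(text))
        self.tweet_counts[label] += 1

    def probabilities(self, text):
        """Return a dict mapping each label to its estimated probability."""
        tokens = tokenize(text)
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        vocab_size = max(len(vocab), 1)
        total_tweets = sum(self.tweet_counts.values())

        log_scores = {}
        for label in LABELS:
            # Add-one smoothing keeps unseen classes and words from scoring zero.
            prior = (self.tweet_counts[label] + 1) / (total_tweets + len(LABELS))
            total_words = sum(self.word_counts[label].values())
            score = math.log(prior)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            log_scores[label] = score

        # Normalize in log space so long tweets do not underflow.
        peak = max(log_scores.values())
        weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}

    def percent_negative(self, text):
        """Return the estimated probability of 'negative' as a percentage."""
        return 100.0 * self.probabilities(text)["negative"]


if __name__ == "__main__":
    clf = FrequencyClassifier()
    clf.update("you are wonderful", "positive")
    clf.update("you are awful and stupid", "negative")
    clf.update("the meeting is at noon", "neutral")
    print(round(clf.percent_negative("stupid awful"), 1))
```

Calling `update` again with each newly submitted tweet is how the distributions would keep growing. The sketch takes the label as an argument because it makes no assumption about how the label for a user-submitted tweet is chosen.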
Spotlight is primarily written in Python and implemented as a web application.
What we learned
We all learned how to work with GitHub, databases, and frameworks!
Applications: How we hope to expand on our project
After implementing single-tweet analysis, we wondered whether we could identify when a malicious tweet was a response to another tweet. Our next goal is to trace the origin of a conflict and map the tweets involved. Our current program ignores mentions of Twitter handles (@name), but these handles can be used to identify users and get the IDs of other tweets.
If we recognize a stream or influx of negative tweets targeted at a specific user, we hope to provide them access to a harassment hotline.
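One possible starting point for this extension is sketched below: pull the @handles out of each tweet and count how many strongly negative tweets mention each one. Everything here is hypothetical, since this part of Spotlight is not built yet. The function names, the `negative_cutoff` of 70 percent, and the `min_tweets` threshold of 3 are illustrative assumptions only.

```python
import re
from collections import Counter

# A handle is "@" plus up to 15 letters, digits, or underscores; the lookbehind
# skips "@" inside email addresses such as name@example.com.
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})")


def extract_mentions(text):
    """Return the lowercase handles mentioned in a tweet, in order of appearance."""
    return [handle.lower() for handle in MENTION.findall(text)]


def targeted_handles(scored_tweets, negative_cutoff=70.0, min_tweets=3):
    """Count negative tweets per mentioned handle.

    scored_tweets is an iterable of (text, percent_negative) pairs.
    Both thresholds are placeholder values, not tuned settings.
    """
    tally = Counter()
    for text, percent_negative in scored_tweets:
        if percent_negative >= negative_cutoff:
            # A set ensures a tweet counts once per handle, even if it repeats it.
            tally.update(set(extract_mentions(text)))
    return {handle: n for handle, n in tally.items() if n >= min_tweets}


if __name__ == "__main__":
    sample = [
        ("@alice you are awful", 90.0),
        ("@alice nobody likes you", 80.0),
        ("@alice @alice go away", 75.0),
        ("@bob terrible take", 95.0),
        ("@alice great work today", 5.0),
    ]
    print(targeted_handles(sample))
```

Handles returned by `targeted_handles` would be candidates for follow-up, such as offering the user access to a harassment hotline.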