Spot the Bot - Changing News for Good
Twitter is saturated with bot spammers, which are irritating at best and malicious at worst. We set out to identify them and so make the platform less frustrating for everyone.
What it does
Rates a Twitter account as "bot" or "human" using 5 machine learning models. Users can test any Twitter username, and the UI reports either "real" or "fake" along with the number of models that agreed with the decision (5/5, 4/5 or 3/5).
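The agreement count described above amounts to a majority vote over the five model outputs. Below is a minimal sketch of that step, not the project's actual code: the function name `summarise_votes` and the list-of-strings input are assumptions made for illustration.

```python
from collections import Counter


def summarise_votes(votes):
    """Combine per-model labels ("real" or "fake") into a verdict and an agreement string.

    `votes` is a hypothetical input format: one label per model.
    """
    if any(v not in ("real", "fake") for v in votes):
        raise ValueError('every vote must be "real" or "fake"')
    if len(votes) % 2 == 0:
        raise ValueError("an odd number of votes is needed to avoid ties")
    # With two labels and an odd number of votes, the most common label is a strict majority.
    label, count = Counter(votes).most_common(1)[0]
    return label, f"{count}/{len(votes)}"


print(summarise_votes(["fake", "fake", "real", "fake", "real"]))  # ('fake', '3/5')
```

With five models, the winning label always has at least three votes, which is why 3/5 is the lowest agreement the UI can show.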
How we built it
For each account, 9 features were computed from data obtained through the Twitter API:
- Whether the account has tweeted at all
- How similar the account's tweets are to each other
- Whether the account tweets in a scheduled pattern
- Proportion of tweets that contain links
- Frequency of posting tweets
- Proportion of tweets that are just single links
- Frequency of common clickbait phrases used
- Ratio of friends to followers
- Whether the account is verified
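A few of these features can be sketched with the standard library alone. The code below is illustrative rather than the project's implementation: it assumes tweets are already available as plain strings, it uses `difflib.SequenceMatcher` as one possible similarity measure, and the function names are hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

URL_RE = re.compile(r"https?://\S+")


def link_proportion(tweets):
    """Fraction of tweets that contain at least one link."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if URL_RE.search(t)) / len(tweets)


def single_link_proportion(tweets):
    """Fraction of tweets whose entire text is a single link."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if URL_RE.fullmatch(t.strip())) / len(tweets)


def mean_similarity(tweets):
    """Average pairwise similarity in [0, 1]; quadratic in the number of tweets."""
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return 0.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def friends_to_followers(friends, followers):
    """Ratio of friends to followers, guarding against division by zero."""
    return friends / max(followers, 1)
```

Because `mean_similarity` compares every pair of tweets, it would be sensible to cap the number of tweets per account before calling it.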
A feature vector was created by concatenating normalised versions of these values. Five machine learning models (Multinomial Naive Bayes, Bernoulli Naive Bayes, Logistic Regression, Linear Support Vector Classification and Stochastic Gradient Descent) were then trained on a dataset of 3000 users and tested on another 3000 users. Testing reported 67% accuracy.
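The training step might look roughly like the following with scikit-learn. This is a sketch under several assumptions: that "Multinomial" and "Bernoulli" refer to the Naive Bayes variants, that normalisation means min-max scaling, and that the models are combined by majority vote. The data here is random and synthetic, so the resulting accuracy says nothing about the real system.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real dataset: 200 accounts, 9 features each.
rng = np.random.default_rng(0)
X = rng.random((200, 9)) * 50
y = (X[:, 3] + X[:, 5] > 50).astype(int)  # made-up labels: 1 = bot, 0 = human

X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

# Min-max scaling keeps features in [0, 1]; MultinomialNB rejects negative values,
# so test-set values outside the training range are clipped.
scaler = MinMaxScaler().fit(X_train)
X_train_n = scaler.transform(X_train)
X_test_n = np.clip(scaler.transform(X_test), 0.0, 1.0)

models = {
    "multinomial_nb": MultinomialNB(),
    "bernoulli_nb": BernoulliNB(binarize=0.5),  # threshold scaled features at 0.5
    "logistic_regression": LogisticRegression(),
    "linear_svc": LinearSVC(),
    "sgd": SGDClassifier(random_state=0),
}
for model in models.values():
    model.fit(X_train_n, y_train)

# One row of predictions per model, then a majority vote across the five rows.
votes = np.array([m.predict(X_test_n) for m in models.values()])
majority = (votes.sum(axis=0) >= 3).astype(int)
accuracy = (majority == y_test).mean()
print(f"ensemble accuracy on synthetic data: {accuracy:.2f}")
```

Fitting the scaler on the training split only avoids leaking test-set statistics into the normalisation.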
Challenges we ran into
Preparing the models took a long time (6 hours), as Twitter only allows a limited number of requests within specified time periods.
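One common way to stay within a per-window request limit is to count calls on the client side and pause once the allowance is used up. The sketch below shows that idea in general terms. It is not how the project handled the limit, and the class name and the limit values passed to it are hypothetical rather than Twitter's actual quotas.

```python
import time


class WindowedRateLimiter:
    """Allow at most `max_calls` per `window_seconds`, sleeping when the budget is spent."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.window_start = None
        self.calls = 0

    def wait(self):
        """Call once before each request; blocks if the current window is exhausted."""
        now = self.clock()
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            # A new window begins: reset the call counter.
            self.window_start = now
            self.calls = 0
        if self.calls >= self.max_calls:
            # Budget spent: sleep until the window ends, then start a fresh one.
            self.sleep(self.window_seconds - (now - self.window_start))
            self.window_start = self.clock()
            self.calls = 0
        self.calls += 1
```

A caller would create one limiter per endpoint, for example `WindowedRateLimiter(max_calls=100, window_seconds=900)` with placeholder numbers, and invoke `wait()` immediately before each API request.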
Accomplishments we're proud of
Getting a simple UI working (in the nick of time) so that anyone can test their own account. Download the GitHub repository and run ui.py to try it for yourself!
What we learned
How to query the Twitter API and how to train machine learning models.
What's next for SpotTheBot
Launching a website to host our UI so you don't have to download the entire repository to be able to test it.