[Screenshots: testing out the UI; returned result; values of feature vectors]

Spot the Bot - Changing News for Good

Inspiration

Twitter is saturated with bot spammers that are irritating at best and malicious at worst. We wanted to try identifying them to make the platform less frustrating for everyone.

What it does

Rates a Twitter account as "bot" or "human" using 5 machine learning models. Users can test any Twitter username, and the UI reports either "real" or "fake" along with the number of models that agreed on the decision (5/5, 4/5 or 3/5).
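As a rough illustration of how five individual votes could be turned into the verdict and agreement count described above, here is a minimal sketch. The function name `majority_verdict` and the string labels are assumptions for illustration, not taken from the repository.

```python
from collections import Counter

VALID_LABELS = {"real", "fake"}


def majority_verdict(votes):
    """Combine per-model votes into (label, "agreeing/total").

    Hypothetical helper: each vote is the string "real" or "fake".
    An odd number of votes is required so that a tie is impossible.
    """
    if len(votes) % 2 == 0:
        raise ValueError("need an odd number of votes to avoid ties")
    if any(v not in VALID_LABELS for v in votes):
        raise ValueError("votes must be 'real' or 'fake'")
    # With two labels and an odd count, the most common label is the majority.
    label, count = Counter(votes).most_common(1)[0]
    return label, f"{count}/{len(votes)}"


verdict = majority_verdict(["fake", "fake", "real", "fake", "real"])
```

With five voters the winning side always has 3, 4 or 5 votes, which is why those are the only agreement counts the UI can show.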

How we built it

Data was obtained from the Twitter API on 9 different features:

Whether the account has tweeted at all
How similar the account's tweets are to each other
Whether the account tweets in a scheduled pattern
Proportion of tweets that contain links
Frequency of posting tweets
Proportion of tweets that are just single links
Frequency of common clickbait phrases used
Ratio of friends to followers
Whether the account is verified
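To give a feel for how a few of these features might be computed, here is a small sketch using only the Python standard library. It assumes the tweets have already been fetched as plain strings; every function name is hypothetical, and the similarity measure (`difflib.SequenceMatcher`) is a stand-in rather than the measure the project necessarily used.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Simple URL pattern; an assumption, real tweets may need something stricter.
URL_RE = re.compile(r"https?://\S+")


def link_proportion(tweets):
    """Proportion of tweets that contain at least one link."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if URL_RE.search(t)) / len(tweets)


def single_link_proportion(tweets):
    """Proportion of tweets that consist of nothing but a single link."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if URL_RE.fullmatch(t.strip())) / len(tweets)


def mean_pairwise_similarity(tweets):
    """Average similarity ratio (0 to 1) over all pairs of tweets."""
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return 0.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def friends_to_followers(friends, followers):
    """Ratio of friends to followers, guarding against zero followers."""
    return friends / max(followers, 1)


sample = ["http://a.co/x", "hello world", "see https://b.org/y now", "hello world"]
features = [
    link_proportion(sample),
    single_link_proportion(sample),
    mean_pairwise_similarity(sample),
    friends_to_followers(50, 100),
]
```

Note that the pairwise comparison grows quadratically with the number of tweets, so a real pipeline would probably cap how many tweets per account it compares.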

A feature vector was created by appending normalised versions of these values. 5 different machine learning models - Multinomial Naive Bayes, Bernoulli Naive Bayes, Logistic Regression, Linear Support Vector Classification and Stochastic Gradient Descent - were then trained on a dataset of 3000 users and tested on another 3000 users. Testing reported 67% accuracy.
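The training step could look roughly like the sketch below. It assumes the five models correspond to scikit-learn's `MultinomialNB`, `BernoulliNB`, `LogisticRegression`, `LinearSVC` and `SGDClassifier`, and that normalisation means min-max scaling to the range 0 to 1 (which also keeps the inputs non-negative, as `MultinomialNB` requires). The data here is random and purely a placeholder, so the scores it produces say nothing about the real 67% figure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import LinearSVC


def build_models():
    """Return the five classifiers, keyed by a hypothetical short name."""
    return {
        "multinomial_nb": MultinomialNB(),
        "bernoulli_nb": BernoulliNB(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "linear_svc": LinearSVC(),
        "sgd": SGDClassifier(random_state=0),
    }


def train_and_score(X_train, y_train, X_test, y_test):
    """Scale features, fit each model, and return per-model test accuracy."""
    scaler = MinMaxScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)
    # Test values outside the training range are clipped back into [0, 1].
    X_test_s = np.clip(scaler.transform(X_test), 0.0, 1.0)
    scores = {}
    for name, model in build_models().items():
        model.fit(X_train_s, y_train)
        scores[name] = model.score(X_test_s, y_test)
    return scores


# Placeholder data: 200 fake "accounts" with 9 features each.
rng = np.random.default_rng(0)
X = rng.random((200, 9))
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)
scores = train_and_score(X[:100], y[:100], X[100:], y[100:])
```

Fitting the scaler on the training split only keeps information from the test accounts out of training.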

Challenges we ran into

The models took a long time to train (6 hours) because Twitter limits the number of API requests allowed within a given time window.
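One common way to cope with this kind of limit is to pause and retry when a request is rejected. The sketch below shows the idea with the standard library only; `RateLimited` and `fetch_with_backoff` are hypothetical names, and the 15-minute default wait is an assumption. Tweepy also offers a `wait_on_rate_limit` option on its API client that waits automatically.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the client library's rate-limit error."""


def fetch_with_backoff(fetch, max_attempts=3, wait_seconds=15 * 60, sleep=time.sleep):
    """Call fetch(), sleeping and retrying whenever it raises RateLimited.

    The sleep function is injectable so the helper can be tested
    without actually waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(wait_seconds)
```

Saving each account's features to disk as they are collected would also mean a long run like this only has to happen once.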

Accomplishments we're proud of

Getting a simple UI working (in the nick of time) so that anyone can test their own account. Download the GitHub repository and run ui.py to try it for yourself!

What we learned

How to query the Twitter API and how to train machine learning models.

What's next for SpotTheBot

Launching a website to host our UI so you don't have to download the entire repository to be able to test it.

Built With

css
html
nltk
python
scikit-learn
tweepy

Submitted to

Hack the Burgh 2018

Created by

Researched and planned out which features of the user/their tweets we would use and wrote code for extracting several of them.

Combined the different elements into a feature vector and sorted the final program's output.

Angus Shaw
Current CS student and hackathon enthusiast
Wrote the Python files that add features to form the dataset for the bot identifier.

Contributed to setting up the learning model for the project.

Created the GUI to enter and display data.

Sameer Karim
I helped coordinate the team and came up with the structure of the project. Researched which features we should extract from the data.

Searched for the training and testing datasets. Trained the learning models and hooked them up to the feature builder and the data.

Daniel Biro

Updates

Angus Shaw started this project — Mar 11, 2018 07:56 AM EDT
