We wanted to know how many of Hillary Clinton's and Donald Trump's Twitter followers were bought, and how many were real people who actually followed them.

What it does

It scrapes user data from Twitter and uses it to train a classifier that distinguishes a genuine Twitter user from someone who is part of a collusion network and gets paid to tweet.

How we built it

We used the Tweepy Python library to scrape Twitter data and MATLAB to build and train the classifier, with a lot of Red Bull and Monster just to keep us going.
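We don't list the exact features or model our MATLAB classifier used, so what follows is only an illustration of the general idea, written in Python. It assumes a handful of profile fields: `followers_count`, `friends_count`, `statuses_count` and `default_profile_image` mirror field names in Twitter's v1.1 user object, while `account_age_days` is a hypothetical precomputed field (you would derive it from `created_at`). It also swaps in plain logistic regression trained by gradient descent; this is a minimal sketch, not our actual model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def extract_features(user: Dict) -> List[float]:
    """Turn one user profile into a small feature vector (hypothetical feature set)."""
    followers = user["followers_count"]
    friends = user["friends_count"]  # accounts this user follows
    age_days = max(user["account_age_days"], 1)  # avoid dividing by zero for brand-new accounts
    return [
        math.log1p(followers) - math.log1p(friends),    # log of the follower/following ratio
        math.log1p(user["statuses_count"] / age_days),  # log-scaled tweets per day
        1.0 if user["default_profile_image"] else 0.0,  # still using the default "egg" avatar
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 0.1, epochs: int = 2000) -> Model:
    """Full-batch gradient descent on the mean log loss (label 1 = paid account)."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: sigmoid(w.x + b) - label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_paid_probability(model: Model, user: Dict) -> float:
    """Estimated probability that `user` belongs to a paid collusion network."""
    w, b = model
    x = extract_features(user)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Logistic regression is used here only because it fits in a few dependency-free lines; any classifier that outputs a score per account would slot into the same train/predict shape.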

Challenges we ran into

Working within Twitter's API rate limits and making sure the data was collected correctly were the main technical hurdles. Also, the other guy didn't know how to code, so that was a challenge in itself (but he knew maths, so that was good).
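Twitter's v1.1 API counts requests per 15-minute window, and Tweepy can wait out the limit for you if you construct the client as `tweepy.API(auth, wait_on_rate_limit=True)`. To show what that waiting involves, here is a library-agnostic sketch of a paginated fetch that sleeps until the window resets. `fetch_page` and `RateLimited` are hypothetical stand-ins, not Tweepy's API: `fetch_page(cursor)` is assumed to return `(ids, next_cursor)` with `next_cursor` set to `None` on the last page, and to raise `RateLimited` carrying the reset time as a Unix timestamp.

```python
import time
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[int], Optional[int]]  # (user ids on this page, next cursor or None)


class RateLimited(Exception):
    """Hypothetical error raised by fetch_page when the current window's quota is used up."""

    def __init__(self, reset_at: float):
        super().__init__(f"rate limited until {reset_at}")
        self.reset_at = reset_at  # Unix timestamp when the window resets


def fetch_all(fetch_page: Callable[[Optional[int]], Page],
              now: Callable[[], float] = time.time,
              sleep: Callable[[float], None] = time.sleep,
              max_retries: int = 5) -> List[int]:
    """Follow cursors until the last page, sleeping through rate-limit windows."""
    results: List[int] = []
    cursor: Optional[int] = None  # None requests the first page
    retries = 0
    while True:
        try:
            ids, next_cursor = fetch_page(cursor)
        except RateLimited as exc:
            if retries >= max_retries:
                raise  # give up after too many consecutive rate-limit errors
            retries += 1
            # Sleep until the reset time, plus a one-second safety margin.
            sleep(max(exc.reset_at - now(), 0) + 1)
            continue  # retry the same cursor, so no page is skipped
        retries = 0
        results.extend(ids)
        if next_cursor is None:
            return results
        cursor = next_cursor
```

Injecting `now` and `sleep` keeps the retry logic testable without actually waiting 15 minutes, and retrying the same cursor is what keeps the collected data complete after a pause.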

Accomplishments that we are proud of

We are extremely proud of building a classifier with 98.5% accuracy (the current state-of-the-art algorithm reaches 99%). Managing to code continuously for 30 hours... that's something to be proud of too.

What we learned

We learned how to add numbers. We also learned that 580 minutes is roughly 10 hours. And if it's 5am and the deadline is at 2, there's no way you can finish the project on time unless you parallelize it.

What's next for Are you an Egg?: Twitter Fraud Detection

We will most likely make the detector more statistically robust, and train and validate it on a much larger dataset.
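One common way to make validation more robust on a bigger dataset is stratified k-fold cross-validation, which reports an average over several train/test splits instead of a single accuracy figure. This is a minimal sketch of the splitting step, assuming integer class labels (for example 0 = genuine, 1 = paid); the function name is hypothetical and this is one possible approach, not a description of what we have built.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[int], k: int,
                      seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs with a similar class mix in each test fold."""
    if not 2 <= k <= len(labels):
        raise ValueError("k must be between 2 and the number of samples")
    rng = random.Random(seed)  # fixed seed so the splits are reproducible

    # Group sample indices by class label.
    by_label: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        by_label.setdefault(y, []).append(i)

    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_label):  # sorted so the result is deterministic for a given seed
        idx = by_label[y]
        rng.shuffle(idx)
        # Deal this class round-robin; the offset keeps total fold sizes balanced across classes.
        for pos, i in enumerate(idx):
            folds[(pos + offset) % k].append(i)
        offset += len(idx)

    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        splits.append((train, test))
    return splits
```

You would train on each `train` list, score on the matching `test` list, and report the mean and spread of the k accuracies. Stratifying matters when paid accounts are much rarer than genuine ones, since a plain random split could leave a test fold with almost none.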

