Inspiration

Misinformation is widespread on social media and is often spread by harmful bots. It can have a serious impact on the real world, as seen during the Covid pandemic.

What it does

Performs statistical and machine learning tests on any public Twitter account, giving the following results:

  • Probability of the account being a bot
  • Probability of the account spreading misinformation/"fake news"
  • Sentiment rating based on the account's tweets

How we built it

Bot Accounts

To find bot accounts, we used a Kolmogorov-Smirnov test comparing the number of likes and retweets on the account's tweets against Benford's Law. The idea is that artificially created bots tend to follow each other in an artificial way, so their counts would not follow Benford's Law closely.
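As a rough illustration of this kind of check, the sketch below takes the leading digit of each count, builds the observed cumulative distribution over the digits 1 to 9, and reports the largest gap to the Benford cumulative distribution (the Kolmogorov-Smirnov statistic). It is a minimal sketch under stated assumptions, not the project's code: the function names are hypothetical, zero counts are skipped because they have no leading digit, and turning the statistic into a bot probability is left out.

```python
import math
from collections import Counter

# Benford's Law: P(d) = log10(1 + 1/d) for leading digit d = 1..9
BENFORD_PMF = [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit(n):
    """Return the first decimal digit of a non-zero integer."""
    return int(str(abs(int(n)))[0])


def benford_ks_statistic(counts):
    """Largest absolute gap between the observed and Benford CDFs of leading digits.

    `counts` is any iterable of like/retweet counts (hypothetical input format).
    Zero counts are ignored, which is an assumption of this sketch.
    """
    digits = [leading_digit(c) for c in counts if int(c) > 0]
    if not digits:
        raise ValueError("need at least one positive count")

    freq = Counter(digits)
    n = len(digits)
    observed_cdf = 0.0
    expected_cdf = 0.0
    max_gap = 0.0
    for d in range(1, 10):
        observed_cdf += freq.get(d, 0) / n
        expected_cdf += BENFORD_PMF[d - 1]
        max_gap = max(max_gap, abs(observed_cdf - expected_cdf))
    return max_gap


# Example call with made-up like counts for one account
example_likes = [12, 130, 7, 45, 1, 19, 220, 3, 0, 16]
statistic = benford_ks_statistic(example_likes)
```

A value near 0 means the counts track Benford's Law closely, and larger values indicate a stronger deviation. Any cut-off for flagging an account would need to be tuned on real data.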

Fake news accounts

For this, we used a machine learning algorithm (Support Vector Classification) trained on the PHEME dataset, which contains tweets from both misinformed and trustworthy sources. We achieved an accuracy of 85% on the test data.
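For readers unfamiliar with the approach, here is a minimal sketch of text classification with a support vector classifier, assuming scikit-learn is available. The TF-IDF features, the linear kernel, and the handful of example tweets are assumptions made for illustration. The tweets are invented and are not from PHEME, and the project's actual features and settings are not described here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Invented toy tweets for illustration only (not PHEME data).
texts = [
    "BREAKING: miracle cure hidden by doctors, share before it is deleted",
    "They do not want you to know this secret about the vaccine",
    "Shocking proof the outbreak was staged, wake up",
    "Health ministry publishes updated vaccination schedule for the region",
    "City council confirms road closures for the marathon on Sunday",
    "University releases peer-reviewed study on sleep and memory",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = misinformation, 0 = trustworthy (assumed encoding)

# TF-IDF turns each tweet into a weighted word-count vector;
# SVC then fits a separating boundary between the two classes.
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
model.fit(texts, labels)

# Classify a new, unseen tweet (also invented)
new_tweet = ["Secret cure they do not want you to share"]
prediction = model.predict(new_tweet)
```

With a real labelled dataset, part of the data would be held out as a test set so that an accuracy figure like the one quoted above can be measured on tweets the model has not seen.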

Sentiment Analysis

We used a built-in ML algorithm from the NLTK Python library to analyse the words and phrases used in each tweet and produce a sentiment rating for it.

Challenges we ran into

We started with the Twitter API, but were limited by the number of requests per minute, so we switched to snscrape.

Accomplishments that we're proud of

Implementing both statistical analysis (Benford's Law) and Machine Learning in detecting fake tweets.

What we learned

Using APIs, machine learning, collaborative development, Git, statistical methods (including the Kolmogorov-Smirnov test), and managing large data sets.

What's next for Fake Tweet Detector

Improving the statistical parameters to fine-tune detection thresholds, and training with more up-to-date data.
