Inspiration
Misinformation is widespread on social media and is often spread by harmful bots. It can have a serious impact on the real world, as seen during the Covid pandemic.
What it does
Performs statistical and machine learning tests on any public Twitter account, giving the following results:
- Probability of the account being a bot
- Probability of the account spreading misinformation ("fake news")
- Sentiment rating based on the account's tweets
How we built it
Bot Accounts
To find bot accounts, we used a Kolmogorov-Smirnov test to compare the leading digits of the like and retweet counts on the account's tweets against Benford's Law. The idea is that bots, being artificially created, follow and interact with each other in an artificial way, so their engagement numbers would not follow Benford's Law closely.
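As a rough illustration of the idea, the sketch below compares the leading digits of a list of engagement counts with Benford's Law using a Kolmogorov-Smirnov statistic. It is not the project's actual code: the function names are hypothetical, and because leading digits are discrete, the statistic is computed directly on the nine digit values and the p-value from `scipy.stats.kstwo` is only an approximation (it tends to be conservative for discrete data).

```python
import math

import numpy as np
from scipy import stats

# Benford's Law: P(leading digit = d) = log10(1 + 1/d) for d = 1..9.
BENFORD_PMF = np.array([math.log10(1 + 1 / d) for d in range(1, 10)])
BENFORD_CDF = np.cumsum(BENFORD_PMF)


def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(abs(int(n)))[0])


def benford_ks(counts):
    """Return (KS statistic, approximate p-value) against Benford's Law.

    `counts` is a list of like or retweet counts (hypothetical input format).
    Zero counts have no leading digit and are skipped.
    """
    digits = np.array([leading_digit(c) for c in counts if c > 0])
    if digits.size == 0:
        raise ValueError("need at least one positive count")
    # Empirical distribution of leading digits 1..9.
    observed_pmf = np.bincount(digits, minlength=10)[1:10] / digits.size
    observed_cdf = np.cumsum(observed_pmf)
    # KS statistic: largest gap between the empirical and Benford CDFs.
    statistic = float(np.max(np.abs(observed_cdf - BENFORD_CDF)))
    # Approximate p-value from the exact two-sided KS distribution.
    p_value = float(stats.kstwo.sf(statistic, digits.size))
    return statistic, p_value
```

A large statistic with a small p-value suggests the counts deviate from Benford's Law. Where to set the threshold for flagging an account is a separate tuning decision that this sketch does not address.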
Fake news accounts
For this, we used a machine learning algorithm (Support Vector Classification), trained with the PHEME dataset, which contains tweets from both misinformed and trustworthy sources. We were able to achieve an accuracy level of 85% on the test data.
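A minimal sketch of how a Support Vector Classification model can be trained on labelled tweets with scikit-learn is shown below. Several things here are assumptions rather than details from the project: the TF-IDF text features, the linear kernel, the value of `C`, the tiny made-up training set (standing in for PHEME), and the idea of reporting the share of an account's tweets flagged as misinformation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Made-up toy data for illustration only; 1 = misinformation, 0 = trustworthy.
TOY_TEXTS = [
    "miracle cure destroys virus overnight doctors stunned",
    "secret memo proves vaccine hides microchips",
    "shocking leak reveals outbreak was staged",
    "garlic tea stops infection claims insider",
    "health ministry publishes weekly case numbers",
    "clinical trial results reviewed by regulators",
    "hospital reports steady admissions this month",
    "researchers release peer reviewed study on masks",
]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def train_classifier(texts, labels):
    """Fit a TF-IDF + linear SVC pipeline (feature choice is an assumption)."""
    model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear", C=10.0))
    model.fit(texts, labels)
    return model


def misinformation_share(model, tweets):
    """Fraction of an account's tweets that the model labels as misinformation."""
    predictions = model.predict(list(tweets))
    return float(np.mean(predictions == 1))
```

With a real dataset, the texts and labels would be split into training and test sets (for example with `sklearn.model_selection.train_test_split`) so that accuracy is measured on tweets the model has not seen.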
Sentiment Analysis
We used a built-in ML algorithm from the NLTK Python library to analyse the words and phrases used in each tweet and give a sentiment result for it.
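The sketch below shows one way to turn per-tweet sentiment scores into an account-level rating. It assumes NLTK's VADER analyser (`SentimentIntensityAnalyzer`), which is lexicon- and rule-based; the project may have used a different NLTK component. The function names and the averaging step are hypothetical, and the ±0.05 cut-offs are the thresholds commonly used with VADER's compound score.

```python
from statistics import mean
from typing import Callable, Iterable, Tuple


def vader_scorer() -> Callable[[str], float]:
    """Build a scorer backed by NLTK's VADER analyser (an assumed choice).

    Imports are local so the rest of the module works without NLTK installed.
    The lexicon is downloaded on first use, which needs network access.
    """
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    analyser = SentimentIntensityAnalyzer()
    # "compound" is VADER's overall score in the range [-1, 1].
    return lambda text: analyser.polarity_scores(text)["compound"]


def account_sentiment(
    tweets: Iterable[str], scorer: Callable[[str], float]
) -> Tuple[float, str]:
    """Average the per-tweet scores and map the result to a coarse label."""
    scores = [scorer(tweet) for tweet in tweets]
    if not scores:
        raise ValueError("need at least one tweet")
    average = mean(scores)
    if average >= 0.05:
        label = "positive"
    elif average <= -0.05:
        label = "negative"
    else:
        label = "neutral"
    return average, label
```

Typical usage would be `account_sentiment(tweets, vader_scorer())`. Passing the scorer in as an argument keeps the aggregation logic easy to test without downloading the lexicon.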
Challenges we ran into
We started out using the Twitter API, but we were limited by the number of requests allowed per minute, so we switched to snscrape.
Accomplishments that we're proud of
Implementing both statistical analysis (Benford's Law) and machine learning to detect fake tweets.
What we learned
Using APIs, machine learning, collaborative development, Git, statistical methods (including the Kolmogorov-Smirnov test), and managing large data sets.
What's next for Fake Tweet Detector
Improving the statistical parameters to fine-tune detection thresholds, and training with more up-to-date data.
Built With
- nltk
- pandas
- pheme
- pyqt
- python
- scikit-learn
- snscrape