Inspiration
Modern elections are shaped by social media platforms like Twitter. Discussions on the platform shape how informed voters are, the policies candidates adopt, and the ultimate outcome of the vote. But like most corners of the internet, Twitter is rife with misinformation.
We wanted to build a dashboard to illustrate just how much misinformation there is on Twitter!
What it does
Our goal was to create an interactive dashboard that displays the flow of truths and lies that stream through the platform each day, using NLP to flag tweets that may contain misinformation.
A visualization like this could help independent election watchdog organizations monitor the health of an election, be deployed by newspapers in a data journalism piece, or simply be provided to the electorate for their consideration.
How we built it
Most fake news detection datasets provide full news articles. Since we wanted to classify tweets, we needed data that looked a bit more tweet-like. We used the Fake News Inference Dataset, which provides one-sentence statements from news articles labelled by PolitiFact as "fake" or "real". While these do not perfectly resemble tweets, they were closer than the other options we found.
We trained a binary Naive Bayes model on this dataset, then built an application that serves the model and pulls tweets from a live Twitter stream. The application shows the overall news accuracy, the number of tweets processed, and an interface for interacting with the tweets: hovering over the rolling dots reveals the text of each tweet, and the dots are coloured to match the news accuracy widget, indicating whether a tweet was classified as true or false.
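The classification step can be sketched as a scikit-learn pipeline. This is a minimal illustration, not the project's actual notebook: the toy statements and labels here stand in for the Fake News Inference Dataset, and `BernoulliNB` is one reasonable reading of "binary Naive Bayes".

```python
# Minimal sketch of the tf-idf + Naive Bayes pipeline.
# The statements/labels below are toy stand-ins for the real dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

statements = [
    "The governor signed the bill into law last week.",
    "Scientists confirm the moon landing was staged.",
    "Unemployment fell to a record low this quarter.",
    "Drinking bleach cures the common cold.",
]
labels = [1, 0, 1, 0]  # 1 = "real", 0 = "fake" (PolitiFact-style labels)

model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),  # text -> tf-idf vectors
    ("nb", BernoulliNB()),                             # binary Naive Bayes
])
model.fit(statements, labels)

# Classify an incoming tweet-like statement.
print(model.predict(["A new study says vaccines are safe."]))
```

Each incoming tweet can then be pushed through `model.predict` before being drawn on the dashboard.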
Tech Stack:
- A binary Naive Bayes model that does our ML work (notebook)
- A Flask backend wrapping the model in an API, deployed with Docker to GCP (github)
- A modern React frontend (github)
- coffee
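The Flask layer of the stack above can be sketched roughly as follows. The route name, payload shape, and the stand-in `classify` function are assumptions for illustration, not the project's actual interface; in the real backend the trained Naive Bayes model would be loaded and called here.

```python
# Hypothetical sketch of the Flask API wrapping the classifier.
from flask import Flask, jsonify, request

app = Flask(__name__)

def classify(text: str) -> str:
    # Stand-in for the trained Naive Bayes model (keyword heuristic only).
    return "real" if "official" in text.lower() else "fake"

@app.route("/classify", methods=["POST"])
def classify_tweet():
    # Expects JSON like {"text": "..."} and returns the predicted label.
    text = request.get_json().get("text", "")
    return jsonify({"text": text, "label": classify(text)})

if __name__ == "__main__":
    # Port 8080 is a common choice for containers deployed to GCP.
    app.run(host="0.0.0.0", port=8080)
```

Packaging this app in a Docker image is what lets the same artifact run locally and on GCP.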
Challenges we ran into
Our baseline classifier, a binary Naive Bayes run on tf-idf transformed data, achieved an accuracy of 64%. This was lower than expected, and pointed to the problem being rather difficult to solve. Through hyperparameter tuning, we managed to achieve an accuracy of 70%. We also implemented a simple neural network and spent many hours tuning its parameters, but were unable to beat the tuned Naive Bayes.
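The tuning step can be illustrated with a grid search over the pipeline's hyperparameters. The parameter grid and toy data below are assumptions; the actual search that lifted accuracy from 64% to 70% ran on the Fake News Inference Dataset.

```python
# Illustrative hyperparameter search over the tf-idf + Naive Bayes pipeline.
# Grid values and statements are hypothetical examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

statements = [
    "The senator voted for the bill on Tuesday.",
    "Aliens built the new highway overnight.",
    "Inflation rose 2 percent over the past year.",
    "Eating chocolate makes you immune to flu.",
    "The city council approved the new budget.",
    "The earth is flat according to new research.",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = "real", 0 = "fake"

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", BernoulliNB()),
])
grid = GridSearchCV(
    pipeline,
    param_grid={
        "tfidf__ngram_range": [(1, 1), (1, 2)],  # unigrams vs. +bigrams
        "nb__alpha": [0.1, 0.5, 1.0],            # smoothing strength
    },
    cv=3,
)
grid.fit(statements, labels)
print(grid.best_params_, grid.best_score_)
```

Cross-validated search like this avoids tuning against a single lucky train/test split.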
Deploying the backend was not as easy as it should have been.
Accomplishments that we're proud of
We still managed to get decent accuracy, which is great! We also learnt a lot about deploying ML models to the cloud.
What's next for Fake-Tweet-Visualizer
- Keep training and working on improving the performance of our ML Model
- Add a widget that lets users paste tweets or other text found on the internet and check whether it is classified as true or fake
- Update the visualizer!