It is no surprise that fake news is rampant across the internet. Thankfully, there has been increasing interest in thwarting this wave of misinformation with the help of machine learning. Kaggle competitions on fake news detection have reported impressive accuracy on their test datasets. However, detecting fake news at inference time and labeling it as such remains genuinely difficult. It also raises the question of thresholds: if the model’s softmax yields a value somewhere in the middle, can we confidently tell people that a certain news article is fake? Furthermore, can we label news articles for people without actually fact-checking the content?
That’s why we shifted our approach to do something no one has done before: gamify news itself.
What it does
Our website aggregates news from various media sites and uses a PyTorch-based neural net model to classify articles as fake or real. This model is trained on a fake/real news dataset obtained from Kaggle. The model’s prediction is shown to the user, and user input is also taken to measure users’ agreement with the model. Articles can be sorted by genre and date.
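To make the thresholding concern concrete, here is a minimal sketch of how a binary fake/real classifier’s raw logits could be turned into a label with an “uncertain” band around 0.5. The softmax helper, label names, and abstain band are illustrative assumptions, not the actual Pied Paper model code:

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, abstain_band=0.15):
    """Return ('fake' | 'real' | 'uncertain', p_fake).

    When p_fake sits within abstain_band of 0.5, we refuse to
    commit to a label rather than show users a shaky verdict.
    """
    p_fake, p_real = softmax(logits)
    if abs(p_fake - 0.5) < abstain_band:
        return "uncertain", p_fake
    return ("fake" if p_fake > p_real else "real"), p_fake

# A strongly fake-leaning article vs. a borderline one (hypothetical logits):
label, p = classify([2.0, -1.0])
```

A band like this is one way to avoid over-claiming on borderline articles, which is exactly the situation the question above worries about.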
I think this revolutionizes how we view news in two ways.
Firstly, we step away from personalizing data in order to give a more objective view of how an individual approaches new information. With Pied Paper you are not simply handed true news. We give users the power to decide whether an article is true or fake based on their own interpretation, and then give them feedback. Even then, we do not tell users that they are right or wrong. We simply display how other people voted and what the machine predicted. This way, I think we can more naturally encourage users to become more thoughtful about how they approach new information.
Secondly, we incentivize users to read the news more carefully and thoroughly because we have essentially gamified it. At the end of every article, your objectivity is put to the test and compared with everyone else’s.
Furthermore, I think the user responses we collect can be fed back into the training data as a prior of sorts, helping build a better model for detecting fake news at inference time.
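One hypothetical way to fold user agreement back in is to blend the model’s fake-probability with the fraction of users who voted “fake.” The function name, weighting scheme, and default weight below are purely illustrative, not something we have implemented:

```python
def blended_score(model_p_fake, user_fake_votes, user_total_votes,
                  user_weight=0.3):
    """Blend the model's fake-probability with the crowd's vote share.

    user_weight controls how much the crowd counts; with no votes yet,
    we fall back to the model's probability alone.
    """
    if user_total_votes == 0:
        return model_p_fake
    crowd_p_fake = user_fake_votes / user_total_votes
    return (1 - user_weight) * model_p_fake + user_weight * crowd_p_fake
```

A simple linear blend like this is easy to reason about, though using user votes as training labels directly would risk amplifying crowd bias, so some down-weighting would likely be needed.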
How I built it
We used TorchText for language preprocessing (padding news articles and so on). Then, using AWS SageMaker, we hosted the trained model (.pth) on an endpoint that can be accessed through AWS API Gateway via AWS Lambda.
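The padding step can be sketched as follows. This is a hand-rolled stand-in for what TorchText’s padding utilities do; the pad index and fixed length are assumptions rather than our exact pipeline:

```python
PAD_IDX = 0  # assumed index of the <pad> token in the vocabulary

def pad_batch(token_id_seqs, max_len):
    """Pad (or truncate) each tokenized article to max_len so the
    batch forms a rectangular tensor the model can consume."""
    padded = []
    for seq in token_id_seqs:
        seq = seq[:max_len]
        padded.append(seq + [PAD_IDX] * (max_len - len(seq)))
    return padded

# Two articles of different lengths become one rectangular batch:
batch = pad_batch([[5, 8, 2], [7]], max_len=4)
```

On the serving side, the Lambda function behind API Gateway forwards a payload like this to the SageMaker endpoint (for example via boto3’s `invoke_endpoint` on the SageMaker runtime client) and relays the prediction back to the caller.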
We used Express.js along with Node to orchestrate feeding articles to, and fetching predictions from, the AWS SageMaker endpoint. We used PostgreSQL to store the retrieved data relationally, which made categorization and searching much faster.
We used React to build the user interface, Redux to handle state management, and Material UI for CSS theming. NewsAPI was used to fetch the news articles themselves.
Challenges I ran into
- Optimizing the model
- Designing the front-end to give a feeling of choice to users
- Using AWS SageMaker to host the PyTorch endpoint
Accomplishments that I'm proud of
We have a pretty robust prototype running at piedpaper.net! Tell us what you think of the website and interact with it as much as possible.
What I learned
We learned as a team how to take an existing PyTorch model and serve it at inference time to create a hands-on machine learning experience. We also learned that, on average, Fox News seems to be the outlet with the most articles labeled as 'fake' by our model.
What's next for Pied Paper
- Create better data visualizations and metrics for users' true/false responses (make it more interactive!)
- Improve our machine learning model for better accuracy, and perhaps use user input as a kind of bias in the algorithm.
- Create a dedicated user experience by letting users log in and record their scores.