In the past year, Facebook removed nearly 7.1 million fake news posts, many of which linked to articles designed to confuse and mislead readers. In addition, a study conducted this past February to gauge distrust of media sources worldwide found that 29% of adults in the United States and 28% in the United Kingdom trust media sources. With growing concern about fake news, the upcoming election, and general distrust of mass media, team JAVE has developed a way to estimate whether a news article is truthful or deceptive.

What it does

The application is a website where users can submit articles from various news sources. Using Natural Language Processing (NLP), the website judges whether an article is likely to be untruthful, based on a pre-trained model of words that suggest an article may be fake. Because our model may not always be accurate, we added user feedback as a second heuristic. The application takes the URL of an article chosen by a user and applies the NLP heuristic to it. Users can then upvote or downvote the article depending on how they feel about it. An algorithm combining both heuristics produces a score that is used to rank the article. Articles with higher scores bubble to the top, while those with lower scores fall to the bottom.
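The write-up does not give the formula that combines the two heuristics, so the following is only a minimal sketch of one plausible approach: a weighted average of the NLP score and a smoothed upvote ratio. The names `Article`, `vote_score`, `combined_score`, and `rank`, the default weight of 0.7, and the smoothing are all assumptions made for illustration, not the project's actual algorithm.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Article:
    """Hypothetical record for one submitted article."""
    url: str
    nlp_score: float  # assumed: model's estimated probability of being truthful, in [0, 1]
    upvotes: int = 0
    downvotes: int = 0


def vote_score(upvotes: int, downvotes: int) -> float:
    """Smoothed share of upvotes; an article with no votes gets a neutral 0.5."""
    return (upvotes + 1) / (upvotes + downvotes + 2)


def combined_score(article: Article, nlp_weight: float = 0.7) -> float:
    """Weighted average of the NLP heuristic and the voting heuristic (assumed weights)."""
    if not 0.0 <= nlp_weight <= 1.0:
        raise ValueError("nlp_weight must be between 0 and 1")
    votes = vote_score(article.upvotes, article.downvotes)
    return nlp_weight * article.nlp_score + (1.0 - nlp_weight) * votes


def rank(articles: Iterable[Article], nlp_weight: float = 0.7) -> List[Article]:
    """Return articles sorted so that the highest combined score comes first."""
    return sorted(articles, key=lambda a: combined_score(a, nlp_weight), reverse=True)
```

Smoothing the vote ratio keeps a single early vote from dominating the ranking, and the weight makes it easy to lean more on the model or on the users as confidence in either changes.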

How we built it

We have two components: a front end and a back end. The front end is a simple HTML, CSS, and JavaScript implementation. The back end is a Python Flask server running on Gunicorn. A REST API connects the front end to the back end. A database on the Python server stores all of the articles. We use PyTorch and BERT, a common NLP model, to train a classifier and predict the authenticity of an article, and we use Flask in tandem with PyTorch to make predictions in real time.
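The write-up does not say which database or schema the server uses. As a rough sketch of the storage side only, the example below assumes SQLite through Python's standard `sqlite3` module; the table layout and the helper names `open_store`, `add_article`, `vote`, and `get_article` are hypothetical. A REST handler on the Flask server could call helpers like these after the model has scored an article.

```python
import sqlite3
from typing import Optional

# Assumed schema: one row per article URL, with the model score and vote counts.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url       TEXT PRIMARY KEY,
    nlp_score REAL NOT NULL,
    upvotes   INTEGER NOT NULL DEFAULT 0,
    downvotes INTEGER NOT NULL DEFAULT 0
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the article store; the default is an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_article(conn: sqlite3.Connection, url: str, nlp_score: float) -> None:
    """Insert a newly scored article; a URL that already exists is left unchanged."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO articles (url, nlp_score) VALUES (?, ?)",
            (url, nlp_score),
        )


def vote(conn: sqlite3.Connection, url: str, up: bool) -> bool:
    """Record one upvote or downvote; returns False if the URL is unknown."""
    column = "upvotes" if up else "downvotes"  # fixed names, never user input
    with conn:
        cur = conn.execute(
            f"UPDATE articles SET {column} = {column} + 1 WHERE url = ?", (url,)
        )
    return cur.rowcount == 1


def get_article(conn: sqlite3.Connection, url: str) -> Optional[dict]:
    """Fetch one article as a dict, or None if it has not been submitted."""
    row = conn.execute(
        "SELECT url, nlp_score, upvotes, downvotes FROM articles WHERE url = ?",
        (url,),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("url", "nlp_score", "upvotes", "downvotes"), row))
```

Keeping the model score and the vote counts in the same row means a ranking query only needs to read one table.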

Challenges we ran into

Technical jargon turned out to be difficult for our NLP model to handle. The model performed very well on political articles, since our dataset consisted mostly of political news pieces, but it struggled with scientific articles, which often include very specific jargon. Training speed was also an issue because the models are fairly heavyweight. In addition, our Flask server was initially not visible on the network, but we fixed that quickly.

Accomplishments that we're proud of

We are very proud of having an NLP model that predicted fake news with 98.7% accuracy on a test data set from Kaggle. We are also proud of linking an aesthetically pleasing front end to a production-ready server on the back end. Combining multiple heuristics, voting and NLP, is what really makes this project special.

What we learned

We learned several ideas in NLP, especially how to handle domain-specific jargon. We also learned about networking across servers and different channels of information. We got a lot better at several aspects of front-end development, especially modular design.

What's next for Newsworthy

Our ML-based fake news detection system shows that, with careful training and targeting of specific fields, we can do a reasonable job of judging which articles are more or less biased. The system could certainly be improved into a more general model that detects bias in fields that are currently challenging, such as scientific articles. With the election year coming up, we believe that being able to tell Americans what is possibly fake news is critical to preventing misinformation. As a result, we’d like to see this idea adopted more widely by news sources.
