Fake News Linear SVC Model

Hack the Northeast Hackathon Project: predicting whether an article is fake news or trustworthy based on its title. Fake news has been in the news a lot since 2016. In the last few weeks, however, the social media companies Twitter and Facebook have grappled with whether to label false information on their sites. Twitter has started to flag certain tweets, but the flags are currently either triggered by hard-coded text within the tweet or applied by hand. Facebook has said it will not attempt to label false information.

This inspired us to create a machine learning model to predict whether an article is fake news based on its title. This would allow Twitter or Facebook to check the title of an article being shared and determine whether to flag the tweet or post. A model like this could have wide application as the 2020 election gears up.

In creating our model, we cleaned and separated the words in each title using the nltk library. Then, we used term frequency-inverse document frequency (TF-IDF) to get floating-point values, per title, for the 300 most common words. Finally, we used scikit-learn to create a linear support vector classifier model that predicts whether an article is fake news based on its title with 98% precision.
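The steps above can be sketched roughly as follows. This is a minimal illustration, not our actual notebook code: the cleaning rules (lowercasing, keeping alphabetic tokens, Porter stemming), the helper names clean_title and build_model, the label convention (1 = fake, 0 = trustworthy), and the toy titles are all assumptions made for the example. The only details taken from the description are nltk for cleaning, TF-IDF limited to 300 words, and a linear support vector classifier from scikit-learn.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Assumed cleaning rules: lowercase, keep alphabetic tokens, stem each token.
_tokenizer = RegexpTokenizer(r"[a-z]+")
_stemmer = PorterStemmer()


def clean_title(title):
    """Hypothetical helper: return a cleaned, space-joined version of a title."""
    tokens = _tokenizer.tokenize(title.lower())
    return " ".join(_stemmer.stem(token) for token in tokens)


def build_model():
    """Hypothetical helper: TF-IDF over at most 300 words, then a linear SVC."""
    return make_pipeline(
        TfidfVectorizer(preprocessor=clean_title, max_features=300),
        LinearSVC(),
    )


# Invented toy data for illustration only (1 = fake, 0 = trustworthy).
titles = [
    "You won't believe what this politician secretly did",
    "Shocking miracle cure that doctors are hiding from you",
    "Aliens endorse candidate in stunning secret meeting",
    "Senate passes budget bill after lengthy debate",
    "City council approves funding for new public library",
    "Central bank holds interest rates steady this quarter",
]
labels = [1, 1, 1, 0, 0, 0]

model = build_model()
model.fit(titles, labels)

# Predict on the training titles just to show the call shape;
# a real evaluation would use a held-out test set.
predictions = model.predict(titles)
```

In practice the model would be fit on a labeled dataset of real article titles and scored on a separate test split, for example with sklearn.metrics.precision_score, to arrive at a precision figure like the one reported above.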

Built With

jupyter-notebook
nltk
pandas
python
scikit-learn

Updates

Seth Keim started this project — Jun 07, 2020 12:23 PM EDT
