Given the recent news and controversies surrounding President Trump and his accusations that certain news sources are "fake news", we wanted to make a web application where you can check whether a given article is fact or fiction.
What it does
From the home page, you enter the URL of a news article, and the application displays whether the article is "True" or "Fake" along with a percentage confidence for that classification.
How we built it
We built this project in Python with the Flask framework, NLTK and scikit-learn. We started by compiling a large bank of true news articles (for example, C-SPAN, The New York Times, Reuters) and fake news articles (for example, abcnews.com.co, nationalreport.net, infowars.com). We trained our model on these articles, which gave us the words most likely to show up in true and in fake news articles, along with their frequencies. Then, using the Newspaper library, we extract the plain-text article from an HTML page and tokenize the body by sentence. We apply our fact-checking analysis function to each sentence and take the mode of the results to decide whether the article is true or fake; the ratio of sentences that agree with that label determines the confidence.
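The sentence-level voting described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not our actual code: it assumes a naive Bayes style word-frequency model with add-one smoothing, a regex stand-in for the NLTK sentence tokenizer, and a confidence equal to the share of sentences that agree with the majority label. The function names and the toy training data are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize_words(text):
    # Lowercase word tokens; a simplified stand-in for an NLTK word tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    # Naive split on sentence-ending punctuation; a stand-in for NLTK's sentence tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def train(labeled_texts):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize_words(text))
    return word_counts, doc_counts


def classify_sentence(sentence, model):
    """Pick the label with the highest smoothed log-probability (assumed naive Bayes)."""
    word_counts, doc_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(word_counts):
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize_words(sentence):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def classify_article(text, model):
    """Label each sentence, then return the most common label and its vote share."""
    labels = [classify_sentence(s, model) for s in split_sentences(text)]
    if not labels:
        raise ValueError("article contains no sentences")
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)


# Toy training data, purely illustrative.
TOY_TRAINING_SET = [
    ("senate passes budget bill after vote", "True"),
    ("court rules on trade policy", "True"),
    ("aliens secretly control the moon", "Fake"),
    ("shocking miracle cure doctors hate", "Fake"),
]

toy_model = train(TOY_TRAINING_SET)
toy_article = (
    "The senate passes the budget bill. "
    "The court rules on trade policy. "
    "Aliens secretly control the moon."
)
label, confidence = classify_article(toy_article, toy_model)
```

On this toy article, two of the three sentences are voted "True", so the article is labeled "True" with a confidence of about 67%. A real run would replace the toy data with the scraped article bank and the regex helpers with NLTK tokenizers.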
Challenges we ran into
We needed a large training set, and to build one we had to write a script that scrapes a given website (say, The New York Times) and retrieves article content. The same challenge came up when we needed to retrieve the article that the user enters in the search bar. This proved very tedious at first, but the Newspaper library greatly simplified it.
Accomplishments that we're proud of
We achieved a decent accuracy rate (around 80-85%) on our testing set with our classifiers. We also used the Newspaper library to efficiently retrieve article content from the entered URL.
What we learned
We learned that scraping article content from news websites is tedious and complicated, and that the Newspaper library is well suited to solving this problem.
What's next for Factable
Creating a Chrome extension that determines whether the article page you are currently on is true or fake news.