Your fake and biased news evaluator
With growing concern over the veracity and bias of news sources, as well as the rise of "fake news" accusations, we decided to create a tool that programmatically estimates the trustworthiness of a website. Using Markov chains and machine learning, we built a program that reads a news story and evaluates it against past data in an attempt to categorize the article's veracity and political leaning.
Using a combination of web scraping and the NewsAPI, we gathered data from sources across the political spectrum, including The New York Times, Bloomberg, and Breitbart News. The more closely an article's chains resembled those of one of these sources, the more strongly we classified the article as left, center, or right, respectively.
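To collect labeled training text, one option is NewsAPI's `/v2/everything` endpoint, queried one outlet at a time. The following is a minimal sketch under assumptions and not our actual crawler. The source ids in `SOURCE_LEANING` and the helper names `build_request_url` and `labeled_texts` are hypothetical. The leaning labels follow the left/center/right assignment above.

```python
import json
from urllib.parse import urlencode

# Hypothetical mapping from NewsAPI source ids to the leaning assigned above.
SOURCE_LEANING = {
    "the-new-york-times": "left",
    "bloomberg": "center",
    "breitbart-news": "right",
}

EVERYTHING_ENDPOINT = "https://newsapi.org/v2/everything"


def build_request_url(source_id: str, api_key: str, page_size: int = 100) -> str:
    """Build a NewsAPI /v2/everything request restricted to a single outlet."""
    params = {"sources": source_id, "pageSize": page_size, "apiKey": api_key}
    return f"{EVERYTHING_ENDPOINT}?{urlencode(params)}"


def labeled_texts(response_body: str) -> list[tuple[str, str]]:
    """Turn a NewsAPI JSON response into (leaning, text) training pairs."""
    payload = json.loads(response_body)
    if payload.get("status") != "ok":
        raise ValueError(payload.get("message", "NewsAPI returned an error"))
    pairs = []
    for article in payload.get("articles", []):
        source_id = (article.get("source") or {}).get("id")
        leaning = SOURCE_LEANING.get(source_id)
        if leaning is None:
            continue  # skip outlets we have no truth label for
        # Join whichever text fields are present; any of them may be null.
        fields = (article.get(key) for key in ("title", "description", "content"))
        text = " ".join(field for field in fields if field)
        if text:
            pairs.append((leaning, text))
    return pairs
```

In practice, the URL would be fetched with `urllib.request.urlopen`, and the decoded body would be passed to `labeled_texts`. The `pageSize` default is arbitrary.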
Disclaimer: This program is not meant to advance any political agenda. It is simply a statistical analysis of the word usage of news media. Our team members span the political spectrum.
What it does
The program uses Python and the NewsAPI (https://newsapi.org) to ingest articles and label them by news source. The resulting text is passed to the Markov chain generator, and then to a program that compares the generated chains with chains from articles of known bias. Finally, the program reports its best guess at the article's political leaning, giving the reader a better sense of how trustworthy the article is.
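Our chain generation and comparison were written in C#. The following sketch illustrates the same idea in Python only. It assumes one first-order, word-level Markov chain per leaning. An article is scored by the average log-probability of its word transitions under each chain, with add-one smoothing. This scoring rule is one reasonable similarity measure, not necessarily the one our C# code used. The names `tokenize`, `build_chain`, `avg_log_prob`, and `classify` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately simple tokenizer for the sketch."""
    return re.findall(r"[a-z']+", text.lower())


def build_chain(texts: list[str]) -> dict[str, Counter]:
    """First-order word Markov chain: counts of each next word given the current word."""
    chain: dict[str, Counter] = defaultdict(Counter)
    for text in texts:
        words = tokenize(text)
        for current, following in zip(words, words[1:]):
            chain[current][following] += 1
    return chain


def avg_log_prob(chain: dict[str, Counter], vocab_size: int, text: str) -> float:
    """Mean log P(next | current) over the text's transitions, with add-one smoothing."""
    words = tokenize(text)
    transitions = list(zip(words, words[1:]))
    if not transitions:
        return float("-inf")
    total = 0.0
    for current, following in transitions:
        row = chain.get(current, Counter())
        total += math.log((row[following] + 1) / (sum(row.values()) + vocab_size))
    return total / len(transitions)


def classify(article: str, training: dict[str, list[str]]) -> tuple[str, dict[str, float]]:
    """Return the leaning whose chain best explains the article, plus all scores."""
    chains = {label: build_chain(texts) for label, texts in training.items()}
    vocab = {w for texts in training.values() for t in texts for w in tokenize(t)}
    vocab |= set(tokenize(article))
    scores = {
        label: avg_log_prob(chain, len(vocab), article)
        for label, chain in chains.items()
    }
    return max(scores, key=scores.get), scores
```

Averaging over transitions keeps long and short articles on the same scale. Add-one smoothing prevents a single unseen word pair from driving a score to negative infinity.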
How we built it
We used Python to gather web data and C# to build the learning component.
Challenges we ran into
We had some small disagreements over which programming language to use. However, we made our workflow IO-based, with components communicating only through their inputs and outputs. This meant everyone could use their preferred language without worrying about inconsistencies. Web crawling proved more difficult than we expected, and we ran into problems with Unicode characters in the input. We also often had to classify articles manually to produce ground-truth data for the program.
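One way to address both the Unicode problems and the language split is to normalize text before it leaves the Python stage, then hand it off in a plain, language-neutral file format. This is a minimal sketch under assumptions. The JSON Lines layout, the replacement table, and the names `clean_text` and `write_records` are hypothetical, not a description of our actual file format.

```python
import json
import unicodedata

# Typographic characters common in scraped news text, mapped to ASCII (illustrative list).
ASCII_REPLACEMENTS = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-", "\u2014": "-",  # en and em dashes
})


def clean_text(raw: str) -> str:
    """Normalize Unicode so every later stage sees the same characters."""
    # NFKC folds compatibility forms, e.g. the "fi" ligature and non-breaking spaces,
    # and composes combining accents into single code points.
    text = unicodedata.normalize("NFKC", raw).translate(ASCII_REPLACEMENTS)
    return " ".join(text.split())  # collapse runs of whitespace


def write_records(path: str, records) -> None:
    """Write (label, text) pairs as UTF-8 JSON Lines for the next stage to read."""
    with open(path, "w", encoding="utf-8") as handle:
        for label, text in records:
            handle.write(json.dumps({"label": label, "text": clean_text(text)}) + "\n")
```

A C# stage could then read the file one line at a time with any JSON parser, such as `System.Text.Json`.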
Accomplishments that we're proud of
We built a working proof of concept that suggests that, with more data, our program could correctly classify most articles by political bias.
What we learned
We learned how to divide labor efficiently. We also learned that, while we have by no means solved the problem of detecting biased sources, we were able to take some major steps toward doing so.
* Python - for data gathering
* C# - for Markov chains and evaluation
Powered by NewsAPI.org (https://newsapi.org/)