Recently, there have been concerns about the truthfulness of news companies and where their interests lie. Misinformation, biased news, and hate speech are more prevalent now than ever. In the quest for social good, we built a news ranking and recommendation system focused on politics in a foreign country: India. (We picked politics in India for the purpose of our demo, but the models and algorithms work on any type of news from any country.) Rising political unrest and biased Indian news companies leave doubt in people's minds, so we set out to address this problem by thoroughly analyzing news and presenting users with the highest-quality news possible.
What it does
Cleannews is a web application focused on providing unbiased, verified, and analyzed news.
We do this by filtering our news articles to native Indian publications about politics; estimating fake news, bias, and clickbait probabilities from the content of each article; using TensorFlow to present articles with diverse viewpoints and to eliminate articles with hate speech and offensive content; and combining these signals with corresponding weights to rank the articles by quality. Finally, we perform sentiment analysis to contextualize articles for users, run keyword analysis to extract the key terms, and present all of the information in a clean, concise, and readable way.
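To make the keyword analysis step concrete, here is a minimal sketch that picks key terms by simple word frequency. This is a stand-in technique chosen for illustration, not necessarily the method Cleannews uses; the function name `extract_keywords` and the short stop-word list are assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; a real system would use a fuller one).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
    "is", "are", "was", "with", "by", "that", "this", "it", "as", "at",
}


def extract_keywords(text: str, top_n: int = 5) -> list[str]:
    """Return the top_n most frequent non-stop-words in the text (hypothetical helper)."""
    # Lowercase the text and keep only alphabetic tokens.
    words = re.findall(r"[a-z]+", text.lower())
    # Count words that are not stop words and are longer than two letters.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]
```

Frequency counting is crude, but it shows the shape of the step: article text goes in, a short list of terms comes out to display next to the article.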
How we built it
First, we use the Bing News API to aggregate news articles. Building on an existing search engine lets us focus on polishing and improving its news ranking, which in turn gives a better experience to users who are likely already using the same search engine. We focus on getting articles directly from Indian publishers, rather than American publishers writing about Indian politics, in order to reduce bias. After retrieving the articles, we analyze each one for signs of fake news, which severely undermines the quality journalism we strive to uphold. To do this, we use newspaper.js to harvest article data and pull the content and important keywords into our system for further analysis.
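The publisher filtering described above could be sketched as an allowlist check on each article's URL. This is only an illustration: the domains below are placeholders, and the `url` field, the helper names, and the allowlist approach itself are assumptions rather than the project's actual implementation.

```python
from urllib.parse import urlparse

# Placeholder allowlist of publisher domains (hypothetical; not real outlets).
ALLOWED_PUBLISHER_DOMAINS = {"daily.example.in", "times.example.in"}


def publisher_domain(url: str) -> str:
    """Extract the lowercase host from a URL, dropping a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def filter_to_allowed_publishers(articles, allowed=ALLOWED_PUBLISHER_DOMAINS):
    """Keep only articles whose URL belongs to an allowlisted publisher.

    Each article is assumed to be a dict with a 'url' key.
    """
    return [a for a in articles if publisher_domain(a["url"]) in allowed]
```

For example, an article at `https://www.daily.example.in/politics/1` would be kept, while one at `https://us-outlet.example.com/india` would be dropped.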
After determining that the news articles we've aggregated are not fake, we verify that they are not clickbait. This is done in conjunction with our fake news analysis, with the help of multiple pre-trained models repurposed for our tasks. Afterwards, we use a sentiment analysis model to identify the sentiment of each article. We use this sentiment to detect the author's biases towards specific subjects and the author's political preference, which feed into the final analysis that combines all of the individual algorithms we developed to refine the existing ranking. The goal of this sentiment analysis is to contextualize the articles that readers may click on. Finally, we use a custom TensorFlow backend to detect hate speech and other language that we consider to render an article nonconstructive and written only to incite violence. The aggregate algorithm combines the results of the previous tests using a series of weights and applies a custom threshold that indicates which news can be considered legitimate and which cannot.
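As a rough illustration of the weighted aggregation and threshold, the sketch below combines per-article probabilities into one quality score, drops articles below a cutoff, and ranks the rest. The weights, the threshold value, and all names (`ArticleSignals`, `quality_score`, `rank_articles`) are made up for this example and are not the values Cleannews uses.

```python
from dataclasses import dataclass


@dataclass
class ArticleSignals:
    """Model outputs for one article, each a probability in [0, 1] (hypothetical schema)."""
    title: str
    fake_news: float
    clickbait: float
    bias: float
    hate_speech: float


# Illustrative weights and threshold (assumptions, not the project's tuned values).
WEIGHTS = {"fake_news": 0.4, "clickbait": 0.2, "bias": 0.2, "hate_speech": 0.2}
LEGITIMACY_THRESHOLD = 0.6


def quality_score(article: ArticleSignals) -> float:
    """Return 1 minus the weighted sum of the negative signals; higher is better."""
    penalty = sum(weight * getattr(article, name) for name, weight in WEIGHTS.items())
    return 1.0 - penalty


def rank_articles(articles: list[ArticleSignals]) -> list[ArticleSignals]:
    """Drop articles below the threshold and sort the rest from best to worst."""
    legitimate = [a for a in articles if quality_score(a) >= LEGITIMACY_THRESHOLD]
    return sorted(legitimate, key=quality_score, reverse=True)
```

A linear combination keeps each signal's contribution easy to inspect and tune, which suits a setup where the weights are adjusted by hand against a spreadsheet of rankings.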
Challenges we ran into
We spent a lot of time creating a workflow that enabled us to learn new technologies and create a web service that people would be able to use. Some challenges were connecting to Microsoft Azure to use the Bing API, working with Azure App Services, and deploying the website so that load time was minimal. We also faced challenges adding custom machine learning models to get better results. Another challenge was dealing with the computation limits of running our models on the CPU on the free tier of Azure. Overall, we were able to solve these challenges and build a platform we love.
Accomplishments that we're proud of
We're proud of being able to build an end-to-end service that helps people who are interested in learning about positive journalism. We strive to create a safe place to learn about the clickbait, fake news, and hate speech that can be found within common news articles. We're proud of the website we've built, and it is amazing to see our algorithms work on news articles and produce tangible and verifiable results.
What we learned
We've learned that creating quality journalism is tough and takes a lot of time and effort, and that news verification is a much-needed service in this day and age. We also learned a lot about many types of ML-based analysis, as well as about technologies such as Azure, TensorFlow, Node, and Pug.
What's next for cleannews
We hope to continue Cleannews and build a better service for everyone. We restricted our current application to one area of news in one country for the purposes of our demo, but after the hackathon we hope to broaden the scope and make it a usable tool for everyone.
Design Document: here
Data Rankings Spreadsheet: here
Note: The website is currently not hosted because of server costs.