NewsReel

Inspiration

Staying informed about world events matters, but fake and unreliable news is widespread, so we saw a need for a better way to tell which news reports are factual and accurate. Inspired by the NewsQ challenge, we chose to investigate coverage of the coronavirus pandemic in the United Kingdom, because the hysteria surrounding the virus makes it a target for misinformation.

Design Document

Link: https://drive.google.com/file/d/1pTJ9m88R7vYW9w-dqvYAWBwB5jZeFq-i/view

What it does

Our main goal was a strong backend service capable of rating any relevant news article. Using a model trained on a data set that we created, we can input a list of articles and assign each one a legitimacy score. Several factors go into the score: the number of quotes, the tone, the number of typos, the number of swear words, the number of links in the article, and the word count. To provide a basic visualization of our results, we organized them on a webpage that lists the final rankings.
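As a rough illustration of how some of these factors could be counted from an article's plain text, here is a minimal sketch. It is not the project's actual code: the function name `extract_factors`, the regular expressions, and the tiny `SWEAR_WORDS` list are assumptions made for the example. Tone and typo counts are left out because they need an external service or dictionary.

```python
import re

# Hypothetical word list, for illustration only; the project's list is not given.
SWEAR_WORDS = {"damn", "hell"}

def extract_factors(text):
    """Count simple surface features of an article's plain text."""
    # Links are matched first and removed so they are not counted as words.
    links = re.findall(r"https?://\S+", text)
    text_without_links = re.sub(r"https?://\S+", "", text)
    words = re.findall(r"[A-Za-z']+", text_without_links)
    # A quote is any run of text between straight or curly double quotation marks.
    quotes = re.findall(r'"[^"]+"|“[^”]+”', text_without_links)
    swears = [w for w in words if w.lower() in SWEAR_WORDS]
    return {
        "word_count": len(words),
        "quote_count": len(quotes),
        "link_count": len(links),
        "swear_count": len(swears),
    }
```

The result is a dictionary with one count per factor, which is a convenient shape to pass to a scoring model or to write to a file.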

How we built it

We began by building a website parser capable of analyzing a variety of websites. Meanwhile, we developed a machine learning model that was trained on an initial set of evaluated websites and then used to score a new list of websites. Once our models were built, we moved the project from a local machine to a server.
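The write-up does not describe the model's architecture or how it was trained. As a minimal sketch of the scoring and ranking step only, the example below swaps in a single logistic unit for the team's neural network. The weights, the bias, and the names `legitimacy_score` and `rank_articles` are hypothetical and chosen by hand for illustration; a real model would learn its weights from the training data.

```python
import math

# Hypothetical, hand-picked weights and bias; a trained model would learn these.
WEIGHTS = {"quote_count": 0.4, "link_count": 0.3, "swear_count": -1.0}
BIAS = -0.5

def legitimacy_score(factors):
    """Map a factor dictionary to a score between 0 and 1 with a logistic unit."""
    z = BIAS + sum(weight * factors.get(name, 0) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank_articles(articles):
    """Return (url, score) pairs sorted from highest to lowest score."""
    scored = [(url, legitimacy_score(factors)) for url, factors in articles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_articles` with a dictionary that maps each article URL to its factor counts returns the list in the order it would appear on a rankings page.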

Challenges we ran into

Finding a standard technique for scraping a variety of pages
Ensuring our local changes were reflected on our server
Determining why there were false positives that led to legitimate articles being ranked too low
Managing multiple versions of models and output files

Accomplishments that we're proud of

Hosting on a unique Domain Name
Scraping from a variety of websites
Implementing a neural network
Maintaining strong collaboration even while working virtually
Putting together a wide combination of web technologies
Using IBM Cloud’s Watson Tone Analyzer to determine the tone
Setting up the Ubuntu Virtual Machine Service using Azure Cloud Services

What we learned

The most valuable lesson we learned from this project was to look carefully at all the requirements. Several times during development we found ourselves referring back to the directions to make sure we had addressed every aspect of the project. Another important lesson was that a measured outcome has two sides. For example, an article may have very few quotes, so on the surface it may seem unreliable. However, it may be a primary source that does not need quotes in order to provide legitimate information.

What's next for NewsReel

Integrating the fact checker API by Google
Training on a larger dataset to improve accuracy
Adding more criteria for analyzing a web page

Link to CSV with rankings

Link: https://drive.google.com/file/d/1Z4g7Jyfp0fRAGutTgdcjmECx9_YbtaYJ/view?usp=sharing

Submitted to

HackGT 7
- Winner IBM: The Community Response to COVID-19
- Winner NewsQ for Social Good

Created by

I worked on web scraping with BeautifulSoup in Python, parsing each page to extract the factors that the model trains on and writing that data to a JSON file. I also created the front end of the website using HTML, CSS, and JavaScript, which reads the output data and scores from the neural network.
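The parse-and-export step described above might look roughly like the sketch below. To keep the example free of third-party dependencies it uses Python's standard `html.parser` module instead of BeautifulSoup, so it is not the code used in the project. The class `ArticleParser`, the function `page_to_json`, and the field names in the JSON record are assumptions made for illustration.

```python
import json
import re
from html.parser import HTMLParser

class ArticleParser(HTMLParser):
    """Collect visible text and count hyperlinks in an HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.link_count = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self.link_count += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script and style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())

def page_to_json(html):
    """Parse one HTML page and return its extracted factors as a JSON string."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    record = {
        "link_count": parser.link_count,
        "word_count": len(re.findall(r"[A-Za-z']+", text)),
        "text": text,
    }
    return json.dumps(record)
```

A front end can then load the JSON records, pair them with the model's scores, and render the rankings.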

Megan Dass
Ignacio Di Leva
Sneha Roy
Udit Subramanya

Updates

Ignacio Di Leva started this project — Oct 18, 2020 06:30 AM EDT
