Staying informed about world events is extremely important, yet fake and unreliable news is widespread, so we saw a need for a better way to identify factual, accurate reporting. Inspired by the NewsQ challenge, we chose to investigate coverage of the coronavirus pandemic in the United Kingdom, because the hysteria surrounding the virus makes it a target for misinformation.
What it does
We mainly aimed to create a strong backend service capable of rating any relevant news article. Using a model trained on a data set that we created, we can input a list of articles and assign each a legitimacy score. Several factors go into the legitimacy score of a given article, such as the number of quotes, the tone, the number of typos, the number of swear words, the number of links, and the word count. To provide a basic visualization of our results, we organized them on a webpage that lists the final rankings.
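As a rough illustration of how some of the factors above could be turned into numbers, here is a minimal sketch that works on plain article text. It is not our actual implementation: the function name `extract_features`, the regular expressions, and the tiny `SWEAR_WORDS` list are all assumptions made for this example. Tone and typo counts are left out because they depend on external services or dictionaries.

```python
import re

# Hypothetical placeholder list; a real word list would be much longer.
SWEAR_WORDS = {"damn", "hell"}

# Passages wrapped in straight or curly double quotes.
QUOTE_PATTERN = re.compile(r'"[^"]+"|“[^”]+”')
# Bare http/https URLs appearing in the text.
LINK_PATTERN = re.compile(r"https?://\S+")
# Alphabetic words, optionally with one internal apostrophe (e.g. "don't").
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def extract_features(text: str) -> dict:
    """Count simple surface features of an article's text."""
    links = LINK_PATTERN.findall(text)
    # Strip URLs first so they do not inflate the word count.
    prose = LINK_PATTERN.sub(" ", text)
    words = [w.lower() for w in WORD_PATTERN.findall(prose)]
    return {
        "quote_count": len(QUOTE_PATTERN.findall(text)),
        "link_count": len(links),
        "swear_count": sum(1 for w in words if w in SWEAR_WORDS),
        "word_count": len(words),
    }


sample = (
    'The minister said "cases are falling" today. '
    "See https://example.com/report for details."
)
features = extract_features(sample)
print(features)
```

A dictionary like this could then be converted into a feature vector for a model. Counting raw features keeps the scoring step separate from the scraping step, so either can be changed independently.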
How we built it
We began by building a robust website parser capable of analyzing a variety of websites. Meanwhile, we developed a machine learning model that learned from our initial evaluations of websites and applied them to a new list of websites. Once our models were built, we moved the project from a local device to a server.
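The parsing step can be sketched with Python's standard library alone. This is only an assumed approach, not the parser we shipped: the class name `ArticleParser` and the helper `parse_article` are hypothetical, and the sketch treats text inside `<p>` tags as the article body, which will not hold for every site.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect paragraph text and count links in an HTML page."""

    def __init__(self):
        super().__init__()
        self.link_count = 0
        self._p_depth = 0   # > 0 while inside a <p> element
        self._chunks = []   # pieces of paragraph text, in order

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" and value for name, value in attrs):
            self.link_count += 1
        elif tag == "p":
            self._p_depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._p_depth > 0:
            self._p_depth -= 1
            self._chunks.append(" ")  # keep paragraphs from running together

    def handle_data(self, data):
        if self._p_depth > 0:
            self._chunks.append(data)

    @property
    def text(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


def parse_article(html: str) -> ArticleParser:
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser


page = (
    '<html><body><nav><a href="/home">Home</a></nav>'
    '<p>Cases fell, <a href="https://example.com">officials said</a>.</p>'
    "<p>More soon.</p></body></html>"
)
article = parse_article(page)
print(article.link_count)
print(article.text)
```

This sketch assumes every `<p>` tag is explicitly closed. Pages that rely on implied closing tags, or that place body text in other elements, would need site-specific handling, which is part of why a single standard scraping technique is hard to find.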
Challenges we ran into
- Finding a standard technique for scraping a variety of pages
- Ensuring our local changes were reflected on our server
- Determining why there were false positives that led to legitimate articles being ranked too low
- Managing multiple versions of models and output files
Accomplishments that we're proud of
- Hosting on a unique domain name
- Scraping from a variety of websites
- Implementing a neural network
- Maintaining strong collaboration even while working virtually
- Putting together a wide combination of web technologies
- Using IBM Cloud’s Watson Tone Analyzer to determine the tone
- Setting up the Ubuntu Virtual Machine Service using Azure Cloud Services
What we learned
The most valuable lesson we learned from this project was to look carefully at all the requirements. Several times during development we found ourselves referring back to the directions to ensure we addressed all aspects of the project. Another important lesson was that there are two sides to a measured outcome. For example, an article with very few quotes may seem unreliable on the surface. However, such an article may be a primary source that does not need quotes in order to provide legitimate information.
What's next for NewsReel
- Integrating Google's fact checker API
- Training on a larger dataset to improve accuracy
- More criteria for analyzing a web page