Because media outlets such as newspapers must draw attention to their articles to make a profit, they often portray events in extreme terms or interpret the facts in a way that appeals to their particular audience. Although these interpretations are not necessarily untrue, they focus on different aspects of an issue, so important information can be left out or overemphasized. Differences between cultures or countries can also contribute to facts being exaggerated or understated, as it is in a paper's best interest to represent its country well. In particular, we were inspired by the media responses to the invasion of Ukraine, which US and Russian news sources frame so differently that it can seem like two different events. This makes it very difficult for the average consumer, particularly one who follows multiple sources, to determine what is fact and what is fiction.

What it does

To help assess the validity of statements in an article, we ask users to input links to two articles from different news sites and then compare their content. We chose to focus on the keywords of the articles: the user selects one keyword to analyze, and our website presents the 5 most similar claims about the topic (suggesting a true fact) and the 5 least similar claims (suggesting misinformation or a significantly different interpretation).
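The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `most_and_least_similar` and the `(score, sentence_a, sentence_b)` tuple layout are assumptions made for the example.

```python
def most_and_least_similar(scored_pairs, n=5):
    """Split scored sentence pairs into the n most and n least similar.

    scored_pairs: iterable of (score, sentence_a, sentence_b) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Highest similarity first.
    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    most = ranked[:n]
    # Draw the least similar pairs only from what is left, so a pair
    # never appears in both lists when there are fewer than 2 * n pairs.
    remaining = ranked[n:]
    least = sorted(remaining, key=lambda pair: pair[0])[:n]
    return most, least
```

With fewer than `2 * n` pairs, the "least similar" list simply comes back shorter (or empty) instead of repeating pairs already shown as most similar.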

How we built it

We used the open-source library news-fetch, which depends on further libraries such as newspaper3k, to scrape an article from a given URL. The scraper extracts data such as the publication date, the author, the article title, and the article content. Once scraped, we scanned the full texts of both articles, selected the sentences that contained the chosen keyword, and compared these sentences semantically using cosine similarity (a numerical measure of how similar two pieces of text are) through the library NLTK. We integrated our back-end services (in Python) with our front end (HTML and CSS) using Flask, and hosted the site on Replit with a custom domain. We specifically chose not to use any form of machine learning or neural network, as training such a model on a data set creates inherent bias towards particular sources, which defeats the purpose of identifying "fake news".
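The keyword filtering and comparison step might look something like the sketch below. It is an illustration under stated assumptions, not our actual implementation: it uses only the Python standard library with simple word-count vectors rather than NLTK, the sentence splitter is deliberately naive, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentences_with_keyword(text, keyword):
    """Keep only the sentences that mention the keyword (case-insensitive)."""
    key = keyword.lower()
    return [s for s in split_sentences(text) if key in s.lower()]


def word_counts(sentence):
    """Bag-of-words vector: lowercase word -> number of occurrences."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    va, vb = word_counts(a), word_counts(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    # Two sentences with no words in them have no meaningful angle.
    return dot / norm if norm else 0.0


def compare_articles(text_a, text_b, keyword):
    """Score every cross-article pair of sentences mentioning the keyword."""
    return [
        (cosine_similarity(sa, sb), sa, sb)
        for sa in sentences_with_keyword(text_a, keyword)
        for sb in sentences_with_keyword(text_b, keyword)
    ]
```

A score near 1 means the two sentences share most of their wording, while a score near 0 means they share almost none. Because this version only counts shared words, two sentences that make the same claim in different words would still score low.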

Challenges we ran into

We ran into challenges at just about every step of the process. Initially, many of the open-source libraries we tried had very poor documentation or were outdated, so our team struggled to complete even basic tasks. None of us had any experience with Flask or with hosting a website on our registered domain, so we spent quite some time figuring out how to do both.

Accomplishments that we're proud of

We're proud to have developed a dynamic website using Flask that works! Since we all lacked experience with many of the components we were working with, it is great that our website turned out the way we had planned. Each of us is also particularly proud of the sections we personally built, as we all learned plenty along the way (including that staying awake for this long is difficult).

What we learned

As mentioned earlier, many of the components involved in our site were ones that none of us had used before. We had no experience with Flask, domains and website hosting, or many of the libraries we used, so we learned each of these along with how to integrate the different components of the site.

What's next for News Comparator (News Comparison)

On the front end, we could create a more accessible user interface that is up to the standard of websites today, and transition to HTML5. On the back end, we could use more robust algorithms than cosine similarity to determine semantic similarity, giving us more precise results. More generally, there is also potential to add functionality for comparing more than two news articles at once, helping users determine what is fact and what is fiction.
