Truthify This

Inspiration

Social media now shapes our relationships, what we do, and what we learn. As fake news spreads across social media, people are in danger of being manipulated by others without realizing it. As a group of college freshmen, we have all grown up in a social-media-oriented world in which fake news has become the norm. We developed Truthify This to help people recognize fake news so that they can tell whether what they're reading is real or fake.

What it does

Truthify This is a Python-based web app that takes an article link, Facebook image, or YouTube video and determines whether the link contains fake news.

How we built it

For articles, we extract the article text from the page using the Goose API. For Facebook images, we retrieve the page's HTML and parse the image link from it, then apply optical character recognition via the Tesserocr API to extract the text in the image. Lastly, for YouTube videos, we extract the audio from the video and convert it to WAV using the PyDub API, then use IBM Watson to transcribe the audio file into text.
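Before any of these extraction paths can run, a link has to be routed to the right one. A minimal sketch of that routing, assuming the decision is made from the link's hostname alone; the function name `classify_link` and the hostname sets are hypothetical and only illustrate the idea:

```python
from urllib.parse import urlparse

# Hypothetical hostname sets; the real app may recognize links differently.
YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "youtu.be"}
FACEBOOK_HOSTS = {"facebook.com", "m.facebook.com", "fb.com"}


def classify_link(url: str) -> str:
    """Return 'youtube', 'facebook', or 'article' for a given link."""
    host = (urlparse(url).hostname or "").lower()
    # Treat "www.example.com" and "example.com" as the same host.
    if host.startswith("www."):
        host = host[4:]
    if host in YOUTUBE_HOSTS:
        return "youtube"
    if host in FACEBOOK_HOSTS:
        return "facebook"
    # Anything else is handled as a plain text article.
    return "article"
```

Each label would then select one of the three paths described above: article extraction, image OCR, or audio transcription.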

Once these steps have produced text for an article, Facebook photo, or YouTube video, we apply natural language understanding via IBM Watson to determine the relevance, emotion, and sentiment of the key components of the text. We then go through all of the natural language understanding results to find the key components that contain words indicating that a claim is being made. If a claim carries a strong emotional bias (anger, joy, sadness, fear, or disgust) and is relevant to the overall text, we check whether it's fake news.
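The filtering step can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact logic of the app: the claim-indicator words, both thresholds, and the shape of each component (a dict with `text`, `relevance`, and an `emotion` dict of scores between 0 and 1) are all hypothetical.

```python
# Hypothetical claim-indicator words and thresholds, chosen for illustration.
CLAIM_WORDS = {"is", "are", "was", "were", "will", "causes", "proves", "shows"}
EMOTIONS = ("anger", "joy", "sadness", "fear", "disgust")
EMOTION_THRESHOLD = 0.5
RELEVANCE_THRESHOLD = 0.5


def select_claims(components):
    """Keep components that look like relevant, emotionally charged claims.

    Each component is assumed to be a dict with 'text', 'relevance', and an
    'emotion' dict that maps emotion names to scores in [0, 1].
    """
    selected = []
    for comp in components:
        words = set(comp["text"].lower().split())
        makes_claim = bool(words & CLAIM_WORDS)
        # Strongest of the five emotions; missing emotions count as 0.
        strongest = max(comp["emotion"].get(name, 0.0) for name in EMOTIONS)
        if (makes_claim
                and strongest >= EMOTION_THRESHOLD
                and comp["relevance"] >= RELEVANCE_THRESHOLD):
            selected.append(comp)
    return selected
```

Only the components returned here would go on to the fake news check, which keeps the number of follow-up queries small.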

To check whether a claim is fake news, we send a query to IBM Watson Discovery News that determines the average sentiment of the claim across the top 100 most relevant results in its dataset of 17+ million documents. We compare this against the sentiment of the claim in the context of the original text to get the absolute sentiment differential. If the differential is greater than a certain threshold, the claim is likely fake news.
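The comparison itself is simple arithmetic. A minimal sketch, assuming sentiment scores lie between -1 and 1 and that the per-result sentiments have already been retrieved; the function names and the threshold value of 0.5 are hypothetical, and handling an empty result list by returning `False` is a choice made for this sketch:

```python
from statistics import mean

# Hypothetical threshold; sentiment scores are assumed to lie in [-1, 1].
DIFFERENTIAL_THRESHOLD = 0.5


def sentiment_differential(claim_sentiment, result_sentiments):
    """Absolute gap between the claim's sentiment and the results' average."""
    return abs(claim_sentiment - mean(result_sentiments))


def is_likely_fake(claim_sentiment, result_sentiments,
                   threshold=DIFFERENTIAL_THRESHOLD):
    """Flag the claim when the differential exceeds the threshold."""
    if not result_sentiments:
        # No results to compare against; this sketch does not flag the claim.
        return False
    return sentiment_differential(claim_sentiment, result_sentiments) > threshold
```

For example, a claim stated with a sentiment of 0.8 in the original text, checked against results that average -0.3, has a differential of 1.1 and would be flagged under this threshold.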

Challenges we ran into

We ran into trouble with front-end web development because our team had little exposure to modern design principles. We got past this by making a clear plan for improving our web development skills in a short time period. The biggest benefit came from spending time learning Bootstrap, as we were able to boot straight into developing a much stronger UI than we were initially capable of. Some members of our team had experience with front-end development and others had experience with back-end development, but no one had full-stack experience. Thus, we had to spend a significant amount of time learning how to connect our back end to our front end. After looking through various possible solutions, we decided to learn jQuery AJAX and use it with our Python Flask web application to connect the two.

On a different note, we also ran into trouble ensuring that our fact-checking was as unbiased as possible. We wanted to avoid the approach of current fact-checking websites (such as http://www.fakenewsai.com/) that base their models on the reputation of the website as a whole, since we found that many websites, even commonly trusted ones, had fake news in some of their articles. We spent a significant amount of time planning how to base our fact-checking on general opinion rather than reputation. After exploring several APIs, we found the IBM Watson Discovery News API, which allowed us to search through its database of articles. We decided to take the average sentiment score of the top one hundred articles for each relevant claim from a submitted link, as we thought this would be the best way to sample the general opinion surrounding that claim while keeping run time manageable.

Accomplishments that we're proud of

We're proud of developing our programming, problem-solving, and teamwork skills together. This was our first time working together, and although we all had different levels of experience, we were all able to contribute meaningfully to the project and learn from the others on our team. For three of our members, this was the first of many hackathons. We're all proud of what we've made working together, and we hope to keep improving as we move on from our freshman year and learn more throughout our education together at Columbia.

What we learned

The most significant new experience was combining the front-end and back-end components of our product. Prior to this hackathon, none of us had done full-stack development; each of us had experience in either front-end or back-end work only. We had to come together and build off of each other's strengths to figure out how to combine the two.

What's next for Truthify This

In the future, we want to support links from a larger number of social media networks; currently, we support text articles, Facebook posts, and YouTube videos. We also want to make a Chrome plug-in of Truthify This that will let you determine whether what you're reading, watching, or looking at contains fake news. In addition, we want to add a page to our website that dynamically tracks the level of fake news each news website publishes and keeps a leaderboard, so that anyone who visits our site can find out which news sources are the most trustworthy at that moment in time. Because our fake news detection is based on the content of a website rather than its reputation, we hope to reduce the level of bias as much as possible. We believe we can reduce it further by increasing the number of articles we use to determine the sentiment score of a claim, as this will decrease potential bias among the top 100 articles that we currently use.