A lot of research has gone into detecting fake news in the past few years, but determining whether an article is fake from its content alone is almost impossible.

What it does

  • Defake estimates how likely an article is to be fake, rather than making a binary classification, based on:
  1. Article Sentiment: Whether the article is highly emotional or uses angry, emotionally charged phrases.
  2. References Quality: The number of trusted websites the article references as opposed to untrusted ones.
  3. Website's User Feedback: Feedback collected by Defake on whether a website is a trustworthy news source.
  • Defake combines all three metrics in an attempt to find a correlation between them and the article being fake or biased. The plan is for Defake to then pass these data points to a deep learning model that estimates how likely the article is to be fake.
  • Defake also shows the user the sentences in the article that carry high sentiment.
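
As a rough illustration of how the three metrics could be packed into one data point, here is a minimal Python sketch. It is not Defake's actual code: the names `ArticleFeatures`, `build_features`, and `high_sentiment_sentences`, the 0.6 threshold, and the neutral 0.5 fallback for missing data are all assumptions made for the example. Sentence sentiment scores are assumed to lie in the range -1 to 1.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical input type: (sentence text, sentiment score in [-1, 1]).
ScoredSentence = Tuple[str, float]


@dataclass
class ArticleFeatures:
    sentiment_intensity: float  # mean absolute sentence sentiment, 0..1
    reference_quality: float    # share of references that are trusted, 0..1
    user_feedback: float        # share of users rating the site trustworthy, 0..1

    def as_vector(self) -> List[float]:
        # The order is an arbitrary choice; a model only needs it to be consistent.
        return [self.sentiment_intensity, self.reference_quality, self.user_feedback]


def high_sentiment_sentences(scored: List[ScoredSentence],
                             threshold: float = 0.6) -> List[str]:
    # Sentences whose sentiment is strong in either direction.
    return [text for text, score in scored if abs(score) >= threshold]


def build_features(scored: List[ScoredSentence],
                   trusted_refs: int, total_refs: int,
                   positive_votes: int, total_votes: int) -> ArticleFeatures:
    intensity = sum(abs(s) for _, s in scored) / len(scored) if scored else 0.0
    # Assumption: fall back to a neutral 0.5 when there is nothing to measure.
    ref_quality = trusted_refs / total_refs if total_refs else 0.5
    feedback = positive_votes / total_votes if total_votes else 0.5
    return ArticleFeatures(intensity, ref_quality, feedback)
```

The resulting vector is the kind of data point a regression model could later be trained on, and `high_sentiment_sentences` gives the list of sentences to show back to the user.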

How I built it

I used Python and Beautiful Soup to extract article content, and Django to build a web app hosting the service. I then used the Google Cloud Natural Language API to analyze an article's content. In the future I would use PyTorch to complete and train a regression model.
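
The extraction step can be sketched roughly as follows. To keep the example free of dependencies, it uses the standard library's `html.parser` in place of Beautiful Soup, so it shows the idea and not the project's actual code. The class `ArticleParser` and the function `parse_article` are hypothetical names. The sketch collects paragraph text (for sentiment analysis) and the hosts of absolute links (for the references metric).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ArticleParser(HTMLParser):
    """Collects paragraph text and the hosts of absolute links."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self.link_hosts = []
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            host = urlparse(href).netloc.lower()
            if host:  # relative links have no host and are skipped
                self.link_hosts.append(host)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            # Join the text pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def parse_article(html):
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.paragraphs, parser.link_hosts
```

Calling `parse_article(html)` on a downloaded page returns the two lists. Real news pages are messier than this sketch assumes, which is one reason a more forgiving parser such as Beautiful Soup is a sensible choice in practice.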

Challenges I ran into

I had trouble getting the Google Cloud Natural Language API to run in my local Linux environment, so after several hours of trying I switched to Windows. It also wasn't clear how to tell whether a referenced website is trusted, so I used Amazon Alexa's list of the world's top 50 websites in the News category. I'll probably use more accurate metrics to determine a website's quality in the future.
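
A trusted-list check of this kind might look like the sketch below. The function names are hypothetical, and the domain `trusted-news.example` is a placeholder, not an entry from the real top 50 list. A host counts as trusted if it equals a listed domain or is a subdomain of one.

```python
def is_trusted(host, trusted_domains):
    # Normalise case and drop a trailing dot, e.g. "Example.com." -> "example.com".
    host = host.lower().rstrip(".")
    # Match the domain itself or any subdomain, but not look-alike names.
    return any(host == d or host.endswith("." + d) for d in trusted_domains)


def reference_quality(hosts, trusted_domains):
    # Returns the share of referenced hosts that are trusted,
    # or None when the article references nothing.
    if not hosts:
        return None
    trusted = sum(1 for h in hosts if is_trusted(h, trusted_domains))
    return trusted / len(hosts)


# Placeholder list; a real one would be loaded from the chosen ranking.
TRUSTED = {"trusted-news.example"}
```

Matching on the dot-prefixed suffix is a deliberate choice here: `blog.trusted-news.example` counts as trusted, while `nottrusted-news.example` does not.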

Accomplishments that I'm proud of

Tackling a considerably ambiguous problem, and finishing a significant amount of the required work within the time given as a solo hacker.

What's next for Defake

Preparing a large dataset of real, fake, and biased news articles and training a deep learning model on it. Growing the user base to improve the accuracy of website ratings.
