IMPORTANT REMARK

  • Average cosine similarity between true and false news articles: 0.7975854665517808. I got this result right after my submission. It supports my theory, so I'll continue with the part I have in legend.ipynb on GitHub! Hopefully the results will be visible by the time the judges view this project.

Inspiration

True and false news have become increasingly similar in recent years. Can we find a way to distinguish the truth from the false by identifying features in pairs of similar news headlines (and without an ML model by our side)?

What it does

main.ipynb contains our study of feature extraction from the dataset. But I believe the key insight is in legend.ipynb, where we separate out news content that is similar and then train an ML model to learn the key features that distinguish the truth from the lie.
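To give a feel for the "separate out similar news content" step, here is a minimal sketch. It is not the code from legend.ipynb: it assumes a plain bag-of-words representation instead of the heavier similarity models I wanted to run, the function names and the 0.5 threshold are hypothetical, and the headlines are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens (assumed stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty headline is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similar_pairs(true_headlines, false_headlines, threshold=0.5):
    """Return the average similarity over all true/false pairs,
    plus the pairs whose similarity is at or above the (hypothetical) threshold."""
    true_vecs = [bag_of_words(t) for t in true_headlines]
    false_vecs = [bag_of_words(f) for f in false_headlines]
    scores = []
    pairs = []
    for i, tv in enumerate(true_vecs):
        for j, fv in enumerate(false_vecs):
            score = cosine_similarity(tv, fv)
            scores.append(score)
            if score >= threshold:
                pairs.append((true_headlines[i], false_headlines[j], score))
    average = sum(scores) / len(scores) if scores else 0.0
    return average, pairs


# Made-up headlines, only to show the call shape.
true_news = ["Senate passes budget bill"]
false_news = ["Senate rejects budget bill", "Aliens land in Ohio"]
average, pairs = similar_pairs(true_news, false_news)
for true_h, false_h, score in pairs:
    print(f"{score:.2f}  {true_h!r} vs {false_h!r}")
```

The pairs that come out of this are the ones a classifier would then have to tell apart, which is where the feature study matters. Comparing every true headline with every false one grows quadratically, which is part of why the full dataset was slow to process.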

How we built it

Built with <3 using Python, data viz, a lot of Python libraries, and some SQL.

Challenges we ran into

The lack of a GPU meant I couldn't run my cosine similarity models on the dataset in time. I tried running the same thing on Kaggle, but the resources there were limited.

Accomplishments that we're proud of

The average cosine similarity result supports my problem statement that news has become increasingly difficult to classify. I also looked at a possible solution and its implementation.

What we learned

  • The need to understand how to categorize information and bring out the truth from the lies.
  • How important a GPU is for ML models.

What's next for the project

  • Run the ML models for another week to hopefully generate better results, then look through them.
