IMPORTANT REMARK
- Average cosine similarity between true and false news articles: 0.7975854665517808. I got this result right after my submission. It reinforces my theory, so I'll continue with the part I have in legend.ipynb on GitHub! Hopefully the results will be visible by the time the judges view this project.
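The write-up does not say how the articles were vectorized. Below is a minimal sketch of one way such an average could be computed. It assumes plain bag-of-words term counts and averages over every (true, false) article pair. The helper names and sample sentences are hypothetical, and this is not the code from the project's notebooks.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase whitespace split; a real pipeline would also strip punctuation and stopwords.
    return text.lower().split()


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine of the angle between two term-count vectors; 0.0 if either vector is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_cross_similarity(true_docs, false_docs) -> float:
    # Hypothetical helper: mean cosine similarity over all true x false article pairs.
    true_vecs = [Counter(tokenize(d)) for d in true_docs]
    false_vecs = [Counter(tokenize(d)) for d in false_docs]
    total = sum(cosine_similarity(t, f) for t in true_vecs for f in false_vecs)
    return total / (len(true_vecs) * len(false_vecs))


true_news = ["the senate passed the budget bill on friday"]
false_news = ["the senate secretly passed a budget bill to ban cash"]
print(round(average_cross_similarity(true_news, false_news), 4))  # → 0.6
```

Comparing every true article with every false article grows quadratically with dataset size, which is one reason a full-dataset run can be slow without extra compute.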
Inspiration
True and false news have become increasingly similar in recent years. Can we distinguish truth from falsehood by identifying features in pairs of similar news headlines (and without an ML model by our side)?
What it does
main.ipynb contains our feature extraction study across the whole dataset. However, I believe the key insight is in legend.ipynb. There we group news content that is similar and then train an ML model to learn the key features that distinguish the truth from the lie.
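A minimal sketch of the pairing idea, not the code in legend.ipynb. It assumes bag-of-words cosine similarity and a hypothetical similarity threshold of 0.5. The helpers `pair_similar` and `distinguishing_terms` are hypothetical. In place of the trained ML model described above, it swaps in a simple word-count contrast across matched pairs. This surfaces terms that lean towards the false side of each pair.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase whitespace split; stopword and punctuation handling are omitted for brevity.
    return text.lower().split()


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine of the angle between two term-count vectors; 0.0 if either vector is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pair_similar(true_docs, false_docs, threshold=0.5):
    # Hypothetical helper: match each true article with its most similar false article,
    # keeping only pairs whose similarity reaches the (assumed) threshold.
    if not false_docs:
        return []
    false_vecs = [Counter(tokenize(d)) for d in false_docs]
    pairs = []
    for t_doc in true_docs:
        t_vec = Counter(tokenize(t_doc))
        scores = [cosine_similarity(t_vec, f_vec) for f_vec in false_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((t_doc, false_docs[best], scores[best]))
    return pairs


def distinguishing_terms(pairs, top_k=3):
    # Hypothetical stand-in for a trained model: net count of each term on the
    # false side minus the true side, summed over all matched pairs.
    diff = Counter()
    for t_doc, f_doc, _ in pairs:
        diff.update(tokenize(f_doc))
        diff.subtract(tokenize(t_doc))
    # Keep only terms that lean towards the false side, strongest first.
    return [term for term, count in diff.most_common() if count > 0][:top_k]


true_news = [
    "the senate passed the budget bill on friday",
    "local team wins regional football final",
]
false_news = [
    "the senate secretly passed a budget bill to ban cash",
    "celebrity spotted buying groceries downtown",
]
pairs = pair_similar(true_news, false_news, threshold=0.5)
print(pairs)
print(distinguishing_terms(pairs, top_k=3))
```

Only the first true article finds a close enough false match here. With stopwords left in, words like "a" and "to" can rank alongside more telling terms such as "secretly". A trained classifier on the matched pairs, as the project describes, would replace the count contrast.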
How we built it
Built with <3 using Python, data visualization, a lot of Python libraries, and some SQL.
Challenges we ran into
Without a GPU, I couldn't run my cosine similarity models on the dataset in time. I tried running the same code on Kaggle, but the resources there were limited.
Accomplishments that we're proud of
This supported my problem statement that news has become increasingly difficult to classify. I also explored a possible solution and its implementation.
What we learned
- The importance of understanding how to categorize information and separate the truth from the lies.
- How important a GPU is for ML models.
What's next for the project
- Run the ML models for another week to hopefully generate better results, then analyze them.
