Inspiration
We were inspired to build this project by the exponential increase in misleading news articles over the past few years. Especially toward the beginning of the pandemic, there was widespread confusion about which news sources to believe. Our goal is to change that by using AI to estimate the credibility of any given article.
What it does
Our project takes a link to an article and decides whether it is fake news or not. The user visits our website and pastes the link of the article they wish to verify. We have not yet set up a working neural net to do the classification, but that is a goal for the future. Once it is working, it will return a score indicating how reliable the source is.
How we built it
We built the site using a combination of Flask, HTML, and CSS. The site retrieves the article link from the user's input and, once the neural net is working, will return an estimate of how reliable the article is. To scrape the news articles for their headings and body text, we used a Python library called Beautiful Soup 4. To gather and preprocess data, we made a Python notebook, using a Kaggle dataset with around 20k labeled samples of real and fake news (https://www.kaggle.com/c/fake-news/data?select=test.csv). We also used a pre-trained word2vec model (https://code.google.com/archive/p/word2vec/) to retrieve word embeddings for the articles.
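The Flask side of the pipeline can be sketched roughly as below. This is a minimal stand-in, not our actual site: the route name, the inline template, and the placeholder response are assumptions for illustration (the real site uses separate HTML/CSS files, and the scoring step is the not-yet-built neural net).

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

# Hypothetical inline template standing in for the real HTML/CSS pages.
FORM = """
<form method="post">
  <input name="url" placeholder="Paste an article link">
  <input type="submit" value="Check">
</form>
{% if url %}<p>Received: {{ url }}</p>{% endif %}
"""

@app.route("/", methods=["GET", "POST"])
def index():
    # On POST, pull the submitted article link out of the form data.
    url = request.form.get("url") if request.method == "POST" else None
    # Once the neural net is ready, this is where the article would be
    # scraped and scored; for now the sketch just echoes the link back.
    return render_template_string(FORM, url=url)
```

Flask's test client makes it easy to exercise this route without running a server, which is handy during a hackathon.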
Challenges we ran into
When scraping an article, it was difficult to retrieve only the body of the page rather than all of the text, since every news website formats its HTML differently. Another challenge was the neural net. We thought we would have time to flesh it out fully, but as the competition drew to a close, the team member who would have led that push ended up having other responsibilities to attend to. While our other team members have some experience building models, it was too late to learn all of the required skills to make a decent one.
Accomplishments that we're proud of
We are proud of learning new skills, such as web scraping, using CSS, manipulating data in Pandas, and tackling an open-ended question like what makes news "real" or "fake". We were also proud that we were able to get word vectors for each sentence in the dataset. We are all beginners and felt like we gave this project our best effort, and we were satisfied that we produced a result that needs only a little more time to complete.
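Turning each sentence into a vector can be sketched as averaging the word2vec embeddings of its words. The toy 3-dimensional dictionary below is a stand-in assumption: the real pipeline loads the 300-dimensional pretrained GoogleNews word2vec model instead.

```python
import numpy as np

# Toy embeddings standing in for the pretrained word2vec model
# (assumption: real code loads the 300-d GoogleNews vectors).
embeddings = {
    "fake":   np.array([1.0, 0.0, 0.0]),
    "news":   np.array([0.0, 1.0, 0.0]),
    "report": np.array([0.0, 0.0, 1.0]),
}

def sentence_vector(sentence, emb, dim=3):
    """Average the embeddings of known words; zero vector if none match."""
    words = [w for w in sentence.lower().split() if w in emb]
    if not words:
        return np.zeros(dim)
    return np.mean([emb[w] for w in words], axis=0)
```

Averaging is the simplest way to get a fixed-size feature per sentence; weighting schemes like TF-IDF are a common refinement.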
What we learned
We learned to solve the web-scraping problem by having the program only look at text whose class names contain keywords such as "body". This approach worked, but still picked up some unwanted text from advertisements and the like. We also learned how to center text using CSS (hopefully). While we already had experience with Pandas, we expanded that knowledge greatly by practicing our data-preprocessing skills. Overall, each of us pushed ourselves when working on our respective parts of the project, and we all came out more experienced because of it.
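The class-name heuristic described above can be sketched with Beautiful Soup like this. The sample HTML and the function name are made up for illustration; only the idea of filtering on class names containing "body" comes from our actual approach.

```python
from bs4 import BeautifulSoup

# Made-up page fragment: two "body"-classed divs plus an ad.
html = """
<div class="article-body"><p>Main story text.</p></div>
<div class="ad-banner"><p>Buy now!</p></div>
<div class="body-content"><p>More of the story.</p></div>
"""

def extract_body_text(page_html, keyword="body"):
    """Collect text from tags whose class attribute contains `keyword`."""
    soup = BeautifulSoup(page_html, "html.parser")
    chunks = []
    for tag in soup.find_all(class_=True):  # only tags that have a class
        if any(keyword in c.lower() for c in tag.get("class", [])):
            chunks.append(tag.get_text(" ", strip=True))
    return " ".join(chunks)
```

As noted above, this still lets some ads through when an ad's class name happens to contain the keyword, which is why it is only a heuristic.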
What's next for Fake News Detector
We plan on vastly improving our project in the near future. Our original goal was to build a Chrome extension that automatically assesses a site's credibility, but building a website was more feasible in the time we were given. Beyond that, we are going to build a model that takes in the word vectors we have gathered and uses them, along with other features, to classify articles as real or fake news.
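One plausible shape for that planned model is a simple binary classifier over the sentence vectors. The sketch below trains a minimal logistic regression with gradient descent on synthetic data; the data, dimensions, and hyperparameters are all assumptions standing in for the real word2vec features and whatever architecture we end up choosing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in features: 100 "real" and 100 "fake" sentence vectors,
# separated along every axis (assumption: real inputs would be word2vec
# sentence averages with other features appended).
real = rng.normal(loc=+1.0, size=(100, 3))
fake = rng.normal(loc=-1.0, size=(100, 3))
X = np.vstack([real, fake])
y = np.array([1] * 100 + [0] * 100)

# Minimal logistic regression trained by batch gradient descent.
w, b = np.zeros(3), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
    grad = p - y                            # gradient of log loss
    w -= 0.1 * (X.T @ grad) / len(y)
    b -= 0.1 * grad.mean()

pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
accuracy = (pred == y).mean()
```

Even a linear model like this gives a reliability-style score (the sigmoid probability), which matches the score-based output the site is meant to return.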