Inspiration

With the many events that have made 2020 so different from what we imagined, and the mix of fake and real news flooding our headlines, we decided to build a fake-vs.-real news classifier with a visual interface. This let us learn how to build a full backend-to-frontend product while doing something relevant to 2020.

What it does

The application wraps a machine learning model in a Flask (Python) backend with a React frontend, making it a web application with an accessible UI. The user types in a headline and, with the push of a button, checks whether it is fake or real news based on the model's prediction. The site also describes how we built the project and links to its GitHub repo.

How we built it

We built it using a Multinomial Naive Bayes model, after deciding against other models, and turned it into a REST API with Flask and Python. We then built a React frontend so the model could be accessed through a UI.
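A minimal sketch of what the Flask layer might look like, assuming scikit-learn for the vectorizer and classifier. The `/predict` route name and the toy training data below are illustrative assumptions, not the project's actual code, which trains on a real news dataset:

```python
from flask import Flask, request, jsonify
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in training data; the real project trains on a labeled news dataset.
headlines = ["aliens endorse candidate", "senate passes budget bill",
             "miracle cure hidden by doctors", "court rules on appeal"]
labels = ["FAKE", "REAL", "FAKE", "REAL"]

# Fit the bag-of-words vectorizer and the Naive Bayes classifier once at startup.
vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(headlines), labels)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # The frontend POSTs a JSON body like {"headline": "..."}.
    headline = request.get_json().get("headline", "")
    prediction = model.predict(vectorizer.transform([headline]))[0]
    return jsonify({"headline": headline, "prediction": prediction})
```

The React frontend would then POST the user's headline to this endpoint and display the returned prediction.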

Challenges we ran into

One challenge we first ran into was deciding which model trained and tested best. We tested Logistic Regression, Multinomial Naive Bayes, and Random Forest Classification, as seen in our Jupyter Notebook file. Multinomial Naive Bayes performed best: we had split our data into training and testing sets, and it achieved the highest accuracy on the held-out test set.

While this worked well for the dataset we had, once we migrated the model to Flask and tested it with our own inputs, we realized the dataset was heavily biased toward the 2016 election, so our model was not robust enough to handle much of 2020. In particular, Multinomial Naive Bayes classifies based on the probability of certain words appearing given a label in the dataset; it uses trends found in the training data to predict labels for new inputs. Had the world stayed somewhat similar to 2016-2018 and the events of 2020 not unfolded the way they did, this model might have been more successful. But terms such as COVID-19 never appeared in the training data, so the model had no word probabilities for them, and we saw it fail there. We recognized that this plethora of unexpected events affected the model's accuracy on current articles, and that we likely could have picked a model better suited to how much the news has changed over the last four years. That said, this was a valuable learning experience, and we noted it would be very interesting to revisit this project in a 'post'-COVID era.
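The model-comparison step described above can be sketched roughly as follows. The tiny dataset here is an illustrative stand-in; the real comparison, on an actual labeled news dataset, lives in the project's Jupyter Notebook:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Illustrative stand-in data; the actual dataset was news text labeled FAKE or REAL.
texts = ["aliens endorse candidate", "senate passes budget bill",
         "miracle cure hidden by doctors", "court rules on appeal",
         "celebrity secretly a robot", "city council approves new park"] * 10
labels = ["FAKE", "REAL", "FAKE", "REAL", "FAKE", "REAL"] * 10

# Turn the text into bag-of-words counts, then hold out a test set.
X = CountVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=42)

# The three candidate models we compared.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Multinomial Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Fit each candidate on the training split and score it on the test split.
scores = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

best = max(scores, key=scores.get)
```

On our real dataset, this kind of comparison is what led us to pick Multinomial Naive Bayes.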

Accomplishments that we're proud of

We are proud of creating three models and choosing the best one based on solid accuracy statistics to back it up. Additionally, we are proud of turning that model into a REST API and building a UI to visualize the model and its results. We are thus proud of completing this project end-to-end in the 36-hour timeframe we had.

What we learned

Each team member brought something to the table the others didn't have. For instance, two of us were well versed in machine learning and could explain to the other two what was going on, why we considered each model, and the reasoning behind the one we ultimately chose. The other two were well versed in frontend work with React and could explain how to turn the model into something we could visualize. Because of our team's collaborative nature, each of us both contributed something and took something else away.

What's next for FactCheck

We hope to find a new training dataset that better represents all news categories, branching out from political news. We also hope to use a new model that is better able to analyze a variety of news topics and handle predictions. Looking into models such as neural networks or other NLP frameworks/libraries would be useful for us.
