VerifAI: Fake News Detector using BERT

Inspiration

In light of recent events, news media and social media are being used heavily to spread information to people across the globe, including those directly affected by ongoing conflicts. Making sure people have the right information at the right time, such as where a hospital is located or which road is broken so they can be redirected, is critical. With new technology making it easier to promote false information, we all felt that building a tool to aid people in distress was something we could truly dedicate ourselves and this hackathon to.

Purpose:

We live in a world where the truth is constantly obscured by a veil of falsehoods, where every news article you read could be a fabrication designed to mislead and manipulate. Fake news isn't just misleading; it's a weapon used to sway public opinion, disrupt democratic processes, and deepen societal divisions. In response to this growing crisis, we created VerifAI, a powerful tool against the tide of misinformation.

What does it do?

Our hack, VerifAI, is a fake news detection model that uses the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model and transfer learning. VerifAI uses Flask, a Python web framework, to deploy a web app that lets users check how likely each line of a news article is to be fake. The app takes a news article or headline from the user and breaks the text down into individual lines or sentences. Each line is then fed into the BERT model. BERT's natural language processing capabilities allow it to capture the context and semantics of each line, which is crucial for judging the veracity of the information. Because BERT is pre-trained on a vast corpus of text, it already has a significant understanding of language. Through transfer learning, the model has been further trained on datasets related to news, misinformation, and fact-checking, which improves its ability to detect fake news. For each line, the model reports the likelihood of it being fake as a percentage, with 0 meaning very unlikely to be fake and 100 meaning very likely to be fake. The app then displays the results, showing which parts of the article might be misleading or false. This line-by-line breakdown helps users pinpoint specific areas of concern within the article.
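The line-by-line scoring described above could look roughly like the sketch below. It is not our exact code: the splitting rule, the helper names, and the choice of index 1 as the "fake" class are assumptions made for illustration, and `dummy_scorer` is a hypothetical placeholder for the trained model so that the example runs with the standard library alone.

```python
import math
import re
from typing import Callable, List, Tuple

# A scorer maps one line of text to (log P(real), log P(fake)),
# i.e. the kind of values a log-softmax output layer produces.
Scorer = Callable[[str], Tuple[float, float]]


def split_into_lines(article: str) -> List[str]:
    """Split on newlines and after sentence-ending punctuation (assumed rule)."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", article)
    return [part.strip() for part in parts if part.strip()]


def fake_percentage(log_probs: Tuple[float, float]) -> float:
    """Turn log-probabilities into a 0-100 score for the 'fake' class.

    Assumes index 1 is the 'fake' label; the real label order depends
    on how the training data was encoded.
    """
    return round(100.0 * math.exp(log_probs[1]), 1)


def score_article(article: str, scorer: Scorer) -> List[Tuple[str, float]]:
    """Score every line of the article separately."""
    return [(line, fake_percentage(scorer(line)))
            for line in split_into_lines(article)]


def dummy_scorer(line: str) -> Tuple[float, float]:
    """Hypothetical stand-in for the model: treats '!' as suspicious."""
    p_fake = 0.9 if "!" in line else 0.2
    return (math.log(1.0 - p_fake), math.log(p_fake))


if __name__ == "__main__":
    text = "The council met on Monday. Miracle cure found!\nRoads reopen soon"
    for line, pct in score_article(text, dummy_scorer):
        print(f"{pct:5.1f}% likely fake: {line}")
```

In the web app, a Flask view would call something like `score_article` with the real model in place of `dummy_scorer` and pass the resulting list to the HTML template.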

How we built it

We used several Python libraries, including NumPy, Pandas, PyCaret, Transformers (from Hugging Face), Matplotlib, and several modules from Scikit-Learn, for data manipulation, model training, and evaluation. We first pre-processed the data and split the dataset into training, validation, and test sets using train_test_split. We then loaded a pre-trained BERT model ('bert-base-uncased') and its tokenizer using Hugging Face Transformers, and processed the data in batches to keep training efficient. Next, we froze the layers of the BERT model so that they are not updated during training. Freezing ensures that the pre-trained knowledge is not lost, which is crucial for transfer learning: only the classification layers on top of the BERT model are trained. We then defined a custom neural network called 'BERT_Arch' that takes the BERT model and adds fully connected layers, activation functions, and dropout layers to prevent overfitting. It is designed for binary classification and uses a log-softmax activation in the output layer for probability-based classification. We trained the model, evaluated it with classification metrics (such as precision, recall, and F1-score), and finally ran fake news predictions. Last, we set up a Flask web application with an HTML front end to present the project.
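To make the freezing step and the 'BERT_Arch' head more concrete, here is a minimal PyTorch sketch. The layer sizes (768 to 512 to 2), the dropout rate, and the use of the encoder's `pooler_output` are assumptions rather than a record of our exact configuration. `TinyEncoder` is a hypothetical stand-in for BERT so the example runs without downloading weights; with Hugging Face Transformers you would pass `AutoModel.from_pretrained("bert-base-uncased")` instead.

```python
from types import SimpleNamespace

import torch
import torch.nn as nn


def freeze(encoder: nn.Module) -> None:
    """Stop gradient updates for every parameter of the encoder."""
    for param in encoder.parameters():
        param.requires_grad = False


class BERT_Arch(nn.Module):
    """Binary classification head on top of a (frozen) encoder."""

    def __init__(self, bert: nn.Module, hidden_size: int = 768,
                 fc_size: int = 512, dropout: float = 0.1):
        super().__init__()
        self.bert = bert
        self.fc1 = nn.Linear(hidden_size, fc_size)   # assumed width
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(dropout)           # assumed rate
        self.fc2 = nn.Linear(fc_size, 2)             # two classes
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, input_ids, attention_mask):
        # Pooled sentence representation from the encoder.
        pooled = self.bert(input_ids, attention_mask=attention_mask).pooler_output
        x = self.dropout(self.relu(self.fc1(pooled)))
        return self.log_softmax(self.fc2(x))         # log-probabilities


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for BERT: masked mean of token embeddings."""

    def __init__(self, vocab_size: int = 100, hidden_size: int = 768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids, attention_mask=None):
        if attention_mask is None:
            attention_mask = torch.ones_like(input_ids)
        hidden = self.embed(input_ids)                        # (batch, seq, hidden)
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # (batch, seq, 1)
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        return SimpleNamespace(pooler_output=pooled)


encoder = TinyEncoder()
freeze(encoder)
model = BERT_Arch(encoder)

# Only the head's parameters are handed to the optimizer.
head_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(head_params, lr=1e-3)
```

Because the output layer is a log-softmax, the matching training loss is `nn.NLLLoss`, and applying `exp` to the output recovers class probabilities.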

Challenges we ran into

While working on this Flask application with a pre-trained BERT model, we came across some pretty interesting hurdles. Dataset quality was a big deal: we had to make sure the data wasn't biased or noisy, since that could totally mess up the model's predictions. Dealing with the massive size of BERT models was a headache; they need tons of memory and GPU power to run smoothly. Oh, and let's not forget about model latency – it could really slow things down, especially when a bunch of us were using the app at the same time. We also had a really hard time importing and setting up the project, specifically with PyCaret, because of a version issue that took up way too much time. Then there was the data preprocessing part. Real-world data is messy, and we had to whip it into shape so the model could understand it. Keeping the model up to date was a challenge, as was making sure our app was secure and user-friendly. We had to think about scalability, because what if this thing became super popular, right? Plus, explaining why the model made certain predictions was tricky, since BERT is basically a black box. And, of course, we had to handle the news content legally and ethically. There were also a bunch of errors to catch and handle, and privacy concerns to think about. All in all, we learned a lot while tackling these challenges – it's not just about coding; it's about understanding the context and the real-world implications of what we're building.

Accomplishments that we're proud of

Successfully building and deploying a Flask application integrated with a pre-trained BERT model is a significant accomplishment. It demonstrates our ability to apply machine learning in a practical context and create a useful tool.

What we learned

We learned advanced ML and NLP techniques, in particular how to fine-tune a complex, pre-trained model on a specific task. We also learned how to deploy a web application using Flask and how to design a user-friendly interface.

What's next for VerifAI?

In the future, we really want to fine-tune the BERT model further and work with a larger dataset so it can handle more kinds of news articles. We are also thinking of expanding the features beyond text to include inputs such as images, and possibly incorporating user feedback to help refine the model, correct biases, and improve accuracy. We'd also love to create a mobile application to make it easier for people to check for and gauge fake news. We're determined to take our project to the next level and make a positive impact on the fight against fake news and misinformation. We're excited to see where this journey takes us and the opportunities it presents for learning and growth.