IDIR Lab Machine Learning Challenge

Inspiration

We were motivated by the need to tackle misinformation by identifying statements that should be checked for accuracy. Misinformation can distort public discussion, so it is important to flag statements that might be misleading or that are significant enough to warrant verification.

What it does

Our project is a machine learning model that reads a sentence and decides whether it needs to be fact-checked. This helps focus fact-checking effort on statements that could influence public opinion or decisions, which in turn helps limit the spread of misinformation.

How we built it

We started by cleaning up a set of sentences that were labeled as either needing a fact-check or not. We then used RoBERTa, a pretrained transformer language model available through the Hugging Face Transformers library. We fine-tuned it to recognize which sentences should be fact-checked by training it on the labeled examples and making adjustments to improve its accuracy.
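The cleaning step is not described in detail, so the following is only a minimal sketch of what such a step might look like. It assumes the data is a list of (sentence, label) pairs where 1 means "needs a fact-check" and 0 means "does not". The function name clean_dataset and the specific rules (collapse whitespace, drop empty sentences, drop case-insensitive duplicates) are illustrative assumptions, not the project's actual pipeline.

```python
import re


def clean_dataset(rows):
    """Return cleaned (sentence, label) pairs.

    Hypothetical cleaning rules, assumed for illustration:
    - collapse runs of whitespace and trim the ends
    - drop sentences that are empty after trimming
    - drop case-insensitive duplicates, keeping the first occurrence
    """
    seen = set()
    cleaned = []
    for sentence, label in rows:
        text = re.sub(r"\s+", " ", sentence).strip()
        if not text:
            continue  # nothing left to classify
        key = text.lower()
        if key in seen:
            continue  # duplicate of an earlier sentence
        seen.add(key)
        cleaned.append((text, label))
    return cleaned


if __name__ == "__main__":
    sample = [
        ("  The  tax rate rose 5% last year. ", 1),
        ("the tax rate rose 5% last year.", 1),
        ("", 0),
        ("Thank you all for coming.", 0),
    ]
    for text, label in clean_dataset(sample):
        print(label, text)
```

Removing duplicates before splitting the data also helps keep the same sentence from appearing in both the training and validation sets.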

Challenges we ran into

Limited Computing Power: We did not have enough computational power to run our model smoothly, especially on free platforms like Kaggle.

Model Overfitting: We had to adjust our model carefully to make sure it didn’t just memorize our training data but could actually apply what it learned to new, unseen data.
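One common way to guard against memorizing the training data is early stopping: training halts once the loss on held-out validation data stops improving. The writeup does not say which adjustments we made, so the sketch below is only an illustration of the idea; the class name EarlyStopping and the parameters patience and min_delta are assumptions chosen for this example.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience    # consecutive non-improving epochs tolerated
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            # Validation loss improved: remember it and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    # Made-up validation losses, one per epoch, purely for demonstration.
    stopper = EarlyStopping(patience=2)
    for epoch, loss in enumerate([0.70, 0.55, 0.50, 0.52, 0.53, 0.60]):
        if stopper.should_stop(loss):
            print(f"stopping after epoch index {epoch}")
            break
```

In a real training loop, should_stop would be called once per epoch with the measured validation loss, and the model weights from the best epoch would be kept.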

Unbalanced Data: Our data had more examples of one class than the other, which made our model biased toward the more common class at first.
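A standard remedy for unbalanced labels is to weight each class inversely to its frequency, so that mistakes on the rarer class cost more during training. The writeup does not state how we corrected the bias, so this is a sketch of one possible approach rather than a record of what was done. The function name inverse_frequency_weights is hypothetical, and the formula total / (n_classes * count) is a commonly used "balanced" weighting heuristic.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to total / (n_classes * class_count).

    Rare classes get weights above 1 and common classes get weights below 1;
    perfectly balanced data gives every class a weight of exactly 1.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * count) for cls, count in counts.items()}


if __name__ == "__main__":
    # Illustrative labels: 8 "no fact-check needed" (0), 2 "needs fact-check" (1).
    print(inverse_frequency_weights([0] * 8 + [1] * 2))
```

Weights like these can typically be passed to a weighted loss function so that the minority class contributes more to each update.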

Accomplishments that we're proud of

Successfully Training the Model: We managed to train our model to predict with good accuracy, despite the limitations.

Improving Data Processing: We developed a strong process for preparing our data, which helped improve our model’s predictions.

Solving Technical Problems: We overcame several big challenges related to running advanced models with limited resources.

What we learned

This project taught us a lot about how to use advanced machine learning techniques for analyzing text. We learned how to manage and use computer resources more efficiently, how important it is to prepare our data well, and the ins and outs of training a complex model.