Inspiration
Twitter (or X) is a mess of information, which makes it hard for the average user to tell whether a tweet is being dramatic or describing an actual disaster. The goal of this project was to bring clarity to users by letting them input a tweet into the prototype and having a machine learning model tell them whether it refers to a disaster.
What it does
The project starts from a Kaggle dataset of tweets, each labeled by whether it refers to a disaster. It performs text preprocessing, then passes the data to three different machine learning classifiers. The best model is selected, and the user can then input a tweet to have the model determine whether it refers to a disaster, with 80% accuracy.
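The final step, classifying a single tweet typed in by the user, can be sketched roughly as below. This is a minimal illustration and not the project's actual code: the training tweets are made up, TF-IDF features are an assumption, naive Bayes stands in for whichever model is selected, and `classify_tweet` is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up stand-in for the Kaggle data; 1 = disaster, 0 = not a disaster.
train_texts = [
    "earthquake destroys homes in the city",
    "wildfire forces thousands to evacuate",
    "flood waters sweep away cars",
    "tornado damages school building",
    "this new album is fire",
    "my exam was a total disaster lol",
    "traffic is killing me today",
    "that movie blew my mind",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorizer and classifier chained so raw text can be passed straight in.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)


def classify_tweet(text: str) -> str:
    """Return a human-readable label for one tweet (hypothetical helper)."""
    label = model.predict([text])[0]
    return "disaster" if label == 1 else "not a disaster"


print(classify_tweet("Earthquake destroys homes downtown"))
```

Wrapping `classify_tweet` in a loop over `input()` would give a simple text-based interface.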
How we built it
We built this model in Google Colab. We first found a dataset that matched the problem we wanted to work on. We then performed an exploratory data analysis to learn more about our data. From this, we used text preprocessing to clean up the dataset. Three models were trained: a ridge classifier, a naive Bayes classifier, and an SVM classifier. The naive Bayes classifier performed best and was selected as the backbone of the prototype. A text-based prototype was created where the user inputs a tweet and the model classifies it.
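A comparison of the three model types could look something like the sketch below. It assumes scikit-learn, TF-IDF features, a linear SVM, and a simple held-out accuracy score, none of which are confirmed by the write-up. The tweets are made up for illustration, so the scores say nothing about the real dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up stand-in data; 1 = disaster, 0 = not a disaster.
texts = [
    "earthquake destroys homes in the city",
    "wildfire forces thousands to evacuate",
    "flood waters sweep away cars",
    "tornado damages school building",
    "hurricane leaves town without power",
    "landslide blocks the main highway",
    "explosion reported at chemical plant",
    "heavy storm causes power outages",
    "this new album is fire",
    "my exam was a total disaster lol",
    "traffic is killing me today",
    "that movie blew my mind",
    "i am drowning in homework",
    "the party last night was a riot",
    "my phone battery died again",
    "crushed my workout this morning",
]
labels = [1] * 8 + [0] * 8

# Hold out a quarter of the data, keeping both classes balanced.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

candidates = {
    "ridge": RidgeClassifier(),
    "naive_bayes": MultinomialNB(),
    "svm": LinearSVC(),
}

# Train each candidate on the same features and record held-out accuracy.
scores = {}
for name, clf in candidates.items():
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
print(scores, "best:", best_name)
```

On a dataset this small the ranking is essentially noise; with the full labeled dataset, cross-validation would give a steadier comparison than a single split.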
Challenges we ran into
Time. Time was the biggest challenge, as it stopped us from implementing more complicated ideas. We had initially envisioned the user passing a Twitter (or X) URL, which the prototype would scrape for the relevant information and then use to make a prediction. Also, this was coded during a horrendous storm, and we had to deal with multiple power outages.
Accomplishments that we're proud of
We're proud that our prototype reaches 80% accuracy after less than 24 hours of work.
What we learned
In past machine learning projects, the team has primarily worked with numerical data. Since this dataset was entirely text-based, it was interesting to learn the techniques needed to handle text data. Primarily, we learnt about stop-word removal and pattern removal.
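As a rough illustration of those two techniques, the sketch below strips URLs, @mentions, and non-letter characters with regular expressions, then drops stop words. The stop-word list is a tiny hand-picked set and the patterns are assumptions; the project's actual list and patterns are not given here, and `clean_tweet` is a hypothetical name.

```python
import re

# Tiny illustrative stop-word list (an assumption; a fuller list, such as
# the one shipped with NLTK, would be more typical in practice).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "in", "on", "at", "of",
              "to", "and", "or", "for", "this", "that", "it", "my", "so", "i"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text: str) -> str:
    """Lowercase, remove URL/mention/non-letter patterns, drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # pattern removal: links
    text = MENTION_RE.sub(" ", text)     # pattern removal: @handles
    text = NON_LETTER_RE.sub(" ", text)  # digits, punctuation, '#', emoji
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(clean_tweet("Huge fire in the city center!! http://t.co/abc @news #fire2023"))
# -> huge fire city center fire
```

Removing the URL and mention before the non-letter pass matters: otherwise fragments such as "http" or "news" would survive as ordinary tokens.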
What's next for Tweet-Based Natural Disaster Prediction Tool
Improving the model's accuracy, and adding a GUI to make the interface more user-friendly.