Governments and agencies are struggling to organise efficient relief efforts as natural catastrophes, worsened by climate change, become more common. Natural language processing (NLP), machine learning (ML), and artificial intelligence (AI) can all help. In times of crisis, Twitter has become a vital communication tool. Because smartphones are so common, people can use them to report an emergency in real time. Twitter is also a platform where people express their thoughts about a crisis, which can help governing authorities understand public sentiment and take the necessary action. During crisis response periods, large numbers of users have posted information such as damage reports and disaster readiness conditions, making Twitter a crucial social media platform for sharing and accessing data. Mining this emotive data effectively can help us understand disaster response quickly and simply.
What it does
We built a program that performs sentiment analysis on a labelled dataset of tweets about natural disasters. Tweets labelled '1' are treated as negative, i.e., tweets that describe the harm a natural disaster has caused, and tweets labelled '0' are treated as positive, i.e., tweets about something positive that happened even in the midst of a natural disaster. The code is built on NLP techniques and supervised learning.
How we built it
We used NLP techniques such as tokenization, stemming and vectorization to find the important keywords that we could use to classify the positive and negative tweets. We also extracted the hashtags from each tweet to see the top 10 trending hashtags in the positive and in the negative tweets. We then performed binary classification with 4 different classifiers, compared their performance using the F1 score, and created a chart showing which classifier gave us the best F1 score. We found that the Random Forest classifier gave us an F1 score of 91%, so we went ahead and used this classifier to run predictions on the test dataset (Glaucoma.csv).
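As a rough illustration of two of the steps above, here is a minimal sketch of ranking hashtags per label and comparing classifiers by F1 score. It uses only the Python standard library and is not our project code: the function names (`extract_hashtags`, `top_hashtags`, `f1_score`, `best_by_f1`), the decision to treat label 1 as the class being scored, and the toy tweets and predictions in the usage note are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical pattern: a '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(tweet):
    """Return the hashtags in a tweet, lowercased and without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet)]


def top_hashtags(tweets, labels, label, n=10):
    """Return the n most common hashtags among tweets with the given label."""
    counts = Counter()
    for tweet, tweet_label in zip(tweets, labels):
        if tweet_label == label:
            counts.update(extract_hashtags(tweet))
    return counts.most_common(n)


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_by_f1(y_true, predictions_by_model):
    """Return (model_name, f1) for the model with the highest F1 score."""
    scores = {
        name: f1_score(y_true, y_pred)
        for name, y_pred in predictions_by_model.items()
    }
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]
```

A usage example with made-up data (not taken from our dataset or results):

```python
tweets = [
    "#flood wrecked homes",
    "Volunteers helping #flood #hope",
    "#flood #earthquake damage",
]
labels = [1, 0, 1]
print(top_hashtags(tweets, labels, label=1))

y_true = [1, 1, 1, 0, 0, 0]
predictions = {
    "model_a": [1, 1, 0, 0, 0, 1],
    "model_b": [1, 1, 1, 0, 0, 1],
}
print(best_by_f1(y_true, predictions))
```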
Challenges we ran into
Time was a huge challenge, since we had just two days to complete this project. Nevertheless, we overcame it and submitted our file successfully! Another challenge came during the prediction step: it was quite difficult to apply our trained model to the Glaucoma dataset to generate predictions. That part of the code is still a work in progress.
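One common stumbling block when applying a trained text classifier to a new file, which may or may not be the one we hit, is that the new tweets have to be turned into features with the vocabulary learned from the training tweets, not a freshly fitted one. Otherwise the feature columns no longer line up with what the model was trained on. The sketch below shows the idea with a tiny hand-written count vectorizer. The class `BagOfWordsVectorizer` and the example sentences are hypothetical and use only the standard library; this is a stand-in for illustration, not the vectorizer from our code.

```python
import re


class BagOfWordsVectorizer:
    """Tiny count vectorizer: learns a vocabulary once, then reuses it."""

    def fit(self, texts):
        # Learn one column index per distinct word seen in the training texts.
        words = sorted({word for text in texts for word in self._tokens(text)})
        self.vocabulary_ = {word: index for index, word in enumerate(words)}
        return self

    def transform(self, texts):
        # Count words using the learned vocabulary only; words that were
        # not seen during fit() are ignored so the column layout is stable.
        rows = []
        for text in texts:
            row = [0] * len(self.vocabulary_)
            for word in self._tokens(text):
                index = self.vocabulary_.get(word)
                if index is not None:
                    row[index] += 1
            rows.append(row)
        return rows

    @staticmethod
    def _tokens(text):
        # Simplified tokenizer: lowercase alphabetic runs only.
        return re.findall(r"[a-z]+", text.lower())


# Fit on (made-up) training tweets, then reuse the same object on new tweets.
train_tweets = ["flood destroyed the bridge", "volunteers rebuilt the bridge"]
new_tweets = ["Flood warning near the bridge"]

vectorizer = BagOfWordsVectorizer().fit(train_tweets)
train_features = vectorizer.transform(train_tweets)
new_features = vectorizer.transform(new_tweets)

# Both feature matrices have the same number of columns.
print(len(train_features[0]), len(new_features[0]))
```

The design point is that `fit` is called once on the training data and only `transform` is called on the new file, so a model trained on `train_features` sees `new_features` in the same column order.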