Inspiration: As native Houstonians, part of our team witnessed and experienced the damage wreaked by Hurricane Harvey firsthand. Throughout the fear, uncertainty, and chaos associated with that month of our lives, we developed a feeling that more could have been done to prepare for such disasters. We could have had better tools, better resource flow, and better communication platforms specifically designed for such times of crisis.
What it does: During Harvey, social media was not just a daily time sink that we took with our morning coffee. It became the single most useful way for us to keep track of friends and loved ones. Twitter was flooded with tweets sending out prayers and offering resources, as well as desperate cries for help. Disasterrelief takes filtered tweets from a natural disaster and categorizes them by need: food/water, being stranded, donations, power outages, and clothing. It also separates tweets asking for help from tweets offering support. Disasterrelief lets people visualize the hotspots where help is needed after a disaster strikes, and helps them figure out how best to assist their neighbors in an organized way.
How we built it:
- Split a dataset of Hurricane Sandy tweets into those from people who needed help and those who didn't need help but were simply offering thoughts, prayers, and commentary about the situation.
- Trained Naive Bayes, KNN, decision tree, and support vector machine classifiers to find the model that most efficiently gave the highest accuracy at categorizing the data into help and non-help tweets
- Used the Google Maps API to display tweets by location, with a separate marker per tweet
- Further filtered tweets using decision trees into categories such as food/water, shelter, people without power, and donations
- Also filtered Tweets by state in order to show people what was needed closest to them
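The model-comparison step above can be sketched with scikit-learn. This is a minimal illustration, not our actual pipeline: the tweets and labels below are made-up placeholders, and the real project trained on a labeled Hurricane Sandy dataset.

```python
# Sketch: compare the four classifier families on help vs. non-help tweets.
# The four example tweets and their labels are illustrative stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

tweets = [
    "We are stranded on the roof, need rescue now",
    "No power or water on our block, please send help",
    "Praying for everyone affected by the storm",
    "Donating supplies to the local shelter today",
]
labels = [1, 1, 0, 0]  # 1 = asking for help, 0 = support/commentary

X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.5, stratify=labels, random_state=0)

# Turn raw tweet text into TF-IDF feature vectors
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

models = {
    "Naive Bayes": MultinomialNB(),
    "KNN": KNeighborsClassifier(n_neighbors=1),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": LinearSVC(),
}
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    acc = accuracy_score(y_test, model.predict(X_test_vec))
    print(f"{name}: {acc:.2f}")
```

With a real dataset, the same loop lets you compare accuracy and training time side by side, which is how we settled on decision trees for the final tagging step.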
Challenges we ran into:
- Integrating the Google Maps API with our current framework and updating it with tweets
- Rerouting DNS through Google Cloud to work with IPv6 records
- Filtering through the tweets accurately
- Due to time and computer memory constraints, we trained on roughly 3% of the full dataset. This somewhat artificially inflates the reported accuracies, since the test sets are much smaller (for reference, when we trained the GNB classifier on the full dataset, the accuracies were ~5% lower)
Accomplishments that we're proud of:
- Successfully tagged tweets with 80% accuracy using decision trees
- Used the Google Maps API to build an integrated visualization of hotspots
What we learned:
- Getting data is hard.
- There are a lot of React resources available for building highly interactive websites with Google Maps.
What's next for Disasterrelief:
- Setting up help centers and resource flow paths based on tweet updates in real time
- Finding a better way to track who has been helped and who hasn't
- Including other forms of social media, like Facebook