During extreme events such as natural disasters or virus outbreaks, crisis managers are the decision makers. Their job is difficult: the right decision can save lives, while the wrong one can cost them. Making such decisions in real time is daunting when information is insufficient, which is often the case.

Recently, big data has gained a lot of traction in crisis management by addressing this issue; however, it creates a new challenge: how can you act on data when there is too much of it to keep up with? The use of social media during crises is one example. In theory, social media posts can give crisis managers an unprecedented level of real-time situational awareness. In practice, the volume of posts and their noise-to-signal ratio are too high for the raw stream to be useful.

I built CrisisTweetMap to address this issue: it is a dynamic dashboard that visualizes crisis-related tweets in real time. The focus of the project was to make it easier for crisis managers to extract useful, actionable information. To showcase the prototype, I used tweets about the current coronavirus outbreak.

What it does

  • Scrapes live crisis-related tweets from Twitter;
  • Classifies tweets into relevant categories with a deep learning NLP model;
  • Extracts geolocation from tweets using several methods;
  • Pushes classified and geolocated tweets to a database in real time;
  • Pulls tweets from the database in real time to visualize on a dashboard;
  • Allows dynamic user interaction with the dashboard
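
The push and pull steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the table layout, column names, and the helper names `create_table`, `push_tweet`, and `pull_recent` are all assumptions, and it uses only the standard-library `sqlite3` module (the project also used pandas on top of SQLite3).

```python
import sqlite3


def create_table(conn):
    """Create a hypothetical table for classified, geolocated tweets."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, text TEXT, label TEXT, "
        "lat REAL, lon REAL, created_at TEXT)"
    )
    conn.commit()


def push_tweet(conn, tweet):
    """Insert one tweet (a dict); a repeated id is silently skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO tweets "
        "VALUES (:id, :text, :label, :lat, :lon, :created_at)",
        tweet,
    )
    conn.commit()


def pull_recent(conn, since, limit=1000):
    """Return tweets newer than `since` (ISO 8601 string), newest first."""
    cursor = conn.execute(
        "SELECT id, text, label, lat, lon, created_at FROM tweets "
        "WHERE created_at > ? ORDER BY created_at DESC LIMIT ?",
        (since, limit),
    )
    return cursor.fetchall()
```

In a setup like this, the scraping process would call `push_tweet` for each processed tweet, while the dashboard would call `pull_recent` on a timer with the timestamp of its last refresh. Storing timestamps as ISO 8601 strings in a single time zone keeps the `created_at > ?` comparison chronologically correct.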

How I built it

  • Tweepy + custom wrapper for scraping and cleaning tweets;
  • AllenNLP + torch + BERT + CrisisNLP dataset for model training/deployment;
  • spaCy NER + geotext for extracting location names from text;
  • geopy + gazetteer Elasticsearch Docker container for resolving location names to coordinates;
  • shapely for sampling a geolocation from bounding boxes;
  • SQLite3 + pandas for database push/pull;
  • Dash + Plotly + Mapbox for live visualizations
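
To illustrate the bounding-box step, here is a minimal sketch of turning a box into a single map point by sampling inside it. It is an assumption-laden stand-in: it uses the standard-library `random` module instead of shapely, and the function name `sample_point_in_bbox` and its argument order are hypothetical.

```python
import random


def sample_point_in_bbox(min_lon, min_lat, max_lon, max_lat, rng=random):
    """Sample one (lon, lat) point uniformly in degrees inside a box.

    Note: uniform in degrees is not uniform in surface area, which
    matters only for very large boxes.
    """
    if min_lon > max_lon or min_lat > max_lat:
        raise ValueError("bounding box corners are out of order")
    lon = rng.uniform(min_lon, max_lon)
    lat = rng.uniform(min_lat, max_lat)
    return lon, lat
```

Sampling a random point, rather than always using the box center, spreads tweets that share the same box across the map instead of stacking them on a single marker. Passing a seeded `random.Random` instance as `rng` makes the placement reproducible.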

Challenges I ran into

  • Geolocation is hard;
  • Stream stalling due to the large, slow neural network;
  • Keeping the interactive visualization responsive with large amounts of data
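
One common way to reduce stream stalling of this kind is to decouple tweet ingestion from classification with a bounded queue and a worker thread that processes tweets in batches. The sketch below shows that general pattern only; it is not necessarily how CrisisTweetMap handles it, and `classify_batch` is a hypothetical keyword-based stand-in for the real model.

```python
import queue
import threading


def classify_batch(texts):
    """Hypothetical stand-in for the slow neural classifier."""
    return ["relevant" if "help" in t.lower() else "other" for t in texts]


def enqueue(inbox, text):
    """Called by the stream listener; never blocks. Drops tweets when full."""
    try:
        inbox.put_nowait(text)
        return True
    except queue.Full:
        return False


def worker(inbox, results, batch_size=32, timeout=0.5):
    """Drain the queue in batches until a None sentinel arrives."""
    while True:
        item = inbox.get()  # block until at least one item is available
        if item is None:
            return
        batch = [item]
        done = False
        while len(batch) < batch_size:
            try:
                item = inbox.get(timeout=timeout)
            except queue.Empty:
                break  # classify a partial batch rather than wait longer
            if item is None:
                done = True
                break
            batch.append(item)
        results.extend(zip(batch, classify_batch(batch)))
        if done:
            return
```

The listener thread calls only `enqueue`, so it returns immediately even when the model falls behind. The trade-off is that tweets are dropped once the queue is full, which may or may not be acceptable for a given deployment.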

Accomplishments that I'm proud of

  • A working prototype

What I learned

  • Different methods for fuzzy geolocation from text;
  • Live map visualizations with Dash

What's next for CrisisTweetMap

  • Support for other crises, such as extreme weather events
