Disaster Watch on Azure

Motivation

Social media has played a key role in distributing information during disasters. People, both affected citizens and those outside the impact zone, and media outlets have used social media to collate and share disaster-related information during wildfires, earthquakes, floods, and tornados. This has created a well-established pattern: “a disaster strikes, and the crisis data collection begins”. For example, during the Calgary flood in 2013, people heavily used social media to post information, photos, and breaking news regarding the ongoing event. Besides citizens, Calgary’s official emergency responders, such as the Calgary Police Service and the City of Calgary, also used social media to broadcast safety-critical information and situation updates. So, both citizens and emergency response organizations have started to recognize the added value of information available via social media during disasters.

Challenges

There are some challenges when considering social media as an information source for disaster response. In particular, social media streams contain large amounts of irrelevant messages such as rumors, advertisements, or even misinformation. So, one major challenge to using social media messages like tweets is how to process them and deliver credible and relevant information to disaster responders and citizens. Another challenge relates to the amount of information that flows on social media and how to analyze them in real-time. Finally, social media messages are brief (e.g., 280 characters for tweets) and informal and, therefore, applying the methods that are used to process structured, long texts such as news articles to deal with them may lead to poor and misleading results.

Solution

Disaster Watch is a disaster mapping platform that collects data from twitter, extracts disaster-related information from tweets, and visualizes the results on a map. It enables users to quickly locate all the information in different geographic areas at a glance, and to find the physical constraints caused by the disaster, such as non-accessible river bridges, and take an informed action. Such information helps public and disaster responders (e.g., humanitarian organizations, disaster relief agencies, or local actors) answer the following questions:

When did the disaster happen?
Where are the affected areas?
What are the impacts of the disaster?

The answers to these questions provide spatial (where), temporal (when), and thematic (what) information about an event. The insights gained from analysis of such information can be of great value to decision-makers in different phases of a disaster (from preparedness to response and recovery).

Disaster Watch is built using free and open source software, open standards, and open data - TensorFlow 2.0, NodeJS and Express, VueJS, Vuetify, and Mapbox GL JS are used to create the system components. It collects tweets using Twitter’s streaming API, analyzes them using a deep learning model built by TensorFlow 2.0, and displays disaster-related tweets on a map.

What's new in Disaster Watch 2.0

For the first version of the application, the text classification model was built using TensorFlow 2.0 in Python and deployed as a Flask app on an EC2 instance. The model was able to process tweets in real-time and evaluate their relevance to flood, earthquake, hurricane, tornado, explosion, and bombing. In the new version, however, we added a new training dataset for wildfires and built and deployed the model using Azure ML services (see here for more details). The figure below shows the new Disaster Watch’s overall architecture.

architecture

The workflow starts with the data collection process. The backend API uses a keyword-based sampling approach to collect tweets using Twitter’s streaming API. In this context, a reference dictionary of disaster-related terms, developed by CrisisLex.org, was used as keywords. CrisisLex is a lexicon of 380 disaster-related terms that frequently appeared in relevant tweets during different types of disasters between October 2012 and July 2013 in the US, Canada, and Australia. The API then sends the tweet to a deep learning model, which analyzes textual content of tweets to evaluate their relevance to seven categories including flood, earthquake, hurricane, tornado, explosion, bombing, and wildfire. The relevant tweets are then sent to the geoparser, which extracts place names from the text and geocodes them. Finally, the results are sent to the frontend for visualization.

Text Processing

The first step in content analysis is to train a text classifier to remove irrelevant messages prior to any further analysis. A supervised deep learning approach is used in this project to perform the text classification task. The classifier used a set of annotated tweets provided by CrisisLex.org called “CrisisLexT6”. It contains 60,000 tweets from six different disaster categories, including flood, earthquake, hurricane, tornado, explosion, bombing, and wildfire, categorized into two main groups: on-topic and off-topic. To prepare the tweets for the training process, a set of standard text preprocessing operations are performed. First, all non-words, including URLs, user mentions, punctuations, white-spaces, and special characters, are removed. Afterwards, to avoid too many variables, capital letters are changed to lowercase and the most frequent place names are removed. Next, resampling is performed to minimize class imbalance. Each tweet is then split up into tokens in the tokenization process. The vectorization process is then carried out on the remaining tokens to create a set of feature vectors using word embeddings. In the end, the feature vectors are used to train the model using Keras's Sequential model API. Once unrelated tweets are filtered out and the on-topic ones are classified into six categories, the tweets are sent to a geoparsing tool called CLAVIN. Geoparsing refers to the process of identifying the implicit spatial information (e.g., place names) in the text and associating them with geographical coordinates. Finally, the results, including tweet id, text, category, model’s accuracy, place name, and geographical coordinates, are sent to the client for visualization purposes.

We encourage you to play with the web app live here, and explore our GitHub repositories to take a deeper dive into the code base: