Inspiration
As climate change and industrialization accelerate, natural disasters grow in frequency and severity each year, affecting lives, property, and infrastructure. In times like these, satellite imagery is a disaster responder's best friend, providing key insight into the environment they're "walking into" to render aid. However, despite the powerful mapping technology available today, picking out features and landmarks (such as buildings and roads) from such imagery is very time-consuming, because satellite images are subject to a range of external factors outside anyone's control.
Time-sensitive operations especially need real-time satellite imagery to assess the damaged environment: the better the insights you can draw from the satellite photos, the quicker you can render aid to those affected.
This presents a problem, however. Given a satellite image, your decision process might look similar to:
- first, are there roads in the image?
- is this thin, squiggly line on the map a road, or something else?
- if it is a road, can it be crossed safely?
- if not, what alternative routes can I explore to get from A to B?
This can very quickly become a bottleneck in situations where rapid response is required and a difference of just a minute or two can mean several lives lost.
Atlas solves this problem so responders can focus on what matters most -- rendering aid.
What it does
Atlas is an AI model that:
- extracts road features from real-time satellite images
- annotates damaged roads from these features
- computes various metrics to provide speed estimates and travel-time predictions of all the roads in the image
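To illustrate the last bullet, a travel-time estimate for a road segment can be derived from its length and the predicted speed. This is only a sketch; the function name and the minutes-based output are my choices, not taken from the Atlas code:

```python
def travel_time_minutes(length_km: float, speed_mph: float) -> float:
    """Estimate travel time (in minutes) for a road segment,
    given its length in km and a predicted speed in mph."""
    speed_kmh = speed_mph * 1.609344  # miles per hour -> km per hour
    return length_km / speed_kmh * 60  # hours -> minutes
```

For example, a ~1.6 km segment predicted at 10 mph takes about 6 minutes to traverse.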
Model Architecture
Atlas is based on a state-of-the-art CNN model architecture called a U-Net. The model, inspired by the winning algorithm for the SpaceNet 3 challenge, was modified to use a ResNet34 backbone encoder with a U-Net inspired decoder. Additionally, skip connections were included for every layer of the network. The best model performance was observed when the focal loss function was used with the Adam optimizer.

The output of the model is a 512x512xn segmentation mask, where n is the number of speed bins applied (e.g. 10-20 mph, 20-30 mph, etc.). We post-process this mask into road vectors: graph extraction algorithms trace out the road network, while mask refinement procedures close small gaps and remove spurious connections.
The final output is a road skeleton mask with speeds overlaid as colors on the skeleton.
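To illustrate the bin-to-speed step, here is a sketch of turning the n-channel mask into a per-pixel speed estimate. The bin edges, threshold, and midpoint rule are illustrative assumptions, not the values Atlas actually uses:

```python
import numpy as np

# Illustrative speed bins (mph); the real bin edges are a training choice
SPEED_BINS_MPH = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]

def mask_to_speed_map(mask_logits, threshold=0.3):
    """mask_logits: (n, H, W) per-bin road logits.
    Returns an (H, W) array of midpoint speed estimates, 0 where no road."""
    probs = 1.0 / (1.0 + np.exp(-mask_logits))  # sigmoid per bin
    road = probs.max(axis=0) > threshold        # pixel is road if any bin fires
    best = probs.argmax(axis=0)                 # most likely speed bin per pixel
    mids = np.array([(lo + hi) / 2 for lo, hi in SPEED_BINS_MPH])
    return np.where(road, mids[best], 0.0)
```

The resulting speed map is what gets colorized and overlaid on the road skeleton.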

The Data
For building the model, I made use of two public open datasets. After experimentation, I settled on SpaceNet DigitalGlobe 16-bit satellite imagery (30 cm/pixel) as well as the USGS 3DEP LiDAR dataset, both available on the AWS Registry of Open Data.
The SpaceNet Challenge dataset covers around 3,000 sq. km of imagery, with nearly 8,000 km of labelled roads. Training images are tiled into 1300x1300-pixel chips, which are massive and perfect for this project, since one of its goals was to understand how far Gaudi's HPUs could be stretched. This dataset also provided GeoJSON road speed data, which I used as masks during training.
The USGS LiDAR dataset, by contrast, comes as 3D point clouds, so during preprocessing I used a script that mapped and converted the point clouds to pixel locations. Since this dataset does not provide GeoJSON labels like the SpaceNet challenges do, I could not train the model on it the same way. Instead, I used the USGS LiDAR data to improve the model's ability to generalize segmentation masks from raw satellite images, and left speed and travel-time prediction to the SpaceNet data.
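The point-cloud-to-pixel step can be sketched as a simple rasterization that bins each point's x/y coordinates into a grid cell and keeps the highest elevation per cell. The grid origin, resolution, and max-z rule here are assumptions for illustration; the actual preprocessing script lives in the repo:

```python
import numpy as np

def rasterize_point_cloud(points, origin, resolution, shape):
    """Bin (x, y, z) LiDAR points into a 2D elevation grid.

    points: (N, 3) array of x, y, z; origin: (x0, y0) of the grid;
    resolution: ground units per pixel; shape: (rows, cols) of the output.
    Keeps the maximum elevation per cell; empty cells become NaN.
    """
    cols = ((points[:, 0] - origin[0]) / resolution).astype(int)
    rows = ((points[:, 1] - origin[1]) / resolution).astype(int)
    valid = (rows >= 0) & (rows < shape[0]) & (cols >= 0) & (cols < shape[1])
    grid = np.full(shape, -np.inf)
    np.maximum.at(grid, (rows[valid], cols[valid]), points[valid, 2])
    grid[np.isinf(grid)] = np.nan  # cells no point landed in
    return grid
```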
I applied various degrees of pre-processing to this data (such as converting the images from 16-bit to 8-bit, targeting select bands, etc.) and made the result publicly available as a Kaggle dataset (~10.3k images and masks).
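The 16-bit-to-8-bit conversion can be sketched as a percentile stretch. The 2/98 percentile cutoffs are a common default for satellite imagery and an assumption here, not necessarily the exact values I used:

```python
import numpy as np

def to_8bit(band, lo_pct=2, hi_pct=98):
    """Rescale a 16-bit image band to 8-bit with a percentile stretch,
    clipping outliers so the useful dynamic range fills 0-255."""
    lo, hi = np.percentile(band, [lo_pct, hi_pct])
    scaled = np.clip((band.astype(np.float32) - lo) / max(hi - lo, 1e-6), 0, 1)
    return (scaled * 255).astype(np.uint8)
```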
```shell
# Download the dataset (swap out my expired Kaggle credentials with yours)
$ python atlas/download.py
```
Training
The model was trained on an Amazon DL1 instance using the Habana® Deep Learning Base AMI (Ubuntu 20.04) (see the GitHub repo for further information).
I used distributed training to run the model across all 8 Gaudi accelerators available in the instance, and Habana's torch.apex.hmp to apply mixed precision to my datasets. Distributed training via PyTorch's DistributedDataParallel was made possible by PyTorch Lightning's DDPPlugin strategy. I stopped using Lazy Mode toward the end, as I kept running into irrecoverable errors.
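The DistributedDataParallel wiring can be sketched in plain PyTorch as below. On the DL1 instance this would use Habana's hccl backend with one process per HPU; the sketch uses the CPU gloo backend and a single process so it runs anywhere. The function name and toy model are mine, not from the actual training code:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def init_and_wrap(model, rank=0, world_size=1, backend="gloo"):
    """Initialize the process group and wrap the model for data-parallel
    training. On Gaudi you would pass backend="hccl" and launch 8 processes
    (one per accelerator); gradients are all-reduced across ranks on backward."""
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend, rank=rank, world_size=world_size)
    return DDP(model)
```

In practice PyTorch Lightning hides this boilerplate behind its DDP plugin, which is why I reached for it.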

Results

Challenges
As someone completely new to the AWS ecosystem and Habana Gaudi accelerators, I initially found the process challenging, but solid documentation and tutorials backed my research and understanding. The aspect that consumed the largest chunk of my time was distributed parallel training across the Habana server; fortunately, PyTorch Lightning simplified much of this process and let me focus on prototyping.
Further, I ran into a few more (albeit smaller) issues:
- Adapting my model for Mixed Precision
- Permuting the convolution weights and other dependent tensors (like momentum) from PyTorch's default KCRS format to the HPU's RSCK format.
- Docker-related issues
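The KCRS-to-RSCK conversion mentioned above amounts to a single axis permutation of each convolution weight (and of dependent tensors laid out the same way). A sketch, with a function name of my choosing:

```python
import torch

def kcrs_to_rsck(weight: torch.Tensor) -> torch.Tensor:
    """Permute a conv weight from PyTorch's default KCRS layout
    (out_ch, in_ch, kH, kW) to the HPU's RSCK layout (kH, kW, in_ch, out_ch)."""
    return weight.permute(2, 3, 1, 0).contiguous()
```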
What's Next?
I hope to develop this idea further. For this, I'd need to explore:
- More robust training data: SpaceNet is extremely high-quality but limited.
- Training larger models across multiple Habana servers to utilize the full power of Gaudi
- Tuning hyperparameters (batch size, learning rate, dropout)
- Publishing the model on a wider platform to aid responders in time-sensitive operations
Built With
- amazon-web-services
- gaudi-accelerators
- habana
- python
- pytorch