Inspiration 💡

The increasing need for accurate, up-to-date land cover maps in urban planning, environmental monitoring, and disaster response inspired us to take on this challenge. We saw the potential of leveraging high-resolution aerial imagery combined with advanced machine learning techniques to create detailed, scalable maps. Our goal was to develop a model capable of addressing real-world challenges where precise information about roads, buildings, and vegetation is crucial.

What it does ⚙️

Our model processes high-resolution aerial images to generate detailed land cover maps, classifying areas into key categories such as roads, buildings, vegetation, and water. We focused on fine-tuning a pre-trained foundation model. Our ultimate objective was to allow urban planners, environmental agencies, and disaster response teams to gain critical insights as quickly and accurately as possible.

How we built it 🛠️

We built the model using a pre-trained Swin Transformer for image segmentation, fine-tuning it on the LandCover.ai dataset. The pipeline was developed using PyTorch Lightning to streamline training and included data augmentation techniques to enhance model generalization. We implemented class weighting and label smoothing to address class imbalance, and we logged metrics such as IoU, precision, recall, and F1 scores using TensorBoard for real-time monitoring.
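Both the class weighting and the label smoothing mentioned above are supported natively by PyTorch's cross-entropy loss. A minimal sketch of how the two combine in one criterion (the class order and weight values here are illustrative, not our actual configuration):

```python
import torch
import torch.nn as nn

# Illustrative class order for LandCover.ai-style labels:
# 0=background, 1=building, 2=woodland, 3=water, 4=road
NUM_CLASSES = 5

# Per-class weights; rarer classes (buildings, roads) get larger weights.
# These values are placeholders, not the dataset's real statistics.
class_weights = torch.tensor([0.5, 2.0, 1.0, 1.5, 2.5])

# CrossEntropyLoss takes both per-class weights and a label_smoothing factor
criterion = nn.CrossEntropyLoss(weight=class_weights, label_smoothing=0.1)

# logits: (batch, classes, H, W); targets: (batch, H, W) of integer class ids
logits = torch.randn(2, NUM_CLASSES, 64, 64)
targets = torch.randint(0, NUM_CLASSES, (2, 64, 64))
loss = criterion(logits, targets)
print(loss.item())
```

The same criterion drops into a PyTorch Lightning `training_step` unchanged, so the imbalance handling lives entirely in the loss rather than in the data loader.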

Challenges we ran into 💥

  • Dealing with the class imbalance in the dataset, where certain classes such as roads and buildings are underrepresented.
  • Optimizing the model’s training on a local GPU, which required efficient management of memory and compute resources. A full training run took roughly a couple of hours.
  • Despite our best efforts, we faced difficulty improving certain performance metrics. For instance, our model's IoU, validation accuracy, and F1 score did not exceed 0.557, 0.683, and 0.651, respectively.
  • Initially, we tried using TerraTorch for model training, but it proved too slow on our local machine, hindering our ability to iterate quickly. We decided we would both learn more and gain more control over where compute resources are spent by writing our own run and config handler.
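One common way to derive per-class loss weights for the imbalance described above is inverse-frequency weighting over pixel counts. A sketch in plain Python, with illustrative counts rather than the real LandCover.ai statistics:

```python
# Illustrative pixel counts per class (placeholder values,
# not the actual LandCover.ai label distribution)
pixel_counts = {
    "background": 9_000_000,
    "woodland": 4_000_000,
    "water": 1_500_000,
    "building": 300_000,
    "road": 200_000,
}

total = sum(pixel_counts.values())
n_classes = len(pixel_counts)

# Inverse-frequency weight: classes with fewer pixels get larger weights
raw = {c: total / (n_classes * n) for c, n in pixel_counts.items()}

# Normalize so the mean weight is 1.0, keeping the overall loss scale stable
mean_w = sum(raw.values()) / n_classes
weights = {c: w / mean_w for c, w in raw.items()}
print(weights)
```

The resulting values can be passed straight into the loss as a weight tensor, so underrepresented classes like roads and buildings contribute proportionally more to the gradient.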

Accomplishments that we're proud of 👏

We experimented with many different architectures and foundation models, and ended up building our own implementation of a Swin Transformer-based segmentation model, complete with an FPN (Feature Pyramid Network) and a custom segmentation head. We managed to keep inference time low despite running on a single GPU, and we integrated our model with a variety of observability tools through a custom run saving and loading script. This script also handled a config and model architecture versioning system, which allowed us to iterate quickly on both hyperparameter settings and model architecture.
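The FPN-plus-segmentation-head design can be sketched as below. This is a simplified stand-in for our actual implementation: `FPNSegHead` is an illustrative name, and the input channel sizes assume a Swin-Tiny-style backbone (96/192/384/768).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNSegHead(nn.Module):
    """Minimal FPN + segmentation head over multi-scale backbone features."""

    def __init__(self, in_channels=(96, 192, 384, 768), fpn_dim=256, num_classes=5):
        super().__init__()
        # 1x1 lateral convs project each backbone stage to a common width
        self.lateral = nn.ModuleList(nn.Conv2d(c, fpn_dim, 1) for c in in_channels)
        # 3x3 convs smooth each pyramid level after the top-down merge
        self.smooth = nn.ModuleList(
            nn.Conv2d(fpn_dim, fpn_dim, 3, padding=1) for _ in in_channels
        )
        self.classifier = nn.Conv2d(fpn_dim, num_classes, 1)

    def forward(self, feats):
        # feats: backbone maps ordered from high to low resolution
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        # Top-down pathway: upsample the coarser level and add it in
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest"
            )
        # Smooth each level, upsample to the finest resolution, and fuse
        fused = sum(
            F.interpolate(s(p), size=laterals[0].shape[-2:],
                          mode="bilinear", align_corners=False)
            for s, p in zip(self.smooth, laterals)
        )
        return self.classifier(fused)

head = FPNSegHead()
feats = [torch.randn(1, c, s, s) for c, s in zip((96, 192, 384, 768), (56, 28, 14, 7))]
out = head(feats)
print(out.shape)  # (1, num_classes, 56, 56)
```

The final logits are at the finest feature resolution and would be upsampled once more to the input image size before computing the loss.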

What's next for ML4Earth 2024 🚀

If we had more time, we would run further experiments to find the best possible model architecture and hyperparameter configuration and push our results even higher. We would also like to deploy the model to a small server and see how it would fare in environments with even lower compute power.
