Inspiration
Reliable, up-to-date water-surface maps are foundational for flood response, water-security planning, and wetland conservation—but many regions cannot afford the annotation and computing required to train bespoke models. Recent Earth-observation Foundation Models (FMs) promise transferability, yet it’s unclear how well they generalize beyond their pretraining horizon. This project set out to answer two practical questions:
- RQ1: How can FMs, particularly Copernicus-FM, empower the regions that need it most by making water surface mapping more affordable and accessible?
- RQ2: How effectively can Copernicus-FM perform when faced with a dataset beyond its pretraining horizon? In short, generalization is the target.
What We Built (in one sentence)
A label-efficient water-mapping pipeline that trains a lightweight decoder on top of a frozen CopernicusFM backbone to segment water from satellite imagery, with fixed val/test splits and label-budget experiments (10–100%).
What We Learned
Foundation models transfer well: CopernicusFM, even when frozen, adapts effectively to an external, RGB dataset with no per-band metadata.
There is a clear label-efficiency “knee”: Most gains occur between 10–30% labeled data; by 30%, we recover ~97–99% of fully supervised performance.
Bottlenecks are data/label quality, not capacity: Remaining errors cluster at boundaries (shorelines, narrow channels) and in shadow/snow/turbidity edge cases.
Deployment can be lightweight: ~3.3M trainable params (~13 MB), sub-ms/image inference, and constant runtime across label fractions—suited to on-prem/edge.
How We Built It
Data
Dataset: Satellite Images of Water Bodies (2,841 RGB images with binary masks; diverse geographies, seasons, and water types).
Splits: Fixed validation and test sets across all experiments; only the training set is subsampled to create label fractions {10, 20, 30, 50, 100}%.
Preprocessing: Resize/crop to model input; masks kept binary; standard tensor normalization.
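The subsampling protocol above can be sketched as follows. This is a minimal illustration, not our exact script: the nested-prefix design (each smaller budget is contained in the larger ones) and the seed value are assumptions, but they match the spirit of keeping val/test fixed while only the training pool shrinks.

```python
import random

def make_label_fractions(train_ids, fractions=(0.10, 0.20, 0.30, 0.50, 1.00), seed=42):
    """Subsample only the training IDs; val/test splits stay fixed elsewhere.

    The subsets are nested (the 10% pool is a prefix of the 30% pool, etc.),
    so label-efficiency curves compare the same images across budgets.
    """
    rng = random.Random(seed)          # local RNG so the split is reproducible
    shuffled = train_ids[:]
    rng.shuffle(shuffled)
    return {f: shuffled[: max(1, int(round(f * len(shuffled))))] for f in fractions}

# Hypothetical image IDs, just for illustration.
subsets = make_label_fractions([f"img_{i:04d}" for i in range(2000)])
```

Because every budget is drawn from the same shuffled order, differences between fractions reflect label quantity, not sampling luck.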
Model & Training
Backbone: CopernicusFM (frozen) to preserve pretrained priors and reduce compute.
Head: Compact fully-convolutional decoder (two 3×3 conv blocks with GroupNorm + ReLU + dropout; final 1×1 projection), bilinear upsampling to input size.
Objective: Binary cross-entropy with logits (or equivalent), class-balanced sampling when needed.
Protocol: Same architecture/hyperparameters for all fractions; early stopping on validation IoU/Dice; fixed seeds.
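The head described above can be sketched in PyTorch. The feature-map channel width (768), spatial grid (14×14), and 224×224 input size are assumptions for illustration; only the structure (two 3×3 conv blocks with GroupNorm + ReLU + dropout, a 1×1 projection, bilinear upsampling, BCE-with-logits loss) follows the recipe in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WaterSegHead(nn.Module):
    """Compact FCN decoder on top of frozen backbone features."""
    def __init__(self, in_ch=768, mid_ch=256, p_drop=0.1):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1),
            nn.GroupNorm(32, mid_ch),   # GroupNorm: stable with small batches
            nn.ReLU(inplace=True),
            nn.Dropout2d(p_drop),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1),
            nn.GroupNorm(32, mid_ch),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p_drop),
            nn.Conv2d(mid_ch, 1, 1),    # 1x1 projection to one logit channel
        )

    def forward(self, feats, out_size):
        logits = self.blocks(feats)
        # Bilinear upsampling back to the input resolution.
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)

# Stand-in for frozen-backbone features (assumed shapes).
feats = torch.randn(2, 768, 14, 14)
head = WaterSegHead()
logits = head(feats, out_size=(224, 224))
target = torch.randint(0, 2, (2, 1, 224, 224)).float()
loss = F.binary_cross_entropy_with_logits(logits, target)
```

Only the head's parameters receive gradients; the backbone stays in eval mode with `requires_grad=False`.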
Results
Full supervision (100%): Test OA = 0.864, IoU = 0.749, Dice = 0.855.
Low labels (10%): Test IoU = 0.683, Dice = 0.809—already operational for many use cases.
Label-efficiency: The largest jump is 10% → 30%; by 30%, we reach ~97.5% (IoU) and ~98–99% (OA/Dice) of the 100% result.
Error profile: False negatives and false positives both decline with more labels; improvements concentrate on thin channels and shorelines.
Complexity: Trainable params ~3.32M; inference sub-ms per image and stable across fractions.
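The "recovery" percentages quoted above are simply ratios of low-budget to full-budget scores. Using the 10% and 100% test numbers reported here:

```python
full_iou, full_dice = 0.749, 0.855   # 100% labels (test set)
low_iou, low_dice = 0.683, 0.809     # 10% labels (test set)

iou_recovery = low_iou / full_iou    # fraction of full-supervision IoU retained
dice_recovery = low_dice / full_dice # fraction of full-supervision Dice retained
```

So even the 10% budget already retains roughly 91% of the IoU and 95% of the Dice of full supervision, and the 30% budget closes most of the remaining gap.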
Challenges We Faced (and How We Addressed Them)
Out-of-scope generalization: CopernicusFM is benchmarked on Copernicus data; our dataset differs (RGB-only, index-derived masks). Mitigation: Freeze the backbone (reduce overfitting risk), use a small decoder, fix val/test, and examine label-efficiency curves and confusion matrices.
Label noise & boundary ambiguity (shadows, snow/ice, turbid water, vegetated edges). Mitigation: Emphasize trends over single scores, micro-average metrics, and recommend boundary-focused QA and active learning.
Compute constraints (hackathon setting). Mitigation: Frozen backbone + tiny head; GroupNorm for small batches; early stopping; deterministic seeds.
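The deterministic-seeding mitigation can be sketched as below. This is a generic reproducibility helper, not our exact training script; the seed value is arbitrary.

```python
import random

import numpy as np
import torch

def set_deterministic(seed=42):
    """Seed all RNGs used in a typical PyTorch pipeline."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)          # no-op without a GPU
    torch.backends.cudnn.deterministic = True # prefer deterministic kernels
    torch.backends.cudnn.benchmark = False    # disable autotuned (nondeterministic) algos
```

Calling this once at the start of each run makes data shuffling, dropout, and weight initialization repeatable across label fractions.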
Next Steps
Future work should (i) incorporate noise-robust losses and calibration metrics, (ii) expand to multi-sensor inputs (e.g., S1 SAR for cloud-robustness) without sacrificing the lightweight head, and (iii) add uncertainty quantification to support risk-aware decision-making during floods.