Adaptive Multi-Modal Phenological Transformer (AMPT)
A Novel Cross-Scale Fusion Architecture for Indian Crop Classification
Problem Statement & Innovation Gap
Northern India's fragmented agricultural landscape presents unprecedented challenges for automated crop monitoring. With average farm sizes of 1–2 hectares and irregular field boundaries, traditional remote sensing models trained on homogeneous Western farms achieve only ~65% accuracy on Indian smallholder plots.
The core challenge: Existing approaches treat SAR and optical satellite data as static inputs, ignoring the fundamental reality that different crop growth stages require different modal emphasis.
Innovation: AMPT introduces dynamic phenology-aware fusion — the first architecture to adaptively weight cross-modal attention based on real-time crop development stages.
Abstract
Northern India's smallholder farms—averaging 1–2 hectares with irregular boundaries—pose severe challenges for automated crop monitoring. Traditional remote-sensing models trained on large, homogeneous Western farms achieve only 65% accuracy on these fragmented plots, failing to capture complex temporal dynamics across crop growth stages.
We propose the Adaptive Multi-Modal Phenological Transformer (AMPT), the first architecture to introduce dynamic phenology-aware fusion, adaptively weighting SAR and optical satellite data based on real-time crop development.
Key Components
Cross-Modal Phenological Attention (CMPA):
- A temporal encoder infers the current growth phase (sowing, vegetative, flowering, maturity).
- Dynamically adjusts attention:
- SAR → during early soil-preparation stages.
- Optical bands → during maturity for chlorophyll and moisture detection.
- SAR → during early soil-preparation stages.
- A temporal encoder infers the current growth phase (sowing, vegetative, flowering, maturity).
Hierarchical Scale-Adaptive Fusion:
- Multi-scale tokenization at:
- Field level: 16×16 px
- Landscape level: 64×64 px
- Regional level: 256×256 px
- Field level: 16×16 px
- Inter-scale attention with boundary-aware masks preserves irregular field shapes and aggregates spatial context.
- Multi-scale tokenization at:
Foundation Model Adaptation:
- Starting point: IBM–NASA Prithvi geospatial foundation model.
- Fine-tuned on India's AgriFieldNet dataset with:
- Phenological augmentation.
- Multi-task learning (crop classification + phenology regression).
- Phenological augmentation.
- Reduces labeled-data requirements by 70% and enhances temporal generalization.
- Processes co-registered Sentinel-1/EOS-4 SAR and Sentinel-2/ResourceSAT optical time series (monthly composites, February–August).
- Performs dynamic fusion via transformer-based attention.
- Outputs:
- Precise classification for 12 crop types + fallow.
- Auxiliary growth-monitoring task derives NDVI/EVI curves and phenological dates (green-up rate, peak biomass, senescence).
- Precise classification for 12 crop types + fallow.
- Starting point: IBM–NASA Prithvi geospatial foundation model.
Expected Impact
Target Accuracy:
- 90% overall accuracy
- 87% accuracy on sub-hectare fields (15–20% improvement over current methods).
- 90% overall accuracy
Applications:
- Crop assessment.
- Yield forecasting.
- Food-security planning across Uttar Pradesh, Rajasthan, Odisha, and Bihar.
- Crop assessment.
By introducing phenology-aware multi-modal fusion tailored to India’s complex agricultural landscape, AMPT establishes a new paradigm for satellite-based crop monitoring in developing regions.
Log in or sign up for Devpost to join the conversation.