Adaptive Multi-Modal Phenological Transformer (AMPT)

A Novel Cross-Scale Fusion Architecture for Indian Crop Classification


Problem Statement & Innovation Gap

Northern India's fragmented agricultural landscape is difficult to monitor automatically. Average farm sizes are 1–2 hectares and field boundaries are irregular, so traditional remote-sensing models trained on large, homogeneous Western farms reach only about 65% accuracy on Indian smallholder plots.

The core challenge: Existing approaches fuse SAR and optical satellite data with fixed weighting, ignoring that different crop growth stages call for emphasis on different modalities.

Innovation: AMPT introduces dynamic phenology-aware fusion. To our knowledge, it is the first architecture to adaptively weight cross-modal attention according to the crop's current development stage.


Abstract

Northern India's smallholder farms, which average 1–2 hectares and have irregular boundaries, pose severe challenges for automated crop monitoring. Traditional remote-sensing models trained on large, homogeneous Western farms achieve only about 65% accuracy on these fragmented plots and fail to capture the temporal dynamics across crop growth stages.

We propose the Adaptive Multi-Modal Phenological Transformer (AMPT), which introduces dynamic phenology-aware fusion: it adaptively weights SAR and optical satellite data according to the crop's current development stage. To our knowledge, it is the first architecture to do so.


Key Components

  1. Cross-Modal Phenological Attention (CMPA):

    • A temporal encoder infers the current growth phase (sowing, vegetative, flowering, maturity).
    • Attention weights shift with the inferred phase:
      • SAR is emphasized during early soil-preparation stages.
      • Optical bands are emphasized during maturity for chlorophyll and moisture detection.
  2. Hierarchical Scale-Adaptive Fusion:

    • Multi-scale tokenization at:
      • Field level: 16×16 px
      • Landscape level: 64×64 px
      • Regional level: 256×256 px
    • Inter-scale attention with boundary-aware masks preserves irregular field shapes and aggregates spatial context.
  3. Foundation Model Adaptation:

    • Starting point: IBM–NASA Prithvi geospatial foundation model.
    • Fine-tuned on India's AgriFieldNet dataset with:
      • Phenological augmentation.
      • Multi-task learning (crop classification + phenology regression).
    • Expected to reduce labeled-data requirements by 70% and improve temporal generalization.
  4. End-to-End Pipeline:

    • Processes co-registered Sentinel-1/EOS-4 SAR and Sentinel-2/ResourceSAT optical time series (monthly composites, February–August).
    • Performs dynamic fusion via transformer-based attention.
    • Outputs:
      • Classification into 12 crop types plus fallow.
      • An auxiliary growth-monitoring task that derives NDVI/EVI curves and phenological metrics (green-up rate, peak biomass, senescence).
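
The NDVI/EVI curves and phenological metrics mentioned above can be illustrated with a minimal sketch. The index formulas are the standard ones (EVI with the commonly used MODIS coefficients). Everything else is an assumption made for illustration and not taken from the project: the reflectance values are invented, peak NDVI stands in for peak biomass, and senescence is marked crudely as the first month after the peak in which NDVI falls below half its peak value.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from surface reflectances."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else 0.0


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used MODIS coefficients."""
    denom = nir + 6.0 * red - 7.5 * blue + 1.0
    return 2.5 * (nir - red) / denom if denom != 0 else 0.0


def phenology_metrics(months, curve):
    """Derive simple phenological metrics from a monthly index curve.

    Assumptions (illustrative only): the curve peak is a proxy for peak
    biomass, the green-up rate is the steepest month-to-month rise before
    the peak, and senescence is the first post-peak month below half peak.
    """
    peak_idx = max(range(len(curve)), key=lambda i: curve[i])
    rises = [curve[i + 1] - curve[i] for i in range(peak_idx)]
    green_up_rate = max(rises) if rises else 0.0
    threshold = 0.5 * curve[peak_idx]
    senescence_month = next(
        (months[i] for i in range(peak_idx + 1, len(curve)) if curve[i] < threshold),
        None,
    )
    return {
        "peak_month": months[peak_idx],
        "peak_value": curve[peak_idx],
        "green_up_rate": green_up_rate,
        "senescence_month": senescence_month,
    }


# Hypothetical monthly (NIR, red, blue) reflectances for one field, Feb to Aug.
MONTHS = ["Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug"]
REFLECTANCE = [
    (0.25, 0.15, 0.08),
    (0.30, 0.12, 0.07),
    (0.40, 0.08, 0.05),
    (0.50, 0.05, 0.04),
    (0.45, 0.07, 0.05),
    (0.30, 0.15, 0.08),
    (0.25, 0.18, 0.09),
]

ndvi_curve = [ndvi(nir, red) for nir, red, _ in REFLECTANCE]
evi_curve = [evi(nir, red, blue) for nir, red, blue in REFLECTANCE]
metrics = phenology_metrics(MONTHS, ndvi_curve)
print(metrics)
```

In the proposed model these quantities would be regression targets for the auxiliary task; the thresholds here are placeholders that would need tuning per crop.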

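The phenology-dependent weighting described under Cross-Modal Phenological Attention can be sketched as a simple gate. This is a deliberately simplified stand-in for full cross-modal attention, not the project's implementation: the stage logits are assumed to come from a temporal encoder, and the per-stage preference table `STAGE_MODAL_LOGITS` is a hypothetical, hand-set placeholder for parameters that would be learned.

```python
import math

STAGES = ("sowing", "vegetative", "flowering", "maturity")

# Hypothetical preference logits per stage as (SAR, optical).
# Hand-set for illustration; in a trained model these would be learned.
STAGE_MODAL_LOGITS = {
    "sowing": (2.0, 0.0),
    "vegetative": (0.5, 0.5),
    "flowering": (0.0, 1.0),
    "maturity": (0.0, 2.0),
}


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def modality_weights(stage_logits):
    """Map growth-stage logits to (w_sar, w_optical), which sum to 1."""
    stage_probs = softmax(stage_logits)
    # Expected modality preference under the predicted stage distribution.
    sar_logit = sum(p * STAGE_MODAL_LOGITS[s][0] for p, s in zip(stage_probs, STAGES))
    opt_logit = sum(p * STAGE_MODAL_LOGITS[s][1] for p, s in zip(stage_probs, STAGES))
    w_sar, w_opt = softmax([sar_logit, opt_logit])
    return w_sar, w_opt


def fuse(sar_feat, opt_feat, stage_logits):
    """Convex combination of same-length SAR and optical feature vectors."""
    w_sar, w_opt = modality_weights(stage_logits)
    return [w_sar * s + w_opt * o for s, o in zip(sar_feat, opt_feat)]


# Stage logits that strongly indicate sowing, then maturity.
w_sowing = modality_weights([5.0, 0.0, 0.0, 0.0])
w_maturity = modality_weights([0.0, 0.0, 0.0, 5.0])
fused = fuse([1.0, 0.0], [0.0, 1.0], [5.0, 0.0, 0.0, 0.0])
print("sowing   (SAR, optical):", w_sowing)
print("maturity (SAR, optical):", w_maturity)
```

Because the gate is driven by a soft stage distribution and not a hard label, uncertain phase predictions blend the two modalities instead of switching abruptly.
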
Expected Impact

  • Target Accuracy:

    • 90% overall accuracy.
    • 87% accuracy on sub-hectare fields (a 15–20% improvement over current methods).
  • Applications:

    • Crop assessment.
    • Yield forecasting.
    • Food-security planning across Uttar Pradesh, Rajasthan, Odisha, and Bihar.
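
The two accuracy targets above are measured on different subsets of fields. A minimal sketch of how they might be computed side by side follows; the record layout, crop labels, and the 1-hectare cutoff are assumptions made for illustration and are not taken from the project.

```python
def accuracy(records):
    """Fraction of records whose predicted label equals the reference label."""
    if not records:
        return float("nan")
    return sum(r["pred"] == r["label"] for r in records) / len(records)


def stratified_accuracy(records, max_area_ha=1.0):
    """Overall accuracy plus accuracy on fields smaller than max_area_ha."""
    small = [r for r in records if r["area_ha"] < max_area_ha]
    return {"overall": accuracy(records), "sub_hectare": accuracy(small)}


# Hypothetical per-field evaluation records.
records = [
    {"area_ha": 0.4, "label": "wheat", "pred": "wheat"},
    {"area_ha": 0.8, "label": "mustard", "pred": "wheat"},
    {"area_ha": 1.5, "label": "wheat", "pred": "wheat"},
    {"area_ha": 0.6, "label": "fallow", "pred": "fallow"},
    {"area_ha": 2.0, "label": "maize", "pred": "maize"},
]

scores = stratified_accuracy(records)
print(scores)
```

Reporting the sub-hectare figure separately keeps errors on small fields from being masked by larger fields, which are typically easier to classify.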

By introducing phenology-aware multi-modal fusion tailored to India's complex agricultural landscape, AMPT aims to establish a new approach to satellite-based crop monitoring in developing regions.
