🌱 Inspiration

500 million smallholder farmers worldwide have no access to agronomists. Crop diseases are typically identified only after 40–70% of yield is already lost. Manual irrigation wastes up to 50% of water. We asked: what if every farmer had a PhD-level agronomist in their pocket — one that could see, listen, remember a full year of field history, and act autonomously?

🤖 What it does

Agri-Lens is a multimodal AI field agent powered by the Gemini API.

  • Sees: the farmer takes a photo of a diseased leaf and records a voice question; Gemini processes the image and the audio together in a single API call
  • Thinks: visual findings are cross-checked against real-time IoT sensor data (soil moisture, temperature, pH, NPK) and up to 12 months of historical field logs using Gemini's 1M-token Long Context window
  • Acts: Gemini uses Function Calling to trigger IoT actions autonomously (simulated in the current build), such as activate_irrigation(), apply_fertilizer(), trigger_pest_alert(), and send_agronomist_report()

The core intelligence is Cross-Query Reasoning: when visual symptoms conflict with sensor data (e.g. leaf yellowing looks like disease, but soil moisture is critically low), Agri-Lens resolves the conflict and prioritizes the root cause — drought stress over disease treatment.
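The drought-versus-disease example above can be sketched as a plain rule. This is only an illustration of the ordering, not the project's code: in Agri-Lens the rule lives in the system prompt, and the threshold, the `Observation` type, the visual labels, and the "none" result are all assumptions made for the sketch.

```python
from dataclasses import dataclass

# Hypothetical threshold: soil moisture (%) below which drought stress is
# treated as the root cause. Not a value taken from the project.
CRITICAL_MOISTURE_PCT = 15.0


@dataclass
class Observation:
    visual_label: str         # hypothetical label from the vision step
    soil_moisture_pct: float  # latest soil moisture reading


def resolve_priority(obs: Observation) -> str:
    """Return which problem to treat first when vision and sensors disagree."""
    looks_diseased = obs.visual_label in {"leaf_yellowing", "leaf_spots"}
    too_dry = obs.soil_moisture_pct < CRITICAL_MOISTURE_PCT
    if too_dry:
        # Drought can explain the visual symptom, so water comes first.
        return "irrigation"
    if looks_diseased:
        return "disease"
    return "none"  # assumption: nothing to treat
```

With a yellowing leaf and 8% soil moisture, the sketch returns "irrigation"; with the same leaf and 40% moisture, it returns "disease".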

🔧 How we built it

Backend: FastAPI (Python 3.12) with 8 REST + SSE endpoints. All requests flow through GeminiService, which orchestrates multimodal inputs, function calling, and streaming responses.

Gemini Integration:

  • gemini-2.5-flash model via google-generativeai SDK
  • 5 capabilities used: Native Multimodality, Function Calling, Long Context Window, SSE Streaming, Cross-Query Reasoning
  • System prompts in prompts.py define ordered conflict-resolution rules (PLANT_DISEASE_PROMPT with priority_mode: disease / irrigation / nutrient / combined)

IoT Layer: 6 FunctionDeclaration schemas in tools.py, dispatched via AVAILABLE_FUNCTIONS in iot_handler.py — fully simulated but architected for real MQTT hardware integration.
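A minimal sketch of how a name-to-handler table like AVAILABLE_FUNCTIONS could dispatch a function call requested by the model. The handler parameters (`zone`, `minutes`, `pest`), the return shape, and the `dispatch` helper are assumptions for illustration; the real schemas in tools.py are not shown in this write-up.

```python
from typing import Any, Callable


def activate_irrigation(zone: str, minutes: int) -> dict[str, Any]:
    # Simulated actuator: a real build would publish to hardware (e.g. MQTT).
    return {"action": "activate_irrigation", "zone": zone,
            "minutes": minutes, "status": "simulated"}


def trigger_pest_alert(zone: str, pest: str) -> dict[str, Any]:
    # Simulated alert with hypothetical parameters.
    return {"action": "trigger_pest_alert", "zone": zone,
            "pest": pest, "status": "simulated"}


# Maps the function name chosen by the model to a local Python handler.
AVAILABLE_FUNCTIONS: dict[str, Callable[..., dict[str, Any]]] = {
    "activate_irrigation": activate_irrigation,
    "trigger_pest_alert": trigger_pest_alert,
}


def dispatch(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Run the named handler, returning an error dict instead of raising."""
    handler = AVAILABLE_FUNCTIONS.get(name)
    if handler is None:
        return {"error": f"unknown function: {name}"}
    try:
        return handler(**args)
    except TypeError as exc:
        # Raised when the arguments do not match the handler's signature.
        return {"error": f"bad arguments for {name}: {exc}"}
```

Returning an error dict rather than raising lets the result be passed back to the model as the function response, so it can correct a wrong name or argument.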

Data Models: Pydantic v2 — IoTSensorData, PlantDiseaseRequest, DiagnosisResult, PlantDiseaseResult, PriorityMode enum — all validated at the API boundary.

Frontend: Mobile-first HTML/CSS/JS with SSE stream consumer, voice recording, and photo capture.

🧠 Challenges we ran into

The hardest problem was conflict resolution: a leaf that looks diseased and a sensor that screams drought stress should not trigger both treatments simultaneously — that wastes resources and can harm the crop. We solved this by encoding an explicit priority hierarchy into the system prompt and validating the output against a PriorityMode enum.
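The enum check described above might look like the following sketch. The four values come from the priority_mode options listed earlier; the member names, the `parse_priority` helper, and the assumption that the model replies with a JSON object containing a "priority_mode" key are illustrative.

```python
import json
from enum import Enum


class PriorityMode(str, Enum):
    DISEASE = "disease"
    IRRIGATION = "irrigation"
    NUTRIENT = "nutrient"
    COMBINED = "combined"


def parse_priority(raw_model_output: str) -> PriorityMode:
    """Parse the model's JSON reply and reject any priority outside the enum."""
    payload = json.loads(raw_model_output)
    # Enum lookup by value raises ValueError for anything not listed above.
    return PriorityMode(payload["priority_mode"])
```

A reply such as `{"priority_mode": "both"}` raises ValueError, so an out-of-policy answer is rejected before any IoT action is dispatched.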

Streaming structured data (typed JSON events over SSE) while Gemini was mid-generation required careful chunking logic to keep the mobile UI responsive without waiting for the full response.
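A minimal sketch of typed SSE framing, assuming each model chunk is forwarded as its own event. The event names ("token", "done") and the payload fields are hypothetical; only the `event:` / `data:` wire format is standard.

```python
import json
from typing import Iterable, Iterator


def sse_event(event: str, data: dict) -> str:
    """Format one typed event in the text/event-stream wire format."""
    # json.dumps escapes newlines, so the payload stays on one data line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_events(chunks: Iterable[str]) -> Iterator[str]:
    """Emit each text chunk as soon as it arrives, then a final 'done' event."""
    for index, text in enumerate(chunks):
        yield sse_event("token", {"index": index, "text": text})
    yield sse_event("done", {})
```

Because `stream_events` is a generator, a streaming HTTP response can forward each event as it is produced, so the client can render partial output without waiting for the full reply.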

📚 What we learned

  • Gemini's native multimodality eliminates the transcription bottleneck entirely — no Whisper, no intermediate step, one unified context
  • The 1M-token Long Context window let us skip a RAG pipeline for a single farm's time-series sensor data
  • Function Calling is the bridge between language model reasoning and real-world physical action — this is what separates an AI assistant from an AI agent
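The long-context point can be sanity-checked with rough arithmetic. Every number below except the 1M-token window is an illustrative assumption (hourly readings, about 40 tokens per serialized reading), not a measurement from the project.

```python
# Assumed figures for a back-of-envelope estimate; not measured values.
READINGS_PER_DAY = 24      # assumption: one reading per hour
DAYS = 365                 # 12 months of field history
TOKENS_PER_READING = 40    # assumption: one serialized sensor record
CONTEXT_WINDOW = 1_000_000


def estimated_log_tokens(days: int = DAYS,
                         per_day: int = READINGS_PER_DAY,
                         tokens_each: int = TOKENS_PER_READING) -> int:
    """Rough token count for a year of sensor logs."""
    return days * per_day * tokens_each


total = estimated_log_tokens()    # 365 * 24 * 40 = 350,400
fits = total <= CONTEXT_WINDOW    # True under these assumptions
```

Under the same assumptions, per-minute sampling would need about 21 million tokens (365 × 1,440 × 40), so the conclusion depends on sampling rate and on how compactly each reading is serialized.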

🚀 What's next

  • Real hardware IoT sensors via MQTT protocol
  • Offline-first edge AI mode for low-connectivity regions
  • Multi-language voice support (Swahili, Hindi, Turkish, Spanish)
  • Satellite imagery integration (Sentinel-2) for field-level crop mapping
  • Cooperative data network for seasonal pattern sharing between farms
