Agri-Lens

Live Backend Log Monitor
Input screen where video, audio, or image inputs are received
Uploaded leaf picture
Gemini's cross-query diagnosis with confidence score, visual finding, and sensor data in one unified report.
AI recommendations with one-tap IoT controls, agronomist notification, and live environmental sensor context.

🌱 Inspiration

500 million smallholder farmers worldwide have no access to agronomists. Crop diseases are typically identified only after 40–70% of yield is already lost. Manual irrigation wastes up to 50% of water. We asked: what if every farmer had a PhD-level agronomist in their pocket — one that could see, listen, remember a full year of field history, and act autonomously?

🤖 What it does

Agri-Lens is a multimodal AI field agent powered by the Gemini API.

Sees: the farmer takes a photo of a diseased leaf, records a voice question, or does both; Gemini processes the image and audio together in a single API call
Thinks — visual findings are cross-checked against real-time IoT sensor data (soil moisture, temperature, pH, NPK) and up to 12 months of historical field logs using Gemini's 1M-token Long Context window
Acts — Gemini uses Function Calling to autonomously trigger physical IoT actions: activate_irrigation(), apply_fertilizer(), trigger_pest_alert(), send_agronomist_report()

The core intelligence is Cross-Query Reasoning: when visual symptoms conflict with sensor data (e.g. leaf yellowing looks like disease, but soil moisture is critically low), Agri-Lens resolves the conflict and prioritizes the root cause — drought stress over disease treatment.
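To make the drought-versus-disease example concrete, here is a minimal deterministic sketch of that priority rule. In Agri-Lens the resolution is done by Gemini under the system prompt, so this is only an illustration of the intended outcome, not the project's code. The `Observation` type, the `resolve_priority` function, and the 20% moisture threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: soil moisture below this is treated as critically low.
CRITICAL_MOISTURE_PCT = 20.0


@dataclass
class Observation:
    visual_finding: str       # e.g. "leaf_yellowing"
    disease_suspected: bool   # whether the image analysis suggests disease
    soil_moisture_pct: float  # latest soil moisture sensor reading


def resolve_priority(obs: Observation) -> Optional[str]:
    """Pick which issue to treat first when vision and sensors disagree."""
    if obs.soil_moisture_pct < CRITICAL_MOISTURE_PCT:
        # Root cause first: critically dry soil is handled before any
        # disease treatment, even when the leaf looks diseased.
        return "irrigation"
    if obs.disease_suspected:
        return "disease"
    return None  # nothing to prioritize


# Yellowing leaf plus critically dry soil resolves to irrigation.
print(resolve_priority(Observation("leaf_yellowing", True, 12.0)))
```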

🔧 How we built it

Backend: FastAPI (Python 3.12) with 8 REST + SSE endpoints. All requests flow through GeminiService, which orchestrates multimodal inputs, function calling, and streaming responses.

Gemini Integration:

gemini-2.5-flash model via the google-generativeai SDK
5 capabilities used: Native Multimodality, Function Calling, Long Context Window, SSE Streaming, Cross-Query Reasoning
System prompts in prompts.py define ordered conflict-resolution rules (PLANT_DISEASE_PROMPT with priority_mode: disease / irrigation / nutrient / combined)

IoT Layer: 6 FunctionDeclaration schemas in tools.py, dispatched via AVAILABLE_FUNCTIONS in iot_handler.py — fully simulated but architected for real MQTT hardware integration.
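The dispatch pattern described above can be sketched as a name-to-handler table. This is a simplified illustration under stated assumptions: the handler parameters (`zone`, `minutes`, `pest`), the return shape, and the `dispatch` helper are hypothetical, and only two of the handlers are shown. The real `iot_handler.py` may look different.

```python
from typing import Any, Callable, Dict


def activate_irrigation(zone: str, minutes: int) -> Dict[str, Any]:
    # Simulated actuator; a hardware version might publish an MQTT message here.
    return {"action": "activate_irrigation", "zone": zone,
            "minutes": minutes, "status": "simulated"}


def trigger_pest_alert(zone: str, pest: str) -> Dict[str, Any]:
    # Simulated alert; parameters are assumptions for illustration.
    return {"action": "trigger_pest_alert", "zone": zone,
            "pest": pest, "status": "simulated"}


# Maps the function name the model asks for to a local Python handler.
AVAILABLE_FUNCTIONS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "activate_irrigation": activate_irrigation,
    "trigger_pest_alert": trigger_pest_alert,
}


def dispatch(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Run the handler for a model-requested function call, failing safely."""
    handler = AVAILABLE_FUNCTIONS.get(name)
    if handler is None:
        return {"action": name, "status": "error", "error": "unknown function"}
    try:
        return handler(**args)
    except TypeError as exc:
        # Missing or unexpected arguments from the model end up here.
        return {"action": name, "status": "error", "error": str(exc)}


print(dispatch("activate_irrigation", {"zone": "north", "minutes": 15}))
```

Returning an error dictionary instead of raising keeps a malformed function call from crashing the request, and the result can be passed back to the model as the function response.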

Data Models: Pydantic v2 — IoTSensorData, PlantDiseaseRequest, DiagnosisResult, PlantDiseaseResult, PriorityMode enum — all validated at the API boundary.

Frontend: Mobile-first HTML/CSS/JS with SSE stream consumer, voice recording, and photo capture.

🧠 Challenges we ran into

The hardest problem was conflict resolution: a leaf that looks diseased and a sensor that screams drought stress should not trigger both treatments simultaneously — that wastes resources and can harm the crop. We solved this by encoding an explicit priority hierarchy into the system prompt and validating the output against a PriorityMode enum.
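As a rough sketch of the validation step, the model's output can be parsed and checked against the enum before any action runs. The four mode values come from the prompt description above. The JSON shape with a top-level `priority_mode` field and the `parse_priority` helper are assumptions, and this version uses the standard library `Enum` rather than the project's Pydantic models.

```python
import json
from enum import Enum


class PriorityMode(str, Enum):
    DISEASE = "disease"
    IRRIGATION = "irrigation"
    NUTRIENT = "nutrient"
    COMBINED = "combined"


def parse_priority(raw_model_output: str) -> PriorityMode:
    """Parse model JSON and reject any priority mode outside the enum."""
    data = json.loads(raw_model_output)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict) or "priority_mode" not in data:
        raise ValueError("model output has no 'priority_mode' field")
    # Enum lookup by value raises ValueError for unknown modes.
    return PriorityMode(data["priority_mode"])


print(parse_priority('{"priority_mode": "irrigation"}'))
```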

Streaming structured data (typed JSON events over SSE) while Gemini was mid-generation required careful chunking logic to keep the mobile UI responsive without waiting for the full response.
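One way to picture the chunking is as a generator that wraps each partial piece of model text in a typed Server-Sent Events frame. This is a minimal sketch, not the project's streaming code: the event names `chunk` and `done`, the payload fields, and both helper functions are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def sse_event(event: str, data: Dict[str, Any]) -> str:
    """Format one SSE frame: event name, one JSON data line, blank-line terminator."""
    # json.dumps without indent emits a single line, which an SSE data field requires.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_events(text_chunks: Iterable[str]) -> Iterator[str]:
    """Yield a typed frame per partial chunk, then a final 'done' frame."""
    count = 0
    for chunk in text_chunks:
        yield sse_event("chunk", {"index": count, "text": chunk})
        count += 1
    yield sse_event("done", {"chunks": count})


for frame in stream_events(["Leaf yellowing ", "likely drought stress."]):
    print(frame, end="")
```

Because each frame is complete on its own, the client can render a chunk as soon as it arrives and treat the `done` event as the signal to finalize the report.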

📚 What we learned

Gemini's native multimodality eliminates the transcription bottleneck entirely — no Whisper, no intermediate step, one unified context
A 1M-token Long Context window can make RAG pipelines unnecessary for time-series sensor data at farm scale
Function Calling is the bridge between language model reasoning and real-world physical action — this is what separates an AI assistant from an AI agent