V1.0 Submitted: Engineering AgriLive for the Real World!
Hey everyone! I just officially hit "Submit" on AgriLive: Multimodal Farm Assistant for the Gemini Live Agent Challenge, and I wanted to share a quick look under the hood at how this project evolved from a raw concept into a production-ready application.
When I started, the goal was simple: help farmers in regions like Kerala fight climate volatility without forcing them to navigate complex text menus. But building a real-time, voice-and-vision AI over intercontinental networks? That required some serious engineering pivots.
The Evolution & Key Features
- From Text to Native Audio: We completely ditched the text-box paradigm. AgriLive now uses `gemini-live-2.5-flash-native-audio` over WebSockets for a continuous, bidirectional, and empathetic voice conversation.
- The "Walkie-Talkie" Protocol: Vertex AI expects a continuous audio stream and will drop the connection if the user goes silent. My favorite hack of the weekend? Building a "Walkie-Talkie" mode that dynamically streams an array of zeros (pure silence) to Google's servers whenever the AI is speaking, keeping the WebSocket completely stable!
- Beating Ocean Latency: Streaming raw audio packets from India to the `us-central1` servers caused brutal audio stuttering. I engineered a custom frontend jitter buffer with a "fast-track re-entry" mechanism to ensure the 24 kHz PCM audio playback stays incredibly smooth, even on weak rural networks.
- Concurrent Vision Agent: While the user is talking, they can snap a picture of a diseased crop. A backend cascading-fallback engine routes the image to `gemini-2.5-flash`, enforcing a strict Pydantic Structured Output to guarantee the UI gets a perfectly parsed JSON diagnosis every single time.
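The "Walkie-Talkie" silence stream can be sketched roughly like this. The frame size, cadence, and the raw-PCM `ws.send` are my assumptions for illustration; real code would wrap the bytes in the Live API's input-message format:

```javascript
// Keep the Live WebSocket alive while the AI speaks by streaming
// frames of pure silence (all-zero PCM samples).
const INPUT_SAMPLE_RATE = 16000; // assumed 16 kHz, 16-bit PCM input
const FRAME_MS = 100; // assumed cadence: one 100 ms frame at a time

function makeSilenceFrame() {
  // 16-bit PCM where every sample is 0 is pure silence;
  // typed arrays are zero-initialized, so no explicit fill is needed.
  const samples = (INPUT_SAMPLE_RATE * FRAME_MS) / 1000;
  return new Int16Array(samples);
}

function startSilenceStream(ws) {
  // While the AI is speaking, keep feeding silence so the server
  // never sees a dead input stream and drops the connection.
  return setInterval(() => {
    const frame = makeSilenceFrame();
    ws.send(frame.buffer); // sketch: real code wraps this in a realtime-input message
  }, FRAME_MS);
}
```

Stopping the interval when the AI finishes speaking hands the microphone back to the user, which is exactly the walkie-talkie hand-off.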
Snippet Spotlight: The Fast-Track Jitter Buffer
Here is a peek at the logic that stitches delayed audio packets back together mid-sentence without forcing the browser to wait for a full buffer stockpile:
// Push incoming Float32 PCM data to the jitter buffer
audioQueue.push(float32);
// Fast-track re-entry: If the network lags but the AI is already mid-sentence,
// bypass the buffer threshold and play the audio immediately!
const isMidSentence = btnStart.classList.contains("speaking");
if (!isPlaybackStarted && (audioQueue.length >= JITTER_BUFFER_THRESHOLD || isMidSentence)) {
startPlaybackLoop();
}
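The same buffering policy can be isolated into a tiny framework-free class. This is a minimal sketch under my own assumptions (the `JITTER_BUFFER_THRESHOLD` value and the `isMidSentence` flag mirror the snippet above, but the class itself is illustrative, not AgriLive's actual implementation):

```javascript
// Sketch of the jitter-buffer policy: wait for a small stockpile of
// chunks before the first playback, but fast-track if the AI is
// already mid-sentence, where a pause is worse than a thin buffer.
const JITTER_BUFFER_THRESHOLD = 4; // assumed: ~4 chunks before first playback

class JitterBuffer {
  constructor(threshold = JITTER_BUFFER_THRESHOLD) {
    this.threshold = threshold;
    this.queue = [];
    this.playbackStarted = false;
  }

  // Push a Float32Array PCM chunk; returns true once playback may begin.
  push(chunk, isMidSentence = false) {
    this.queue.push(chunk);
    if (!this.playbackStarted &&
        (this.queue.length >= this.threshold || isMidSentence)) {
      this.playbackStarted = true;
    }
    return this.playbackStarted;
  }

  // Hand every buffered chunk to the audio scheduler, in arrival order.
  drain() {
    return this.playbackStarted ? this.queue.splice(0) : [];
  }
}
```

In the browser, `drain()` would feed a Web Audio scheduling loop that starts each chunk exactly where the previous one ends, which is what keeps the 24 kHz stream gapless.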