Inspiration

Introducing a baby to solid foods is one of the most anxiety-inducing parts of early parenthood. Every product has different age recommendations, allergen warnings, and ingredient lists — and parents are making these decisions at 2am on a phone while holding a crying infant.

We wanted to build something that felt like calling your grandmother: someone warm, knowledgeable, and patient who could look at what you already have in the kitchen and tell you exactly what to buy next.

That became Guida.

What it does

Guida is a voice-first AI shopping agent powered by Gemini 2.5 Flash native audio and Google ADK. You open a browser tab, allow mic and camera access, and start talking naturally.

  • Show your pantry — Guida sees what you already have through the camera
  • Ask anything — "My baby is 7 months old, what should I try next?" or "Is this rice cereal too advanced?"
  • Get real recommendations — Guida searches a live product catalog based on your baby's age and needs
  • Add to cart by voice — "Add that to my cart" triggers a real checkout session
  • No typing, no filtering, no searching — just conversation

How we built it

The core is a bidirectional WebSocket pipeline:

Browser → FastAPI → Google ADK → Gemini 2.5 Flash → back

  1. The browser captures 16kHz PCM audio and JPEG camera frames
  2. A FastAPI server bridges the WebSocket to ADK's LiveRequestQueue
  3. ADK runs the agent via run_live() in BIDI streaming mode
  4. Gemini processes speech and vision together, then either calls a tool or responds with audio
  5. Commerce tools hit FlowBlinq's ACP (Agent Commerce Protocol) endpoints for live product search and cart management
  6. Audio streams back to the browser speaker in real time
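
As a rough illustration of step 2, the bridge can be thought of as two concurrent pumps: one forwarding client messages to the agent, one forwarding agent events back. The sketch below uses only the standard library. The `asyncio.Queue` stands in for ADK's LiveRequestQueue and `echo_agent` stands in for the run_live() event stream. All names here (`bridge`, `echo_agent`, `CLOSE`, `demo`) are hypothetical and not taken from Guida's code.

```python
import asyncio

CLOSE = object()  # hypothetical sentinel meaning "client has disconnected"


async def bridge(client_messages, send_to_client, agent_queue, agent_events):
    """Pump messages in both directions until the client stream ends."""

    async def upstream():
        # Browser -> agent: forward every incoming message to the agent queue.
        try:
            async for message in client_messages:
                await agent_queue.put(message)
        finally:
            # Tell the agent side that no more input is coming.
            await agent_queue.put(CLOSE)

    async def downstream():
        # Agent -> browser: forward every agent event to the client.
        async for event in agent_events:
            await send_to_client(event)

    await asyncio.gather(upstream(), downstream())


async def echo_agent(queue):
    """Stand-in for the agent event stream: echoes each input chunk."""
    while True:
        message = await queue.get()
        if message is CLOSE:
            return
        yield b"echo:" + message


async def demo():
    queue = asyncio.Queue()
    received = []

    async def client_messages():
        for chunk in (b"a", b"b", b"c"):
            yield chunk

    async def send_to_client(event):
        received.append(event)

    await bridge(client_messages(), send_to_client, queue, echo_agent(queue))
    return received
```

In a real server the two inner coroutines would wrap the WebSocket's receive and send calls. Running both under one `gather` means a slow downstream send never stalls incoming audio.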

We used Gemini's native audio model (gemini-2.5-flash-native-audio-preview) rather than a cascaded STT → LLM → TTS pipeline. Skipping the separate transcription and synthesis stages avoids the 1-2 seconds of added latency typical of cascaded voice agents and makes interruption handling feel natural.

Guida's avatar was generated with Gemini 2.0 Flash image generation. The system prompt gives her a warm grandmother persona with strict commerce-first behavior rules.

Challenges we ran into

ADK BIDI streaming is powerful but opaque. The run_live() loop interleaves audio chunks, transcriptions, tool calls, and interruption signals in a single event stream. We spent significant time adding event logging to understand what Gemini was actually sending back and in what order.
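
The kind of event logging described above might look like the sketch below, which reduces each streamed event to a one-line label. The attribute names it probes (`interrupted`, `turn_complete`, `content.parts`, `function_call`, `inline_data`, `text`) are assumptions about the event shape, so the function reads them defensively with `getattr`. `summarize_event` is a hypothetical helper, not part of ADK.

```python
from types import SimpleNamespace


def summarize_event(event) -> str:
    """Return a short label describing what a streamed event carries."""
    # Control signals first: they usually arrive without content.
    if getattr(event, "interrupted", False):
        return "interrupted"
    if getattr(event, "turn_complete", False):
        return "turn_complete"

    content = getattr(event, "content", None)
    parts = getattr(content, "parts", None) or []
    labels = []
    for part in parts:
        call = getattr(part, "function_call", None)
        blob = getattr(part, "inline_data", None)
        text = getattr(part, "text", None)
        if call is not None:
            labels.append(f"tool_call:{call.name}")
        elif blob is not None:
            labels.append(f"audio:{len(blob.data)}B")
        elif text:
            labels.append(f"text:{text[:40]!r}")
    return ", ".join(labels) or "empty"


# Fake event for local experimentation (assumed shape, see note above).
fake_tool_event = SimpleNamespace(
    content=SimpleNamespace(
        parts=[SimpleNamespace(function_call=SimpleNamespace(name="search_products"))]
    )
)
```

Logging `summarize_event(event)` with a timestamp for every item in the stream gives a compact trace of the order in which audio, tool calls, and interruptions arrive.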

Camera + audio sync. Sending video frames as base64 JSON alongside binary PCM audio on the same WebSocket required careful message framing to avoid blocking the audio pipeline.
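
One way to frame the two message kinds is sketched below: binary WebSocket messages are treated as raw PCM, and text messages as a JSON envelope carrying a base64 JPEG. The envelope shape `{"type": "image", "data": ...}` and the `LatestFrame` holder are assumptions made for illustration. Keeping only the newest frame is one possible way to stop video from backing up behind audio, not necessarily what Guida does.

```python
import base64
import json
from collections import deque


def decode_message(message):
    """Map one WebSocket message to a (kind, payload_bytes) pair."""
    # Binary messages are assumed to be raw 16kHz PCM audio.
    if isinstance(message, (bytes, bytearray)):
        return "audio", bytes(message)
    # Text messages are assumed to be JSON envelopes with a base64 JPEG.
    envelope = json.loads(message)
    if envelope.get("type") != "image":
        raise ValueError(f"unexpected message type: {envelope.get('type')!r}")
    return "image", base64.b64decode(envelope["data"])


class LatestFrame:
    """Hold only the newest camera frame; older unsent frames are dropped."""

    def __init__(self):
        self._slot = deque(maxlen=1)

    def put(self, frame: bytes) -> None:
        # Appending to a full maxlen=1 deque discards the previous frame.
        self._slot.append(frame)

    def take(self):
        # Return the pending frame, or None if nothing new has arrived.
        return self._slot.popleft() if self._slot else None
```

Audio chunks would be forwarded as soon as they are decoded, while a separate task polls `take()` at its own pace, so a burst of camera frames never queues ahead of speech.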

React state in Devpost forms. Not a technical challenge for Guida itself, just ironic that a hackathon submission form resists automation.

Accomplishments that we're proud of

  • Sub-second voice response times with no traditional STT/TTS pipeline
  • True interruption handling — Gemini detects mid-sentence interruptions and stops cleanly
  • Live commerce: actual product search + cart creation from a voice conversation
  • Camera vision + voice creates a genuinely new shopping modality — showing your pantry and asking what's missing is more natural than any search interface

What we learned

  1. Native audio models change the UX ceiling — the latency difference vs. STT+TTS is immediately noticeable
  2. ADK's session and queue model is well-designed — once you understand it, adding tools and modifying agent behavior is fast
  3. Vision + voice is the right interface for discovery shopping — the ability to show context (your kitchen, a product label) before asking removes the vocabulary barrier
  4. Gemini's function calling works seamlessly in audio mode — tool calls mid-conversation don't interrupt the audio stream
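
To make point 4 concrete, a tool exposed to the agent can be an ordinary Python function with type hints and a docstring. The sketch below is hypothetical: the function name, its parameters, and the in-memory `CATALOG` are illustrative stand-ins, and the real tools call FlowBlinq's ACP endpoints instead.

```python
# Illustrative stand-in data; the real catalog is served by the commerce backend.
CATALOG = [
    {"name": "Single-grain oat cereal", "min_age_months": 4},
    {"name": "Pear and spinach puree", "min_age_months": 6},
    {"name": "Soft teething wafers", "min_age_months": 8},
]


def search_products(baby_age_months: int, query: str = "") -> dict:
    """Find age-appropriate products, optionally filtered by a keyword.

    Args:
        baby_age_months: The baby's age in months.
        query: Optional keyword to match against product names.

    Returns:
        A dict with a status field and the list of matching products.
    """
    keyword = query.lower()
    matches = [
        product
        for product in CATALOG
        if product["min_age_months"] <= baby_age_months
        and keyword in product["name"].lower()
    ]
    return {"status": "success", "products": matches}
```

For example, `search_products(7, "pear")` returns only the puree entry. The docstring matters because the model relies on it to decide when to call the tool and with which arguments.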

What's next for Guida

  • Expand beyond baby food to any high-consideration purchase category (supplements, pet food, specialty diets)
  • Multi-turn memory — Guida remembers what your baby has tried and what caused reactions
  • Proactive reorder detection — Guida notices when you're running low (via camera) and suggests reorders
  • White-label for brands: any merchant can deploy their own Guida persona on top of FlowBlinq ACP
