Inspiration
Every 10 minutes of delay in trauma care costs lives. We kept coming back to a simple, brutal fact: when someone dials emergency services, a human dispatcher is expected to simultaneously calm a panicking caller, identify injury severity from broken audio, locate the nearest available ambulance, alert the hospital, and give first aid instructions, all at once, with no AI assistance, in under 60 seconds. We wanted to build something that felt genuinely useful, not just technically impressive. The "golden hour", the first 60 minutes after a traumatic injury, is the window where coordinated action determines whether someone survives. Emergency response hasn't fundamentally changed in 40 years. We thought multimodal AI could change it today.
What it does
The Golden Hour Dispatcher is a real-time AI agent network for emergency response. A caller presses SOS on the mobile app. From that moment the system does three things simultaneously that no human dispatcher can do alone. It listens: the Gemini Live API streams 16 kHz audio in real time, detecting not just words but emotional urgency, and when it hears panic, affective dialog shifts its tone to shorter, calmer sentences. It sees: the caller's camera sends one JPEG frame per second, and Gemini's vision identifies injury type, severity, and scene hazards in real time. And it acts: three AI agents fire in parallel. A Dispatch Agent routes the nearest ambulance via Google Maps, a Medical Agent generates step-by-step first aid and delivers it in ElevenLabs HD voice, and an ER Agent prepopulates the hospital dashboard with injuries, severity score, and ETA before the patient arrives. The ER team is preparing the trauma bay while the ambulance is still 7 minutes away.
How we built it
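The three-agent fan-out can be sketched with asyncio. This is a minimal illustration under stated assumptions, not our production code: the stub coroutines and their return shapes are hypothetical stand-ins for the real Dispatch, Medical, and ER agents, each of which calls its own external service.

```python
import asyncio

# Hypothetical stand-ins for the real agents; each would call an
# external service (Maps routing, first-aid generation, ER dashboard).
async def dispatch_agent(triage: dict) -> dict:
    await asyncio.sleep(0.1)  # simulate Maps routing latency
    return {"agent": "dispatch", "ambulance": "unit-7", "eta_min": 7}

async def medical_agent(triage: dict) -> dict:
    await asyncio.sleep(0.1)  # simulate first-aid script generation
    return {"agent": "medical", "steps": ["apply pressure", "keep patient still"]}

async def er_agent(triage: dict) -> dict:
    await asyncio.sleep(0.1)  # simulate hospital dashboard update
    return {"agent": "er", "severity": triage["severity"]}

async def fan_out(triage: dict) -> list[dict]:
    # All three agents run concurrently: total wall time is the slowest
    # agent, not the sum of all three.
    return await asyncio.gather(
        dispatch_agent(triage), medical_agent(triage), er_agent(triage)
    )

results = asyncio.run(fan_out({"severity": 8}))
```

The ordering guarantee of `asyncio.gather` (results come back in call order, regardless of completion order) is what lets the ER dashboard attribute each payload to the right agent.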
Challenges we ran into
The Gemini Live tool dispatch loop was our biggest early blocker. The session silently hangs if you send FunctionResponses incorrectly: one send per tool instead of all tools batched together, or failing to echo the exact fn_call.id. There is no error; the session just stops. We wrote five pytest tests specifically to catch this before it hit the demo. The native-audio model (gemini-2.5-flash-native-audio-latest) cannot be used with the ADK's generateContent path; it's a Live-only model. We had to maintain two separate model env vars: one for the Live WebSocket session and one for ADK agent reasoning. Conflating them caused cryptic 404 errors that took time to diagnose. Affective dialog requires both the v1alpha API version and enable_affective_dialog=True in the session config; neither alone is sufficient. The SDK version we installed (0.8.0) didn't expose the field in its Pydantic model, so we added a try/except fallback that upgrades gracefully as the SDK matures. ElevenLabs library voices require a paid plan; we discovered this at 2am and pivoted to premade voices, which work on the free tier and sound better anyway.
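The dispatch-loop fix, in sketch form. The key invariant is that every pending tool call gets a response echoing its exact id, and all responses go out in one batch. The dict shapes below are illustrative only; the real google-genai SDK uses typed FunctionResponse objects and a session-level send call.

```python
def batch_tool_responses(pending_calls: list[dict], results: dict) -> list[dict]:
    """Build ONE batched payload covering every pending tool call.

    pending_calls: tool calls from the model, each carrying 'id' and 'name'.
    results: mapping of call id -> tool output.
    Sending one response per call (instead of one batch), or dropping an
    id, is what silently hangs the Live session.
    """
    responses = []
    for call in pending_calls:
        responses.append({
            "id": call["id"],   # must echo the model's fn_call.id exactly
            "name": call["name"],
            # a tool failure still gets a response, so the batch stays complete
            "response": results.get(call["id"], {"error": "tool failed"}),
        })
    return responses  # send this whole list in a single tool-response message

calls = [{"id": "fc_1", "name": "route_ambulance"},
         {"id": "fc_2", "name": "notify_hospital"}]
batch = batch_tool_responses(calls, {"fc_1": {"eta_min": 7}})
```

Because the failure mode is a silent hang rather than an error, this is exactly the kind of invariant worth pinning with pytest, as we did.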
Accomplishments that we're proud of
The parallel agent architecture genuinely works. Three independent agents (routing, medical, hospital) execute concurrently and complete in under 10 seconds from the moment Gemini finishes triage. Watching the ER dashboard populate in real time while the caller is still on the line was the moment we knew the project was real. The two-voice architecture is something we haven't seen elsewhere. Gemini handles all reasoning and speaks during the early triage phase. When it detects sustained distress, it hands off voice delivery to ElevenLabs: a warmer, more human sound for the moment that matters most. The voice badge on the ER dashboard switches visibly between "Gemini Live" and "ElevenLabs HD". It's a demo moment, but it's also genuinely the right design decision. We passed 8/8 API connectivity and scenario tests: all three emergency scenarios (car accident, cardiac event, house fire) produced structured triage JSON, dispatched ambulances, and notified hospitals. The system is not a prototype that only works on the happy path. It handles missing data, tool failures, and API timeouts gracefully.
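The two-voice handoff reduces to a small routing decision. The threshold, window size, and numeric distress score below are illustrative assumptions; in our system the distress signal comes from Gemini's affective analysis, not a hand-rolled scorer.

```python
def choose_voice(distress_scores: list[float],
                 threshold: float = 0.7, window: int = 3) -> str:
    """Pick which engine speaks the next turn.

    Hands off to ElevenLabs only on *sustained* distress: the last
    `window` turns must all exceed `threshold`, so a single spike in
    the signal doesn't flip the voice back and forth mid-call.
    """
    recent = distress_scores[-window:]
    if len(recent) == window and all(s >= threshold for s in recent):
        return "elevenlabs_hd"
    return "gemini_live"
```

The hysteresis is the point: a voice that flickers between engines would be worse than either engine alone.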
What we learned
Multimodal live APIs are a fundamentally different programming model. You are not sending requests and receiving responses; you are managing a persistent session, a streaming event loop, and concurrent tool execution, all while the user is speaking. Bugs that would surface immediately in a request/response model can silently break a live session for 30 seconds before you notice. Separating reasoning from voice was the right call, architecturally and acoustically. Gemini's native audio is fast and intelligent; ElevenLabs is emotionally expressive. Trying to do both with one system would mean compromising on one. Two systems, one clear job each, connected by a single env var switch: that is the correct pattern for high-stakes voice AI. Demo resilience is a feature. We built DEMO_MODE=true (fixture replay), VOICE_MODE=gemini (ElevenLabs fallback), and a full incident simulator with synthetic asset generation. Every backup is a single env var. On demo day, nothing should require a code change.
What's next for The Golden Hour Dispatcher
The immediate next step is real CAD (Computer-Aided Dispatch) integration: replacing the mock ambulance tool with a live feed from actual dispatch systems. The architecture is already set up for this: swap one async function and the rest of the pipeline is unchanged. Drone video ingestion is the feature we most want to build next. A 20 fps aerial feed from a first-responder drone gives Gemini vision a complete scene picture before any ground unit arrives. The pipeline already handles JPEG frames; scaling to video rate and higher resolution is an infrastructure problem, not an AI one. Multilingual support via Gemini's real-time translation would let a dispatcher in English communicate through the system with a caller speaking Spanish, Mandarin, or Hindi, with ElevenLabs synthesising the response in the caller's language. And wearable biometric fusion, ingesting heart rate and SpO2 from an Apple Watch or Fitbit at the scene, would let Gemini correlate what it sees with what the body is reporting, closing the loop between visual and physiological triage. The golden hour has always belonged to chance and traffic. It should belong to AI.
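The "swap one async function" claim, sketched as dependency injection. A mock feed and a future CAD feed share one signature, and everything downstream takes the feed as a parameter, so the integration is a one-line change at the call site. All names here are hypothetical illustrations, not our actual module layout.

```python
import asyncio
from typing import Awaitable, Callable

# Any ambulance source: (lat, lon) -> availability record
AmbulanceFeed = Callable[[tuple[float, float]], Awaitable[dict]]

async def mock_ambulance_feed(location: tuple[float, float]) -> dict:
    # Stand-in for a live CAD query; a real feed would implement
    # the same signature against the dispatch system's API.
    return {"unit": "mock-7", "eta_min": 7}

async def dispatch(location: tuple[float, float],
                   feed: AmbulanceFeed = mock_ambulance_feed) -> dict:
    # The rest of the pipeline never knows which feed it is using.
    ambulance = await feed(location)
    return {"dispatched": ambulance["unit"], "eta_min": ambulance["eta_min"]}

result = asyncio.run(dispatch((37.77, -122.42)))
```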