Inspiration

In the United States, the average 911 call takes 10 seconds just to be answered. After that, a human dispatcher must manually ask questions, assess the situation, look up the nearest units, and coordinate the response — a process that can add 1–2 more minutes before help is on the way.

We kept asking: what if the AI could start helping the moment the caller speaks? Not after a form is filled out, not after a supervisor approves — immediately. When someone says "there's a fire and people are trapped," every second of delay has real consequences.

Gemini 3's structured output and image generation capabilities gave us the missing pieces to build a system that doesn't just listen, but understands the situation, generates actionable intelligence, and creates visual context — all in real time.

What it does

Sentinel 911 is a smart city emergency dispatch command center where AI handles the full lifecycle of a 911 call:

  1. Real-time voice conversation — The AI talks directly with the caller as a trained 911 dispatcher, asking progressive follow-up questions ("Is anyone injured?", "Are there any hazards?")

  2. Autonomous resource deployment — While the conversation is happening, the AI proactively dispatches fire trucks, police, ambulances, or HAZMAT teams to the caller's location. It doesn't wait for a human to click a button.

  3. Live structured intelligence — Gemini 3 Flash continuously analyzes the conversation every 1.5 seconds, extracting a structured incident report: situation summary, threat level, persons involved, infrastructure status, and the caller's emotional state (Calm → Panic on a 5-point scale).

  4. Aerial visual reconnaissance — Gemini 3 Pro generates a photorealistic aerial image of the incident location, giving commanders a visual overview before units even arrive. When responders reach the scene, the image automatically updates to show the active response.

  5. Autonomous escalation — A separate AI loop runs every 6 seconds, independently deciding whether to escalate: dispatch backup, alert hospitals, evacuate buildings, cut utilities, or reroute traffic — all without human input.

Feature Breakdown

  • Voice Dispatcher: Bi-directional real-time voice call with the emergency caller using natural conversation
  • Auto-Dispatch: AI calls tools to deploy police, fire, ambulance, or HAZMAT units the moment it hears relevant information
  • Structured Extraction: Every 1.5 s, Gemini 3 Flash parses the conversation into a structured JSON incident report
  • Tone Detection: Classifies caller emotion (Calm → Controlled → Urgent → Distressed → Panic) with stabilization to prevent flickering
  • Interactive Tactical Map: Dark-themed Leaflet map with incident pin, animated dispatch routes following real roads, and lockdown perimeter visualization
  • Animated Dispatch Routes: Vehicles travel along real road paths (via OSRM routing) at speeds proportional to actual distance
  • Real Station Lookup: Finds actual nearby fire stations, police stations, and hospitals via geographic search — not hardcoded positions
  • Dynamic Lockdown: AI chooses perimeter radius based on threat severity (partial vs. full lockdown)
  • Aerial Recon Imagery: Gemini 3 Pro generates top-down drone-style photos of the incident address
  • Before/After Scene: Recon image updates with emergency responders on-site once units arrive
  • Smart City Commands: AI suggests tactical actions (e.g., "Lockdown Sector A") that operators can execute with one click
  • Autonomous Escalation: Independent AI loop that proactively decides to dispatch backup, alert schools, cut utilities, or reroute traffic
  • Live Incident Board: Real-time dashboard showing situation summary, threats, infrastructure status, and active protocol checklists
  • Call Transcript: Chat-bubble transcript with auto-scroll, showing caller, AI, and system messages

How we built it

Sentinel 911 is a React + TypeScript web application built with Vite.

The core architecture orchestrates three Gemini models working together. Since Gemini 3 doesn't natively support audio, we use Gemini 2.5 Flash's native audio for the dispatcher voice; this piece is a placeholder for what would normally be a human dispatcher, not the application's core functionality.

  • Gemini 2.5 Flash (native audio) handles the real-time voice conversation via WebSocket. We chose this model for its audio streaming capability — when Gemini 3 models gain native audio support in the future, they can be swapped in directly. The voice connection also carries autonomous tool declarations (dispatch units, lock down sectors, deploy drones, generate reports), so the AI takes action mid-conversation.

  • Gemini 3 Flash powers two parallel analysis loops. The first runs every 1.5 seconds, sending the growing transcript through a strict JSON response schema to extract structured intelligence. The second runs every 6 seconds for autonomous escalation decisions. We use Gemini 3 Flash's structured output with responseSchema to guarantee reliable, typed JSON every time.

  • Gemini 3 Pro (image generation) creates aerial reconnaissance imagery of the incident location. The prompt instructs the model to generate a photorealistic top-down drone photograph of the specific address, and later regenerates with emergency vehicles once responders arrive.
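The autonomous tool calls on the voice connection can be sketched as a set of function declarations attached to the session. The names and parameters below are illustrative stand-ins, not the project's exact definitions:

```typescript
// Illustrative tool declarations carried on the voice connection.
// Names like "dispatch_unit" and "lockdown_sector" are our own examples;
// the real project's declarations may differ.
const dispatchTools = [
  {
    name: "dispatch_unit",
    description: "Deploy an emergency unit to the incident location.",
    parameters: {
      type: "object",
      properties: {
        unitType: { type: "string", enum: ["police", "fire", "ambulance", "hazmat"] },
        address: { type: "string" },
      },
      required: ["unitType", "address"],
    },
  },
  {
    name: "lockdown_sector",
    description: "Establish a lockdown perimeter around the incident.",
    parameters: {
      type: "object",
      properties: {
        radiusMeters: { type: "number" },
        full: { type: "boolean" },
      },
      required: ["radiusMeters"],
    },
  },
];
```

Because the declarations ride on the live session, the model can emit a `dispatch_unit` call mid-sentence while the caller is still talking, which is what makes "no button click" dispatch possible.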

For the map, we use Leaflet with a dark CARTO basemap. When a unit is dispatched, we search for real nearby stations using OpenStreetMap/Nominatim (e.g., searching "fire station" within 15km of the incident), then fetch an actual road-following route from the OSRM routing API. Vehicles animate along this path at a speed proportional to the real distance.
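As a sketch, the lookup and routing steps reduce to URL construction against the public Nominatim and OSRM endpoints, plus a haversine distance used to scale animation speed. The exact query parameters below are our illustrative choices:

```typescript
// Haversine great-circle distance in km, used to make vehicle animation
// speed proportional to the real distance.
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 6371 * 2 * Math.asin(Math.sqrt(a));
}

// Nominatim free-text search bounded to a box around the incident
// (~15 km; 0.135 degrees of latitude is an approximation we chose).
function stationSearchUrl(query: string, lat: number, lon: number): string {
  const d = 0.135;
  const viewbox = [lon - d, lat + d, lon + d, lat - d].join(",");
  return (
    `https://nominatim.openstreetmap.org/search?format=json&bounded=1` +
    `&q=${encodeURIComponent(query)}&viewbox=${viewbox}`
  );
}

// OSRM expects lon,lat pairs; GeoJSON geometry gives the road-following
// polyline the vehicle animates along.
function osrmRouteUrl(from: [number, number], to: [number, number]): string {
  return (
    `https://router.project-osrm.org/route/v1/driving/` +
    `${from[1]},${from[0]};${to[1]},${to[0]}?overview=full&geometries=geojson`
  );
}
```

Fetching `osrmRouteUrl(...)` returns a route whose GeoJSON coordinates can be handed straight to a Leaflet polyline.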

One key design decision: we implemented a tone stability algorithm to prevent the caller emotion indicator from flickering. The tone only updates when the shift is ≥ 2 levels on the 5-point scale — for example, jumping from "Calm" directly to "Urgent" is allowed, but oscillating between "Urgent" and "Distressed" frame-by-frame is suppressed.
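The stabilization rule reduces to a few lines; this is a minimal sketch of the idea rather than the exact implementation:

```typescript
// 5-point emotion scale, ordered from lowest to highest intensity.
const TONES = ["Calm", "Controlled", "Urgent", "Distressed", "Panic"] as const;
type Tone = (typeof TONES)[number];

// Accept a new tone only when it moves at least 2 levels away from the
// current one; adjacent-level oscillation is suppressed.
function stabilizeTone(current: Tone, candidate: Tone): Tone {
  const shift = Math.abs(TONES.indexOf(candidate) - TONES.indexOf(current));
  return shift >= 2 ? candidate : current;
}
```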

Challenges we ran into

Preventing the AI from over-dispatching. Our first version deployed every available unit the moment someone said "emergency." Getting the AI to escalate progressively — location first, then one unit type, then wait for more information — required careful prompt engineering in the system instruction.

Audio echo loops. The AI's spoken response was picked up by the microphone and fed back in, creating feedback loops. We solved this by using the browser's Web Speech API for caller transcription, completely separate from the AI's audio stream.

Structured output reliability. Getting consistent, parseable JSON from the Flash model every 1.5 seconds required strict responseSchema definitions with explicit types and enums. Without the schema constraint, the model would occasionally return malformed data that broke the dashboard.
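A sketch of what the 1.5-second extraction loop looks like in practice. The field names below are our illustrative guesses at the report shape, and the injected `generate` callback stands in for the actual Gemini SDK call (which would pass `incidentSchema` as the `responseSchema` with a JSON response MIME type):

```typescript
// Typed shape of the incident report the dashboard consumes.
interface IncidentReport {
  summary: string;
  threatLevel: "low" | "moderate" | "high" | "critical";
  callerTone: "Calm" | "Controlled" | "Urgent" | "Distressed" | "Panic";
  personsInvolved: number;
}

// Schema with explicit types and enums; constraining the model this way is
// what made the output reliably parseable.
const incidentSchema = {
  type: "object",
  properties: {
    summary: { type: "string" },
    threatLevel: { type: "string", enum: ["low", "moderate", "high", "critical"] },
    callerTone: { type: "string", enum: ["Calm", "Controlled", "Urgent", "Distressed", "Panic"] },
    personsInvolved: { type: "integer" },
  },
  required: ["summary", "threatLevel", "callerTone", "personsInvolved"],
};

// With a schema constraint, a plain JSON.parse is enough.
function parseReport(json: string): IncidentReport {
  return JSON.parse(json) as IncidentReport;
}

// Re-analyze the growing transcript every 1.5 s; returns a stop function.
function startAnalysisLoop(
  getTranscript: () => string,
  generate: (transcript: string, schema: object) => Promise<string>,
  onReport: (r: IncidentReport) => void,
): () => void {
  const id = setInterval(async () => {
    onReport(parseReport(await generate(getTranscript(), incidentSchema)));
  }, 1500);
  return () => clearInterval(id);
}
```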

Map animation in background tabs. Browsers throttle requestAnimationFrame when a tab is unfocused, causing vehicles to freeze. We switched to setInterval so dispatch animations continue running even when users switch tabs.
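A sketch of the timer-driven approach: deriving progress from wall-clock time rather than a frame counter means a throttled tab only drops frames, it never slows the vehicle down. Function names here are illustrative:

```typescript
// Progress along the route as a fraction in [0, 1], derived from elapsed
// wall-clock time so throttled ticks don't slow the vehicle.
function routeProgress(startMs: number, durationMs: number, nowMs: number): number {
  return Math.min(1, Math.max(0, (nowMs - startMs) / durationMs));
}

function animateAlongRoute(
  durationMs: number,
  onProgress: (t: number) => void, // caller maps t onto a point on the OSRM path
): void {
  const start = Date.now();
  const id = setInterval(() => {
    const t = routeProgress(start, durationMs, Date.now());
    onProgress(t);
    if (t >= 1) clearInterval(id); // vehicle has arrived
  }, 50); // ~20 fps; setInterval keeps firing in background tabs
}
```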

Accomplishments that we're proud of

The moment that made this project feel real: during testing, we said "there's a fire at 742 Evergreen Terrace" and watched the AI lock the map, dispatch a fire truck from an actual fire station it found nearby, generate an aerial photo of the neighborhood, and ask "Is anyone trapped inside?" — all within about 3 seconds, without us clicking anything.

The multi-model orchestration is something we're particularly proud of. Three Gemini models running simultaneously — one handling voice, one continuously analyzing text, one generating images — coordinated through a single React application with no external backend.

What we learned

  • Structured output changes everything. Being able to define an exact JSON schema and receive reliable typed data from Gemini 3 Flash every 1.5 seconds made building a real-time dashboard surprisingly straightforward. This is the feature that makes Gemini 3 a platform for building applications, not just a chatbot.

  • The AI is better when constrained. An unconstrained dispatcher AI tries to do too much at once. Adding a progressive response protocol ("only dispatch based on confirmed information, one action per turn") made the AI dramatically more useful and realistic.

  • Image generation adds unexpected value. We initially added the aerial recon imagery as a visual flourish, but during testing it became one of the most compelling features — it gives operators spatial context they can't get from text alone.

What's next for Sentinel 911

  • Real dispatch integration — Connect to Computer-Aided Dispatch (CAD) systems used by actual 911 centers
  • Multilingual support — break the language barrier so any caller, regardless of English proficiency, can communicate clearly with the dispatcher
  • Multi-incident handling — Support multiple concurrent calls with prioritized triage across incidents
  • Live camera feeds — Feed real drone or CCTV streams into Gemini for visual scene analysis instead of generated images
  • Dispatcher training mode — Use the system as a simulator for training new 911 operators with AI-generated callers
  • Full Gemini 3 migration — When Gemini 3 models gain native audio streaming capabilities, migrate the voice pipeline for improved reasoning and lower latency across the entire system

Built With

  • React + TypeScript + Vite
  • Gemini 2.5 Flash (native audio), Gemini 3 Flash (structured output), Gemini 3 Pro (image generation)
  • Leaflet with a dark CARTO basemap
  • OpenStreetMap Nominatim and OSRM routing
  • Web Speech API
