Inspiration

A child dials 911 from the backseat of a crashed car. "Mommy isn't moving." Across the street, a witness calls the same number. So does the injured driver in the other vehicle. Three callers. Three fragments of the same truth.

In today's system, each caller reaches a different dispatcher — if they reach one at all. 25–30% of US 911 dispatcher positions sit vacant. 240 million calls hit dispatch centers every year. When a major incident triggers a flood of simultaneous calls, the fragments live in separate call records that may never be correlated until long after responders arrive.

AI is entering this space fast — Axon spent $1.4B on acquisitions, RapidSOS deployed AI copilots across 22,000+ agencies, Motorola launched three AI assistants in January 2026. But every one of these systems is assistive. AI sits alongside a human. The human catches errors.

What happens when AI handles the call directly? Large language models hallucinate. A hallucinated address sends responders to the wrong location. A fabricated medical detail causes harm. No deployed emergency AI system has a dedicated verification layer for LLM outputs.

That question inspired Nexus911: can you build autonomous AI agents for 911 that you can actually trust?

What it does

Nexus911 deploys a dedicated AI voice agent for every 911 caller using the Gemini Live API. When three people call about the same car accident, each gets their own Gemini-powered agent that conducts the full emergency interview: full-duplex audio, barge-in support, role-adapted conversation.

VerifyLayer is the core innovation: a hallucination verification middleware that validates every AI-extracted fact through Natural Language Inference (NLI) before it enters the dispatch record. The pipeline:

  • Fact Extraction: Gemini extracts discrete verifiable claims from caller speech
  • NLI Verification: A second Gemini instance checks: does the transcript actually support this claim? (Entailment / Contradiction / Neutral)
  • Cross-Call Contradiction Detection: Pairwise comparison across callers. "Blue sedan" vs. "red truck"? Caught immediately.
  • Confidence Scoring: Composite score weighted by NLI entailment (40%), caller role credibility (30%), cross-caller corroboration (20%), and contradiction penalty (10%)

The Knowledge Graph lets agents share intelligence mid-conversation: when Agent 2 learns the accident location, Agent 1 can confirm it with their caller in real time.
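The confidence scoring step in the pipeline above can be sketched as a weighted sum. The four weights come from the description; everything else here is an assumption made for illustration: the per-role credibility values, the rule that corroboration saturates at two other callers, and the choice to treat the contradiction term as a 10% bonus that is lost when a contradiction is found.

```python
from dataclasses import dataclass

# Weights from the pipeline description above.
W_NLI, W_ROLE, W_CORROBORATION, W_CONTRADICTION = 0.40, 0.30, 0.20, 0.10

# Hypothetical credibility priors per caller role (not given in the writeup).
ROLE_CREDIBILITY = {
    "OFFICIAL": 0.95, "WITNESS": 0.80, "VICTIM": 0.75,
    "BYSTANDER": 0.70, "CHILD": 0.60, "UNKNOWN": 0.50,
}


@dataclass
class FactSignals:
    nli_entailment: float        # entailment probability in [0, 1]
    caller_role: str             # one of the six role personas
    corroborating_callers: int   # other callers who reported the same fact
    contradicted: bool           # True if another caller contradicts it


def composite_confidence(s: FactSignals) -> float:
    """Combine the four signals into one score in [0, 1]."""
    role = ROLE_CREDIBILITY.get(s.caller_role, ROLE_CREDIBILITY["UNKNOWN"])
    # Assumption: corroboration saturates once two other callers agree.
    corroboration = min(s.corroborating_callers, 2) / 2
    # Assumption: the contradiction term is earned only when no conflict exists.
    no_contradiction = 0.0 if s.contradicted else 1.0
    score = (
        W_NLI * s.nli_entailment
        + W_ROLE * role
        + W_CORROBORATION * corroboration
        + W_CONTRADICTION * no_contradiction
    )
    return round(min(max(score, 0.0), 1.0), 3)
```

Under these assumptions, a fact from a lone caller with role UNKNOWN and an NLI score of 0.0 still scores above zero, and a detected contradiction lowers any fact's score by at most 0.10.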

A Deduplication Engine using Haversine geo-distance, temporal windowing, and semantic text similarity merges multiple calls about the same event into one enriched incident.
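A minimal sketch of the three-factor match, assuming two calls belong to the same incident only when all three tests pass. The 200 m geo threshold is the one stated in this writeup; the 10-minute window, the text threshold, and the `CallReport` type are hypothetical. Token-set Jaccard overlap is used here as a simple lexical stand-in for semantic text similarity, which would normally need an embedding model.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0
GEO_THRESHOLD_M = 200.0   # stated in the writeup
TIME_WINDOW_S = 600.0     # assumption: 10-minute window
TEXT_THRESHOLD = 0.2      # assumption


@dataclass
class CallReport:
    lat: float
    lon: float
    timestamp: float  # seconds since epoch
    summary: str


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def token_jaccard(a: str, b: str) -> float:
    """Lexical overlap of word sets; a stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def same_incident(a: CallReport, b: CallReport) -> bool:
    """True when the calls are close in space, time, and description."""
    return (
        haversine_m(a.lat, a.lon, b.lat, b.lon) <= GEO_THRESHOLD_M
        and abs(a.timestamp - b.timestamp) <= TIME_WINDOW_S
        and token_jaccard(a.summary, b.summary) >= TEXT_THRESHOLD
    )
```

Requiring all three factors is the conservative choice: two unrelated crashes on the same block an hour apart stay separate, at the cost of occasionally failing to merge calls whose descriptions share few words.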

Result: a complete dispatch package with confidence-scored facts, source attribution, and recommended response units — in under 20 seconds.

How we built it

  • Voice Agents: Google ADK (Agent Development Kit) with the Gemini 2.5 Flash native audio model via the Gemini Live API, giving full-duplex voice conversations with barge-in and no STT/TTS intermediary
  • VerifyLayer Pipeline: Gemini 2.5 Flash configured as an NLI engine, async LRU cache (1024 entries, 300s TTL), cross-call contradiction detection via pairwise fact comparison, weighted penalization engine
  • Agent Architecture: 1:1 caller-to-agent mapping with six role-specific personas (CHILD, VICTIM, WITNESS, BYSTANDER, OFFICIAL, UNKNOWN); each adapts vocabulary, pacing, and question strategy
  • Backend: FastAPI with WebSocket endpoints for real-time voice streaming and dashboard updates, fully async
  • Frontend: React 19 + TypeScript + Tailwind CSS v4 + Framer Motion, with a landing page at / and a real-time dispatch dashboard at /dashboard showing live incident cards, conversation transcripts, and VerifyLayer confidence scores
  • Knowledge Graph: In-memory incident graph with real-time WebSocket push, designed for Firestore migration
  • Deduplication: Three-factor clustering using Haversine geo-distance (200m threshold) + temporal window + semantic text similarity
  • Simulation: Multi-caller scenario runner with text fallback for reliable demonstration when the voice API is unavailable
  • Deployment: Google Cloud Run via Cloud Build with multi-stage Docker build and Secret Manager for API key security
  • Testing: 26/26 tests passing covering the VerifyLayer pipeline, deduplication engine, and knowledge graph
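The async LRU cache mentioned for the VerifyLayer pipeline (1024 entries, 300s TTL) could look roughly like the sketch below. The class name, method name, and injectable clock are hypothetical; only the capacity and TTL come from the writeup. It relies on asyncio being single-threaded, so the dictionary updates need no lock.

```python
import asyncio
import time
from collections import OrderedDict
from typing import Any, Awaitable, Callable, Hashable


class AsyncTTLCache:
    """LRU cache with per-entry expiry for results of async calls."""

    def __init__(self, max_entries: int = 1024, ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl_s
        self._clock = clock
        self._data: "OrderedDict[Hashable, tuple]" = OrderedDict()

    async def get_or_compute(self, key: Hashable,
                             compute: Callable[[], Awaitable[Any]]) -> Any:
        hit = self._data.get(key)
        if hit is not None and self._clock() - hit[0] < self._ttl:
            self._data.move_to_end(key)  # mark as most recently used
            return hit[1]
        # Miss or expired entry. Concurrent misses on the same key may
        # each call compute(); this sketch does not coalesce them.
        value = await compute()
        self._data[key] = (self._clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict least recently used
        return value
```

A caller would key the cache on the (premise, hypothesis) pair and pass a coroutine factory that performs the NLI request, so a repeated comparison within five minutes costs no model call.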

Challenges we ran into

The Fail-Open Dilemma. Our first instinct was to block unverified facts from entering the knowledge graph. This is wrong for 911. If a caller screams "there's a fire at 5th and Main" and verification times out, you cannot block that fact. Someone might die. We redesigned VerifyLayer as fail-open middleware: intelligence always enters immediately, verification runs in the background. Counterintuitive but correct.
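The fail-open pattern described above can be sketched with asyncio: the fact is appended to the graph first, and verification runs as a background task that can only annotate it, never remove it. The `Fact` type, the status strings, and the `verify` callable are hypothetical stand-ins; the 0.2 s default mirrors the 200ms VerifyLayer latency budget mentioned later in this writeup.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class Fact:
    text: str
    status: str = "UNVERIFIED"
    confidence: Optional[float] = None


def ingest_fact(graph: List[Fact], fact: Fact,
                verify: Callable[[str], Awaitable[float]],
                timeout_s: float = 0.2) -> "asyncio.Task[None]":
    """Add the fact immediately, then verify it in the background.

    Must be called from inside a running event loop.
    """
    graph.append(fact)  # fail-open: dispatchers see the fact right away

    async def _verify_in_background() -> None:
        try:
            fact.confidence = await asyncio.wait_for(verify(fact.text), timeout_s)
            fact.status = "VERIFIED"
        except asyncio.TimeoutError:
            # The fact stays in the graph; only its status records the timeout.
            fact.status = "VERIFICATION_TIMED_OUT"

    return asyncio.create_task(_verify_in_background())
```

The returned task lets tests or shutdown code await completion, while the request path never waits on it.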

First-Fact Bootstrapping. The first fact for any incident has nothing to verify against — empty premise. NLI returns 0.0 because there's no prior context. We solved this with baseline confidence scoring, letting cross-caller corroboration from subsequent callers increase it over time.
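One way to picture the bootstrapping rule: when there is no prior context, start from a fixed baseline instead of the NLI score, then let each corroborating caller close part of the remaining gap to 1.0. Both constants and the function name below are hypothetical; the writeup does not state the actual baseline or growth rule.

```python
BASELINE_CONFIDENCE = 0.5  # hypothetical starting score for a first fact
CORROBORATION_GAIN = 0.5   # hypothetical: each corroboration halves the gap to 1.0


def bootstrap_confidence(nli_score: float, has_prior_context: bool,
                         corroborating_callers: int) -> float:
    """Use the baseline when NLI has no premise, then grow with corroboration."""
    base = nli_score if has_prior_context else BASELINE_CONFIDENCE
    remaining_gap = (1.0 - base) * (1.0 - CORROBORATION_GAIN) ** corroborating_callers
    return 1.0 - remaining_gap
```

With these values, a first fact starts at 0.5, rises to 0.75 after one corroborating caller, and to 0.875 after two.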

Contradiction Detection at Scale. Pairwise comparison of every fact against every other fact across callers grows quadratically. We scoped comparisons to facts within the same incident and used caching to avoid redundant NLI calls.
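A sketch of the scoping and caching idea, assuming facts arrive as `(incident_id, caller_id, text)` tuples and that `contradicts` is a placeholder for the NLI call. Grouping by incident means the quadratic cost applies per incident rather than across the whole system, and an order-independent cache key means each distinct pair of statements is sent to the model once.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple

FactRow = Tuple[str, str, str]  # (incident_id, caller_id, text)


def contradiction_pairs(
    facts: List[FactRow],
    contradicts: Callable[[str, str], bool],  # placeholder for the NLI check
    cache: Dict[FrozenSet[str], bool],
) -> List[Tuple[str, str, str]]:
    """Return (incident_id, text_a, text_b) for cross-caller contradictions."""
    by_incident = defaultdict(list)
    for incident_id, caller_id, text in facts:
        by_incident[incident_id].append((caller_id, text))

    found = []
    for incident_id, items in by_incident.items():
        # Pairs are formed only within one incident.
        for (caller_a, text_a), (caller_b, text_b) in combinations(items, 2):
            if caller_a == caller_b:
                continue  # only compare facts from different callers
            key = frozenset((text_a, text_b))  # same key for (a, b) and (b, a)
            if key not in cache:
                cache[key] = contradicts(text_a, text_b)
            if cache[key]:
                found.append((incident_id, text_a, text_b))
    return found
```

A plain dict is used for the cache here to keep the example short; a bounded cache with expiry would replace it in a long-running service.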

Role Classification Without Training Data. No labeled dataset exists for 911 caller roles. We used heuristic classification from conversation cues within the agent prompt — vocabulary, emotional state, and information specificity.

Voice Pipeline Reliability. Gemini Live API sessions can fail under load or quota limits. We built a text-based simulation fallback that still exercises the full pipeline for demo reliability.

Accomplishments that we're proud of

  • VerifyLayer is genuinely novel. To our knowledge, no deployed emergency AI system (not Axon, not RapidSOS, not Motorola) has NLI-based hallucination verification with cross-call contradiction detection.
  • The confidence scoring system. Every dispatched fact carries a composite score with full provenance: which caller said it, their credibility weight, corroboration status, and NLI entailment score.
  • 1:1 caller-to-agent mapping. Each caller gets their own dedicated Gemini agent, not a shared assistant. True parallel processing of simultaneous emergencies.
  • Fail-open design. We solved the fundamental tension between verification rigor and emergency urgency, and we believe it's the right architectural pattern for any safety-critical AI system.
  • 26/26 tests passing with comprehensive coverage of the full verification, deduplication, and knowledge graph pipeline.

What we learned

Anti-hallucination is an architecture problem, not a prompt problem. "Don't make things up" in a system prompt is not a verification layer. You need separate inference, separate scoring, and cross-referencing across information sources.

NLI is underused in production AI. Natural Language Inference has been a standard NLP benchmark for years, but it's rarely deployed as runtime verification middleware. Pairing a generative model with an NLI model creates a check-and-balance that neither provides alone.

ADK makes multi-agent orchestration practical. Google's Agent Development Kit handles session management, tool calling, and agent coordination that would otherwise take weeks to build from scratch.

Confidence scores change everything. When every fact has a number attached, downstream systems can make proportional decisions — two ambulances for a 0.97-confidence injury, one for a 0.45-confidence report.

What's next for Nexus911

  • Firestore-backed knowledge graph for persistence and multi-instance coordination across Cloud Run replicas
  • Vertex AI integration for production-grade model serving with SLA guarantees
  • Multilingual support: extending NLI verification to Spanish, Mandarin, and other high-volume 911 languages
  • Vision integration: callers sharing camera feeds for visual situational awareness via Gemini's multimodal capabilities
  • Load testing: validating the 200ms VerifyLayer latency budget under real concurrent call volumes
  • Agency partnerships: piloting with real 911 centers to validate against ground-truth dispatch outcomes

Built With

  • docker
  • fastapi
  • framer-motion
  • gemini-2.5-flash
  • gemini-live-api
  • google-adk
  • google-cloud-build
  • google-cloud-run
  • google-secret-manager
  • httpx
  • lucide-react
  • pydantic
  • python
  • react
  • react-router
  • tailwind-css
  • typescript
  • uvicorn
  • vite
  • websockets