RELAY — Devpost Submission

Inspiration

During Hurricane Harvey in 2017, Houston's 911 system received over 56,000 calls in a single day, which is more than double its normal volume. Operators were overwhelmed, and that was for callers who could describe their emergency in English. For the tens of thousands of Houston residents who speak Spanish, Vietnamese, Hindi, or Arabic as their primary language, the barrier wasn't just the flood. It was the phone call itself.

Relief organizations running mass-care operations face a version of this problem every time a major disaster hits: thousands of simultaneous calls, dozens of languages, and no free human to triage each one. Translation lines exist, but they add minutes per call — minutes that matter when someone hasn't had water in two days.

We built RELAY because the problem isn't a lack of resources. It's a routing problem. The shelter exists. The water station is 1.2 km away. The gap is connecting the person in crisis to the resource in real time, in their language, without a human bottleneck.

What it does

RELAY is a multilingual voice intake and routing agent for disaster-relief organizations. A caller in crisis speaks in their own language. RELAY:

Listens — transcribes the call in real time using Deepgram streaming STT, with support for multiple languages through automatic language detection
Understands — Claude reads the transcript and extracts structured triage data: number of people, injuries, location, and needs. If critical information is missing, RELAY asks a follow-up question in the caller's language rather than triaging on incomplete data
Matches — a deterministic routing engine filters real relief resources by availability and capability, ranks by distance, and produces a dispatch
Shows — a live map surfaces the caller's location, nearby resources, and a routed line to the best match, color-coded by priority (P1/P2/P3)

RELAY is not a 911 replacement. It is the multilingual intake layer for the moment the formal system is overwhelmed — when volume and language barriers are too high for humans to triage alone.

How we built it

The pipeline has four stages, each owned by one team member and built against locked data contracts so all four could develop in parallel:

The Ear (Deepgram) — the browser captures mic audio through an AudioWorklet that downsamples it to 16 kHz, 16-bit mono PCM, and a Node.js WebSocket server streams that audio to Deepgram's streaming STT API. Deepgram handles multilingual transcription, interim results (the live "typing" effect), and turn-taking via utterance_end events. A confidence threshold of $c \geq 0.6$ gates whether a transcript is sent forward or triggers a re-prompt. After $N = 2$ failed attempts, the system escalates to a human operator.
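The confidence gate can be sketched as a small pure function. This is a minimal sketch, assuming the confidence value is the one Deepgram reports for the top transcript alternative and that the caller tracks how many low-confidence attempts it has already seen; the names `gateTranscript`, `GateAction`, `CONFIDENCE_THRESHOLD`, and `MAX_FAILED_ATTEMPTS` are hypothetical, and only the values 0.6 and 2 come from RELAY.

```typescript
// Hypothetical names; only the threshold (0.6) and retry budget (2) come from RELAY.
type GateAction = "forward" | "reprompt" | "escalate";

const CONFIDENCE_THRESHOLD = 0.6; // c >= 0.6 sends the transcript to the Brain
const MAX_FAILED_ATTEMPTS = 2;    // N = 2 failures hands the call to a human

/**
 * Decide what to do with a final transcript.
 * @param confidence     transcript confidence in [0, 1]
 * @param failedAttempts low-confidence attempts already seen on this call
 */
function gateTranscript(confidence: number, failedAttempts: number): GateAction {
  if (confidence >= CONFIDENCE_THRESHOLD) return "forward";
  // This attempt failed as well; escalate once the failure budget is used up.
  return failedAttempts + 1 >= MAX_FAILED_ATTEMPTS ? "escalate" : "reprompt";
}
```

Keeping the gate free of I/O means the re-prompt and escalation thresholds can be unit-tested without a live audio stream.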

The Brain (Claude / Anthropic) — on each final utterance, the transcript is sent to Claude with a strict system prompt that returns only a structured Triage JSON object. Claude extracts the number of people, injuries, location (as stated by the caller, never device GPS), needs, and priority. The required slots are location, people, and needs. If any are missing, Claude sets nextQuestion and readyToRoute: false. The Brain never decides the match; it only extracts facts.
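The required-slot rule can also be enforced outside the model. The sketch below is a minimal illustration, assuming a Triage shape in which missing scalars are `null` and missing lists are empty; `nextQuestion` and `readyToRoute` are RELAY's field names, while the other field types and the helpers `missingSlots` and `enforceReadiness` are hypothetical.

```typescript
// nextQuestion and readyToRoute are RELAY's names; other field shapes are assumptions.
type Priority = "P1" | "P2" | "P3";

interface Triage {
  people: number | null;       // head count as stated by the caller
  injuries: string[];          // free-text injury descriptions
  location: string | null;     // caller-stated location, never device GPS
  needs: string[];             // e.g. ["water", "shelter"]
  priority: Priority;
  nextQuestion: string | null; // follow-up to ask in the caller's language
  readyToRoute: boolean;
}

const REQUIRED_SLOTS = ["location", "people", "needs"] as const;
type Slot = (typeof REQUIRED_SLOTS)[number];

// A slot counts as missing if it is null/undefined or an empty list.
function missingSlots(t: Triage): Slot[] {
  return REQUIRED_SLOTS.filter((slot) => {
    const value = t[slot];
    return Array.isArray(value) ? value.length === 0 : value == null;
  });
}

// Hypothetical server-side guard: a triage with any missing slot is never
// routable, even if the model returned readyToRoute: true by mistake.
function enforceReadiness(t: Triage): Triage {
  return { ...t, readyToRoute: t.readyToRoute && missingSlots(t).length === 0 };
}
```

Checking the slots in code as well as in the prompt means a single model slip cannot push an incomplete triage into the Matchmaker.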

The Matchmaker — a pure TypeScript function:

$$\text{score}(r) = \text{distanceKm}(\text{caller}, r) \quad \text{subject to: } r.\text{has} \cap \text{needs} \neq \emptyset \text{ and } r.\text{availableCapacity} > 0$$

Candidates are filtered by capability and availability, then ranked by ascending score, with distance computed using the Haversine formula. The top match and up to two alternatives are returned as a Dispatch object. If nothing fits, matched is null; the function never fabricates a resource.
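A minimal sketch of the filter-then-rank step follows. It uses the standard Haversine great-circle distance, $d = 2R \arcsin\sqrt{\sin^2(\Delta\varphi/2) + \cos\varphi_1 \cos\varphi_2 \sin^2(\Delta\lambda/2)}$ with a mean Earth radius $R = 6371$ km. The fields `has`, `availableCapacity`, and `matched` come from the text; `id`, `lat`, `lng`, `alternatives`, and the function name `match` are assumptions for illustration.

```typescript
// has, availableCapacity, and matched come from RELAY; other names are assumptions.
interface LatLng { lat: number; lng: number; }

interface Resource extends LatLng {
  id: string;
  has: string[];             // capabilities, e.g. ["water", "shelter"]
  availableCapacity: number;
}

interface Dispatch {
  matched: Resource | null;  // null when nothing fits; never fabricated
  alternatives: Resource[];  // up to two runners-up
}

const EARTH_RADIUS_KM = 6371;
const toRad = (deg: number): number => (deg * Math.PI) / 180;

// Great-circle distance between two points via the Haversine formula.
function distanceKm(a: LatLng, b: LatLng): number {
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  // Clamp guards against h drifting just above 1 from floating-point error.
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(h)));
}

function match(caller: LatLng, needs: string[], resources: Resource[]): Dispatch {
  const ranked = resources
    // Hard constraints: spare capacity AND at least one capability the caller needs.
    .filter((r) => r.availableCapacity > 0 && r.has.some((c) => needs.includes(c)))
    .map((r) => ({ r, d: distanceKm(caller, r) }))
    .sort((x, y) => x.d - y.d) // score(r) = distance; lower is better
    .map(({ r }) => r);

  return { matched: ranked[0] ?? null, alternatives: ranked.slice(1, 3) };
}
```

Because the empty-candidate case falls out of `ranked[0] ?? null`, there is no code path that can return a resource not present in the input list.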

The Map (Mapbox GL JS) — a Next.js frontend renders the caller pin, resource pins, and an animated routing line to the matched resource. The triage card shows the P1/P2/P3 priority chip, dispatch text, and an escalation banner for "911" or "human" cases.
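The routing line is ordinary GeoJSON under the hood. This sketch builds a straight caller-to-resource segment tagged with its priority color; the straight segment, the hex colors, and the names `routeFeature` and `PRIORITY_COLORS` are all assumptions. One detail worth getting right is that GeoJSON positions are `[longitude, latitude]`, the reverse of the usual spoken order, so swapping them misplaces or invalidates points.

```typescript
// Hypothetical helper; colors and names are illustrative, not RELAY's actual values.
type Priority = "P1" | "P2" | "P3";
interface LatLng { lat: number; lng: number; }

const PRIORITY_COLORS: Record<Priority, string> = {
  P1: "#dc2626", // red
  P2: "#f59e0b", // amber
  P3: "#16a34a", // green
};

interface RouteFeature {
  type: "Feature";
  properties: { priority: Priority; color: string };
  geometry: { type: "LineString"; coordinates: [number, number][] };
}

// Straight caller-to-resource segment, colored by triage priority.
function routeFeature(caller: LatLng, resource: LatLng, priority: Priority): RouteFeature {
  return {
    type: "Feature",
    properties: { priority, color: PRIORITY_COLORS[priority] },
    geometry: {
      type: "LineString",
      // GeoJSON order is [longitude, latitude].
      coordinates: [
        [caller.lng, caller.lat],
        [resource.lng, resource.lat],
      ],
    },
  };
}
```

In Mapbox GL JS, an object like this could serve as the `data` of a `geojson` source drawn by a `line` layer whose `line-color` paint property is the expression `["get", "color"]`, so the priority color travels with the feature rather than living in layer config.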

Stack: Next.js (App Router) + TypeScript + Tailwind · Node.js WebSocket server · Deepgram Streaming STT · Anthropic Claude API · Mapbox GL JS + Geocoding API · Redis (session memory)

Challenges we ran into

WebSocket audio streaming was the riskiest piece of plumbing in the project. Getting the mic audio from the browser into the raw format our Deepgram stream expects (16-bit mono PCM at 16 kHz) required a custom AudioWorklet (pcm-worklet.js) to downsample from the browser's native sample rate. A subtle buffering bug caused dropped audio frames that looked like low-confidence transcripts, and it took several hours to isolate.
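The core of a downsampler like this can be sketched as a stateful class. This is not pcm-worklet.js itself; it is a minimal sketch, assuming Float32 input in [-1, 1] and using a simple averaging (box) filter rather than a proper anti-aliasing filter. It keeps state between calls because AudioWorklet blocks are typically 128 frames, which rarely divide evenly by the rate ratio (128 / 3 at 48 kHz is not an integer); discarding that leftover tail on every block is one plausible way for audio to go missing. The class name `PcmDownsampler` is hypothetical.

```typescript
// Hypothetical downsampler: Float32 samples at the native rate -> 16-bit PCM.
class PcmDownsampler {
  private pending: number[] = []; // input samples not yet consumed
  private offset = 0;             // fractional read position carried between blocks
  private readonly ratio: number;

  constructor(inputRate: number, outputRate = 16000) {
    if (inputRate < outputRate) throw new Error("inputRate must be >= outputRate");
    this.ratio = inputRate / outputRate;
  }

  // Consume one block of Float32 samples in [-1, 1] and return Int16 PCM.
  process(block: Float32Array): Int16Array {
    for (let i = 0; i < block.length; i++) this.pending.push(block[i]);
    const out: number[] = [];
    let pos = this.offset;
    while (pos + this.ratio <= this.pending.length) {
      // Average the input samples that cover this output sample.
      const start = Math.floor(pos);
      const end = Math.floor(pos + this.ratio);
      let sum = 0;
      for (let i = start; i < end; i++) sum += this.pending[i];
      const s = Math.max(-1, Math.min(1, sum / (end - start)));
      // Scale to the asymmetric Int16 range [-32768, 32767].
      out.push(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff));
      pos += this.ratio;
    }
    // Keep the unconsumed tail for the next block instead of dropping it.
    const consumed = Math.floor(pos);
    this.pending = this.pending.slice(consumed);
    this.offset = pos - consumed;
    return Int16Array.from(out);
  }
}
```

At 48 kHz, two consecutive 128-sample blocks yield 42 and then 43 output samples, so the carried tail keeps the long-run output at exactly one third of the input.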

Keeping Claude grounded took significant prompt engineering. Early versions of the Brain occasionally invented a location or returned prose mixed into the JSON. The fix was a combination of strict output schema enforcement in the prompt, a JSON parse validation layer, and a fallback triage that escalates to a human rather than guessing when the parse fails.
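A parse-validate-fallback layer of this kind can be sketched as follows. The Triage shape, the `escalate` field that would drive the "911"/"human" banner, and the fallback's P1 priority are all assumptions; the one behavior taken from RELAY is that unusable model output escalates to a human instead of being guessed at.

```typescript
// Triage shape, escalate field, and fallback values are assumptions for illustration.
type Priority = "P1" | "P2" | "P3";
type Escalation = "human" | "911" | null;

interface Triage {
  people: number | null;
  location: string | null;
  needs: string[];
  priority: Priority;
  nextQuestion: string | null;
  readyToRoute: boolean;
  escalate: Escalation; // hypothetical field behind the escalation banner
}

const PRIORITIES: readonly unknown[] = ["P1", "P2", "P3"];
const ESCALATIONS: readonly unknown[] = ["human", "911", null];

// Runtime check that a parsed value matches the Triage contract.
function isTriage(x: unknown): x is Triage {
  if (typeof x !== "object" || x === null || Array.isArray(x)) return false;
  const { people, location, needs, priority, nextQuestion, readyToRoute, escalate } =
    x as Record<string, unknown>;
  return (
    (people === null || (typeof people === "number" && Number.isInteger(people) && people >= 0)) &&
    (location === null || typeof location === "string") &&
    Array.isArray(needs) && needs.every((n) => typeof n === "string") &&
    PRIORITIES.includes(priority) &&
    (nextQuestion === null || typeof nextQuestion === "string") &&
    typeof readyToRoute === "boolean" &&
    ESCALATIONS.includes(escalate)
  );
}

// Escalate to a human rather than guess when the model output is unusable.
function fallbackTriage(): Triage {
  return {
    people: null, location: null, needs: [],
    priority: "P1", // assumption: unparseable calls surface at the top of the queue
    nextQuestion: null, readyToRoute: false, escalate: "human",
  };
}

function parseTriage(raw: string): Triage {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (isTriage(parsed)) return parsed;
  } catch {
    // Not JSON at all: prose, markdown fences, or truncated output.
  }
  return fallbackTriage();
}
```

Note that JSON wrapped in markdown fences deliberately fails here: stripping fences would hide a prompt that is drifting, whereas the fallback makes the failure visible.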

Parallel development across four people meant the data contracts in types/index.ts were locked before a line of UI or audio code was written. Even then, Person B (the Brain) and Person C (the Matchmaker) had subtle type mismatches in the shape of Triage.location and the nullability of Dispatch.matched, which only surfaced at the chain-it integration session. We resolved them by treating B's brain/src/types.ts as the source of truth for Triage and C's types/index.ts as the source of truth for Dispatch and Resource.

Git merge conflicts in package.json and package-lock.json appeared on nearly every branch merge because each sub-project (brain/, server/, root) added its own dependencies. The reliable fix was deleting package-lock.json and running npm install fresh after each merge rather than hand-resolving the lock file.

Live voice in a noisy room remains the demo's biggest risk. We built and rehearsed a backup path that plays a recorded audio clip through the same Deepgram pipeline, so the fallback runs through the same stages as a live call and looks the same on screen.
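One way to make a recorded clip behave like a live mic is to slice it into small frames and send them at real-time pace, so the STT stream sees the same cadence either way. This is a hedged sketch, not RELAY's backup path: it assumes the clip is already raw 16 kHz, 16-bit mono PCM, and the 20 ms frame size and the names `toFrames`, `replay`, and `send` are hypothetical (`send` would typically wrap a WebSocket's send method).

```typescript
// Hypothetical replay helper; frame size and names are illustrative assumptions.
const SAMPLE_RATE = 16000;  // samples per second
const BYTES_PER_SAMPLE = 2; // 16-bit PCM
const FRAME_MS = 20;        // assumed frame duration
// 16000 samples/s * 2 bytes * 0.020 s = 640 bytes per frame
const FRAME_BYTES = (SAMPLE_RATE * BYTES_PER_SAMPLE * FRAME_MS) / 1000;

// Split a raw PCM clip into fixed-size frames (the last one may be shorter).
function toFrames(clip: Uint8Array): Uint8Array[] {
  const frames: Uint8Array[] = [];
  for (let i = 0; i < clip.length; i += FRAME_BYTES) {
    frames.push(clip.subarray(i, i + FRAME_BYTES));
  }
  return frames;
}

// Send frames on a wall-clock schedule so the stream matches live timing.
async function replay(clip: Uint8Array, send: (frame: Uint8Array) => void): Promise<void> {
  const start = Date.now();
  const frames = toFrames(clip);
  for (let k = 0; k < frames.length; k++) {
    send(frames[k]);
    // Wait until the moment the next frame would have arrived from a live mic;
    // scheduling against the start time avoids accumulating timer drift.
    const due = start + (k + 1) * FRAME_MS;
    await new Promise((resolve) => setTimeout(resolve, Math.max(0, due - Date.now())));
  }
}
```

Pacing matters because sending a whole clip at once can change how interim results and utterance boundaries arrive, which would make the fallback look different from a live call.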

Accomplishments that we're proud of

The Matchmaker is fully testable without AI. Because routing is deterministic TypeScript with no Claude dependency, we could prove the guardrails work — especially the matched: null case — with plain unit tests before a single line of audio or UI code existed. Judges can read the code and see exactly why it can never fabricate a resource.

The follow-up question loop. When the Brain detects a missing required slot, RELAY asks one calm, targeted follow-up in the caller's language rather than routing on incomplete information. This is the single beat that makes RELAY an agent rather than a transcriber — and it was deliberately scripted into the hero call so judges see it in the demo.

Four people, four parallel workstreams, zero blocking. By locking the data contracts on day one and having each person build against typed stubs rather than each other's actual code, we reached the chain-it integration session with four independently working stages. The integration bugs that surfaced were in the contracts, not the implementations — which is exactly what you want.

18 real Houston relief resources with real addresses, coordinates, and phone numbers — scoped to the Texas Medical Center cluster, downtown shelters, and Red Cross distribution points. The dataset is small enough to be fast and honest enough to be credible.

What we learned

The moat is in the routing, not the translation. Early in the project we spent energy on the voice pipeline. The thing judges respond to is the moment the map lights up — when chaos becomes a routed dispatch. The translation is what gets us in the door; the Matchmaker is the product.

Structured output discipline is harder than it looks. Getting an LLM to return only valid JSON, every time, with no markdown fences and no prose, requires more than one instruction in the prompt. It requires schema validation, a fallback path, and a test suite that runs against the real API — not a mock.

Escalation is a feature, not a failure. The system's willingness to say "I don't know enough to route this — routing to a human" is the thing that makes it trustworthy in a crisis context. We learned to treat every escalation path as a product decision, not an error case.

Git discipline under time pressure saves hours. Our practice (one branch per person, merge into an integration branch first, test before merging to main) slowed down the first merge and saved us on every subsequent one.

What's next for RELAY

Real capacity feeds. The resource locations are real, but their availability and capacity values are a realistic mock. FEMA's National Shelter System (NSS) publishes real shelter data during activations, and connecting RELAY to a live NSS feed would turn those mock values into ground truth.

Phone intake via Twilio. The demo uses a browser mic to keep the stack simple. In real deployment, callers need a phone number — a Twilio IVR that pipes audio into the same Deepgram WebSocket. This is the roadmap item closest to production-ready.

The urgency classifier. A TF-IDF + Logistic Regression classifier trained on synthetic labeled transcripts would replace Claude's priority field with a measured, auditable score. The claim we want to back with a confusion matrix on held-out transcripts: it never classifies a true P1 call as P3. This is the "real ML, not a wrapper" credibility layer.
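The classifier itself is future work, but the check behind its headline claim is simple to state. This sketch builds a 3×3 confusion matrix over P1/P2/P3 and tests the one cell that matters most, true P1 predicted as P3; the function names `confusionMatrix` and `neverP1AsP3` are hypothetical.

```typescript
// Hypothetical evaluation helpers for the proposed urgency classifier.
type Priority = "P1" | "P2" | "P3";
const LABELS: Priority[] = ["P1", "P2", "P3"];

// m[i][j] = number of examples whose true label is LABELS[i]
// and whose predicted label is LABELS[j].
function confusionMatrix(truth: Priority[], predicted: Priority[]): number[][] {
  if (truth.length !== predicted.length) throw new Error("length mismatch");
  const m = LABELS.map(() => LABELS.map(() => 0));
  truth.forEach((t, k) => {
    m[LABELS.indexOf(t)][LABELS.indexOf(predicted[k])] += 1;
  });
  return m;
}

// The dangerous error: a true P1 (row 0) predicted as P3 (column 2).
const neverP1AsP3 = (m: number[][]): boolean => m[0][2] === 0;
```

Reading the off-diagonal cells individually, rather than reporting overall accuracy, makes the asymmetry explicit: a P3 call treated as P1 wastes effort, while a P1 call treated as P3 can cost a life.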

Human-operator dashboard. The current UI surfaces escalations but doesn't give a human operator a full intake view. A dispatcher panel — showing all active calls, their priority, and one-click confirmation — is the next UI milestone.