Inspiration

I’ve lived in Toronto long enough to have a constant background feed of “what if” scenarios in my head: PATH flooding, DVP pileups in a blizzard, Billy Bishop runway issues, downtown hospital overloads during a storm. At the same time, as a computer science student obsessed with security, systems, and multi-agent AI, I kept noticing a gap:

We use AI to autocomplete emails and summarize PDFs, but we rarely use it to explain and stress-test high-stakes decisions in a way a human commander—or even the public—can actually understand.

Crisis Coordinator grew out of that tension. I wanted a safe, simulated environment where I could explore questions like:

  • How would multiple AI “roles” coordinate during a city-scale emergency?
  • Can they make their reasoning transparent instead of being a black box?
  • What would a human-centered mission control UI for that actually look like?

The result is a decision-support prototype and teaching tool—not a real dispatcher—that lets people play through Toronto-flavoured disaster scenarios and see how multi-agent AI could help (and where its limits are).


What it does

Crisis Coordinator is a simulated emergency operations center for a fictional Toronto.

On a small city grid, it runs scenario-based simulations like:

  • PATH network flooding during rush hour
  • A DVP whiteout with chain-reaction crashes
  • A Billy Bishop airport incident with limited access routes

At each time step (“tick”):

  • New incidents appear (fires, medical emergencies, traffic accidents, floods)
  • Units move (ambulances, fire trucks, police cars)
  • Incidents progress: reported → assigned → en route → on scene → resolved
  • Hospitals and shelters with limited capacity fill up or overload
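The lifecycle above can be sketched as a tiny state machine. This is illustrative only — the status names mirror the list, but the real engine's types live in lib/simulation/ and are more detailed:

```typescript
// Illustrative sketch of the per-tick incident lifecycle, not the
// project's actual types.
type Status = "reported" | "assigned" | "enRoute" | "onScene" | "resolved";

const NEXT: Record<Status, Status> = {
  reported: "assigned",
  assigned: "enRoute",
  enRoute: "onScene",
  onScene: "resolved",
  resolved: "resolved", // terminal state
};

interface Incident { id: string; status: Status; ticksWaiting: number; }

// Advance one incident by a tick: move along the lifecycle and keep
// counting wait time until the incident is resolved.
function stepIncident(i: Incident): Incident {
  const status = NEXT[i.status];
  return {
    ...i,
    status,
    ticksWaiting: status === "resolved" ? i.ticksWaiting : i.ticksWaiting + 1,
  };
}
```

In the real system, of course, the transition from reported to assigned depends on agent decisions rather than happening automatically.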

On top of this, a set of deterministic reasoning agents operate on the state:

  • Triage Agent – ranks incidents by severity + time waiting
  • Resource Agent – assigns available units to the highest-priority calls
  • Logistics Agent – routes people to hospitals/shelters based on distance and capacity
  • Medical Agent – monitors hospital load and raises overload warnings
  • Communications Agent – picks which incidents/areas to highlight in public messaging
  • Commander – synthesizes everyone’s outputs

These agents produce structured decisions (prioritized incident lists, unit assignments, routing choices, overload flags), which then go through an LLM layer (Claude) to generate:

  • Per-agent rationales (“why we chose Incident A over B”)
  • A Commander Summary of what’s happening and what matters now
  • Public-facing messaging: a short SMS-style alert plus a calmer, slightly longer advisory

The main UI is a three-column mission control console:

  • Left: stylized city map + incident list with filters and severity badges
  • Center: agent cards showing their latest decisions, each with a “show reasoning” toggle
  • Right: tabs for Commander Summary, Public Alert, and an Event Log timeline

How we built it

We built Crisis Coordinator as a Next.js 14 + TypeScript app with a clear separation between simulation, agents, and UI.

Core stack:

  • Frontend: Next.js 14 (App Router), React, TypeScript
  • Styling: Tailwind CSS with a custom “control-room” theme (dark, focused, not generic admin UI)
  • LLM layer: Anthropic Claude via a small client in lib/utils/llmClient.ts
  • Data / infra: MapTiler key for maps; optional Supabase integration for scenarios/storage
  • Environment: .env.local holds ANTHROPIC_API_KEY, MapTiler key, and Supabase keys if used
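A hypothetical .env.local might look like the following — only ANTHROPIC_API_KEY is named in this writeup; the other variable names are illustrative guesses, not the project's actual ones:

```bash
# Hypothetical .env.local — variable names other than
# ANTHROPIC_API_KEY are assumptions.
ANTHROPIC_API_KEY=sk-ant-...
NEXT_PUBLIC_MAPTILER_KEY=...
NEXT_PUBLIC_SUPABASE_URL=...        # optional
NEXT_PUBLIC_SUPABASE_ANON_KEY=...   # optional
```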

Simulation engine

The heart of the system is a TypeScript simulation engine in lib/simulation/SimulationEngine.ts. It maintains a world state S_t containing:

  • Incidents and their attributes
  • Units and their locations/status
  • Hospitals/shelters and their capacities/loads
  • An event log

Each tick applies a state transition:

S_{t+1} = f(S_t, A_t)

where A_t are the agent decisions (which units to dispatch, who goes to which hospital, etc.). Scenario scripts in lib/scenarios/*.ts define:

  • Toronto-specific base maps and points of interest
  • Timed incident injections (e.g., “at T+5, spawn a second crash near the DVP”)
  • Initial resources, hospital capacities, and any scripted stressors
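The transition can be sketched as a pure function over the world state. This is a simplified shape — the real SimulationEngine tracks units, hospitals, and shelters too:

```typescript
// Simplified world state and tick function — illustrative, not the
// actual SimulationEngine.ts types.
interface WorldState {
  tick: number;
  incidents: { id: string; resolved: boolean }[];
  log: string[];
}

interface AgentDecisions {
  resolveIds: string[]; // incidents the agents consider handled this tick
}

// S_{t+1} = f(S_t, A_t): no mutation, every change flows through here.
function step(s: WorldState, a: AgentDecisions): WorldState {
  return {
    tick: s.tick + 1,
    incidents: s.incidents.map((i) =>
      a.resolveIds.includes(i.id) ? { ...i, resolved: true } : i
    ),
    log: [...s.log, `tick ${s.tick + 1}: resolved ${a.resolveIds.length}`],
  };
}
```

Keeping the transition pure made it easy to replay scenarios and debug specific ticks.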

Multi-agent layer

On top of S_t, we run deterministic “role” agents:

  • Each agent receives a filtered view of the world state relevant to its job.
  • They output structured JSON: ranked incident IDs, assignments, hospital targets, overload flags, etc.
  • These outputs are then optionally passed to the LLM layer using role-specific prompts in lib/agents/agentPrompts.ts.
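A minimal sketch of that agent contract, using the Triage Agent as the example (field and function names are illustrative, not the actual code in lib/agents/):

```typescript
// Illustrative agent contract — the real roles live in lib/agents/.
interface Incident { id: string; severity: number; ticksWaiting: number; }

interface TriageDecision {
  rankedIncidentIds: string[];
  // Record the factors behind the ranking so the LLM layer can
  // explain it faithfully instead of inventing reasons.
  factors: Record<string, { severity: number; ticksWaiting: number }>;
}

// Triage sketch: rank by severity, then by time waiting.
function triage(incidents: Incident[]): TriageDecision {
  const ranked = [...incidents].sort(
    (a, b) => b.severity - a.severity || b.ticksWaiting - a.ticksWaiting
  );
  return {
    rankedIncidentIds: ranked.map((i) => i.id),
    factors: Object.fromEntries(
      ranked.map((i) => [i.id, { severity: i.severity, ticksWaiting: i.ticksWaiting }])
    ),
  };
}
```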

The API route /api/agents orchestrates this:

  1. Read current simulation state
  2. Run relevant agents (some every tick, some on state changes)
  3. Call Claude as needed (unless we’re in Demo Mode)
  4. Return decisions + explanations back to the console UI
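The loop above roughly has this shape — a sketch with assumed names, not the actual route handler:

```typescript
// Sketch of the /api/agents orchestration loop. Function and type
// names are illustrative assumptions.
type Decision = { agent: string; output: unknown; rationale?: string };

async function orchestrate(
  state: unknown,
  agents: { name: string; run: (s: unknown) => unknown }[],
  explain: (name: string, output: unknown) => Promise<string>,
  demoMode: boolean
): Promise<Decision[]> {
  const decisions: Decision[] = [];
  for (const agent of agents) {
    const output = agent.run(state);       // deterministic decision
    const rationale = demoMode
      ? undefined                          // Demo Mode: skip live LLM calls
      : await explain(agent.name, output); // Claude-generated rationale
    decisions.push({ agent: agent.name, output, rationale });
  }
  return decisions;
}
```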

Demo Mode

Because this is built to be demo-able (including in hackathon settings), we added a Demo Mode:

  • Uses cached / rule-based decisions instead of live LLM calls
  • Throttles heavy agent runs
  • Reduces context size and token budgets when Claude is enabled

This gives two modes:

  • Real-time LLM mode for depth and nuance
  • Instant demo mode for reliability under time and rate-limit pressure
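The throttling of heavy agent runs can be sketched as a simple tick gate (the interval values here are assumptions, not the project's actual settings):

```typescript
// Sketch: run heavy agents only every N ticks, cheap ones every tick.
// Intervals are illustrative, not the real configuration.
const AGENT_INTERVAL: Record<string, number> = {
  triage: 1,         // cheap and deterministic: every tick
  resource: 1,
  communications: 5, // heavier LLM-backed messaging: every 5th tick
};

function shouldRun(agent: string, tick: number): boolean {
  const every = AGENT_INTERVAL[agent] ?? 1;
  return tick % every === 0;
}
```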

Challenges we ran into

State complexity and synchronization

Even a small city simulation gets complicated fast. Early on, we had issues like:

  • Units “teleporting” because updates weren’t applied in a single, authoritative place
  • Incidents that were assigned but never resolved because a status transition was skipped
  • UI occasionally showing stale data if multiple sources tried to update the same state

We fixed this by:

  • Centralizing state transitions in the simulation engine
  • Treating each tick as a pure function S_{t+1} = f(S_t, A_t)
  • Having the UI only subscribe to the authoritative state instead of mutating it

LLM latency, cost, and reliability

Multiple agents × multiple ticks × multiple scenarios can easily explode into:

  • Latency spikes (waiting several seconds per tick)
  • Rate limits during demos
  • Higher costs than we want for a teaching/demonstration tool

To handle this, we:

  • Added a Demo Mode with deterministic decisions and minimal or stubbed LLM calls
  • Throttled agent runs so not every tick triggers full multi-agent reasoning
  • Shrunk prompts and enforced strict, compact JSON schemas to reduce tokens
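Enforcing a strict, compact schema can be as simple as a hand-rolled validator on the LLM's JSON output — a sketch with illustrative field names, not necessarily the project's approach:

```typescript
// Sketch: validate a compact agent output without extra dependencies.
// Field names are illustrative.
interface TriageOutput { ranked: string[]; flags: string[]; }

function parseTriageOutput(raw: string): TriageOutput {
  const data = JSON.parse(raw);
  const isStringArray = (v: unknown): v is string[] =>
    Array.isArray(v) && v.every((x) => typeof x === "string");
  if (!isStringArray(data.ranked)) throw new Error("ranked must be string[]");
  if (!isStringArray(data.flags)) throw new Error("flags must be string[]");
  // Return only the known fields, dropping anything extra the model added.
  return { ranked: data.ranked, flags: data.flags };
}
```

Rejecting malformed output early keeps downstream agents and the UI from ever seeing a half-valid decision.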

Designing explainable agents, not just “smart” ones

It’s one thing to compute:

Assign Ambulance 3 to Incident 12.

It’s another to generate a faithful explanation like:

“Ambulance 3 was sent to Incident 12 instead of Incident 8 because severity is higher and it has been waiting 7 minutes longer, while still within a reasonable travel time.”

We had to:

  • Design structured outputs so we knew which factors (severity, wait time, distance, capacity) drove decisions
  • Write prompts that forced Claude to base its explanation on those actual factors, not hallucinate extra ones
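One way to pin the explanation to the real factors is to build the prompt from the decision's own data, so the model has nothing else to cite. The wording below is illustrative, not the actual text in agentPrompts.ts:

```typescript
// Sketch: an explanation prompt that exposes only the factors the
// deterministic agent actually used. Wording is illustrative.
interface Factors { severity: number; waitMinutes: number; etaMinutes: number; }

function rationalePrompt(
  chosen: string,
  rejected: string,
  f: Record<string, Factors>
): string {
  return [
    `You are the Triage Agent. Explain in one sentence why ${chosen} was`,
    `prioritized over ${rejected}. Use ONLY these factors, no others:`,
    JSON.stringify({ [chosen]: f[chosen], [rejected]: f[rejected] }),
  ].join("\n");
}
```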

Avoiding a generic CRUD dashboard

It was very easy to slip into “here’s a table, here’s a side panel.” That didn’t feel like an emergency operations center. We spent time iterating on:

  • A three-column console with clear mental models:
    • Left: where and what is happening
    • Center: what the agents propose and why
    • Right: what the commander and public see
  • A visual language that felt like a calm mission control, not a sales CRM
  • Making sure explanations were short, scannable bullet lists, not giant language-model paragraphs

Accomplishments that we're proud of

  • End-to-end multi-agent pipeline: From simulation state → agent decisions → LLM explanations → UI, all wired up and inspectable.
  • Human-centered explainability: Every decision is traceable. You can click into an agent card and see why it prioritized a call, or why a hospital overload warning was raised.
  • Toronto-specific scenarios: PATH flooding, DVP storms, Billy Bishop issues, downtown hospital stress—grounded enough to feel real, but still safely fictional.
  • Mission control UX, not a template: A cohesive console that feels like a mini emergency operations center instead of “yet another admin dashboard.”
  • Demo-friendly design: The Demo Mode + deterministic fallbacks mean the project is actually usable under hackathon conditions and live demos, not just in ideal offline tests.

What we learned

This project reinforced a few big ideas for me:

  • Multi-agent AI is as much API design as it is ML.
    Clear roles, contracts, and schemas matter just as much as clever prompts.

  • Explainability requires planning, not just post-hoc text.
    If you don’t design your decisions around a few key factors (e.g., severity, wait time, distance, capacity), it’s very hard to later ask an LLM to explain them faithfully.

  • Realism vs clarity is a constant trade-off.
    We deliberately simplified some aspects of real emergency response to keep the system understandable and teachable.

  • Designing for demos is a real engineering constraint.
    Handling latency, rate limits, and failure modes early is what made this a practical, shippable prototype rather than a fragile toy.

  • Metrics change the conversation.
    Once we added response-time and capacity metrics, discussions shifted from “this looks cool” to “this policy reduces average response time but overloads specific hospitals—do we accept that trade-off?”


What's next for Crisis Coordinator

There are a few directions we’re excited about:

  • Richer scenarios and stress tests
    More layered scenarios (e.g., a storm plus a secondary unrelated incident) to really test triage and resource allocation policies.

  • Human-in-the-loop controls
    Letting a human commander tweak agent priorities (e.g., “favour pediatric cases” or “protect critical infrastructure”) and watch how the agents adapt downstream.

  • Uncertainty and partial information
    Introducing noisy reports and delayed updates so agents (and humans) have to reason under uncertainty instead of perfect state.

  • Deeper analytics and policy comparison
    Being able to run the same scenario under different triage/dispatch policies and compare metrics: Δ avg response time, Δ overload probability, Δ unresolved incidents

  • Educational tooling
    Packaging Crisis Coordinator as a tabletop exercise platform for classes in emergency management, public policy, or AI ethics, where students can run scenarios and debate the agents’ decisions.

Overall, Crisis Coordinator is just a starting point—but it already shows how multi-agent AI, when wrapped in the right constraints and UI, can help humans see trade-offs more clearly in some of the hardest situations we face.
