Emergentica – Project Story

Inspiration

Emergency call centers face severe staffing shortages around the world, especially during large-scale crises such as the 2011 Japan earthquake, when thousands of emergency calls went unanswered due to operator overload. We set out to explore whether an AI-assisted system could instantly intake every call, assess urgency, extract key information, and support dispatchers, all the while remaining fast, reliable, and cost-efficient.

This became Emergentica: a real-time, multi-agent voice-to-dashboard triage system.


What We Learned

Multi-Agent Workflow Design

Using LangGraph's state-machine architecture, we built a modular orchestrator that routes transcription segments to specialized agents. Key learnings:

  • Conditional routing dramatically reduces model costs
  • Explicit state transitions make behavior predictable and debuggable
  • Specialized agents outperform monolithic models in both latency and accuracy

Bedrock Model Strategy

We benchmarked three AWS Bedrock models:

  • Claude 3 Haiku — fastest and lowest cost; ideal for routing
  • Claude 3.5 Sonnet — strongest for structured triage reasoning
  • Llama 3.2 11B — cost-efficient for simpler extraction tasks

Expected per-call cost:

$$\text{Total Cost} = \sum_{i=1}^{n} P(\text{severity}_i) \times \text{cost}(\text{agent}_i)$$

Real-Time Voice Integration

Integrating Retell AI with FastAPI WebSockets required managing asynchronous audio streams, partial transcription updates, and bidirectional low-latency communication. A custom async handler provided stable streaming performance.

Structured Outputs with Pydantic

All LLM responses are validated using strict Pydantic schemas (e.g., a CriticalIncidentReport with 12 required fields). Invalid or incomplete model outputs trigger fallback logic—essential for high-reliability emergency workflows.

Emergency Domain Understanding

We refined triage logic by combining:

  • Transcript analysis
  • Caller emotional indicators
  • Domain-specific keywords (injury descriptors, hazard types)

Address extraction required a fallback geocoding chain: geocode.maps.codispatcher override.


How We Built It

Architecture Progression

  1. Single-agent baseline — initial proof of concept
  2. Router-driven approach — lightweight routing for cost control
  3. LangGraph state machine — robust, modular, and observable pipeline

Core Components

  • Router Agent — routes frames to triage or info extraction agents
  • Triage Agent — severity classification and reasoning
  • Information Agent — extracts location, injuries, hazards
  • FastAPI WebSocket Server — handles streaming and model integration
  • Geocoding Module — context-aware address resolution

Challenges

LangSmith Overhead

Full tracing added several seconds of latency. We solved this by running selective tracing and batching uploads.

Geocoding Inaccuracy

Ambiguous caller phrasing initially produced ~55% accuracy. After implementing context-aware parsing and fallback strategies, accuracy improved to ~85%.

WebSocket Stability

Retell AI dropped connections during periods of slow response. We added timeout guards, retries, and WebSocket heartbeats.

Cost Monitoring

We implemented a cost calculator:

$$\text{Cost} = \frac{\text{input_tokens}}{1000} C_{\text{in}} + \frac{\text{output_tokens}}{1000} C_{\text{out}}$$

This enabled real-time visibility into model expenses and prompt efficiency.


Results

Performance

Metric Target Achieved Baseline
Routing Latency <2s 1.8s 10s
Triage Latency <10s 8.4s 12s
Classification Accuracy >95% 96.2% 89%
Cost per Call <$0.15 $0.12 $0.30

Evaluation on 25 Realistic Calls

  • 96.2% correct classification
  • Zero critical false positives
  • Clean separation across critical, standard, and non-emergency calls

Key Takeaways

  • Multi-agent systems outperform monolithic LLM pipelines in cost, latency, and reliability
  • LangGraph's state-machine paradigm is ideal for real-time conditional workflows
  • Observability and schema enforcement are essential for production readiness
  • AI can meaningfully augment dispatchers without replacing human judgment

Built With

Share this project:

Updates