OpenAgents: a Voice-First Public‑Benefit Copilot that Acts, Explains, and Maps Reality

Most assistants answer. OpenAgents mobilizes a team. When a person is stressed, time‑poor, or in motion—on a phone, in a car, mid‑shift—information without action is just noise. OpenAgents is built for those moments: it routes a request to the right specialist agents, runs them in parallel, synthesizes a detailed, well‑structured markdown plan for the screen, and speaks only a 2–3 sentence summary so the user can keep moving.

  • What makes it win: not a single “smart model”, but a reliable system that can choose, execute, verify, visualize, and communicate—with interactive maps and tool-backed facts.
  • Built to ship: Next.js UI + FastAPI backend + LiveKit realtime worker; production deployment documented for Heroku Enterprise.

GitHub Repo: link


The problem (and why it’s still unsolved)

People don’t need more text. They need a trustworthy sequence of next actions: where to go, what to say, what it costs, what’s open now, what to do first—delivered through the modality that fits the moment.

Traditional demos fail in the last mile:

  • Voice assistants talk too much and lose the listener.
  • Chatbots don’t ground results in tools, or can’t show them spatially.
  • One-model solutions can’t adapt to heterogeneous tasks (search, local, finance, geo, routing) with clear traceability.

OpenAgents turns “ask” into a coordinated response—and makes the coordination visible.


The breakthrough: Orchestration with dual-channel communication

OpenAgents is intentionally bimodal:

  • Chat (screen) gets the full answer: structured markdown, links, steps, caveats, and embedded interactive maps.
  • Voice (audio) gets a summary: “Here are the top options and the next step—details are on screen.”

Under the hood, OpenAgents is a production-ready multi-agent orchestration system with real-time voice capabilities. A Mixture-of-Experts (MoE) layer coordinates specialist agents whose capabilities span local business information, real-time routing and maps, web search, and multi-modal rendering of images, videos, and interactive maps.
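
As a minimal sketch of that dual-channel contract (class and field names here are illustrative, not the project's actual API), each orchestrator result carries both the full markdown for the screen and a short summary for the voice channel:

```python
from dataclasses import dataclass

@dataclass
class DualChannelResponse:
    """Illustrative contract: one orchestrator result, two delivery channels."""
    markdown: str                    # full, structured answer rendered in the chat UI
    voice_summary: str               # 2-3 sentence summary handed to TTS
    map_payload: dict | None = None  # optional interactive-map data for the UI

def speakable(response: DualChannelResponse, max_sentences: int = 3) -> str:
    """Guard rail: the voice channel never gets more than a few sentences."""
    sentences = [s.strip() for s in response.voice_summary.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."
```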


What we built (highly specific)

  • MoE Orchestrator: semantic expert selection, parallel execution, and LLM synthesis into detailed markdown (with trace visualization); see the sketch after this list.
  • SmartRouter: capability routing and query decomposition for multi-part questions.
  • MCP integration: tool servers via Model Context Protocol (e.g., Yelp via MCP) with robust connection management.
  • Interactive Maps: map payloads auto-injected for location queries; rendered in the web UI as a visual map instead of raw JSON.
  • Real-time Voice Mode: LiveKit WebRTC loop with STT→orchestrator→TTS; optimized so spoken output stays digestible.
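
To make the orchestrator loop concrete, here is a condensed sketch of selection → parallel execution → synthesis (expert names and helpers are hypothetical stand-ins for illustration; the real implementation lives in the repo):

```python
import asyncio

# Hypothetical expert registry: each expert advertises what it is good at.
EXPERTS = {
    "yelp_local": "local businesses, restaurants, hours, price",
    "geo_maps":   "geocoding, routing, interactive map payloads",
    "web_search": "general web search and fact lookup",
}

def select_experts(query: str, top_k: int = 3) -> list[str]:
    """Semantic expert selection (embedding similarity in the real system);
    a naive keyword match stands in for the idea here."""
    def score(name: str) -> int:
        return sum(kw in query.lower() for kw in EXPERTS[name].split(", "))
    return sorted(EXPERTS, key=score, reverse=True)[:top_k]

async def run_expert(name: str, query: str) -> dict:
    """Each selected expert runs independently (tool calls, MCP servers, etc.)."""
    return {"expert": name, "evidence": f"results from {name} for {query!r}"}

async def orchestrate(query: str) -> dict:
    chosen = select_experts(query)
    results = await asyncio.gather(*(run_expert(name, query) for name in chosen))
    # In the real system an LLM call synthesizes `results` into screen markdown
    # plus a 2-3 sentence voice summary; here we just return the raw trace.
    return {"trace": chosen, "evidence": list(results)}
```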

For implementation details, see:

  • docs/README.md (high-level overview)
  • docs/AGENT_SYSTEM_GUIDE.md (agent system architecture + mermaid flows)
  • docs/COMPLETE_TUTORIAL.md (end-to-end usage + Heroku Enterprise deployment)

A demo that feels like magic (but is just good engineering)

Scenario: “Tonight’s plan, with real constraints.”

Prompt:

  • “Find three calm, affordable dinner spots near me, show them on a map, and tell me the fastest route to the best one. Also: keep it vegetarian-friendly.”

What happens:

  • Yelp/YelpMCP finds options (tool-backed).
  • Geo + Map geocodes, computes center/route, and emits an interactive map payload (sketched after this list).
  • Orchestrator returns screen-ready markdown (ranked options, tradeoffs, hours/price, links).
  • Voice speaks: “I found three great vegetarian-friendly spots nearby. The top pick is X for price and vibe, and the map on screen shows all options with a route. Want me to optimize for fastest drive or shortest walk?”
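
For illustration, the map payload injected for this query might look roughly like the following (field names and values are assumptions for the sketch; the actual schema is defined in the repo):

```python
# Illustrative map payload for the dinner-spots query (not the project's real format).
map_payload = {
    "center": {"lat": 37.7749, "lng": -122.4194},
    "markers": [
        {"name": "Option A", "lat": 37.776, "lng": -122.417, "price": "$$", "vegetarian": True},
        {"name": "Option B", "lat": 37.772, "lng": -122.421, "price": "$",  "vegetarian": True},
        {"name": "Option C", "lat": 37.779, "lng": -122.414, "price": "$$", "vegetarian": True},
    ],
    "route": {"from": "current_location", "to": "Option A", "mode": "driving"},
}
```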

Why OpenAgents helps people in ways most projects can’t demonstrate

  • It’s not a single answer; it’s a pipeline: selection → parallel evidence gathering → synthesis → visualization → action.
  • It respects human attention: voice is concise; the screen is rich.
  • It grounds output in the world: maps, routes, and tool-backed local results.
  • It’s inspectable: the orchestration trace makes “how we got here” visible—critical for trust.

How we deploy it (so it can actually serve people)

OpenAgents is designed as a production-ready, 3-app Heroku architecture:

  • Frontend (Next.js)
  • API (FastAPI)
  • Realtime worker (LiveKit voice)

Deployment guide: docs/COMPLETE_TUTORIAL.md#deploying-to-heroku-enterprise
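
As a rough sketch of the per-app process commands (module paths and script names below are assumptions; the tutorial above is authoritative):

```
# Frontend app (Next.js) Procfile
web: npm run start

# API app (FastAPI) Procfile
web: uvicorn main:app --host 0.0.0.0 --port $PORT

# Realtime worker app (LiveKit voice) Procfile
worker: python voice_worker.py start
```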


What’s next (the ambitious, human part)

We’ll ship “community copilots” that can be forked and safely configured:

  • Health access navigator: clinic discovery + eligibility checklists + map + call script.
  • Disaster relief runner: verified shelter/food info + route planning + concise voice guidance.
  • Worker support: quick policy lookup + step-by-step procedures + live summarization on the floor.

OpenAgents is the missing bridge between the brilliance of models and the blunt reality of life: turns, time, distance, cost, and the next action.

GitHub Repo: link

Built With

  • elevenlabs
  • fastapi
  • mcp
  • openai-agents
  • python