What if you could talk to your agent orchestrator?
ElevenClaw is an AI agent orchestration platform — a dashboard where Claude sessions, tasks, documents, and agents all live. It already had a rich web UI, but I wanted to push it further: what if you could control the entire dashboard with your voice?
The idea started simple — plug in ElevenLabs' conversational AI and let it call a few APIs. But the real challenge was bridging the gap between a voice agent (which lives on ElevenLabs' servers) and a complex, stateful web application.
How it works:
The voice agent runs as an ElevenLabs conversation with typed client tools. When you say "show me my tasks," the browser-side tool handler calls the backend, which executes the action and broadcasts a socket event to navigate the UI. The key insight was making everything URL-driven — every module, every selected document, every open session is encoded in the URL, so navigation is just a URL push via websocket.
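To make the URL-driven idea concrete, here is a minimal sketch of how dashboard state might round-trip through a URL. The `DashboardState` shape, the module names, the `doc` and `session` query keys, and the `NavigateEvent` payload are all assumptions for illustration, not ElevenClaw's actual routes or socket protocol.

```typescript
// Hypothetical list of dashboard modules; the real module names may differ.
const MODULES = ["tasks", "documents", "sessions", "agents"] as const;
type Module = (typeof MODULES)[number];

// Hypothetical shape of the state that is encoded in the URL.
interface DashboardState {
  module: Module;
  documentId?: string;
  sessionId?: string;
}

// Encode state as a path plus optional query string, e.g. "/documents?doc=abc".
function stateToUrl(state: DashboardState): string {
  const parts: string[] = [];
  if (state.documentId) parts.push(`doc=${encodeURIComponent(state.documentId)}`);
  if (state.sessionId) parts.push(`session=${encodeURIComponent(state.sessionId)}`);
  return parts.length > 0 ? `/${state.module}?${parts.join("&")}` : `/${state.module}`;
}

// Decode a URL back into state; returns null for an unknown module.
function urlToState(url: string): DashboardState | null {
  const q = url.indexOf("?");
  const path = q === -1 ? url : url.slice(0, q);
  const query = q === -1 ? "" : url.slice(q + 1);
  const moduleName = path.replace(/^\//, "");
  if (MODULES.indexOf(moduleName as Module) === -1) return null;
  const state: DashboardState = { module: moduleName as Module };
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue;
    const key = pair.slice(0, eq);
    const value = decodeURIComponent(pair.slice(eq + 1));
    if (key === "doc") state.documentId = value;
    if (key === "session") state.sessionId = value;
  }
  return state;
}

// Hypothetical socket payload: the backend only has to send a URL.
interface NavigateEvent {
  type: "navigate";
  url: string;
}

// Browser-side handler: applying the event is a single router/history push.
function handleNavigate(event: NavigateEvent, push: (url: string) => void): void {
  push(event.url);
}
```

Because the URL carries the whole view state, the backend never needs to know how the UI renders a module. It only needs to broadcast a string.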
The hardest problems:
- ElevenLabs' LLM struggles with untyped schemas. A generic `execute_action(params: object)` tool kept sending empty payloads. I had to switch to 11 dedicated tools, each with explicit typed parameters and enums, so the model always knows exactly what to send.
- Internal API proxy. The dashboard sits behind Cloudflare Access, so the voice agent's tool calls couldn't hit the API directly. The solution: a proxy endpoint that calls `127.0.0.1` internally, bypassing auth while keeping the external surface locked down.
- Async completion callbacks. When you say "research X for me," the voice agent delegates to a Claude session that might run for minutes. I used `sendContextualUpdate()` to push the result back into the live voice conversation: the CallPage subscribes to the delegated session's socket room, captures the final assistant message, and injects it as context so the voice agent can speak the result.
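The async completion flow in the last bullet could look roughly like the sketch below. The `SessionEvent` shapes and the `createCompletionRelay` helper are hypothetical, and `VoiceConversation` is a stand-in interface that models only the `sendContextualUpdate()` call named above. The real socket events and wiring in the CallPage may differ.

```typescript
// Hypothetical events emitted on the delegated session's socket room.
type SessionEvent =
  | { type: "message"; role: "assistant" | "user"; text: string }
  | { type: "done" };

// Stand-in for the slice of the voice conversation this sketch relies on.
interface VoiceConversation {
  sendContextualUpdate(text: string): void;
}

// Returns an event handler that remembers the latest assistant message and,
// when the session finishes, injects it into the live conversation exactly once.
function createCompletionRelay(conversation: VoiceConversation): (event: SessionEvent) => void {
  let lastAssistantText: string | null = null;
  let delivered = false;

  return (event: SessionEvent): void => {
    if (delivered) return; // ignore anything after the result was handed over

    if (event.type === "message") {
      if (event.role === "assistant") lastAssistantText = event.text;
      return;
    }

    // event.type === "done": the delegated session has finished.
    delivered = true;
    if (lastAssistantText !== null) {
      conversation.sendContextualUpdate(
        `Delegated session finished. Result: ${lastAssistantText}`
      );
    }
  };
}
```

In a React page, the returned handler would be registered on the socket subscription when the delegation starts and removed on unmount. Keeping only the last assistant message means intermediate drafts never reach the voice agent.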
What I learned:
Voice interfaces need a fundamentally different tool design than chat interfaces. Chat agents handle ambiguity well — voice agents need guardrails. Typed enums, explicit parameters, and short descriptions matter more than detailed instructions. The LLM behind the voice has less reasoning budget, so you design for precision, not flexibility.
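As an illustration of designing for precision, here is a sketch of one dedicated tool with a closed enum and explicit parameters, plus a browser-side check that rejects the empty payloads a generic tool tended to receive. The tool, its parameter names, and the status values are invented for this example. They are not one of the 11 actual tools.

```typescript
// Hypothetical closed set of values the model may send for `status`.
const TASK_STATUSES = ["open", "in_progress", "done"] as const;
type TaskStatus = (typeof TASK_STATUSES)[number];

// Explicit, typed parameters for a single dedicated tool.
interface ListTasksParams {
  status: TaskStatus;
  limit: number;
}

type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Validate the raw tool payload before calling the backend, so a malformed
// or empty payload produces a short, specific error instead of a silent no-op.
function parseListTasksParams(raw: unknown): ParseResult<ListTasksParams> {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "expected an object" };
  }
  const { status, limit } = raw as Record<string, unknown>;
  if (typeof status !== "string" || TASK_STATUSES.indexOf(status as TaskStatus) === -1) {
    return { ok: false, error: `status must be one of: ${TASK_STATUSES.join(", ")}` };
  }
  if (typeof limit !== "number" || !Number.isInteger(limit) || limit < 1) {
    return { ok: false, error: "limit must be a positive integer" };
  }
  return { ok: true, value: { status: status as TaskStatus, limit } };
}
```

Returning the error string to the voice agent gives it something concrete to correct on the next attempt, which fits a model with limited reasoning budget better than a long free-form instruction.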
Built With
- claude
- elevenlabs
- node.js
- postgresql
- react
- redis
- tinyfish
