Walkie-Talkie (wt)
Walkie-Talkie (wt) is an orchestration backend that lets agent harnesses multiply and collaborate. In our evaluation, Devin + Opus 4.8 performed 20% better on the SWE-bench Pro benchmark when the wt CLI was installed to tackle hard tasks.
Inspiration
- Agents are great solo but have no clean way to talk to each other or coordinate work.
- We wanted a primitive where one agent can decompose a goal, dispatch orthogonal pieces to child agents, and integrate the result — a closed control loop.
What it does
- Lets the prime harness create child harnesses and delegate decomposed tasks to them.
- Lets the prime agent communicate with child harnesses to coordinate on hard problems (frontend/backend separation, parallel backtesting, etc.).
- Lets the prime agent launch multiple child harnesses in clean git worktree environments to test solutions in isolation, which improved Devin's performance on SWE-bench Pro.
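The decompose, dispatch, and integrate loop described above can be sketched with plain threads and a channel. This is a minimal illustration, not wt's actual implementation: the `Subtask` and `Report` types and the `run_child` and `dispatch_and_integrate` functions are hypothetical names, and each child here is a thread standing in for a real child harness running in its own worktree.

```rust
use std::sync::mpsc;
use std::thread;

/// One decomposed piece of work handed to a child (hypothetical type).
#[derive(Debug, Clone)]
pub struct Subtask {
    pub id: usize,
    pub description: String,
}

/// What a child reports back to the prime (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Report {
    pub id: usize,
    pub summary: String,
}

/// Stand-in for a child harness. A real child would run an agent
/// in an isolated workspace; this one only echoes its task.
fn run_child(task: Subtask) -> Report {
    Report {
        id: task.id,
        summary: format!("done: {}", task.description),
    }
}

/// Dispatch every subtask to its own thread, then collect the
/// reports and return them sorted by subtask id.
pub fn dispatch_and_integrate(subtasks: Vec<Subtask>) -> Vec<Report> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for task in subtasks {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            // A send error only means the prime stopped listening.
            let _ = tx.send(run_child(task));
        }));
    }
    // Drop the prime's own sender so the receive loop ends once
    // every child has finished and dropped its clone.
    drop(tx);

    let mut reports: Vec<Report> = rx.iter().collect();
    for handle in handles {
        let _ = handle.join();
    }
    reports.sort_by_key(|r| r.id);
    reports
}
```

Calling `dispatch_and_integrate` with a "frontend" and a "backend" subtask returns one report per subtask. Sorting by id gives the prime a stable order to integrate results in, regardless of which child finishes first.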
How we built it
- Rust workspace, single wt binary: wt-proto (wire/IPC types, no I/O), wt-core (identity, SQLite store, auth, transport, services), wt-daemon (accept loop, delivery worker, IPC, mDNS, harness supervisor), wt-cli (clap client).
- Transport: iroh (QUIC + relays + DNS discovery), one Ed25519 identity per install used for both transport mTLS and token signing.
- Persistence: one SQLite DB (WAL) with a combined outbox/inbox message log; receiver-side dedup via composite PK; delivery worker resumes after restart.
- Orchestration: in-daemon message bus, per-child supervisor over Claude Code stream-json (kill_on_drop lifecycle), per-session worktree/new-folder workspaces.
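Receiver-side dedup via a composite primary key can be illustrated with an in-memory stand-in. This sketch is an assumption about the idea, not the project's schema: the key fields (sender identity plus a per-sender sequence number) and the `Inbox` type are hypothetical, and a `BTreeMap` plays the role that a SQLite table with a composite primary key would play in the real store.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Composite key: (sender identity, per-sender sequence number).
/// Both fields are assumptions made for this illustration.
type MessageKey = (String, u64);

/// In-memory stand-in for the receiver's inbox table.
#[derive(Default)]
pub struct Inbox {
    rows: BTreeMap<MessageKey, String>,
}

impl Inbox {
    /// Store a message unless its key already exists.
    /// Returns true only for the first delivery of a given key,
    /// so a redelivery after a sender restart is a harmless no-op.
    pub fn receive(&mut self, sender: &str, seq: u64, body: &str) -> bool {
        match self.rows.entry((sender.to_string(), seq)) {
            Entry::Vacant(slot) => {
                slot.insert(body.to_string());
                true
            }
            Entry::Occupied(_) => false,
        }
    }

    /// Number of distinct messages stored.
    pub fn len(&self) -> usize {
        self.rows.len()
    }

    /// True when no message has been stored yet.
    pub fn is_empty(&self) -> bool {
        self.rows.is_empty()
    }
}
```

Because the duplicate check lives on the receiver, a delivery worker that resumes after a restart can resend anything it is unsure about without the receiver storing the message twice.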
Challenges we ran into
- Eval: we used Harbor + Devin for the eval. Running SWE-bench Pro consumed a tremendous number of tokens and took a long time, so we had to use Daytona to run the eval efficiently in parallel.
- Testing: we had to debug a multi-agent harness system in which there is no single point of failure, so we sorted and rendered the multi-agent message traffic to verify that agent-to-agent communication succeeded.
Accomplishments that we're proud of
- Cracked SWE-bench Pro by 20%: we improved Devin's performance on SWE-bench Pro by 20%, showing that wt improves the problem-solving capability of an agent harness.
- Real cross-internet exchange: bidirectional messaging between a laptop behind residential NAT and a cloud sandbox, hole-punched direct (no relay), macOS↔Linux.
- A single binary that is both daemon and CLI, with green unit + subprocess e2e tests and CI gates (build/test/clippy/fmt).
- A genuinely harness-agnostic orchestration model with a written, closed-loop operating discipline.
What we learned
- Multiple harnesses beat a single harness on hard tasks.