Walkie-Talkie (wt)

Walkie-Talkie (wt) is an orchestration backend that lets agent harnesses multiply and collaborate. In our evaluation, Devin + Opus 4.8 performed 20% better on the SWE-bench Pro benchmark when the wt CLI was installed to tackle hard tasks.

Inspiration

  • Agents are great solo but have no clean way to talk to each other or coordinate work.
  • We wanted a primitive where one agent can decompose a goal, dispatch orthogonal pieces to child agents, and integrate the result — a closed control loop.

What it does

  • Lets the prime harness create child harnesses and delegate decomposed tasks to them.
  • Lets the prime agent communicate with child harnesses to coordinate on hard problems (frontend/backend separation, parallel backtesting, etc.).
  • Lets the prime agent launch multiple child harnesses in clean git worktree environments to test solutions in isolation, which improved Devin's performance on SWE-bench Pro.
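
The delegate-and-integrate loop described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: the `Subtask` and `Report` types and the `dispatch_and_integrate` function are hypothetical names, and plain threads plus a channel stand in for real child harnesses and the wt message transport.

```rust
use std::sync::mpsc;
use std::thread;

/// One decomposed piece of work handed to a child (hypothetical type).
#[derive(Debug, Clone)]
pub struct Subtask {
    pub id: usize,
    pub description: String,
}

/// What a child reports back to the prime agent (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Report {
    pub subtask_id: usize,
    pub summary: String,
}

/// Runs every subtask on its own thread (a stand-in for a child harness)
/// and returns the collected reports sorted by subtask id.
pub fn dispatch_and_integrate<F>(subtasks: Vec<Subtask>, child: F) -> Vec<Report>
where
    F: Fn(Subtask) -> String + Send + Clone + 'static,
{
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for task in subtasks {
        let tx = tx.clone();
        let child = child.clone();
        handles.push(thread::spawn(move || {
            let subtask_id = task.id;
            let summary = child(task);
            // Sending fails only if the receiver is gone; ignored in this sketch.
            let _ = tx.send(Report { subtask_id, summary });
        }));
    }

    // Drop the original sender so the receive loop ends once all children finish.
    drop(tx);
    let mut reports: Vec<Report> = rx.iter().collect();

    for handle in handles {
        let _ = handle.join();
    }

    // Children finish in arbitrary order; sort so integration is deterministic.
    reports.sort_by_key(|r| r.subtask_id);
    reports
}
```

Calling `dispatch_and_integrate` with a "frontend" and a "backend" subtask and a closure that formats a summary returns one report per subtask, ordered by id. Sorting before integration is a design choice of this sketch so the prime agent sees results in a stable order regardless of which child finishes first.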

How we built it

  • Rust workspace, single wt binary: wt-proto (wire/IPC types, no I/O), wt-core (identity, SQLite store, auth, transport, services), wt-daemon (accept loop, delivery worker, IPC, mDNS, harness supervisor), wt-cli (clap client).
  • Transport: iroh (QUIC + relays + DNS discovery), one Ed25519 identity per install used for both transport mTLS and token signing.
  • Persistence: one SQLite DB (WAL) with a combined outbox/inbox message log; receiver-side dedup via composite PK; delivery worker resumes after restart.
  • Orchestration: in-daemon message bus, per-child supervisor over Claude Code stream-json (kill_on_drop lifecycle), per-session worktree/new-folder workspaces.
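
To illustrate the receiver-side dedup idea from the persistence bullet, here is a minimal in-memory sketch. It is not the wt schema: the key shape (sender identity plus a per-sender sequence number) and the `Inbox` type are assumptions, and a `HashSet` stands in for the uniqueness guarantee that a composite primary key gives in SQLite.

```rust
use std::collections::HashSet;

/// Hypothetical composite key: (sender identity, per-sender sequence number).
type MessageKey = (String, u64);

/// In-memory stand-in for the inbox side of the message log.
#[derive(Default)]
pub struct Inbox {
    seen: HashSet<MessageKey>,
    log: Vec<(MessageKey, String)>,
}

impl Inbox {
    pub fn new() -> Self {
        Self::default()
    }

    /// Stores the message and returns true, or returns false if the same
    /// (sender, seq) pair was already accepted, e.g. after a retry.
    pub fn accept(&mut self, sender: &str, seq: u64, body: &str) -> bool {
        let key = (sender.to_string(), seq);
        // `insert` returns false when the key is already present,
        // mirroring a primary-key conflict on insert.
        if !self.seen.insert(key.clone()) {
            return false;
        }
        self.log.push((key, body.to_string()));
        true
    }

    /// Number of distinct messages stored.
    pub fn len(&self) -> usize {
        self.log.len()
    }

    pub fn is_empty(&self) -> bool {
        self.log.is_empty()
    }
}
```

With this shape, a sender that retries after a restart can resend the same message safely: the second `accept` call for an identical (sender, seq) pair is a no-op, so redelivery does not produce duplicates on the receiver.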

Challenges we ran into

  • Eval: we used harbor + devin for the eval. Running SWE-bench Pro consumed a tremendous number of tokens and took a long time, so we had to use Daytona to run the eval efficiently in parallel.
  • Testing: we had to debug a multi-agent harness system where there is no single point of failure to inspect, and we had to sort and render the multi-agent message traffic to confirm that agent-to-agent communication succeeded.

Accomplishments that we're proud of

  • A 20% gain on SWE-bench Pro: wt improved Devin's performance on SWE-bench Pro by 20%, which indicates that it strengthens the problem-solving capability of an agent harness.
  • Real cross-internet exchange: bidirectional messaging between a laptop behind residential NAT and a cloud sandbox, hole-punched direct (no relay), macOS↔Linux.
  • A single binary that is both daemon and CLI, with green unit + subprocess e2e tests and CI gates (build/test/clippy/fmt).
  • A genuinely harness-agnostic orchestration model with a written, closed-loop operating discipline.

What we learned

  • A multi-harness setup beats a single harness on hard tasks.