Inspiration

I use multiple coding agents in the same project every day. A task starts in Cursor, continues in Codex or opencode while I'm on the train, gets reviewed by Greptile on the PR, and then comes back to Cursor for the fix-up pass. Every switch costs me ten minutes of re-explaining the goal, the constraints, the approaches I'd already tried, the tests that were red, and the design decisions I'd made and didn't want the next agent to relitigate.

The framing I kept coming back to:

Agents don't only need more context. They need transferable task state with memory, retrieval, and review.

AGENTS.md already covers repo-level rules (use TypeScript, run tests, don't edit generated files). Nothing covered the live, per-task layer: "we're adding the failed-payment webhook, do not change the schema, the org-scope check is still missing, here are the three files you've already touched." Relay is that missing layer.

What it does

Relay is a tool-agnostic handoff layer for coding agents. It captures the live state of a task as a compact capsule stored in .relay/, then renders target-specific handoffs so the same task survives a switch from Cursor to opencode to Codex to Claude Code, picks up Greptile's review feedback from the PR, and comes back with everything still intact.

The capsule holds the goal, constraints, decisions, assumptions, touched files, verification state, open questions, next steps, and review findings. Everything else lives in an append-only event log next to it.
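As a rough sketch, the capsule shape described above might look like the following TypeScript interface. The field names here mirror the prose, not the actual src/core/schema.ts, so treat them as illustrative:

```typescript
// Hypothetical capsule shape; field names follow the description above,
// not the real src/core/schema.ts.
interface Capsule {
  id: string;
  goal: string;
  constraints: string[];
  decisions: string[];
  assumptions: string[];
  touchedFiles: string[];
  verification: { status: "red" | "green" | "unknown"; notes?: string };
  openQuestions: string[];
  nextSteps: string[];
  reviewFindings: string[];
}

// Example: the failed-payment-webhook task from the intro, as a capsule.
const capsule: Capsule = {
  id: "failed-payment-webhook",
  goal: "Add the failed-payment webhook",
  constraints: ["Do not change the schema"],
  decisions: [],
  assumptions: [],
  touchedFiles: ["src/webhooks/payment.ts"],
  verification: { status: "red", notes: "org-scope check still missing" },
  openQuestions: [],
  nextSteps: ["Add the org-scope check"],
  reviewFindings: [],
};
```

The point of keeping this typed and small is that any agent can load it in one read; the full history lives elsewhere.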

How I built it

Node + TypeScript CLI, single relay binary, pnpm + vitest. Three layers, built in three phases:

Phase A — Reliable CLI path.

  • Capsule schema (src/core/schema.ts) and storage (src/core/storage.ts) writing to .relay/capsules/<id>.json.
  • Git snapshot in src/core/git.ts — diff stats, branch, touched files, current verification state.
  • Adapter registry in src/adapters/ so each target (cursor, opencode, markdown) renders the same neutral capsule into its own framing.
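The adapter idea in the last bullet can be sketched as a small registry of render functions, one per target, all consuming the same neutral capsule. The names and framing strings here are assumptions, not the real src/adapters/ API:

```typescript
// Hypothetical adapter registry: each target renders the same neutral
// capsule into its own Markdown framing.
type NeutralCapsule = { id: string; goal: string; constraints: string[] };
type Adapter = (c: NeutralCapsule) => string;

const adapters: Record<string, Adapter> = {
  cursor: (c) =>
    `# Handoff for Cursor\n\nGoal: ${c.goal}\n\nHard constraints:\n` +
    c.constraints.map((x) => `- ${x}`).join("\n"),
  // Plain markdown is the lowest common denominator and the fallback.
  markdown: (c) => `# ${c.id}\n\n${c.goal}`,
};

function render(target: string, capsule: NeutralCapsule): string {
  const adapter = adapters[target] ?? adapters.markdown;
  return adapter(capsule);
}
```

Adding a new target is just another entry in the record, which is why a Claude Code or Copilot CLI adapter stays a ~30-line file.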

Phase B — Nia retrieval.

  • A NiaClient interface with two implementations: a real REST client against apigcp.trynia.ai/v2, and an in-process mock that always works offline.
  • Factory in src/integrations/nia/factory.ts resolves which to use from RELAY_NIA_MODE, with a graceful fallback to mock if the key is missing so the demo never hard-fails.
  • relay handoff --hydrate injects retrieved snippets directly into the generated Markdown.
  • relay close indexes the closed capsule back into Nia as a local_folder source so future tasks can retrieve prior decisions.
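The factory's mode resolution and graceful fallback can be sketched like this. The real factory lives in src/integrations/nia/factory.ts; the client interface and the NIA_API_KEY env name are assumptions:

```typescript
// Sketch of RELAY_NIA_MODE resolution with a graceful fallback to mock,
// so a missing key degrades with a warning instead of a crash.
interface NiaClient {
  retrieve(query: string): Promise<string[]>;
}

const mockNia: NiaClient = {
  async retrieve(query: string) {
    return [`(mock) snippet for: ${query}`];
  },
};

function createNiaClient(env: Record<string, string | undefined>): NiaClient {
  const mode = env.RELAY_NIA_MODE ?? "mock";
  if (mode === "real") {
    if (!env.NIA_API_KEY) {
      // Graceful fallback: warn instead of hard-failing the demo.
      console.warn("RELAY_NIA_MODE=real but no NIA_API_KEY; using mock");
      return mockNia;
    }
    // A real REST client against apigcp.trynia.ai/v2 would be built here.
  }
  return mockNia;
}
```

Everything downstream only sees the NiaClient interface, so handoff hydration is identical in mock and real mode.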

Phase C — Greptile review loop.

  • Three modes (mock, gh, real). The gh mode was the one that ended up mattering: it reads Greptile's PR review comments through the gh CLI, so it works for anyone who has the GitHub app installed without needing a Greptile API key.
  • relay review import dedupes findings by stable external_id (gh:rc:<id> / gh:rv:<id>) so reimports don't multiply.
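The dedupe-on-reimport behavior is essentially a merge keyed on the stable external_id. A minimal sketch, with the merge semantics (reimport overwrites the stored finding) as an assumption:

```typescript
// Sketch of dedup-by-stable-id on review import. IDs follow the
// gh:rc:<id> / gh:rv:<id> scheme; the overwrite policy is illustrative.
interface Finding {
  external_id: string;
  body: string;
}

function mergeFindings(existing: Finding[], imported: Finding[]): Finding[] {
  const byId = new Map(existing.map((f) => [f.external_id, f]));
  // Same id on reimport replaces the old finding rather than duplicating it.
  for (const f of imported) byId.set(f.external_id, f);
  return [...byId.values()];
}
```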

Challenges I ran into

Greptile's PR comments are HTML soup. The bot embeds severity as <img alt="P1"> badges, wraps everything in <picture>/<a> tags, and tacks on "Fix in Cursor" links with massive percent-encoded query strings and suggestion code fences. I wrote summarizeBody in src/integrations/greptile/gh.ts to strip all of that down to the prose an agent can actually read.
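A stripped-down sketch of the kind of cleanup summarizeBody does; the real version in src/integrations/greptile/gh.ts handles more cases, and these regexes are illustrative rather than exhaustive:

```typescript
// Minimal sketch: strip badge images, wrapper tags, "Fix in Cursor"
// links, and suggestion fences down to readable prose.
function summarize(body: string): string {
  return body
    .replace(/```suggestion[\s\S]*?```/g, "") // drop suggestion code fences
    .replace(/\[Fix in Cursor\]\([^)]*\)/g, "") // drop percent-encoded deep links
    .replace(/<img[^>]*>/g, "") // drop severity badge images
    .replace(/<\/?(picture|a|source)[^>]*>/g, "") // unwrap picture/anchor soup
    .replace(/\s+/g, " ")
    .trim();
}
```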

Severity detection was a trap. Greptile encodes severity two ways: P-grade badges and word form ("critical"/"high"/"medium"/"low"). Body prose often contains "low" or "high" in unrelated senses, so a naive regex misclassifies almost everything. I had to prioritize the badge alt-text and standalone P-token first, then fall back to word matching.
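The priority order described above can be sketched as a three-stage match: badge alt-text first, then a standalone P-token, then whole-word severity names. The P-grade-to-severity mapping and the "severity" co-occurrence guard in the word fallback are assumptions:

```typescript
// Sketch of prioritized severity detection: badge alt-text, then a
// standalone P-token, then word form guarded against unrelated uses.
type Severity = "critical" | "high" | "medium" | "low" | "unknown";

const P_MAP: Record<string, Severity> = {
  P0: "critical",
  P1: "high",
  P2: "medium",
  P3: "low",
};

function detectSeverity(body: string): Severity {
  // 1. Badge alt-text is the most reliable signal.
  const badge = body.match(/<img[^>]*alt="(P[0-3])"/);
  if (badge) return P_MAP[badge[1]];
  // 2. A standalone P-token like "P2:" at a word boundary.
  const token = body.match(/(?:^|\s)(P[0-3])(?:\s|$|:)/);
  if (token) return P_MAP[token[1]];
  // 3. Word form, only when it clearly modifies "severity", so prose
  //    like "low-effort" or "high throughput" is not misread.
  const word = body.match(/\b(critical|high|medium|low)\b\s+severity/i);
  if (word) return word[1].toLowerCase() as Severity;
  return "unknown";
}
```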

Tool-agnostic without writing N integrations. Building a Cursor plugin and an opencode plugin and a Codex plugin would have eaten the whole hackathon. I chose Markdown handoffs with target-specific framing instead. Every agent reads Markdown natively, and the adapter layer is small enough that adding Claude Code or Copilot CLI is a ~30-line file.

Capsule size vs. completeness. A handoff is only useful if the next agent will actually read it, which means it has to stay compact. But I also wanted full history. The split — small typed capsule + separate append-only events/<id>.jsonl — let me keep both without either one rotting.
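The capsule/event-log split boils down to two tiny write paths: rewrite the small JSON capsule in place, and only ever append to the JSONL log. A sketch, with the function names as assumptions but the .relay/ paths as described above:

```typescript
// Sketch of the storage split: compact capsule file, append-only event log.
import { appendFileSync, mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

function saveCapsule(root: string, id: string, capsule: object): void {
  mkdirSync(join(root, ".relay", "capsules"), { recursive: true });
  // The capsule is small, so rewriting the whole file is fine.
  writeFileSync(
    join(root, ".relay", "capsules", `${id}.json`),
    JSON.stringify(capsule, null, 2)
  );
}

function appendEvent(root: string, id: string, event: object): void {
  mkdirSync(join(root, ".relay", "events"), { recursive: true });
  // One JSON object per line; the log is never rewritten, only appended.
  appendFileSync(
    join(root, ".relay", "events", `${id}.jsonl`),
    JSON.stringify({ ts: new Date().toISOString(), ...event }) + "\n"
  );
}
```

Because the log is append-only, there is no growing object to keep internally consistent; history accumulates without ever touching the capsule the next agent actually reads.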

Offline-first by default. Conference Wi-Fi is unreliable and demo accounts get rate-limited. Every external integration (Nia, Greptile) has a mock client that's always available, and the factories fall back to mock with a warning rather than crashing if a key is missing.

What I learned

  • Most "give the agent more context" complaints are really "give the agent more task state." Repo-level guidance isn't the gap; per-task continuity is.
  • When an AI product has a GitHub app, gh api is often a better integration surface than chasing a REST key. It works for every user of that app, immediately.
  • A plain Markdown file is a shockingly powerful interop format between agents. Every harness reads it without translation.
  • A small typed snapshot plus an append-only event log beats trying to keep one growing JSON object internally consistent.
  • Designing for the offline demo first made the online demo more reliable too.

What's next

  • A Cursor sidebar extension for capsule editing instead of the CLI.
  • Capsule-aware retrieval over closed capsules so a new task can be seeded with relevant prior decisions automatically.
  • A real Greptile REST path when the API key flow opens up, sitting alongside the gh adapter.

Built With

  • Node + TypeScript (pnpm, vitest)
  • Nia (retrieval)
  • Greptile (PR reviews, read via the gh CLI)