Inspiration

AI agents write code confidently and commit it — without ever running it. The bottleneck in software development used to be writing code. Now it's verifying that the AI's code actually works before it lands in your repo.

I wanted to close that loop inside the agent itself, not in CI after the fact.

What we built

Duo Conductor is a GitLab Duo flow that runs AI-generated code in a kernel-isolated sandbox before committing anything.

The loop is simple:

  1. The agent reads a GitLab issue and writes the implementation
  2. The code executes in a gVisor sandbox — its own kernel, no shared host kernel
  3. If exit_code=0 → commit and open an MR
  4. If it fails → the agent sees stderr, fixes, and retries

The sandbox run sandwiched between writing and committing is the whole point.
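The loop above can be sketched in a few lines of Python. This is an illustration only, not the project's implementation: `verify_loop`, `RunResult`, and `run_locally` are hypothetical names, and `run_locally` is an unisolated stand-in for the call that would go to the gVisor sandbox.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    exit_code: int
    stdout: str
    stderr: str


def run_locally(code: str, timeout: float = 10.0) -> RunResult:
    """Stand-in executor: runs Python code in a plain subprocess.

    This is NOT isolated; it only mimics the shape of a sandbox call.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return RunResult(proc.returncode, proc.stdout, proc.stderr)


def verify_loop(
    generate: Callable[[Optional[str]], str],
    execute: Callable[[str], RunResult] = run_locally,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first candidate that exits 0, or None if every attempt fails.

    `generate` receives the previous attempt's stderr (None on the first try).
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        code = generate(feedback)
        result = execute(code)
        if result.exit_code == 0:
            return code  # verified: safe to commit and open an MR
        feedback = result.stderr  # the agent sees stderr and retries
    return None
```

The attempt cap is an assumption added for the sketch; without some bound, an agent that never produces passing code would loop forever.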

How we built it

The sandbox runs on GKE with gVisor nodes. We used kubernetes-sigs/agent-sandbox CRDs — a Kubernetes SIG project for AI agent runtimes. SandboxWarmPool keeps two pods pre-warmed so sandbox startup is under 100ms. Without that, cold-starting a gVisor pod takes 3–5 seconds and the agent loop feels sluggish.
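To show why pre-warming hides the cold start, here is a toy model of a warm pool in Python. It is a conceptual sketch only: the `WarmPool` class and its methods are hypothetical and do not reflect the SandboxWarmPool CRD schema or its controller.

```python
from collections import deque
from typing import Callable, Deque


class WarmPool:
    """Toy warm pool: keep `size` sandboxes ready and refill after each claim."""

    def __init__(self, size: int, start_sandbox: Callable[[], str]) -> None:
        self._size = size
        self._start = start_sandbox
        self._ready: Deque[str] = deque()
        self._refill()

    def _refill(self) -> None:
        # The slow cold start happens here, off the caller's critical path.
        while len(self._ready) < self._size:
            self._ready.append(self._start())

    def claim(self) -> str:
        if self._ready:
            sandbox = self._ready.popleft()  # fast path: already running
        else:
            sandbox = self._start()  # pool exhausted: pay the cold start
        # A real controller would replenish asynchronously; this sketch
        # refills inline to stay short.
        self._refill()
        return sandbox

    def ready_count(self) -> int:
        return len(self._ready)
```

With a pool size of two, as in the setup described above, a claim returns an already-running sandbox and the replacement starts afterwards.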

Network isolation runs at two levels: a Kubernetes NetworkPolicy and VPC Firewall rules on the node tag. The firewall is enforced at the hypervisor — you can't bypass it from inside the sandbox. GitLab's built-in SRT uses bubblewrap (process-level). gVisor uses a separate kernel. That's the difference between "probably isolated" and "actually isolated."
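As an illustration of the first level, the Python sketch below builds a NetworkPolicy manifest that denies all egress from the selected pods. The policy name, namespace, and `app` label are assumptions made for the example; the project's actual policy may allow specific destinations.

```python
import json
from typing import Any, Dict


def deny_egress_policy(namespace: str, pod_label: str) -> Dict[str, Any]:
    """Build a NetworkPolicy that blocks all outbound traffic for matching pods.

    Listing "Egress" in policyTypes with an empty egress rule list means
    no outbound connection is allowed, DNS included.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "sandbox-deny-egress", "namespace": namespace},
        "spec": {
            # Hypothetical label; use whatever label the sandbox pods carry.
            "podSelector": {"matchLabels": {"app": pod_label}},
            "policyTypes": ["Egress"],
            "egress": [],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(deny_egress_policy("sandboxes", "agent-sandbox"), indent=2))
```

A NetworkPolicy only takes effect if the cluster's network plugin enforces it, which is one reason to keep a second layer such as the VPC firewall.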

The MCP server is a small Go binary that bridges GitLab Duo to the sandbox cluster over SSE. GitLab Duo sends execute_code(code, language, network) — the server claims a warm pod, runs the code, and streams back stdout/stderr.
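For reference, an MCP tool invocation travels as a JSON-RPC 2.0 `tools/call` message. The Python sketch below builds such a request for execute_code and a matching result. The tool name and argument names come from the description above; treating `network` as a boolean and packing the exit code into the text content are assumptions, and the real server is written in Go.

```python
import json


def execute_code_request(request_id: int, code: str, language: str, network: bool) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for the execute_code tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "execute_code",
            # `network` as a bool is an assumption about the tool's schema.
            "arguments": {"code": code, "language": language, "network": network},
        },
    }
    return json.dumps(message)


def execute_code_result(request_id: int, exit_code: int, stdout: str, stderr: str) -> str:
    """Serialize a tool result; a non-zero exit code is flagged with isError."""
    text = f"exit_code={exit_code}\n--- stdout ---\n{stdout}\n--- stderr ---\n{stderr}"
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "content": [{"type": "text", "text": text}],
            "isError": exit_code != 0,
        },
    }
    return json.dumps(message)
```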

Challenges

MCP SSE transport: duo-cli expects the server to emit an endpoint event on connection, carrying the public URL that clients should POST to. Set PUBLIC_URL wrong and the client and server talk past each other silently.
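A minimal Python sketch of that endpoint event, with a startup check that would have turned the silent failure into a loud one. The `/message` path and `sessionId` query parameter are assumptions for illustration; the real server may use different names.

```python
from urllib.parse import urlencode, urlparse


def validate_public_url(public_url: str) -> str:
    """Fail fast at startup instead of letting clients POST to a bad URL."""
    parsed = urlparse(public_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"PUBLIC_URL must be an absolute http(s) URL, got {public_url!r}")
    return public_url.rstrip("/")


def endpoint_event(public_url: str, session_id: str) -> str:
    """Format the first SSE frame: an `endpoint` event naming the POST URL."""
    base = validate_public_url(public_url)
    post_url = f"{base}/message?{urlencode({'sessionId': session_id})}"
    # An SSE frame is "event:" and "data:" lines terminated by a blank line.
    return f"event: endpoint\ndata: {post_url}\n\n"
```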

Single vs. multi-agent flows: I designed a planner→coder pipeline first, but in the ambient environment multi-step flows stop after the first agent completes. This isn't documented anywhere; I found it through trial and error. A single agent with a structured system prompt works better in practice.

File writes — the agent's runWriteFile writes to the runner's local filesystem, not to GitLab. You need an explicit create_commit API call to actually push code to the repo. Obvious in retrospect, not obvious at 2am.
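For illustration, the Python sketch below builds the equivalent request against GitLab's REST Commits API (POST /projects/:id/repository/commits). Mapping create_commit onto this REST endpoint is an assumption, as are the function name and the example host; the request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Dict
from urllib.parse import quote


def build_commit_request(
    base_url: str,
    project: str,
    branch: str,
    message: str,
    files: Dict[str, str],
    token: str,
) -> urllib.request.Request:
    """Build a request that commits `files` (path -> content) in one commit."""
    # The project path must be URL-encoded, e.g. "group/proj" -> "group%2Fproj".
    project_id = quote(project, safe="")
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/repository/commits"
    payload = {
        "branch": branch,
        "commit_message": message,
        # "create" fails if the file already exists; use "update" in that case.
        "actions": [
            {"action": "create", "file_path": path, "content": content}
            for path, content in files.items()
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "PRIVATE-TOKEN": token},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the commit, so that step belongs after the sandbox run has succeeded.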

IP addresses in SRT allowlist: allowed_domains doesn't accept raw IPs. I used nip.io as a free DNS layer over the GKE load balancer IP.
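nip.io resolves any hostname of the form `<ip>.nip.io` to that IP, so the workaround amounts to a string transformation. A small Python sketch, using a documentation-range address rather than the project's real load balancer IP:

```python
import ipaddress


def nip_io_hostname(ip: str) -> str:
    """Turn an IPv4 address into a nip.io hostname that resolves back to it."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return f"{addr}.nip.io"


if __name__ == "__main__":
    # 203.0.113.10 is a placeholder from the documentation address range.
    print(nip_io_hostname("203.0.113.10"))
```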

What we learned

kubernetes-sigs/agent-sandbox is genuinely production-ready. SandboxWarmPool works, gVisor isolation is solid, and the whole thing runs on spot nodes for ~$0.06/hr.

GitLab Duo Agent Platform is more capable than the docs suggest and rougher than a GA product. Single-agent flows with clear tool contracts are reliable. Multi-agent orchestration in ambient mode needs more work.

The real insight: AI-generated code is untrusted code by definition. Running it before committing isn't a nice-to-have — it's the only way to give the agent actual feedback instead of hope.
