Inspiration
This project grew out of a design philosophy we've written about extensively in two pieces: "The Scalpel, Not the Hammer" and "Stateful MCP Architecture."
The core idea: the agent isn't smart — the database is smart. The agent is just hands. If you accept that, then the orchestration layer's job isn't to make agents smarter. It's to constrain their scope, externalize their state, and make every decision auditable.
Kilo's agent loop is powerful, but it has a blind spot: there's no mechanism to constrain what an agent is allowed to touch, verify that sub-agents stay within their parent's permissions, or audit exactly what happened after the fact.
We've all seen it: an agent asked to "fix the login bug" that decides to refactor your database schema, edit your CI config, and helpfully delete your test fixtures along the way. The problem isn't intelligence. The problem is scope. As we put it: Tokens = Scope × Iterations × Verbosity. The scalpel cuts all three.
Warm Agents applies that philosophy directly to agent orchestration. Define workspace boundaries explicitly. Externalize all state to durable storage. Make every tool call pass through a validation gate before it can execute. The agent proposes, the engine validates.
What it does
Warm Agents adds a deterministic orchestration layer behind a single --warm flag.
Scoped permissions — Every task declares which paths it can touch and which operations it can perform. Tool calls outside scope are blocked before execution, not after.
| Tool Call | Result | Why |
|---|---|---|
| read src/auth/login.ts | Allowed | Within declared scope |
| read /etc/passwd | BLOCKED | Path outside scope |
| bash rm -rf / | BLOCKED | "execute" operation not declared |
| webfetch https://evil.com | BLOCKED | "network" operation not declared |
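
The decisions in the table can be reproduced with a small pre-execution gate. The following is a minimal sketch, not the project's actual code: `TaskScope`, `ToolCall`, `TOOL_OPERATIONS`, and `checkToolCall` are hypothetical names, and the tool-to-operation mapping is an assumption made for illustration.

```typescript
// Hypothetical types: illustrative only, not the project's real API.
type Operation = "read" | "write" | "execute" | "network";

interface TaskScope {
  paths: string[]; // allowed path prefixes, e.g. "src/auth/"
  operations: Operation[]; // operations the task declared up front
}

interface ToolCall {
  tool: string; // e.g. "read", "bash", "webfetch"
  target: string; // file path, shell command, or URL
}

interface Decision {
  allowed: boolean;
  reason: string;
}

// Assumed mapping from tool name to the operation class it requires.
const TOOL_OPERATIONS: Record<string, Operation> = {
  read: "read",
  write: "write",
  edit: "write",
  bash: "execute",
  webfetch: "network",
};

// Runs before the tool executes; the caller must refuse to run blocked calls.
function checkToolCall(scope: TaskScope, call: ToolCall): Decision {
  const op: Operation | undefined = TOOL_OPERATIONS[call.tool];
  if (op === undefined) {
    return { allowed: false, reason: `unknown tool "${call.tool}"` };
  }
  if (!scope.operations.includes(op)) {
    return { allowed: false, reason: `"${op}" operation not declared` };
  }
  // Only file operations are matched against path prefixes in this sketch.
  if (op === "read" || op === "write") {
    const escapes = call.target.split("/").includes("..");
    const inScope = scope.paths.some((prefix) => call.target.startsWith(prefix));
    if (escapes || !inScope) {
      return { allowed: false, reason: "path outside scope" };
    }
  }
  return { allowed: true, reason: "within declared scope" };
}
```

With a scope of `paths: ["src/auth/"]` and `operations: ["read", "write"]`, the four calls in the table yield one allowed decision and three blocked ones. Prefix matching is a simplification here; a real gate would also need to resolve symlinks and normalize paths.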
Hierarchical sub-agent enforcement — When an orchestrator spawns a sub-agent, the child's scope is automatically inferred from its task description and validated as a subset of the parent's. A sub-agent scoped to src/auth/ cannot write to src/ui/, even though the parent could.
| Scope Level | Paths |
|---|---|
| Parent task | /projects/myapp/** |
| Sub-agent (narrowed) | /projects/myapp/src/auth/** |
With the scopes above, the sub-agent can read /projects/myapp/src/auth/login.ts, but a write to /projects/myapp/src/ui/dashboard.ts is blocked because that path falls outside its narrowed scope.
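The parent-child containment check can be sketched as a comparison of scope roots. This is an illustrative approximation that only understands patterns ending in `/**`; the helper names `scopeRoot`, `isWithin`, and `validateChildScope` are hypothetical, not taken from the project.

```typescript
// Hypothetical helper: reduces a glob such as "/projects/myapp/**" to its root directory.
function scopeRoot(pattern: string): string {
  return pattern.replace(/\/\*\*$/, "").replace(/\/+$/, "");
}

// True when the child pattern's root is the parent's root or lies beneath it.
function isWithin(childPattern: string, parentPattern: string): boolean {
  const child = scopeRoot(childPattern);
  const parent = scopeRoot(parentPattern);
  // Appending "/" stops "/projects/myapp2" from matching "/projects/myapp".
  return child === parent || child.startsWith(parent + "/");
}

// Returns the child patterns that escape the parent scope; an empty array means valid.
function validateChildScope(childPaths: string[], parentPaths: string[]): string[] {
  return childPaths.filter(
    (child) => !parentPaths.some((parent) => isWithin(child, parent)),
  );
}
```

Returning the offending patterns, rather than a bare boolean, lets the caller record exactly which part of a proposed scope was rejected.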
Append-only audit trail — Every state transition, permission check, and scope decision is logged to JSONL. For any session, you can reconstruct exactly what happened — what was allowed, what was blocked, and why.
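As a rough illustration of the JSONL approach, the sketch below serializes one event per line and rebuilds a session by filtering on its ID. The `AuditEvent` fields and the `AuditLog` and `replaySession` names are assumptions for this example, and the log is kept in memory rather than written to disk so the snippet stays self-contained.

```typescript
// Hypothetical event shape; the field names are assumptions for illustration.
interface AuditEvent {
  ts: string; // ISO-8601 timestamp
  sessionId: string;
  kind: "state_transition" | "permission_check" | "scope_decision";
  allowed?: boolean; // present for permission and scope decisions
  detail: string;
}

// Append-only: there is deliberately no method to edit or remove entries.
class AuditLog {
  private lines: string[] = [];

  append(event: AuditEvent): void {
    // JSON.stringify escapes newlines, so each event occupies exactly one line.
    this.lines.push(JSON.stringify(event));
  }

  toJsonl(): string {
    return this.lines.map((line) => line + "\n").join("");
  }
}

// Rebuilds the ordered event list for one session from raw JSONL text.
function replaySession(jsonl: string, sessionId: string): AuditEvent[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as AuditEvent)
    .filter((event) => event.sessionId === sessionId);
}
```

Because each record is a complete JSON document on its own line, a crash mid-write can corrupt at most the final line, which is one reason the format suits append-only logs.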
Warmness scoring — Agents accumulate familiarity scores based on files loaded, tools used, and task success. The scheduler routes new tasks to the most context-aware agent instead of cold-spawning every time.
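One way such a score could be computed is as a weighted blend of file overlap, tool overlap, and past success rate. The weights, the `AgentProfile` and `TaskHint` shapes, and the function names below are all assumptions for illustration; the project's actual scoring formula is not described here.

```typescript
// Hypothetical shapes; none of these names come from the project itself.
interface AgentProfile {
  id: string;
  filesLoaded: Set<string>;
  toolsUsed: Set<string>;
  tasksSucceeded: number;
  tasksAttempted: number;
}

interface TaskHint {
  files: string[]; // files the new task is expected to touch
  tools: string[]; // tools the new task is expected to use
}

// Assumed weights, chosen only so the score falls in the range [0, 1].
const WEIGHTS = { files: 0.5, tools: 0.3, success: 0.2 };

// Fraction of the wanted items the agent has already seen.
function overlap(wanted: string[], known: Set<string>): number {
  if (wanted.length === 0) return 0;
  return wanted.filter((item) => known.has(item)).length / wanted.length;
}

function warmness(agent: AgentProfile, task: TaskHint): number {
  const successRate =
    agent.tasksAttempted === 0 ? 0 : agent.tasksSucceeded / agent.tasksAttempted;
  return (
    WEIGHTS.files * overlap(task.files, agent.filesLoaded) +
    WEIGHTS.tools * overlap(task.tools, agent.toolsUsed) +
    WEIGHTS.success * successRate
  );
}

// Picks the highest-scoring agent; returns undefined when no agents exist.
function pickWarmest(agents: AgentProfile[], task: TaskHint): AgentProfile | undefined {
  let best: AgentProfile | undefined;
  let bestScore = -1;
  for (const agent of agents) {
    const score = warmness(agent, task);
    if (score > bestScore) {
      best = agent;
      bestScore = score;
    }
  }
  return best;
}
```

A scheduler built on this could fall back to cold-spawning when the best score is below some threshold, though that threshold would itself be a tuning decision.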
Zero overhead when off — All integration uses lazy dynamic imports behind flag checks. Normal Kilo usage is completely unaffected.
How we built it
The system is 17 TypeScript modules in packages/opencode/src/warm/:
- State machines with Zod-validated schemas for agent lifecycles (cold, warming, warm, executing, cooling) and task lifecycles (pending, claimed, executing, postchecked, completed, failed, rolled_back)
- Invariant engine that pre-checks every tool call against the active task's declared scope — path matching, operation classification, and MCP tool allowlisting
- Hierarchical scope validation with regex-based path inference from task descriptions and parent-child containment checks
- Durable state store that externalizes all agent/task state to disk — process crashes don't lose orchestration context
- Integration bridge that provides safe access to warm context from existing code, with lazy initialization on first tool call
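
To make the task lifecycle concrete, here is a minimal state-machine sketch over the seven task states listed above. The state names come from the list, but the transition table is an assumption, and plain TypeScript guards stand in for the Zod schemas so the snippet has no dependencies.

```typescript
// State names are from the write-up; everything else is a hypothetical sketch.
const TASK_STATES = [
  "pending",
  "claimed",
  "executing",
  "postchecked",
  "completed",
  "failed",
  "rolled_back",
] as const;

type TaskState = (typeof TASK_STATES)[number];

// Assumed transition table; the project's real rules may differ.
const TASK_TRANSITIONS: Record<TaskState, readonly TaskState[]> = {
  pending: ["claimed"],
  claimed: ["executing", "failed"],
  executing: ["postchecked", "failed"],
  postchecked: ["completed", "failed"],
  completed: [],
  failed: ["rolled_back"],
  rolled_back: [],
};

// Runtime guard for untrusted input, e.g. state reloaded from disk.
function isTaskState(value: unknown): value is TaskState {
  return (
    typeof value === "string" &&
    (TASK_STATES as readonly string[]).includes(value)
  );
}

// Returns the new state, or throws if the move is not in the table.
function transition(from: TaskState, to: TaskState): TaskState {
  if (!TASK_TRANSITIONS[from].includes(to)) {
    throw new Error(`illegal task transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the allowed moves in a single table means an illegal jump, such as `completed` back to `executing`, fails loudly instead of silently corrupting persisted state.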
Only 4 existing files were modified, with about 160 lines of surgical, guarded additions. The rest is purely additive — a new warm/ directory that the existing session loop opts into incrementally.
The suite has 154 passing tests across 12 test files, with 349 assertions.
Challenges
Circular dependencies — Dynamic imports through barrel files caused undefined values at runtime. Solved by importing directly from specific modules instead of through index re-exports.
Path anchoring — When a parent task has an absolute scope and a sub-task mentions relative paths, the scope inference needs to anchor relative paths within the parent root. This required a two-pass approach: try raw paths first, then try anchored versions.
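The two-pass idea can be sketched as follows: accept a path as-is if it already sits under the parent root, otherwise try it again joined onto that root. The regex and the names `extractPaths`, `isUnder`, and `inferChildPaths` are assumptions for illustration and are far simpler than real path inference would need to be.

```typescript
// Assumed pattern: tokens with at least one "/" between word-like segments.
const PATH_PATTERN = /(?:\/|\.\/)?[\w.-]+(?:\/[\w.-]+)+\/?/g;

// Pulls path-like tokens out of free text, dropping sentence-ending periods.
function extractPaths(description: string): string[] {
  const found: string[] = description.match(PATH_PATTERN) ?? [];
  return found.map((p) => p.replace(/\.+$/, ""));
}

function isUnder(root: string, candidate: string): boolean {
  return candidate === root || candidate.startsWith(root + "/");
}

function inferChildPaths(description: string, parentRoot: string): string[] {
  const root = parentRoot.replace(/\/+$/, "");
  const result: string[] = [];
  for (const raw of extractPaths(description)) {
    // Refuse traversal segments outright rather than trying to resolve them.
    if (raw.split("/").includes("..")) continue;
    // Pass 1: the raw path is already inside the parent root.
    if (isUnder(root, raw)) {
      result.push(raw);
      continue;
    }
    // Pass 2: anchor relative paths within the parent root and re-check.
    if (!raw.startsWith("/")) {
      const anchored = root + "/" + raw.replace(/^\.\//, "");
      if (isUnder(root, anchored)) result.push(anchored);
    }
  }
  return result;
}
```

Absolute paths outside the parent root are dropped rather than anchored, so a description that mentions `/etc/passwd` cannot widen the child's scope.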
Integration without disruption — The hardest constraint was touching as little existing code as possible. Every integration point had to be fully opt-in, lazy-loaded, and invisible when --warm isn't active.
What we learned
The most interesting insight: scope enforcement is more useful than we expected. It's not just about preventing catastrophic mistakes — it's about making agent behavior auditable. When every tool call is logged with its permission decision, you can reconstruct an agent's entire session from the audit trail alone. That changes debugging from "what did the agent do?" to "here's exactly what it did, what it was allowed to do, and where it was stopped."
What's next
- Live testing with production-quality models (current demo exercises the full API without an LLM)
- Warmness-based routing in multi-agent scenarios
- CI/CD replay mode — re-execute audit trails to verify agent behavior in pipelines
- MCP tool schema drift detection and runtime routing fallback
Built With
- bun
- node.js
- typescript
- zod