Inspiration
AI agents don’t usually fail loudly. They drift — slowly increasing retries, cost, and hallucination risk while continuing to ship outputs.
Inside large companies, custom control planes already monitor and govern this behavior. Outside those environments, teams are left to babysit agents manually or turn them off entirely.
Helm was inspired by the need to make that internal governance model explicit, portable, and usable by anyone running agents in production.
What it does
Helm is a runtime control plane for AI agents.
It continuously observes agent behavior, builds rolling baselines, detects drift, and intervenes autonomously while agents are running.
When drift is detected, Helm pauses execution, tightens constraints, and attempts recovery before failures reach users.
All actions are recorded in a full incident timeline for audit and review.
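The observe-baseline-detect loop above can be sketched as a rolling-window check: keep a window of recent values for a signal (retries, token spend, etc.) and flag samples that deviate sharply from the window's statistics. This is an illustrative sketch with made-up window sizes and thresholds, not Helm's actual detector.

```python
from collections import deque

class DriftDetector:
    """Rolling-baseline drift check (illustrative sketch, not Helm's real code).

    Flags a runtime signal (e.g. retries per task) when the latest value
    deviates from the rolling mean by more than `k` standard deviations.
    """

    def __init__(self, window: int = 50, k: float = 3.0):
        self.values = deque(maxlen=window)  # rolling baseline
        self.k = k

    def observe(self, value: float) -> bool:
        """Record a new sample; return True if it looks like drift."""
        drifted = False
        if len(self.values) >= 10:  # need a minimal baseline first
            mean = sum(self.values) / len(self.values)
            var = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            std = var ** 0.5
            drifted = std > 0 and abs(value - mean) > self.k * std
        self.values.append(value)
        return drifted
```

A real detector would track several signals at once and feed flagged samples into the incident timeline rather than just returning a boolean.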
How we built it
Helm is built as a single control-plane service with a clear separation between execution and governance.
- Worker agents run real workloads using Claude and realistic synthetic data from Tonic
- Runtime signals (tokens, retries, confidence, execution rate) are continuously collected
- Observers analyze behavior and detect drift
- Hierarchical governors apply constraints and escalation logic
- A control-plane UI exposes live metrics and incident history in real time
The system runs on AWS and streams updates via WebSockets.
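The governor layer in the list above can be pictured as an escalation ladder over the collected signals: healthy agents run freely, costly ones get constrained, and clearly misbehaving ones get paused. The thresholds and signal names below are hypothetical, chosen only to illustrate the shape of the logic.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"        # agent continues unchanged
    THROTTLE = "throttle"  # tighten constraints, e.g. lower token budget
    PAUSE = "pause"        # halt execution and attempt recovery

def govern(signals: dict) -> Action:
    """Toy escalation ladder (hypothetical thresholds, not Helm's policy).

    `signals` carries per-agent runtime metrics like those listed above:
    token spend, retry count, confidence, execution rate.
    """
    if signals["retries"] > 5 or signals["confidence"] < 0.3:
        return Action.PAUSE
    if signals["tokens_per_task"] > 2 * signals["baseline_tokens"]:
        return Action.THROTTLE
    return Action.ALLOW
```

In a hierarchical setup, a higher-level governor would see the decisions of lower-level ones and could escalate further, e.g. pausing a whole fleet when many agents are throttled at once.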
Challenges we ran into
The hardest challenge was defining intervention without overreaching.
We intentionally avoided permission-based kill switches or infrastructure rollbacks and focused on governing behavior at runtime.
Designing clear boundaries between agent execution and control logic — while keeping the system simple enough for a hackathon — required careful tradeoffs.
Accomplishments that we’re proud of
- Demonstrating real agents doing real work, not simulations
- Detecting behavioral drift using external signals rather than agent self-assessment
- Implementing hierarchical, autonomous governance without human-in-the-loop
- Making autonomous decisions observable and auditable through a control-plane UI
What we learned
Autonomy doesn’t fail because agents are “too smart.”
It fails because systems lack guardrails, baselines, and clear ownership of recovery.
Separating execution from governance — and making decisions based on behavior, not prompts — is what makes large-scale autonomy viable.
What’s next for Project Helm
The next phase of Helm focuses on turning the control plane into a production-grade platform for large-scale agent systems.
Remote Execution Plane Integration
Support secure telemetry ingestion from agents running in customer environments, with clear trust boundaries between execution and governance.
Policy-as-Code for Agent Behavior
Define formal, versioned policies for cost, quality, latency, and hallucination risk that can be evaluated automatically at runtime.
Cross-Agent Reasoning & Correlation
Detect systemic failures by correlating behavior across multiple agents and services, not just individual drift.
Predictive Intervention
Move from reactive drift detection to early-warning signals that predict runaway cost or quality degradation before thresholds are crossed.
Enterprise-Grade Governance
Add multi-tenant isolation, role-based access, and long-term incident retention for compliance and auditability.
Control-Plane APIs & Ecosystem
Expose Helm as an extensible platform with APIs and integrations so teams can plug governance directly into their agent workflows.
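The policy-as-code idea in the roadmap above could take a shape like the following: a versioned document of named limits plus a runtime evaluator that reports which limits current metrics exceed. The schema, field names, and numbers are entirely hypothetical.

```python
# Hypothetical policy-as-code shape: versioned limits evaluated at runtime.
POLICY = {
    "version": "v1",
    "limits": {
        "cost_usd_per_hour": 5.0,
        "p95_latency_ms": 2000,
        "hallucination_risk": 0.1,
    },
}

def violations(metrics: dict, policy: dict = POLICY) -> list[str]:
    """Return the names of policy limits the current metrics exceed."""
    return [name for name, limit in policy["limits"].items()
            if metrics.get(name, 0) > limit]
```

Because the policy is data rather than code paths, it can be versioned, diffed, and rolled back like any other configuration.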
The goal is not more autonomy — it’s controlled autonomy that scales safely.
Built With
- amazon-web-services
- anthropic
- fastapi
- html5/css3
- python
- restapi
- tonic
- websocket