Inspiration
AI agents don’t usually fail loudly. They drift — slowly increasing retries, cost, and hallucination risk while continuing to ship outputs.
Inside large companies, custom control planes already monitor and govern this behavior. Outside those environments, teams are left to babysit agents manually or turn them off entirely.
Helm was inspired by the need to make that internal governance model explicit, portable, and usable by anyone running agents in production.
What it does
Helm is a runtime control plane for AI agents.
It continuously observes agent behavior, builds rolling baselines, detects drift, and intervenes autonomously while agents are running.
When drift is detected, Helm pauses execution, tightens constraints, and attempts recovery before failures reach users.
All actions are recorded in a full incident timeline for audit and review.
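The observe-baseline-detect loop above can be sketched as a rolling-window check: keep a window of recent values for a signal (retries, token spend, etc.) and flag samples that deviate sharply from the window's statistics. This is an illustrative sketch with made-up window sizes and thresholds, not Helm's actual detector.

```python
from collections import deque

class DriftDetector:
    """Rolling-baseline drift check (illustrative sketch, not Helm's real code).

    Flags a runtime signal (e.g. retries per task) when the latest value
    deviates from the rolling mean by more than `k` standard deviations.
    """

    def __init__(self, window: int = 50, k: float = 3.0):
        self.values = deque(maxlen=window)  # rolling baseline
        self.k = k

    def observe(self, value: float) -> bool:
        """Record a new sample; return True if it looks like drift."""
        drifted = False
        if len(self.values) >= 10:  # need a minimal baseline first
            mean = sum(self.values) / len(self.values)
            var = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            std = var ** 0.5
            drifted = std > 0 and abs(value - mean) > self.k * std
        self.values.append(value)
        return drifted
```

A real detector would track several signals at once and feed flagged samples into the incident timeline rather than just returning a boolean.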
How we built it
Helm is built as a single control-plane service with a clear separation between execution and governance.
- Worker agents run real workloads using Claude and realistic synthetic data from Tonic
- Runtime signals (tokens, retries, confidence, execution rate) are continuously collected
- Observers analyze behavior and detect drift
- Hierarchical governors apply constraints and escalation logic
- A control-plane UI exposes live metrics and incident history in real time
The system runs on AWS and streams updates via WebSockets.
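The governor layer in the list above can be pictured as an escalation ladder over the collected signals: healthy agents run freely, costly ones get constrained, and clearly misbehaving ones get paused. The thresholds and signal names below are hypothetical, chosen only to illustrate the shape of the logic.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"        # agent continues unchanged
    THROTTLE = "throttle"  # tighten constraints, e.g. lower token budget
    PAUSE = "pause"        # halt execution and attempt recovery

def govern(signals: dict) -> Action:
    """Toy escalation ladder (hypothetical thresholds, not Helm's policy).

    `signals` carries per-agent runtime metrics like those listed above:
    token spend, retry count, confidence, execution rate.
    """
    if signals["retries"] > 5 or signals["confidence"] < 0.3:
        return Action.PAUSE
    if signals["tokens_per_task"] > 2 * signals["baseline_tokens"]:
        return Action.THROTTLE
    return Action.ALLOW
```

In a hierarchical setup, a higher-level governor would see the decisions of lower-level ones and could escalate further, e.g. pausing a whole fleet when many agents are throttled at once.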
Challenges we ran into
The hardest challenge was defining intervention without overreaching.
We intentionally avoided permission-based kill switches or infrastructure rollbacks and focused on governing behavior at runtime.
Designing clear boundaries between agent execution and control logic — while keeping the system simple enough for a hackathon — required careful tradeoffs.
Accomplishments that we’re proud of
- Demonstrating real agents doing real work, not simulations
- Detecting behavioral drift using external signals rather than agent self-assessment
- Implementing hierarchical, autonomous governance without human-in-the-loop
- Making autonomous decisions observable and auditable through a control-plane UI
What we learned
Autonomy doesn’t fail because agents are “too smart.”
It fails because systems lack guardrails, baselines, and clear ownership of recovery.
Separating execution from governance — and making decisions based on behavior, not prompts — is what makes large-scale autonomy viable.
What’s next for Project Helm
The next phase of Helm focuses on turning the control plane into a production-grade platform for large-scale agent systems.
Remote Execution Plane Integration
Support secure telemetry ingestion from agents running in customer environments, with clear trust boundaries between execution and governance.
Policy-as-Code for Agent Behavior
Define formal, versioned policies for cost, quality, latency, and hallucination risk that can be evaluated automatically at runtime.
Cross-Agent Reasoning & Correlation
Detect systemic failures by correlating behavior across multiple agents and services, not just individual drift.
Predictive Intervention
Move from reactive drift detection to early-warning signals that predict runaway cost or quality degradation before thresholds are crossed.
Enterprise-Grade Governance
Add multi-tenant isolation, role-based access, and long-term incident retention for compliance and auditability.
Control-Plane APIs & Ecosystem
Expose Helm as an extensible platform with APIs and integrations so teams can plug governance directly into their agent workflows.
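The policy-as-code idea in the roadmap above could take a shape like the following: a versioned document of named limits plus a runtime evaluator that reports which limits current metrics exceed. The schema, field names, and numbers are entirely hypothetical.

```python
# Hypothetical policy-as-code shape: versioned limits evaluated at runtime.
POLICY = {
    "version": "v1",
    "limits": {
        "cost_usd_per_hour": 5.0,
        "p95_latency_ms": 2000,
        "hallucination_risk": 0.1,
    },
}

def violations(metrics: dict, policy: dict = POLICY) -> list[str]:
    """Return the names of policy limits the current metrics exceed."""
    return [name for name, limit in policy["limits"].items()
            if metrics.get(name, 0) > limit]
```

Because the policy is data rather than code paths, it can be versioned, diffed, and rolled back like any other configuration.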
The goal is not more autonomy — it’s controlled autonomy that scales safely.
Built With
- amazon-web-services
- anthropic
- fastapi
- html5/css3
- python
- restapi
- tonic
- websocket