Anvil Enterprise - CI/CD for AI Agents
Inspiration
We built Anvil SDK - an open-source self-healing runtime for AI agents that solves "Tool Rot" (when APIs change and break agent tools). The SDK uses LLMs to generate and regenerate tool code on-the-fly.
But as we talked to teams using Anvil, we heard the same requests:
- "How do I know which tools are healthy across all my agents?"
- "Can I require approval before Anvil deploys a fix to production?"
- "We need to run generated code in a secure sandbox, not on our servers."
- "I need audit trails for compliance."
These are enterprise problems. The SDK handles the execution engine - but enterprises need a control plane.
That's what we built during this hackathon: Anvil Enterprise - the monitoring, governance, and security layer on top of the Anvil SDK.
What it does
The Anvil SDK (Pre-existing)
The foundation that already existed:
- JIT Code Generation - Define an intent ("get stock prices") and Anvil generates the implementation using Claude/GPT-4
- Self-Healing Core - Automatic failure detection and regeneration
- Smart Parameterization - Tools are reusable, not hardcoded to specific values
- Framework Adapters - Works with LangChain, CrewAI, AutoGen, OpenAI Agents
- Local/Docker Sandbox - Basic code verification
- CLI - anvil init, anvil doctor, anvil run, anvil verify
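A minimal sketch of the self-healing idea, assuming a simple model: run the tool, and if it raises, ask a code generator for a new implementation and retry within a fixed repair budget. `SelfHealingTool`, the `regenerate` callback (the role the LLM plays in Anvil), and the convention that generated code defines a `run` function are all hypothetical. This is not the SDK's actual API.

```python
from collections.abc import Callable
from typing import Any

# Hypothetical callback: (intent, error description) -> new source code that
# defines a function named `run`. In Anvil this role is played by the LLM.
Regenerate = Callable[[str, str], str]


class SelfHealingTool:
    """Illustrative self-healing wrapper; not the Anvil SDK's real API."""

    def __init__(self, intent: str, code: str, regenerate: Regenerate, max_repairs: int = 2):
        self.intent = intent
        self.code = code
        self.regenerate = regenerate
        self.max_repairs = max_repairs

    def _load(self) -> Callable[..., Any]:
        # Compile the current source and pull out its `run` function.
        # exec() is used only to keep the sketch short; generated code
        # should really be executed in an isolated sandbox.
        namespace: dict[str, Any] = {}
        exec(self.code, namespace)
        return namespace["run"]

    def __call__(self, **params: Any) -> Any:
        # Parameters are supplied at call time, so one tool serves many inputs.
        for attempt in range(self.max_repairs + 1):
            try:
                return self._load()(**params)
            except Exception as exc:
                if attempt == self.max_repairs:
                    raise  # out of repair budget: surface the last error
                # Failure detected: regenerate from the intent plus the error.
                self.code = self.regenerate(self.intent, repr(exc))
```

Passing `symbol` as a call-time parameter, rather than baking it into the generated code, is what keeps such a tool reusable.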
Anvil Enterprise (Built This Hackathon)
1. Daytona Sandbox Integration
- Real integration with the Daytona SDK for secure, isolated code execution
- Every LLM-generated tool runs in a Daytona sandbox before deployment
- Lazy initialization and connection pooling for performance
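Lazy initialization and pooling can be illustrated without depending on the Daytona SDK's own API. In this sketch, `create_sandbox` stands in for whatever call actually provisions a Daytona sandbox. That factory, `SandboxPool`, and the `close()` method are hypothetical names chosen for illustration.

```python
import threading
from collections.abc import Callable, Iterator
from contextlib import contextmanager
from typing import Protocol


class Sandbox(Protocol):
    """Anything with a close() method; stands in for a real Daytona sandbox."""

    def close(self) -> None: ...


class SandboxPool:
    """Creates sandboxes lazily and reuses idle ones (illustrative only)."""

    def __init__(self, create_sandbox: Callable[[], Sandbox], max_size: int = 4):
        self._create = create_sandbox  # hypothetical factory, e.g. wrapping the Daytona SDK
        self._max_size = max_size
        self._idle: list[Sandbox] = []
        self._created = 0
        self._lock = threading.Lock()

    @contextmanager
    def acquire(self) -> Iterator[Sandbox]:
        with self._lock:
            if self._idle:
                sandbox = self._idle.pop()  # reuse a warm sandbox
            elif self._created < self._max_size:
                # Lazy: nothing is provisioned until a tool actually needs one.
                # Kept simple: creation happens while holding the lock.
                sandbox = self._create()
                self._created += 1
            else:
                raise RuntimeError("sandbox pool exhausted")
        try:
            yield sandbox
        finally:
            with self._lock:
                self._idle.append(sandbox)  # return for reuse rather than destroy

    def close_all(self) -> None:
        # Explicit cleanup of idle sandboxes so remote resources are not leaked.
        with self._lock:
            for sandbox in self._idle:
                sandbox.close()
            self._created -= len(self._idle)
            self._idle.clear()
```

Reusing a sandbox across tool runs avoids paying provisioning latency on every call. The trade-off is that state can leak between runs, so a real pool might reset or recycle sandboxes.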
2. Policy-as-Code Engine
- Human-in-the-loop approval gates for sensitive operations
- Define policies like:
    Policy(
        name="pii_approval",
        trigger="tool accesses user data",
        action="require_approval",
        approvers=["security-team"],
    )
- Blocks execution until approved, with full audit trail
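A minimal sketch of how an approval gate like this might behave. The `Policy` fields mirror the example above. To keep it short, the sketch assumes that matching a `trigger` reduces to checking whether a tool is tagged with that exact string. `PolicyEngine`, `ApprovalRequired`, and the audit-record format are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Policy:
    name: str
    trigger: str  # simplifying assumption: a tag string that tools are labeled with
    action: str   # e.g. "require_approval"
    approvers: list[str] = field(default_factory=list)


class ApprovalRequired(Exception):
    """Raised when a tool must wait for a human approver."""


class PolicyEngine:
    def __init__(self, policies: list[Policy]):
        self.policies = policies
        self.approvals: dict[tuple[str, str], str] = {}  # (tool, policy) -> approver
        self.audit_log: list[dict[str, str]] = []

    def _audit(self, event: str, tool: str, policy: str, actor: str = "system") -> None:
        # Every decision is recorded with a UTC timestamp for later review.
        self.audit_log.append({"time": datetime.now(timezone.utc).isoformat(),
                               "event": event, "tool": tool, "policy": policy, "actor": actor})

    def approve(self, tool: str, policy_name: str, approver: str) -> None:
        policy = next(p for p in self.policies if p.name == policy_name)
        if approver not in policy.approvers:
            raise PermissionError(f"{approver} may not approve {policy_name}")
        self.approvals[(tool, policy_name)] = approver
        self._audit("approved", tool, policy_name, approver)

    def check(self, tool: str, tags: set[str]) -> None:
        """Raise ApprovalRequired unless every matching policy is satisfied."""
        for policy in self.policies:
            if policy.trigger not in tags or policy.action != "require_approval":
                continue
            if (tool, policy.name) not in self.approvals:
                self._audit("blocked", tool, policy.name)
                raise ApprovalRequired(f"{tool} needs approval from {policy.approvers}")
            self._audit("allowed", tool, policy.name)
```

Raising an exception blocks execution by default. Callers must explicitly handle `ApprovalRequired` to proceed, which keeps the safe path the easy one.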
3. Code Validator
- Security scanning of LLM-generated code
- Risk assessment (low/medium/high/critical)
- Detects dangerous patterns: eval(), exec(), shell injection, hardcoded secrets
- Integrates with the policy engine for automatic escalation
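Checks like these are more reliable on the syntax tree than on raw text. For example, a docstring that merely mentions `eval()` should not be flagged. Below is a minimal sketch using Python's standard `ast` module. The rule set, the risk level assigned to each finding, and the `scan`/`Finding` names are illustrative assumptions, not the validator's actual rules.

```python
import ast
import re
from dataclasses import dataclass

RISK_ORDER = ["low", "medium", "high", "critical"]
SECRET_NAME = re.compile(r"(api_?key|secret|token|password)", re.IGNORECASE)


@dataclass
class Finding:
    line: int
    risk: str
    message: str


def _call_name(node: ast.Call) -> str:
    # Returns "eval", "os.system", "subprocess.run", etc. when statically known.
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def scan(source: str) -> tuple[str, list[Finding]]:
    """Return (overall risk, findings) for a piece of generated code."""
    findings: list[Finding] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = _call_name(node)
            if name in {"eval", "exec"}:
                findings.append(Finding(node.lineno, "critical", f"call to {name}()"))
            elif name == "os.system":
                findings.append(Finding(node.lineno, "high", "shell command via os.system"))
            elif name.startswith("subprocess.") and any(
                kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True
                for kw in node.keywords
            ):
                findings.append(Finding(node.lineno, "high", "subprocess with shell=True"))
        elif isinstance(node, ast.Assign):
            # Hardcoded secret: a secret-looking variable assigned a string literal.
            if isinstance(node.value, ast.Constant) and isinstance(node.value.value, str):
                for target in node.targets:
                    if isinstance(target, ast.Name) and SECRET_NAME.search(target.id):
                        findings.append(Finding(node.lineno, "high", f"hardcoded secret in {target.id}"))
    # Overall risk is the worst individual finding ("low" if none).
    overall = max((f.risk for f in findings), key=RISK_ORDER.index, default="low")
    return overall, findings
```

The overall risk level is what would be handed to the policy engine, so a "critical" result could trigger an approval gate instead of an automatic deploy.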
4. Enterprise Dashboard
- Real-time tool health monitoring across all agents
- Visual repair pipeline status
- Health metrics and trend analysis
- One-click manual repairs
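The per-tool health figures shown on a dashboard like this can be derived from a plain log of tool runs. This sketch assumes each run is recorded as a `(tool_name, succeeded)` pair. It also assumes hypothetical thresholds: a tool is "failing" below 50% success, "degraded" below 95%, and "healthy" otherwise. The record format, thresholds, and `tool_health` name are illustrative.

```python
from collections import defaultdict


def tool_health(runs: list[tuple[str, bool]], degraded_below: float = 0.95,
                failing_below: float = 0.5) -> dict[str, dict[str, object]]:
    """Summarize success rate, run count, and status per tool."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # tool -> [successes, runs]
    for tool, ok in runs:
        totals[tool][0] += int(ok)
        totals[tool][1] += 1
    report: dict[str, dict[str, object]] = {}
    for tool, (ok, total) in totals.items():
        rate = ok / total
        if rate < failing_below:
            status = "failing"
        elif rate < degraded_below:
            status = "degraded"
        else:
            status = "healthy"
        report[tool] = {"success_rate": round(rate, 3), "runs": total, "status": status}
    return report
```

Computing trends would then amount to running the same summary over successive time windows.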
5. Enterprise CLI Commands
- anvil login - Authenticate with API key or email
- anvil logout - Clear session
- anvil sync - Push/pull tool manifests to cloud
- anvil status - View enterprise status and stats
- anvil ingest - Scan and register legacy tools
6. Tool Ingestion Pipeline
- Scan existing codebases for tools
- Wrap legacy tools for Anvil management
- Generate manifests for tracking
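Tool discovery can be sketched with the standard-library `ast` module: walk a codebase, collect candidate functions, and emit one manifest entry per tool. The sketch makes two hypothetical assumptions. First, a "tool" is any public top-level function with a docstring. Second, each manifest entry has the fields `name`, `file`, `params`, and `description`.

```python
import ast
from pathlib import Path


def scan_tools(root: str) -> list[dict[str, object]]:
    """Collect public, documented top-level functions as candidate tools."""
    manifest: list[dict[str, object]] = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed rather than abort the scan
        for node in tree.body:  # top level only; nested helpers are ignored
            if not isinstance(node, ast.FunctionDef) or node.name.startswith("_"):
                continue
            doc = ast.get_docstring(node)
            if not doc:
                continue
            manifest.append({
                "name": node.name,
                "file": str(path.relative_to(root)),
                "params": [arg.arg for arg in node.args.args],
                "description": doc.splitlines()[0],
            })
    return manifest
```

Serializing the result with `json.dumps(scan_tools("."), indent=2)` gives a JSON manifest that can be stored and versioned alongside the code.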
How we built it
Architecture
┌─────────────────────────────────────────────────────────────┐
│                      ANVIL ENTERPRISE                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  Dashboard  │  │   Policy    │  │   Code Validator    │  │
│  │ (Streamlit) │  │   Engine    │  │   (Security Scan)   │  │
│  └─────────────┘  └──────┬──────┘  └─────────────────────┘  │
│                          │                                  │
│  ┌───────────────────────┴───────────────────────────────┐  │
│  │                 DAYTONA SANDBOX LAYER                 │  │
│  │           (Secure isolated code execution)            │  │
│  └───────────────────────────────────────────────────────┘  │
└──────────────────────────────┬──────────────────────────────┘
                               │
┌──────────────────────────────┴──────────────────────────────┐
│                      ANVIL SDK (Core)                       │
│    ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   │
│    │ JIT Generator│   │ Self-Healing │   │   Adapters   │   │
│    │   (Claude)   │   │     Core     │   │  (LC, Crew)  │   │
│    └──────────────┘   └──────────────┘   └──────────────┘   │
└─────────────────────────────────────────────────────────────┘
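Read top to bottom, the layers suggest one plausible order of checks for each generated fix: scan the code, consult policy, then execute it in the sandbox before deploying. The sketch below shows only that ordering. Each layer is passed in as a plain callable, and the `deploy_fix` and `RepairOutcome` names are hypothetical.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class RepairOutcome:
    deployed: bool
    reason: str


def deploy_fix(
    code: str,
    validate: Callable[[str], str],         # returns "low" | "medium" | "high" | "critical"
    needs_approval: Callable[[str], bool],  # policy decision for that risk level
    run_in_sandbox: Callable[[str], bool],  # True if the code ran cleanly in isolation
) -> RepairOutcome:
    """Gate an LLM-generated fix through validator -> policy -> sandbox."""
    risk = validate(code)
    if needs_approval(risk):
        # Escalate to a human instead of deploying automatically.
        return RepairOutcome(False, f"awaiting approval ({risk} risk)")
    if not run_in_sandbox(code):
        return RepairOutcome(False, "failed in sandbox")
    return RepairOutcome(True, "deployed")
```

Running the cheap static scan first means obviously risky code is escalated before any sandbox time is spent on it.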
Key Files Built This Hackathon
| File | Purpose | Lines |
|---|---|---|
| anvil/sandbox.py | Daytona integration | +125 |
| anvil/enterprise/policy.py | Policy-as-code engine | 104 |
| anvil/enterprise/validator.py | Security scanner | 260 |
| anvil_dashboard.py | Streamlit monitoring UI | 795 |
| anvil/cli.py | Enterprise CLI commands | +400 |
| demo/ | Automated demo setup | 868 |
Tech Stack
- Python 3.10+ - Core SDK
- Claude API - Code generation
- Daytona SDK - Secure sandbox execution
- Streamlit - Enterprise dashboard
- Click + Rich - Beautiful CLI
Challenges we ran into
1. Daytona SDK Integration
The Daytona SDK is new, and documentation was limited, so we had to dig into the source to understand the sandbox lifecycle. We implemented lazy loading and proper cleanup to avoid resource leaks.
2. Policy Engine Design
The challenge was balancing security with usability: too strict and developers hate it; too loose and it becomes security theater. We landed on sensible defaults with full customization.
3. Security Scanning Without False Positives
LLM-generated code often uses patterns that look suspicious but are fine in context. We built a context-aware validator that understands intent, not just patterns.
4. Dashboard Real-time Updates
Streamlit's rerun model made live updates tricky. We used session state and auto-refresh to show repair pipeline progress without losing context.
Accomplishments that we're proud of
Real Daytona Integration - Not mocked. Every tool actually executes in Daytona.
Production-Ready Policy Engine - Approval workflows, audit trails, escalation paths.
Full Enterprise CLI - Login, sync, status - everything you'd expect from an enterprise tool.
End-to-End Demo - Automated script that shows the complete flow: working → broken → self-healed.
Zero Breaking Changes - Enterprise features layer on top of the SDK. Existing users aren't affected.
What we learned
Enterprise ≠ Complexity - Good enterprise features should be invisible until you need them.
Sandboxing is table stakes - For LLM-generated code, secure execution isn't optional. Daytona made this possible.
Developers want visibility - The dashboard was an afterthought but became the most-requested feature in testing.
Policy-as-code > Policy-as-config - Letting teams define policies in Python (not YAML) made adoption much easier.
What's next for Anvil
Immediate:
- Cloud-hosted dashboard with multi-tenant support
- Slack/PagerDuty alerting integration
- More sandbox providers (E2B, Modal)
Roadmap:
- Private tool registry for sharing "golden" tools across orgs
- Predictive healing (detect API deprecations before they break tools)
- SOC 2 compliance package
- Multi-agent coordination layer
Try It
# Install (clone the repo so the demo/ scripts are available locally)
git clone https://github.com/Kart-ing/anvil_enterprise.git
cd anvil_enterprise
pip install -e ".[anthropic]"
# Setup (prompts for API keys including Daytona)
anvil init
# Check configuration
anvil doctor
# Run the demo
cd demo && ./run_demo.sh
Built With
- claude-api
- daytona
- python
- streamlit