Anvil

Anvil Enterprise - CI/CD for AI Agents

Inspiration

We built Anvil SDK - an open-source self-healing runtime for AI agents that solves "Tool Rot" (when APIs change and break agent tools). The SDK uses LLMs to generate and regenerate tool code on-the-fly.

But as we talked to teams using Anvil, we heard the same requests:

"How do I know which tools are healthy across all my agents?"
"Can I require approval before Anvil deploys a fix to production?"
"We need to run generated code in a secure sandbox, not on our servers."
"I need audit trails for compliance."

These are enterprise problems. The SDK handles the execution engine - but enterprises need a control plane.

That's what we built during this hackathon: Anvil Enterprise - the monitoring, governance, and security layer on top of Anvil SDK.

What it does

The Anvil SDK (Pre-existing)

The foundation that already existed:

JIT Code Generation - Define an intent ("get stock prices") and Anvil generates the implementation using Claude/GPT-4
Self-Healing Core - Automatic failure detection and regeneration
Smart Parameterization - Tools are reusable, not hardcoded to specific values
Framework Adapters - Works with LangChain, CrewAI, AutoGen, OpenAI Agents
Local/Docker Sandbox - Basic code verification
CLI - anvil init, anvil doctor, anvil run, anvil verify
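The generate-then-heal loop can be illustrated with a minimal, hypothetical sketch: a tool whose source comes from a generator function and which, when a run fails, feeds the error back to the generator and retries. `SelfHealingTool`, `fake_llm`, and the `run(**kwargs)` convention are illustrative names, not the Anvil SDK API, and `fake_llm` is a stub standing in for a Claude or GPT-4 call.

```python
# Hypothetical self-healing loop; not the Anvil SDK's actual API.
from typing import Callable, Dict

# A generator maps (intent, last_error) to Python source that defines run(**kwargs).
CodeGenerator = Callable[[str, str], str]


class SelfHealingTool:
    def __init__(self, intent: str, generate: CodeGenerator, max_repairs: int = 2):
        self.intent = intent
        self.generate = generate
        self.max_repairs = max_repairs
        self.source = generate(intent, "")  # JIT: first implementation from the intent alone
        self.repairs = 0

    def _load(self) -> Callable[..., object]:
        namespace: Dict[str, object] = {}
        exec(self.source, namespace)  # generated code: in practice this belongs in a sandbox
        return namespace["run"]  # type: ignore[return-value]

    def __call__(self, **kwargs):
        attempts = 0
        while True:
            try:
                return self._load()(**kwargs)
            except Exception as exc:
                if attempts >= self.max_repairs:
                    raise  # give up after max_repairs regenerations
                attempts += 1
                self.repairs += 1
                # Feed the failure back so the generator can emit a fixed version.
                self.source = self.generate(self.intent, repr(exc))


def fake_llm(intent: str, error: str) -> str:
    """Stub for an LLM call: the first version reads a field the 'API' no longer returns."""
    if not error:
        return "def run(symbol):\n    quote = {'price': 10}\n    return quote['last_price']\n"
    return "def run(symbol):\n    quote = {'price': 10}\n    return quote['price']\n"
```

With this stub, `SelfHealingTool("get stock prices", fake_llm)(symbol="ACME")` first hits a `KeyError`, regenerates once, and returns `10`.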

Anvil Enterprise (Built This Hackathon)

1. Daytona Sandbox Integration

Real integration with the Daytona SDK for secure, isolated code execution
Every LLM-generated tool runs in a Daytona sandbox before deployment
Lazy initialization and connection pooling for performance
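Lazy initialization and pooling can be sketched independently of any sandbox provider. In this hypothetical `LazySandboxPool`, `factory` is an assumed callable that provisions one sandbox (for example, by wrapping a Daytona SDK call, not shown); nothing is created until the first `acquire()`, and released sandboxes are reused rather than reprovisioned.

```python
# Hypothetical lazy, bounded sandbox pool; `factory` is an assumed provisioning callable.
import threading
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class LazySandboxPool(Generic[T]):
    def __init__(self, factory: Callable[[], T], max_size: int = 4):
        self._factory = factory  # not called until the first acquire()
        self._max_size = max_size
        self._idle: List[T] = []
        self._created = 0
        self._cond = threading.Condition()

    @property
    def created(self) -> int:
        return self._created

    def acquire(self) -> T:
        with self._cond:
            while True:
                if self._idle:
                    return self._idle.pop()  # reuse a warm sandbox
                if self._created < self._max_size:
                    self._created += 1  # reserve a slot, then create outside the lock
                    break
                self._cond.wait()  # pool exhausted: wait for a release
        try:
            return self._factory()
        except Exception:
            with self._cond:
                self._created -= 1  # give the reserved slot back on failure
                self._cond.notify()
            raise

    def release(self, sandbox: T) -> None:
        with self._cond:
            self._idle.append(sandbox)
            self._cond.notify()
```

Creating sandboxes outside the lock keeps a slow provisioning call from blocking threads that are only returning sandboxes, and `max_size` caps how many exist at once.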

2. Policy-as-Code Engine

Human-in-the-loop approval gates for sensitive operations
Define policies like:
```python
Policy(
    name="pii_approval",
    trigger="tool accesses user data",
    action="require_approval",
    approvers=["security-team"],
)
```
Blocks execution until approved, with full audit trail
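Blocking until approval while keeping an audit trail can be sketched as follows. The `Policy` fields mirror the example above, but `matches` (a Python predicate that implements the natural-language `trigger`), `PolicyEngine`, and `PendingApproval` are hypothetical names, not Anvil's API.

```python
# Hypothetical approval gate; names are illustrative, not Anvil's API.
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Policy:
    name: str
    trigger: str                     # human-readable description of when it applies
    matches: Callable[[dict], bool]  # assumption: a Python predicate implements the trigger
    action: str = "require_approval"
    approvers: List[str] = field(default_factory=list)


class PendingApproval(Exception):
    """Raised to block a tool run until an authorized approver signs off."""


class PolicyEngine:
    def __init__(self, policies: List[Policy]):
        self.policies = {p.name: p for p in policies}
        self.approvals: Dict[Tuple[str, str], str] = {}  # (request id, policy) -> approver
        self.audit_log: List[dict] = []

    def _audit(self, event: str, request_id: str, **details) -> None:
        self.audit_log.append({"ts": time.time(), "event": event,
                               "request": request_id, **details})

    def approve(self, request_id: str, policy_name: str, approver: str) -> None:
        policy = self.policies[policy_name]
        if approver not in policy.approvers:
            self._audit("approval_rejected", request_id, policy=policy_name, approver=approver)
            raise PermissionError(f"{approver} may not approve {policy_name}")
        self.approvals[(request_id, policy_name)] = approver
        self._audit("approved", request_id, policy=policy_name, approver=approver)

    def check(self, request_id: str, request: dict) -> None:
        """Raise PendingApproval if any matching policy lacks an approval."""
        for policy in self.policies.values():
            if policy.action != "require_approval" or not policy.matches(request):
                continue
            if (request_id, policy.name) not in self.approvals:
                self._audit("blocked", request_id, policy=policy.name)
                raise PendingApproval(f"{policy.name}: awaiting {policy.approvers}")
        self._audit("allowed", request_id)
```

Every decision (blocked, rejected approver, approval, allowed run) lands in `audit_log`, which is the raw material a compliance export would need.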

3. Code Validator

Security scanning of LLM-generated code
Risk assessment (low/medium/high/critical)
Detects dangerous patterns: eval(), exec(), shell injection, hardcoded secrets
Integrates with the policy engine for automatic escalation
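A minimal sketch of this kind of scanner, using Python's standard `ast` module instead of plain text search, is shown here. The rules, the risk level assigned to each rule, and the secret regex are illustrative assumptions, not Anvil's actual rule set. Parsing the code lets the scanner tell `subprocess.run([...])` apart from `subprocess.run(..., shell=True)`, one small way to cut false positives.

```python
# Hypothetical scanner for generated code; rules and risk levels are illustrative.
import ast
import re
from dataclasses import dataclass
from typing import List

RISK_ORDER = ["low", "medium", "high", "critical"]
SECRET_RE = re.compile(r"""(?i)(api[_-]?key|secret|token|password)\s*=\s*['"][^'"]{8,}['"]""")


@dataclass
class Finding:
    line: int
    risk: str
    message: str


def _call_name(node: ast.Call) -> str:
    """Return 'eval', 'os.system', etc. for simple call targets, else ''."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def scan(source: str) -> List[Finding]:
    findings: List[Finding] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _call_name(node)
        if name in ("eval", "exec"):
            findings.append(Finding(node.lineno, "critical", f"dynamic code execution via {name}()"))
        elif name == "os.system":
            findings.append(Finding(node.lineno, "high", "shell command via os.system()"))
        elif name.startswith("subprocess."):
            # Context matters: argument lists without shell=True are not shell injection.
            shell = any(kw.arg == "shell" and isinstance(kw.value, ast.Constant)
                        and kw.value.value is True for kw in node.keywords)
            if shell:
                findings.append(Finding(node.lineno, "high", f"{name}(..., shell=True)"))
    for lineno, text in enumerate(source.splitlines(), start=1):
        if SECRET_RE.search(text):
            findings.append(Finding(lineno, "high", "possible hardcoded secret"))
    return findings


def overall_risk(findings: List[Finding]) -> str:
    """Highest risk among findings; 'low' when nothing was found."""
    return max((f.risk for f in findings), key=RISK_ORDER.index, default="low")
```

`overall_risk` returns a single level per tool, which is the value a policy engine could use to decide when to escalate.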

4. Enterprise Dashboard

Real-time tool health monitoring across all agents
Visual repair pipeline status
Health metrics and trend analysis
One-click manual repairs
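Per-tool health can start as a success rate plus a count of self-healed runs. This hypothetical sketch assumes each run is recorded as a `Run(tool, ok, healed)` and uses illustrative thresholds (95% for healthy, 80% for degraded); it is not the dashboard's actual logic.

```python
# Hypothetical health roll-up for a dashboard; thresholds are illustrative.
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Run:
    tool: str
    ok: bool
    healed: bool = False  # True if the run succeeded only after regeneration


def health(runs: Iterable[Run], healthy_at: float = 0.95,
           degraded_at: float = 0.80) -> Dict[str, Tuple[float, int, str]]:
    """Return tool -> (success rate, healed-run count, status)."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])  # runs, successes, heals
    for run in runs:
        t = totals[run.tool]
        t[0] += 1
        t[1] += int(run.ok)
        t[2] += int(run.healed)
    report = {}
    for tool, (n, ok, healed) in totals.items():
        rate = ok / n
        if rate >= healthy_at:
            status = "healthy"
        elif rate >= degraded_at:
            status = "degraded"
        else:
            status = "failing"
        report[tool] = (round(rate, 3), healed, status)
    return report
```

Tracking healed runs separately matters: a tool that keeps succeeding only after regeneration looks healthy by success rate alone but is a sign of ongoing tool rot.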

5. Enterprise CLI Commands

anvil login - Authenticate with API key or email
anvil logout - Clear session
anvil sync - Push/pull tool manifests to cloud
anvil status - View enterprise status and stats
anvil ingest - Scan and register legacy tools
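Push/pull sync needs a rule for deciding which side wins. A minimal sketch of planning that reconciliation follows; the manifest shape (tool name mapped to a content hash and an `updated` timestamp) and the last-writer-wins rule are assumptions, not a description of Anvil's sync protocol.

```python
# Hypothetical push/pull planner for tool manifests (last-writer-wins).
from typing import Dict, List, Tuple

Manifest = Dict[str, dict]  # tool name -> {"hash": str, "updated": float}


def plan_sync(local: Manifest, remote: Manifest) -> Tuple[List[str], List[str]]:
    """Return (names to push, names to pull)."""
    push: List[str] = []
    pull: List[str] = []
    for name in sorted(set(local) | set(remote)):
        mine, theirs = local.get(name), remote.get(name)
        if theirs is None:
            push.append(name)  # only exists locally
        elif mine is None:
            pull.append(name)  # only exists in the cloud
        elif mine["hash"] != theirs["hash"]:
            # Conflicting content: the more recently updated side wins; ties pull.
            (push if mine["updated"] > theirs["updated"] else pull).append(name)
    return push, pull
```

Comparing hashes first means unchanged tools are never transferred, regardless of their timestamps.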

6. Tool Ingestion Pipeline

Scan existing codebases for tools
Wrap legacy tools for Anvil management
Generate manifests for tracking

How we built it

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    ANVIL ENTERPRISE                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  Dashboard  │  │   Policy    │  │   Code Validator    │  │
│  │ (Streamlit) │  │   Engine    │  │  (Security Scan)    │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
│                            │                                │
│  ┌─────────────────────────┴───────────────────────────┐    │
│  │                DAYTONA SANDBOX LAYER                │    │
│  │          (Secure isolated code execution)           │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────┴───────────────────────────────┐
│                      ANVIL SDK (Core)                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ JIT Generator│  │ Self-Healing │  │   Adapters   │       │
│  │   (Claude)   │  │     Core     │  │  (LC, Crew)  │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
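Read top-down, the diagram places the enterprise layer between SDK-generated code and deployment. A hypothetical sketch of that gate follows; the stage order (static validation, then policy, then a sandbox run) is an assumption, chosen so cheap checks fail fast before a sandbox is provisioned.

```python
# Hypothetical deploy gate; stage order is an assumption, not Anvil's documented flow.
from dataclasses import dataclass
from typing import Callable, List, Tuple

Check = Callable[[str], Tuple[bool, str]]  # code -> (passed, reason)


@dataclass
class Verdict:
    ok: bool
    stage: str  # the failing stage, or "deploy" if every stage passed
    reason: str = ""


def gate(code: str, stages: List[Tuple[str, Check]]) -> Verdict:
    """Run stages in order; the first failure blocks deployment."""
    for name, check in stages:
        passed, reason = check(code)
        if not passed:
            return Verdict(False, name, reason)
    return Verdict(True, "deploy")
```

Because stages are plain callables, a validator, a policy check, or a sandbox run can each be swapped or reordered without touching the gate itself.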

Key Files Built This Hackathon

| File | Purpose | Lines (+N = added to an existing file) |
|---|---|---|
| `anvil/sandbox.py` | Daytona integration | +125 |
| `anvil/enterprise/policy.py` | Policy-as-code engine | 104 |
| `anvil/enterprise/validator.py` | Security scanner | 260 |
| `anvil_dashboard.py` | Streamlit monitoring UI | 795 |
| `anvil/cli.py` | Enterprise CLI commands | +400 |
| `demo/` | Automated demo setup | 868 |

Tech Stack

Python 3.10+ - Core SDK
Claude API - Code generation
Daytona SDK - Secure sandbox execution
Streamlit - Enterprise dashboard
Click + Rich - Beautiful CLI

Challenges we ran into

1. Daytona SDK Integration

The Daytona SDK is new, and its documentation was limited, so we had to dig into the source to understand the sandbox lifecycle. We implemented lazy loading and proper cleanup to avoid resource leaks.

2. Policy Engine Design

The core tension was balancing security with usability: too strict and developers hate it; too loose and it becomes security theater. We landed on sensible defaults with full customization.

3. Security Scanning Without False Positives

LLM-generated code often uses patterns that look suspicious but are fine in context. We built a context-aware validator that understands intent, not just patterns.

4. Dashboard Real-time Updates

Streamlit's rerun model made live updates tricky. We used session state and auto-refresh to show repair pipeline progress without losing context.
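The cleanup side of the Daytona challenge maps onto a standard Python pattern: a context manager whose `finally` clause always tears the sandbox down. In this sketch, `create` and `delete` are assumed callables that would wrap the real Daytona SDK calls, which are not shown.

```python
# Hypothetical lifecycle guard; create/delete stand in for real provider calls.
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def ephemeral_sandbox(create: Callable[[], T], delete: Callable[[T], None]) -> Iterator[T]:
    sandbox = create()  # created only when the with-block is entered
    try:
        yield sandbox
    finally:
        delete(sandbox)  # runs on success, on exceptions, and on early return
```

Because deletion sits in `finally`, a tool that crashes mid-run still releases its sandbox, which is exactly the leak this pattern prevents.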

Accomplishments that we're proud of

Real Daytona Integration - Not mocked. Every tool actually executes in Daytona.
Production-Ready Policy Engine - Approval workflows, audit trails, escalation paths.
Full Enterprise CLI - Login, sync, status - everything you'd expect from an enterprise tool.
End-to-End Demo - Automated script that shows the complete flow: working → broken → self-healed.
Zero Breaking Changes - Enterprise features layer on top of the SDK. Existing users aren't affected.

What we learned

Enterprise ≠ Complexity - Good enterprise features should be invisible until you need them.
Sandboxing is table stakes - For LLM-generated code, secure execution isn't optional. Daytona made this possible.
Developers want visibility - The dashboard was an afterthought but became the most-requested feature in testing.
Policy-as-code > Policy-as-config - Letting teams define policies in Python (not YAML) made adoption much easier.

What's next for Anvil

Immediate:

Cloud-hosted dashboard with multi-tenant support
Slack/PagerDuty alerting integration
More sandbox providers (E2B, Modal)

Roadmap:

Private tool registry for sharing "golden" tools across orgs
Predictive healing (detect API deprecations before they break tools)
SOC 2 compliance package
Multi-agent coordination layer

Try It

# Install (clone the repo, then install it with the Anthropic extra)
git clone https://github.com/Kart-ing/anvil_enterprise.git
cd anvil_enterprise
pip install ".[anthropic]"

# Setup (prompts for API keys including Daytona)
anvil init

# Check configuration
anvil doctor

# Run the demo
cd demo && ./run_demo.sh

Built With

claude-api
daytona
python
streamlit

Updates

Kartikey Pandey started this project — Jan 24, 2026 06:22 PM EST

Leave feedback in the comments!
