SkillEvolve
Self-improving agent skills through execution feedback
The Problem
Agent Skills provide agents with specialized knowledge and capabilities—but they're static. Once created, a skill doesn't improve from experience. An agent using a frontend design skill will make the same mistakes on day 1 and day 100.
Traditional approaches require manual updates or expensive retraining. But what if skills could evolve on the job, learning from natural execution feedback, the way an intern who learns fast and persists can eventually rise to CEO?
What We Built
SkillEvolve implements continual learning for agent skills using principles from Agentic Context Engineering (ACE). Instead of fixed instructions, skills become living knowledge bases that:
- Accumulate patterns from successful executions
- Refine through feedback (user preferences, validation results, errors)
- Grow incrementally without catastrophic forgetting
- Remain interpretable (human-readable markdown, not black-box weights)
Think of it as on-the-job training for AI agents—learning from doing, not just from initial instructions.
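One way to picture a skill as a living knowledge base is a list of small, individually addressable entries that only ever receive localized updates. The sketch below is illustrative, not SkillEvolve's actual schema: the names `SkillEntry`, `SkillDelta`, `applyDelta`, and `renderMarkdown`, and the helpful/harmful counters, are all assumptions made for this example.

```typescript
// Hypothetical data model: the project's real schema is not shown in this write-up.
type EntryKind = "pattern" | "preference" | "edge-case";

interface SkillEntry {
  id: string; // stable key, so later updates refine an entry instead of duplicating it
  kind: EntryKind;
  text: string; // the human-readable learning
  helpful: number; // times this entry was associated with a good outcome
  harmful: number; // times this entry was associated with a bad outcome
}

interface Skill {
  name: string;
  entries: SkillEntry[];
}

// A delta is a small update: new entries plus counter adjustments.
// It never rewrites or drops existing text, which is what keeps growth incremental.
interface SkillDelta {
  add?: SkillEntry[];
  vote?: { id: string; helpful?: number; harmful?: number }[];
}

// Returns a new Skill; the input skill is left untouched.
function applyDelta(skill: Skill, delta: SkillDelta): Skill {
  const byId = new Map<string, SkillEntry>();
  for (const e of skill.entries) byId.set(e.id, { ...e });

  // Add only entries whose id is not already known.
  for (const e of delta.add ?? []) {
    if (!byId.has(e.id)) byId.set(e.id, { ...e });
  }

  // Adjust counters on existing entries; votes for unknown ids are ignored.
  for (const v of delta.vote ?? []) {
    const entry = byId.get(v.id);
    if (entry) {
      entry.helpful += v.helpful ?? 0;
      entry.harmful += v.harmful ?? 0;
    }
  }
  return { name: skill.name, entries: Array.from(byId.values()) };
}

// Render the skill as plain markdown so a human can read and edit it.
function renderMarkdown(skill: Skill): string {
  const lines = [`# ${skill.name}`];
  for (const e of skill.entries) {
    lines.push(`- [${e.kind}] ${e.text} (helpful: ${e.helpful}, harmful: ${e.harmful})`);
  }
  return lines.join("\n");
}
```

Because `applyDelta` only appends entries and bumps counters, earlier learnings survive every update, and `renderMarkdown` keeps the whole skill inspectable as text.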
How It Works
The core evolution loop:
- Generate - Agent produces multiple candidate solutions using current skill
- Execute - Deploy candidates to sandboxes (via Daytona) for validation
- Reflect - Analyze what worked/failed and extract learnings
- Curate - Update skill with new patterns, refinements, edge cases
Unlike traditional prompt optimization (which compresses context), SkillEvolve preserves detailed domain knowledge as structured, retrievable patterns—directly inspired by ACE's approach to preventing LLM brevity bias and context collapse.
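The four steps of the loop can be sketched as a single function. This is a minimal sketch under stated assumptions, not the project's implementation: `Steps`, `evolveOnce`, and `curate` are hypothetical names, the skill is modeled as a plain markdown string, and the agent and sandbox calls are injected as functions rather than calling any real Claude Agent SDK or Daytona API.

```typescript
// All names below are hypothetical. In a real system, generate/reflect would call
// the agent and execute would deploy to a sandbox; here they are injected.
interface Candidate { id: string; code: string }
interface RunResult { candidateId: string; ok: boolean; log: string }
interface Learning { text: string }

interface Steps {
  generate(skill: string, task: string, n: number): Promise<Candidate[]>;
  execute(candidate: Candidate): Promise<RunResult>;
  reflect(candidates: Candidate[], results: RunResult[]): Promise<Learning[]>;
}

// Curate: append each new learning as a bullet unless that exact bullet already exists.
// Existing lines are never rewritten or summarized, so detail is preserved.
function curate(skill: string, learnings: Learning[]): string {
  let updated = skill;
  for (const l of learnings) {
    const line = `- ${l.text}`;
    if (updated.split("\n").indexOf(line) === -1) {
      updated += `\n${line}`;
    }
  }
  return updated;
}

// One pass of Generate -> Execute -> Reflect -> Curate. Returns the updated skill text.
async function evolveOnce(skill: string, task: string, steps: Steps, n = 3): Promise<string> {
  const candidates = await steps.generate(skill, task, n); // Generate
  const results = await Promise.all(candidates.map((c) => steps.execute(c))); // Execute
  const learnings = await steps.reflect(candidates, results); // Reflect
  return curate(skill, learnings); // Curate
}
```

Injecting the steps keeps the loop testable with stubs, and making `curate` append-only is one simple way to avoid the compression that prompt optimization tends to introduce.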
The Demo
We evolve Anthropic's frontend design skill through building a personal budgeting app.
User: A frontend engineer who wants to improve their app's design
Goal: Transform generic AI-generated UI into something unique and aesthetically pleasing
Interface: Claude Code terminal with SkillEvolve + evolved frontend-design skill
Evolution arc (2-3 iterations):
- Iteration 0: Basic, generic layout (classic "AI slop" app)
- Iteration 1: Learns user preferences from first round of feedback
- Iteration 2: Produces polished, unique design incorporating accumulated patterns
In each iteration, the agent generates 3 design candidates and spins them up in Daytona sandboxes; the user then picks a favorite and provides feedback. The skill evolves with each choice.
Tech Stack
- Claude Agent SDK - Agent orchestration
- Daytona - Sandbox environments for generated code
- Sentry - Error tracking and logs
- CodeRabbit - Automated code review
- Galileo - LLM observability and evaluations
Why This Matters
Most AI systems are frozen at deployment. SkillEvolve explores a different paradigm: agents that improve through use, accumulating institutional knowledge and adapting to user preferences—without manual retraining or fine-tuning.
Skills as living documentation. Context as capability.
Built With
- anthropic
- claude
- coderabbit
- daytona
- intel-galileo
- nextjs
- sentry
- typescript
