Inspiration
Agents are really impressive at what they do, but it takes a lot of time and compute to train them to get better at a single task.
What it does
We introduce a new framework, inspired by genetic and evolutionary algorithms, that updates the agent without actually tweaking the weights of the model.
How we built it
We have four main agents:
- Solver: a small language model (Phi-4-mini) that solves the given task (in our case, math problems).
- Critic: Gemini 2.5 Pro, which critiques the Solver's output every n steps.
- Updater: Gemini 2.5 Pro, which updates the Solver's prompts based on the Critic's output.
- Tool Creator: Gemini 2.5 Pro, which dynamically creates new tools based on patterns in the agent's behavior.
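The interaction between the Solver, Critic, and Updater can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `EvolutionLoop`, its fields, and the three callables are hypothetical stand-ins for the model calls, and the Tool Creator is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EvolutionLoop:
    """Hypothetical sketch: evolve a system prompt without touching model weights."""

    solve: Callable[[str, str], str]        # (system_prompt, task) -> answer
    critique: Callable[[List[Dict]], str]   # recent traces -> feedback text
    update: Callable[[str, str], str]       # (system_prompt, feedback) -> new prompt
    prompt: str                             # current Solver system prompt
    n: int = 3                              # critique every n steps
    history: List[str] = field(default_factory=list)   # every prompt version
    traces: List[Dict] = field(default_factory=list)   # one record per solved task

    def run(self, tasks: List[str]) -> str:
        self.history.append(self.prompt)
        for step, task in enumerate(tasks, start=1):
            answer = self.solve(self.prompt, task)
            self.traces.append({"task": task, "answer": answer, "prompt": self.prompt})
            if step % self.n == 0:
                # The Critic only sees the last n traces; the Updater rewrites the prompt.
                feedback = self.critique(self.traces[-self.n:])
                self.prompt = self.update(self.prompt, feedback)
                self.history.append(self.prompt)
        return self.prompt
```

In this sketch the three callables would wrap the actual model calls, and `history` plays the role of the prompt versions tracked from basic to evolved.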
Using Weave, we analyze the traces every n steps and use that information to do the following:
- Prompt Tuning: Iteratively make small changes to the agents' system prompts based on their shortcomings. We track every prompt, from the basic version to the evolved ones, in Weave.
- Automated Tool Creation: Create new tools and dynamically add them to the tool registry based on common patterns found in the Weave traces. Tool creation is a three-step process: analyzing the Weave traces, generating tool ideas, and using Daytona to code, test, and evaluate each tool before adding it to the registry.
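The "test before adding to the registry" step can be illustrated with a small sketch. It assumes a candidate tool arrives as Python source defining one function, and it uses a plain local `exec` as a stand-in for the sandboxed run; it does not use the Daytona API, and `ToolRegistry` and its methods are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Tuple

# A test case is (positional arguments, expected return value).
Case = Tuple[Tuple[Any, ...], Any]


class ToolRegistry:
    """Hypothetical registry that only admits tools passing their test cases."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def try_register(self, name: str, source: str, cases: List[Case]) -> bool:
        namespace: Dict[str, Any] = {}
        try:
            # Local stand-in for a sandboxed run. Plain exec is NOT safe for
            # untrusted generated code; real isolation is the sandbox's job.
            exec(source, namespace)
            tool = namespace[name]
            passed = all(tool(*args) == expected for args, expected in cases)
        except Exception:
            return False  # broken or missing tool: never registered
        if passed:
            self._tools[name] = tool
        return passed

    def names(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, *args: Any) -> Any:
        return self._tools[name](*args)
```

Gating registration on test results means a faulty generated tool is never exposed to the Solver.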
Challenges we ran into
- Agent orchestration was difficult, starting with deciding which framework to use. We explored CrewAI and a custom setup before deciding to go with LangChain.
- Researching genetic, evolutionary, and RL-style algorithms to come up with SEA.
Accomplishments that we're proud of
- Prompt iteration and tracking with Weave.
- Working with Daytona and its APIs to automatically test new code files and tools.
- An end-to-end working system with a modular design, which should let us use it to improve any underlying agent.
What we learned
Multi-agent orchestration, self-evolving agents, and tool creation.
What's next for SEA (Self Evolving Agents)
So far we have tested SEA only on a math dataset, but we believe it is more capable than that and would like to extend it to other domains. We also want to turn it into a modular framework that can be plugged into any agentic system for self-evolution.