Inspiration
Inspired by AlphaEvolve, a project that used LLMs to rediscover and optimize mathematical algorithms like matrix multiplication, I became fascinated with the idea of evolutionary intelligence—systems that improve through structured generations instead of brute force.
That curiosity led to the creation of Pegasus, a generational algorithmic problem solver that evolved agents with distinct roles and tasks to improve their performance over time. But Pegasus raised an even bigger question: could this framework be generalized to solve a wider class of problems, from scientific discovery to prompt optimization?
That’s how this project was born: a multi-agent, generation-based simulation framework that evolves intelligent behaviors over time—be it for creativity, cybersecurity, reasoning, or search.
What it does
This system launches a population of agents, each with a unique task and role, and evolves them over multiple generations. Each agent produces two children that mutate or refine their approach. The most effective agents are selected based on performance and passed into the next generation.
Key features:
- Agents are assigned roles (e.g., physicist, cybersecurity expert, writer) and specific tasks.
- Each generation doubles the agent population (up to a cap), so the pool of task variants grows exponentially until the limit is hit.
- Tasks evolve via natural-language mutation (e.g., “optimize X using fewer recursive calls”).
- A fitness function evaluates each agent’s result to determine which survive.
- The system is model-agnostic and supports OpenAI APIs or local LLMs.
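Because the backend is swappable, anything that maps a prompt string to a completion string can drive the agents. The actual interface isn't shown in this writeup, so here is a minimal sketch under that assumption; `LLMBackend`, `openai_backend`, `local_backend`, and the model name are all illustrative rather than the project's real API, with the OpenAI variant using the official Python SDK's chat-completions call:

```python
from typing import Callable

# Anything that maps a prompt string to a completion string can serve as a backend.
LLMBackend = Callable[[str], str]

def openai_backend(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Backend built on the OpenAI chat-completions API."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()          # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def local_backend(prompt: str) -> str:
    """Placeholder for a locally hosted model (e.g., Mixtral or LLaMA)."""
    raise NotImplementedError("wire this up to your local LLM server")
```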
🛠️ How we built it
The project is structured around two main components:
Manager class:
- Controls the generational loop.
- Manages reproduction, evaluation, and population capping.
- Tracks all agent metadata across generations.
Worker module:
- Contains the `Agent` class.
- Each agent holds attributes like `generation`, `role`, `task`, and output history.
- Each agent can mutate itself to produce children.
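The writeup names the `Agent` attributes but not their code, so the following is a minimal sketch of how the class might be shaped, assuming a dataclass; the `run`/`mutate` method names, the `llm` callable parameter, and the mutation prompt wording are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    generation: int
    role: str                 # e.g., "physicist", "cybersecurity expert", "writer"
    task: str                 # natural-language description of what to solve
    history: list = field(default_factory=list)  # outputs from previous runs

    def run(self, llm) -> str:
        """Execute the task in character and record the output."""
        output = llm(f"You are a {self.role}. {self.task}")
        self.history.append(output)
        return output

    def mutate(self, llm) -> "Agent":
        """Spawn a child whose task is a natural-language mutation of this one."""
        new_task = llm(
            "Rewrite this task so the approach is refined or varied "
            f"(e.g., 'optimize X using fewer recursive calls'): {self.task}"
        )
        return Agent(generation=self.generation + 1, role=self.role, task=new_task)
```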
Each agent in generation 0 is initialized with a random role and a variation of the base task prompt. Then, for every generation:
- Each agent spawns 2 children.
- Tasks are mutated (e.g., using vectorization, simplifying logic, etc.).
- Top-scoring agents are kept based on a customizable evaluation function.
- The total number of agents is capped at a user-defined `MaxAgents`.
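Condensed into code, the loop above might look like the following sketch, reusing the `Agent` sketch from the previous section; the `run_generations` name and the `fitness` callable (higher scores are better) are illustrative, while `max_agents` mirrors the `MaxAgents` cap:

```python
def run_generations(population, llm, fitness, n_generations, max_agents):
    """Evolve a population of Agents for n_generations."""
    for _ in range(n_generations):
        # Reproduction: every agent spawns two children with mutated tasks.
        children = [child
                    for agent in population
                    for child in (agent.mutate(llm), agent.mutate(llm))]
        candidates = population + children
        # Evaluation: run each candidate and score its output.
        scored = [(fitness(agent.run(llm)), agent) for agent in candidates]
        # Selection: keep the top scorers, capped at max_agents.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        population = [agent for _, agent in scored[:max_agents]]
    return population
```

Seeding it is then just `run_generations(gen0_agents, openai_backend, my_fitness, n_generations=5, max_agents=16)`, where `my_fitness` returns a higher score for better outputs.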
🚧 Challenges we ran into
- Balancing meaningful task mutations without degrading prompt quality.
- Preventing performance bottlenecks caused by large numbers of LLM calls.
- Designing a flexible architecture that works across use cases (math, security, creativity).
- Ensuring that child agents don't drift so far from the useful task space that they collapse into noise.
Accomplishments that we're proud of
- Built a fully generational, multi-agent simulation system in just a few days.
- Created an abstract but extensible agent model with support for role/task mutation and prompt chaining.
- Designed a system that works for a wide range of optimization tasks—math, creativity, adversarial defense, and more.
- Successfully capped agent populations and ran multiple generation cycles with LLM evaluation at each step.
What we learned
- Task mutation via LLMs is powerful but fragile; context preservation is key.
- Adding structure (roles, memory, generations) can significantly boost LLM behavior.
- Evolutionary agent models are not only feasible, but surprisingly efficient if properly managed.
- The right architecture can make experimentation with prompt-based AI truly modular and scalable.
What's next for Anroy
- Integrate a local model backend (e.g., Mixtral or LLaMA) for faster, cheaper iteration.
- Add agent memory and cross-agent communication (e.g., critic/writer/editor models).
- Expand to real-world use cases like:
- Red-teaming prompt defenses
- Generating scientific hypothesis variations
- Auto-discovering prompt chains that outperform human-written ones
- Launch a web-based visualizer for tracking generations, agent history, and output evolution.