Inspiration

Prompt engineering is trial and error. Biology solved optimization billions of years ago. Conway's Game of Life showed that simple rules produce complex emergent behavior. MiroFish proved that OASIS can simulate realistic social worlds at scale. We combined both ideas by dropping a population of AI agents into OASIS social simulations, letting them compete, and evolving the best one through natural selection. No manual tuning, just evolution.

What it does

You describe the agent you want (e.g. "best negotiator", "best teacher", "best mass manipulator") and Agent Kitchen evolves one for you. Under the hood, it uses OASIS, a social simulation platform by CAMEL-AI, to test agents in realistic environments that go far beyond simple LLM conversations. Agents negotiate in private chat rooms, post and comment on public feeds, build followings, and respond to crises, all within a persistent social world where their actions have real consequences (likes, follows, replies from other agents).

An orchestrator drives the evolutionary loop: given only a high-level goal, it automatically generates diverse scenarios, a scoring rubric, and an initial population. In each generation, every agent is evaluated across all scenarios by an LLM judge scoring dimensions like Argument Strength, Empathy, Adaptability, and Progress Towards Goal. Fitness is averaged across scenarios, producing robust generalists rather than narrow specialists. Top-performing agents then produce "offspring" via biologically-inspired operators like point mutations (small tweaks), rewrites, insertions, deletions, and crossover between two parents. After the final generation, the winning agent's prompt emerges battle-tested, never written by a human.
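The generation loop above can be sketched with toy string "genomes". Every name here (the dummy judge, the word pool, the operators) is an illustrative stand-in for the real LLM-backed components, not Agent Kitchen's actual code:

```python
import random

random.seed(0)

# Toy stand-ins: a "prompt" is a list of words, the judge is a dummy scorer.
WORDS = ["calm", "firm", "curious", "empathetic", "concise", "bold"]

def judge_score(prompt, scenario):
    # Dummy judge: rewards prompts containing the scenario's keyword.
    return prompt.count(scenario)

def point_mutate(p):   # small tweak: swap one word
    q = p[:]; q[random.randrange(len(q))] = random.choice(WORDS); return q

def insertion(p):      # add a word at a random position
    q = p[:]; q.insert(random.randrange(len(q) + 1), random.choice(WORDS)); return q

def deletion(p):       # drop a word (keep at least one)
    q = p[:]
    if len(q) > 1:
        q.pop(random.randrange(len(q)))
    return q

def crossover(a, b):   # splice two parents at a random cut point
    cut = random.randrange(1, max(2, min(len(a), len(b))))
    return a[:cut] + b[cut:]

def evolve(population, scenarios, generations=10, elite=2):
    for _ in range(generations):
        # Fitness = mean judge score across ALL scenarios -> generalists win.
        population.sort(
            key=lambda p: sum(judge_score(p, s) for s in scenarios) / len(scenarios),
            reverse=True,
        )
        parents = population[:elite]
        children = []
        while len(parents) + len(children) < len(population):
            child = crossover(*random.sample(parents, 2))
            op = random.choice([point_mutate, insertion, deletion])
            children.append(op(child))      # offspring = crossover + one mutation
        population = parents + children
    return max(population, key=lambda p: sum(judge_score(p, s) for s in scenarios))

pop = [[random.choice(WORDS) for _ in range(4)] for _ in range(8)]
best = evolve(pop, scenarios=["empathetic", "concise"])
print(best)
```

In the real system the judge is an LLM scoring full simulation transcripts and the operators rewrite prompt text, but the selection pressure works the same way.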

How we built it

Frontend

Golang with the Bubble Tea framework: a real-time TUI that streams every simulation live, with parallel scenario views, expandable agent conversations, and a results dashboard with fitness charts and one-key prompt export.

Backend

Python with OASIS (CAMEL-AI) as the simulation engine. We built custom agent subclasses on top of OASIS to produce focused, natural interactions in both private chat and public feed scenarios. The evolutionary genome representation, mutation operators, LLM evaluation, and natural selection run inside an orchestrator that coordinates parallel simulations across scenarios and generations.
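The fan-out pattern is roughly this: one agent is evaluated across every scenario in parallel, and its fitness is the mean score. A minimal sketch, with a dummy `run_simulation` standing in for a real OASIS run plus LLM judge (all names illustrative):

```python
import concurrent.futures

def run_simulation(agent_prompt, scenario):
    # Stand-in for launching an OASIS simulation and judging the transcript;
    # here we just count shared words between prompt and scenario.
    return float(len(set(agent_prompt.split()) & set(scenario.split())))

def evaluate(agent_prompt, scenarios, workers=4):
    # Run all scenarios for this agent concurrently.
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda s: run_simulation(agent_prompt, s), scenarios))
    return sum(scores) / len(scores)  # fitness = mean across scenarios

print(evaluate("negotiate calmly and listen",
               ["negotiate a raise", "listen to a crisis"]))  # -> 1.0
```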

Challenges we ran into

OASIS is built for social media simulation, not structured dialogue, so we built custom agent subclasses that override the prompt for each interaction mode while still leveraging OASIS's social infrastructure. Separately, parallel simulations corrupted the JSONL event stream between Python and Go, which we solved with a file descriptor duplication trick to isolate our pipe from OASIS's stray stdout output.
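The fd-duplication fix can be sketched like this (a simplified, assumed version of what we did, with illustrative names): duplicate the original stdout fd for our JSONL event stream, then point fd 1 at /dev/null so a library's stray prints can't interleave with the events Go is parsing.

```python
import json
import os
import sys

pipe_fd = os.dup(sys.stdout.fileno())          # 1. private copy of the real pipe
events = os.fdopen(pipe_fd, "w", buffering=1)  # line-buffered JSONL writer

saved = os.dup(1)                              # keep fd 1 so we can restore it
devnull = os.open(os.devnull, os.O_WRONLY)
os.dup2(devnull, 1)                            # 2. fd 1 now goes to /dev/null
os.close(devnull)

os.write(1, b"stray library output\n")         # swallowed, never reaches Go
events.write(json.dumps({"event": "generation_start", "gen": 1}) + "\n")  # clean

os.dup2(saved, 1)                              # 3. restore fd 1 when done
os.close(saved)
```

Because the duplicate is made before the redirect, our writer keeps the original pipe even though every other write to fd 1 is diverted.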

What we learned

Agents measurably improve across generations without any human guidance, but evaluation quality is the bottleneck: if the judge scores inconsistently, the whole system drifts randomly instead of evolving. We also learned that using OASIS's full social features (posts, comments, follows) instead of just chat produces richer simulations and gives agents more dimensions to compete on, which is exactly the selection pressure natural selection needs to work.

What's next for Agent Kitchen

Replacing our LLM judge with Meta's Meta-Rewarding approach, a self-improving evaluation loop where the model judges its own judgments, making fitness evaluation sharper with each generation. It'd also be cool to have co-evolution, human-in-the-loop steering, and multi-objective fitness.
