Inspiration

It all started with one dumb-fun question: what if you actually locked 50 of the most brilliant, chaotic, and legendary minds in history inside the same hackathon?

Picture Elon roasting Napoleon’s product roadmap, or Kanye dropping a wild creative bomb on Gandhi. We were obsessed with Stanford’s Generative Agents paper and the raw energy of real hackathons, so we decided to build the whole thing as a multi-agent social simulation. Every persona has its own personality baked in — not just a name, but actual psychological traits.

What We Learned

  • Prompt engineering 50 unique humans is no joke. We went deep: childhood stories, biggest controversies, signature quotes, risk tolerance, empathy levels, stubbornness scores… the works. A persona isn’t just “act like Elon,” it’s a whole cognitive model.
  • Getting agents to actually collaborate instead of talking past each other was brutal. We had to build a strict session flow: Pitch → Debate → Decision, and force every agent to throw at least one real objection.
  • ELO voting is shockingly elegant. We straight-up stole the chess formula and it just works:

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

$$R'_A = R_A + K \cdot (S_A - E_A)$$
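In code, those two formulas collapse into a few lines. Here's a minimal sketch of a single 1v1 vote update; the function names and the K-factor of 32 are our illustrative choices (the writeup doesn't pin down the exact K used), but the math matches the formulas above.

```python
K = 32  # K-factor: a common chess default; the exact value we ran with may differ

def expected_score(r_a: float, r_b: float) -> float:
    """E_A: the probability that A beats B, given current ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a: float, r_b: float, a_won: bool) -> tuple[float, float]:
    """Apply one 1v1 vote and return the updated (R'_A, R'_B)."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    new_a = r_a + K * (s_a - e_a)
    # B's update mirrors A's: S_B = 1 - S_A and E_B = 1 - E_A
    new_b = r_b + K * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b
```

Zero-sum by construction: whatever rating A gains, B loses, which is why a plain stream of pairwise votes converges to a sensible leaderboard.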

How We Built It

Three main pieces:

  1. Social Simulation Engine (social-simulation)
    Python async beast that runs 7 full rounds. Agents go through free chat → team formation → brainstorming → polishing → final pitch. Persistent memory across rounds via Supermemory.ai.

  2. LLM Inference Backend (modal-backend)
    Deployed on Modal with vLLM serving Qwen on 4× B200 GPUs (tensor parallel). Full OpenAI-compatible API, streaming, handles 55 concurrent agent chats without breaking a sweat.

  3. Arena Frontend (arena-web)
    Next.js 16 + React 19 on Cloudflare Workers. 1v1 voting, live ELO updates, Neon Postgres in the back.
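To give a feel for how the simulation engine drives a round against the backend, here's a stripped-down asyncio sketch. The function names (`call_llm`, `run_round`) and the retry/backoff details are illustrative, not our exact code; the real engine talks to the OpenAI-compatible endpoint on Modal, which the placeholder below stands in for.

```python
import asyncio

MAX_CONCURRENCY = 55  # matches the concurrent-chat ceiling mentioned above
RETRIES = 3

async def call_llm(agent_id: int, prompt: str) -> str:
    # Placeholder for the real OpenAI-compatible call to the vLLM backend.
    await asyncio.sleep(0)  # simulate network I/O
    return f"agent-{agent_id} reply"

async def run_agent_turn(sem: asyncio.Semaphore, agent_id: int, prompt: str) -> str:
    """One agent's turn, with retries so a flaky call doesn't kill the round."""
    for attempt in range(RETRIES):
        try:
            async with sem:  # cap in-flight requests to protect the GPU backend
                return await call_llm(agent_id, prompt)
        except Exception:
            await asyncio.sleep(2 ** attempt)  # exponential backoff
    return ""  # give up on this turn rather than crash the simulation

async def run_round(prompts: dict[int, str]) -> dict[int, str]:
    """Fan out every agent's turn concurrently and collect the replies."""
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    replies = await asyncio.gather(
        *(run_agent_turn(sem, aid, p) for aid, p in prompts.items())
    )
    return dict(zip(prompts.keys(), replies))
```

The semaphore is the important bit: it lets all 50 agents be "live" at once while keeping the number of in-flight GPU requests bounded.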

Challenges We Faced

  • Keeping personality consistent across hours of conversation without turning into robots. Fixed it with locked persona params + a context manager that protects the core traits but lets ideas evolve.
  • Temperature was everything. Too high early = total chaos, too low late = boring. So we ramped it down from 0.95 in round 1 to 0.60 by round 7.
  • Running 50 agents in parallel without melting the GPU budget. Lots of async retries, careful orchestration, and some “please don’t die” logic.
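The temperature ramp from the second bullet is trivial to express in code. A linear schedule is one simple way to do it (the writeup doesn't say the ramp was exactly linear, so treat this as a sketch):

```python
def temperature_for_round(
    round_num: int,
    start: float = 0.95,   # round 1: maximum chaos
    end: float = 0.60,     # round 7: converge on a coherent pitch
    total_rounds: int = 7,
) -> float:
    """Linearly interpolate sampling temperature from `start` to `end`."""
    frac = (round_num - 1) / (total_rounds - 1)
    return round(start + (end - start) * frac, 3)
```

So the brainstorm rounds in the middle land around 0.7-0.8: still creative, but no longer derailing every thread.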

Built With

  • Languages: Python, TypeScript
  • LLM Inference: vLLM, Supermemory.ai, Qwen2.5-7B-Instruct
  • Cloud: Modal (GPUs), Cloudflare Workers
  • Frontend: Next.js 16, Tailwind CSS v4
  • Database: Neon PostgreSQL
  • APIs: OpenAI-compatible, Supermemory.ai
  • DevOps: Cloudflare Pages, Wrangler
