What To Pass? — Multi-Agent Policy Deliberation Engine
Inspiration
Legislation and policy-making usually take hours, days, or even weeks to complete and are often shaped by whoever has the loudest voice in the room. We wanted to experiment with agentic AI to see if it could produce more equitable policies by simulating blind deliberation between diverse stakeholders. The question we set out to answer: can a system of AI agents with competing interests converge on a policy that benefits the greatest number of people?
What it does
'What to Pass?' (WtP) takes a proposed policy and runs it through a multi-round deliberation process powered by five AI agents, each assigned a randomized persona (e.g., blue-collar worker, startup CEO, single parent, retiree). A mediator distributes the policy to all agents, who propose changes in their own interest. Each proposal is then blindly rated by the other agents on a scale of -1.00 to 1.00. A consensus-adjusted scoring algorithm selects the winning proposal, which becomes the base for the next round. The process repeats with fresh personas each generation until the policy converges on broad approval — or hits a maximum number of rounds. The result is a policy that no single perspective would have written, but that a diverse group can agree on.
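The round described above can be sketched in simplified form. The helper signatures and the toy `propose`/`rate` lambdas below are hypothetical stand-ins for WtP's actual Gemini-backed agents, included only so the loop runs end to end:

```python
import random
import statistics

def consensus_adjusted_score(ratings, lam=0.5):
    # CAS = mean - lambda * std_dev; lam=0.5 is an assumed default.
    return statistics.mean(ratings) - lam * statistics.pstdev(ratings)

def run_round(policy, personas, propose, rate, lam=0.5):
    """One deliberation generation. `propose` and `rate` stand in for
    the real agent calls; their signatures are assumptions."""
    proposals = [propose(p, policy) for p in personas]
    best_score, best_proposal = float("-inf"), policy
    for i, proposal in enumerate(proposals):
        # Blind rating: every agent except the author scores the
        # proposal on a -1.00 to 1.00 scale.
        ratings = [rate(p, proposal) for j, p in enumerate(personas) if j != i]
        score = consensus_adjusted_score(ratings, lam)
        if score > best_score:
            best_score, best_proposal = score, proposal
    # The winning proposal seeds the next round.
    return best_proposal, best_score

# Toy stand-ins so the sketch is runnable:
personas = ["worker", "ceo", "parent", "retiree", "student"]
propose = lambda p, policy: f"{policy} + amendment favoring {p}"
rate = lambda p, proposal: round(random.uniform(-1, 1), 2)
winner, score = run_round("universal transit subsidy", personas, propose, rate)
```

In the real system the loop repeats with fresh personas until the winning score clears a convergence threshold or the round cap is hit.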
How we built it
- Backend: Python with FastAPI handling the mediator orchestration loop, agent management, and scoring engine
- AI Agents: Direct Gemini API calls (gemini-2.5-flash-lite) with structured persona-driven system prompts
- Database: MongoDB Atlas storing cases, simulation state, round history, persona pools, and ratings
- Scoring: A consensus-adjusted score (CAS = mean - λ × std_dev) that rewards broad agreement over polarized approval
- Frontend: React dashboard with live round tracking, proposal cards, convergence charts, and persona displays
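The consensus-adjusted score from the list above is small enough to show in full. The value of λ below is an assumption (the writeup does not state the one WtP uses); the point is that a spread penalty lets broad, moderate approval beat polarized enthusiasm with the same mean:

```python
import statistics

def consensus_adjusted_score(ratings, lam=0.5):
    """CAS = mean - lambda * std_dev. `lam` controls how hard
    disagreement is penalized; 0.5 is a hypothetical default."""
    return statistics.mean(ratings) - lam * statistics.pstdev(ratings)

broad = [0.60, 0.55, 0.65, 0.60]   # everyone mildly approves (mean 0.6)
polarized = [1.0, 1.0, -0.2, 0.6]  # same mean 0.6, but split opinions

# The polarized proposal is punished for its spread:
assert consensus_adjusted_score(broad) > consensus_adjusted_score(polarized)
```

Both rating sets average 0.6, so a plain mean could not tell them apart; the std-dev term is what encodes "good for most" over "good for some."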
Challenges we ran into
- Google API token management turned out to be trickier than expected — rate limits and error handling required careful retry logic across 25+ parallel calls per round
- Frontend and backend were developed simultaneously by separate team members, so consolidating them into a working system took extra coordination and debugging
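The retry pattern we converged on for the rate-limit problem looks roughly like this. This is a generic exponential-backoff sketch, not WtP's actual code; the attempt count, delays, and the flaky demo call are all assumptions:

```python
import asyncio
import random

async def call_with_retry(call, *, max_attempts=5, base_delay=1.0):
    """Retry an async API call with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Back off 1x, 2x, 4x, ... base_delay; jitter keeps 25+
            # parallel calls from retrying in lockstep.
            await asyncio.sleep(base_delay * (2 ** attempt + random.random()))

async def demo():
    attempts = {"n": 0}
    async def flaky():  # stand-in for a Gemini request: fails twice
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise RuntimeError("429 Too Many Requests")
        return "ok"
    return await call_with_retry(flaky, base_delay=0.01)

result = asyncio.run(demo())
```

In practice each round's ~25 agent calls are wrapped in this helper and fired together with `asyncio.gather`, so one rate-limited call retries without stalling the rest.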
Accomplishments that we're proud of
- We started with a focus on economic policy but quickly expanded to general law and policy-making. We even experimented with ethical dilemmas and social policy (e.g., the trolley problem) by carefully engineering our Gemini prompts.*
- We were able to develop most of the backend and frontend in parallel and consolidate inconsistencies in time for presentation while still adding features
- All of the core deliberation logic — the scoring algorithm, blind rating system, convergence detection, and mediator loop — was designed, implemented, and validated during testing before the frontend was even complete
*Disclaimer: we do not believe LLMs or agentic AI are suitable for making ethical decisions; this was done purely out of curiosity.
What we learned
- How to work with Google's Gemini API effectively, including structured JSON output, async parallel calls, and token management
- Using MongoDB Atlas as a real-time state store for multi-step AI workflows
- Building a mediator architecture taught us a lot about orchestrating independent systems: the mediator became the main bus for all data flow, and designing it well made everything else simpler
- The importance of a clear scoring metric: without the consensus-adjusted score, the system had no meaningful way to distinguish "good for some" from "good for most"
What's next
WtP has to scale up to prove, beyond a shadow of a doubt, that it is effective. It needs better models, more agents, and longer deliberation periods. Right now the input is a short prompt written by the user, but the end goal is a system that ingests a full PDF of proposed legislation, spins up hundreds of agents across many parallel simulations, and outputs suggested policy changes that are theoretically in the interest of every stakeholder involved. We also want to explore letting users tune the consensus penalty (λ) in real time to see how the definition of "good policy" itself changes the outcome.