What To Pass? — Multi-Agent Policy Deliberation Engine
Inspiration
Legislation and policy-making usually take hours, days, or even weeks to complete and are often shaped by whoever has the loudest voice in the room. We wanted to experiment with agentic AI to see if it could produce more equitable policies by simulating blind deliberation between diverse stakeholders. The question we set out to answer: can a system of AI agents with competing interests converge on a policy that benefits the greatest number of people?
What it does
'What to Pass?' (WtP) takes a proposed policy and runs it through a multi-round deliberation process powered by five AI agents, each assigned a randomized persona (e.g., blue-collar worker, startup CEO, single parent, retiree). A mediator distributes the policy to all agents, who propose changes in their own interest. Each proposal is then blindly rated by the other agents on a scale of -1.00 to 1.00. A consensus-adjusted scoring algorithm selects the winning proposal, which becomes the base for the next round. The process repeats with fresh personas each generation until the policy converges on broad approval — or hits a maximum number of rounds. The result is a policy that no single perspective would have written, but that a diverse group can agree on.
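The round described above can be sketched in simplified form. The helper signatures and the toy `propose`/`rate` lambdas below are hypothetical stand-ins for WtP's actual Gemini-backed agents, included only so the loop runs end to end:

```python
import random
import statistics

def consensus_adjusted_score(ratings, lam=0.5):
    # CAS = mean - lambda * std_dev; lam=0.5 is an assumed default.
    return statistics.mean(ratings) - lam * statistics.pstdev(ratings)

def run_round(policy, personas, propose, rate, lam=0.5):
    """One deliberation generation. `propose` and `rate` stand in for
    the real agent calls; their signatures are assumptions."""
    proposals = [propose(p, policy) for p in personas]
    best_score, best_proposal = float("-inf"), policy
    for i, proposal in enumerate(proposals):
        # Blind rating: every agent except the author scores the
        # proposal on a -1.00 to 1.00 scale.
        ratings = [rate(p, proposal) for j, p in enumerate(personas) if j != i]
        score = consensus_adjusted_score(ratings, lam)
        if score > best_score:
            best_score, best_proposal = score, proposal
    # The winning proposal seeds the next round.
    return best_proposal, best_score

# Toy stand-ins so the sketch is runnable:
personas = ["worker", "ceo", "parent", "retiree", "student"]
propose = lambda p, policy: f"{policy} + amendment favoring {p}"
rate = lambda p, proposal: round(random.uniform(-1, 1), 2)
winner, score = run_round("universal transit subsidy", personas, propose, rate)
```

In the real system the loop repeats with fresh personas until the winning score clears a convergence threshold or the round cap is hit.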
How we built it
- Backend: Python with FastAPI handling the mediator orchestration loop, agent management, and scoring engine
- AI Agents: Direct Gemini API calls (gemini-2.5-flash-lite) with structured persona-driven system prompts
- Database: MongoDB Atlas storing cases, simulation state, round history, persona pools, and ratings
- Scoring: A consensus-adjusted score (CAS = mean - λ × std_dev) that rewards broad agreement over polarized approval
- Frontend: React dashboard with live round tracking, proposal cards, convergence charts, and persona displays
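The consensus-adjusted score from the list above is small enough to show in full. The value of λ below is an assumption (the writeup does not state the one WtP uses); the point is that a spread penalty lets broad, moderate approval beat polarized enthusiasm with the same mean:

```python
import statistics

def consensus_adjusted_score(ratings, lam=0.5):
    """CAS = mean - lambda * std_dev. `lam` controls how hard
    disagreement is penalized; 0.5 is a hypothetical default."""
    return statistics.mean(ratings) - lam * statistics.pstdev(ratings)

broad = [0.60, 0.55, 0.65, 0.60]   # everyone mildly approves (mean 0.6)
polarized = [1.0, 1.0, -0.2, 0.6]  # same mean 0.6, but split opinions

# The polarized proposal is punished for its spread:
assert consensus_adjusted_score(broad) > consensus_adjusted_score(polarized)
```

Both rating sets average 0.6, so a plain mean could not tell them apart; the std-dev term is what encodes "good for most" over "good for some."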
Challenges we ran into
- Google API token management turned out to be trickier than expected — rate limits and error handling required careful retry logic across 25+ parallel calls per round
- Frontend and backend were developed simultaneously by separate team members, so consolidating them into a working system took extra coordination and debugging
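The retry pattern we converged on for the rate-limit problem looks roughly like this. This is a generic exponential-backoff sketch, not WtP's actual code; the attempt count, delays, and the flaky demo call are all assumptions:

```python
import asyncio
import random

async def call_with_retry(call, *, max_attempts=5, base_delay=1.0):
    """Retry an async API call with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Back off 1x, 2x, 4x, ... base_delay; jitter keeps 25+
            # parallel calls from retrying in lockstep.
            await asyncio.sleep(base_delay * (2 ** attempt + random.random()))

async def demo():
    attempts = {"n": 0}
    async def flaky():  # stand-in for a Gemini request: fails twice
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise RuntimeError("429 Too Many Requests")
        return "ok"
    return await call_with_retry(flaky, base_delay=0.01)

result = asyncio.run(demo())
```

In practice each round's ~25 agent calls are wrapped in this helper and fired together with `asyncio.gather`, so one rate-limited call retries without stalling the rest.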
Accomplishments that we're proud of
- We started with a focus on economic policy but quickly expanded to general law and policy-making. We even experimented with ethical dilemmas and social policy (e.g., the trolley problem) by carefully engineering our Gemini prompts.*
- We were able to develop most of the backend and frontend in parallel and consolidate inconsistencies in time for presentation while still adding features
- All of the core deliberation logic — the scoring algorithm, blind rating system, convergence detection, and mediator loop — was designed, implemented, and validated during testing before the frontend was even complete
*Disclaimer: we do not believe LLMs or agentic AI are suitable for making ethical decisions; this was done purely out of curiosity.
What we learned
- How to work with Google's Gemini API effectively, including structured JSON output, async parallel calls, and token management
- Using MongoDB Atlas as a real-time state store for multi-step AI workflows
- Building a mediator architecture taught us a lot about orchestrating independent systems: the mediator became the main bus for all data flow, and designing it well made everything else simpler
- The importance of a clear scoring metric: without the consensus-adjusted score, the system had no meaningful way to distinguish "good for some" from "good for most"
What's next
WtP has to scale up to prove, beyond a shadow of a doubt, that it is effective. It needs better models, more agents, and longer deliberation periods. Right now the input is a short prompt written by the user, but the end goal is a system that ingests a full PDF of proposed legislation, spins up hundreds of agents across many parallel simulations, and outputs suggested policy changes that are theoretically in the interest of every stakeholder involved. We also want to explore letting users tune the consensus penalty (λ) in real time to see how the definition of "good policy" itself changes the outcome.