Inspiration
Every AI coding tool has the same flaw: it picks one model and trusts it completely.
I kept running into this while building software with AI assistants. Claude would catch a security issue that GPT missed. GPT would suggest a library that Gemini flagged as deprecated. Gemini's architecture was clean, but Claude's error handling was more robust. There's no single AI that's always right — each model has different training data, different strengths, and fundamentally produces outputs by sampling from a probability distribution.
The insight that drove this project: ensemble learning works in ML, so why not apply it to AI-assisted software engineering? If you run $N$ independent models and synthesize their outputs, the result should be better than any single model alone.
The second problem was control. Tools like Devin run autonomously and hand you a result. If it's wrong, you don't know why. I wanted the full deliberation exposed — every argument, every disagreement, every decision — with the developer as a participant, not a spectator.
What it does
AI Roundtable runs Claude, GPT, and Gemini through 7 structured rounds — Requirements, Architecture, Development, Code Review, QA, DevOps, and Execution & Analysis. Each round is a debate: agents speak in randomized order (to prevent first-speaker anchoring bias), see each other's outputs, and iterate up to 3 times before a rotating judge calls consensus.
| # | Round | Role |
|---|---|---|
| 1 | Requirements | Principal Product Engineer |
| 2 | Architecture | Distinguished Software Architect |
| 3 | Development | Principal Software Engineer |
| 4 | Code Review | Staff Engineer |
| 5 | QA | Principal QA Engineer |
| 6 | DevOps | Senior Platform Engineer |
| 7 | Execution & Analysis | SRE / Runtime Debugger |
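A simplified sketch of how the per-round control data could be set up (random speaking order per iteration plus a randomly rotated judge); names are illustrative and the real orchestrator does considerably more:

```python
import random

AGENTS = ["claude", "gpt", "gemini"]

def plan_round(agents=AGENTS, max_iterations=3, rng=random):
    """Plan one debate round: pick a judge at random (rotating judge),
    and shuffle the speaking order independently for each iteration
    so no single model always anchors the debate."""
    judge = rng.choice(agents)
    orders = [rng.sample(agents, k=len(agents)) for _ in range(max_iterations)]
    return judge, orders
```

Randomizing the order every iteration (not just once per round) means no agent systematically reacts to, or is anchored by, the same peer's output.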
After consensus in code-producing rounds, a synthesis step merges the best contributions at the function level — not "use Claude's output," but "use Claude's base, replace the auth function with GPT's argon2 implementation, apply Gemini's transaction rollback."
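Conceptually, synthesis reduces to starting from the judge's chosen base and swapping in specific contributions from the other agents. A file-level sketch (the real system merges at function level, so this is a deliberate simplification with illustrative names):

```python
def synthesize(outputs, base, replacements):
    """Merge agent outputs per the judge's directives.

    outputs:      {agent: {file_path: code}}
    base:         agent whose files form the starting point
    replacements: {file_path: agent} - swap these files in from other agents
    """
    merged = dict(outputs[base])
    for path, agent in replacements.items():
        merged[path] = outputs[agent][path]
    return merged
```

So "use Claude's base, take GPT's auth" becomes `synthesize(outputs, base="claude", replacements={"auth.py": "gpt"})`.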
The developer stays in control throughout: toggle any round on/off, inject a note mid-round to redirect the debate, retry any round with updated context, or upload existing files so agents extend your codebase instead of starting from scratch.
After DevOps generates a Dockerfile, the Execution & Analysis round auto-builds and runs the container, streams live stdout/stderr to the browser, and has the agents analyze failures and apply fixes. Frontend and fullstack projects also run in-browser via the WebContainer API. Sessions persist across restarts — resume any session from the Projects dashboard.
Auth0 secures every layer:
- JWT verification (RS256 + JWKS) on every backend request — sessions scoped to the user's `sub` claim
- API key vault — AI provider keys are Fernet-encrypted server-side and stored in Auth0 `user_metadata`, never in our database
- GitHub Token Vault — the AI agent pushes generated code to GitHub entirely server-side using the user's OAuth token from Auth0 Token Vault; the token never reaches the browser
```
User clicks "Export to GitHub"
  → POST /api/github/push
  → Server reads token from Auth0 Token Vault (Management API)
  → Server pushes files using user's own GitHub token
  → Returns repo URL — token never sent to browser
```
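Under the hood, the server-side retrieval boils down to reading the user's GitHub identity from the Auth0 Management API profile. A minimal sketch of the extraction step (the profile shape follows Auth0's `GET /api/v2/users/{id}` response when the M2M app holds `read:user_idp_tokens`; the function itself is illustrative):

```python
def extract_github_token(user_profile):
    """Pull the GitHub IdP access token from an Auth0 user profile.

    With the read:user_idp_tokens scope, the Management API includes
    each identity's stored access_token in the identities array.
    """
    for identity in user_profile.get("identities", []):
        if identity.get("provider") == "github" and "access_token" in identity:
            return identity["access_token"]
    raise LookupError("no GitHub identity with a stored token")
```

Because this runs inside the API route, the token lives only in server memory for the duration of the push.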
How we built it
| Layer | Stack |
|---|---|
| Backend | FastAPI, SQLite, Fernet encryption, slowapi |
| Frontend | Next.js 14 (App Router), React, Tailwind CSS |
| Auth | Auth0 Next.js SDK, RS256 JWT verification, Management API (M2M) |
| Streaming | Server-Sent Events (SSE) — tokens from all 3 agents over one connection |
| Execution | Docker subprocess + WebContainer API for in-browser frontend projects |
Each round runs a debate loop — agents speak in random order, see each other's outputs, and a rotating judge (randomly chosen per round) calls consensus (up to 3 iterations). On consensus, the BASE agent re-generates all files incorporating the judge's merge directives — blending the best contributions from every agent at the function level.
Challenges we ran into
Token limits across three providers. GPT-4o has a ~30K TPM limit that makes sending a full codebase for review impossible. The solution: a chunked code review pass that sorts files by priority (app code → config → tests), groups them into ≤20K token chunks, accumulates findings across chunks, and runs a single consensus check on the findings text only — keeping the final call well under the limit.
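The chunking itself is greedy packing over the priority-sorted file list. A simplified sketch (the ~4-characters-per-token estimate and the budget default are illustrative stand-ins, not the exact production values):

```python
def chunk_files(files, budget=20_000, est=lambda text: len(text) // 4):
    """Greedy chunking: `files` is a priority-sorted list of
    (path, text) pairs; pack them into chunks whose estimated
    token count stays within the budget."""
    chunks, current, used = [], [], 0
    for path, text in files:
        tokens = est(text)
        if current and used + tokens > budget:
            chunks.append(current)   # close the full chunk
            current, used = [], 0
        current.append(path)
        used += tokens
    if current:
        chunks.append(current)
    return chunks
```

Each chunk gets its own review pass; only the accumulated findings text goes to the final consensus call, which is what keeps that call under the TPM limit.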
Code synthesis correctness. Getting three models to reliably output parseable `FILE: path` blocks required careful prompt engineering. Models occasionally omit closing fences, output partial files, or hallucinate file paths — each failure mode needed a specific mitigation.
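One of those mitigations can be sketched as a fence-tolerant parser: accept a missing closing fence at the end of a truncated response instead of dropping the whole file. This is a simplified illustration, not the production parser:

```python
import re

FENCE = "`" * 3  # literal triple-backtick, built up to avoid nesting fences here

FILE_RE = re.compile(
    rf"FILE:\s*(\S+)\s*\n{FENCE}[a-z]*\n(.*?)(?:\n{FENCE}|\Z)", re.S
)

def parse_file_blocks(text):
    """Parse 'FILE: path' + fenced code blocks from model output.
    The (?:...|\\Z) alternative tolerates a missing closing fence
    when a response is cut off mid-file."""
    return {path: code for path, code in FILE_RE.findall(text)}
```

Path hallucination still needs a separate check (e.g. comparing parsed paths against the files the round was actually asked to produce).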
Auth0 Token Vault setup complexity. Wiring the GitHub Social connection specifically for Token Vault (not just authentication) required a non-obvious Auth0 dashboard configuration: the connection purpose must be set to "Authentication and Connected Accounts for Token Vault," and the M2M app needs the `read:user_idp_tokens` scope on the Management API. Getting this right was the most time-consuming Auth0 integration step.
SSE streaming across multiple concurrent AI calls. Each round streams tokens from up to 3 AI providers simultaneously into a single SSE connection. Managing backpressure, handling partial failures (one provider times out mid-round), and keeping the frontend state machine consistent required careful async coordination in FastAPI.
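The fan-in pattern behind that single connection can be sketched with an `asyncio.Queue`: one pump task per provider, all feeding the generator that backs the SSE response. A simplified illustration (no backpressure handling, illustrative names), not the production code:

```python
import asyncio

async def merge_streams(streams):
    """Fan-in: forward items from several async generators into one
    stream, tagged with the source name. A provider that fails or
    times out ends only its own branch, not the whole round."""
    queue = asyncio.Queue()
    DONE = object()

    async def pump(name, gen):
        try:
            async for item in gen:
                await queue.put((name, item))
        finally:
            await queue.put((name, DONE))  # always signal completion

    tasks = [asyncio.create_task(pump(name, gen)) for name, gen in streams.items()]
    remaining = len(tasks)
    while remaining:
        name, item = await queue.get()
        if item is DONE:
            remaining -= 1
        else:
            yield name, item
```

Each yielded `(agent, token)` pair can then be framed as one SSE event, so the frontend state machine always knows which agent a token belongs to.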
Developer control vs. automation tension. Fully automated pipelines are fast but opaque. Full manual control is slow. The final design — streaming deliberation visible in real time, confirmation required between rounds, mid-round injection always available, one-click retry per round — took several iterations to feel right.
Accomplishments that we're proud of
The synthesis step actually works. Getting a judge to produce structured merge directives
(BASE=claude. INCORPORATE: gpt's error handling. MUST_FIX: missing validation) and having
the base agent faithfully execute them — producing code that's genuinely better than any
single agent's output — was the hardest thing to get right, and it works reliably.
Auth0 Token Vault as the authorization primitive for AI agents. The GitHub push flow demonstrates a clean answer to "who authorized this agent to act?": the user grants OAuth consent once, Auth0 stores the token in Token Vault, and the agent retrieves it server-side only when the user explicitly requests an action. The raw token never travels through the application layer at any point.
A genuinely transparent AI workflow. Every token from every agent streams to the UI in real time. Every disagreement is surfaced. Every consensus decision is explained. The developer can intervene, redirect, or roll back at any point. This level of visibility is rare in AI coding tools.
Configurable rounds that actually change the workflow. Running just Requirements + Architecture gives a detailed spec in minutes. Running just Developer + Code Review gives working code with a critique. The round system is flexible enough to be useful for very different tasks without any code changes.
What we learned
Multi-model output is genuinely better — but synthesis is the hard part. Getting three models to debate is straightforward. Getting them to produce structured, parseable code blocks consistently, and then merging those outputs at the function level without losing context, is where most of the engineering effort went.
Auth0's Token Vault is the right primitive for agentic authorization. The core problem with AI agents that take real-world actions is: whose credentials are they using, and how are those credentials protected? Token Vault answers both questions cleanly — the user authorizes once via OAuth, the token lives in Auth0's infrastructure, and the agent retrieves it server-side only when needed.
SSE is underrated for AI agent UIs. Streaming tokens from three concurrent AI calls, interleaving them with round metadata, synthesis events, and Docker logs — all over a single SSE connection per session — gave the UI a live, transparent feel that polling never could.
Configurable workflows matter more than full automation. The most useful feature turned out to be the ability to run only the rounds you need. Forcing users through all 7 rounds every time would have made the tool impractical for quick tasks.
What's next for ai-roundtable
- Per-round debate iteration config — let users set more than 3 iterations for complex rounds where deeper debate would help
- Larger token budgets for code review — tiered plans where higher tiers skip chunking and send the full codebase in one pass, enabling relationship-aware review across files
- LangGraph migration — replace the custom round loop in `orchestrator.py` with a declarative graph: nodes per round, conditional edges for retry/consensus, built-in human-in-the-loop and state persistence
- Cross-round memory — persistent context across rounds so long-running projects maintain continuity across sessions
- Push to existing repos — currently exports to new repos only; adding push-to-existing and open-PR support would cover the most common real-world workflow
- Python and Deno in WebContainer — currently supports npm projects only
- User-selectable judge model — let users trade cost for accuracy on the consensus step
Bonus Blog Post
How Auth0 Token Vault Solved My Biggest Security Problem
When I started building AI Roundtable, the GitHub export feature was an afterthought. Users would generate code, and I'd give them a button to push it to GitHub.
Then I thought about what that actually meant. A raw GitHub token sitting in browser memory. Potentially logged by network interceptors. Passed through my application layer. It felt wrong — especially for a tool that's supposed to be trusted with your code and your credentials.
That's when I found Auth0 Token Vault — and it turned out to be exactly the right primitive for the problem.
The core insight is elegant: instead of your application handling the token, Auth0 holds it. The user consents once via GitHub OAuth, Auth0 stores the token in Token Vault, and from that point on, your server retrieves it server-side using the Management API. The token never appears in a browser response. Never gets logged. Never passes through client-side code.
Implementing it wasn't without friction. The Auth0 dashboard configuration is non-obvious — you have to set the GitHub Social connection's purpose specifically to "Authentication and Connected Accounts for Token Vault," and grant the proper access scope to your M2M app. It took longer than I'd like to admit — the documentation exists, but the specific combination of settings required for Token Vault isn't laid out in one place.
But once it clicked, the result was clean. The entire GitHub push flow — create repo, commit files, return URL — happens server-side in a single Next.js API route. The browser receives only the final repo URL. The token is never in scope.
This is what "authorized to act" actually means. Not just authentication — but a clear, auditable chain: user consents → Auth0 holds the credential → server acts on behalf of the user → user sees the result. No shortcuts, no raw tokens flying around.
Built With
- auth0
- claude
- docker
- gemini
- openai
- python
- sqlite
- typescript