Debate Room

Inspiration

We wanted to see what happens when AI agents don't just answer questions — they fight over them. Most AI tools give you one perspective. We wanted four, with receipts. The idea: simulate a real structured
debate where each AI persona has a distinct worldview, live web evidence, and memory of every argument it has ever made.

## What it does

Debate Room pits 4 AI personas (Skeptic, Advocate, Historian, Pragmatist, and 5 others) against each other across 5 structured phases — Opening Statements, Cross-Examination, Rebuttals, Free-for-All, and Closing Statements — then synthesizes an executive summary with a verdict.

Every agent is backed by:

Yutori Research — a deep pre-debate intelligence briefing fetched before the first word is spoken
Tavily Search — live, phase-aware web evidence injected per agent per turn
GLiNER — structured claim extraction (main claim, evidence, argument type) from every response
Neo4j — a reasoning graph that persists cross-debate memory so agents get smarter over time
Yutori Scout — autonomous topic monitoring that arms a scout and auto-fires a debate when new developments are detected

The result streams live to an 8-bit retro web UI via Server-Sent Events, and every debate exports a full markdown transcript.

## How we built it

TypeScript + Node.js backend with a modular provider system supporting Anthropic, OpenAI, and Google models simultaneously
Express + SSE for real-time streaming to the browser
SQLite (better-sqlite3) as the primary persistence layer for debates, turns, and decisions
Neo4j AuraDB for the reasoning graph — Agent, Turn, Claim, and Evidence nodes with MADE_CLAIM, CHALLENGED, and SUPPORTED_BY edges
FastAPI (Python) microservice bridging GLiNER2 for structured NLP extraction on every agent turn
Yutori Research API polled async before Phase 1 to generate a shared intelligence briefing
Tavily API called per agent per phase with phase-aware queries (counterarguments in cross-exam, latest developments in free-for-all)
Yutori Scout API polled server-side every 30 seconds to detect new developments and auto-trigger debates
85 passing unit and integration tests throughout

## Challenges we ran into

Async prompt pipeline — making buildPrompt async to support concurrent Tavily searches without blocking the race-mode phases required threading changes across the entire call chain
GLiNER in stub mode — the PIONEER_API_KEY wasn't being picked up by uvicorn since it doesn't auto-load .env; fixed by adding load_dotenv() to the FastAPI service
Neo4j context cards empty — the graph only populates Claim nodes when GLiNER analysis is present; without GLiNER working, the context query returned 0 cards and silently fell back to SQLite, making it look like Neo4j wasn't connected at all
Streaming + spinner conflict — the ora spinner for the Yutori briefing fetch conflicted with streaming agent output; had to carefully sequence spinner stop before first chunk arrives
Cross-exam pairing bug — a i < n vs i < Math.floor(n/2) off-by-one produced n² exchanges instead of n, caught only by the pairing unit tests

## Accomplishments that we're proud of

A fully working 5-phase structured debate engine with parallel race modes and fixed-order sequential modes in the same pipeline
Every integration degrades gracefully — no API key means the system still runs, just without that layer. The debate never crashes due to an external service
Cross-debate agent memory — agents genuinely accumulate history across debates; a Skeptic that conceded a point last time enters the next debate knowing it
Auto-Pilot mode — arm a scout on any topic and walk away; the system watches the web and fires a debate automatically when something happens
85 passing tests maintained throughout all four integration phases

## What we learned

Structured debate is a surprisingly effective forcing function for LLMs — the phase constraints (you must cross-examine this opponent, you must give a JSON closing with a confidence score) produce far more rigorous reasoning than open-ended prompting
Graph memory changes agent behavior in measurable ways — prompts with Neo4j context cards are richer and agents reference prior positions more explicitly
The hardest part of multi-agent systems isn't the agents — it's the plumbing: async coordination, streaming, graceful degradation, and making sure one slow API doesn't freeze the whole debate

## What's next for Debate Room

Human-in-the-loop — let a real person take one of the four seats and debate against AI personas
Verdict scoring — use a judge LLM to score each agent's performance and track win rates over time
Topic leaderboard — surface the most contested topics from the Neo4j graph across all debates
Voice mode — text-to-speech per persona so you can listen to the debate like a podcast
Public arena — let anyone submit a topic, watch the debate live, and vote on the winner

Built With

claude
fastino
gliner
javascript
neo4j
openai
sqlite
tavily
yutori

Updates

Amrit Kochar started this project — Feb 27, 2026 07:05 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.