Inspiration

Ask ChatGPT for feedback on your startup idea. It'll say it's brilliant. Tell it to be critical — it'll give you a polite version of the same thing. That's sycophancy. Every large language model is optimized for engagement, not truth. They're designed to keep you talking, not to challenge you.

The real cost isn't wrong answers, it's missed answers. The non-obvious idea that could change everything never gets surfaced, because a single model is architecturally incapable of producing it. Research backs this up: the CONSENSAGENT paper (ACL Findings 2025) identified sycophancy as "a critical yet overlooked challenge" in multi-agent LLM systems.

We asked: what if you could replace one model's opinion with a thousand structurally different perspectives and then run the equivalent of Excel's UNIQUE function on the results? Not averaging toward consensus, but preserving every distinct idea regardless of how many agents said it, because innovation comes from the minority, not the majority.

What it does

Council is a multi-agent ideation engine. Drop in any idea, problem, or decision. A thousand AI agents across eight model families — open-source and closed-source — independently think about it. Then the system strips away sycophantic consensus and surfaces the non-obvious insights a single model can't find.

The pipeline:

  1. Distill: Claude Sonnet structures your input, extracts constraints, and generates 1,000 unique personas (from a skeptical VC to a retired industry veteran to a Gen Z user)
  2. Swarm: 1,000 agents across 8 providers fire simultaneously, each with a unique persona, cognitive frame, and temperature
  3. Organize: Responses get embedded, clustered, and deduplicated using a two-step UNIQUE function that preserves every distinct idea
  4. Tournament: Three frontier models (Claude Opus, GPT-4o, Gemini Pro) debate the top ideas across three rounds, evaluating for transformative potential, not just feasibility
  5. Results: Top 5 ranked ideas, the consensus view, AND the non-obvious insights that only surfaced because of structural diversity
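To make the Organize step concrete, here is a minimal sketch of a UNIQUE-style pass over embedded responses. It is an illustration under stated assumptions, not Council's actual two-step function: the `Idea` shape, the `uniqueIdeas` name, the greedy single-pass strategy, and the 0.9 cosine threshold are all hypothetical. The point it shows is that an idea mentioned by a single agent survives next to one mentioned by hundreds.

```typescript
// Hypothetical shape: one agent response plus its embedding vector.
interface Idea {
  text: string;
  embedding: number[];
}

// One distinct idea and how many agents expressed it.
interface UniqueIdea {
  representative: Idea;
  count: number;
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy dedup: an idea joins the first kept idea it closely resembles,
// otherwise it becomes a new entry. Nothing is dropped for being rare.
function uniqueIdeas(ideas: Idea[], threshold = 0.9): UniqueIdea[] {
  const kept: UniqueIdea[] = [];
  for (const idea of ideas) {
    const match = kept.find(
      (k) => cosine(k.representative.embedding, idea.embedding) >= threshold
    );
    if (match) {
      match.count += 1;
    } else {
      kept.push({ representative: idea, count: 1 });
    }
  }
  return kept;
}
```

Keeping `count` alongside each representative lets a later stage report both the consensus view (high counts) and the tail insights (count of one or two) from the same list.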

Council isn't confined to one domain. Career decisions, startup strategy, research questions, creative projects, any problem where you need perspectives you don't have access to.

How we built it

The key architectural decision was using the Vercel AI SDK as a unified interface to all 8 providers. We wrote the swarm logic once and it works across Google (Gemini Flash), Groq (Llama 3.3 70B), Fireworks (Mixtral), Anthropic (Haiku/Opus), OpenAI (GPT-4o-mini/4o), DeepSeek, Cerebras (Llama 3.1 8B at 969 tok/s), and Moonshot (Kimi K2.5). 60% open-source, 40% closed-source.
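The "write the swarm logic once" idea can be sketched without any SDK at all. The example below assumes a hypothetical `Provider` contract with a single `generate` function; in the real project the Vercel AI SDK plays that role, and none of the names here (`Provider`, `planSwarm`, `runSwarm`) or the fixed 0.7 temperature come from Council's codebase.

```typescript
// Hypothetical provider-agnostic contract: any backend that can turn a
// prompt and a temperature into text can join the swarm.
interface Provider {
  id: string;
  generate: (prompt: string, temperature: number) => Promise<string>;
}

// One unit of swarm work.
interface AgentTask {
  agentId: number;
  provider: Provider;
  prompt: string;
  temperature: number;
}

// Spread personas across providers round-robin so every provider
// contributes roughly the same number of agents.
function planSwarm(idea: string, personas: string[], providers: Provider[]): AgentTask[] {
  if (providers.length === 0) {
    throw new Error("at least one provider is required");
  }
  return personas.map((persona, i) => ({
    agentId: i,
    provider: providers[i % providers.length],
    prompt: `You are ${persona}. Respond independently to: ${idea}`,
    temperature: 0.7, // placeholder value for the sketch
  }));
}

// Fire every task in parallel; a failed call is dropped instead of
// failing the whole swarm.
async function runSwarm(tasks: AgentTask[]): Promise<string[]> {
  const results = await Promise.all(
    tasks.map((t) => t.provider.generate(t.prompt, t.temperature).catch(() => null))
  );
  return results.filter((r): r is string => r !== null);
}
```

Because `planSwarm` and `runSwarm` only depend on the `Provider` contract, adding a ninth provider means adding one more object to the array rather than touching the swarm logic.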

Every design decision is backed by research. We A/B tested three prompt engineering approaches: role-based persona diversity scored 92.9/100 and won. We tested titled vs. raw text tournament formats: titled was 1.9x faster. We tested visionary vs. conservative judge prompts: visionary surfaced 3x more genuinely new ideas. We also tested compressed cross-referencing in Round 2; it backfired (lost nuance), so we rolled it back.

Two developers, many Claude Code agents, one repo. Zave and Luke each ran autonomous coding agents on separate branches simultaneously, with strict file ownership rules and append-only shared files to prevent merge conflicts.

The pipeline stages are all pure functions in separate files — no HTTP logic leaks into pipeline code. The SSE stream orchestrates everything server-side while the client renders real-time visualizations: a force-directed graph showing agents firing by provider color, a three-column debate view with changed-mind highlights, and ranked results with download options.
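As a rough illustration of that separation, the sketch below keeps a stage as a plain function and confines the streaming concern to a small wrapper that formats Server-Sent Events frames. The `StageEvent`, `toSseFrame`, and `runStage` names are hypothetical, and the stage is synchronous here only to keep the example short; real stages that call model APIs would be async.

```typescript
// Hypothetical event shape for a pipeline progress update.
interface StageEvent {
  stage: string;
  payload: unknown;
}

// Format one Server-Sent Events frame: an event name, a JSON data line,
// and the blank line that terminates the frame.
function toSseFrame(event: StageEvent): string {
  return `event: ${event.stage}\ndata: ${JSON.stringify(event.payload)}\n\n`;
}

// A pipeline stage is just input to output, with no HTTP types involved.
type Stage<I, O> = (input: I) => O;

// The orchestrator runs a stage and reports its output through `emit`,
// which in a server would write to the open SSE response.
function runStage<I, O>(
  name: string,
  stage: Stage<I, O>,
  input: I,
  emit: (frame: string) => void
): O {
  const output = stage(input);
  emit(toSseFrame({ stage: name, payload: output }));
  return output;
}
```

With this split, a stage can be unit tested by calling it directly, and the streaming layer can be tested by passing an `emit` that pushes frames into an array.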

Challenges we ran into

Rate limiting across 8 providers. Firing 1,000 API calls in parallel means navigating 8 different rate limit policies simultaneously.
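One common way to stay under a provider's limits is to cap how many calls are in flight at once. The sketch below is a generic concurrency limiter, not a description of how Council handled this; `mapWithLimit` and its parameters are hypothetical names, and it does not cover retries or backoff on 429 responses.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results come back in the same order as the inputs.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

To respect eight different policies, the tasks could be grouped by provider and each group passed through `mapWithLimit` with its own `limit`, with the eight groups running side by side.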

Defeating sycophancy is a research problem. Our first prompt attempts produced a thousand variations of the same safe answer — exactly the sycophancy problem the CONSENSAGENT paper quantified. That led us to 14 cognitive prompt frames, 5 temperature bands, and the core insight: you need structural model diversity (different training distributions) on top of prompt diversity. You can't sycophancy-hack different training distributions.
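To show how frames, temperature bands, and providers can be combined, here is a minimal sketch that walks the full grid so that neighbouring agents differ on at least one axis. The counts (8 providers, 14 frames, 5 bands) come from the text; the specific temperature values and the `assignDiversity` function are assumptions made for illustration.

```typescript
const PROVIDER_COUNT = 8; // from the text
const FRAME_COUNT = 14; // from the text
// Hypothetical values: the text says there are five bands, not what they are.
const TEMPERATURE_BANDS = [0.2, 0.5, 0.8, 1.0, 1.2];

interface DiversitySlot {
  provider: number; // index into the provider list
  frame: number; // index into the cognitive frame list
  temperature: number;
}

// Treat the agent index as a mixed-radix number: the provider changes
// fastest, then the frame, then the temperature band.
function assignDiversity(agentIndex: number): DiversitySlot {
  const provider = agentIndex % PROVIDER_COUNT;
  const frame = Math.floor(agentIndex / PROVIDER_COUNT) % FRAME_COUNT;
  const band =
    Math.floor(agentIndex / (PROVIDER_COUNT * FRAME_COUNT)) % TEMPERATURE_BANDS.length;
  return { provider, frame, temperature: TEMPERATURE_BANDS[band] };
}
```

The grid has 8 x 14 x 5 = 560 distinct combinations, so with 1,000 agents some combinations repeat; in that case the unique persona per agent is what keeps two agents in the same slot from being identical.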

Time pressure with full-stack ambition. Real-time swarm visualization, three-round debate UI with changed-mind highlights, geodesic sphere animation, and a full multi-provider pipeline — all in 24 hours. Modular architecture saved us: each pipeline stage is a pure function we could build and test independently.

Accomplishments that we're proud of

We ran 1,000 agents through the full pipeline. Real API calls to 8 real providers, real embedding and clustering, real three-round tournament debate. End to end.

Council solved its own problems. Halfway through the build, we hit bottlenecks with runtime orchestration and visualization. So we ran Council on itself — fed our own architectural challenges into the pipeline. It surfaced non-obvious approaches we hadn't considered. The tool designed to find insights you'd miss found insights we'd missed.

Every design decision is A/B tested. We ran 6 controlled experiments on prompt formats, tournament structures, and judge system prompts, measured the results, and shipped the winners.

What we learned

Structural diversity beats prompting tricks: 8 providers with different training data, architectures, and cultural backgrounds produce genuinely unique ideas that no single model can, no matter how well prompted. The interesting ideas are in the tails: the insights only one or two agents out of a thousand surface have the highest return on consideration.

What's next for Council

Next up is batch-conditioned diversity, where Wave 2 agents see Wave 1's cluster summaries and must propose something structurally different, plus feedback loops where users rate which insights were actually valuable so the tournament learns what "worth further consideration" really means. Long-term, we see Council as an SDK (npm install council) and a standard for how LLMs think together: the MCP for deliberation.
