Inspiration
Every brainstorming session follows the same pattern — a few loud voices dominate, half the room stays silent, and the best ideas get lost in a sea of sticky notes nobody revisits. We wanted to fundamentally rethink how groups ideate together. What if an AI could actively facilitate the conversation, pull ideas from everyone's voice in real time, visualize the collective intelligence of the room as a living 3D swarm, and then help the group converge on the strongest ideas through fair, mathematical voting? The Cognitive Swarm was born from the belief that the best ideas emerge when every voice is heard, every connection is surfaced, and consensus is built transparently.
What it does
The Cognitive Swarm is a real-time, multimodal brainstorming platform where teams ideate collaboratively with an AI-powered anchor facilitating the entire session.
An admin creates a room with a topic, and participants join with a simple room code. From there, the session flows through three structured phases:
Explore (Divergent Phase): Participants speak freely while Gemini Live listens, extracts ideas from natural speech, and places them into a 3D swarm visualization. The AI anchor acts as a playful emcee — it names contributors, nudges quiet participants to speak up, suggests unexplored directions when the room goes silent, and even plays devil's advocate to challenge groupthink. A synthesizer agent runs in the background, discovering hidden connections between ideas and drawing edges in the swarm.
Vote (Convergent Phase): The group uses quadratic voting to signal preferences. Casting n votes on the same idea costs n² credits, so each additional vote is progressively more expensive, preventing any single person from dominating consensus. This mechanism surfaces genuine collective preference rather than mob rule.
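The cost curve behind this can be sketched in a few lines (an illustrative sketch of standard quadratic voting, not our exact implementation; function names are ours):

```typescript
// Quadratic voting: n votes on one idea cost n^2 credits in total,
// so the n-th vote has a marginal cost of n^2 - (n-1)^2 = 2n - 1.
function costForVotes(n: number): number {
  return n * n;
}

// Extra credits needed to add one more vote when `current` votes are already placed.
function marginalCost(current: number): number {
  return costForVotes(current + 1) - costForVotes(current);
}

// With a fixed credit budget, the deepest stack one person can put on a single idea.
function maxVotes(budget: number): number {
  return Math.floor(Math.sqrt(budget));
}
```

With 100 credits, a participant can stack at most 10 votes on one favorite idea, or spread the same budget as single votes across 100 different ideas, which is exactly the trade-off that keeps consensus fair.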
Forge (Artifact Phase): The highest-weighted ideas are synthesized into a structured Mermaid diagram — the system intelligently infers the right diagram type (flowchart, mind map, entity-relationship, class diagram, or journey map) based on the topic and idea clusters. The result is an actionable artifact that captures the group's collective intelligence.
Throughout the session, ideas float in a 3D space where proximity reflects semantic similarity (powered by Gemini embeddings projected into 3D), edges show discovered connections, and node size reflects voting weight.
How we built it
Frontend: React 19 with Vite, rendered with a 3D swarm visualization using React Three Fiber and Three.js. Tailwind CSS and Framer Motion handle styling and animations. Mermaid renders the final diagram artifacts. XYFlow provides the flow-based artifact canvas.
Backend: A Node.js/Express server with Socket.IO handles all real-time communication. Session state is managed through a custom store abstraction that supports in-memory (development), Redis (distributed state), and Firestore (durable persistence) backends.
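The store abstraction can be pictured as a single interface with interchangeable backends (a minimal sketch under assumed names; the real interface and state shape differ):

```typescript
// One interface, three backends: memory (dev), Redis (distributed), Firestore (durable).
// Only the in-memory variant is shown here; names are illustrative.
interface SessionStore {
  get(roomId: string): Promise<Record<string, unknown> | undefined>;
  set(roomId: string, state: Record<string, unknown>): Promise<void>;
  delete(roomId: string): Promise<void>;
}

class MemoryStore implements SessionStore {
  private rooms = new Map<string, Record<string, unknown>>();
  async get(roomId: string) { return this.rooms.get(roomId); }
  async set(roomId: string, state: Record<string, unknown>) { this.rooms.set(roomId, state); }
  async delete(roomId: string) { this.rooms.delete(roomId); }
}
```

Because every backend satisfies the same async interface, the Socket.IO handlers never need to know whether state lives in a local map or in Memorystore.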
AI Layer — Dual Gemini Live Architecture: This is the heart of the system. We maintain two concurrent Gemini Live sessions per room:
- A per-user conversational session that receives live audio (16kHz PCM) and video frames (1fps JPEG), enabling full-duplex, interruption-aware conversation. This session has tool-calling capabilities: `extractIdea`, `getIdeas`, `getSessionSnapshot`, and `generateMermaid`, letting the AI anchor interact with session state in real time.
- A dedicated anchor announcement session for broadcasting spoken cues like direction suggestions and audience nudges, ensuring a consistent AI voice without needing a separate TTS service.
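A heavily hedged sketch of what such tool declarations can look like when handed to a live session (field names follow the common JSON-schema style for function declarations; the exact SDK shapes and our real parameter sets may differ):

```typescript
// Illustrative tool declarations for the anchor's conversational session.
// These are assumptions for the sketch, not the project's actual definitions.
const anchorTools = [
  {
    name: "extractIdea",
    description: "Record an idea a participant just voiced, attributed to the speaker.",
    parameters: {
      type: "object",
      properties: {
        text: { type: "string", description: "The idea, paraphrased concisely" },
        author: { type: "string", description: "Display name of the contributor" },
      },
      required: ["text"],
    },
  },
  {
    name: "getSessionSnapshot",
    description: "Fetch the current topic, phase, participants, and idea list.",
    parameters: { type: "object", properties: {} },
  },
];
```

When the model decides mid-conversation that someone just voiced an idea, it emits a call to `extractIdea`, the server mutates session state and pushes the new node into the swarm, and the spoken conversation continues uninterrupted.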
Non-live Gemini calls power the background agents: a synthesizer that finds idea connections every 45 seconds, a devil's advocate that generates critiques every 90 seconds, and an artifact forger that builds Mermaid diagrams from grouped ideas.
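The agent loop itself is simple interval scheduling; a minimal sketch (the periods come from the description above, everything else, including the `Agent` type, is illustrative):

```typescript
// Each background agent runs on its own cadence, independent of the live sessions.
type Agent = { name: string; periodMs: number; run: () => Promise<void> };

function startAgents(agents: Agent[]): () => void {
  const timers = agents.map((agent) =>
    setInterval(() => {
      // A failing Gemini call logs and moves on; it must never kill the loop.
      agent.run().catch((err) => console.error(`[${agent.name}]`, err));
    }, agent.periodMs)
  );
  // Returned stop function clears all timers on session teardown.
  return () => timers.forEach(clearInterval);
}
```

In our setup this would be invoked once per room, e.g. with the synthesizer at `periodMs: 45_000` and the devil's advocate at `periodMs: 90_000`, and the stop function called when the room closes.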
Infrastructure: Dockerized multi-stage builds deployed to Google Cloud Run via Terraform, with Memorystore Redis for distributed state, Firestore for durability, Secret Manager for API keys, and GitHub Actions CI/CD with keyless Workload Identity Federation.
Challenges we ran into
Audio resampling in the browser. Gemini Live requires 16kHz PCM 16-bit audio, but browsers capture at their native sample rate (usually 44.1kHz or 48kHz). We had to build interpolation-based resampling in the frontend to downsample on the fly without introducing artifacts.
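The core of the downsampler fits in one function (a minimal sketch of the interpolation approach described above, not the exact frontend code):

```typescript
// Linear-interpolation downsampler: Float32 samples at the browser's native
// rate (e.g. 48000 Hz) -> 16 kHz signed 16-bit PCM for Gemini Live.
function resampleTo16k(input: Float32Array, inputRate: number): Int16Array {
  const targetRate = 16000;
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    // Interpolate between the two nearest source samples instead of
    // naive decimation, which would alias badly.
    const sample = input[i0] * (1 - frac) + input[i1] * frac;
    // Clamp to [-1, 1] and convert to signed 16-bit PCM.
    const clamped = Math.max(-1, Math.min(1, sample));
    out[i] = Math.round(clamped * 32767);
  }
  return out;
}
```

A 48 kHz capture thus shrinks by a factor of three, and the resulting `Int16Array` buffer is what gets streamed over the socket to the live session.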
Keeping the AI anchor contextually grounded. A live conversational AI that facilitates a room of people needs to know the current topic, all ideas so far, who's contributing, and who's been quiet — all while maintaining a consistent personality across a long session. Getting the system instruction and tool-calling pattern right so the anchor felt like a real emcee rather than a generic chatbot took significant iteration.
Socket.IO reconnection and admin presence. Temporary network blips would cause the server to think the admin had left, immediately closing the room for everyone. We solved this with a 15-second grace period for admin disconnects and automatic room rejoin on socket reconnection.
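The grace-period logic boils down to a cancelable timer per room (a sketch with illustrative names, not the actual server code):

```typescript
// On admin disconnect, schedule room closure instead of closing immediately;
// a reconnect within the grace window cancels it.
const ADMIN_GRACE_MS = 15_000;
const pendingClosures = new Map<string, ReturnType<typeof setTimeout>>();

function onAdminDisconnect(roomId: string, closeRoom: (roomId: string) => void): void {
  const timer = setTimeout(() => {
    pendingClosures.delete(roomId);
    closeRoom(roomId); // only fires if the admin never came back
  }, ADMIN_GRACE_MS);
  pendingClosures.set(roomId, timer);
}

function onAdminReconnect(roomId: string): boolean {
  const timer = pendingClosures.get(roomId);
  if (!timer) return false;
  clearTimeout(timer);
  pendingClosures.delete(roomId);
  return true; // the room survived the network blip
}
```

On the client side, the matching half is a rejoin handler on the socket's reconnect event, so the admin lands back in the same room without any manual action.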
Quadratic voting UX. The math behind quadratic voting is simple, but making it intuitive to users who've never encountered the mechanism required careful UI work — showing remaining credits, making the escalating cost visible, and labeling it in an approachable way.
3D positioning from high-dimensional embeddings. Gemini embeddings are 3072-dimensional vectors. Projecting them into meaningful 3D positions that preserve semantic relationships required random projection matrices with proper normalization and scaling, plus fallback procedural positioning when embedding calls fail.
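The projection idea can be sketched as three fixed pseudo-random directions, one dot product per output axis (an illustrative simplification; our seeding and normalization details differ):

```typescript
// Tiny LCG so the projection directions are deterministic across clients.
function seededRandom(seed: number): () => number {
  let s = seed;
  return () => {
    s = (s * 1664525 + 1013904223) % 4294967296;
    return s / 4294967296;
  };
}

// Random projection of a high-dimensional embedding into 3D. Nearby embeddings
// land near each other with high probability (Johnson-Lindenstrauss intuition).
function projectTo3D(embedding: number[], seed = 42, scale = 10): [number, number, number] {
  const rand = seededRandom(seed);
  const dims = embedding.length;
  const out: number[] = [];
  for (let axis = 0; axis < 3; axis++) {
    let dot = 0;
    // Centered uniform entries approximate a random direction per axis.
    for (let d = 0; d < dims; d++) dot += embedding[d] * (rand() - 0.5);
    // Normalize by sqrt(dims) so magnitudes stay comparable across dimensions.
    out.push((dot / Math.sqrt(dims)) * scale);
  }
  return [out[0], out[1], out[2]];
}
```

Because the matrix is seeded rather than sampled fresh, every client computes the same position for the same idea, and when an embedding call fails entirely, a procedural fallback position keeps the node visible in the swarm.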
Accomplishments that we're proud of
- The dual Gemini Live architecture — using two concurrent live sessions (one conversational, one for announcements) to create a seamless multimodal facilitation experience without any separate TTS service.
- A 3D idea swarm that actually means something — ideas aren't randomly placed; their positions reflect semantic similarity through Gemini embeddings, edges show AI-discovered connections, and size reflects voting weight.
- Quadratic voting in a brainstorming tool — bringing mechanism design from governance theory into collaborative ideation, ensuring fair consensus without majority tyranny.
- Intelligent artifact generation — the system doesn't just dump ideas into a list; it infers the right diagram type from the topic and synthesizes weighted ideas into structured Mermaid diagrams.
- Production-ready infrastructure — full Terraform IaC, Docker multi-stage builds, Redis-backed distributed state, Firestore durability, and CI/CD with keyless GCP auth. This isn't a demo; it scales.
- Background intelligence agents — the synthesizer, devil's advocate, and direction suggester run autonomously, keeping sessions alive and productive even when the room goes quiet.
What we learned
- Gemini Live's tool-calling capability is a game-changer for real-time applications. Having the AI anchor call `extractIdea` or `getSessionSnapshot` mid-conversation means it can interact with application state without breaking the conversational flow.
- State management in real-time collaborative apps is hard. The jump from single-user to multi-user, from in-memory to distributed, from ephemeral to durable — each transition introduced new consistency challenges. Our layered store abstraction (memory → Redis → Firestore) was the right call.
- Quadratic voting needs careful credit budgeting. Too many credits and it degenerates to linear voting; too few and participants can't express nuanced preferences. The balance matters.
- 3D visualization of ideas is more than eye candy. When participants see semantically similar ideas clustering together in space, it triggers new associations and connections that flat lists never surface.
- Grace periods and reconnection logic are essential for any real-time app over imperfect networks. Treating every socket disconnect as a permanent departure destroys user trust.
What's next for The Cognitive Swarm
- Persistent session history — let teams revisit past brainstorming sessions, compare artifacts across sessions, and build on previous work.
- Multi-room orchestration — run parallel breakout rooms that feed ideas into a combined swarm for cross-team synthesis.
- Richer artifact types — expand beyond Mermaid diagrams to support design briefs, project plans, decision matrices, and exportable documents.
- Mobile-first experience — optimize the 3D swarm and voice interaction for mobile devices so participants can join from anywhere.
- Custom AI personas — let admins configure the anchor's personality, expertise domain, and facilitation style to match different team cultures.
- Analytics dashboard — surface participation patterns, idea velocity, voting distributions, and session health metrics to help facilitators improve over time.
Built With
- geminisdk
- typescript