Inspiration

Meetings are where decisions happen — but the follow-through is broken. Someone has to manually take notes, create Jira tickets, draft recap emails, and dig up the Slack thread everyone half-remembered. We wanted to build an AI that actually participates in meetings, not just observes them. Something that answers questions, takes action, and keeps everyone accountable, right in the flow of conversation — no tab-switching required.

What it does

Quorum (Q) is an AI that joins your Google Meet, Zoom, or Teams call as a live participant. You summon it by saying "Hey Q" and it responds in real time with its own voice.

From inside the meeting, Q can:

  • Search your tools — look up Asana tasks, Slack threads, Notion docs, GitHub PRs, and Gmail, then read results aloud
  • Take action — create and update Asana tasks, send emails, create Google Calendar events, draft emails for review
  • Control a shared screen — open URLs, navigate websites, fill out forms, and render charts visible to everyone via a shared browser window
  • Remember decisions — logs key decisions made during the meeting and can recall them or cross-reference past meetings
  • Summarize and brief — summarize what's happened so far or ask Claude for deeper analysis

Q has two modes: On-Demand (only responds when called by name, safe for live demos) and Active (proactively surfaces relevant context as topics come up).

How we built it

The system has four main layers:

Voice I/O: Recall.ai joins the meeting as a bot. Deepgram streams real-time transcription with speaker diarization. ElevenLabs converts Q's responses back to speech, which Recall.ai injects as audio through the bot's microphone.

Orchestration: A debounce layer combines speech fragments, detects whether Q was addressed, and manages conversation state (IDLE / ENGAGED). The QOrchestrator dispatches to a tool-calling agent loop that runs up to 4 LLM iterations per request.

Integrations: Q routes queries to the right source (Gmail, Asana, Slack, Notion, GitHub) via an LLM-guided router. Each integration returns structured IntegrationResult objects. Gmail and Asana also support write operations (send, create, update).
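A hypothetical shape for this layer, assuming a dataclass result type and a cheap keyword pre-filter in front of the LLM-guided router. The field names and keyword hints are illustrative guesses, not the real code:

```python
from dataclasses import dataclass, field

@dataclass
class IntegrationResult:
    source: str                                 # e.g. "asana", "gmail"
    summary: str                                # short text Q can read aloud
    items: list = field(default_factory=list)   # structured rows (tasks, threads, ...)
    urls: list = field(default_factory=list)    # links posted to the meeting chat

# Obvious keyword -> source hints, checked before asking the LLM to route.
KEYWORD_HINTS = {
    "asana": ("task", "ticket", "assignee"),
    "gmail": ("email", "inbox", "mail"),
    "slack": ("thread", "channel", "message"),
    "github": ("pull request", "commit", "repo"),
    "notion": ("doc", "page", "notes"),
}

def route_query(query: str) -> str:
    """Pick an integration by keyword; defer to the LLM router otherwise."""
    q = query.lower()
    for source, hints in KEYWORD_HINTS.items():
        if any(h in q for h in hints):
            return source
    return "llm"  # no obvious match: let the LLM-guided router decide
```

Returning one structured type from every integration keeps the agent loop uniform: the spoken summary and the chat-only URLs travel together in a single object.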

Screen automation: A Dockerized service runs Playwright + Chromium in a virtual display (Xvfb), exposed via a Flask API. A vision loop — screenshot → GPT-4o → action → execute — lets Q navigate the web autonomously. Participants can watch via noVNC on port 6080.
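The vision loop might look roughly like this sketch. The `ask_vision_model` callable and the JSON action schema are assumptions, not the real service's API; the actual loop sends the screenshot to GPT-4o:

```python
import json

def vision_loop(page, goal: str, ask_vision_model, max_steps: int = 10) -> str:
    """Drive a Playwright page toward `goal` using a vision model.

    `ask_vision_model(goal, screenshot_bytes)` is assumed to return a JSON
    string like {"action": "click", "x": 120, "y": 340} or {"action": "done"}.
    """
    for _ in range(max_steps):
        shot = page.screenshot()                      # PNG bytes of the viewport
        decision = json.loads(ask_vision_model(goal, shot))
        action = decision["action"]
        if action == "click":
            page.mouse.click(decision["x"], decision["y"])
        elif action == "type":
            page.keyboard.type(decision["text"])
        elif action == "goto":
            page.goto(decision["url"])
        elif action == "done":
            return decision.get("result", "")
    return "step budget exhausted"
```

Because every step re-screenshots the page, the loop needs no pre-built CSS selectors — which is exactly what makes it usable on unfamiliar sites.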

The LLM stack tries a local Hermes model (via Ollama) first, then falls back to OpenRouter. Q's system prompt separates spoken output (clean, 2-sentence TTS) from chat output (markdown with URLs).
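The local-first fallback reduces to a "try each provider in order" helper. This is a sketch of the pattern only; the zero-arg provider callables stand in for the actual Ollama and OpenRouter HTTP requests:

```python
from typing import Callable, Sequence

def first_success(providers: Sequence[Callable[[], str]]) -> str:
    """Try each LLM provider in order; return the first answer that succeeds.

    Each provider is a zero-arg callable (e.g. a local Ollama request, then
    an OpenRouter request) that returns text or raises on failure.
    """
    last_err: Exception | None = None
    for call in providers:
        try:
            return call()
        except Exception as err:  # timeout, connection refused, bad status...
            last_err = err
    raise RuntimeError("all LLM providers failed") from last_err
```

Wrapping each backend this way means the agent loop never sees which provider answered — it only ever sees text or a single, final failure it can degrade from gracefully.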

Challenges we ran into

  • Real-time latency: Keeping the end-to-end voice → transcription → LLM → TTS → audio-injection pipeline responsive took careful debouncing and async pipeline design. We tuned separate wait times for "complete-sounding" segments vs. mid-sentence fragments.
  • Speech misrecognition: "Q" gets transcribed as "Hugh," "Cue," "Que," "Ku," and more. We built a phonetic alias list to catch all of them without false positives.
  • Context window management: Fitting the meeting transcript, tool definitions, conversation history, and a useful system prompt into one call required aggressive trimming and context windowing.
  • Thread-safe browser automation: Playwright's sync API isn't thread-safe, so we built a work queue to serialize Playwright operations while keeping the Flask API concurrent.
  • OAuth mid-hackathon: Getting Gmail OAuth2 refresh tokens working under time pressure with a fresh Google Cloud project was its own adventure.
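Phonetic aliasing like the second challenge describes can be sketched as a whole-word regex. The alias set comes from the write-up, but the "hey/ok/okay" prefixes and the whole-word matching strategy are assumptions beyond what it states:

```python
import re

# Transcription variants of "Q" observed in practice, per the write-up.
Q_ALIASES = ("q", "hugh", "cue", "que", "ku")

# Require a summoning prefix and match aliases as whole words, so e.g.
# "queue" or "kudos" never trigger a false positive.
WAKE = re.compile(
    r"\b(?:hey|ok|okay)[,\s]+(?:%s)\b" % "|".join(Q_ALIASES),
    re.IGNORECASE,
)

def is_addressed(utterance: str) -> bool:
    """True if the speaker summoned Q ('Hey Q', 'okay hugh', ...)."""
    return WAKE.search(utterance) is not None
```

Anchoring on the wake phrase rather than the bare alias is what keeps ordinary words containing "que" or "ku" from waking the bot mid-meeting.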

Accomplishments that we're proud of

  • A fully working end-to-end voice loop: Q hears you, thinks, and talks back — with zero manual typing required
  • Vision-guided browser automation that Q can operate live while meeting participants watch the shared screen
  • Seamless fallback chain: local LLM → cloud LLM → graceful degradation, so the bot never hard-crashes during a demo
  • Clean separation of spoken vs. chat output — Q says "I found 3 tasks, check chat for links" and sends the actual URLs separately
  • 18 registered tools working in a single agentic loop across 6 external services

What we learned

  • Real-time audio pipelines require a fundamentally different architecture than request-response APIs — buffering, debouncing, and state machines matter enormously
  • Vision-based browser automation (screenshot → LLM → action) is surprisingly capable for navigating unfamiliar UIs without pre-built selectors
  • Separating what the bot says aloud from what it sends in chat is a small design decision with huge UX impact
  • Local LLMs (Hermes/Ollama) are fast enough for simple tool dispatch but still lose to hosted models on complex reasoning — a hybrid approach works well in practice

What's next for Quorum

  • Vector memory: Replace keyword search with embeddings so Q can recall semantically relevant context across meetings
  • Proactive nudges: Q surfaces action items or relevant decisions automatically as topics are detected, without being called
  • Multi-bot coordination: Multiple Q instances in different calls that share a knowledge base and can relay information across meetings
  • Post-meeting reports: Auto-generated summaries, decision logs, and Asana task lists delivered after every call
  • Calendar-aware context: Q knows what the meeting is about before it starts and pre-loads relevant documents and tasks

Built With

Recall.ai · Deepgram · ElevenLabs · Flask · Playwright · Chromium · noVNC · Ollama · OpenRouter · GPT-4o · Asana · Slack · Notion · GitHub · Gmail · Google Calendar