Inspiration

We have all been there: 3-hour meetings where decisions are made, tasks are assigned, and context is lost the moment the call ends. We realized that while RAG is great for searching past documents, it fails at the "Now."

We wanted to build something that doesn't just record meetings, but participates in them. An AI that acts like a senior engineer sitting next to you, detecting contradictions ("Wait, didn't we say the API was native?"), capturing tasks in real-time, and ensuring no context is left behind.

We were inspired by the idea of an "Active Listener": an AI that doesn't wait for you to type a prompt, but interrupts you (gently) when it detects actionable insights.

What it does

Comet is a cross-platform (macOS/Windows) AI Co-Pilot that lives in your system tray.

  1. Real-Time Transcription: It captures system audio (Google Meet, Discord, Zoom) and microphone input simultaneously.
  2. Live Insights: As you speak, Comet detects Tasks, Missions, and Fact Checks. If you contradict a previous statement or a known fact, it alerts you immediately.
  3. Active Memory: It keeps a buffer of the conversation and uses Google Gemini 3 to analyze context dynamically.
  4. Smart Summaries: At the end of the meeting, it generates a structured JSON summary with Action Items, Sentiment Analysis, and Blockers, automatically synced to your dashboard.
  5. Stealth Mode: It runs locally with a minimal footprint, using global shortcuts (Cmd+Shift+Space) to toggle interaction.
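
The "Active Memory" buffer from item 3 can be pictured as a rolling window over recent utterances. The sketch below is only an illustration: the `TranscriptBuffer` type, its methods, and the line-count limit are hypothetical names and choices, not Comet's actual implementation.

```go
package main

import "strings"

// TranscriptBuffer is a hypothetical rolling window over the most recent
// utterances of a meeting. It is not safe for concurrent use on its own.
type TranscriptBuffer struct {
	maxLines int
	lines    []string
}

// NewTranscriptBuffer returns a buffer that keeps at most maxLines utterances.
// Values below 1 are raised to 1 so the buffer always holds something.
func NewTranscriptBuffer(maxLines int) *TranscriptBuffer {
	if maxLines < 1 {
		maxLines = 1
	}
	return &TranscriptBuffer{maxLines: maxLines}
}

// Add appends an utterance and drops the oldest ones beyond maxLines.
func (b *TranscriptBuffer) Add(line string) {
	b.lines = append(b.lines, line)
	if len(b.lines) > b.maxLines {
		b.lines = b.lines[len(b.lines)-b.maxLines:]
	}
}

// Context joins the retained utterances into one prompt-ready string.
func (b *TranscriptBuffer) Context() string {
	return strings.Join(b.lines, "\n")
}
```

A caller would `Add` each transcribed utterance and pass `Context()` to the model on every analysis pass, so the prompt size stays bounded no matter how long the meeting runs.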

How we built it

We chose a stack designed for performance and portability:

Core: Built in Go for high concurrency and low latency. We use Goroutines to handle audio streams and AI requests in parallel without blocking the UI.

Frontend/GUI: Wails (Go + Svelte). This allows us to have a native-feeling desktop app with the flexibility of web technologies.

AI Brain: Google Gemini 3. We feed the meeting buffer into its large context window and rely on its reasoning capabilities to analyze the conversation. Gemini 3's native support for structured outputs (JSON Mode) gives us clean, parseable data for our UI.
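
To show why structured output matters on the Go side, here is a minimal sketch of decoding a model response into typed values. The JSON shape, the `Insight` and `InsightBatch` types, and the three `kind` values are assumptions made for this example; the real schema is not described here, and the call to the model itself is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Insight is a hypothetical shape for one detected item.
type Insight struct {
	Kind string `json:"kind"` // assumed values: "task", "mission", "fact_check"
	Text string `json:"text"`
}

// InsightBatch is the assumed top-level object returned by the model.
type InsightBatch struct {
	Insights []Insight `json:"insights"`
}

// ParseInsights decodes a JSON response and rejects unknown insight kinds,
// so malformed model output never reaches the UI layer.
func ParseInsights(raw []byte) ([]Insight, error) {
	var batch InsightBatch
	if err := json.Unmarshal(raw, &batch); err != nil {
		return nil, fmt.Errorf("decode insights: %w", err)
	}
	for _, in := range batch.Insights {
		switch in.Kind {
		case "task", "mission", "fact_check":
			// Known kind, nothing to do.
		default:
			return nil, fmt.Errorf("unknown insight kind %q", in.Kind)
		}
	}
	return batch.Insights, nil
}
```

Even with JSON Mode, validating the decoded values is cheap insurance: the frontend can then assume every insight it receives has a kind it knows how to render.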

Speech-to-Text: An optimized implementation of Whisper (C++ binding) running locally, ensuring privacy and speed.

Architecture: We followed Hexagonal Architecture (Ports & Adapters) to decouple our business logic from the UI framework, making the codebase testable and modular.
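
A rough sketch of what Ports & Adapters can look like in Go follows. All names (`Transcriber`, `Notifier`, `MeetingService`, the fake adapters) and the naive "todo" keyword rule are hypothetical and exist only to illustrate the decoupling; they are not taken from the Comet codebase.

```go
package main

import "strings"

// Transcriber is a port: the core depends on this interface,
// not on a concrete speech-to-text engine.
type Transcriber interface {
	Transcribe(pcm []float32) (string, error)
}

// Notifier is a port for pushing messages to whatever UI is attached.
type Notifier interface {
	Notify(message string) error
}

// MeetingService holds the business logic and knows nothing about
// the GUI framework or the speech engine behind the ports.
type MeetingService struct {
	stt Transcriber
	ui  Notifier
}

// NewMeetingService wires the core to its adapters.
func NewMeetingService(stt Transcriber, ui Notifier) *MeetingService {
	return &MeetingService{stt: stt, ui: ui}
}

// HandleChunk transcribes one audio chunk and notifies the UI when the text
// looks like a task. The keyword check is a placeholder for real analysis.
func (s *MeetingService) HandleChunk(pcm []float32) error {
	text, err := s.stt.Transcribe(pcm)
	if err != nil {
		return err
	}
	if strings.Contains(strings.ToLower(text), "todo") {
		return s.ui.Notify("Task detected: " + text)
	}
	return nil
}

// FakeTranscriber is a test adapter that always returns a fixed text.
type FakeTranscriber struct{ Text string }

// Transcribe ignores the audio and returns the canned text.
func (f FakeTranscriber) Transcribe(pcm []float32) (string, error) {
	return f.Text, nil
}

// RecordingNotifier is a test adapter that stores messages in memory.
type RecordingNotifier struct{ Messages []string }

// Notify records the message instead of showing it.
func (r *RecordingNotifier) Notify(message string) error {
	r.Messages = append(r.Messages, message)
	return nil
}
```

Because the service only sees interfaces, a unit test can swap in the fake adapters and exercise the core without audio hardware, a model, or a window.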

Challenges we ran into

The biggest beast was Audio Drivers.

Windows vs. macOS: Capturing "System Audio" (what other people say in the call) is a nightmare. Windows offers loopback capture, but macOS requires a virtual driver such as BlackHole. Implementing a seamless experience that detects the OS and chooses the right strategy required deep dives into low-level audio libraries.
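
The OS detection step can be sketched with Go's `runtime.GOOS`. The `CaptureStrategy` type, its two values, and the function names below are hypothetical; the mapping simply mirrors the Windows and macOS split described above and leaves the actual audio capture out.

```go
package main

import (
	"fmt"
	"runtime"
)

// CaptureStrategy names a way of grabbing system audio (hypothetical type).
type CaptureStrategy string

const (
	// StrategyLoopback stands for native loopback capture on Windows.
	StrategyLoopback CaptureStrategy = "loopback"
	// StrategyVirtualDevice stands for reading from a virtual driver on macOS.
	StrategyVirtualDevice CaptureStrategy = "virtual-device"
)

// StrategyFor maps a GOOS value to a capture strategy.
// Taking the OS as a parameter keeps the function testable on any machine.
func StrategyFor(goos string) (CaptureStrategy, error) {
	switch goos {
	case "windows":
		return StrategyLoopback, nil
	case "darwin":
		return StrategyVirtualDevice, nil
	default:
		return "", fmt.Errorf("system audio capture not supported on %s", goos)
	}
}

// DefaultStrategy picks the strategy for the OS the binary is running on.
func DefaultStrategy() (CaptureStrategy, error) {
	return StrategyFor(runtime.GOOS)
}
```

At startup the app would call `DefaultStrategy()` once and hand the result to whichever audio adapter implements it, surfacing the error to the user on unsupported platforms.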

Concurrency Hell: Managing two audio streams (Mic + System) writing to a shared buffer while simultaneously sending chunks to Gemini generated some nasty race conditions. We had to implement strict Mutex locking and channel synchronization to prevent data corruption.
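
The pattern described here, several producers writing to one buffer guarded by a mutex, might look roughly like the sketch below. `Chunk`, `SharedBuffer`, and `Collect` are illustrative names and the chunks carry text instead of audio samples; this is an assumption-laden outline, not the code that shipped.

```go
package main

import "sync"

// Chunk is a hypothetical unit of captured data tagged with its source.
type Chunk struct {
	Source string // for example "mic" or "system"
	Text   string
}

// SharedBuffer collects chunks from several goroutines behind a mutex.
type SharedBuffer struct {
	mu     sync.Mutex
	chunks []Chunk
}

// Append adds one chunk while holding the lock.
func (b *SharedBuffer) Append(c Chunk) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.chunks = append(b.chunks, c)
}

// Drain returns everything buffered so far and resets the buffer,
// so a consumer can ship the batch without blocking the producers.
func (b *SharedBuffer) Drain() []Chunk {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := b.chunks
	b.chunks = nil
	return out
}

// Collect starts one goroutine per source channel, waits until every
// channel is closed, and returns all chunks that were buffered.
func Collect(sources ...<-chan Chunk) []Chunk {
	var buf SharedBuffer
	var wg sync.WaitGroup
	for _, src := range sources {
		wg.Add(1)
		go func(src <-chan Chunk) {
			defer wg.Done()
			for c := range src {
				buf.Append(c)
			}
		}(src)
	}
	wg.Wait()
	return buf.Drain()
}
```

Swapping the slice out inside `Drain` keeps the critical section short: the lock is held only for the pointer swap, not for the network call that sends the batch onward. Running such code under `go test -race` is a practical way to confirm that no unsynchronized access remains.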

Accomplishments that we're proud of

The "Fact Check" Feature: Watching the AI correct us in real-time during testing ("Conflict Detected") was a magic moment. It felt like the AI actually understood the meeting.

Whisper Integration: Getting whisper.cpp to run efficiently within a Wails application without freezing the main thread was a tough engineering challenge, but we pulled it off.

Cross-Platform Build: We managed to build a single codebase that compiles to native binaries for both macOS and Windows, with native notifications on both.

What we learned

Gemini 3's Power: We learned that Gemini 3 is surprisingly good at "soft skills": it detected sarcasm and hesitation in text transcripts better than the previous models we tested.

Go + GUI: We discovered that Go is an incredible language for desktop apps. The Wails ecosystem is mature enough for production apps, provided you handle the Go-to-JS bridge carefully.

Latency Matters: In a real-time app, 500ms feels like an eternity. Optimization isn't optional; it's the core feature.

What's next for Comet

Comet Gateway: We are currently architecting the "Comet Gateway," a centralized backend service written in Go. This will move API key management from the client-side to a secure proxy, enabling enterprise-grade security, rate limiting, and user authentication.

Transition to SaaS: We plan to evolve Comet from a standalone desktop utility into a full B2B SaaS platform. This includes team management features, where a "Manager" can view aggregated insights from their team's meetings (with privacy controls).

Deep Integrations: Beyond just detecting tasks, we want to execute them. The next version will allow users to connect their Jira, Trello, or GitHub accounts, turning a spoken "Mission" directly into a ticket on the board without leaving the meeting.

Context Caching at Scale: We plan to leverage Gemini's Context Caching to load entire project documentation (PDFs, codebases) into the Gateway, so Comet knows exactly what "Project XYZ" is before the meeting starts, reducing latency and costs.

Built With

  • go
  • svelte
  • wails