Ducksy

Inspiration

AI has evolved from a chatbot into an agent that works for us. In the tech world, this has boosted productivity to levels we've never seen before. However, this power is currently gated behind technical complexity, accessible mainly to those who know how to use an IDE or navigate a terminal.

What if we lowered that barrier? Our project aims to bring the agentic experience to everyone's everyday workflow. Instead of being just a coding assistant, our agent acts as a personal butler, turning messy meetings into actionable milestones and passive assistance into bold execution.

What it does

Ducksy is a native desktop application (macOS/Windows) that acts as a "second brain" for your computer.

Key Features:

  • Ghost Mode (Intelligent Capture): Ducksy lives in your system tray and can be summoned with a global hotkey (Ctrl+Alt+M). It provides a minimal, non-intrusive overlay that floats over your apps, allowing you to trigger actions without leaving your current context.
  • Magic Lens (Visual Context): Ducksy can "see" your screen. With Magic Lens, you select any region of your desktop, and Ducksy analyzes the capture using Gemini's multimodal capabilities, whether you are debugging code, translating a comic, or extracting data from a chart.
  • Drag & Drop Analysis: Drop a PDF, image, or audio file onto the floating Ghost bubble, and Ducksy processes it with Gemini to give you summaries or answers.
  • Deep Memory (RAG): Ducksy remembers. It uses a local vector database to store session history. You can "chat" with past recordings (e.g., "What did we agree on regarding the budget last week?") to retrieve exact details.
  • Action-Oriented Integrations:
    • Google Calendar: "Schedule a sync with the design team for next Tuesday at 2 PM." Ducksy extracts the intent from your voice command and creates the event directly.
  • Live Dashboard: A beautiful, responsive dashboard to review past sessions, manage generated action items, and organize your captured knowledge.
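
The Deep Memory lookup described above can be pictured as a nearest-neighbor search over stored embeddings. The following is a minimal sketch under stated assumptions, not Ducksy's actual implementation: the `MemoryChunk` shape, the `topK` helper, and the toy three-dimensional vectors are all hypothetical. In a real setup the vectors would come from an embedding model and the search would be delegated to the vector database.

```typescript
// Hypothetical shape of one stored transcript chunk (an assumption, not Ducksy's schema).
interface MemoryChunk {
  sessionId: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors; returns 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return up to k chunks most similar to the query embedding, best match first.
function topK(query: number[], chunks: MemoryChunk[], k: number): MemoryChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Toy embeddings, made up for this example.
const memory: MemoryChunk[] = [
  { sessionId: "mon-standup", text: "Ship the overlay fix by Friday.", embedding: [0.9, 0.1, 0.0] },
  { sessionId: "budget-review", text: "Budget stays at the agreed cap.", embedding: [0.1, 0.9, 0.2] },
  { sessionId: "design-sync", text: "Use the new icon set.", embedding: [0.0, 0.2, 0.9] },
];

// Stand-in for the embedding of a budget-related question.
const budgetQuery = [0.2, 0.8, 0.1];
const hits = topK(budgetQuery, memory, 1);
console.log(hits[0].sessionId);
```

The retrieved chunk text would then be passed to the model as context alongside the user's question.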

How we built it

Ducksy is a modern Electron application built with performance and privacy in mind.

Tech stack by component:

  • Frontend: Built with Next.js and Tailwind CSS v4 for a buttery smooth, responsive UI that looks great in Dark Mode.
  • Backend/AI: Powered by Google Gemini 3 (and 2.0 Flash) for blazing-fast multimodal processing. We utilize Gemini's streaming capabilities for real-time "Thinking Process" feedback.
  • Data Layer: A local-first architecture using embedded databases (PouchDB/RxDB) ensures your data stays on your machine until you choose to share it.
  • Integrations: OAuth flows implemented to securely connect with Google Workspace (Calendar) and Notion.
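
One way to picture the real-time "Thinking Process" feedback is as a reducer that folds streamed events into UI state. This is a minimal sketch, assuming a hypothetical `StreamEvent` shape and `ReplyState` type; neither is taken from the Gemini SDK or from Ducksy's code, and a real stream would arrive over the network rather than from an array.

```typescript
// Hypothetical event and state shapes (assumptions for illustration only).
type StreamEvent =
  | { kind: "thought"; text: string }
  | { kind: "answer"; text: string }
  | { kind: "done" };

interface ReplyState {
  status: "thinking" | "answering" | "complete";
  thoughts: string[];
  answer: string;
}

const initialState: ReplyState = { status: "thinking", thoughts: [], answer: "" };

// Pure reducer: builds a new state for each event and never mutates the previous one.
function applyEvent(state: ReplyState, event: StreamEvent): ReplyState {
  switch (event.kind) {
    case "thought":
      return { status: "thinking", thoughts: state.thoughts.concat(event.text), answer: state.answer };
    case "answer":
      return { status: "answering", thoughts: state.thoughts, answer: state.answer + event.text };
    case "done":
      return { status: "complete", thoughts: state.thoughts, answer: state.answer };
  }
}

// Stand-in for events arriving from a streaming response.
const events: StreamEvent[] = [
  { kind: "thought", text: "Reading the screenshot" },
  { kind: "answer", text: "The chart shows " },
  { kind: "thought", text: "Extracting values" },
  { kind: "answer", text: "three series." },
  { kind: "done" },
];

const finalState = events.reduce(applyEvent, initialState);
console.log(finalState.status, finalState.answer);
```

Keeping the reducer pure means the same function could run in either the main process or the renderer, with only the resulting state crossing the IPC boundary.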

Challenges we ran into

  • Handling Real-time Streams: Managing WebSocket connections and IPC (Inter-Process Communication) events between the Electron main process and the React renderer was complex, especially for streaming audio and "thinking" status updates.
  • Window Management: Creating a seamless "Ghost Mode" overlay that stays on top but passes through mouse clicks required deep diving into Electron's native window APIs.
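
The click-through behavior can be sketched as a small decision function: the overlay accepts mouse input only while the cursor is over the bubble and ignores it everywhere else. This is an illustrative sketch, not Ducksy's code. The `ClickThroughWindow` interface, `Rect`, `Point`, and `updateClickThrough` are hypothetical names; the interface mirrors Electron's `BrowserWindow.setIgnoreMouseEvents(ignore, options)` method, so a real window is expected to fit it, while a fake window stands in here so the example runs without Electron.

```typescript
// Minimal structural type for the single window method this sketch uses.
// It mirrors Electron's BrowserWindow.setIgnoreMouseEvents(ignore, options).
interface ClickThroughWindow {
  setIgnoreMouseEvents(ignore: boolean, options?: { forward: boolean }): void;
}

interface Rect { x: number; y: number; width: number; height: number; }
interface Point { x: number; y: number; }

// True when the point lies inside the rectangle (right and bottom edges excluded).
function contains(rect: Rect, p: Point): boolean {
  return p.x >= rect.x && p.x < rect.x + rect.width &&
         p.y >= rect.y && p.y < rect.y + rect.height;
}

// Make the overlay interactive only while the cursor is over the bubble.
// Elsewhere clicks pass through; forward: true asks for mouse-move events to
// keep arriving so the app can notice when the cursor re-enters the bubble.
function updateClickThrough(win: ClickThroughWindow, bubble: Rect, cursor: Point): boolean {
  const overBubble = contains(bubble, cursor);
  if (overBubble) {
    win.setIgnoreMouseEvents(false);
  } else {
    win.setIgnoreMouseEvents(true, { forward: true });
  }
  return overBubble;
}

// Fake window that records calls, used in place of a real BrowserWindow.
const calls: Array<{ ignore: boolean; forward: boolean }> = [];
const fakeWindow: ClickThroughWindow = {
  setIgnoreMouseEvents(ignore, options) {
    calls.push({ ignore: ignore, forward: options ? options.forward : false });
  },
};

const bubble: Rect = { x: 100, y: 100, width: 64, height: 64 };
const insideResult = updateClickThrough(fakeWindow, bubble, { x: 120, y: 130 });
const outsideResult = updateClickThrough(fakeWindow, bubble, { x: 10, y: 10 });
console.log(insideResult, outsideResult, calls.length);
```

In an Electron app, the cursor position would typically come from mouse-move events in the renderer and be relayed to the main process over IPC.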

Accomplishments We're Proud Of

  1. "Native" Feel: We spent a lot of time on details—animations, sound effects, and glassmorphism—to make Ducksy feel like a premium OS feature, not just a web wrapper.
  2. Transparency: We added a "Thinking Indicator" so users aren't left guessing. You can see when Ducksy is listening, processing, or acting.
  3. Cross-Platform: We successfully built and tested Ducksy on both macOS and Windows.

What we learned

  • Multimodal is the Key: Text-only prompts are limiting. Being able to "show" the AI a problem or "feed" it a meeting audio file creates a workflow velocity that text-only interfaces can't match.
  • Gemini 3.0 is Fast: We initially thought we needed complex streaming, but Gemini 3.0 is so fast that a simple Request/Response model feels almost real-time, simplifying our architecture significantly.
  • Context is King: The value of an AI agent isn't just in its IQ, but in its access to your current context (screen and sound).

What's next for Ducksy

  • [ ] Full Desktop Control: Giving Ducksy the ability to click and type for you (Agentic Mode).
  • [ ] More Integrations: Deep integration with Slack, Linear, and VS Code.
  • [ ] Local LLM Support: Adding support for on-device models for fully offline privacy.
