AI Desktop Companion — Glitch

Download here
Check out our landing page!
GitHub Repo
Download PPT

A living, breathing AI that shares your desktop.
Not a tab. Not a chatbot. A companion!!


Inspiration


We grew up dreaming of companions like JARVIS — agents that don’t just listen, but act. Somewhere along the way, assistants got stuck in browser tabs.

We wanted to break the Fourth Wall of the Operating System.

Glitch was born from a simple idea:
What if an AI actually shared your workspace?

An AI that:

  • Lives on your desktop
  • Sees what you see
  • Acts when needed
  • Feels alive

We fused the nostalgia of desktop pets (Clippy, Tamagotchi) with the power of Google Gemini Multimodal Live, creating an AI that isn’t just helpful — it’s present.


What it does

Glitch is a fully multimodal, autonomous desktop agent.

✨ Core Capabilities

  • 🖥️ Lives on Your Screen
    A transparent, always-on-top overlay that blends into your OS.
    Drag Glitch anywhere, watch him chase his cat, and see him react to your work. A new, sleek Floating Action Button (FAB) UI keeps his controls close at hand.

  • 👁️ Multimodal Vision
    Glitch can see your screen in real time:

    • “What is this error?”
    • “Review this UI”
    • “Does this outfit look good?”

  • 🧠 Autonomous Agent Mode
    Powered by Nut.js, Glitch can:

    • Control mouse & keyboard
    • Open apps like Notepad effortlessly
    • Execute multi-step workflows

  • 🎙️ Real-time Voice & Personality
    No third-party voice keys needed: speech is powered entirely by Gemini Live.
    • Dynamic Voices: Switch personas instantly (Glitch, Puck, Charon, Kore, Fenrir).
    • Humor Mode: Glitch is witty, slightly chaotic, breaks the fourth wall, and makes tech jokes.

  • 🧑‍💻 Developer Accelerator
    Just say: “Create a Next.js crypto dashboard”
    Glitch will:

    • Scaffold the project
    • Generate boilerplate
    • Open VS Code
    • Bring your idea to life

  • 📝 Smart Notes
    Glitch has intelligent note-taking capabilities: “Write the best Bitcoin data APIs for developers in Notepad”
    Glitch will generate the text internally, open Notepad, and paste it in. No Google searches are required unless you ask for them.

  • 🎨 Image Generation (Bonus Power)
    Glitch can also generate images on demand — perfect for design inspiration, concepts, or quick visuals — seamlessly integrated into your workflow.
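
One way to picture the Developer Accelerator flow above is as a short, ordered plan of steps that the agent works through. The sketch below is illustrative only: the `WorkflowStep` type, the `slugify` and `planNextJsProject` helpers, and the project name are hypothetical, and Glitch's real planner may look quite different. The two commands it emits are the standard `create-next-app` scaffolder and the VS Code `code` launcher.

```typescript
// Hypothetical step types for a multi-step desktop workflow.
type WorkflowStep =
  | { kind: "runCommand"; command: string; args: string[] }
  | { kind: "openApp"; app: string; target: string };

// Turn a spoken project name into a folder-safe slug.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build a plan for "Create a Next.js <name>" (assumed shape, not Glitch's actual planner).
function planNextJsProject(name: string): WorkflowStep[] {
  const folder = slugify(name);
  return [
    // Scaffold the project and generate boilerplate.
    { kind: "runCommand", command: "npx", args: ["create-next-app@latest", folder] },
    // Open the new folder in VS Code.
    { kind: "openApp", app: "code", target: folder },
  ];
}

const examplePlan = planNextJsProject("Crypto Dashboard");
```

Keeping the plan as plain data means each step can be shown to the user, or checked against the screen, before anything is executed.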

How we built it

Glitch is powered by a Hybrid Agent Architecture:

  • 🧩 Core Framework — Electron
    Enables a frameless, transparent, click-through desktop overlay.

  • 🧠 The Brain & Voice — Google Gemini Multimodal Live API
    Handles:

    • Natural conversation & reasoning
    • Vision analysis (screen context)
    • Native audio streaming (low-latency voice, no third-party TTS required)

  • 🤖 The Body — Nut.js
    Grants OS-level automation:

    • Mouse control & keyboard bridge
    • Clipboard integration (for instant typing)
    • Application workflows

  • 🎮 The Soul (UI) — PixiJS & React/Tailwind

    • PixiJS: Renders pixel-art characters and physics.
    • React: Powers the new Settings Wizard and FAB controls.
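
As a rough sketch of the frameless, transparent overlay described above, the window could be configured with options like the ones below. The option names (`transparent`, `frame`, `alwaysOnTop`, `skipTaskbar`, `hasShadow`, `resizable`) are standard Electron `BrowserWindow` options; the `OverlayWindowOptions` interface, the `overlayWindowOptions` helper, and the chosen values are assumptions, not Glitch's actual configuration.

```typescript
// Subset of Electron's BrowserWindow constructor options relevant to an overlay.
interface OverlayWindowOptions {
  width: number;
  height: number;
  transparent: boolean;
  frame: boolean;
  alwaysOnTop: boolean;
  skipTaskbar: boolean;
  hasShadow: boolean;
  resizable: boolean;
}

// Hypothetical helper: build options for a full-screen, see-through overlay.
function overlayWindowOptions(screenWidth: number, screenHeight: number): OverlayWindowOptions {
  return {
    width: screenWidth,
    height: screenHeight,
    transparent: true, // see-through background so the desktop shows behind the character
    frame: false,      // no title bar or window borders
    alwaysOnTop: true, // stay above normal application windows
    skipTaskbar: true, // do not show up as a regular app in the taskbar
    hasShadow: false,  // avoid a visible shadow around the invisible window
    resizable: false,
  };
}

const exampleOverlayOptions = overlayWindowOptions(1920, 1080);
```

In an Electron main process, an object like this would be passed to `new BrowserWindow(...)`. Click-through behavior is handled separately with `setIgnoreMouseEvents`.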

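The "Clipboard integration (for instant typing)" item can be pictured as a small decision: short text is typed key by key, while long or multi-line text goes through the clipboard and a single paste shortcut. This is a minimal sketch under stated assumptions. The `TextOutput` interface, the `writeText` function, and the 40-character cutoff are all hypothetical; in a real build the three methods would be backed by Nut.js's keyboard and clipboard modules, whose calls are asynchronous (they are synchronous here for brevity).

```typescript
// Hypothetical abstraction over the automation layer.
interface TextOutput {
  typeText(text: string): void;     // simulate keystrokes one by one
  setClipboard(text: string): void; // place text on the system clipboard
  pressPaste(): void;               // send the paste shortcut (for example Ctrl+V)
}

// Assumed cutoff; the value Glitch actually uses is not documented.
const PASTE_THRESHOLD = 40;

// Type short single-line text, paste everything else in one go.
function writeText(out: TextOutput, text: string, threshold: number = PASTE_THRESHOLD): "typed" | "pasted" {
  const isShort = text.length <= threshold && text.indexOf("\n") === -1;
  if (isShort) {
    out.typeText(text);
    return "typed";
  }
  out.setClipboard(text);
  out.pressPaste();
  return "pasted";
}

// In-memory fake that records what would have been sent to the OS.
class RecordingOutput implements TextOutput {
  log: string[] = [];
  typeText(text: string): void { this.log.push("type:" + text); }
  setClipboard(text: string): void { this.log.push("clipboard:" + text); }
  pressPaste(): void { this.log.push("paste"); }
}
```

Pasting replaces hundreds of simulated keystrokes with one shortcut, which is why long notes appear in Notepad almost instantly.
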
Challenges we ran into

  • The Click-Through Paradox
    Characters needed to be clickable, while empty space had to let clicks pass through to the apps underneath.
    We toggle Electron’s setIgnoreMouseEvents dynamically, within milliseconds, so that only the characters capture clicks.

  • Smart Context Switching
    We had to teach the AI when to “type” and when to “speak”. We also built a specialized bridge that lets Glitch paste long texts into Notepad instantly instead of painfully typing them character by character.

  • Safety in Automation
    OS control is powerful — and dangerous.
    We built:

    • A global STOP protocol
    • A visual-context check that runs before Glitch acts
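
The click-through toggle can be sketched as a hit test plus a call to `setIgnoreMouseEvents`. The method and its `{ forward: true }` option are part of Electron's `BrowserWindow` API; everything else here (`Rect`, `IgnorableWindow`, `isOverSprite`, `ClickThroughController`, and the use of rectangular hit boxes) is a hypothetical illustration rather than Glitch's actual code.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Stand-in for the single Electron BrowserWindow method used in this sketch.
interface IgnorableWindow {
  setIgnoreMouseEvents(ignore: boolean, options?: { forward: boolean }): void;
}

// True when the pointer is inside any character's bounding box.
function isOverSprite(px: number, py: number, sprites: Rect[]): boolean {
  return sprites.some(
    (r) => px >= r.x && px < r.x + r.width && py >= r.y && py < r.y + r.height
  );
}

class ClickThroughController {
  private win: IgnorableWindow;
  private ignoring: boolean | null = null; // last state sent to the window

  constructor(win: IgnorableWindow) {
    this.win = win;
  }

  // Call on every pointer move; only touches the window when the state flips.
  update(px: number, py: number, sprites: Rect[]): boolean {
    const ignore = !isOverSprite(px, py, sprites);
    if (ignore !== this.ignoring) {
      // forward: true keeps mouse-move events flowing while clicks pass through.
      this.win.setIgnoreMouseEvents(ignore, { forward: true });
      this.ignoring = ignore;
    }
    return ignore;
  }
}
```

Caching the last state avoids calling into the window on every mouse move, which keeps the toggle cheap even at high pointer rates.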

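A global STOP can be approximated by running automation one step at a time and checking a stop flag before each step. The sketch below is an assumption about the general shape, not Glitch's implementation: `Action` and `StoppableRunner` are hypothetical names, and the actions are synchronous for brevity, whereas real mouse and keyboard calls would be asynchronous.

```typescript
// A single automation step, for example "move mouse" or "press key".
type Action = () => void;

class StoppableRunner {
  private queue: Action[];
  private stopped = false;
  executed = 0; // number of actions that actually ran

  constructor(actions: Action[]) {
    this.queue = actions.slice(); // copy so the caller's array is not mutated
  }

  // Global STOP: drop everything still pending and refuse further work.
  stop(): void {
    this.stopped = true;
    this.queue = [];
  }

  // Run the next pending action. Returns false once stopped or finished.
  step(): boolean {
    if (this.stopped) return false;
    const next = this.queue.shift();
    if (next === undefined) return false;
    next();
    this.executed += 1;
    return true;
  }
}
```

In an Electron app, `stop()` could for example be wired to a system-wide hotkey registered with `globalShortcut.register`, so the user can halt the agent even when another window has focus.
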
Accomplishments we’re proud of

  • Seamless OS-level overlay that feels native.
  • Removing complex dependencies (ElevenLabs) so that Glitch is purely Gemini-powered.
  • A character with personality, not just intelligence (sarcastic, funny, or professional).
  • A companion that feels present, playful, and productive.


What we learned

  • Multimodal is the Future
    Giving AI sight changes everything.

  • Personality Drives Adoption
    Users engage more when AI feels alive, makes jokes, and has a unique voice.

  • Simplicity Wins
    Consolidating voice and reasoning into one model drastically reduced latency.


What’s next for Glitch

  • 🧠 Long-Term Memory
    Vector databases (Pinecone) to remember projects across weeks.

  • 🧩 Skill Ecosystem
    Developer-written plugins: Spotify control, Git commits, CI helpers.

  • 👥 Multi-Agent Teams
    Multiple on-screen characters collaborating:

    • Designer
    • Developer
    • Researcher

Glitch isn’t just an assistant.
He’s a presence.
