💡 Inspiration

It was the night before my exam, and I was deep in debugging hell. My workflow looked like this:

  1. Screenshot VS Code terminal
  2. Upload to ChatGPT
  3. Get advice to try something I'd already done
  4. Run out of tokens
  5. Repeat until sunrise

I realized the problem wasn't the AI; it was the process. What if the AI could just see my screen instead of me explaining it?

After my exams, I spent 4 months grinding on this idea. Testing combinations. Breaking things. Rebuilding. The result? Octate: an AI that saves me time by actually seeing what I'm working on.

No more screenshots. No more context-pasting. Just press a hotkey and ask.


🎯 What It Does

Octate is an AI assistant powered by Gemini 2.5 Flash with custom prompting tuned for:

  • ๐Ÿ› Debugging โ€“ sees your errors, suggests fixes
  • ๐Ÿ’ป Coding help โ€“ explains code visually
  • ๐ŸŽ“ Learning โ€“ understands your workflow context
  • ๐ŸŽญ Roasting โ€“ optional brutally honest feedback (because bland is boring)

It lives as an invisible overlay on your screen, captures context when you ask questions, and disappears when you need privacy.

Core features:

  • Screenshot-based AI assistance with full visual context
  • Ghost mode for instant invisibility during screen shares
  • Hotkey-driven workflow (zero mouse dependency)
  • Free forever (bring your own Gemini API key)
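The hotkey-driven flow can be sketched as a small Electron main-process helper (a hypothetical sketch; Octate's actual accelerators and IPC channel names may differ, and `globalShortcut` is injected so the logic can be exercised with a stub):

```javascript
// Sketch: register a global "ask" hotkey, falling back through candidate
// accelerators because register() returns false when another app
// already owns the shortcut.
function registerAskHotkey(globalShortcut, win, candidates) {
  for (const accel of candidates) {
    const ok = globalShortcut.register(accel, () => {
      win.show();
      win.webContents.send('octate:ask'); // hypothetical IPC channel name
    });
    if (ok) return accel; // report which accelerator actually won
  }
  return null; // every candidate was taken by another application
}
```

Returning the winning accelerator (or `null`) lets the app surface shortcut conflicts to the user instead of failing silently.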

๐Ÿ› ๏ธ How We Built It

Tech Stack:

  • Electron.js – Cross-platform desktop app framework
  • React.js – UI components and state management
  • Tailwind CSS – Utility-first styling
  • Supabase – Backend, database, and real-time features
  • OAuth – GitHub and Google authentication for seamless login
  • Gemini 2.5 Flash API – Vision-capable AI model

Architecture:

  • Persistent overlay window with always-on-top functionality
  • IPC (Inter-Process Communication) handlers for global keyboard shortcuts
  • Screenshot capture pipeline with automatic context injection
  • Click-through regions for non-intrusive user experience
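The capture pipeline above might look roughly like this (a sketch under assumptions: the function and field names are made up, and the capture source mirrors Electron's `desktopCapturer` so it can be tested with a stub):

```javascript
// Sketch: grab a full-screen image and attach it to the user's question
// before the request goes to the vision model.
async function buildPrompt(capturer, question, size = { width: 1920, height: 1080 }) {
  const sources = await capturer.getSources({ types: ['screen'], thumbnailSize: size });
  if (sources.length === 0) throw new Error('no screen source available');
  return {
    question,
    // base64 PNG of the primary display, ready for a vision-model request
    image: sources[0].thumbnail.toPNG().toString('base64'),
  };
}
```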

🚧 Challenges We Ran Into

1. Overlay Engineering

Building a window that stays on top of everything while remaining responsive was harder than expected:

  • Balancing always-on-top behavior with user control
  • Implementing click-through buttons (some interactive, some passthrough)
  • Preventing the overlay from interfering with other apps
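One common way to get this mix of interactive and pass-through regions (a sketch, not Octate's actual code): make the window ignore mouse events by default, but forward move events so the flag can be flipped back whenever the cursor reaches an interactive region.

```javascript
// Hit-test helper: is the cursor inside any interactive rectangle?
function isOverInteractive(point, regions) {
  return regions.some(r =>
    point.x >= r.x && point.x < r.x + r.width &&
    point.y >= r.y && point.y < r.y + r.height);
}

// Toggle Electron click-through based on the hit test. With
// `forward: true`, mouse-move events still reach the page while
// clicks pass through, so the flag can be re-enabled over a button.
function updateClickThrough(win, point, regions) {
  const interactive = isOverInteractive(point, regions);
  win.setIgnoreMouseEvents(!interactive, { forward: true });
  return interactive;
}
```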

2. Keyboard Shortcut Registration

The IPC handler system for global shortcuts required extensive testing:

  • Handling edge cases (app minimized, focus stolen, multiple monitors)
  • Preventing shortcut conflicts with other applications
  • Ensuring shortcuts work consistently across Windows/Mac/Linux

3. Screenshot Timing

Capturing the screen before the overlay appears (to avoid capturing itself) required precise timing coordination between the main and renderer processes.
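That timing dance can be approximated like this (a sketch with assumed names; the delay the compositor needs before the window is really gone from screen varies by OS):

```javascript
const sleep = ms => new Promise(res => setTimeout(res, ms));

// Hide the overlay, give the compositor a moment to actually remove it
// from the screen, capture, then restore the overlay. Without the
// pause, the capture can still contain the overlay itself.
async function captureWithoutOverlay(win, capture, settleMs = 80) {
  win.hide();
  await sleep(settleMs);
  try {
    return await capture(); // e.g. a desktopCapturer-based grab
  } finally {
    win.show(); // restore the overlay even if the capture fails
  }
}
```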


๐Ÿ† Accomplishments We're Proud Of

✅ Octate exists and works – 80% of the original vision is now reality
✅ The roast feature – adds personality and makes debugging less painful
✅ Ghost mode execution – instant invisibility actually works flawlessly
✅ Free and accessible – no paywalls, no subscriptions, no gatekeeping
✅ Didn't give up – navigated complex technical challenges without compromising the core vision

The biggest win? Watching Octate solve in 10 seconds what used to take me 10 minutes of screenshot-upload-explain cycles.


📚 What We Learned

Technical:

  • Electron's IPC communication patterns for complex desktop apps
  • Balancing overlay UX with system-level window management
  • Integrating vision-capable AI models with real-time workflows

Personal:

  • Persistence matters more than perfection
  • Complex stacks and edge cases don't define you; your solutions do
  • Sometimes the best features come from your own frustrations

Philosophy: Your project reflects who you are. When the code gets complicated and the bugs feel endless, the choice to keep building anyway is what separates ideas from shipped products.


🚀 What's Next for Octate

๐ŸŽ™๏ธ Voice Recognition & Context Memory

The next major feature: voice-driven assistance with conversational context.

How it works:

  • Transcribe conversations happening on-screen and off-screen
  • Understand spoken debugging sessions during pair programming
  • Store conversational context in the database
  • Learn from user patterns over time for increasingly personalized assistance

Use cases:

  • "Hey Octate, what did we just discuss about that API endpoint?"
  • Transcribe whiteboard sessions during team calls
  • Build a searchable knowledge base from your own debugging conversations

🔮 Other Planned Features

  • 🔌 Plugin system for custom workflows
  • 💾 Offline mode with local AI models
  • 🌐 Multi-language support for international codebases
  • 📊 Context memory across sessions
