Inspiration

I type constantly: essays, code comments, messages, forms. But my thoughts often come faster than my fingers. Voice dictation tools exist, but most feel clunky: you have to switch apps, fix "um"s and false starts, and manually copy-paste the result.

I wanted something that feels invisible: hold a key, speak, and polished text appears wherever you're already typing. That’s how JustSayIt was born.

What it does

Just Say It is a desktop voice typing utility for Windows. You hold Right Ctrl (or click to record), speak naturally, and the app:

  1. Transcribes your speech in real time using Deepgram
  2. Cleans the transcript with Google Gemini, removing filler words, fixing grammar, and keeping your tone
  3. Delivers the result by auto-pasting into your active app, or showing a minimal floating card if paste isn’t possible

A small cursor overlay shows what’s happening: Listening → Cleaning → Pasting → Done.

How I built it

The app is an Electron + React + TypeScript desktop application in a Turborepo monorepo.

Frontend (renderer)

  • React hooks manage the full voice session lifecycle
  • Real-time audio streams to Deepgram over WebSocket (nova-3)
  • Minimal, utility-style UI inspired by tools like Raycast and Wispr Flow
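
As a rough illustration of the streaming step, the sketch below builds a Deepgram live-transcription URL and separates interim from finalized segments as messages arrive. The helper names (`buildListenUrl`, `TranscriptBuffer`) and the extra query parameters are assumptions made for this example, not the project's actual code; only the `nova-3` model comes from the write-up.

```typescript
// Hypothetical helpers; names are illustrative, not taken from the project.

// Subset of the fields Deepgram sends on live "Results" messages.
interface DeepgramResult {
  type: string;
  is_final?: boolean;
  channel?: { alternatives: { transcript: string }[] };
}

// Build the live-transcription URL. Parameters other than the model are
// assumptions about how a session might be configured.
function buildListenUrl(model = "nova-3"): string {
  const params = new URLSearchParams({
    model,
    interim_results: "true",
    smart_format: "true",
  });
  return `wss://api.deepgram.com/v1/listen?${params.toString()}`;
}

// Collects finalized segments and keeps only the latest interim segment.
class TranscriptBuffer {
  private finals: string[] = [];
  private interim = "";

  // Feed one raw WebSocket message (a JSON string) into the buffer.
  push(raw: string): void {
    const msg = JSON.parse(raw) as DeepgramResult;
    if (msg.type !== "Results") return; // ignore metadata and other events
    const text = msg.channel?.alternatives[0]?.transcript ?? "";
    if (msg.is_final) {
      if (text) this.finals.push(text);
      this.interim = "";
    } else {
      this.interim = text;
    }
  }

  // Text that is safe to hand to the cleanup step.
  finalText(): string {
    return this.finals.join(" ");
  }

  // Text to display while the user is still speaking.
  liveText(): string {
    return [...this.finals, this.interim].filter(Boolean).join(" ");
  }
}
```

Keeping interim text out of `finalText()` means the cleanup step only ever sees segments the recognizer has committed to.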

Backend (Electron main process)

  • IPC bridges transcription, AI cleanup, and text delivery
  • Global push-to-talk via node-global-key-listener and a Windows Right Ctrl poller
  • Auto-paste uses the clipboard + simulated Ctrl+V, with logic to detect whether the focused window is safe to paste into
  • Three renderer surfaces: main window, cursor overlay HUD, and floating result card
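
The paste-or-fallback decision described above might look something like the sketch below. Every name here (`DeliveryDeps`, `deliverText`, `settleMs`) is hypothetical, and the platform calls are injected rather than implemented, since the write-up does not show how the focus check or key simulation is done.

```typescript
// All names are hypothetical. Platform calls are injected so the decision
// logic can be read and tested without Electron.
interface DeliveryDeps {
  isPasteSafe(): Promise<boolean>;    // e.g. inspect the focused window
  writeClipboard(text: string): void; // e.g. Electron's clipboard.writeText
  sendPasteKeys(): Promise<void>;     // e.g. simulate Ctrl+V
  showCard(text: string): void;       // floating result card
  delay(ms: number): Promise<void>;   // lets the clipboard write settle
}

type DeliveryResult = "pasted" | "card";

async function deliverText(
  text: string,
  deps: DeliveryDeps,
  settleMs = 50, // assumed value; real timing would need tuning
): Promise<DeliveryResult> {
  try {
    if (!(await deps.isPasteSafe())) {
      deps.showCard(text);
      return "card";
    }
    deps.writeClipboard(text);
    await deps.delay(settleMs); // write first, then paste
    await deps.sendPasteKeys();
    return "pasted";
  } catch {
    // Any failure falls back to the card instead of failing silently.
    deps.showCard(text);
    return "card";
  }
}
```

Returning which path was taken lets the overlay show the right final state without the caller needing to know why paste was skipped.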

AI pipeline

  • Deepgram: live speech-to-text
  • Google Gemini: transcript cleanup with a custom system prompt tuned for natural, sendable text

What I learned

  • Building system-level desktop UX is very different from building web apps: window focus, global hotkeys, and paste timing all matter
  • Real-time transcription needs careful state management (recording → cleaning → pasting → idle) so the UI never fights the pipeline
  • The best voice tools stay out of the way: small overlays and whitespace beat big dashboards
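
The recording → cleaning → pasting → idle lifecycle mentioned above can be modeled as a small transition table. This is a minimal sketch; the event names and the choice to return to idle on any error are assumptions, not a description of the project's hooks.

```typescript
type Phase = "idle" | "recording" | "cleaning" | "pasting";

// Hypothetical event names for this sketch.
type SessionEvent = "PRESS" | "RELEASE" | "CLEANED" | "DELIVERED" | "ERROR";

// Each phase lists only the events it accepts.
const transitions: Record<Phase, Partial<Record<SessionEvent, Phase>>> = {
  idle: { PRESS: "recording" },
  recording: { RELEASE: "cleaning", ERROR: "idle" },
  cleaning: { CLEANED: "pasting", ERROR: "idle" },
  pasting: { DELIVERED: "idle", ERROR: "idle" },
};

// Events that are not valid in the current phase leave it unchanged.
function nextPhase(phase: Phase, event: SessionEvent): Phase {
  return transitions[phase][event] ?? phase;
}
```

Because invalid events are ignored, a stray key press during cleanup cannot start a second recording, which is one way to keep the UI from fighting the pipeline.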

Challenges

  • Global hotkey reliability on Windows: Right Ctrl needed a dedicated poller alongside the key listener to feel responsive
  • Paste without breaking focus: hiding the overlay, restoring the target window, and timing clipboard writes before simulating Ctrl+V
  • Latency vs. quality: streaming transcription for speed, then a separate Gemini pass for polish before delivery
  • Graceful fallback: when auto-paste isn’t possible, showing cleaned text in a floating card instead of failing silently
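
On the hotkey challenge, one way to let a key listener and a poller feed the same push-to-talk state without double-triggering is to reduce both to a single edge detector. The class below is a hypothetical sketch of that idea; it does not reflect how the project actually reconciles the two sources.

```typescript
type KeyEdge = "down" | "up" | null;

// Hypothetical: both the key listener and the poller call report() with the
// state they observe. Only genuine changes produce an edge.
class PushToTalkKey {
  private held = false;

  report(isDown: boolean): KeyEdge {
    if (isDown === this.held) return null; // duplicate report, ignore
    this.held = isDown;
    return isDown ? "down" : "up";
  }

  isHeld(): boolean {
    return this.held;
  }
}
```

With this shape, whichever source notices the press first starts the recording, and the slower source's report is silently dropped.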
