Inspiration

One in four U.S. adults lives with a disability, yet 96.8% of home pages still have detectable accessibility issues. Screen readers surface everything on the page (every decorative link, every footer ad, every "SUBSCRIBE" modal) but never what matters.

Citrus Hack's theme this year is Operation: Innovation. The web is a wild place, and what users actually need is a field handler: a calm voice in the ear that reads the room, flags pop-ups, and carries out their commands. Not another boring screen reader. A mission briefing, the kind a special agent gets.

What it does

Ghost is a Chrome extension (MV3 compliant). Press Alt+Shift+G on any page and:

  1. It analyzes the scene. analyzer.js walks the DOM and builds a structured scene graph: landmarks, headings, actionable targets, and adversarial modals.
  2. It briefs you. ai.js sends the scene graph to Gemini 3 Flash, which returns a short, spy-handler-style briefing of the page.
  3. It speaks. voice.js streams the briefing through ElevenLabs. If the network hiccups, SpeechSynthesis takes over mid-breath.
  4. It listens. speech.js opens the Web Speech API and accepts natural commands: "open the first article", "click sign up", "type transformers into search", "stop".
  5. It renders a HUD. hud.js displays a noir shadow-DOM overlay with scanlines, radar sweep, threat meter, and a live transcript. Host pages cannot style, block, or even detect it.
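
To make step 4 concrete, here is a minimal sketch of how a transcript could be turned into a structured command. It is not Ghost's actual grammar: the function name parseCommand, the intent names, and the regular expressions are all assumptions chosen to cover the four example phrases above.

```javascript
// Hypothetical intent parser (assumed names and patterns, not Ghost's real code).
// Maps a spoken transcript to a small structured command object.
const ORDINALS = { first: 0, second: 1, third: 2, fourth: 3, fifth: 4 };

function parseCommand(transcript) {
  // Normalize: trim, lowercase, drop trailing punctuation from the recognizer.
  const text = transcript.trim().toLowerCase().replace(/[.!?]+$/, "");

  if (/^(stop|cancel|quiet)$/.test(text)) {
    return { intent: "stop" };
  }

  // "type transformers into search" -> value plus target field
  let m = text.match(/^type (.+?) (?:into|in) (?:the )?(.+)$/);
  if (m) return { intent: "type", value: m[1], target: m[2] };

  // "open the first article" -> zero-based index plus kind of target
  m = text.match(/^(?:open|go to) (?:the )?(first|second|third|fourth|fifth) (.+)$/);
  if (m) return { intent: "open", index: ORDINALS[m[1]], target: m[2] };

  // "click sign up" -> label to match against actionable targets
  m = text.match(/^(?:click|press|tap)(?: on)? (?:the )?(.+)$/);
  if (m) return { intent: "click", target: m[1] };

  // Anything else is left for a smarter resolver to handle.
  return { intent: "unknown", raw: text };
}
```

For example, parseCommand("type transformers into search") yields an object with intent "type", value "transformers", and target "search". Anything the patterns miss comes back as "unknown", which is where a model-based resolver would presumably take over.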

There's a one-click "Try a demo!" button that runs a 9-step staged tour.

How we built it

  • A Chrome Manifest V3 service worker proxies every API call so keys are never exposed to the host page.
  • Vanilla JavaScript, zero dependencies.
  • Shadow DOM HUD, so styles from hostile host pages can't leak in and ours can't leak out.
  • Gemini 3 Flash for multimodal page reasoning. We pass a compact scene graph (not the raw HTML) so the prompt stays at roughly 7 KB or less, which keeps Flash fast and affordable without sacrificing accuracy.
  • ElevenLabs streaming with a cache in the service worker so repeat lines play instantly.
  • Web Speech API wrapped in an error-count guard that auto-restarts up to 3× before falling back to silent mode.
  • Deterministic fallbacks at every layer: heuristic briefings when Gemini is down, SpeechSynthesis when ElevenLabs is down. The handler never goes silent.
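
The last two bullets can be sketched as two small helpers. This is an illustration under stated assumptions, not the code in speech.js or ai.js: the names createRestartGuard and withFallback are hypothetical, and resetting the counter after a successful result is our guess at sensible behavior rather than something described above.

```javascript
// Hypothetical helpers (assumed names; not Ghost's real modules).

// Error-count guard: allows a restart until maxRestarts errors have been seen,
// then reports that the caller should drop to silent mode.
function createRestartGuard(maxRestarts = 3) {
  let errors = 0;
  return {
    // Call from the recognizer's error handler; returns the next action.
    onError() {
      errors += 1;
      return errors <= maxRestarts ? "restart" : "silent";
    },
    // Assumption: a successful result clears the count so that old,
    // unrelated errors do not accumulate over a long session.
    onSuccess() {
      errors = 0;
    },
  };
}

// Deterministic fallback: try the async primary source (e.g. a network call),
// and if it throws, return the result of a local, synchronous fallback.
async function withFallback(primary, fallback) {
  try {
    return { source: "primary", value: await primary() };
  } catch (err) {
    return { source: "fallback", value: fallback(err) };
  }
}
```

With the default limit, the first three calls to onError() return "restart" and the fourth returns "silent". withFallback could wrap a briefing request so that a failed model call resolves to a heuristic briefing instead of rejecting, and the same shape would fit the ElevenLabs to SpeechSynthesis handoff.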

Challenges we ran into

  • Shadow-DOM remount loop. On some pages, a style recalculation made our contains() check return false while the host element was still live, so the HUD re-created itself every frame and reset all animations. Fix: check shadowHost.isConnected instead. Some edge cases still need testing.
  • Demo risk. Every hackathon demo dies on the network. We spent extra time building a scripted demo path so the pitch works offline, then baked it into a one-click "Try a demo!" button.
  • ElevenLabs streaming across navigations. Audio kept playing into the next page. We added an abort registry keyed by tab ID.
  • Longer queries. We added a local override that compares our fuzzy-match confidence with Gemini's pick and wins when the local confidence is above 0.6.
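
The last two fixes are small enough to sketch. What follows is a hedged illustration, not the code we shipped: beginStream, abortStream, and chooseTarget are hypothetical names, and the shape of the local match object ({ target, confidence }) is assumed.

```javascript
// Hypothetical abort registry (assumed names): one AbortController per tab ID.
const controllersByTab = new Map();

// Cancels the tab's in-flight stream, if any. Returns true when one was aborted.
function abortStream(tabId) {
  const controller = controllersByTab.get(tabId);
  if (!controller) return false;
  controller.abort();
  controllersByTab.delete(tabId);
  return true;
}

// Starts tracking a new stream for a tab, cancelling any stream still running
// there. The returned signal can be passed to fetch(url, { signal }).
function beginStream(tabId) {
  abortStream(tabId);
  const controller = new AbortController();
  controllersByTab.set(tabId, controller);
  return controller.signal;
}

// Local override: the fuzzy match wins only when its confidence is strictly
// above the threshold; otherwise the model's pick is used.
function chooseTarget(localMatch, geminiPick, threshold = 0.6) {
  if (localMatch && localMatch.confidence > threshold) return localMatch.target;
  return geminiPick;
}
```

In a service worker, abortStream(tabId) could be called from a navigation listener such as chrome.tabs.onUpdated so that audio from the previous page stops when the tab moves on. The strict comparison in chooseTarget mirrors the "> 0.6" rule, so a local match at exactly 0.6 defers to Gemini.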

Accomplishments that we're proud of

  • 400 ms time-to-first-word. Press the shortcut, hear the handler. Faster than most native screen readers start speaking, despite the network dependencies.
  • Zero dependencies, under 120 KB. Vanilla JS, no build step.
  • It works on real, variable sites! Even unintuitive, hard-to-use sites work with the extension.

What we learned

  • Shadow DOM is the most underrated browser feature for extension builders. Ghost's HUD depends on it.
  • For accessibility, latency is the product. A perfect briefing that takes 3 seconds to start is worse than a good-enough one that starts in 400 ms.

What's next for Ghost

  • Firefox + Safari ports via the WebExtension polyfill.
  • Per-site memory: "last time you were on this page, you opened the first article".
  • PDF and Google Docs narration.
