Inspiration

Existing screen readers read everything word-for-word with no understanding of the content. Visually impaired users can't skim, ask questions, or get summaries. They are forced to listen to every word linearly, making complex pages like research papers, math textbooks, or news articles exhausting to navigate.

We wanted to build something better: a tool that doesn't just read the web, but actually understands it.

What it does

VoiceLens is a Chrome extension that layers AI on top of VoiceOver, the screen reader built into macOS, to make web content accessible and interactive for visually impaired users. Instead of robotic text-to-speech, users get an intelligent reading companion that summarizes and restructures page content into digestible audio, answers questions mid-read, and adapts to each user's reading preferences.

Our extension intercepts page content, processes it through Gemma 4, and delivers it back through ElevenLabs TTS with:

  • Page structure briefing on entry
  • Paragraph-by-paragraph interactive reading
  • Mid-read Q&A ("What does this mean?")
  • Math & image description via Gemma 4 Vision. For example, expressions like \(x^2 + 2x - 3\) are converted to natural language before being read aloud
  • Voice and speed control via keyboard shortcuts
  • Export page summaries in Braille-ready PDF format
  • In-text subtitle overlay for low-vision users

VoiceLens is designed specifically for visually impaired students accessing educational content such as Khan Academy, textbooks, and research papers.
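
Voice and speed control via keyboard shortcuts lends itself to a small, pure mapping from key events to commands. The sketch below is illustrative only: the Alt+Shift+Arrow bindings, the rate limits, and the names `KeyLike`, `Command`, and `commandFor` are assumptions, not VoiceLens's actual bindings or code.

```typescript
// Minimal structural view of a keyboard event; a real DOM KeyboardEvent satisfies it.
interface KeyLike {
  key: string;
  altKey: boolean;
  shiftKey: boolean;
}

// Commands the reader could act on (hypothetical set).
type Command =
  | { kind: "speed"; rate: number }
  | { kind: "nextParagraph" }
  | { kind: "previousParagraph" }
  | { kind: "none" };

// Assumed playback-rate limits and step size.
const MIN_RATE = 0.5;
const MAX_RATE = 2.0;
const RATE_STEP = 0.25;

// Maps a key event to a command. It touches no DOM state, so the same
// function can be called from any keydown listener.
function commandFor(event: KeyLike, currentRate: number): Command {
  // Hypothetical binding: every shortcut requires Alt+Shift.
  if (!event.altKey || !event.shiftKey) {
    return { kind: "none" };
  }
  switch (event.key) {
    case "ArrowUp":
      return { kind: "speed", rate: Math.min(MAX_RATE, currentRate + RATE_STEP) };
    case "ArrowDown":
      return { kind: "speed", rate: Math.max(MIN_RATE, currentRate - RATE_STEP) };
    case "ArrowRight":
      return { kind: "nextParagraph" };
    case "ArrowLeft":
      return { kind: "previousParagraph" };
    default:
      return { kind: "none" };
  }
}
```

A keydown listener would pass the event and the current rate to `commandFor` and act on the returned command. Keeping the mapping free of DOM access means one definition can serve every script that listens for keys.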

How we built it

  • Chrome Extension (MV3) — content.js handles DOM parsing, ARIA live region injection, floating UI, and subtitle overlay
  • Gemma 4 API — page classification, summarization, Q&A, and image/math description via Vision API
  • ElevenLabs TTS — natural voice output with stereo separation between agent voice and content
  • Web Speech API — voice input recognition for mid-read Q&A and user commands
  • Flask backend — API endpoints for Gemma integration and page analysis pipeline
  • Figma Make — UI prototyped before any code was written
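
As a rough illustration of the ARIA live region injection handled in content.js, the sketch below configures a region and announces text in it. The helper names (`configureLiveRegion`, `announce`), the `LiveRegionLike` and `Defer` types, and the clear-then-write timing are assumptions for illustration, not the project's actual code.

```typescript
// Politeness levels defined by WAI-ARIA for live regions:
// "polite" waits until the screen reader is idle, "assertive" interrupts.
type Politeness = "polite" | "assertive";

// Minimal structural view of a DOM element; a real HTMLElement satisfies it.
interface LiveRegionLike {
  textContent: string | null;
  setAttribute(name: string, value: string): void;
}

// Runs a callback later. The caller decides how (for example, a timer).
type Defer = (fn: () => void) => void;

// Hypothetical helper: mark an element as a live region.
function configureLiveRegion(el: LiveRegionLike, politeness: Politeness): void {
  el.setAttribute("aria-live", politeness);
  // aria-atomic="true" asks the screen reader to present the whole region on change.
  el.setAttribute("aria-atomic", "true");
}

// Hypothetical helper: clear the region first, then write the message on a
// later tick, so the region's content changes even when the same text is
// announced twice in a row.
function announce(el: LiveRegionLike, message: string, defer: Defer): void {
  el.textContent = "";
  defer(() => {
    el.textContent = message;
  });
}
```

In a content script, the element would typically be a `div` created with `document.createElement` and appended to `document.body` before the first announcement, with `defer` supplied as something like `(fn) => setTimeout(fn, 100)`. The delay value is a guess that would need tuning against VoiceOver.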

Challenges we ran into

  • ARIA + VoiceOver integration — getting VoiceOver, the macOS screen reader, to reliably read ARIA live regions injected dynamically into pages in Chrome required understanding the difference between aria-live="polite" (announced once the screen reader is idle) and aria-live="assertive" (announced immediately, interrupting current speech), and carefully managing DOM injection timing to avoid race conditions
  • Keyboard shortcut conflicts — shortcuts behave differently when the popup is open vs. closed, requiring separate keydown listeners in both content.js and popup.js to cover both states
  • ElevenLabs latency — managing audio chunking and pre-fetching paragraphs ahead of time to minimize perceived delay during reading
  • Chrome popup height limit — capped at 600px, requiring careful spacing and layout optimization to fit all controls without scrolling
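
The pre-fetching idea behind the ElevenLabs latency item can be sketched as a small cache of in-flight requests. This is a minimal sketch under assumptions: `ParagraphPrefetcher`, the `synthesize` callback (standing in for whatever call produces audio for a paragraph), and the default lookahead of 2 are hypothetical, not the project's implementation.

```typescript
// Caches one in-flight or completed synthesis request per paragraph and
// starts requests for the next few paragraphs whenever one is read.
class ParagraphPrefetcher<T> {
  private readonly cache = new Map<number, Promise<T>>();
  private readonly paragraphs: string[];
  private readonly synthesize: (text: string) => Promise<T>;
  private readonly lookahead: number;

  constructor(
    paragraphs: string[],
    synthesize: (text: string) => Promise<T>,
    lookahead: number = 2, // assumed default; tune against real latency
  ) {
    this.paragraphs = paragraphs;
    this.synthesize = synthesize;
    this.lookahead = lookahead;
  }

  // Returns audio for paragraph `index` and warms the cache for the
  // following `lookahead` paragraphs.
  get(index: number): Promise<T> {
    const current = this.request(index);
    const last = Math.min(index + this.lookahead, this.paragraphs.length - 1);
    for (let i = index + 1; i <= last; i++) {
      this.request(i);
    }
    return current;
  }

  // Starts a request unless one is already cached for this paragraph.
  private request(index: number): Promise<T> {
    const cached = this.cache.get(index);
    if (cached !== undefined) {
      return cached;
    }
    const text = this.paragraphs[index];
    if (text === undefined) {
      throw new RangeError(`No paragraph at index ${index}`);
    }
    const pending = this.synthesize(text);
    this.cache.set(index, pending);
    // Drop failed requests so a later call can retry them.
    pending.catch(() => {
      this.cache.delete(index);
    });
    return pending;
  }
}
```

While paragraph `n` is playing, the requests for `n + 1` and `n + 2` are already running, so moving forward usually resolves from the cache. A larger lookahead hides more latency at the cost of synthesizing audio the user may never reach.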

Accomplishments that we're proud of

  • Built a fully keyboard-navigable extension where no mouse interaction is required at any point
  • Injected AI-processed content into ARIA live regions that VoiceOver reads reliably without interrupting the user's flow
  • High-contrast subtitle overlay with real-time word highlighting designed specifically for low-vision users
  • Natural multi-voice TTS with stereo separation between AI agent voice and page content voice
  • Prototyped, iterated, and validated the entire UI in Figma Make before writing a single line of code

What we learned

  • Accessibility-first design forces you to rethink UX from the ground up
  • For visually impaired users, what they hear is their entire experience. There is no visual fallback
  • Keyboard shortcuts are not a secondary feature. For our users, they are the only interface that matters
  • Prototyping in Figma Make before coding saved us from rebuilding the UI multiple times

What's next for VoiceLens

  • Public release on the Chrome Web Store
  • Support for PDF and document formats
  • Personalized reading profiles saved per user
  • Real-time collaborative reading sessions
  • Expanded language support via ElevenLabs' multilingual models
  • Mobile app version for iOS and Android
