## Inspiration
The web was built on the assumption that everyone could see it. Layout, meaning, and hierarchy are communicated visually and barely translated for people who can't see the page. We built Live Lens because 7.5 million blind and low vision people in the US hit that wall every day, and we thought it was fixable.

## What it does
Live Lens is a Chrome extension that lets blind and low vision users ask natural language questions about any webpage. For example, you could say "describe the image on the left" or "summarize this article" and get the answer spoken back or displayed in large, high-contrast text. It understands page layout, describes images using a vision model, and is designed to work on any website.

## How we built it
We built the extension in vanilla JavaScript using Chrome's Manifest V3, capturing page screenshots alongside DOM structure to give the AI spatial context about the page. Questions go to OpenAI's GPT models for vision understanding, and responses are either spoken through Deepgram's text-to-speech API or rendered as on-screen text. The project website is built with React, TypeScript (TSX), and Vite, a simple tech stack for a quick demo.
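
To make the data flow concrete, here is a rough sketch of how a question, a screenshot, and a DOM summary could be bundled into one vision request. It is an illustration and not our exact code: `buildVisionRequest` and its parameter names are hypothetical, the model name is left to the caller, and the message layout assumes OpenAI's Chat Completions format for image inputs.

```javascript
// Hypothetical helper: bundles the user's question, a screenshot, and a text
// summary of the DOM into a single request body for a vision-capable model.
// Assumption: the screenshot is a data URL (for instance, the kind that
// chrome.tabs.captureVisibleTab resolves to in an extension).
function buildVisionRequest({ model, question, screenshotDataUrl, domSummary }) {
  return {
    model, // supplied by the caller; no specific model is assumed here
    messages: [
      {
        role: "system",
        content:
          "You describe webpages for blind and low vision users. " +
          "Be concise and refer to positions on the page in plain language.",
      },
      {
        role: "user",
        content: [
          // The text part carries the DOM structure and the spoken question.
          { type: "text", text: `Page structure:\n${domSummary}\n\nQuestion: ${question}` },
          // The image part carries the screenshot for visual context.
          { type: "image_url", image_url: { url: screenshotDataUrl } },
        ],
      },
    ],
  };
}
```

The returned object would be serialized with `JSON.stringify` and sent as the request body. Keeping it a pure function makes the prompt layout easy to test without calling the API.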

## Challenges we ran into
Getting spatial awareness right was the hardest part. We had to map DOM element positions to regions of the screenshot so the AI could answer questions like "what's in the top right corner." Chrome's Manifest V3 restrictions around screenshot permissions also required some creative workarounds.
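
As a simplified sketch of that mapping, the snippet below labels an element by the screen region its center falls in, using a 3x3 grid, and scales its box from CSS pixels to screenshot pixels. The function names and the grid granularity are assumptions made for illustration, as is the idea that the screenshot is captured at the device pixel ratio.

```javascript
// Hypothetical helper: name the region of the viewport that contains the
// center of an element's bounding box, on a 3x3 grid.
function regionOf(rect, viewport) {
  const centerX = rect.x + rect.width / 2;
  const centerY = rect.y + rect.height / 2;
  // Clamp to 0..2 so elements partly outside the viewport still get a label.
  const col = Math.min(2, Math.max(0, Math.floor((centerX / viewport.width) * 3)));
  const row = Math.min(2, Math.max(0, Math.floor((centerY / viewport.height) * 3)));
  const rows = ["top", "middle", "bottom"];
  const cols = ["left", "center", "right"];
  return `${rows[row]} ${cols[col]}`;
}

// Hypothetical helper: convert a CSS-pixel box to screenshot pixels.
// Assumption: the screenshot is scaled by the device pixel ratio.
function toScreenshotPixels(rect, devicePixelRatio) {
  return {
    x: Math.round(rect.x * devicePixelRatio),
    y: Math.round(rect.y * devicePixelRatio),
    width: Math.round(rect.width * devicePixelRatio),
    height: Math.round(rect.height * devicePixelRatio),
  };
}
```

In a content script, this could be called as `regionOf(el.getBoundingClientRect(), { width: window.innerWidth, height: window.innerHeight })`, so a question like "what's in the top right corner" can be matched against elements labeled `top right`.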

## Accomplishments that we're proud of
We're extremely proud of the whole user flow. Going from voice input, through the vision AI, to the spoken response actually feels smooth and intuitive.

## What we learned
We learned how broken web accessibility actually is in practice when you try to use a real page without sight. We also learned how much prompt engineering matters: the quality of the AI's answer changed drastically depending on how well we described the page structure in the prompt.
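
To illustrate what "describing the page structure" can look like, here is a minimal sketch that turns a list of page elements into a compact text outline for the prompt. The `describePageStructure` name, the `role`/`text`/`region` fields, and the line format are all hypothetical choices for this example, not the exact format we shipped.

```javascript
// Hypothetical helper: render page elements as one outline line each, e.g.
//   - [top left] heading: "Welcome"
// Long text is truncated so the prompt stays short.
function describePageStructure(elements, maxTextLength = 80) {
  return elements
    .map(({ role, text, region }) => {
      // Collapse runs of whitespace and newlines into single spaces.
      const clean = text.replace(/\s+/g, " ").trim();
      const short =
        clean.length > maxTextLength
          ? clean.slice(0, maxTextLength - 1) + "…"
          : clean;
      return `- [${region}] ${role}: "${short}"`;
    })
    .join("\n");
}
```

One line per element, with the region first, gives the model a consistent structure to ground positional questions in. A format like this is a starting point to compare alternatives against.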

## What's next for Live Lens
Given more time, we would bring the extension to other browsers. We would also test much more extensively with different reasoning and speech models to optimize speed and quality. Perhaps we could even integrate a monetization model and build this into a business.
