Inspiration
Vision Assist was inspired by the gap between traditional screen readers and the way people actually want to understand the web. A normal screen reader can read everything on a page, but that often means listening through navigation menus, buttons, repeated links, ads, and long blocks of text before reaching the main idea. I wanted to build an AI assistant that gives blind and visually impaired users a faster first understanding of a page: what type of page it is, what sections matter, what images show, and what the user might want to do next.
What it does
Vision Assist is a Chrome extension and backend API that turns a web page into a concise spoken overview. The extension captures the active page, sends the HTML to a local backend, and receives an AI-generated summary, page type, navigable sections, image descriptions, and follow-up Q&A support. Users can summarize a page, jump to detected sections, listen to image descriptions, ask questions by typing or speaking, and stop audio whenever they want.
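The hand-off from extension to backend could look roughly like the sketch below. It is only an illustration: the endpoint URL, the `url`/`html` payload fields, and the function names `buildSummarizeRequest` and `requestSummary` are assumptions, not the project's actual API.

```javascript
// Hypothetical endpoint: the write-up only says the backend runs locally.
const BACKEND_URL = "http://localhost:3000/summarize";

// Build the fetch options for sending the captured page HTML to the backend.
// The payload shape ({ url, html }) is an assumption for illustration.
function buildSummarizeRequest(pageUrl, html) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: pageUrl, html }),
  };
}

// Send the page and return the parsed JSON response.
// fetchImpl is injectable so the function can be exercised without a server.
async function requestSummary(pageUrl, html, fetchImpl = fetch) {
  const response = await fetchImpl(
    BACKEND_URL,
    buildSummarizeRequest(pageUrl, html)
  );
  if (!response.ok) {
    throw new Error(`Backend returned HTTP ${response.status}`);
  }
  return response.json();
}
```

Keeping the request builder separate from the network call makes it easy to check what leaves the browser before anything is sent.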
How we built it
I built the backend with Node.js, Express, Cheerio, CORS, dotenv, OpenAI, Google Gemini, and Google TTS. Cheerio extracts readable page text and removes noisy elements such as scripts, nav bars, forms, headers, footers, and buttons. The AI layer supports both OpenAI and Gemini, so the project can run with either provider. The Chrome extension uses Manifest V3, content scripts, background service workers, popup UI, Chrome storage, keyboard shortcuts, text-to-speech playback, and browser speech recognition.
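One way to let the backend run with either provider is to pick one at startup from whichever API key is configured. This is a minimal sketch under assumptions: the environment variable names `AI_PROVIDER`, `OPENAI_API_KEY`, and `GEMINI_API_KEY` and the function `chooseProvider` are hypothetical, and the fallback order is arbitrary.

```javascript
// Pick an AI provider from environment-style settings.
// Variable names here are assumptions, not the project's confirmed config.
function chooseProvider(env) {
  const preferred = String(env.AI_PROVIDER || "").toLowerCase();
  const available = {
    openai: Boolean(env.OPENAI_API_KEY),
    gemini: Boolean(env.GEMINI_API_KEY),
  };

  // Honor an explicit preference only if that provider has a key.
  const isKnown = Object.prototype.hasOwnProperty.call(available, preferred);
  if (isKnown && available[preferred]) return preferred;

  // Otherwise fall back to whichever provider is configured.
  if (available.openai) return "openai";
  if (available.gemini) return "gemini";
  throw new Error("No AI provider key configured");
}
```

With dotenv loaded, this would typically be called once as `chooseProvider(process.env)`.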
Challenges we ran into
The biggest challenge was balancing usefulness with safety. Sending full page content to an AI model can create privacy risks, especially for financial, medical, login, or personal pages. To address that, I added both a domain blocklist and a content-level sensitive-data detector. Another challenge was making the response useful for screen-reader users without overwhelming them, so I focused the output on short spoken summaries, section hints, and direct Q&A instead of dumping every line of text.
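The two privacy checks can be sketched as a gate that runs before any page content is sent to a model. The domains, patterns, and function names below are illustrative assumptions rather than the project's actual rules, and simple regexes like these will miss some cases and flag others incorrectly.

```javascript
// Hypothetical blocklist entries, for illustration only.
const BLOCKED_DOMAINS = ["examplebank.com", "patientportal.example"];

// True if the page's host is a blocked domain or one of its subdomains.
function isBlockedDomain(pageUrl, blocklist = BLOCKED_DOMAINS) {
  const host = new URL(pageUrl).hostname.toLowerCase();
  return blocklist.some((d) => host === d || host.endsWith("." + d));
}

// Illustrative content-level patterns (assumed, not exhaustive).
const SENSITIVE_PATTERNS = [
  { label: "password field", regex: /<input[^>]*type\s*=\s*["']?password/i },
  { label: "card-like number", regex: /\b(?:\d[ -]?){13,16}\b/ },
  { label: "US SSN-like number", regex: /\b\d{3}-\d{2}-\d{4}\b/ },
];

// Return the labels of every pattern found in the page HTML.
function detectSensitiveContent(html) {
  return SENSITIVE_PATTERNS
    .filter((p) => p.regex.test(html))
    .map((p) => p.label);
}

// Combine both checks into a single decision with reasons.
function shouldSendToModel(pageUrl, html) {
  if (isBlockedDomain(pageUrl)) {
    return { allowed: false, reasons: ["blocked domain"] };
  }
  const reasons = detectSensitiveContent(html);
  return { allowed: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the decision lets the extension tell the user why a page was skipped, or ask for confirmation before continuing.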
Accomplishments that we're proud of
This was our first time building a Chrome extension, so we were glad to try something new for this hackathon.
What we learned
I learned how to combine browser extension APIs, backend page processing, LLM summarization, vision-based image descriptions, and text-to-speech into one assistive workflow. I also learned that responsible AI is not just about model choice; it requires product decisions like user confirmation, limited context, fallback behavior, and clear boundaries on what the AI should not do.
What's next for Vision Assist
We plan to publish the Chrome extension, keep improving it, and get it in front of the blind and visually impaired users it was built for.
Built With
- cheerio
- chrome-extension-manifest-v3
- chrome-storage
- chrome-web-speech-api
- cors
- css
- dotenv
- express.js
- google-gemini-api
- google-tts-api
- html
- javascript
- node.js
- openai-api