Inspiration
Vision was inspired by a real and personal need. My brother is blind, and observing how many everyday tasks still require assistance or multiple disconnected tools highlighted a gap in truly integrated accessibility solutions. During the Google Gemini Hackathon, I saw an opportunity to explore how modern multimodal AI could reduce that friction and support independence through a single, voice-first interface.
What it does
Vision is a browser-based AI assistant designed for visually impaired users. It enables voice-driven interaction to help users manage daily tasks such as scheduling, tracking health information, monitoring finances, locating personal items, and accessing general assistance. Vision prioritizes simplicity, clarity, and accessibility by combining speech recognition, text-to-speech feedback, and AI-powered reasoning into one unified experience.
How I built it
The prototype was built as a web application using HTML, CSS, and JavaScript, with accessibility as a core design principle. Voice input is handled through browser speech recognition, responses are delivered using text-to-speech, and user data is stored locally for persistence. Google Gemini is used as the reasoning engine, allowing Vision to understand context, interpret natural language, and provide meaningful responses tailored to the user’s current data and needs.
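To make the flow concrete, here is a minimal sketch of the voice loop as described above: browser speech recognition captures the request, Gemini interprets it against locally stored user data, and the reply is spoken back. The model name, prompt shape, and storage key are illustrative assumptions, not the exact production code.

```javascript
// Sketch of Vision's voice loop: speech recognition -> Gemini -> text-to-speech.
// API key handling is simplified here; later iterations move this to a secure backend.
const API_KEY = 'YOUR_GEMINI_API_KEY';

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = 'en-US';

recognition.onresult = async (event) => {
  const transcript = event.results[0][0].transcript;

  // Local persistence: user data (tasks, reminders, finances, etc.) stays in the browser.
  const userData = JSON.parse(localStorage.getItem('visionUserData') || '{}');

  // Ask Gemini to interpret the request in the context of the stored data.
  const response = await fetch(
    `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=${API_KEY}`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        contents: [{
          parts: [{ text: `User data: ${JSON.stringify(userData)}\nUser said: ${transcript}` }],
        }],
      }),
    }
  );
  const data = await response.json();
  const reply = data.candidates[0].content.parts[0].text;

  // Speak the answer back to the user.
  speechSynthesis.speak(new SpeechSynthesisUtterance(reply));
};

recognition.start();
```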
Challenges I ran into
One of the main challenges was securely integrating an AI model while maintaining a browser-based experience. Ensuring accessibility while managing speech timing, preventing overlapping audio, and handling browser compatibility also required careful iteration. Additionally, designing features that feel helpful without overwhelming users demanded a strong focus on user-centered design rather than feature quantity.
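One concrete approach to the overlapping-audio problem is to cancel any in-flight speech and pause recognition before the assistant talks, then resume listening once the utterance ends. This is a hedged sketch; the helper name is illustrative rather than taken from the project code.

```javascript
// Illustrative helper to avoid overlapping audio: drop queued speech and
// pause recognition so the assistant does not transcribe its own voice.
function speak(text, recognition) {
  speechSynthesis.cancel();   // stop anything still being spoken
  recognition.stop();         // pause listening while we talk

  const utterance = new SpeechSynthesisUtterance(text);
  utterance.onend = () => recognition.start();  // resume listening afterwards
  speechSynthesis.speak(utterance);
}
```

Tying the resume to the utterance's `onend` event keeps the turn-taking predictable, which matters more for a voice-first interface than raw response speed.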
Accomplishments that I'm proud of
I am proud to have built a functional, accessibility-first AI assistant prototype within a hackathon timeframe. Vision demonstrates how voice, context, and AI reasoning can work together to support real-world independence. Most importantly, the project is grounded in an authentic use case rather than a purely theoretical problem.
What I learned
I learned that accessibility-focused design must be intentional from the start, not added later. Small technical decisions, such as how and when audio feedback is delivered, have a major impact on usability. I also gained hands-on experience working with multimodal AI and learned how to responsibly architect AI-powered systems with user safety and data privacy in mind.
What's next for Vision
Next steps include expanding real-world testing with visually impaired users, refining voice interactions, and moving the AI integration to a fully secure backend architecture. I also plan to explore camera-based scene description, object recognition, and emergency assistance integrations. Long-term, Vision aims to become a reliable daily companion that meaningfully improves independence and quality of life.
Built With
- css
- dom
- github
- google-gemini-apis
- html
- javascript