VisionAid

Seeing the world through sound

Inspiration

We wanted to build a tool that helps visually impaired people understand their surroundings in real time. Inspired by real-world accessibility challenges, we combined AI-powered image understanding with voice control and audio feedback to create a fully hands-free assistant.

What it does

VisionAid lets users say "capture" to take a picture from a live camera feed. It then uses AI to generate a description of what’s in the image and speaks that description aloud, so visually impaired users can “see” through sound.

How we built it

We used:

Flask for building the backend API
Python for backend logic and AI integration
Gemini’s vision model to generate image descriptions
MediaDevices API (via JavaScript) to access the webcam
Web Speech API for speech recognition (to detect "capture") and speech synthesis (to read the description aloud)
Gunicorn for production-level deployment
Render for backend hosting
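
As a rough illustration of how the backend pieces listed above could fit together, here is a minimal sketch of the request-handling step in plain Python. It assumes the frontend sends each captured frame as a base64 data URL (for example from a canvas), which the write-up does not confirm. The names `decode_frame`, `handle_capture`, and `fake_describe` are hypothetical, and `fake_describe` only stands in for the real call to the vision model.

```python
import base64
import binascii
from typing import Callable

# Assumption: the browser sends frames as JPEG or PNG data URLs.
DATA_URL_PREFIXES = ("data:image/jpeg;base64,", "data:image/png;base64,")


def decode_frame(data_url: str) -> bytes:
    """Turn a browser data URL into raw image bytes, or raise ValueError."""
    for prefix in DATA_URL_PREFIXES:
        if data_url.startswith(prefix):
            payload = data_url[len(prefix):]
            try:
                return base64.b64decode(payload, validate=True)
            except binascii.Error as exc:
                raise ValueError("frame is not valid base64") from exc
    raise ValueError("expected a JPEG or PNG data URL")


def handle_capture(data_url: str, describe: Callable[[bytes], str]) -> dict:
    """Decode the frame, ask the describer for a caption, return a JSON-ready dict."""
    try:
        image_bytes = decode_frame(data_url)
    except ValueError as exc:
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "description": describe(image_bytes)}


def fake_describe(image_bytes: bytes) -> str:
    """Hypothetical placeholder for the vision-model call."""
    return f"an image of {len(image_bytes)} bytes"
```

In a Flask app, a route handler could pass the posted data URL to `handle_capture` and return the resulting dict as JSON. Keeping the describer as a plain callable makes the decoding and error handling testable without network access.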

Challenges we ran into

Integrating voice input with real-time camera capture
Ensuring browser permissions for microphone and camera worked reliably across platforms
Deploying the backend and frontend to work seamlessly together
Managing API calls to return timely and accurate descriptions
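
One way to keep descriptions timely, sketched here as an assumption and not as the approach the project actually used, is to bound how long the backend waits for the vision model and fall back to a short spoken message when that limit passes. The helper `describe_with_timeout`, the `FALLBACK` text, and the 5-second default are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

# Hypothetical message that the frontend would read aloud on a timeout.
FALLBACK = "Sorry, the description is taking too long. Please try again."

# A small shared pool so a slow model call never blocks the caller indefinitely.
_executor = ThreadPoolExecutor(max_workers=4)


def describe_with_timeout(
    describe: Callable[[bytes], str],
    image_bytes: bytes,
    timeout_s: float = 5.0,
    fallback: str = FALLBACK,
) -> str:
    """Run describe(image_bytes), returning fallback if it exceeds timeout_s."""
    future = _executor.submit(describe, image_bytes)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Best effort only: a call that is already running continues in the
        # background, but the user gets an answer right away.
        future.cancel()
        return fallback
```

The trade-off is that a timed-out call still occupies a worker thread until it finishes, so the pool size and timeout would need tuning against real response times.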

Accomplishments that we're proud of

Achieving fully hands-free functionality with a single voice command
Creating a real-time assistive experience using camera, AI, and audio
Getting the frontend and backend services to interact seamlessly
Building a usable solution for people who rely on sound over sight

What we learned

How to integrate voice, vision, and audio feedback into one smooth workflow
How to handle asynchronous browser APIs like webcam and voice
Real-world accessibility testing principles

What's next for VisionAid

Adding OCR to read printed or handwritten text
Adding object detection to highlight specific items in the frame
Packaging as a mobile app for real-world portability
Multi-language voice support for accessibility across regions

Updates

Kritarth Srivastava started this project — May 30, 2025 07:43 AM EDT
