Inspiration

We wanted to build a tool that helps visually impaired people understand their surroundings in real time. Motivated by everyday accessibility challenges, we combined AI-powered image understanding with voice control and audio feedback to make a fully hands-free assistant.

What it does

VisionAid lets users say "capture" to take a picture from a live camera feed. It then uses AI to generate a description of what is in the image and speaks that description aloud, so visually impaired users can "see" through sound.

How we built it

We used:

  • Flask for building the backend API
  • Python for backend logic and AI integration
  • Gemini’s vision model to generate image descriptions
  • MediaDevices API (via JavaScript) to access the webcam
  • Web Speech API for speech recognition (to detect "capture") and speech synthesis
  • Gunicorn as the production WSGI server for the Flask app
  • Render for backend hosting
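
To make the backend step concrete, here is a minimal sketch of how one capture request could be handled, assuming the browser sends the frame as a base64 JPEG data URL in a JSON field. The names `decode_frame`, `handle_capture`, the `image` field, and the `describe` callable (a stand-in for the Gemini call) are all hypothetical and not taken from the project. The sketch uses only the standard library; a Flask route would wrap `handle_capture` and return its body and status.

```python
import base64
import binascii
from typing import Callable, Tuple

# Assumption: the frontend sends what canvas.toDataURL("image/jpeg") produces.
DATA_URL_PREFIX = "data:image/jpeg;base64,"


def decode_frame(data_url: str) -> bytes:
    """Turn a browser data URL into raw JPEG bytes, or raise ValueError."""
    if not data_url.startswith(DATA_URL_PREFIX):
        raise ValueError("expected a base64 JPEG data URL")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(data_url[len(DATA_URL_PREFIX):], validate=True)
    except binascii.Error as exc:
        raise ValueError("frame is not valid base64") from exc


def handle_capture(
    payload: dict, describe: Callable[[bytes], str]
) -> Tuple[dict, int]:
    """Return (JSON body, HTTP status) for one capture request.

    `describe` is a hypothetical stand-in for the vision-model call; it is
    injected so the handler can be exercised without network access.
    """
    try:
        image = decode_frame(str(payload.get("image", "")))
    except ValueError as exc:
        return {"error": str(exc)}, 400
    return {"description": describe(image)}, 200
```

Passing the model call in as a parameter keeps the request handling testable on its own; the frontend would then read the `description` field and hand it to speech synthesis.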

Challenges we ran into

  • Integrating voice input with real-time camera capture
  • Ensuring browser permissions for microphone and camera worked reliably across platforms
  • Deploying the backend and frontend to work seamlessly together
  • Managing API calls to return timely and accurate descriptions
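
The writeup does not say how timeliness was handled. One common approach, sketched here under that caveat, is to bound the description call with a timeout and fall back to a short spoken message. `describe_with_timeout`, `FALLBACK`, and the 8-second default are hypothetical choices, and `describe` again stands in for the vision-model call.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

# A small shared pool so each request does not spawn its own thread.
_executor = ThreadPoolExecutor(max_workers=4)

# Hypothetical message the frontend would speak if the model is too slow.
FALLBACK = "Sorry, the description is taking too long. Please try again."


def describe_with_timeout(
    describe: Callable[[bytes], str], image: bytes, seconds: float = 8.0
) -> str:
    """Run `describe(image)` but give up waiting after `seconds`."""
    future = _executor.submit(describe, image)
    try:
        return future.result(timeout=seconds)
    except FutureTimeout:
        # cancel() only helps if the call has not started yet; a running
        # call finishes in the background and its result is discarded.
        future.cancel()
        return FALLBACK
```

A user who hears a prompt to retry after a few seconds is usually better served than one left waiting in silence, which matters more in an audio-only interface.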

Accomplishments that we're proud of

  • Achieving fully hands-free functionality with a single voice command
  • Creating a real-time assistive experience using camera, AI, and audio
  • Getting the frontend and backend services to interact seamlessly
  • Building a usable solution for people who rely on sound over sight

What we learned

  • How to integrate voice, vision, and audio feedback into one smooth workflow
  • How to handle asynchronous browser APIs such as webcam access and speech recognition
  • Real-world accessibility testing principles

What's next for VisionAid

  • Adding OCR to read printed or handwritten text
  • Adding object detection to highlight specific items in the frame
  • Packaging VisionAid as a mobile app for real-world portability
  • Supporting multiple languages for voice input and output to improve accessibility across regions