📱 AI-Powered Vision Assistant

A mobile app that lets blind and low-vision users photograph their surroundings and hear the text in the scene read aloud: signs, flyers, labels, and other written content. The goal is to give users real-time spoken insight into the visual world around them.

🔭 Current Scope

  • Capture a single image using the phone’s camera
  • Send the image to a backend server via MCP
  • Forward the image from the backend server to the Gemini API
  • Return the extracted information to the app as JSON
  • Read the text aloud via Text-to-Speech (TTS)
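
As a rough illustration of the last two steps, the sketch below turns a JSON reply into a single string that could be handed to a TTS engine. The write-up does not document the JSON schema, so the "items", "type", and "text" field names and the build_speech_text helper are assumptions made purely for this example.

```python
import json


def build_speech_text(response_json: str) -> str:
    """Turn a JSON reply into one string for a TTS engine.

    Assumed (hypothetical) schema:
    {"items": [{"type": "sign", "text": "Exit only"}, ...]}
    """
    data = json.loads(response_json)
    items = data.get("items", []) if isinstance(data, dict) else []

    phrases = []
    for item in items:
        if not isinstance(item, dict):
            continue
        # Collapse runs of whitespace and drop a trailing period,
        # because sentence punctuation is added when joining below.
        text = " ".join(str(item.get("text", "")).split()).rstrip(".")
        if not text:
            continue
        kind = str(item.get("type", "")).strip()
        phrases.append(f"{kind.capitalize()}: {text}" if kind else text)

    if not phrases:
        return "No readable text was found."
    return ". ".join(phrases) + "."
```

Announcing the kind of item before its text (for example "Sign: Exit only") is one possible design choice; it gives a listener context before the content, at the cost of slightly longer speech.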

💡 Inspiration:

This idea came from a mix of things we’ve learned and observed.

In our digital design classes, we often talked about the importance of accessibility: designing not just for the majority, but for everyone. That got us thinking about how technology could help people with visual impairments experience the world more independently.

We imagined what it’s like to walk down the street and not be able to read signs, flyers, or menus. Something as simple as reading a sign shouldn’t be a barrier. We wanted to build a tool that could give that information back through audio, in a way that’s fast, intuitive, and empowering.

Greek Goddess Theme ✨✨

The app’s name, Delphi, refers to the ancient Greek Oracle of Delphi, known for providing guidance and insight. Similarly, the app acts as a “modern oracle” for blind and low-vision users, translating visual information into spoken guidance about their surroundings.

🧠 Challenges:

Understanding User Needs:

  • None of us is blind or low-vision, so we had to work hard to understand needs we don’t experience ourselves
  • Research and accessibility guidelines helped, but feedback from real users is still critical

Back-End:

  • This was our first time working with MCP, so there was a learning curve in understanding server communication and image handling
  • Troubleshooting Gemini API integration and output formatting
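
The write-up does not say which output-formatting problems came up. One issue that can occur with model output in general is a JSON object arriving wrapped in a Markdown code fence or surrounded by extra prose. The sketch below, built around a hypothetical extract_json helper, assumes that failure mode and shows one defensive way to recover the object before parsing.

```python
import json


def extract_json(raw: str) -> dict:
    """Pull the outermost JSON object out of a model reply.

    Tolerates text before and after the object, such as Markdown
    code-fence lines or a short sentence of prose.
    """
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    # json.loads raises json.JSONDecodeError (a ValueError subclass)
    # if the extracted span is still not valid JSON.
    return json.loads(raw[start : end + 1])
```

Raising an error instead of returning an empty result lets the caller decide what to do, for example retrying the request or telling the user that the image could not be read.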

Front-End:

  • Making the camera work reliably across devices
  • Handling image resolution, lighting issues, and UI feedback for image capture

🛠️ Tech Stack:

Front-End:

  • React / React Native

Back-End:

  • Gemini API
  • MCP Server
  • Python
