EchoGuide
One-Liner
EchoGuide is an AI-powered accessibility tool that helps blind and low-vision users understand and navigate their surroundings through real-time computer vision and voice guidance.
Inspiration
For people with visual impairments, navigating unfamiliar environments can be challenging. While tools such as white canes and guide dogs are incredibly useful, they cannot explain what objects are nearby, where an exit is located, or what is blocking a path.
We wanted to see how modern AI could provide that missing layer of information using only a smartphone camera and audio feedback.
What It Does
EchoGuide uses a phone's camera to analyze the user's surroundings and provide spoken guidance.
Hazard Detection
The system continuously monitors the camera feed using a local YOLOv8 model. If it detects a potential hazard, it immediately warns the user.
Examples:
- "Obstacle ahead."
- "Bicycle detected."
- "Vehicle approaching."
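As a rough illustration of how detections could become the warnings above, here is a minimal sketch. It assumes the detector yields a class label and a confidence score per object. The `Detection` type, the `HAZARD_WARNINGS` table, and the 0.5 threshold are hypothetical and are not taken from the EchoGuide codebase. The labels are COCO class names, which pretrained YOLOv8 models use.

```python
from dataclasses import dataclass

# Hypothetical mapping from detector class names (COCO labels) to spoken warnings.
HAZARD_WARNINGS = {
    "bicycle": "Bicycle detected.",
    "car": "Vehicle approaching.",
    "bus": "Vehicle approaching.",
    "truck": "Vehicle approaching.",
}


@dataclass
class Detection:
    label: str         # class name reported by the detector
    confidence: float  # score in [0, 1]


def warnings_for(detections, min_confidence=0.5):
    """Return de-duplicated warnings for confident hazard detections, in input order."""
    messages = []
    for det in detections:
        message = HAZARD_WARNINGS.get(det.label)
        if message and det.confidence >= min_confidence and message not in messages:
            messages.append(message)
    return messages
```

De-duplicating the messages means a street with several cars produces one spoken warning per frame instead of one per car.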
Voice Interaction
Users can activate EchoGuide with a wake phrase and ask questions such as:
- "What is in front of me?"
- "Where is the nearest exit?"
- "Is there a clear path ahead?"
The current camera frame and the user's question are sent to Gemini, which generates a contextual response describing the environment and relevant spatial information.
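The sketch below shows one way the frame and question might be bundled into a single request. The body is shaped like the JSON the Gemini REST API documents for `generateContent` (a text part plus an inline image part), but treat the exact field names as an assumption and check them against the current API reference. `SYSTEM_HINT` and `build_request` are hypothetical names, and the prompt wording is not the project's actual prompt.

```python
import base64

# Assumed instruction text; the project's real prompt is not shown in this write-up.
SYSTEM_HINT = (
    "You are assisting a blind or low-vision user. Answer briefly and describe "
    "positions relative to the user (ahead, left, right)."
)


def build_request(question: str, jpeg_bytes: bytes) -> dict:
    """Bundle the spoken question and the current JPEG frame into one request body."""
    return {
        "contents": [{
            "parts": [
                {"text": f"{SYSTEM_HINT}\n\nQuestion: {question}"},
                {"inline_data": {
                    "mime_type": "image/jpeg",
                    # Binary image data has to be base64-encoded to travel in JSON.
                    "data": base64.b64encode(jpeg_bytes).decode("ascii"),
                }},
            ]
        }]
    }
```

Sending only the current frame, instead of a video stream, keeps each request small. That matters when the user is waiting for a spoken answer.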
How We Built It
Our system uses a FastAPI backend with several AI components working together:
- OpenCV for video processing
- YOLOv8 for local hazard detection
- Vosk for offline wake-word detection
- Gemini 2.5 Flash for scene understanding and reasoning
- ElevenLabs for voice output
Because hazard detection runs locally and only spoken questions go to the cloud, warnings do not wait on a network round trip, while answers to questions can still be detailed.
Biggest Challenge
One of the hardest problems we worked on was accurately determining the user's position inside a building.
Unlike outdoor navigation, indoor navigation cannot rely on GPS. We explored several approaches:
- Using Gemini to reason about the user's location from camera images
- Estimating location using Wi-Fi signal strengths and access points
- Creating a coordinate-based system using landmarks and room features
Each approach showed promise, but achieving consistent indoor positioning accuracy proved much harder than expected. It remains one of the biggest areas for future development and one of the most interesting technical challenges of the project.
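To make the Wi-Fi idea concrete, here is a minimal sketch of a weighted-centroid estimate, a common baseline for signal-strength positioning. It is not necessarily what we implemented. It assumes the positions of the access points have been surveyed in advance, and the names `weighted_centroid`, `readings`, and `ap_positions` are hypothetical.

```python
def weighted_centroid(readings, ap_positions):
    """Estimate (x, y) as the centroid of known access points, weighted by signal power.

    readings:     {ap_id: rssi_dbm}  signal strength seen by the phone
    ap_positions: {ap_id: (x, y)}    surveyed positions in metres (assumed to exist)
    """
    total = x_sum = y_sum = 0.0
    for ap_id, rssi_dbm in readings.items():
        if ap_id not in ap_positions:
            continue  # ignore access points with no surveyed position
        weight = 10 ** (rssi_dbm / 10.0)  # dBm to milliwatts, so stronger signals dominate
        x, y = ap_positions[ap_id]
        x_sum += weight * x
        y_sum += weight * y
        total += weight
    if total == 0.0:
        return None  # no usable readings
    return (x_sum / total, y_sum / total)
```

A baseline like this also hints at why consistency is hard: walls, people, and phone orientation all change signal strength, so the same spot can produce noticeably different estimates from one scan to the next.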
What We Learned
One thing we learned quickly is that object detection alone is not enough.
Saying:
"Chair. Table. Door."
is far less useful than:
"There is a table directly ahead and a doorway slightly to your left."
Providing context, spatial relationships, and actionable information makes a much bigger difference for accessibility.
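The difference between the two outputs above can be sketched in a few lines. This example assumes each detected object comes with the horizontal center of its bounding box in pixels. The function names and the 0.1 and 0.3 offset thresholds are hypothetical choices for illustration. In EchoGuide the richer descriptions come from Gemini, not from rules like these.

```python
def direction_of(box_center_x: float, frame_width: float) -> str:
    """Map a box's horizontal center to a coarse direction relative to the camera."""
    offset = box_center_x / frame_width - 0.5  # -0.5 is far left, +0.5 is far right
    if abs(offset) <= 0.1:
        return "directly ahead"
    side = "left" if offset < 0 else "right"
    prefix = "slightly to your" if abs(offset) <= 0.3 else "to your"
    return f"{prefix} {side}"


def describe(objects, frame_width):
    """objects: list of (label, box_center_x). Returns one sentence, or '' if empty."""
    phrases = [f"a {label} {direction_of(cx, frame_width)}" for label, cx in objects]
    if not phrases:
        return ""
    if len(phrases) == 1:
        return f"There is {phrases[0]}."
    return f"There is {', '.join(phrases[:-1])} and {phrases[-1]}."
```

For a 640-pixel-wide frame, `describe([("table", 320), ("doorway", 200)], 640)` returns "There is a table directly ahead and a doorway slightly to your left." The sketch ignores distance and article agreement ("a" versus "an"), both of which a real description would need.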
We also gained experience building real-time AI systems that combine computer vision, speech recognition, and cloud-based reasoning.
What's Next
Future improvements we would like to explore include:
- Smart glasses and wearable camera integration
- More accurate indoor positioning and navigation
- Spatial audio cues for hazards and directions
- Greater on-device processing for offline use
- Navigation assistance to specific rooms, exits, or landmarks
EchoGuide is currently a proof of concept, but it demonstrates how combining computer vision, speech, and multimodal AI can create a more accessible way for people to interact with the world around them.