EchoGuide

One-Liner

EchoGuide is an AI-powered accessibility tool that helps blind and low-vision users understand and navigate their surroundings through real-time computer vision and voice guidance.

Inspiration

For people with visual impairments, navigating unfamiliar environments can be challenging. While tools such as white canes and guide dogs are incredibly useful, they cannot explain what objects are nearby, where an exit is located, or what is blocking a path.

We wanted to see how modern AI could provide that missing layer of information using only a smartphone camera and audio feedback.

What It Does

EchoGuide uses a phone's camera to analyze the user's surroundings and provide spoken guidance.

Hazard Detection

The system continuously monitors the camera feed using a local YOLOv8 model. If it detects a potential hazard, it immediately warns the user.

Examples:

"Obstacle ahead."
"Bicycle detected."
"Vehicle approaching."

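As a rough illustration of the warning step, the sketch below maps detector class labels to spoken phrases. It is a minimal sketch under stated assumptions, not EchoGuide's actual code: the `Detection` type, the `HAZARD_PHRASES` table, and the 0.5 confidence threshold are all hypothetical, and the labels assume a detector trained on COCO class names.

```python
from dataclasses import dataclass

# Hypothetical mapping from detector class labels to spoken warnings.
# Labels assume COCO class names; phrases echo the examples above.
HAZARD_PHRASES = {
    "bicycle": "Bicycle detected.",
    "car": "Vehicle approaching.",
    "bus": "Vehicle approaching.",
    "truck": "Vehicle approaching.",
}


@dataclass
class Detection:
    label: str         # class name reported by the detector
    confidence: float  # detector score in the range 0..1


def hazard_warnings(detections, min_confidence=0.5):
    """Return de-duplicated warning phrases for confident hazard detections."""
    warnings = []
    for det in detections:
        phrase = HAZARD_PHRASES.get(det.label)
        # Skip non-hazard classes and low-confidence detections.
        if phrase is None or det.confidence < min_confidence:
            continue
        # Avoid repeating the same phrase within one frame.
        if phrase not in warnings:
            warnings.append(phrase)
    return warnings
```

For example, `hazard_warnings([Detection("car", 0.9), Detection("truck", 0.8)])` returns a single "Vehicle approaching." entry, so the user does not hear the same warning twice for one frame.
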
Voice Interaction

Users can activate EchoGuide with a wake phrase and ask questions such as:

"What is in front of me?"
"Where is the nearest exit?"
"Is there a clear path ahead?"

The current camera frame and the user's question are sent to Gemini, which generates a contextual response describing the environment and relevant spatial information.
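
The pairing of frame and question described above could be packaged roughly as follows. This is only a sketch: the payload field names, the `SYSTEM_HINT` text, and the `build_scene_request` helper are hypothetical, and the actual call to Gemini is deliberately left out rather than guessed at.

```python
import base64

# Hypothetical instruction text; the real prompt used by the project is not known.
SYSTEM_HINT = (
    "You are assisting a blind or low-vision user. "
    "Describe what matters for navigation, with directions relative to the user."
)


def build_scene_request(jpeg_bytes: bytes, question: str) -> dict:
    """Bundle one camera frame and the spoken question into a single payload."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return {
        "instruction": SYSTEM_HINT,
        "question": question,
        "image_mime_type": "image/jpeg",
        # Base64 keeps the binary frame safe inside a JSON body.
        "image_base64": base64.b64encode(jpeg_bytes).decode("ascii"),
    }
```

Sending the question together with the frame it refers to means the answer describes what the camera saw at the moment the user asked.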

How We Built It

Our system uses a FastAPI backend with several AI components working together:

OpenCV for video processing
YOLOv8 for local hazard detection
Vosk for offline wake-word detection
Gemini 2.5 Flash for scene understanding and reasoning
ElevenLabs for voice output

Hazard detection runs locally, so warnings never wait on a network round trip. Cloud-based reasoning is used only when the user asks a question, which keeps response times low while still allowing detailed descriptions.
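
One way to picture the split between local detection and cloud reasoning is a per-frame handler like the sketch below. The function name and all of its parameters (`detect_hazards`, `speak`, `ask_cloud`, `pending_question`) are hypothetical stand-ins for the real components, injected as plain callables so the control flow is easy to see.

```python
def process_frame(frame, detect_hazards, speak, pending_question=None, ask_cloud=None):
    """Handle one frame: local hazards first, cloud reasoning only on request."""
    # Local path: runs for every frame and does not depend on the network.
    for warning in detect_hazards(frame):
        speak(warning)
    # Cloud path: only taken when the user has asked a question.
    if pending_question and ask_cloud is not None:
        speak(ask_cloud(frame, pending_question))
```

Because warnings are spoken before the cloud call is made, a slow response to a question does not delay a hazard alert for the same frame.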

Biggest Challenge

One of the hardest problems we worked on was accurately determining the user's position inside a building.

Unlike outdoor navigation, indoor environments don't have reliable GPS data. We explored several approaches:

Using Gemini to reason about the user's location from camera images
Estimating location using Wi-Fi signal strengths and access points
Creating a coordinate-based system using landmarks and room features

While each approach showed promise, consistent indoor positioning accuracy proved much harder to achieve than expected. It remains one of the most interesting technical challenges of the project and a major area for future development.
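
To make the Wi-Fi idea concrete, the sketch below uses the standard log-distance path-loss model to turn a signal strength into a rough distance, then takes a weighted centroid of known access points. This is not necessarily the approach the team tried: the reference RSSI of -40 dBm at 1 m, the path-loss exponent of 3.0, and the access point coordinates are all assumed values that would need calibration in a real building.

```python
def rssi_to_distance(rssi_dbm, rssi_at_1m=-40.0, path_loss_exponent=3.0):
    """Estimate distance in metres from RSSI using the log-distance model."""
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10 * path_loss_exponent))


def weighted_centroid(readings, ap_positions):
    """Estimate (x, y) as a centroid of known access points, weighted by 1/distance.

    readings: dict of access point id -> RSSI in dBm
    ap_positions: dict of access point id -> (x, y) in metres (assumed known)
    """
    total = x_sum = y_sum = 0.0
    for ap_id, rssi in readings.items():
        if ap_id not in ap_positions:
            continue  # ignore access points with no surveyed position
        # Closer access points get more weight; clamp to avoid huge weights.
        weight = 1.0 / max(rssi_to_distance(rssi), 0.1)
        x, y = ap_positions[ap_id]
        x_sum += weight * x
        y_sum += weight * y
        total += weight
    if total == 0:
        return None
    return (x_sum / total, y_sum / total)
```

Indoor signal strength fluctuates with walls, people, and device orientation, so an estimate like this is coarse at best, which is consistent with the difficulty described above.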

What We Learned

We learned quickly that object detection alone is not enough.

Saying:

"Chair. Table. Door."

is far less useful than:

"There is a table directly ahead and a doorway slightly to your left."

Providing context, spatial relationships, and actionable information makes a much bigger difference for accessibility.
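
A simple version of this idea can be sketched by turning each detected object's horizontal position in the frame into a direction phrase. The helper names and the 0.1 and 0.3 offset thresholds below are hypothetical; in EchoGuide itself the contextual description comes from Gemini, so this only illustrates the difference between a label list and a spatial sentence.

```python
def horizontal_phrase(x_center, frame_width):
    """Map a box's horizontal centre to a direction phrase relative to the user."""
    offset = (x_center / frame_width) - 0.5  # -0.5 is far left, +0.5 is far right
    if abs(offset) < 0.1:
        return "directly ahead"
    side = "left" if offset < 0 else "right"
    degree = "slightly to your" if abs(offset) < 0.3 else "to your"
    return f"{degree} {side}"


def describe(objects, frame_width):
    """Build one sentence from (label, x_center) pairs."""
    parts = [f"a {label} {horizontal_phrase(x, frame_width)}" for label, x in objects]
    if not parts:
        return "Nothing detected."
    if len(parts) == 1:
        return f"There is {parts[0]}."
    return "There is " + ", ".join(parts[:-1]) + " and " + parts[-1] + "."
```

With a 640-pixel-wide frame, `describe([("table", 320), ("doorway", 200)], 640)` produces "There is a table directly ahead and a doorway slightly to your left." The sketch always uses the article "a" and ignores distance, both of which a fuller version would need to handle.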

We also gained experience building real-time AI systems that combine computer vision, speech recognition, and cloud-based reasoning.

What's Next

Future improvements we would like to explore include:

Smart glasses and wearable camera integration
More accurate indoor positioning and navigation
Spatial audio cues for hazards and directions
Greater on-device processing for offline use
Navigation assistance to specific rooms, exits, or landmarks

EchoGuide is currently a proof of concept, but it demonstrates how combining computer vision, speech, and multimodal AI can create a more accessible way for people to interact with the world around them.