Inspiration

Over 285 million people worldwide live with vision impairment, and 39 million of them are blind. Every single day, they face obstacles most of us never notice: a misplaced chair, a stairwell with no railing, a silent car approaching from the side. Our goal was to build a solution that delivers real-time, actionable guidance, translating vision into safe, clear next steps. By combining the phone’s camera, depth sensing, and AI voice interaction, LookOut helps people move safely and with confidence, empowering Accessibility for ALL.

What it does

LookOut listens to the user’s voice input, scans the surroundings through the phone camera, and uses an AI model to detect obstacles in real time. It then responds with short, clear directions on where to go, using depth and clock-based cues to describe obstacles and safe paths. The system is designed for low latency so guidance feels instant and natural, giving blind and low-vision users the confidence to move safely.
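As a concrete illustration of the clock-based cues (the exact mapping inside the app may differ), a horizontal bearing from the camera's forward axis can be turned into a clock position like this; the function name and angle convention here are illustrative:

```python
def clock_direction(bearing_deg: float) -> str:
    """Convert a horizontal bearing into a clock-face cue.

    0 degrees is straight ahead (12 o'clock); positive bearings
    are to the user's right. Each clock hour spans 30 degrees.
    """
    hour = round(bearing_deg / 30) % 12
    return f"{12 if hour == 0 else hour} o'clock"

# Example cues the agent might speak:
#   clock_direction(0)   -> "12 o'clock"  (straight ahead)
#   clock_direction(45)  -> "2 o'clock"   (ahead and to the right)
#   clock_direction(-30) -> "11 o'clock"  (slightly left)
print(f"Chair at {clock_direction(45)}, about two steps ahead.")
```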

How we built it

We chose LiveKit for low latency and first implemented and tested our setup in the Agents Playground. From there, we built a Python agent on LiveKit Agents that receives the video stream, runs reasoning with the Gemini Live API, and sends speech back within the same session. We also used LiveKit’s data channel for basic controls, such as starting and stopping listening and switching between the front and rear cameras.
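A minimal sketch of that agent is below. The module paths, realtime model name, and data-channel message format are assumptions based on the public livekit-agents and livekit-plugins-google packages, not our exact production code:

```python
from livekit import agents, rtc
from livekit.agents import Agent, AgentSession, RoomInputOptions
from livekit.plugins import google

SYSTEM_PROMPT = (
    "You are a navigation assistant for a blind pedestrian. Speak in "
    "short, precise phrases using depth and clock-based cues."
)

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    # Basic controls arrive over LiveKit's data channel as small text
    # payloads -- the command strings here are assumed, not prescribed.
    @ctx.room.on("data_received")
    def on_data(packet: rtc.DataPacket):
        command = packet.data.decode()
        if command == "stop_listening":
            ...  # pause guidance
        elif command == "switch_camera":
            ...  # client swaps its published front/rear video track

    # Gemini Live handles speech in and speech out in one session.
    session = AgentSession(
        llm=google.beta.realtime.RealtimeModel(
            model="gemini-2.0-flash-exp",
            voice="Puck",
        ),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions=SYSTEM_PROMPT),
        # Feed the user's camera track to the model for obstacle detection.
        room_input_options=RoomInputOptions(video_enabled=True),
    )

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```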

Challenges we ran into

At first we built a setup that used Omi hardware for voice and a separate camera integration on the web side using Next.js, React, and TypeScript, but connecting the two with low latency proved difficult and often left the user waiting. When we switched fully to LiveKit, the system became faster, more stable, and far more reliable. One of our hardest challenges was keeping the camera feed and spoken responses in sync, especially on weak networks. After carefully tuning the audio and video pipeline, we finally achieved a smooth, natural flow that felt responsive instead of robotic. Just as critical, we realized that long or cluttered instructions could overwhelm the user: the model had to speak in short, precise phrases that guided the way without confusion. Through this process we also learned how much refining the system prompt itself improved accuracy. Once we taught the model to focus on depth and clock-based directions, it began describing obstacles and safe paths in a way that blind and low-vision users could trust instantly.
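For flavor, the prompt rules we converged on looked roughly like the condensed sketch below (wording illustrative, not the verbatim production prompt):

```python
# Illustrative system prompt (condensed; the production prompt differs).
GUIDANCE_PROMPT = """You guide a blind pedestrian using the live camera feed.
- Answer in one short sentence.
- Lead with the nearest obstacle: clock position, then distance in steps.
- Example: "Chair at 2 o'clock, two steps ahead."
- If the way forward is clear, say: "Path clear ahead."
- Never describe scenery or colors unless the user asks.
"""
```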

Accomplishments that we're proud of

We are proud that we turned an ambitious idea into a working prototype within the hackathon. Our biggest accomplishment was achieving real-time guidance with low latency, where the camera feed, AI reasoning, and spoken responses stayed in sync. We successfully integrated LiveKit with the Gemini Live API and refined the system prompt so the model could deliver short, accurate, and natural instructions instead of overwhelming users. Another accomplishment was designing the interaction to feel intuitive: the app listens to the user, scans the environment, and replies instantly with depth and clock-based cues. Most importantly, we built something that has the potential to replace hesitation with confidence for blind and low-vision users!

What we learned

We learned how challenging and important it is to keep audio, video, and model responses perfectly in sync. At first, small delays made the interaction feel unreliable, but through testing and fine-tuning we discovered how to optimize LiveKit for low latency, so guidance feels instant and natural. We also learned how much the quality of the system prompt affects the model’s performance. By refining the prompt to focus on short, accurate, and context-aware directions, we improved both the clarity and reliability of the responses. Together, these lessons taught us not only technical skills in streaming and real-time AI but also how critical it is to design for user trust and confidence.

What's next for LookOut: AI Guidance for Blind Navigation

The next step for LookOut is integrating maps and destination-based navigation. Right now, the system guides users safely through their immediate surroundings, but we envision combining that with turn-by-turn directions so a user can choose a destination and receive both obstacle avoidance and route guidance in one experience. This will allow LookOut to not only prevent collisions but also help blind and low-vision users reach where they need to go with independence and confidence.

Video: https://www.facebook.com/61582589294855/videos/1139565257807161/

Built With

gemini, livekit, next.js, python, react, typescript
