Inspiration

Safe navigation is not guaranteed for blind individuals.

Globally, over 2.2 billion people live with vision impairment, and visually impaired individuals face significantly higher rates of serious accidents and injuries during everyday navigation.

Traditional mobility tools like canes and guide dogs are essential — but they cannot describe complex scenes, detect all obstacles, or provide contextual safety warnings.

We asked a simple question:

What if AI could act as a second pair of eyes?

SightLine was built to enhance independence and safety by turning wearable smart glasses into a real-time AI vision assistant.

What it does

SightLine provides blind users with real-time spoken descriptions of their surroundings using AI-powered scene understanding.

Using Meta AI glasses and a live video pipeline, the system:
• Captures visual frames from the environment
• Analyzes them with multimodal AI models
• Prioritizes safety-relevant information
• Speaks concise, actionable descriptions through the glasses
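That loop can be sketched in a few lines of Python. Every name below is an illustrative stub, not SightLine's actual code; the keyword-based `prioritize` is just one simple way to put hazards before scenery.

```python
# Illustrative capture -> analyze -> prioritize -> speak loop.
# All functions are hypothetical stand-ins, not SightLine's real API.

def capture_frame() -> bytes:
    """Stand-in for a frame grabbed from the glasses' video feed."""
    return b"<jpeg bytes>"

def analyze(frame: bytes) -> str:
    """Stand-in for the multimodal model's raw scene description."""
    return "A hallway. Warning: low-hanging sign ahead. Two people chatting."

def prioritize(description: str) -> str:
    """Keep safety-relevant sentences and drop background detail."""
    sentences = [s.strip() for s in description.split(".") if s.strip()]
    hazards = [s for s in sentences
               if any(k in s.lower() for k in ("warning", "ahead", "obstacle"))]
    return ". ".join(hazards or sentences[:1]) + "."

def speak(text: str) -> str:
    """Stand-in for text-to-speech output through the glasses."""
    return text

utterance = speak(prioritize(analyze(capture_frame())))
```

With the stub description above, the spoken output keeps the hazard ("Warning: low-hanging sign ahead.") and drops the chatting bystanders, which is the clarity-over-noise behavior described below.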

SightLine operates in two modes:

Auto Mode – Continuously analyzes the environment and provides proactive descriptions.

Call Mode – Works alongside a live call session, enabling remote assistance while still running AI scene analysis.

The goal is clarity, not noise — actionable awareness without cognitive overload.

How we built it

SightLine is built on a real-time AI processing pipeline.

Frontend
• A browser dashboard captures frames from a shared video-call window
• Frames are sent to the backend every few seconds
• Live logs and controls are updated over WebSockets

Backend
• Built with FastAPI for asynchronous, low-latency processing
• Handles frame decoding, inference routing, and audio generation
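The frame-decoding step can be sketched as a plain function the FastAPI endpoint would call. The JSON shape with a base64-encoded `frame` field is an assumption for illustration, not the project's documented wire format:

```python
import base64

def decode_frame(payload: dict) -> bytes:
    """Decode one dashboard message into raw image bytes.

    Assumes the frontend posts JSON like {"frame": "<base64 JPEG>"};
    the real wire format may differ.
    """
    return base64.b64decode(payload["frame"])

# Round trip: encode fake image bytes the way the dashboard would send them.
raw = b"\xff\xd8fake-jpeg\xff\xd9"
message = {"frame": base64.b64encode(raw).decode("ascii")}
decoded = decode_frame(message)
```

Keeping decoding a pure function like this makes it easy to test apart from the web framework, and the async endpoint only has to await the inference and audio steps.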

Vision Inference
• GPU-accelerated multimodal AI models analyze images
• AMD-hosted LLaVA endpoint for fast image-to-text reasoning
• Gemini 2.0 Flash as a fallback inference model
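The primary/fallback routing amounts to trying backends in order until one succeeds. A minimal sketch, with stubs simulating a LLaVA outage so the fallback path is exercised (the stub behavior and error handling are illustrative, not the project's exact code):

```python
def describe(frame: bytes, backends) -> tuple:
    """Try each (name, infer) backend in order; return the first success."""
    last_err = None
    for name, infer in backends:
        try:
            return name, infer(frame)
        except Exception as err:  # real code would narrow this to network errors
            last_err = err
    raise RuntimeError("all vision backends failed") from last_err

# Stubs: the primary endpoint times out, the fallback answers.
def llava_stub(frame: bytes) -> str:
    raise TimeoutError("LLaVA endpoint unreachable")

def gemini_stub(frame: bytes) -> str:
    return "Clear sidewalk, bicycle approaching from the left."

backend, text = describe(b"<frame>", [("llava", llava_stub), ("gemini", gemini_stub)])
```

Ordering the list by preference keeps the fast GPU endpoint primary while the fallback only pays its latency cost when the primary is down.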

Text-to-Speech
• ElevenLabs for natural, high-quality speech
• Local TTS fallback for reliability

The entire system runs in near real time, converting visual scenes into spoken safety guidance.

Challenges we ran into
• Reducing latency between frame capture and spoken output
• Designing prompts that prioritize safety over general description
• Keeping audio output from overwhelming the user
• Managing remote GPU inference stability
• Working within wearable hardware constraints

Balancing speed, accuracy, and usability was our biggest technical hurdle.
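One way to curb audio overload is a simple cooldown-plus-dedup gate: a description is spoken only if enough time has passed since the last utterance and it differs from what was just said. This is a sketch of the idea, not SightLine's exact logic:

```python
class SpeechGate:
    """Suppress repeats and enforce a minimum gap between utterances."""

    def __init__(self, cooldown_s: float = 5.0):
        self.cooldown_s = cooldown_s
        self.last_text = None
        self.last_time = float("-inf")

    def should_speak(self, text: str, now: float) -> bool:
        if text == self.last_text:
            return False  # identical to the last utterance: stay quiet
        if now - self.last_time < self.cooldown_s:
            return False  # still inside the cooldown window
        self.last_text, self.last_time = text, now
        return True

gate = SpeechGate(cooldown_s=5.0)
gate.should_speak("Door ahead", now=0.0)       # True: first utterance passes
gate.should_speak("Stairs left", now=2.0)      # False: too soon after the last one
gate.should_speak("Stairs left", now=6.0)      # True: cooldown has elapsed
gate.should_speak("Stairs left", now=20.0)     # False: duplicate suppressed
```

A gate like this trades a little recency for a lot of calm: urgent changes still get through after the cooldown, while a static scene stays silent.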

Accomplishments that we’re proud of
• Building a fully functional real-time AI vision assistant in 24 hours
• Achieving stable end-to-end inference and speech output
• Designing an accessibility-first experience
• Integrating GPU-accelerated vision models with wearable hardware
• Creating a system that feels practical, not theoretical

SightLine works live — not just as a concept.

What we learned
• Accessibility solutions require restraint — less information is often better
• Real-time AI requires thoughtful system design, not just powerful models
• Latency and user experience matter as much as accuracy
• Designing for blind users forces clarity in interaction design

Most importantly, we learned that AI should augment human independence — not replace it.

What’s next for SightLine

We plan to expand SightLine to include:
• Obstacle proximity detection
• Head-height hazard alerts
• Stair and curb detection
• Traffic signal recognition
• Crosswalk awareness
• Fully on-device inference for wearable independence

Our long-term goal is to create an affordable, scalable spatial awareness assistant that enhances safety and independence for blind individuals worldwide.

Built With

  • amd
  • elevenlabs