Inspiration

We wanted a fast, reliable way for blind and low-vision users to understand their surroundings in real time. Our goal: turn what the camera sees into timely, actionable audio guidance with minimal latency.

What it does

Computer vision detects objects in the camera feed, estimates whether each one is to the user's left, center, or right, and computes a risk score. The backend returns a concise text alert or scene summary, which ElevenLabs speaks aloud.
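
A minimal sketch of how the left/center/right cue and a risk score can be derived from a detection's bounding box; the function names, the thirds-based split, and the size-times-confidence weighting are illustrative assumptions rather than MIRA's exact formula.

```python
def horizontal_position(x1: float, x2: float, frame_width: float) -> str:
    """Map a detection's bounding box to a coarse left / center / right cue."""
    center_x = (x1 + x2) / 2.0
    if center_x < frame_width / 3:
        return "left"
    if center_x > 2 * frame_width / 3:
        return "right"
    return "center"


def risk_score(box_area: float, frame_area: float, confidence: float,
               class_weight: float = 1.0) -> float:
    """Score grows with apparent size (a proxy for proximity) and detector confidence."""
    proximity = box_area / frame_area  # a larger box usually means a closer object
    return class_weight * confidence * proximity
```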

How we built it

A React front-end streams camera frames to a FastAPI backend, which runs YOLOv8 via PyTorch for real-time object detection and feeds the detections into a custom risk-scoring pipeline. The backend returns text and ElevenLabs converts it to speech; the Gemini API analyzes user snapshots.
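
A minimal sketch of the detection endpoint, assuming the standard ultralytics YOLOv8 interface; the route name, model file, and response shape are illustrative, not our exact code.

```python
import io

from fastapi import FastAPI, File, UploadFile
from PIL import Image
from ultralytics import YOLO

app = FastAPI()
model = YOLO("yolov8n.pt")  # the nano model keeps per-frame latency low


@app.post("/detect")
async def detect(frame: UploadFile = File(...)):
    image = Image.open(io.BytesIO(await frame.read())).convert("RGB")
    result = model(image, verbose=False)[0]  # one frame in, one Results object out
    detections = []
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        detections.append({
            "label": model.names[int(box.cls)],
            "confidence": round(float(box.conf), 3),
            "bbox": [x1, y1, x2, y2],
        })
    return {"detections": detections}
```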

Challenges we ran into

Stabilizing real-time inference and keeping latency low

Tuning confidence thresholds to cut false positives while preserving safety (see the sketch after this list)

Managing noisy camera input and variable lighting
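
A sketch of the kind of per-class confidence thresholding and short persistence check this tuning involved; the class names, threshold values, and frame count below are placeholders, not the final tuned numbers.

```python
from collections import defaultdict, deque

# Placeholder per-class confidence thresholds and persistence window.
CONF_THRESHOLDS = {"car": 0.5, "bicycle": 0.45, "person": 0.4}
DEFAULT_THRESHOLD = 0.6
PERSIST_FRAMES = 3  # a class must clear its threshold in 3 consecutive frames

_recent = defaultdict(lambda: deque(maxlen=PERSIST_FRAMES))


def should_alert(label: str, confidence: float) -> bool:
    """Suppress single-frame flickers from noisy input before speaking an alert."""
    hit = confidence >= CONF_THRESHOLDS.get(label, DEFAULT_THRESHOLD)
    _recent[label].append(hit)
    return len(_recent[label]) == PERSIST_FRAMES and all(_recent[label])
```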

Accomplishments that we're proud of

Leveraging a queue data structure to handle the live input stream (see the sketch after this list)

End-to-end low-latency loop from camera to spoken alert

Clear positional cues like “car on your left” in real time

Modular design that’s easy to extend with tracking or depth
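
A sketch of the queue-based hand-off between the stream reader and the detector loop: a bounded queue drops stale frames so inference always runs on the newest one. The thread roles, function names, and the capture object (e.g. a cv2.VideoCapture) are illustrative.

```python
import queue

frames: queue.Queue = queue.Queue(maxsize=1)  # hold only the newest frame


def reader(capture) -> None:
    """Producer: push the latest camera frame, discarding a stale one if present."""
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        try:
            frames.put_nowait(frame)
        except queue.Full:
            try:
                frames.get_nowait()  # drop the frame the detector never got to
            except queue.Empty:
                pass
            frames.put_nowait(frame)


def detector(run_inference) -> None:
    """Consumer: block until a frame arrives, then run detection on it."""
    while True:
        run_inference(frames.get())
```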

What we learned

Balancing model speed and accuracy, building robust streaming pipelines, and designing voice output that’s brief, clear, and helpful under pressure.

What’s next for MIRA

Build a mobile app for MIRA with React Native, run on a Raspberry Pi for edge use, add lightweight tracking and depth cues, and refine haptic or spatial-audio feedback for even faster reactions.

Built With

React, FastAPI, PyTorch, YOLOv8, ElevenLabs, Gemini API
