Inspiration
We wanted a fast, reliable way for blind and low-vision users to understand their surroundings in real time. Our goal: turn sight into timely, actionable audio guidance with minimal latency.
What it does
Computer vision detects objects, estimates whether each one is to the left, center, or right, and computes a per-object risk score. The backend returns a concise text alert or scene summary, which ElevenLabs speaks aloud.
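The positional and risk logic described above can be sketched roughly as follows. This is an illustrative assumption of how it might work, not the project's actual code: the thirds-based position split, the center-weighting factor, and the box format (x1, y1, x2, y2 in pixels) are all assumptions.

```python
def horizontal_position(box, frame_width):
    """Classify a detection as left, center, or right by its box midpoint."""
    cx = (box[0] + box[2]) / 2
    if cx < frame_width / 3:
        return "left"
    if cx < 2 * frame_width / 3:
        return "center"
    return "right"


def risk_score(box, frame_width, frame_height, confidence):
    """Heuristic: larger (closer) objects near the center score higher risk."""
    area_frac = ((box[2] - box[0]) * (box[3] - box[1])) / (frame_width * frame_height)
    center_bonus = 1.5 if horizontal_position(box, frame_width) == "center" else 1.0
    return confidence * area_frac * center_bonus


box = (100, 200, 300, 480)            # a detection low in the left of a 640x480 frame
print(horizontal_position(box, 640))  # → left
```

The appeal of a simple weighted score like this is that it needs no extra model: box area stands in for proximity, and the center bonus prioritizes obstacles in the walking path.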
How we built it
A React front-end streams camera frames to a FastAPI backend, which runs YOLOv8 (via PyTorch) for real-time object detection plus a custom risk-scoring pipeline. The backend returns text, and ElevenLabs converts it to speech; the Gemini API analyzes on-demand user snapshots.
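The "backend returns text" step might look something like the sketch below, which turns the detection stage's output into one concise spoken phrase. The tuple shape, class labels, and risk threshold are hypothetical assumptions for illustration.

```python
def compose_alert(detections, risk_threshold=0.3):
    """detections: (label, position, risk) tuples from the vision stage.

    Returns a short alert string for the riskiest detection above the
    threshold, or None if nothing is worth speaking. Threshold is assumed.
    """
    risky = [d for d in detections if d[2] >= risk_threshold]
    if not risky:
        return None
    label, position, _ = max(risky, key=lambda d: d[2])
    if position == "center":
        return f"{label} ahead"
    return f"{label} on your {position}"


print(compose_alert([("car", "left", 0.8), ("person", "right", 0.2)]))
# → car on your left
```

Speaking only the single riskiest detection keeps the audio channel free; reading out every box would overwhelm the listener.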
Challenges we ran into
Stabilizing real-time inference and keeping latency low
Tuning thresholds to cut false positives while preserving safety
Managing noisy camera input and variable lighting
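One common way to trade false positives against reaction time, consistent with the threshold tuning mentioned above, is to require a detection to persist for several consecutive frames before alerting. This is a minimal sketch under that assumption; the class name and the three-frame window are illustrative, not the project's tuned values.

```python
from collections import defaultdict


class AlertDebouncer:
    """Suppress one-frame flickers by demanding N consecutive sightings."""

    def __init__(self, frames_required=3):
        self.frames_required = frames_required
        self.streaks = defaultdict(int)  # consecutive frames each label was seen

    def update(self, labels_this_frame):
        """Feed the labels detected in one frame; return labels stable enough to alert on."""
        for label in list(self.streaks):
            if label not in labels_this_frame:
                self.streaks[label] = 0  # streak broken: reset
        for label in labels_this_frame:
            self.streaks[label] += 1
        return {l for l, n in self.streaks.items() if n >= self.frames_required}


d = AlertDebouncer(frames_required=3)
d.update({"car"})         # seen once: too new to trust
d.update({"car"})         # seen twice
print(d.update({"car"}))  # third consecutive frame → {'car'}
```

A larger window cuts more noise but delays genuine alerts, which is exactly the safety tension described above.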
Accomplishments that we're proud of
Using a queue data structure to handle the live input stream
End-to-end low-latency loop from camera to spoken alert
Clear positional cues like “car on your left” in real time
Modular design that’s easy to extend with tracking or depth
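The queue mentioned in the first accomplishment could be as simple as a bounded deque that silently evicts the oldest frame whenever inference falls behind the camera, so alerts are always based on recent input. The buffer size here is an assumption for illustration.

```python
from collections import deque

frame_queue = deque(maxlen=2)  # small buffer keeps alerts current

# Producer side: the camera loop pushes frames; with maxlen set,
# the oldest frame is dropped automatically when the buffer is full.
for frame_id in range(5):
    frame_queue.append(frame_id)

# Consumer side: the inference loop pops the oldest surviving frame.
latest = frame_queue.popleft()
print(latest, list(frame_queue))  # only the most recent frames survived
```

Dropping stale frames instead of queuing them unboundedly is what keeps end-to-end latency low: a spoken warning about where a car was two seconds ago is worse than useless.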
What we learned
Balancing model speed and accuracy, building robust streaming pipelines, and designing voice output that’s brief, clear, and helpful under pressure.
What’s next for MIRA
Build a React Native mobile app for MIRA, run it on a Raspberry Pi for edge use, add lightweight tracking and depth cues, and refine haptic or spatial-audio feedback for even faster reactions.