Helios
Inspiration
Real-time vision-to-audio is a fascinating technical challenge that's been developing rapidly. We wanted to push the boundaries of what's possible: not just describing scenes, but actively guiding someone through space in real time. Traditional blind-assistance tools either detect obstacles at arm's length (canes) or cost $50K+ (guide dogs). We asked: what if your phone could be an intelligent guide that sees ahead and speaks naturally?
What it does
Helios is a mobile app that gives blind users real-time spatial awareness through their phone's camera. Point the phone forward, put in earbuds, and walk:
- Proactive navigation: "Chair left, keep right" - warns before you hit obstacles
- Conversational AI: Say "Helios, where's the door?" and get natural answers
- Motion-aware: Only speaks when you're moving, stays quiet when you stop
- Spatial memory: Remembers objects seen in the last 30 seconds for contextual answers (a minimal sketch follows this list)
- Facial recognition: Remembers faces of people you interact with - "Helios, this is Ben"
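The spatial memory is, at heart, a time-bounded buffer of recent detections. A minimal sketch of the idea in Python (the class and field names are illustrative assumptions, not our actual implementation):

```python
# Hypothetical sketch of the 30-second spatial memory: a rolling buffer of
# recent detections that conversational queries can be answered from.
import time
from collections import deque
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # e.g. "chair"
    direction: str      # e.g. "left"
    distance_ft: float
    seen_at: float      # Unix timestamp when the detector saw it

class SpatialMemory:
    """Rolling buffer of detections, pruned to the last `ttl` seconds."""

    def __init__(self, ttl: float = 30.0):
        self.ttl = ttl
        self.buffer: deque[Detection] = deque()

    def add(self, det: Detection) -> None:
        self.buffer.append(det)
        self._prune()

    def recall(self, label: str) -> Detection | None:
        # Most recent sighting wins, so "where's the door?" gets fresh data.
        self._prune()
        for det in reversed(self.buffer):
            if det.label == label:
                return det
        return None

    def _prune(self) -> None:
        cutoff = time.time() - self.ttl
        while self.buffer and self.buffer[0].seen_at < cutoff:
            self.buffer.popleft()
```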
How we built it
- YOLO11x for real-time object detection with distance estimation using camera calibration
- Google Gemini 3.0 for natural language guidance and conversational Q&A
- Custom heuristics engine that decides when to speak, tiered by urgency (emergency → alert → guidance → info); the tiering is sketched below
- Dual-pipeline architecture: the vision pipeline runs continuously at 1 FPS; the conversation pipeline activates on the wake word
- React Native + Expo for the iOS app with native camera access
- Python/FastAPI backend with Socket.IO for real-time frame streaming (sketched after this list)
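To make the dual-pipeline idea concrete, here is a condensed sketch of the vision side: the phone streams JPEG frames over Socket.IO at ~1 FPS, and the server runs YOLO11x on each one. The event names ("frame", "guidance") and payload shapes are illustrative assumptions, not our exact protocol:

```python
# Sketch of the vision pipeline: python-socketio mounted alongside FastAPI,
# with Ultralytics YOLO11x run on every incoming frame.
import base64

import cv2
import numpy as np
import socketio
from fastapi import FastAPI
from ultralytics import YOLO

model = YOLO("yolo11x.pt")  # downloads pretrained weights on first use

sio = socketio.AsyncServer(async_mode="asgi", cors_allowed_origins="*")
fastapi_app = FastAPI()
app = socketio.ASGIApp(sio, other_asgi_app=fastapi_app)  # serve with uvicorn

@sio.event
async def frame(sid, data):
    # Decode the base64-encoded JPEG the phone sent with this event.
    raw = base64.b64decode(data["jpeg"])
    img = cv2.imdecode(np.frombuffer(raw, np.uint8), cv2.IMREAD_COLOR)

    detections = []
    for box in model(img, verbose=False)[0].boxes:
        x1, _, x2, _ = box.xyxy[0].tolist()
        cx = (x1 + x2) / 2  # bbox center drives a coarse direction word
        detections.append({
            "label": model.names[int(box.cls)],
            "direction": ("left", "ahead", "right")[min(2, int(3 * cx / img.shape[1]))],
            "confidence": float(box.conf),
        })

    # The heuristics engine (next sketch) decides whether any of this is
    # worth saying before guidance is pushed back to the phone.
    await sio.emit("guidance", {"detections": detections}, to=sid)
```

Running detection inside the event handler is fine at 1 FPS; anything faster would want a worker queue so slow frames don't back up the socket.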
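And a sketch of the urgency tiering the heuristics engine applies on top of those detections. The thresholds and rate limits here are illustrative guesses, not the values we shipped:

```python
# Sketch of the urgency tiers (emergency -> alert -> guidance -> info) and
# the "when to speak" gate. All numbers are assumptions for illustration.
from enum import IntEnum

class Urgency(IntEnum):
    INFO = 0       # background context; almost always stays silent
    GUIDANCE = 1   # "chair left, keep right"
    ALERT = 2      # obstacle close and in the walking path
    EMERGENCY = 3  # imminent collision; always interrupts

def classify(distance_ft: float, direction: str) -> Urgency:
    in_path = direction == "ahead"
    if in_path and distance_ft < 3:
        return Urgency.EMERGENCY
    if in_path and distance_ft < 8:
        return Urgency.ALERT
    if distance_ft < 8:
        return Urgency.GUIDANCE
    return Urgency.INFO

def should_speak(urgency: Urgency, is_moving: bool, secs_since_last: float) -> bool:
    """Speak only when it matters: rate-limit low tiers, never mute emergencies."""
    if urgency is Urgency.EMERGENCY:
        return True
    if not is_moving:  # stay quiet while the user stands still
        return False
    min_gap = {Urgency.ALERT: 2.0, Urgency.GUIDANCE: 5.0, Urgency.INFO: 15.0}
    return secs_since_last >= min_gap[urgency]
```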
Challenges we ran into
- Latency vs. accuracy tradeoff: Gemini adds ~1-2 seconds, but provides natural guidance. We built a heuristics layer to minimize unnecessary API calls.
- Spatial data representation: Translating bounding boxes into useful directions ("chair left, 4 feet") required custom distance estimation using iPhone camera intrinsics (see the sketch after this list).
- Motion detection: Preventing the app from talking constantly while the user stands still; solved with accelerometer/pedometer integration (also sketched below).
- Coordinating rapid development: 4 people, 24 hours, constantly changing architecture. Git conflicts were real.
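The distance estimate itself is the classic pinhole-camera relation: an object of known real-world size shrinks in the image in proportion to its distance. A sketch with illustrative, uncalibrated iPhone-like intrinsics (the real app uses calibrated values and more object priors):

```python
# Pinhole-model distance estimate. Numbers below are illustrative, not the
# calibrated intrinsics our app uses.
KNOWN_HEIGHTS_M = {"person": 1.7, "chair": 0.9, "door": 2.0}  # rough priors

def focal_length_px(focal_mm: float, sensor_height_mm: float, image_height_px: int) -> float:
    # Similar triangles: pixels per mm on the sensor, times the focal length.
    return focal_mm * image_height_px / sensor_height_mm

def distance_m(label: str, bbox_height_px: float, f_px: float) -> float | None:
    """distance = real_height * focal_length_px / apparent_height_px"""
    real_h = KNOWN_HEIGHTS_M.get(label)
    if real_h is None or bbox_height_px <= 0:
        return None
    return real_h * f_px / bbox_height_px

# Assumed iPhone-like numbers: 4.25mm physical focal length, ~5.6mm sensor
# height, 1920px-tall frames. A 480px-tall chair then sits ~2.7m (~9ft) away.
f_px = focal_length_px(focal_mm=4.25, sensor_height_mm=5.6, image_height_px=1920)
print(distance_m("chair", bbox_height_px=480, f_px=f_px))  # ~2.73
```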
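The motion gate is simple in spirit. On-device it uses Expo's accelerometer/pedometer APIs, but the core test translates to a few lines; the variance threshold below is an illustrative guess:

```python
# Walking makes the acceleration magnitude oscillate around 1g; standing
# still keeps it nearly constant. Gate speech on that variance.
from statistics import pvariance

def is_moving(accel_magnitudes: list[float], threshold: float = 0.03) -> bool:
    """`accel_magnitudes`: ~1 second of |acceleration| samples, in g."""
    return len(accel_magnitudes) >= 10 and pvariance(accel_magnitudes) > threshold
```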
Accomplishments that we're proud of
- Object detection reliably identifies obstacles at varying distances
- The heuristics engine dramatically reduces alert fatigue: it speaks only when it matters
- Wake word + conversational AI feels genuinely useful ("Hey Helios, is there a chair nearby?")
- Built something we'd actually want to use
What we learned
- Real-time AI assistance is hard; latency is the enemy
- Heuristics matter as much as ML models. Knowing when to speak is half the problem.
- The iPhone camera calibration math (focal length, sensor size) that makes distance estimation possible
- The importance of failing fast and pivoting (we scrapped 2 approaches before landing on the current architecture)
What's next for Helios
- On-device YOLO: Eliminate network latency entirely using Core ML
- Template-based fast alerts: Pre-computed phrases for sub-100ms obstacle warnings (sketched after this list)
- Indoor mapping: Remember layouts of frequently visited places
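The template-based alerts would look something like the sketch below: render every phrase (and eventually its TTS audio) ahead of time so the hot path is a dictionary lookup. The labels and distance buckets are hypothetical:

```python
# Pre-computed alert phrases keyed by (label, direction, whole-foot bucket),
# so obstacle warnings never wait on a model call.
def render(label: str, direction: str, feet: int) -> str:
    unit = "foot" if feet == 1 else "feet"
    return f"{label} {direction}, {feet} {unit}"

PHRASES = {
    (label, direction, feet): render(label, direction, feet)
    for label in ("chair", "person", "door", "stairs")
    for direction in ("left", "ahead", "right")
    for feet in range(1, 16)
}

def bucket(distance_ft: float) -> int:
    return max(1, min(15, round(distance_ft)))  # snap to a cached bucket

def fast_alert(label: str, direction: str, distance_ft: float) -> str | None:
    # Pure dictionary lookup on the warning path.
    return PHRASES.get((label, direction, bucket(distance_ft)))
```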
Built With
- chromadb
- expo.io
- fastapi
- gemini
- react-native
- socket.io
- yolo11