Helios

Inspiration

Real-time vision-to-audio is a fascinating technical challenge that has been developing rapidly. We wanted to push the boundaries of what's possible: not just describing scenes, but actually guiding someone through space in real time. Traditional blind-assistance tools either detect obstacles at arm's length (canes) or cost $50K+ (guide dogs). We asked: what if your phone could be an intelligent guide that sees ahead and speaks naturally?

What it does

Helios is a mobile app that gives blind users real-time spatial awareness through their phone's camera. Point the phone forward, put in earbuds, and walk:

  • Proactive navigation: "Chair left, keep right" - warns before you hit obstacles
  • Conversational AI: Say "Helios, where's the door?" and get natural answers
  • Motion-aware: Only speaks when you're moving, stays quiet when you stop
  • Spatial memory: Remembers objects seen in the last 30 seconds for contextual answers (see the sketch after this list)
  • Facial recognition: Remembers faces of people you interact with - "Helios, this is Ben"
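
A minimal sketch of how the 30-second spatial memory could work, assuming each detection arrives as a label, a coarse direction, a distance, and a timestamp. The names here (Sighting, SpatialMemory, recall) are illustrative, not the actual Helios code:

```python
import time
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Sighting:
    label: str          # e.g. "chair"
    direction: str      # "left", "ahead", or "right"
    distance_ft: float
    seen_at: float      # time.time() timestamp

@dataclass
class SpatialMemory:
    """Rolling window of recent detections used to answer follow-up questions."""
    window_s: float = 30.0
    sightings: deque = field(default_factory=deque)

    def add(self, sighting: Sighting) -> None:
        self.sightings.append(sighting)
        self._evict()

    def recall(self, label: str) -> Sighting | None:
        """Most recent sighting of `label` still inside the window, if any."""
        self._evict()
        for s in reversed(self.sightings):
            if s.label == label:
                return s
        return None

    def _evict(self) -> None:
        cutoff = time.time() - self.window_s
        while self.sightings and self.sightings[0].seen_at < cutoff:
            self.sightings.popleft()
```

With something like this, "Helios, where's the door?" can be answered from recall("door") even after the door has left the frame.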

How we built it

  • YOLO11x for real-time object detection with distance estimation using camera calibration
  • Google Gemini 3.0 for natural language guidance and conversational Q&A
  • Custom heuristics engine that decides when to speak based on urgency (emergency → alert → guidance → info); see the sketch after this list
  • Dual-pipeline architecture: Vision pipeline runs continuously at 1 FPS; conversation pipeline activates on wake word
  • React Native + Expo for the iOS app with native camera access
  • Python/FastAPI backend with Socket.IO for real-time frame streaming
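
The heuristics engine can be pictured roughly like this: each detection is mapped to an urgency tier, and speech is gated on motion and a per-tier cooldown. The tier names match the list above, but the thresholds, cooldowns, and function names are illustrative assumptions, not the exact values we ship:

```python
from enum import IntEnum

class Urgency(IntEnum):
    INFO = 0       # ambient context, only spoken on request
    GUIDANCE = 1   # "chair left, keep right"
    ALERT = 2      # obstacle near the walking path
    EMERGENCY = 3  # imminent collision, always interrupts

def classify(label: str, distance_ft: float, in_path: bool) -> Urgency:
    """Map a detection to an urgency tier (illustrative thresholds)."""
    if in_path and distance_ft < 3:
        return Urgency.EMERGENCY
    if in_path and distance_ft < 8:
        return Urgency.ALERT
    if distance_ft < 8:
        return Urgency.GUIDANCE
    return Urgency.INFO

def should_speak(urgency: Urgency, user_is_moving: bool, seconds_since_last: float) -> bool:
    """Gate speech on motion and a per-tier cooldown to avoid alert fatigue."""
    if urgency is Urgency.EMERGENCY:
        return True                      # always warn, even when stationary
    if not user_is_moving:
        return False                     # stay quiet while the user is stopped
    cooldowns = {Urgency.ALERT: 2.0, Urgency.GUIDANCE: 5.0, Urgency.INFO: 15.0}
    return seconds_since_last >= cooldowns[urgency]
```

Only EMERGENCY bypasses the motion gate, which is what keeps the app quiet when the user stops walking.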

Challenges we ran into

  • Latency vs. accuracy tradeoff: Gemini adds ~1-2 seconds, but provides natural guidance. We built a heuristics layer to minimize unnecessary API calls.
  • Spatial data representation: Translating bounding boxes into useful directions ("chair left, 4 feet") required custom distance estimation using iPhone camera intrinsics; see the sketch after this list.
  • Motion detection: Preventing the app from constantly talking when standing still—solved with accelerometer/pedometer integration.
  • Coordinating rapid development: 4 people, 24 hours, constantly changing architecture. Git conflicts were real.
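
For the distance estimate, a hedged sketch of the pinhole-camera math involved: with the focal length in pixels (available from the iPhone's camera intrinsics) and an assumed real-world height per object class, distance follows from similar triangles. The class heights and helper names below are illustrative assumptions:

```python
# Assumed real-world heights (feet) per detected class; illustrative values only.
TYPICAL_HEIGHT_FT = {"person": 5.6, "chair": 3.0, "door": 6.7}

def estimate_distance_ft(label: str, bbox_height_px: float, focal_length_px: float) -> float | None:
    """Pinhole model: distance = (real height * focal length in px) / box height in px."""
    real_height_ft = TYPICAL_HEIGHT_FT.get(label)
    if real_height_ft is None or bbox_height_px <= 0:
        return None
    return (real_height_ft * focal_length_px) / bbox_height_px

def direction_from_bbox(bbox_center_x: float, frame_width: float) -> str:
    """Coarse left/ahead/right bucket from where the box sits in the frame."""
    third = frame_width / 3
    if bbox_center_x < third:
        return "left"
    if bbox_center_x > 2 * third:
        return "right"
    return "ahead"
```

Together, these two helpers are what turn a YOLO bounding box into a phrase like "chair left, 4 feet".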

Accomplishments that we're proud of

  • Object detection reliably identifies obstacles at varying distances
  • The heuristics engine dramatically reduces alert fatigue; it only speaks when it matters
  • Wake word + conversational AI feels genuinely useful ("Hey Helios, is there a chair nearby?")
  • Built something we'd actually want to use

What we learned

  • Real-time AI assistance is hard. Latency is the enemy
  • Heuristics matter as much as ML models. Knowing when to speak is half the problem.
  • iPhone camera calibration math (focal length, sensor size) for distance estimation
  • The importance of failing fast and pivoting (we scrapped 2 approaches before landing on the current architecture)

What's next for Helios

  • On-device YOLO: Eliminate network latency entirely using Core ML
  • Template-based fast alerts: Pre-computed phrases for sub-100ms obstacle warnings (sketched after this list)
  • Indoor mapping: Remember layouts of frequently visited places
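
As a rough idea of what the template-based fast path could look like: phrases pre-computed per (object, direction) pair so an urgent warning never waits on the LLM. The table and fallback below are illustrative, not a finished design:

```python
# Pre-computed phrases keyed by (label, direction); no LLM call on the hot path.
ALERT_TEMPLATES = {
    ("chair", "left"): "Chair left, keep right",
    ("chair", "right"): "Chair right, keep left",
    ("person", "ahead"): "Person ahead, slow down",
}

def fast_alert(label: str, direction: str) -> str:
    """Fall back to a generic phrase when no template matches."""
    return ALERT_TEMPLATES.get((label, direction), f"{label.capitalize()} {direction}")
```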
