Helios

Inspiration

Real-time vision-to-audio is a fascinating technical challenge that has been developing rapidly. We wanted to push the boundaries of what's possible: not just describing scenes, but actually guiding someone through space in real time. Traditional blind-assistance tools either detect obstacles at arm's length (canes) or cost $50K+ (guide dogs). We asked: what if your phone could be an intelligent guide that sees ahead and speaks naturally?

What it does

Helios is a mobile app that gives blind users real-time spatial awareness through their phone's camera. Point the phone forward, put in earbuds, and walk:

  • Proactive navigation: "Chair left, keep right" - warns before you hit obstacles
  • Conversational AI: Say "Helios, where's the door?" and get natural answers
  • Motion-aware: Only speaks when you're moving, stays quiet when you stop
  • Spatial memory: Remembers objects seen in the last 30 seconds for contextual answers (see the sketch after this list)
  • Facial recognition: Remembers faces of people you interact with - "Helios, this is Ben"
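
A minimal sketch of how the 30-second spatial memory could work, assuming each detection arrives as a label, a coarse direction, a distance, and a timestamp. The names here (Sighting, SpatialMemory, recall) are illustrative, not the actual Helios code:

```python
import time
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Sighting:
    label: str          # e.g. "chair"
    direction: str      # "left", "ahead", or "right"
    distance_ft: float
    seen_at: float      # time.time() timestamp

@dataclass
class SpatialMemory:
    """Rolling window of recent detections used to answer follow-up questions."""
    window_s: float = 30.0
    sightings: deque = field(default_factory=deque)

    def add(self, sighting: Sighting) -> None:
        self.sightings.append(sighting)
        self._evict()

    def recall(self, label: str) -> Sighting | None:
        """Most recent sighting of `label` still inside the window, if any."""
        self._evict()
        for s in reversed(self.sightings):
            if s.label == label:
                return s
        return None

    def _evict(self) -> None:
        cutoff = time.time() - self.window_s
        while self.sightings and self.sightings[0].seen_at < cutoff:
            self.sightings.popleft()
```

With something like this, "Helios, where's the door?" can be answered from recall("door") even after the door has left the frame.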

How we built it

  • YOLO11x for real-time object detection with distance estimation using camera calibration
  • Google Gemini 3.0 for natural language guidance and conversational Q&A
  • Custom heuristics engine that decides when to speak based on urgency (emergency → alert → guidance → info); see the sketch after this list
  • Dual-pipeline architecture: Vision pipeline runs continuously at 1 FPS; conversation pipeline activates on wake word
  • React Native + Expo for the iOS app with native camera access
  • Python/FastAPI backend with Socket.IO for real-time frame streaming
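
The heuristics engine can be pictured roughly like this: each detection is mapped to an urgency tier, and speech is gated on motion and a per-tier cooldown. The tier names match the list above, but the thresholds, cooldowns, and function names are illustrative assumptions, not the exact values we ship:

```python
from enum import IntEnum

class Urgency(IntEnum):
    INFO = 0       # ambient context, only spoken on request
    GUIDANCE = 1   # "chair left, keep right"
    ALERT = 2      # obstacle near the walking path
    EMERGENCY = 3  # imminent collision, always interrupts

def classify(label: str, distance_ft: float, in_path: bool) -> Urgency:
    """Map a detection to an urgency tier (illustrative thresholds)."""
    if in_path and distance_ft < 3:
        return Urgency.EMERGENCY
    if in_path and distance_ft < 8:
        return Urgency.ALERT
    if distance_ft < 8:
        return Urgency.GUIDANCE
    return Urgency.INFO

def should_speak(urgency: Urgency, user_is_moving: bool, seconds_since_last: float) -> bool:
    """Gate speech on motion and a per-tier cooldown to avoid alert fatigue."""
    if urgency is Urgency.EMERGENCY:
        return True                      # always warn, even when stationary
    if not user_is_moving:
        return False                     # stay quiet while the user is stopped
    cooldowns = {Urgency.ALERT: 2.0, Urgency.GUIDANCE: 5.0, Urgency.INFO: 15.0}
    return seconds_since_last >= cooldowns[urgency]
```

Only EMERGENCY bypasses the motion gate, which is what keeps the app quiet when the user stops walking.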

Challenges we ran into

  • Latency vs. accuracy tradeoff: Gemini adds ~1-2 seconds, but provides natural guidance. We built a heuristics layer to minimize unnecessary API calls.
  • Spatial data representation: Translating bounding boxes into useful directions ("chair left, 4 feet") required custom distance estimation using iPhone camera intrinsics; see the sketch after this list.
  • Motion detection: Preventing the app from constantly talking when standing still—solved with accelerometer/pedometer integration.
  • Coordinating rapid development: 4 people, 24 hours, constantly changing architecture. Git conflicts were real.
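
For the distance estimate, a hedged sketch of the pinhole-camera math involved: with the focal length in pixels (available from the iPhone's camera intrinsics) and an assumed real-world height per object class, distance follows from similar triangles. The class heights and helper names below are illustrative assumptions:

```python
# Assumed real-world heights (feet) per detected class; illustrative values only.
TYPICAL_HEIGHT_FT = {"person": 5.6, "chair": 3.0, "door": 6.7}

def estimate_distance_ft(label: str, bbox_height_px: float, focal_length_px: float) -> float | None:
    """Pinhole model: distance = (real height * focal length in px) / box height in px."""
    real_height_ft = TYPICAL_HEIGHT_FT.get(label)
    if real_height_ft is None or bbox_height_px <= 0:
        return None
    return (real_height_ft * focal_length_px) / bbox_height_px

def direction_from_bbox(bbox_center_x: float, frame_width: float) -> str:
    """Coarse left/ahead/right bucket from where the box sits in the frame."""
    third = frame_width / 3
    if bbox_center_x < third:
        return "left"
    if bbox_center_x > 2 * third:
        return "right"
    return "ahead"
```

Together, these two helpers are what turn a YOLO bounding box into a phrase like "chair left, 4 feet".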

Accomplishments that we're proud of

  • Object detection reliably identifies obstacles at varying distances
  • The heuristics engine dramatically reduces alert fatigue; it only speaks when it matters
  • Wake word + conversational AI feels genuinely useful ("Hey Helios, is there a chair nearby?")
  • Built something we'd actually want to use

What we learned

  • Real-time AI assistance is hard. Latency is the enemy
  • Heuristics matter as much as ML models. Knowing when to speak is half the problem.
  • iPhone camera calibration math (focal length, sensor size) for distance estimation
  • The importance of failing fast and pivoting (we scrapped 2 approaches before landing on the current architecture)

What's next for Helios

  • On-device YOLO: Eliminate network latency entirely using Core ML
  • Template-based fast alerts: Pre-computed phrases for sub-100ms obstacle warnings (sketched after this list)
  • Indoor mapping: Remember layouts of frequently visited places
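
As a rough idea of what the template-based fast path could look like: phrases pre-computed per (object, direction) pair so an urgent warning never waits on the LLM. The table and fallback below are illustrative, not a finished design:

```python
# Pre-computed phrases keyed by (label, direction); no LLM call on the hot path.
ALERT_TEMPLATES = {
    ("chair", "left"): "Chair left, keep right",
    ("chair", "right"): "Chair right, keep left",
    ("person", "ahead"): "Person ahead, slow down",
}

def fast_alert(label: str, direction: str) -> str:
    """Fall back to a generic phrase when no template matches."""
    return ALERT_TEMPLATES.get((label, direction), f"{label.capitalize()} {direction}")
```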
