Inspiration
We kept asking one question: what does a blind person do when they need to find something right in front of them? Not navigate a city. Not read a screen. Just find the cup on the table. Find the door handle. Find the object one meter away. Every assistive tool we looked at solved a different problem. Nothing solved that one. 2.2 billion people live with vision impairment and the space directly in front of them has never been mapped in real time. We had LiDAR in our pocket. We had the neuroscience — visual cortex activates for spatial sound in blind individuals. The brain was already wired for this. We just needed to build the signal.
What It Does
Solaura turns the iPhone Pro into a spatial awareness device for blind and low-vision users. Point the phone. Solaura detects the object using YOLOv8, maps its exact 3D position using LiDAR, and sends directional audio to your ears. Left means left. Right means right. Closer means faster. Silence means found. When your hand enters the frame, the system switches modes. It stops guiding your body toward the object and starts guiding your hand directly to it — calculating the vector between your hand and the target with centimeter precision. If your reach deviates by more than 2 cm, the audio corrects you. Everything runs on-device.
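The hand-guidance step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the app's actual code: the function name, the pan formula, and the return shape are all assumptions; only the hand-to-target vector and the 2 cm "found" threshold come from the description.

```python
import math

def reach_cue(hand, target, found_threshold_m=0.02):
    """Hypothetical guidance step: compute the 3D vector from hand to
    target, derive a stereo pan from the horizontal offset, and go
    silent once the hand is within the 2 cm threshold."""
    dx, dy, dz = (t - h for h, t in zip(hand, target))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    if distance <= found_threshold_m:
        # Silence means found
        return {"silent": True, "pan": 0.0, "distance": distance}
    # Pan in [-1, 1]: negative means the target is left of the hand
    pan = max(-1.0, min(1.0, dx / distance))
    return {"silent": False, "pan": pan, "distance": distance}
```

Each new hand pose from the tracker would re-run this step, so the cue updates continuously as the hand moves.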
How we built it
- iOS — LiDAR maps depth in real time. YOLOv8n detects objects at 2.5 Hz. Apple Vision tracks the hand at 15 Hz via the middleMCP joint for precise 3D position.
- AI — Gemini 2.0 Flash processes voice and camera frames, and triggers the spatial radar automatically via function calling. ElevenLabs handles voice responses.
- Backend — Python receives coordinates via UDP. EMA smooths sensor noise. Azimuth → stereo pan. Distance → pitch and beep interval. The last known position holds through occlusion.
- Dashboard — Next.js + Three.js, with live 3D object tracking via HTTP.
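The backend's smoothing and audio-mapping stages could look something like this minimal Python sketch. The class and function names, the alpha value, and the interval curve are assumptions; the source only states that an EMA smooths sensor noise, azimuth maps to stereo pan, and distance maps to beep interval.

```python
import math

class EMAFilter:
    """Exponential moving average over 3D position samples to smooth
    sensor noise (illustrative; alpha is a guess)."""
    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.state = None

    def update(self, sample):
        if self.state is None:
            self.state = sample
        else:
            self.state = tuple(self.alpha * s + (1 - self.alpha) * p
                               for s, p in zip(sample, self.state))
        return self.state

def audio_params(x, z):
    """Map horizontal offset to stereo pan and distance to beep
    interval: closer target, faster beeps."""
    azimuth = math.atan2(x, z)                    # 0 = straight ahead
    pan = max(-1.0, min(1.0, azimuth / (math.pi / 2)))
    distance = math.hypot(x, z)
    interval = min(1.0, max(0.1, distance * 0.4))  # seconds between beeps
    return pan, interval
```

In this sketch, the UDP receiver would feed each incoming coordinate through the filter before computing the audio parameters, so jitter in the raw LiDAR samples never reaches the beeps.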
Challenges we ran into
- LiDAR-Vision fusion — Projecting YOLOv8 bounding boxes into accurate 3D world coordinates required precise alignment between the camera plane and ARKit's depth buffer.
- Object permanence — When the hand covers the target mid-reach, we freeze the last confirmed 3D position so audio guidance continues through occlusion.
- Dual-frequency tracking — Running object detection at 2.5 Hz and hand tracking at 15 Hz simultaneously without blocking the main thread.
- Audio calibration — The pitch curve, beep rhythm, stereo panning, and the 2 cm correction threshold each required more iteration than any other component.
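The fusion challenge above boils down to back-projecting a pixel plus its LiDAR depth into 3D space. A minimal sketch of that step, assuming a standard pinhole camera model (ARKit supplies the actual intrinsics and transforms on-device; the function below is illustrative):

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with measured depth into 3D
    camera-space coordinates using pinhole intrinsics:
    focal lengths (fx, fy) and principal point (cx, cy)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth
```

Running this on the center of a YOLOv8 bounding box, with the depth sampled from the aligned depth buffer, yields the 3D target position the audio guidance needs; the alignment difficulty the team describes is making sure the depth sample really corresponds to that pixel.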
Accomplishments that we're proud of
- Hand guides directly to the object, not just toward it
- Audio continues through occlusion via object permanence
- Full pipeline from LiDAR to stereo audio
- Two tracking frequencies running simultaneously without degrading either
What we learned
- LiDAR works in darkness with no cloud dependency — right sensor, right problem
- Hand-to-object guidance is more useful than camera-to-object guidance for reaching tasks
- Audio is a full interface layer — silence, pitch, rhythm, and pan all carry meaning
What's next for Solaura
- Expand detection beyond bottles to doors, chairs, and stairs
- Move audio processing fully on-device and eliminate the Python backend
- Multi-object voice selection via Gemini function calling
- User testing with blind and low-vision users
- App Store submission


