Inspiration

253 million people worldwide live with visual impairment (WHO). Dedicated assistive devices like smart canes cost $300–$600, yet many visually impaired users already carry an iPhone. The iPhone Pro's LiDAR sensor, originally designed for AR, is a real-time depth scanner that ARKit can stream at up to 60fps. We wanted to turn that into a wearable navigation assistant at zero additional cost for people who already own the phone.

We were also motivated by a critical gap: what happens when a visually impaired user falls and can't call for help? Existing apps focus on navigation but ignore emergencies. Guider addresses both — real-time obstacle avoidance and automatic emergency response with GPS location sharing.

What it does

Guider turns an iPhone Pro into a chest-worn navigation device with three core capabilities:

Navigation Mode: LiDAR depth frames are scanned for obstacles across a 3×3 directional grid, at 30fps while walking and 15fps when stationary. Four distance zones (Safe > 2m, Caution 1–2m, Warning 0.5–1m, Danger < 0.5m) trigger increasingly urgent haptic vibrations and voice alerts like "Obstacle left". Stairs are detected separately with a distinct double-tap vibration pattern.

Object Scan — Tap the screen to photograph and identify what's in front of you. Online mode uses Gemini AI for rich natural language descriptions ("A red coffee mug on a wooden desk next to a laptop"). Offline mode automatically falls back to Apple Vision for on-device classification ("I see coffee mug, desk"). Switching is automatic based on network status.

Emergency SMS — If the phone detects a fall and the user doesn't respond within 10 seconds, Guider automatically sends an SMS with GPS coordinates and a Google Maps link to the emergency contact. A bystander guidance loop repeats every 10 seconds, speaking aloud to alert nearby people to help.

The entire app is controlled with just three gestures: tap, long press, and triple-tap. No buttons to find. Every state change is voice-announced. Emergency contacts are set up by voice during onboarding — say a name, the app searches your contacts and confirms.
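
As a rough illustration of the four Navigation Mode zones, the thresholds can be written as a small Swift classifier. This is a sketch, not Guider's actual code: the names `Zone` and `classify(distanceMeters:)` are hypothetical, and which zone owns the exact boundary values (0.5m, 1m, 2m) is an assumption, since the write-up does not say.

```swift
/// Distance zones from the write-up, ordered by urgency.
enum Zone: Int, Comparable {
    case safe = 0, caution, warning, danger

    static func < (lhs: Zone, rhs: Zone) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
}

/// Map a distance in meters to a zone.
/// Returns nil for readings that cannot be a real distance (NaN, infinity, negative).
/// Assumption: a reading exactly on a boundary goes to the less urgent zone,
/// except 2.0 m, which is still treated as Caution.
func classify(distanceMeters d: Double) -> Zone? {
    guard d.isFinite, d >= 0 else { return nil }
    switch d {
    case ..<0.5: return .danger
    case ..<1.0: return .warning
    case ...2.0: return .caution
    default:     return .safe
    }
}
```

Making `Zone` comparable lets the caller pick the most urgent zone across all nine grid cells with a plain `max()`.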

How we built it

  • Depth Pipeline — ARKit streams LiDAR depth maps which we sample into a 3×3 grid (left/center/right × top/mid/bottom). ARKit plane anchors filter ground surfaces to eliminate false positives below 30cm. A 5-frame rolling median smooths noisy readings before zone classification.

  • Stair Detection — Depth gradient analysis in the lower 40% of the frame detects ≥3 regularly-spaced depth discontinuities (0.05–0.30m step rise). A 3-frame temporal filter prevents false triggers, with a 5-second cooldown between alerts.

  • Adaptive Frame Rate — A motion classifier tracks ARKit camera displacement. Walking triggers 30fps scanning; stationary drops to 15fps, extending battery life.

  • Object Recognition — NWPathMonitor detects network status. Online requests go to Gemini 2.5 Flash with a prompt optimized for visually impaired users. Offline uses Apple Vision's VNClassifyImageRequest with top-3 confidence filtering. If Gemini fails mid-request, it automatically falls back to offline.

  • Emergency Flow — Drop detection monitors the camera's Y-position for a 40cm+ fall within 0.5 seconds. SFSpeechRecognizer listens for "yes"/"help" responses. SMS is sent via a configurable webhook with GPS coordinates from CoreLocation, falling back to the system SMS composer if the webhook is unavailable.

  • Tech Stack — Swift 5.9, SwiftUI, ARKit, Core Haptics, AVSpeechSynthesizer, Speech framework, CoreLocation, Gemini API, Apple Vision, Network framework.
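
The depth pipeline described above (3×3 grid sampling followed by a 5-frame rolling median) can be sketched in plain Swift. Several things here are assumptions made for illustration: the depth map is modeled as a row-major `[Float]` rather than ARKit's pixel buffer, each cell keeps its nearest valid reading (the write-up does not say how cells are sampled), and the names `nearestDepthGrid` and `RollingMedian` are hypothetical. Ground-plane filtering is left out.

```swift
/// Nearest valid depth (meters) in each cell of a 3x3 grid laid over a
/// row-major depth map. Non-finite or non-positive samples are skipped;
/// a cell with no valid sample stays nil.
func nearestDepthGrid(depth: [Float], width: Int, height: Int) -> [[Float?]] {
    precondition(depth.count == width * height, "depth map size mismatch")
    var grid = [[Float?]](repeating: [Float?](repeating: nil, count: 3), count: 3)
    for y in 0..<height {
        let row = y * 3 / height            // 0 = top, 1 = mid, 2 = bottom
        for x in 0..<width {
            let d = depth[y * width + x]
            guard d.isFinite, d > 0 else { continue }
            let col = x * 3 / width         // 0 = left, 1 = center, 2 = right
            if let current = grid[row][col] {
                grid[row][col] = min(current, d)
            } else {
                grid[row][col] = d
            }
        }
    }
    return grid
}

/// Rolling median over the last `capacity` readings of one grid cell.
struct RollingMedian {
    let capacity: Int
    private var window: [Float] = []

    init(capacity: Int = 5) {
        precondition(capacity > 0, "capacity must be positive")
        self.capacity = capacity
    }

    /// Add a reading and return the median of the current window.
    mutating func add(_ value: Float) -> Float {
        window.append(value)
        if window.count > capacity { window.removeFirst() }
        let sorted = window.sorted()
        let mid = sorted.count / 2
        return sorted.count % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2
    }
}
```

A median is a reasonable fit here because a single spurious reading (for example one frame reporting 5m in a run of roughly 1.2m readings) does not shift the output, whereas a mean would.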

Challenges we ran into

Ground plane false positives — Early versions vibrated constantly because the floor registered as an obstacle. We solved this with ARKit plane anchor filtering and a 30cm height threshold, but tuning it to work across different surfaces (carpet, tile, grass) required extensive on-device testing.

Haptic engine invalidation: iOS silently stops the Core Haptics engine when the app enters the background. Users would return to a completely silent app. We added foreground notification listeners and nil-checks that auto-restart the engine on return.

Stair detection accuracy: Distinguishing stairs from shelves, fences, or textured walls was difficult. The key insight was filtering for a dominant gradient direction (all positive or all negative) and requiring regular spacing, with the standard deviation of the spacing under 30% of its mean. Real stairs have consistent step heights; random surfaces don't.
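
A single-frame version of this check might look like the sketch below. It assumes the input is one vertical depth profile (meters) taken from the lower part of the frame, reads the 30% limit as standard deviation divided by mean spacing, and uses a hypothetical function name. The 3-frame temporal filter and the 5-second cooldown are not shown.

```swift
/// Returns true when a vertical depth profile contains stair-like edges:
/// at least `minSteps` jumps of 0.05-0.30 m, all in the same direction,
/// with regular spacing between them.
func looksLikeStairs(profile: [Double],
                     minRise: Double = 0.05,
                     maxRise: Double = 0.30,
                     minSteps: Int = 3,
                     maxSpacingCV: Double = 0.30) -> Bool {
    guard profile.count > 1 else { return false }

    // Collect the positions of step-sized depth discontinuities.
    var edgeIndices: [Int] = []
    var risingCount = 0
    for i in 1..<profile.count {
        let jump = profile[i] - profile[i - 1]
        guard (minRise...maxRise).contains(abs(jump)) else { continue }
        edgeIndices.append(i)
        if jump > 0 { risingCount += 1 }
    }
    guard edgeIndices.count >= minSteps else { return false }

    // Dominant direction: every edge must rise, or every edge must fall.
    guard risingCount == 0 || risingCount == edgeIndices.count else { return false }

    // Regular spacing: standard deviation of the gaps relative to their mean.
    let gaps = zip(edgeIndices.dropFirst(), edgeIndices).map { Double($0 - $1) }
    let mean = gaps.reduce(0, +) / Double(gaps.count)
    let variance = gaps.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Double(gaps.count)
    return variance.squareRoot() / mean < maxSpacingCV
}
```

A shelf or fence tends to fail one of the two guards: its edges either alternate direction or sit at uneven intervals.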

Emergency UX for unconscious users — If the user is unconscious, they can't confirm a call or tap a button. Our solution was a voice loop that speaks to bystanders, not the user — repeating instructions every 10 seconds while simultaneously sending an SMS with GPS so the emergency contact knows exactly where to go.
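
The trigger that starts this flow, a 40cm+ drop in the camera's Y-position within 0.5 seconds, can be sketched as a sliding-window check. This is an illustrative version only: `DropDetector` is a hypothetical name, and feeding it plain `(time, y)` samples instead of ARKit camera transforms is an assumption made to keep the example self-contained.

```swift
/// Flags a fall when the camera height drops by at least `dropMeters`
/// relative to the highest sample seen in the last `windowSeconds`.
struct DropDetector {
    let dropMeters: Double
    let windowSeconds: Double
    private var samples: [(time: Double, y: Double)] = []

    init(dropMeters: Double = 0.40, windowSeconds: Double = 0.5) {
        self.dropMeters = dropMeters
        self.windowSeconds = windowSeconds
    }

    /// Feed one height sample (seconds, meters). Returns true when a drop is detected.
    mutating func add(time: Double, y: Double) -> Bool {
        // Forget samples that are older than the window.
        let cutoff = time - windowSeconds
        samples.removeAll { $0.time < cutoff }
        samples.append((time: time, y: y))

        let highest = samples.map { $0.y }.max() ?? y
        guard highest - y >= dropMeters else { return false }

        // Reset so one fall does not fire on every following frame.
        samples.removeAll()
        return true
    }
}
```

Comparing against the highest point in the window, rather than only the previous frame, catches a fall that is spread over several frames while still ignoring a slow descent such as sitting down.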

LiDAR range in sunlight — Infrared-based LiDAR loses range in direct sunlight. We can't fix the physics, but adaptive frame rate and conservative zone thresholds help maintain safety margins outdoors.
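
The adaptive frame rate mentioned here (30fps while walking, 15fps when stationary) could be driven by a check like the one below. It is only a sketch: the `Position` type stands in for the translation part of an ARKit camera transform, the function name is hypothetical, and the 0.3 m/s walking threshold is an assumed value, not one taken from the project.

```swift
/// Camera position in meters.
struct Position {
    var x, y, z: Double
}

/// Choose a scan rate from the camera displacement between two samples.
/// Falls back to the higher rate when the elapsed time is not usable,
/// since scanning too often is the safer mistake.
func targetFrameRate(from a: Position,
                     to b: Position,
                     elapsedSeconds: Double,
                     walkingSpeedThreshold: Double = 0.3) -> Int {
    guard elapsedSeconds > 0 else { return 30 }
    let dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z
    let speed = (dx * dx + dy * dy + dz * dz).squareRoot() / elapsedSeconds
    return speed >= walkingSpeedThreshold ? 30 : 15
}
```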

Accomplishments that we're proud of

  • End-to-end latency under 50ms from LiDAR depth capture to haptic feedback — fast enough that users can walk at normal pace and react to obstacles in time.

  • Fully hands-free emergency response — from fall detection to SMS with GPS coordinates, the entire flow works without any user interaction. A bystander who has never seen the app can still help because the phone speaks instructions aloud.

  • Seamless online/offline object recognition — the app automatically detects network status and switches between Gemini AI (detailed descriptions) and Apple Vision (on-device labels) without any user action.

  • Voice-guided onboarding — a visually impaired user can set up the entire app, including emergency contacts, using only their voice. No sighted assistance needed.

  • Three-gesture interaction model — tap, long press, and triple-tap. No buttons, no menus, no visual UI required. The entire screen is the touch target.
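
The online/offline switch listed above follows a simple rule: try the cloud description only when the network is up, and use the on-device labels if that attempt is skipped or fails. A minimal sketch of that rule follows. The function name and closure parameters are hypothetical, and the calls are synchronous for brevity, whereas a real Gemini request would be asynchronous.

```swift
/// Placeholder error for a failed cloud request.
struct RecognitionError: Error {}

/// Describe an image, preferring the cloud describer when online and
/// falling back to the on-device describer when offline or on failure.
func describeScene<Image>(_ image: Image,
                          isOnline: Bool,
                          cloud: (Image) throws -> String,
                          onDevice: (Image) -> String) -> String {
    if isOnline, let rich = try? cloud(image) {
        return rich
    }
    return onDevice(image)
}
```

Passing the two describers in as closures keeps the fallback rule separate from the Gemini and Apple Vision calls, so it can be exercised without a network.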

What we learned

  • Accessibility-first design is fundamentally different — we had to unlearn every UI instinct. No buttons, no visual feedback, no confirmation dialogs. Everything must be voice-announced and gesture-driven.

  • Real-time sensor pipelines require careful threading — ARKit callbacks, haptic playback, and speech synthesis all have different thread requirements. Getting them to coexist without blocking or crashing took significant architectural effort.

  • On-device ML is surprisingly capable — Apple Vision's VNClassifyImageRequest runs in under a second with no internet, making offline mode a genuine fallback rather than a degraded experience.

  • The "last 10%" of assistive tech is the hardest — obstacle detection was week one. Making it reliable enough that a blind person can trust it with their safety was the rest of the project.

  • Emergency scenarios expose every assumption — designing for an unconscious user forced us to rethink who the "user" is. Sometimes it's not the person holding the phone — it's the stranger standing nearby.

What's next for Guider

  • Apple Watch companion — haptic feedback directly on the wrist for more discreet and immediate alerts, freeing the phone to stay in a bag or pocket.

  • Multi-language support — voice announcements and speech recognition in multiple languages to reach a global audience.

  • Route learning — remember frequently walked routes and provide proactive guidance ("Stairs in 10 meters on your left, like last time").

  • Community obstacle mapping — crowdsourced hazard reporting so users can warn each other about construction zones, broken sidewalks, or temporary obstacles.

  • Indoor navigation — combine LiDAR point clouds with ARKit scene understanding for indoor wayfinding in malls, hospitals, and transit stations.

  • Caregiver dashboard — a web portal where family members can see the user's location, receive emergency alerts, and review trip history.
