Inspiration
253 million people worldwide live with visual impairment (WHO). Dedicated assistive devices like smart canes cost $300–$600, yet most visually impaired users already carry an iPhone. The iPhone Pro's LiDAR sensor, originally designed for AR, is a millimeter-accurate depth scanner running at 60fps. We wanted to turn that into a wearable navigation assistant at zero additional cost.
We were also motivated by a critical gap: what happens when a visually impaired user falls and can't call for help? Existing apps focus on navigation but ignore emergencies. Guider addresses both — real-time obstacle avoidance and automatic emergency response with GPS location sharing.
What it does
Guider turns an iPhone Pro into a chest-worn navigation device with three core capabilities:
Navigation Mode — LiDAR scans ahead at 60fps, detecting obstacles across a 3×3 directional grid. Four distance zones (Safe > 2m, Caution 1–2m, Warning 0.5–1m, Danger < 0.5m) trigger increasingly urgent haptic vibrations and voice alerts like "Obstacle left". Stairs are detected separately with a distinct double-tap vibration pattern.
Object Scan — Tap the screen to photograph and identify what's in front of you. Online mode uses Gemini AI for rich natural language descriptions ("A red coffee mug on a wooden desk next to a laptop"). Offline mode automatically falls back to Apple Vision for on-device classification ("I see coffee mug, desk"). Switching is automatic based on network status.
Emergency SMS — If the phone detects a fall and the user doesn't respond within 10 seconds, Guider automatically sends an SMS with GPS coordinates and a Google Maps link to the emergency contact. A bystander guidance loop repeats every 10 seconds, speaking aloud to alert nearby people to help.
The entire app is controlled with just three gestures: tap, long press, and triple-tap. No buttons to find. Every state change is voice-announced. Emergency contacts are set up by voice during onboarding — say a name, the app searches your contacts and confirms.
How we built it
Depth Pipeline — ARKit streams LiDAR depth maps which we sample into a 3×3 grid (left/center/right × top/mid/bottom). ARKit plane anchors filter ground surfaces to eliminate false positives below 30cm. A 5-frame rolling median smooths noisy readings before zone classification.
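The grid sampling, median smoothing, and zone classification described above can be sketched in plain Swift. This is a minimal illustration, not Guider's actual source: the names `nearestPerCell`, `RollingMedian`, and `classifyZone` are hypothetical, taking the nearest reading per cell is an assumption, and so is the handling of readings that fall exactly on a zone boundary. The depth map is modeled as a plain array of rows rather than an ARKit buffer.

```swift
// Sketch only: all names here are hypothetical, not taken from Guider's source.

/// The four distance zones used for haptic and voice alerts.
enum DistanceZone: Equatable {
    case safe, caution, warning, danger
}

/// Maps a distance in meters to a zone (Safe > 2 m, Caution 1 to 2 m,
/// Warning 0.5 to 1 m, Danger < 0.5 m). Boundary handling is an assumption.
func classifyZone(meters: Double) -> DistanceZone {
    if meters < 0.5 { return .danger }
    if meters < 1.0 { return .warning }
    if meters <= 2.0 { return .caution }
    return .safe
}

/// Reduces a depth map (rows of distances in meters) to a 3x3 grid holding
/// the nearest reading in each cell. Using the minimum is an assumption.
func nearestPerCell(depth: [[Double]]) -> [[Double]] {
    let rows = depth.count
    let cols = depth.first?.count ?? 0
    var grid = Array(repeating: Array(repeating: Double.infinity, count: 3), count: 3)
    guard rows > 0, cols > 0 else { return grid }
    for (r, row) in depth.enumerated() {
        for (c, value) in row.enumerated() {
            // Integer division assigns each pixel to one of three bands.
            let cellRow = min(r * 3 / rows, 2)
            let cellCol = min(c * 3 / cols, 2)
            grid[cellRow][cellCol] = min(grid[cellRow][cellCol], value)
        }
    }
    return grid
}

/// Rolling median over the most recent `capacity` readings for one grid cell.
struct RollingMedian {
    private var window: [Double] = []
    let capacity: Int

    init(capacity: Int = 5) {
        precondition(capacity > 0, "capacity must be positive")
        self.capacity = capacity
    }

    /// Adds a reading and returns the median of the current window.
    mutating func add(_ value: Double) -> Double {
        window.append(value)
        if window.count > capacity { window.removeFirst() }
        let sorted = window.sorted()
        let mid = sorted.count / 2
        return sorted.count % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2
    }
}
```

Feeding one cell the readings 1.8, 1.7, 0.2, 1.6, 1.5 gives a median of 1.6 m once the window is full, so the single 0.2 m spike is classified as Caution rather than Danger.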
Stair Detection — Depth gradient analysis in the lower 40% of the frame detects ≥3 regularly-spaced depth discontinuities (0.05–0.30m step rise). A 3-frame temporal filter prevents false triggers, with a 5-second cooldown between alerts.
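As a rough sketch of the stair heuristic, the code below scans one vertical depth profile for step-sized jumps, then gates alerts with the 3-frame filter and 5-second cooldown. It is an illustration under assumptions, not the shipped detector: `looksLikeStairs` and `StairAlertGate` are hypothetical names, the profile is a simple array of depths ordered from the bottom of the frame upward, spacing is measured in sample indices, and "regular spacing" is interpreted as a standard deviation below 30% of the mean gap.

```swift
// Sketch only: names, input layout, and the spacing metric are assumptions.

/// Returns true if a vertical depth profile (meters, bottom of frame upward)
/// contains at least `minSteps` regularly spaced, same-direction depth jumps.
func looksLikeStairs(profile: [Double],
                     minJump: Double = 0.05,
                     maxJump: Double = 0.30,
                     minSteps: Int = 3,
                     maxSpacingVariation: Double = 0.30) -> Bool {
    guard profile.count >= 2 else { return false }

    // Collect the position and direction of every step-sized depth jump.
    var jumpIndices: [Int] = []
    var jumpSigns: [Int] = []
    for i in 1..<profile.count {
        let delta = profile[i] - profile[i - 1]
        if abs(delta) >= minJump && abs(delta) <= maxJump {
            jumpIndices.append(i)
            jumpSigns.append(delta > 0 ? 1 : -1)
        }
    }
    guard jumpIndices.count >= minSteps else { return false }

    // All jumps must point the same way (all rising or all falling).
    guard Set(jumpSigns).count == 1 else { return false }

    // Gaps between consecutive jumps must be consistent.
    let gaps = zip(jumpIndices.dropFirst(), jumpIndices).map { Double($0 - $1) }
    guard !gaps.isEmpty else { return false }
    let mean = gaps.reduce(0, +) / Double(gaps.count)
    let variance = gaps.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Double(gaps.count)
    return variance.squareRoot() / mean < maxSpacingVariation
}

/// Requires several consecutive positive frames and enforces a cooldown.
struct StairAlertGate {
    private var consecutiveHits = 0
    private var lastAlertTime: Double? = nil
    let requiredFrames = 3
    let cooldownSeconds = 5.0

    /// Returns true when an alert should fire for this frame.
    mutating func update(detected: Bool, time: Double) -> Bool {
        consecutiveHits = detected ? consecutiveHits + 1 : 0
        guard consecutiveHits >= requiredFrames else { return false }
        if let last = lastAlertTime, time - last < cooldownSeconds { return false }
        lastAlertTime = time
        return true
    }
}
```

A profile with jumps at evenly spaced rows passes, while a flat wall, a profile whose jumps alternate direction, or one with erratic gaps is rejected.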
Adaptive Frame Rate — A motion classifier tracks ARKit camera displacement. Walking triggers 30fps scanning; stationary drops to 15fps, extending battery life.
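One way to picture the motion classifier is a speed check between two camera positions. The sketch below is hypothetical: `CameraPosition` and `scanRate` are illustrative names, and the 0.2 m/s walking threshold is an assumed value, not one taken from the project.

```swift
// Sketch only: names and the speed threshold are illustrative assumptions.

/// A camera position in meters, as might be read from an ARKit camera transform.
struct CameraPosition {
    let x: Double
    let y: Double
    let z: Double
}

/// Returns 30 (fps) when the camera moves at walking speed, otherwise 15.
func scanRate(from previous: CameraPosition,
              to current: CameraPosition,
              elapsedSeconds: Double,
              walkingThreshold: Double = 0.2) -> Int {
    // With no usable time delta, stay at the higher rate to be safe.
    guard elapsedSeconds > 0 else { return 30 }
    let dx = current.x - previous.x
    let dy = current.y - previous.y
    let dz = current.z - previous.z
    let speed = (dx * dx + dy * dy + dz * dz).squareRoot() / elapsedSeconds
    return speed >= walkingThreshold ? 30 : 15
}
```

Defaulting to the higher rate when the time delta is unusable trades a little battery for safety, which seems the right bias for a navigation aid.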
Object Recognition — NWPathMonitor detects network status. Online requests go to Gemini 2.5 Flash with a prompt optimized for visually impaired users. Offline uses Apple Vision's VNClassifyImageRequest with top-3 confidence filtering. If Gemini fails mid-request, it automatically falls back to offline.
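The fallback logic can be sketched without any network or Vision calls by passing the online and offline describers in as closures. Everything named here is hypothetical: `LabelCandidate` simply mirrors the identifier and confidence fields of a Vision classification result, `describeScene` stands in for the routing step, and the 0.3 confidence cutoff and the "nothing found" phrase are assumptions.

```swift
// Sketch only: names, the confidence cutoff, and the empty-result phrase are assumptions.

/// A classification label with its confidence (0 to 1).
struct LabelCandidate {
    let identifier: String
    let confidence: Double
}

/// Builds the spoken offline description from the most confident labels.
func offlineDescription(from candidates: [LabelCandidate],
                        minConfidence: Double = 0.3,
                        maxLabels: Int = 3) -> String {
    let labels = candidates
        .filter { $0.confidence >= minConfidence }
        .sorted { $0.confidence > $1.confidence }
        .prefix(maxLabels)
        .map { $0.identifier }
    guard !labels.isEmpty else { return "I could not identify anything" }
    return "I see " + labels.joined(separator: ", ")
}

/// Uses the online describer when connected and falls back to the offline
/// one if the device is offline or the online request throws.
func describeScene(isOnline: Bool,
                   online: () throws -> String,
                   offline: () -> String) -> String {
    guard isOnline else { return offline() }
    do {
        return try online()
    } catch {
        return offline()
    }
}
```

Keeping the describers as injected closures means the same routing function covers both the "no network" case and the "request failed mid-flight" case.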
Emergency Flow — Drop detection monitors the camera's Y-position for a 40cm+ fall within 0.5 seconds. SFSpeechRecognizer listens for "yes"/"help" responses. SMS is sent via a configurable webhook with GPS coordinates from CoreLocation, falling back to the system SMS composer if the webhook is unavailable.
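A sliding-window check is one plausible way to express the 40cm-in-0.5-seconds rule. The sketch below is an assumption-laden illustration: `HeightSample`, `DropDetector`, and `emergencyMessage` are hypothetical names, the detector compares the highest recent Y position with the newest one, and the message wording is invented for the example. The link uses the documented Google Maps URL search format.

```swift
// Sketch only: names, the comparison strategy, and the message text are assumptions.

/// A timestamped camera height in meters.
struct HeightSample {
    let time: Double
    let y: Double
}

/// Flags a drop when height falls by `dropMeters` within `windowSeconds`.
struct DropDetector {
    private var samples: [HeightSample] = []
    let windowSeconds = 0.5
    let dropMeters = 0.40

    /// Adds a sample and returns true if a drop was detected.
    mutating func add(_ sample: HeightSample) -> Bool {
        samples.append(sample)
        // Discard samples that are older than the detection window.
        samples.removeAll { sample.time - $0.time > windowSeconds }
        // Compare the highest recent position with the newest one.
        guard let highest = samples.map({ $0.y }).max() else { return false }
        return highest - sample.y >= dropMeters
    }
}

/// Builds an SMS body with coordinates and a Google Maps link.
func emergencyMessage(latitude: Double, longitude: Double) -> String {
    let link = "https://www.google.com/maps/search/?api=1&query=\(latitude),\(longitude)"
    return "Possible fall detected at \(latitude), \(longitude). \(link)"
}
```

A slow descent, such as sitting down over a second or more, never accumulates 40cm inside a single window, so it does not trigger the emergency countdown in this sketch.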
Tech Stack — Swift 5.9, SwiftUI, ARKit, Core Haptics, AVSpeechSynthesizer, Speech framework, CoreLocation, Gemini API, Apple Vision, Network framework.
Challenges we ran into
Ground plane false positives — Early versions vibrated constantly because the floor registered as an obstacle. We solved this with ARKit plane anchor filtering and a 30cm height threshold, but tuning it to work across different surfaces (carpet, tile, grass) required extensive on-device testing.
Haptic engine invalidation — iOS silently kills the Core Haptics engine when the app enters background. Users would return to a completely silent app. We added foreground notification listeners and nil-checks that auto-restart the engine on return.
Stair detection accuracy — Distinguishing stairs from shelves, fences, or textured walls was difficult. The key insight was filtering for a dominant gradient direction (all positive or all negative) and requiring regular spacing with <30% standard deviation — real stairs have consistent step heights, random surfaces don't.
Emergency UX for unconscious users — If the user is unconscious, they can't confirm a call or tap a button. Our solution was a voice loop that speaks to bystanders, not the user — repeating instructions every 10 seconds while simultaneously sending an SMS with GPS so the emergency contact knows exactly where to go.
LiDAR range in sunlight — Infrared-based LiDAR loses range in direct sunlight. We can't fix the physics, but adaptive frame rate and conservative zone thresholds help maintain safety margins outdoors.
Accomplishments that we're proud of
End-to-end latency under 50ms from LiDAR depth capture to haptic feedback — fast enough that users can walk at normal pace and react to obstacles in time.
Fully hands-free emergency response — from fall detection to SMS with GPS coordinates, the entire flow works without any user interaction. A bystander who has never seen the app can still help because the phone speaks instructions aloud.
Seamless online/offline object recognition — the app automatically detects network status and switches between Gemini AI (detailed descriptions) and Apple Vision (on-device labels) without any user action.
Voice-guided onboarding — a visually impaired user can set up the entire app, including emergency contacts, using only their voice. No sighted assistance needed.
Three-gesture interaction model — tap, long press, and triple-tap. No buttons, no menus, no visual UI required. The entire screen is the touch target.
What we learned
Accessibility-first design is fundamentally different — we had to unlearn every UI instinct. No buttons, no visual feedback, no confirmation dialogs. Everything must be voice-announced and gesture-driven.
Real-time sensor pipelines require careful threading — ARKit callbacks, haptic playback, and speech synthesis all have different thread requirements. Getting them to coexist without blocking or crashing took significant architectural effort.
On-device ML is surprisingly capable — Apple Vision's VNClassifyImageRequest runs in under a second with no internet, making offline mode a genuine fallback rather than a degraded experience.
The "last 10%" of assistive tech is the hardest — obstacle detection was week one. Making it reliable enough that a blind person can trust it with their safety was the rest of the project.
Emergency scenarios expose every assumption — designing for an unconscious user forced us to rethink who the "user" is. Sometimes it's not the person holding the phone — it's the stranger standing nearby.
What's next for Guider
Apple Watch companion — haptic feedback directly on the wrist for more discreet and immediate alerts, freeing the phone to stay in a bag or pocket.
Multi-language support — voice announcements and speech recognition in multiple languages to reach a global audience.
Route learning — remember frequently walked routes and provide proactive guidance ("Stairs in 10 meters on your left, like last time").
Community obstacle mapping — crowdsourced hazard reporting so users can warn each other about construction zones, broken sidewalks, or temporary obstacles.
Indoor navigation — combine LiDAR point clouds with ARKit scene understanding for indoor wayfinding in malls, hospitals, and transit stations.
Caregiver dashboard — a web portal where family members can see the user's location, receive emergency alerts, and review trip history.
Built With
- arkit
- avfoundation
- core-haptics
- core-location
- gemini
- network
- platforms
- speech
- swift
- swiftui
- vision
- xcode