Wander

A wearable haptic vision system for blind and low-vision users

Inspiration

Blind and low-vision pedestrians navigate with tools that tell them almost nothing about the space around them. A white cane finds obstacles at ground level. A guide dog handles traffic. Neither gives turn-by-turn directions, and neither warns about a person stepping into your path from the side. We wanted to fuse navigation and obstacle awareness into a single, always-on sense, delivered through touch so the wearer's ears stay free for the street.

What It Does

Wander is a haptic belt worn around the torso. A chest-mounted iPhone reads Google Maps walking directions, monitors its own LiDAR depth sensor, and watches the camera for people and obstacles. Ten times a second it compresses all of that into one cue and sends it to four servos arranged in a cross around the body: front, back, left, and right.

The belt points the way to go, not at the hazard. A tap on the left motor means turn left, a tap on the right motor means turn right, and a tap on the front motor means you are on course. When something is in your path, the belt steers you clear instead of buzzing the thing itself. An obstacle on your left taps the right motor to send you toward the open side, and a person stepping in dead ahead taps the back motor, telling you to stop and step back. The closer the hazard, the stronger the tap. The wearer sets a destination by speaking it, and Wander resolves the place, builds the route, and starts walking them there.
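The steer-away rule and the distance-to-strength mapping can be sketched as follows. This is a minimal illustration in Python, not the app's Swift code: the `Motor` and `Cue` types, the two range constants, and the linear ramp are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Motor(Enum):
    FRONT = "front"
    BACK = "back"
    LEFT = "left"
    RIGHT = "right"


@dataclass(frozen=True)
class Cue:
    motor: Motor
    intensity: float  # 0.0 (off) to 1.0 (strongest tap)


# Assumed tuning values, not taken from the Wander source.
MAX_RANGE_M = 4.0  # at or beyond this distance no hazard cue is produced
MIN_RANGE_M = 0.5  # at or inside this distance the tap is full strength


def intensity_for(distance_m: float) -> float:
    """Closer hazards tap harder: a linear ramp between the two ranges."""
    if distance_m >= MAX_RANGE_M:
        return 0.0
    if distance_m <= MIN_RANGE_M:
        return 1.0
    return (MAX_RANGE_M - distance_m) / (MAX_RANGE_M - MIN_RANGE_M)


def hazard_cue(side: str, distance_m: float) -> Optional[Cue]:
    """Map a hazard's side ("left", "right", "center") to the motor
    that steers the wearer away from it."""
    strength = intensity_for(distance_m)
    if strength == 0.0:
        return None
    away = {"left": Motor.RIGHT, "right": Motor.LEFT, "center": Motor.BACK}
    return Cue(away[side], strength)
```

The lookup table is the whole idea: the hazard's side never appears on the belt, only the side the wearer should move toward.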

The safety logic runs at 10 Hz with a strict priority stack: a person in your path beats a LiDAR obstacle, which beats an early-warning looming cue, which beats a navigation turn cue. The belt never confuses "turn here" with "stop, something is in front of you."
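A priority stack like this reduces to picking the highest-ranked candidate on each tick. The sketch below is in Python for brevity (the app is Swift), and the `Priority`, `Candidate`, and `arbitrate` names are hypothetical; only the ordering of the four tiers comes from the description above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, Optional


class Priority(IntEnum):
    # Higher value wins; the order follows the stack described above.
    NAVIGATION = 0
    EARLY_WARNING = 1
    LIDAR_OBSTACLE = 2
    PERSON = 3


@dataclass(frozen=True)
class Candidate:
    priority: Priority
    motor: str  # "front", "back", "left", or "right"
    intensity: float


def arbitrate(candidates: Iterable[Optional[Candidate]]) -> Optional[Candidate]:
    """Return the single cue to send this tick, or None if no source has one."""
    live = [c for c in candidates if c is not None]
    if not live:
        return None
    return max(live, key=lambda c: c.priority)
```

Each source contributes at most one candidate per tick (or `None`), and only the winner is sent, so a navigation hint cannot be emitted in the same tick as a hazard cue.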

How We Built It

The phone is the entire sensing stack. We dropped the planned Coral accelerator once on-device LiDAR and CoreML carried the safety story, so there is no external compute to wear.

iOS app (Swift 6, strict concurrency): AppModel runs a 10 Hz decide loop that arbitrates cues from four sources and fires a single LC2 packet over UDP to the belt.
LiDAR obstacle detection: ARKit depth frames, sampled in three lateral bands to decide whether to steer left, steer right, or stop and reorient.
Person and object detection: YOLOv8n CoreML model, on-device at the LiDAR frame rate. Depth-crop fusion confirms distance. 21 COCO navigation classes (person, bicycle, car, bus, stop sign, and more).
Early warning: A BearingTracker watches for centered, looming objects before LiDAR has a return and fires a soft front tap as a heads-up.
Navigation: Google Maps SDK + Directions API. The wearer speaks a destination, PlaceResolver finds it with MKLocalSearch, and the app drives the route off live GPS.
Voice layer: Deepgram Voice Agent for speech in and out, with client-side function calling for commands like set_destination, where_am_i, describe_surroundings, and read_sign.
Claude reasoning (off the safety path): the describe path runs an evaluator-optimizer, a Haiku draft checked by a Sonnet verify pass against the live scene, with an on-device guard that rejects any "path is clear" line the LiDAR contradicts. read_sign hands one camera frame to Opus vision to read store signs, bus numbers, and door labels, hedged when the read is high-stakes. Claude never touches the obstacle reflex.
Belt: ESP32 in Wi-Fi AP mode driving four servos. A FastAPI laptop bridge over USB serial is the fallback if the ESP32 does not come up.
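The three-band LiDAR decision described in the list above can be sketched roughly like this. It is an illustrative Python version, not the Swift implementation: the depth frame is modeled as a plain list of rows in meters, zero marks an invalid sample, and the `STOP_RANGE_M` threshold and the tie-breaking rules are assumptions.

```python
from typing import List, Optional, Tuple

STOP_RANGE_M = 1.5  # assumed threshold, not taken from the Wander source


def band_minimums(depth: List[List[float]]) -> Tuple[float, float, float]:
    """Nearest valid depth in the left, center, and right thirds of the frame."""
    width = len(depth[0])
    edges = [0, width // 3, 2 * width // 3, width]
    mins = []
    for lo, hi in zip(edges, edges[1:]):
        # Zero or negative samples are treated as "no return" and skipped.
        values = [row[x] for row in depth for x in range(lo, hi) if row[x] > 0.0]
        mins.append(min(values) if values else float("inf"))
    return mins[0], mins[1], mins[2]


def steer(depth: List[List[float]]) -> Optional[str]:
    """Return the motor to tap ("left", "right", "back") or None if clear."""
    left, center, right = band_minimums(depth)
    blocked_l = left < STOP_RANGE_M
    blocked_c = center < STOP_RANGE_M
    blocked_r = right < STOP_RANGE_M
    if blocked_c:
        if blocked_l and blocked_r:
            return "back"  # nowhere to go: stop and reorient
        if blocked_l:
            return "right"
        if blocked_r:
            return "left"
        return "left" if left >= right else "right"  # pick the more open side
    if blocked_l and not blocked_r:
        return "right"
    if blocked_r and not blocked_l:
        return "left"
    return None
```

A real frame would need ground-plane rejection before this step, otherwise the lower rows of every band would report the pavement as the nearest obstacle.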

Challenges We Ran Into

Thermal headroom. Running LiDAR depth, YOLO inference, Google Maps, and a Deepgram WebSocket at once on one phone generates heat. We instrumented a thermal monitor and gated YOLO behind a thermal threshold to keep the phone from throttling mid-demo.
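A thermal gate of this kind can be as small as one comparison. The sketch below is Python for illustration only; the enum mirrors the four cases of iOS `ProcessInfo.ThermalState`, while the cutoff level and the `process_frame` helper are assumptions, not the app's actual code.

```python
from enum import IntEnum
from typing import Any, Callable, List, Tuple


class ThermalState(IntEnum):
    # Mirrors the four cases of iOS ProcessInfo.ThermalState.
    NOMINAL = 0
    FAIR = 1
    SERIOUS = 2
    CRITICAL = 3


# Assumed cutoff: YOLO is skipped at this state and above.
YOLO_CUTOFF = ThermalState.SERIOUS


def yolo_enabled(state: ThermalState) -> bool:
    """YOLO runs only while the phone is cooler than the cutoff."""
    return state < YOLO_CUTOFF


def process_frame(
    state: ThermalState,
    run_yolo: Callable[[], List[Any]],
    run_lidar: Callable[[], Any],
) -> Tuple[Any, List[Any]]:
    """LiDAR always runs; YOLO is skipped when the phone is too hot."""
    detections = run_yolo() if yolo_enabled(state) else []
    return run_lidar(), detections
```

The design point is that the gate sheds the expensive, lower-priority work first and never touches the LiDAR path.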

ARKit and AVFoundation cannot share the camera. We had to collapse the camera preview, LiDAR depth, and YOLO inference onto a single ARSession rather than running a second AVCaptureSession.

Swift 6 strict concurrency. Every service and actor boundary had to satisfy the compiler's data-race checker, with a clean build and no warnings.

Deepgram Voice Agent API quirks. The plan assumed a client_side flag on function definitions; sending one causes an UNPARSABLE_CLIENT_MESSAGE error, and the real way to mark a function client-side is to omit its endpoint. Push-to-talk with stopAudio tripped CLIENT_MESSAGE_TIMEOUT. We moved to continuous mic streaming with Deepgram's own end-of-speech detection and a tap-to-toggle UI.

Chest-mount angle. LiDAR pointed at chest height sees the ground at range. Without ground-plane rejection, the obstacle cue fires constantly on flat pavement.
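One way to reject ground returns is a flat-ground test on each depth sample. The Python sketch below is a geometric illustration under stated assumptions: the mount height and tolerance are made-up values, the ground is assumed level, and `range_m` is the distance along the ray (a z-depth value would need converting first).

```python
import math

MOUNT_HEIGHT_M = 1.3       # assumed chest height of the phone
GROUND_TOLERANCE_M = 0.10  # assumed slack for uneven pavement


def height_above_ground(
    range_m: float, ray_down_deg: float, mount_height_m: float = MOUNT_HEIGHT_M
) -> float:
    """Height of the measured point above flat ground.

    ray_down_deg is the ray's angle below horizontal: the mount pitch
    plus the pixel's vertical offset from the optical axis.
    """
    return mount_height_m - range_m * math.sin(math.radians(ray_down_deg))


def is_ground(
    range_m: float,
    ray_down_deg: float,
    mount_height_m: float = MOUNT_HEIGHT_M,
    tolerance_m: float = GROUND_TOLERANCE_M,
) -> bool:
    """True when the sample lies on (or below) the expected ground plane."""
    return height_above_ground(range_m, ray_down_deg, mount_height_m) <= tolerance_m
```

For example, a ray pointing 30 degrees down from a 1.3 m mount reaches level ground at 2.6 m, so a 2.6 m return on that ray is pavement, while a 1.0 m return on the same ray is something standing about 0.8 m tall.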

Accomplishments We're Proud Of

A complete pipeline from spoken destination to walking directions to haptic belt cues, all running on one iPhone with no cloud compute on the safety path.
A belt that guides to safety rather than pointing at danger: every hazard cue taps the direction the wearer should move, and a four-tier arbiter guarantees a navigation hint can never mask a real hazard.
The voice layer working end to end on a real phone. Speak a place name, get a route, and the belt starts guiding.
An evaluator-optimizer for spoken safety narration: a Haiku draft verified by Sonnet against the live scene, with an on-device guard that blocks a false "all clear."
120+ unit test assertions covering the packet codec, bearing math, routing geometry, obstacle avoidance, person detection, depth fusion, and voice command parsing.
A hardware fallback (laptop FastAPI bridge to an Arduino over USB serial) so the demo is not gated on the ESP32 coming up.
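The on-device guard against a false "all clear" can be sketched as a filter over the narration text. This is an illustrative Python version, not the shipped Swift code: the phrase pattern, the `CLEAR_RANGE_M` threshold, and the `guard_narration` name are assumptions.

```python
import re
from typing import Optional

CLEAR_RANGE_M = 2.0  # assumed: anything nearer than this contradicts "clear"

# Assumed phrasings of an all-clear claim; a real list would be longer.
_ALL_CLEAR = re.compile(
    r"\b(path|way|route)\s+(ahead\s+)?is\s+clear\b|\ball\s+clear\b",
    re.IGNORECASE,
)


def guard_narration(text: str, nearest_obstacle_m: Optional[float]) -> str:
    """Drop sentences that claim a clear path while LiDAR reports
    something within CLEAR_RANGE_M. None means LiDAR has no return."""
    lidar_blocked = (
        nearest_obstacle_m is not None and nearest_obstacle_m < CLEAR_RANGE_M
    )
    if not lidar_blocked:
        return text
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if not _ALL_CLEAR.search(s)]
    return " ".join(kept)
```

The guard only ever removes reassurance; it never adds a claim of its own, so the model's narration cannot override what the depth sensor sees.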

What We Learned

Safety systems need a single, explicit arbitration point. Letting each subsystem fire the belt on its own would have produced chaos. A ranked priority stack with one sender per tick made the behavior predictable and testable.

Voice APIs require hands-on device testing. Every assumption we made about the Deepgram API (the function flags, the push-to-talk model, the audio routing) turned out wrong in some detail. The as-built behavior came from running it on the phone, not from reading docs.

On-device inference is fast enough to matter. YOLOv8n at the LiDAR frame rate adds real signal without a co-processor.

What's Next for Wander

Belt bring-up: prove the LC2 round-trip on the real ESP32 and servos, and tune the haptic patterns for clarity at walking speed.
LiDAR hardening: ground-plane rejection at the chest-mount angle, threshold tuning, and false-positive discipline with settle and hysteresis logic.
On-device voice and vision verification: confirm the describe and read-sign paths answer within a single voice turn on the phone against live keys, and tune the tone of what Wander says.
YOLO-World upgrade: swap the fixed COCO vocabulary for an open-vocabulary model so we can name any object class without retraining.
Fetch.ai integration: an optional transactional tier for booking accessible transit or flagging routing hazards to a shared map.

Built With

Swift, SwiftUI, Swift 6, ARKit, CoreML, YOLOv8n, Google Maps SDK, Google Directions API, Deepgram Voice Agent, Anthropic Claude API, MKLocalSearch, ESP32, Arduino, FastAPI, Python, WebSockets, XcodeGen