BlindSight


Inspiration

Around 40 million people worldwide are blind. Yet guide dogs remain widely inaccessible because of the high cost of breeding and training. Canes can only alert you to obstacles once you hit them. Both solutions are clunky and dangerous indoors. Blindness is becoming more prevalent, and we need better tools to address it. We found a gap in blind-aid technology: current tools lack the contextual intelligence to guide someone through the inside of a complex building. So we were inspired to introduce accessible, true zero-shot, multi-step indoor navigation. Instead of just avoiding obstacles, our system reasons about the environment to break an end destination down into sequential subgoals ("Find the hallway," then "Go to the elevator," and so on). BlindSight aims to increase independent mobility in complex indoor spaces.

What it does

The core navigation pipeline uses ARKit's SLAM capabilities for real-time spatial tracking and continuous environmental mapping. Camera frames are processed by a YOLOE model, and the detections are fed into Gemini for high-level spatial reasoning and navigation guidance. This runs in parallel with a low-level obstacle avoidance system that uses LiDAR depth: haptic vibrations alert the user to obstacles, and our spatial audio system guides them along a path.

How we built it

We built the core navigation pipeline using ARKit's SLAM capabilities for live spatial tracking and continuous environmental mapping. As the user moves, camera frames are fed through a YOLOE model for open-vocabulary object detection, and the resulting scene data is passed to Gemini 3 with a prompt for high-level spatial reasoning and navigation guidance. This visual processing is paired with a low-level obstacle avoidance system that uses LiDAR depth data: dynamic haptic vibrations alert the user to obstacles, and spatial audio feedback guides them along a path. Together, these help keep navigation around hazards safe.
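To make the obstacle-alert idea concrete, here is a minimal sketch in Python (not the Swift code that runs in the app) of how a LiDAR depth reading could be turned into a vibration strength. The function names, the 0.4 m and 2.0 m thresholds, and the linear ramp are all illustrative assumptions, not values taken from BlindSight.

```python
import math


def nearest_obstacle(depth_map):
    """Return the smallest valid depth (meters) in a 2D grid, or None.

    Zero, negative, NaN, and infinite readings are treated as invalid.
    """
    valid = [d for row in depth_map for d in row if d > 0 and math.isfinite(d)]
    return min(valid) if valid else None


def haptic_intensity(depth_m, near_m=0.4, far_m=2.0):
    """Map a distance in meters to a vibration intensity in [0, 1].

    near_m and far_m are assumed thresholds: anything at or inside near_m
    vibrates at full strength, anything at or beyond far_m is silent, and
    the intensity ramps linearly in between.
    """
    if depth_m <= near_m:
        return 1.0
    if depth_m >= far_m:
        return 0.0
    return (far_m - depth_m) / (far_m - near_m)


# Hypothetical 2x2 depth patch: one dropout (0.0) and one NaN reading.
patch = [[0.0, 1.5], [float("nan"), 0.9]]
closest = nearest_obstacle(patch)
if closest is not None:
    strength = haptic_intensity(closest)
```

A linear ramp is the simplest choice. A real device would likely need tuning, for example a steeper curve at close range.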

Challenges we ran into

We ran into plenty of challenges. At one point our basic pipeline was bottlenecked by YOLOE, which simply wasn't detecting anything. After multiple hours of debugging, we traced the issue to the nuances of how YOLOE needs to be configured and exported to CoreML. Another issue was our spatial audio system, which the app relies on to tell the user where to go: for a while, it couldn't place a sound in the direction of the path. We also hit several problems optimizing the system to run on an iPhone. We used an iPhone 15 Pro, and it kept overheating under the compute load. We ended up subsampling the camera feed, processing one frame every second or two to keep the compute low. Finally, the model struggled to tell when it had reached a goal. Often, when we arrived at a 'door', the system couldn't tell that we were 'there.'
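The frame subsampling described above can be sketched as a small time-based gate. This is an illustrative Python version, not our Swift implementation. The class name `FrameThrottle` and the 1.5-second default interval are assumptions chosen to match "a frame every second or two."

```python
class FrameThrottle:
    """Let a frame through only if enough time has passed since the last one."""

    def __init__(self, min_interval_s=1.5):
        self.min_interval_s = min_interval_s
        self._last_processed_s = None  # timestamp of the last accepted frame

    def should_process(self, timestamp_s):
        """Return True if the frame at timestamp_s should go to the models."""
        if (
            self._last_processed_s is None
            or timestamp_s - self._last_processed_s >= self.min_interval_s
        ):
            self._last_processed_s = timestamp_s
            return True
        return False


# Hypothetical frame timestamps in seconds, with a 1-second minimum gap.
throttle = FrameThrottle(min_interval_s=1.0)
decisions = [throttle.should_process(t) for t in [0.0, 0.5, 1.0, 1.2, 2.1]]
```

Only the expensive detection and reasoning steps need to be gated this way. Low-level obstacle alerts can keep running on every depth frame.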

Accomplishments that we're proud of

We are very proud of our spatial audio system. It can fully autonomously guide users around obstacles and along the right path to a destination. We're also super proud of our SLAM implementation using ARKit, which tracks a destination waypoint and automatically steers the user back onto the right path even if they shift or turn their body abruptly.
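The re-steering behavior comes down to recomputing the angle between where the user is facing and where the waypoint is, every time the tracked pose changes. Below is a minimal Python sketch of that geometry, not our ARKit code. It assumes a simplified flat 2D map with the heading measured counterclockwise from the +x axis, and it uses a simple sine-based left/right pan as a stand-in for real spatial audio. All names are hypothetical.

```python
import math


def relative_bearing(user_xy, heading_rad, waypoint_xy):
    """Signed angle (radians) from the user's heading to the waypoint.

    Positive means the waypoint is counterclockwise (to the user's left),
    negative means clockwise (to the right). The result is wrapped to
    the range [-pi, pi].
    """
    dx = waypoint_xy[0] - user_xy[0]
    dy = waypoint_xy[1] - user_xy[1]
    diff = math.atan2(dy, dx) - heading_rad
    return math.atan2(math.sin(diff), math.cos(diff))


def stereo_pan(relative_rad):
    """Map a relative bearing to a pan value: -1 is full left, +1 is full right.

    A waypoint directly behind the user also maps to roughly 0, so a real
    system would need an extra cue for that case.
    """
    return -math.sin(relative_rad)


# The user faces +y with the waypoint straight ahead, so the pan is centered.
ahead = stereo_pan(relative_bearing((0.0, 0.0), math.pi / 2, (0.0, 5.0)))

# The user abruptly turns to face +x. The same waypoint is now on their left.
after_turn = stereo_pan(relative_bearing((0.0, 0.0), 0.0, (0.0, 5.0)))
```

Because the waypoint is stored in map coordinates and not relative to the user, an abrupt turn changes only `heading_rad`, and the cue shifts to the correct side on the next update.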

What we learned

We learned a lot of new things! For one, bring more sleeping equipment to an overnight hackathon. Floors are hard. More importantly, we learned that iOS app development requires a lot of conversion to iOS standards, such as exporting models to CoreML. We also started coding too late and were too relaxed until late at night, at which point we realized we had to redo the architecture. So the lesson is to plan ahead thoroughly and leave room for roadblocks.

What's next for BlindSight

We have a few directions to explore. The first is to move beyond the smartphone form factor into smart glasses or another wearable device. We also want to solve the smartphone's thermal issues by reducing the computational load of the models we run.