Several of our team members have friends with visual disabilities. From speaking with these friends, we've learned that being able to "feel" an environment in a natural, discreet way is extremely important. But devices that provide this functionality aren't always accessible. We decided to build Seek as an attempt to tackle this challenge.
What it does
Seek integrates the iPhone's depth sensor, camera, and proximity sensor to provide a seamless, hands-free way for visually impaired users to explore their environments. It offers the following modes:
Navigate/Discover: the user stands in place and sweeps the phone across their surroundings. The phone vibrates continuously at an intensity that increases as objects get closer, letting the user "feel" the depth of the environment. Simultaneously, a vision network identifies objects in the frame, and the phone narrates important context, such as "person five feet away" or "water bottle two feet away."
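The distance-to-vibration mapping can be sketched as a small pure function. This is a minimal illustration, not the shipped code: the function name, the linear falloff, and the 5-meter sensing cutoff are all assumptions for the example.

```swift
import Foundation

/// Hypothetical sketch: map a measured distance (in meters) to a haptic
/// intensity in [0, 1], where closer objects produce stronger vibration.
/// `maxRange` is an assumed sensing cutoff, not a value from the app.
func hapticIntensity(forDistance distance: Double, maxRange: Double = 5.0) -> Double {
    guard distance > 0 else { return 1.0 }          // touching the phone: full strength
    guard distance < maxRange else { return 0.0 }   // beyond range: no feedback
    return 1.0 - distance / maxRange                // intensity falls off linearly with distance
}
```

In a real app this value would drive the haptic engine each time a new depth frame arrives.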
Touch: the user tells the phone which types of objects they want to find. As the user sweeps the phone around the environment, the phone announces when an object of interest is detected. As in Discover mode, the vibration intensifies as the user approaches the object, and the phone issues a final alert when they are about to bump into it.
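The Touch-mode decision logic amounts to filtering detections against the user's interest set and escalating when the nearest match gets very close. The sketch below is an assumption about how this could look; the `Detection` type, labels, and the 0.5-meter bump threshold are illustrative, not taken from the app.

```swift
import Foundation

/// A single object detection, as a vision network might report it.
struct Detection {
    let label: String
    let distanceMeters: Double
}

/// What Touch mode should announce for the current frame.
enum TouchAlert: Equatable {
    case none
    case found(String, Double)   // object of interest spotted, with distance
    case imminent(String)        // user is about to bump into it
}

/// Keep only detections the user asked for; warn when the nearest is very close.
func touchAlert(for detections: [Detection],
                interests: Set<String>,
                bumpThreshold: Double = 0.5) -> TouchAlert {
    guard let nearest = detections
        .filter({ interests.contains($0.label) })
        .min(by: { $0.distanceMeters < $1.distanceMeters }) else { return .none }
    return nearest.distanceMeters <= bumpThreshold
        ? .imminent(nearest.label)
        : .found(nearest.label, nearest.distanceMeters)
}
```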
How we built it
We built a Swift iOS app powered by Houndify's conversational interface and CoreML. Using state-of-the-art neural networks to fuse information from the iPhone X's TrueDepth sensor, front-facing camera, and proximity sensor, we extract salient information from the user's environment and present it in a natural, accessible way, enabling the user to "feel" the depth of their surroundings.
Challenges we ran into
We had difficulty running our models in real time with the limited compute available on the iPhone, since vision inference and depth estimation are both computationally intensive. We used model distillation to train a smaller, faster network and implemented an efficient scheduling algorithm that interleaves the depth and vision calculations.
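One plausible shape for such a scheduler is to run the cheap depth pass on every frame, so haptics stay responsive, while the heavier vision network only runs on every Nth frame. This sketch is an assumption about the approach, not the actual implementation; `FrameScheduler` and the interval of 4 are invented for illustration.

```swift
import Foundation

/// Hypothetical frame scheduler: depth runs every frame to keep haptic
/// feedback responsive; the heavier vision network runs once per
/// `visionInterval` frames to stay within the device's compute budget.
struct FrameScheduler {
    let visionInterval: Int
    private(set) var frameIndex = 0

    init(visionInterval: Int = 4) {
        self.visionInterval = visionInterval
    }

    /// Decide which models to run for the incoming camera frame.
    mutating func next() -> (runDepth: Bool, runVision: Bool) {
        defer { frameIndex += 1 }
        return (runDepth: true, runVision: frameIndex % visionInterval == 0)
    }
}
```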
Accomplishments that we're proud of
We are extremely proud of developing state-of-the-art embedded inference for both object recognition and depth estimation; our model is efficient enough to run on an iPhone in real time! Additionally, we showed Seek to one of our visually impaired friends, and she found the app super useful. Of course the app isn't fully built out, but we are proud of making so much progress in such a short period of time.
What we learned
We learned a ton about how to use Houndify, Swift, and CoreML. TreeHacks provided us with great resources and mentorship, and we will definitely leave the hackathon with many useful new skills.
What's next for Seek: AI-assistant for the Visually Impaired
We will continue to work on Seek in our free time and consult our friends to learn how to improve it.