Computer vision is now often more accurate than human vision; why not help those who don’t have “human vision”? Our goal was to use cutting-edge innovations like convolutional neural networks to address disadvantages faced by vision-impaired people in a sighted-oriented world. This gave us an opportunity to contribute a novel and robust solution to the growing body of accessibility technologies.
What it does
Seer surveys the environment in real time, identifies the objects in a scene that a human would focus on and how far away they are, and audibly describes the relevant ones, using stereo audio to intuitively indicate each object's direction. For example, in a classroom, Seer would focus on the professor, her gesticulating hands, and the book she is pointing at, while ignoring the wall behind her or a stationary pencil on a desk, just as a person would. On the street, it knows to describe a moving car rather than chairs on a patio, improving safety for a vision-impaired user.
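The stereo-audio direction cue can be sketched as a constant-power pan: map the detected object's horizontal position in the camera frame to left/right channel gains. The function name and parameters below are illustrative assumptions, not Seer's actual implementation.

```python
import math

def pan_gains(x_center: float, frame_width: float) -> tuple:
    """Return (left_gain, right_gain) for an object centered at x_center pixels.

    Uses a constant-power pan law so perceived loudness stays steady
    as the object moves across the field of view.
    """
    # Normalize horizontal position to [-1, 1]: -1 = far left, +1 = far right.
    pos = 2.0 * x_center / frame_width - 1.0
    # Map position to an angle in [0, pi/2]; cos/sin give the channel gains.
    angle = (pos + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

left, right = pan_gains(x_center=960, frame_width=1920)  # object dead center
# -> equal gains of about 0.707 in each ear
```

An object at the left edge of the frame would get a left gain of 1.0 and a right gain of 0.0, so the announcement plays entirely in the left ear.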
How we built it
We considered what features and information are most important for navigating and understanding the surrounding world. To implement them, we learned how to use Theano and TensorFlow and how to interface with webcams. Despite many frustrations, we always found either a solution or a workable alternative.
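The glue between a webcam and a CNN looks roughly like this: take a raw BGR frame (as OpenCV's VideoCapture returns it) and turn it into a normalized RGB tensor of the shape most classification networks expect. The shapes, channel order, and normalization here are illustrative assumptions, not the project's exact pipeline.

```python
import numpy as np

def preprocess(frame: np.ndarray, size: int = 224) -> np.ndarray:
    """Convert a raw BGR webcam frame into a (1, size, size, 3) float tensor."""
    h, w, _ = frame.shape
    # Nearest-neighbor resize via index arrays (avoids an OpenCV dependency
    # for this sketch).
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    resized = frame[ys][:, xs]
    rgb = resized[..., ::-1]                # OpenCV frames are BGR; flip to RGB
    scaled = rgb.astype(np.float32) / 255.0 # scale pixel values to [0, 1]
    return scaled[np.newaxis]               # add a batch dimension

fake_frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a capture
tensor = preprocess(fake_frame)  # shape (1, 224, 224, 3), ready for a network
```

In a live loop, `fake_frame` would instead come from `cv2.VideoCapture(0).read()`, with the resulting tensor fed to the network each iteration.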
Challenges we ran into
Getting all of the different machine learning tools to work together was difficult: each was designed for a different version of Python, often with limited documentation due to its recency. We also had to adjust to unforeseen complications, such as our cameras' built-in optical zooms preventing the generation of a depth map from stereo vision, and ran into challenges with video memory management as the various libraries we used competed for resources.
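The depth map the optical zoom broke comes from stereo triangulation: with a calibrated camera pair, depth is Z = f · B / d, where f is the focal length in pixels, B is the baseline between the cameras in meters, and d is the disparity in pixels of a point matched across both images. An unexpected zoom changes f, invalidating the calibration. The values below are illustrative, not our hardware's.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in meters of a point matched across a calibrated stereo pair (Z = f*B/d)."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# A point with 40 px of disparity, seen by cameras with an 800 px focal
# length mounted 10 cm apart, sits 2 meters away.
z = depth_from_disparity(disparity_px=40.0, focal_px=800.0, baseline_m=0.1)  # -> 2.0
```

Note the inverse relationship: nearby objects produce large disparities and are measured precisely, while distant ones shrink toward zero disparity, which is why small calibration errors in f matter so much.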
Accomplishments that we're proud of
We started with little to no knowledge of the machine learning tools and without a clear stack in mind, and ended up developing real familiarity with a complicated ML stack.
What we learned
Each of us pushed ourselves and our team members to try new tools and branch out, and as a result we learned a lot about machine learning and computer vision.
What's next for Seer
We think there's a lot left that we can do. More interactive features would be useful in an official version, like allowing the user to inquire about specific objects ("Tell me about the laptop" -> "A person is typing on a silver laptop at a table"). We are excited about Seer’s potential not just as a disability assistance device, but also as a transformational step in computer vision.