Computer vision is now often more accurate than human vision; why not help those who don’t have “human vision”? Our goal was to use cutting-edge innovations like convolutional neural networks to address disadvantages faced by vision-impaired people in a sighted-oriented world. This gave us an opportunity to contribute a novel and robust solution to the growing body of accessibility technologies.
What it does
Seer surveys the environment in real time, identifies the objects in a scene that a human would focus on and how far away they are, and audibly describes the relevant ones, using stereo audio to intuitively indicate each object's direction. For example, in a classroom, Seer would focus on the professor, her gesticulating hands, and the book she is pointing at, while ignoring the wall behind her or a stationary pencil on a desk, just as a person would. On the street, it knows to describe a moving car rather than chairs on a patio, improving safety for a vision-impaired user.
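The stereo-audio direction cue can be sketched as a constant-power pan: map the detected object's horizontal position in the camera frame to left/right channel gains. The function name and parameters below are illustrative assumptions, not Seer's actual implementation.

```python
import math

def pan_gains(x_center: float, frame_width: float) -> tuple:
    """Return (left_gain, right_gain) for an object centered at x_center pixels.

    Uses a constant-power pan law so perceived loudness stays steady
    as the object moves across the field of view.
    """
    # Normalize horizontal position to [-1, 1]: -1 = far left, +1 = far right.
    pos = 2.0 * x_center / frame_width - 1.0
    # Map position to an angle in [0, pi/2]; cos/sin give the channel gains.
    angle = (pos + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

left, right = pan_gains(x_center=960, frame_width=1920)  # object dead center
# -> equal gains of about 0.707 in each ear
```

An object at the left edge of the frame would get a left gain of 1.0 and a right gain of 0.0, so the announcement plays entirely in the left ear.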
How we built it
We considered what features and information are most important for navigating and understanding the surrounding world. To implement them, we learned how to use Theano and TensorFlow and how to interface with webcams. Despite many frustrations, we always found either a solution or a workable alternative.
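The glue between a webcam and a CNN looks roughly like this: take a raw BGR frame (as OpenCV's VideoCapture returns it) and turn it into a normalized RGB tensor of the shape most classification networks expect. The shapes, channel order, and normalization here are illustrative assumptions, not the project's exact pipeline.

```python
import numpy as np

def preprocess(frame: np.ndarray, size: int = 224) -> np.ndarray:
    """Convert a raw BGR webcam frame into a (1, size, size, 3) float tensor."""
    h, w, _ = frame.shape
    # Nearest-neighbor resize via index arrays (avoids an OpenCV dependency
    # for this sketch).
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    resized = frame[ys][:, xs]
    rgb = resized[..., ::-1]                # OpenCV frames are BGR; flip to RGB
    scaled = rgb.astype(np.float32) / 255.0 # scale pixel values to [0, 1]
    return scaled[np.newaxis]               # add a batch dimension

fake_frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a capture
tensor = preprocess(fake_frame)  # shape (1, 224, 224, 3), ready for a network
```

In a live loop, `fake_frame` would instead come from `cv2.VideoCapture(0).read()`, with the resulting tensor fed to the network each iteration.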
Challenges we ran into
Getting all of the different machine learning tools to work together was difficult: each was designed for a different version of Python, often with limited documentation due to its recency. We also had to adjust to unforeseen complications, such as our cameras' built-in optical zooms preventing the generation of a depth map from stereo vision, and ran into challenges with video memory management as the various libraries we used competed for resources.
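The depth map the optical zoom broke comes from stereo triangulation: with a calibrated camera pair, depth is Z = f · B / d, where f is the focal length in pixels, B is the baseline between the cameras in meters, and d is the disparity in pixels of a point matched across both images. An unexpected zoom changes f, invalidating the calibration. The values below are illustrative, not our hardware's.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in meters of a point matched across a calibrated stereo pair (Z = f*B/d)."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# A point with 40 px of disparity, seen by cameras with an 800 px focal
# length mounted 10 cm apart, sits 2 meters away.
z = depth_from_disparity(disparity_px=40.0, focal_px=800.0, baseline_m=0.1)  # -> 2.0
```

Note the inverse relationship: nearby objects produce large disparities and are measured precisely, while distant ones shrink toward zero disparity, which is why small calibration errors in f matter so much.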
Accomplishments that we're proud of
We started with little to no knowledge of the machine learning tools and without a clear stack in mind, and ended up developing real familiarity with a complicated ML stack.
What we learned
Each of us pushed ourselves and our team members to try new tools and branch out, and as a result we learned a lot about machine learning and computer vision.
What's next for Seer
We think there's a lot left that we can do. More interactive features would be useful in an official version, like allowing the user to inquire about specific objects ("Tell me about the laptop" -> "A person is typing on a silver laptop at a table"). We are excited about Seer’s potential not just as a disability assistance device, but also as a transformational step in computer vision.