We wanted to build a tool that was as simple to use as possible. Technology has come a very long way in the last few years, and we realized that some of it could be used to improve the lives of others. Our goal was to create a tool that visually impaired users could point at an object, sign, or scene to get real-time information back.
What it does
We built a Python interface for the Google machine vision, machine learning, and deep learning APIs, which allow for quick and simple image processing along with other functions such as built-in OCR and translation. Any image can be passed into the interface, and the program will recognize what it contains. If there is text, it will read the text aloud to the user. If there is a famous landmark, it will announce what and where it is. Should the user simply be facing down a street, it will report the objects it sees in the same spoken format. The program can also translate detected text from other languages into English and recognize logos from well-known brands.
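The behavior described above can be sketched as a small function that turns detection results into a single spoken sentence. This is a minimal illustration, not the project's actual code: the `build_announcement` name, the layout of the `detections` dictionary, and the exact wording of each phrase are all assumptions made for the example.

```python
def build_announcement(detections):
    """Turn a dict of detection results into one sentence to be spoken.

    `detections` is a hypothetical structure, not the project's real one:
      text     -> str, any text found in the image
      landmark -> (name, place) tuple, or None
      logos    -> list of brand names
      objects  -> list of object labels
    """
    parts = []
    if detections.get("text"):
        parts.append("The text reads: " + detections["text"].strip())
    if detections.get("landmark"):
        name, place = detections["landmark"]
        parts.append(f"This is {name}, located in {place}")
    if detections.get("logos"):
        parts.append("I see logos for " + ", ".join(detections["logos"]))
    # Fall back to a plain list of objects when nothing more specific was found.
    if not parts and detections.get("objects"):
        parts.append("I see " + ", ".join(detections["objects"]))
    if not parts:
        return "I could not recognize anything in this image."
    return ". ".join(parts) + "."


if __name__ == "__main__":
    print(build_announcement({"objects": ["car", "tree", "street sign"]}))
```

Keeping the sentence building separate from the API calls means the same logic can be reused whether the image comes from a file or a camera frame.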
How we built it
This interface was built in Python using the Google Vision and Google Text-To-Speech APIs. It also implements additional functions, such as automatic translation, along with additional OCR and Text-To-Speech interfaces.
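As a rough sketch of how these two services can be combined, the snippet below queries Cloud Vision for text and labels and then writes a sentence to an audio file. It assumes the `google-cloud-vision` client library with valid credentials, and it assumes the `gTTS` package for speech, which may not be the Text-To-Speech client the project actually used. The function names and the 0.5 score threshold are hypothetical.

```python
def confident_descriptions(annotations, min_score=0.5):
    """Return the descriptions of annotations scoring at least `min_score`.

    The 0.5 threshold is an illustrative assumption, not a project setting.
    """
    return [a.description for a in annotations if a.score >= min_score]


def describe_image(image_bytes):
    """Ask Cloud Vision for text and labels; needs credentials and network."""
    # Imported lazily so this file loads even without the client library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=image_bytes)
    text = client.text_detection(image=image).text_annotations
    labels = client.label_detection(image=image).label_annotations
    # The first text annotation holds the full block of detected text.
    full_text = text[0].description if text else ""
    return full_text, confident_descriptions(labels)


def speak_to_file(sentence, path="announcement.mp3"):
    """Synthesize `sentence` to an MP3 file with the gTTS package."""
    from gtts import gTTS  # assumption: the project may have used another client

    gTTS(text=sentence, lang="en").save(path)
    return path
```

With credentials configured, passing the bytes of an image file to `describe_image` returns the detected text and a list of confident labels, and `speak_to_file` turns any resulting sentence into audio that can be played back to the user.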
Challenges we ran into
Due to a lack of materials, our original idea of a handheld "wand" was unattainable. While working around this, we were challenged by never having used APIs at this scale before, and by having to learn or relearn Python from the ground up for this project.
Accomplishments that we're proud of
We are proud of the final product. While it is not currently in the form of a simple handheld device, we are happy that the interface we made can be easily integrated into many different user applications to make navigation easier for those with visual impairments. As a team made up of four people with glasses, we can appreciate what it is like to not be able to see things all the time, and are glad that computers do not share the same limitations.
What we learned
We learned about handling large APIs such as Google Vision and Google Text-To-Speech. We also learned a good deal about file I/O in Python, how to read live video streams, and how to perform computer vision operations on still images.
What's next for iSpy
As mentioned above, we were unable to build our original design of a handheld wand-like device due to the lack of a camera. In the future we intend to finish implementing our original designs and build a tool that can be used by anyone with only the push of a button.