When I started testing Microsoft Azure's Computer Vision API, its image descriptions were both accurate and fast. They were so close to how I would describe an image myself that I thought: why not use this technology to help blind people? I then brainstormed which other functions would be useful for someone who is blind, and PySight was born.
What it does
PySight has three main functions:
- Image description
- Text translation
- Motion detection to determine whether there are moving cars in a video feed
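The writeup doesn't say how motion detection was implemented. One common approach is frame differencing: compare consecutive grayscale frames and flag motion when enough pixels have changed. The minimal sketch below shows that approach under stated assumptions. Frames are assumed to already be grayscale 2D lists of 0-255 intensities. `motion_fraction`, `detect_motion`, and both threshold values are hypothetical. The sketch detects any motion, not cars specifically; telling cars apart from other moving objects would need an extra step.

```python
from typing import List

# Hypothetical helpers for illustration; not PySight's actual implementation.
Frame = List[List[int]]  # assumed: grayscale frame as rows of 0-255 intensities


def motion_fraction(prev: Frame, curr: Frame, pixel_threshold: int = 25) -> float:
    """Return the fraction of pixels whose intensity changed by more than pixel_threshold."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same dimensions")
    changed = 0
    total = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += 1
            if abs(p - c) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def detect_motion(
    frames: List[Frame], pixel_threshold: int = 25, area_threshold: float = 0.01
) -> List[bool]:
    """For each consecutive pair of frames, flag motion if enough pixels changed."""
    return [
        motion_fraction(a, b, pixel_threshold) > area_threshold
        for a, b in zip(frames, frames[1:])
    ]


# Usage: a small bright "object" appears in the third frame.
still = [[10] * 4 for _ in range(4)]
moved = [row[:] for row in still]
moved[1][1] = moved[1][2] = 200
print(detect_motion([still, still, moved]))  # [False, True]
```

With real camera frames, OpenCV's `cv2.absdiff` and `cv2.threshold` would do the same per-pixel work much faster than pure Python loops; plain lists are used here only to keep the sketch dependency-free.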
How I built it
Detailed in the video demo. Special thanks to the following post for the overall scheme of the text-extraction algorithm: http://www.danvk.org/2015/01/07/finding-blocks-of-text-in-an-image-using-python-opencv-and-numpy.html, and thank you to Microsoft and Google for making effective, easy-to-use APIs.
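As I understand the linked post, its core idea is to find edges in the image and dilate them until the edges of neighboring characters merge into connected blocks. The bounding boxes of those blocks are then candidate text regions. The minimal sketch below assumes a binary edge mask has already been computed, for example with an edge detector such as OpenCV's `cv2.Canny`. It omits the post's later step of choosing which boxes to keep. `dilate`, `component_boxes`, and `find_text_blocks` are hypothetical helpers, not PySight's actual code.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical helpers for illustration; not PySight's actual implementation.
Mask = List[List[int]]           # assumed: binary edge mask, 1 = edge pixel
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def dilate(mask: Mask, radius: int = 1) -> Mask:
    """Set every pixel within `radius` (square neighborhood) of an edge pixel to 1."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
    return out


def component_boxes(mask: Mask) -> List[Box]:
    """Return the bounding box of each 8-connected component of 1-pixels (BFS)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def find_text_blocks(edges: Mask, radius: int = 1) -> List[Box]:
    """Merge nearby edge pixels by dilation, then box each merged region."""
    return component_boxes(dilate(edges, radius))


# Usage: two nearby edge pixels on the left, a short edge run on the right.
edges = [
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
]
print(component_boxes(edges))   # [(0, 0, 0, 0), (0, 2, 0, 2), (2, 8, 2, 9)]
print(find_text_blocks(edges))  # [(0, 0, 1, 3), (1, 7, 2, 9)]
```

The dilation radius controls how aggressively nearby edges merge. If it is too small, each letter becomes its own box. If it is too large, separate lines of text collapse into one block.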
Challenges I ran into
I originally wanted this to run on a Raspberry Pi (it was first called "PiSight"; I just got lucky that "Python" starts with the same sound), with buttons that blind people could easily distinguish and a small camera on the back. I got it mostly working, but after I updated the firmware with rpi-update and rebooted, it never turned back on, so unfortunately I couldn't produce a complete product. Still, I was able to demo all of the features the Raspberry Pi would have run, so overall I was happy.
Accomplishments that I'm proud of
The image classifier achieved 100% accuracy in my testing, which I thought was really cool. I'm also happy with how fast everything runs, especially considering how much data is passed between so many different services.
What I learned
What's next for PySight
Hopefully, I can fix the Raspberry Pi so that PySight can become a small $60 device that we can distribute to the visually impaired.