Inspiration
I wanted to make a project entirely out of materials found at the hackathon; problems with the 3D printer did not stop me from building this cardboard, hot-glue, solder, and plastic-cup "camera". I also wanted to focus on accessibility, and recent advances in computer vision suggested a project that fit the venue.
What it does
ContextCam takes a picture when the user clicks its button, then calls the BlueMix and Indico.io APIs to get classification labels or facial-emotion estimates, respectively, depending on the number of button presses.
From these one- or two-word classes, it generates sentences describing the pictured object, along with a definition and other possible labels, using natural language processing from NLTK.
Finally, it speaks that description or emotion to the user through the headphones, depending on the selected mode.
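The writeup does not spell out which press count maps to which mode, so here is a minimal sketch of the dispatch step under the assumption that one press requests the object description and two presses request the emotion reading; the `describe_*` functions are hypothetical stand-ins for the BlueMix and Indico.io calls.

```python
def describe_objects(image):
    # Placeholder for the BlueMix/Watson visual-recognition call.
    return "object description"

def describe_emotion(image):
    # Placeholder for the Indico.io facial-emotion call.
    return "emotion description"

# Assumed mapping of press counts to modes (not confirmed by the writeup).
MODES = {
    1: describe_objects,   # one press: classify the scene
    2: describe_emotion,   # two presses: read facial emotion
}

def handle_presses(press_count, image):
    handler = MODES.get(press_count)
    if handler is None:
        return "unrecognized input"
    return handler(image)
```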
How I built it
I built it with a Raspberry Pi 3 (with its camera module and built-in WiFi) in Python, using the libraries mentioned above. The majority of the code takes the labels and processes/builds the grammar to form coherent text for the user.
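To illustrate the label-to-text step, here is a simplified sketch of turning a classifier label plus alternatives into a spoken-style sentence. The article heuristic and the sentence templates are my own illustrative choices, not the project's actual grammar code.

```python
def article_for(noun):
    # Naive a/an choice from the first letter; the real NLTK-based
    # grammar handling in the project is more involved than this.
    return "an" if noun[0].lower() in "aeiou" else "a"

def label_to_sentence(label, alternatives=()):
    # Build a descriptive sentence from the top label, appending other
    # candidate labels when the classifier returned more than one.
    sentence = f"This looks like {article_for(label)} {label}."
    if alternatives:
        sentence += " It could also be: " + ", ".join(alternatives) + "."
    return sentence
```

For example, `label_to_sentence("apple")` yields "This looks like an apple."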
Challenges I ran into
NLTK was giving me ambiguous part-of-speech classifications for some words, so I had to override it to make the dictionary lookup use the appropriate part of speech. Additionally, I wanted to implement a local image-classification model with Caffe as an offline fallback, but the Raspberry Pi did not have the computational power to compile and run it. However, I am happy with Watson's results, as they are much better than the reference BVLC AlexNet.
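The part-of-speech override can be sketched as forcing the dictionary lookup to a preferred sense. This toy version keys a small hand-made dictionary by (word, part of speech) instead of using NLTK's WordNet, purely so the example is self-contained; the entries and function names are illustrative.

```python
# Toy dictionary keyed by (word, part of speech); illustrative entries only.
DEFINITIONS = {
    ("watch", "noun"): "a small timepiece worn on the wrist",
    ("watch", "verb"): "to look at attentively",
}

def define(word, pos="noun"):
    # Image-classifier labels name objects, so prefer the noun sense
    # instead of letting an ambiguous tagger pick the part of speech.
    entry = DEFINITIONS.get((word, pos))
    if entry is None:
        # Fall back to any sense of the word if the preferred one is missing.
        for (w, _), definition in DEFINITIONS.items():
            if w == word:
                return definition
    return entry
```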
Accomplishments that I'm proud of
Going from image to audio description was fun to build. The grammar parsing and proper wording were challenging, but ultimately rewarding. I also enjoyed hot-gluing the cardboard together into a cool-looking camera! And I kept the design simple: one button that can take multiple inputs rather than many buttons with single inputs.
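One way to get multiple inputs out of a single button is to count presses that land close together in time. This hardware-free sketch groups press timestamps into bursts; the 0.5-second window is an assumed value, not one taken from the project.

```python
def count_presses(timestamps, window=0.5):
    """Group button-press timestamps (in seconds, ascending) into bursts.

    Presses less than `window` seconds apart count as one multi-press
    input; returns the press count of each burst in order.
    """
    counts = []
    for i, t in enumerate(timestamps):
        if i == 0 or t - timestamps[i - 1] >= window:
            counts.append(1)       # gap was long enough: start a new burst
        else:
            counts[-1] += 1        # quick follow-up: extend current burst
    return counts
```

For example, presses at 0.0 s, 0.2 s, and 2.0 s would be read as one double-press followed by one single press.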
What I learned
- How to use NLTK
- Raspberry Pi button and camera handling
What's next for ContextCam
I would like to extend this idea to text recognition, such as reading signs. I would also like to train a custom model on the user's acquaintances so the camera can recognize them. Furthermore, I would like to add continuous analysis that outputs information based on previous context. I think these are all features a visually impaired person would need.
References
Bird, Steven, Edward Loper, and Ewan Klein (2009). Natural Language Processing with Python. O'Reilly Media Inc.