Inspiration
I have impaired vision, and like others with similar conditions, I struggle to see text or objects that are far away. To work around this, I often take a photo with my phone and read from the photo, which is quite inconvenient. That inconvenience inspired me to develop IntelliCam.
What it does
IntelliCam is designed as a portable device powered by a Raspberry Pi and any webcam. It is meant to continuously capture the environment in view of the webcam, so that you can ask it questions about that environment, such as where an object is or what a piece of text says. It then answers verbally, based on the frame captured when the question was asked.
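To make the question-answering idea concrete, here is a minimal sketch of how a "where is the X" question could be answered from the objects detected in one frame. It is an illustration under assumptions, not IntelliCam's actual code: the `Detection` structure, the function names, and the left/front/right split into thirds of the frame are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """One detected object: a label plus a pixel bounding box (hypothetical structure)."""
    label: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def horizontal_position(det: Detection, frame_width: int) -> str:
    """Describe which third of the frame the centre of the box falls in."""
    centre_x = (det.xmin + det.xmax) / 2
    if centre_x < frame_width / 3:
        return "on your left"
    if centre_x < 2 * frame_width / 3:
        return "in front of you"
    return "on your right"


def answer_where(question: str, detections: List[Detection], frame_width: int) -> str:
    """Answer a location question by matching question words against detected labels.

    Assumption: labels are single words; multi-word labels would need phrase matching.
    """
    words = question.lower().replace("?", " ").split()
    for det in detections:
        if det.label.lower() in words:
            return f"The {det.label} is {horizontal_position(det, frame_width)}."
    return "I could not find that object in the current frame."
```

For example, with a 640-pixel-wide frame and a cup detected near the right edge, `answer_where("Where is the cup?", detections, 640)` would produce a sentence that a text-to-speech step could then read aloud.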
How we built it
I labeled image features with XML annotations to build the training set for the neural network. Then I used transfer learning in TensorFlow to train a convolutional neural network that detects and labels those features. I also used a pre-trained neural network to detect and recognize text inside images. Finally, the user's question is fed into a natural language processing system that interprets it and provides an appropriate response.
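As an illustration of the XML labeling step, the sketch below reads one annotation and extracts the labeled boxes. It assumes the annotations follow the Pascal VOC layout (the XML format written by labeling tools such as LabelImg); the write-up does not say which format was used, and the sample file contents and the `parse_annotation` name are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical annotation in Pascal VOC style; real files would be read from disk.
SAMPLE_ANNOTATION = """
<annotation>
  <filename>desk.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>cup</name>
    <bndbox><xmin>500</xmin><ymin>100</ymin><xmax>600</xmax><ymax>200</ymax></bndbox>
  </object>
  <object>
    <name>book</name>
    <bndbox><xmin>10</xmin><ymin>10</ymin><xmax>100</xmax><ymax>100</ymax></bndbox>
  </object>
</annotation>
"""

Box = Tuple[str, int, int, int, int]  # (label, xmin, ymin, xmax, ymax)


def parse_annotation(xml_text: str) -> Tuple[str, List[Box]]:
    """Return the image filename and every labeled box in one annotation."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    boxes: List[Box] = []
    for obj in root.findall("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates are stored as text, so convert each one to an integer.
        xmin, ymin, xmax, ymax = (
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, xmin, ymin, xmax, ymax))
    return filename, boxes
```

Records like these are typically converted into whatever input format the training pipeline expects before transfer learning starts.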
Challenges we ran into
Because the TensorFlow Object Detection API that I was using was no longer maintained, I had to spend a lot of time rebuilding numerous TensorFlow source files to make it compatible with TensorFlow 2.
Accomplishments that we're proud of
A fairly reliable detection network that handles different orientations and backgrounds, and somewhat reliable text recognition.
What we learned
How to use transfer learning to speed up ML training. How to integrate code from Stack Overflow and GitHub into my projects and deal with the many additional errors that come with using others' code. How to use XML to label images and create training sets.
What's next for Image Q&A
- Change the interaction from text-based to voice-activated, with a keyword to trigger it.
- Optimize the model and port it onto a Raspberry Pi.
- Significantly increase the size of the training set to improve accuracy, reliability, and versatility as more features are trained on.
- Tune the parameters of the text recognition algorithm so that it is more accurate and reliable.