We wanted to build something relating to computer vision, and the theme of the hackathon was social impact. Our brainstorming session eventually led us to develop a suite of tools to improve accessibility for people with disabilities. Since we had no direct experience with video processing at this scale, interpreting American Sign Language seemed like a fair challenge. Through intense iteration we narrowed our focus to streamlining and redefining everyday interactions for American Sign Language users.
What does it do?
Gestur feeds photos from a webcam to a custom neural network, trained on IBM Watson, that determines the user's intent. That intent triggers a specific action in our backend and displays useful data to the user.
For example, if a user at a grocery store signs “Where can I find apples?”, the webcam passes the frames to our neural network, the network extracts the intent, and our application shows where the apples are in the store.
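Concretely, the flow from classifier output to on-screen answer can be sketched like this. The intent names and the `STORE_MAP` lookup below are hypothetical, but the response shape mirrors the nested JSON that Watson Visual Recognition's `classify()` returns:

```python
# Sketch: Watson classify() response -> intent -> backend action.
# STORE_MAP and the intent names are made up for illustration.

STORE_MAP = {"apples": "Aisle 4, Produce"}  # hypothetical store layout data

def top_intent(watson_response):
    """Pick the highest-scoring class from a Watson classify() response."""
    classes = watson_response["images"][0]["classifiers"][0]["classes"]
    best = max(classes, key=lambda c: c["score"])
    return best["class"]

def handle_intent(intent):
    """Map a recognized sign-language intent to a backend action."""
    if intent.startswith("find_"):
        item = intent[len("find_"):]
        return STORE_MAP.get(item, "Sorry, we could not locate that item.")
    return "Unrecognized intent."

sample = {"images": [{"classifiers": [{"classes": [
    {"class": "find_apples", "score": 0.91},
    {"class": "greeting", "score": 0.42},
]}]}]}

print(handle_intent(top_intent(sample)))  # -> Aisle 4, Produce
```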
How I built it?
The core application is built in Flask and served with nginx and uWSGI. An ORM layer in SQLAlchemy ties into our MySQL database hosted on AWS RDS. We use IBM Watson’s Python SDK to interface with the custom neural network. The frontend is built on Materialize and vanilla JS.
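A minimal sketch of the Flask layer, with the Watson call stubbed out. The route name and the `classify` stub are assumptions for illustration, not our exact code; in production the stub would be replaced by a call through the Watson Python SDK:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def classify(image_bytes):
    """Stand-in for the IBM Watson Visual Recognition call.
    In production this would go through the Watson Python SDK."""
    return {"intent": "find_apples", "score": 0.91}

@app.route("/classify", methods=["POST"])
def classify_frame():
    # Webcam frame uploaded by the frontend as multipart form data.
    frame = request.files["frame"].read()
    result = classify(frame)
    return jsonify(result)

if __name__ == "__main__":
    # In production this app is served by uWSGI behind nginx, not app.run().
    app.run()
```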
We also used our application to create the data set and train the custom neural network ourselves, because there is a severe shortage of publicly available sign language data sets for training AI.
We added keystroke mappings using XKeys to give quick access to FAQs.
We built our own Alexa Skill through AWS Lambda.
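The skill’s Lambda entry point can be sketched as follows. The intent name `GesturFaqIntent` and the reply text are hypothetical, but the return value follows the standard Alexa Skills Kit response envelope:

```python
def lambda_handler(event, context):
    """AWS Lambda entry point for a simple Alexa skill (sketch)."""
    intent = event.get("request", {}).get("intent", {}).get("name", "")
    if intent == "GesturFaqIntent":  # hypothetical intent name
        speech = "Gestur translates American Sign Language using a webcam."
    else:
        speech = "Sorry, I didn't understand that."
    # Standard Alexa Skills Kit response envelope.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
```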
Challenges I ran into:
In 2015, Google shipped a change to Chrome’s WebRTC implementation that requires webcam access (getUserMedia) to be served over HTTPS. This forced us to generate self-signed OpenSSL certificates and serve them with nginx. Configuring this to work with Flask and uWSGI was a huge technical challenge, but we prevailed in the end.
Another challenge was the rate limit on Watson’s Visual Recognition API: the free tier allows only 250 image classifications per day, so we had to continuously rotate API keys in order to continue development.
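Our workaround amounted to switching credentials once a key’s daily quota was exhausted; in spirit it looked something like this (the key list is a placeholder, and the quota constant reflects the free-tier limit above):

```python
from itertools import cycle

DAILY_LIMIT = 250  # free-tier classifications per key per day

class KeyRotator:
    """Hand out an API key, switching to the next one when its quota runs out."""

    def __init__(self, keys):
        self._keys = cycle(keys)
        self._current = next(self._keys)
        self._used = 0

    def key_for_next_call(self):
        if self._used >= DAILY_LIMIT:
            self._current = next(self._keys)  # rotate to a fresh key
            self._used = 0
        self._used += 1
        return self._current

rotator = KeyRotator(["key-A", "key-B"])  # placeholder credentials
keys = [rotator.key_for_next_call() for _ in range(251)]
print(keys[0], keys[250])  # -> key-A key-B
```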
Accomplishments that I’m proud of:
Built an LSTM recurrent neural net (with VGG19 feature extraction) that worked well, though not as well as IBM Bluemix Watson’s service. The complete data pipeline architecture can be found at: https://www.dropbox.com/s/tn320esidl772nz/data-pipeline.png?dl=0
Developed a consistent, simple, and intuitive user interface that fuses seamlessly with the platform built by the rest of the team.
What I learned:
Image recognition training is very computationally heavy, and reducing the dimensions of your images can greatly improve your neural network’s performance. We found that even a 64x64 PNG was enough for our requirements and struck the right balance between accuracy and speed.
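The savings are easy to see: downsampling a 640x480 webcam frame to 64x64 cuts the pixel count by a factor of 75. A pure-Python nearest-neighbour sketch (in practice you would use an image library; the grid-of-lists representation here is just for illustration):

```python
def downsample(pixels, out_w=64, out_h=64):
    """Nearest-neighbour downsample of a 2D grid of pixel values."""
    in_h, in_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# Fake 640x480 grayscale frame with a simple gradient pattern.
frame = [[(x + y) % 256 for x in range(640)] for y in range(480)]
small = downsample(frame)
print(len(small), len(small[0]))  # -> 64 64
print(640 * 480 // (64 * 64))     # -> 75 (pixel-count reduction factor)
```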
What’s next for Gestur:
A virtual store assistant is only a small showcase of Gestur’s capabilities. It could be integrated with virtually any technology, with near-limitless room to expand for the greater good. We probably need to get some sleep too...