I have been interested in ASL since elementary school, when my friends learned the alphabet and would "secretly" sign messages to each other. I was president of the ASL club in high school, where I learned about many of the struggles facing the Deaf community, including the problem of their voices not being heard. Efficient and accurate ASL translation is one way to give them a voice, and that is what we set out to work toward with this machine learning model.
What it does
This program uses machine learning to classify images of the handshapes for the static (non-moving) letters of the ASL alphabet. It splits the dataset we compiled in Google Drive into training and testing sets, then trains a convolutional neural network on the training set to classify the handshapes.
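As a rough illustration of the train/test split described above, here is a minimal sketch that splits the images letter by letter so every letter shows up in both sets. The function name `split_by_letter`, the 80/20 ratio, the fixed seed, and the file names are all assumptions made for the example, not details of our actual notebook. The 24 letters come from leaving out J and Z, which are signed with movement.

```python
import random
from collections import defaultdict


def split_by_letter(samples, test_fraction=0.2, seed=0):
    """Split (path, letter) pairs into train and test lists, letter by letter.

    Hypothetical helper: the 80/20 ratio and the seed are assumptions.
    """
    by_letter = defaultdict(list)
    for path, letter in samples:
        by_letter[letter].append(path)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for letter in sorted(by_letter):
        paths = sorted(by_letter[letter])
        rng.shuffle(paths)
        # Hold out at least one image per letter for testing.
        n_test = max(1, round(len(paths) * test_fraction))
        test += [(p, letter) for p in paths[:n_test]]
        train += [(p, letter) for p in paths[n_test:]]
    return train, test


# Made-up file names: 15 images for each of the 24 static letters (no J or Z).
letters = [c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c not in "JZ"]
samples = [(f"{c}/{i:02d}.jpg", c) for c in letters for i in range(15)]

train, test = split_by_letter(samples)
print(len(train), len(test))  # 288 72
```

Splitting per letter, rather than shuffling everything together, matters with so few images: a plain random split could leave a letter with almost no test examples.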
How we built it
We compiled a dataset of around 360 images (~15 per letter) of ASL handshapes in Google Drive. Then, using PyTorch and Google Colab, we were able to train the model on Google's cloud GPU servers (in under 7 minutes!).
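To give a sense of what a small PyTorch CNN and one training step look like, here is a minimal sketch. It is not our exact model: the class name `HandshapeCNN`, the 64x64 RGB input size, the channel counts, the Adam optimizer, and the learning rate are assumptions chosen for the example, and the batch is random noise standing in for real images. The two convolutional layers and the 24 output classes (360 images at ~15 per letter) follow the write-up.

```python
import torch
from torch import nn


class HandshapeCNN(nn.Module):
    """Illustrative two-conv-layer classifier; layer sizes are assumptions."""

    def __init__(self, num_classes=24):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))


# Use the GPU when one is available (as on a Colab GPU runtime), else the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = HandshapeCNN().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Random stand-in batch: 8 RGB images of 64x64 pixels with random labels.
images = torch.randn(8, 3, 64, 64, device=device)
labels = torch.randint(0, 24, (8,), device=device)

# One training step: forward pass, loss, backward pass, weight update.
model.train()
optimizer.zero_grad()
logits = model(images)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()

print(logits.shape)  # one score per letter for each image in the batch
```

In a real run, the random batch would be replaced by batches from a `DataLoader` over the training images, and the step would repeat for several epochs.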
Challenges we ran into
Not many ASL handshape datasets are available on the internet, so we made our own. Compiling it took a lot of time.
Accomplishments that we're proud of
The accuracy of the model surprised us! Even with only two convolutional layers, it was able to learn patterns in the images that generalized pretty well to unseen images!
What we learned
It was the first time either of us had used Google Colab, so we learned a lot about that process, including how to read files in from Google Drive and process them. Although I had learned about CNNs theoretically in class, I had never built one before. It was also the first time either of us had developed a convolutional neural network in PyTorch, so we learned a lot!
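For anyone curious what reading files from Google Drive in Colab involves, here is a minimal sketch. `drive.mount` is the standard Colab call, but the folder name `asl_handshapes`, the one-subfolder-per-letter layout, and the helper `find_images` are assumptions made for the example and may not match how our Drive folder was organized.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(root):
    """Return sorted (path, letter) pairs, using each image's parent folder
    name as its label. Hypothetical helper; assumes one subfolder per letter."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        (str(p), p.parent.name)
        for p in root.glob("*/*")
        if p.suffix.lower() in IMAGE_SUFFIXES
    )


try:
    # google.colab can only be imported inside a Colab runtime.
    from google.colab import drive

    drive.mount("/content/drive")  # asks for permission the first time
    data_root = "/content/drive/MyDrive/asl_handshapes"  # hypothetical folder
except ImportError:
    data_root = "asl_handshapes"  # hypothetical local copy when run elsewhere

samples = find_images(data_root)
print(f"found {len(samples)} images")
```

Once the Drive is mounted, its files behave like ordinary files under `/content/drive`, so the rest of the notebook can use normal Python file handling.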
What's next for ASL Alphabet Classifier with Convolutional Neural Networks
This was a preliminary example with only static images. We could expand it with more handshapes, and then go on to the complex movement and dynamics of ASL. The potential for this is immense: imagine a person signing at a video camera and seeing a real-time translation of what they are signing!