Inspiration
This project aims to help people who are unable to pronounce or say words, and to give ASL users a more natural way to communicate. The idea took root when we recognized the communication barriers that arise when an ASL speaker talks with someone who doesn't know ASL. We first thought of it when one of our team members met a friend's grandparent who is speech impaired. Not only did she have trouble communicating with others, she also dealt with many mental health issues because of it. We realized that remote learning would make this even worse, since students like her would be isolated and unable to get the help they might usually receive from school staff. During DVHacks, our team was inspired to try to solve this problem with an app.
How we built it
First, we had to create a training dataset for our backend machine learning model. We combined multiple online datasets of hand-signal images with photos we took on our own cameras to create a large, robust sign language dataset. To build the computer vision model itself, we used TensorFlow in Python, training a multi-layer convolutional neural network on that dataset. In addition, we used transfer learning to enhance our model with features learned from the ImageNet database. Lastly, we used TensorFlow Lite to convert the model into a file our mobile app can use for on-device inference.
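The pipeline above can be sketched roughly as follows. This is a minimal illustration, not our exact code: the choice of MobileNetV2 as the ImageNet-pretrained base, the 224x224 input size, the dense-layer sizes, and the file name are all assumptions made for the example. The 28 classes match the letters plus "space" and "nothing" described later.

```python
# Hedged sketch of the transfer-learning + TFLite-export pipeline.
# Assumptions (not from the write-up): MobileNetV2 base, 224x224 RGB
# input, layer sizes, and output file name.
import tensorflow as tf

NUM_CLASSES = 28  # 26 letters + "space" + "nothing"


def build_model(weights="imagenet"):
    """Classifier head on top of a frozen ImageNet-pretrained base."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=weights
    )
    base.trainable = False  # keep the pretrained features fixed
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def export_tflite(model, path="sign_model.tflite"):
    """Convert a trained Keras model for on-device inference."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_bytes = converter.convert()
    with open(path, "wb") as f:
        f.write(tflite_bytes)
    return tflite_bytes
```

After `model.fit(...)` on the dataset, `export_tflite(model)` produces the `.tflite` file that the Android app loads.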
We built off a boilerplate Android Studio template in Kotlin. Taking input from the camera, we run the TensorFlow Lite model for inference every 1.5 seconds. Each time a user displays a hand signal, the model's prediction is recorded, and TextViews display the most confident letters and their confidence scores for each frame. When the user displays the signal for "space", the per-frame predictions are concatenated into a string and spoken aloud using the MTTS library.
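The frame-to-speech logic above can be illustrated in a few lines of plain Python (the app itself implements this in Kotlin). The `decode_frames` name and the `speak` callback are placeholders for this sketch, standing in for the TFLite prediction loop and the TTS call:

```python
# Illustrative sketch of the per-frame decoding described above:
# collect one predicted letter per frame, and on "space" speak the
# word accumulated so far. Names here are assumptions for illustration.
def decode_frames(frame_predictions, speak):
    """frame_predictions: one predicted label per 1.5 s frame.
    speak: callback that voices a finished word (the TTS stand-in)."""
    buffer = []
    for letter in frame_predictions:
        if letter == "nothing":
            continue  # no hand signal detected in this frame
        elif letter == "space":
            if buffer:
                speak("".join(buffer))  # word boundary: voice it
                buffer = []
        else:
            buffer.append(letter)
    if buffer:  # voice any trailing, unterminated word
        speak("".join(buffer))


spoken = []
decode_frames(["H", "I", "space", "nothing", "M", "O", "M"], spoken.append)
# spoken == ["HI", "MOM"]
```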
Challenges we ran into
We had some issues with the accuracy of our CV model, especially on hand signs that look alike. For example, the signs for "S" and "E", or "N" and "M", were similar enough for the model to mix them up given the amount of training data we had, so it performed relatively worse on these letters.
The data acquisition portion of development was also challenging and time-consuming: finding a good existing dataset that matched our needs was difficult on its own, and we then had to add our own images to better tailor the dataset to our project, making it a significant effort to set up and execute. Moreover, training took a long time: with over 3,000 images for each of the 26 letters plus the "space" and "nothing" classes (roughly 84,000 images in total), each model took a few hours to train.
Additionally, since our entire team was completely new to Android Studio, TensorFlow Lite, and Kotlin, we had to spend a good portion of our time setting up development environments (necessary software, IDEs, emulators, etc.). One team member's laptop also had an unsupported processor type, which caused further issues when setting up the Android development environment.
Accomplishments that we're proud of
We are proud that we were able to build and deploy a machine learning model in an Android app in just 24 hours, with very minimal prior experience. We are also proud to have developed a model that predicts many different letters accurately.
What we learned
All three of us had very little experience with TensorFlow Lite, Android Studio, and Kotlin. Through this process, we gained valuable experience developing Android apps, building computer vision models with these technologies, and deploying them to a mobile app.
What's next for Sign n Speak
Using the machine learning model we built, we want to develop a Chrome extension that works directly with Google Meet, allowing people to communicate in ASL during their meetings. We also want to productionize our code so that we can release the app to the general public in the near future.
Built With
- android
- android-studio
- kotlin
- python
- tensorflow