Inspiration

Most people who use sign language have ways of understanding spoken words, whether through lip reading, captions, or other means. However, their primary mode of communication is sign language, and most hearing people do not know it, which creates a communication barrier. SwiftSign facilitates communication between people who use ASL and people who do not understand it. This matters in everyday situations like doctor's appointments, where an ASL user would otherwise need an interpreter. The technology also increases independence, since users no longer need to rely on someone else to translate for them. Our app promotes accessibility and the inclusion of hearing-impaired individuals in society.

What it does

SwiftSign translates ASL to English in real time. The user signs into the camera, and the app outputs each letter as the user signs it.

How we built it

We trained a YOLOv8 PyTorch model on a dataset of images covering every letter of the ASL alphabet. We then loaded the model into a Streamlit app that captures a live video feed of the user with OpenCV and outputs the signed ASL letter in real time. The app first detects a hand and places a bounding box around it, then classifies the cropped region as a letter. It samples the feed roughly every 25 milliseconds and sends each new frame to the model for classification.
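
As a rough illustration, here is a minimal sketch of how such a loop could be wired up with Ultralytics YOLOv8, OpenCV, and Streamlit. The weights filename, placeholder layout, and sampling interval handling are assumptions, not our exact code.

```python
# Minimal sketch of the real-time loop; "asl_yolov8.pt" is a hypothetical
# filename for the fine-tuned ASL alphabet detector.
import time

import cv2
import streamlit as st
from ultralytics import YOLO

model = YOLO("asl_yolov8.pt")     # fine-tuned YOLOv8 detector (assumed path)
frame_slot = st.empty()           # placeholder updated with each annotated frame
letter_slot = st.empty()          # placeholder for the predicted letter

cap = cv2.VideoCapture(0)         # live webcam feed
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break

    # Run detection; each box corresponds to a detected hand signing a letter.
    results = model(frame, verbose=False)[0]
    if len(results.boxes) > 0:
        cls_id = int(results.boxes.cls[0])
        letter_slot.markdown(f"**Detected letter:** {results.names[cls_id]}")

    # Show the frame with the model's bounding boxes drawn on it.
    frame_slot.image(results.plot(), channels="BGR")

    time.sleep(0.025)             # ~25 ms between model calls, as described above

cap.release()
```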

Challenges we ran into

Each model we trained took several hours because of the size of the dataset. Once training finished and we loaded the models into the app, several of them classified hand signs with very low accuracy, defaulting to “nothing” most of the time. Although the bounding box located the hand reliably, the model could not classify it. After several tests, we realized that the image being sent to the model cut off most of the hand. We fixed this by increasing the padding around the bounding box so the full hand reached the model.
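
For illustration, here is a minimal sketch of that padding fix, assuming the detector returns a pixel-coordinate box over a BGR frame; the function name and padding value are hypothetical.

```python
import numpy as np


def crop_with_padding(frame: np.ndarray, box, pad: int = 40) -> np.ndarray:
    """Crop the detected hand region with extra margin so fingers are not cut off."""
    x1, y1, x2, y2 = map(int, box)
    h, w = frame.shape[:2]
    # Expand the box by `pad` pixels on every side, clamped to the frame edges.
    x1, y1 = max(0, x1 - pad), max(0, y1 - pad)
    x2, y2 = min(w, x2 + pad), min(h, y2 + pad)
    return frame[y1:y2, x1:x2]
```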

Accomplishments that we're proud of

We are proud that our app runs in real time at 15 frames per second, detecting and classifying hand signs quickly enough to keep up with sign language as it is produced. Our model also scores above 85% on accuracy, precision, and recall. Together, the real-time performance and accuracy make the program suitable for many real-world applications.

What we learned

We learned how to design and train several types of ML models. Because some were inaccurate and each took hours to train, we trained multiple models in parallel so we had options in case one did not work. We also learned to run a model in real time on a live camera feed, and to test and debug our pipeline to figure out why a model with high offline accuracy was misclassifying hand signs. Along the way, we ended up learning some sign language, too.

What's next for SwiftSign

Future steps include adding data for common ASL words so the app can translate both letters and whole words. We could also have the app keep track of each detected letter and display the accumulated text at the bottom of the screen, like captions, as sketched below.
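
A rough sketch of how that caption accumulation could work (the class name and stability threshold are hypothetical): a letter is appended only after it has been predicted consistently for several consecutive frames, to avoid flicker from single-frame misclassifications.

```python
from collections import deque
from typing import Optional


class CaptionBuilder:
    """Accumulate per-frame letter predictions into caption text."""

    def __init__(self, stable_frames: int = 10):
        self.recent = deque(maxlen=stable_frames)  # last N per-frame predictions
        self.text = ""

    def update(self, letter: Optional[str]) -> str:
        """Feed one per-frame prediction; return the caption so far."""
        self.recent.append(letter)
        stable = (
            letter is not None
            and len(self.recent) == self.recent.maxlen
            and all(l == letter for l in self.recent)
        )
        # Simplification: skip repeats of the last appended letter so holding a
        # sign does not spam the caption (double letters would need extra logic).
        if stable and (not self.text or self.text[-1] != letter):
            self.text += letter
            self.recent.clear()
        return self.text
```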

Built With

opencv, python, pytorch, streamlit, yolov8