SignFlow

Video to Text
Audio to Video

Inspiration

The inspiration for SignFlow came from a desire to make communication more accessible between hearing and non-hearing individuals. As someone passionate about technology and its ability to break barriers, I saw the potential to leverage AI to bridge the gap between spoken language and American Sign Language (ASL). The project aims to enhance inclusivity, making it easier for people to communicate regardless of their hearing abilities.

What I learned

Building SignFlow has been a great learning experience in AI, machine learning, and app development. I gained hands-on knowledge of natural language processing (NLP) techniques, particularly in the context of speech-to-text and video analysis. Additionally, I learned how to train models to recognize ASL gestures and translate them into meaningful text or audio. The project also deepened my understanding of integrating AI models with mobile applications for real-time user interaction.

How we built it

Building SignFlow involved several steps and technologies, covering translation in both directions: spoken audio to ASL video, and ASL video to text.

Audio to ASL

I started by using Faspit to capture audio from users. The audio was sent to a Flask API, which processed the input and matched it to the most relevant ASL phrases. For the data, I gathered ASL phrase videos along with images of hand signs; the images were used for gesture recognition. I converted each phrase in the dataset into an array of strings and did the same for the input audio, so the two could be compared directly. The API then matched the audio to the corresponding ASL phrase videos, and the frontend fetched the top matches and requested their video streams from the backend.
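The phrase-matching step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the audio has already been transcribed to text, and it uses Jaccard overlap between word sets as the similarity score, which is a stand-in for whatever matching the Flask API really used. The names `tokenize`, `jaccard`, `top_matches`, and the `PHRASES` list are all hypothetical.

```python
import re

def tokenize(text):
    """Lowercase a phrase and split it into an array of word strings."""
    return re.findall(r"[a-z0-9']+", text.lower())

def jaccard(a, b):
    """Overlap between two token lists: 0.0 (disjoint) to 1.0 (same word set)."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def top_matches(transcript, phrases, k=3):
    """Return up to k (phrase, score) pairs with a non-zero score, best first."""
    query = tokenize(transcript)
    scored = [(phrase, jaccard(query, tokenize(phrase))) for phrase in phrases]
    scored = [item for item in scored if item[1] > 0]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical phrase list; the real dataset is not shown in this write-up.
PHRASES = ["how are you", "thank you", "nice to meet you", "good morning"]

# Assumed input: text already produced by a speech-to-text step.
matches = top_matches("Thank you!", PHRASES, k=2)
for phrase, score in matches:
    print(f"{phrase}: {score:.2f}")
```

Each returned phrase would then map to its ASL video, which the frontend requests from the backend.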

ASL Gesture Recognition

For gesture recognition, I trained a model on the hand sign image dataset to recognize individual hand gestures. The frontend lets users capture a hand sign, and the trained model predicts the corresponding letter or word. Predictions run in real time, which keeps the app interactive and responsive.
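As a rough sketch of the last step of prediction, the snippet below turns a model's per-class scores into a letter. It assumes the model outputs one score per class and that the classes are the letters A to Z; neither detail is stated above, and the `decode_prediction` helper and the confidence threshold are hypothetical additions for illustration.

```python
import string

# Assumed label set: one class per letter A-Z (the real classes are not listed above).
LABELS = list(string.ascii_uppercase)

def decode_prediction(scores, labels=LABELS, min_confidence=0.6):
    """Map per-class scores to (label, confidence); label is None when unsure."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    # Index of the highest-scoring class.
    best = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[best]
    if confidence < min_confidence:
        return None, confidence
    return labels[best], confidence

# Stand-in for a model output: most of the probability mass on class index 2.
scores = [0.0] * 26
scores[2] = 0.9
scores[3] = 0.1

letter, confidence = decode_prediction(scores)
print(letter, confidence)
```

Returning `None` below the threshold lets the frontend skip uncertain frames instead of showing a wrong letter, which matters when predictions are made continuously.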

Video Streaming

To keep playback smooth, I implemented video streaming: the frontend fetches the matched videos from the backend as streams and displays them to users.
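One common way to stream a file from a Python backend is to read it in fixed-size chunks with a generator, so the whole video never has to sit in memory. The sketch below shows only that chunking idea using the standard library; the `iter_file_chunks` name and the 64 KiB chunk size are assumptions, and the write-up does not say how the actual endpoint was implemented.

```python
import os
import tempfile

def iter_file_chunks(path, chunk_size=64 * 1024):
    """Yield the file at `path` as successive byte chunks of at most chunk_size."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Demo with a temporary file standing in for an ASL phrase video.
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".mp4")
try:
    tmp.write(b"\x00" * 150_000)
    tmp.close()
    chunk_sizes = [len(chunk) for chunk in iter_file_chunks(tmp.name)]
    print(chunk_sizes)
finally:
    os.remove(tmp.name)
```

In a Flask app, a generator like this can be passed to `Response(iter_file_chunks(path), mimetype="video/mp4")` so the response is sent chunk by chunk. Seeking within a video would additionally require handling HTTP range requests, which this sketch does not cover.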

Overall, the combination of AI model training, real-time processing, and efficient API design allowed me to build a responsive and functional app.

Challenges we ran into

Data Collection: Collecting a robust dataset of ASL gestures for training the model was difficult. Many ASL datasets were either incomplete or lacked diversity.

Model Training: Training the ASL gesture recognition model was time-consuming, requiring fine-tuning to ensure accuracy in detecting and translating gestures.

Real-time Processing: Achieving low-latency processing for both speech-to-ASL and ASL-to-text translations posed significant technical hurdles.