Inspiration

We wanted everyone to be able to use video conferencing tools easily, whatever disabilities they may have. Whether users are hard of hearing, visually impaired, or anywhere on the spectrum, our goal is to use ASL and emotion recognition to lower the communication barriers between them.

What it does

Through a Google Chrome extension, a bot account sets up a Zoom meeting and, with the user's consent, receives the audio and video feed. Users can sign, and the bot account interprets the signs and translates them into captions and phrases. A text-to-voice generator will also be available to accommodate blind users.

How we built it

EyeSign uses OpenCV in a Google Chrome extension to let users sign words and phrases during a video conference call. AI text-to-voice generation will be able to read aloud the captions generated from signing. Additionally, the dummy account that sets up the meeting gathers "input" from participants in the form of audio and video and passes it to a pre-trained model, incorporated from the ___ repository and trained with TensorFlow and Keras, which increases the accuracy of predicting what the user is signing.
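As a rough illustration of the prediction step, here is a minimal sketch of how a classifier's output for one video frame could be turned into a letter. It assumes the model returns one probability per letter in A to Z order; the label order, the pick_letter name, and the 0.8 confidence threshold are our own assumptions for illustration, not details of the cloned repository.

```python
import string

# Assumed label order: index 0 -> "A", ..., index 25 -> "Z".
LETTERS = string.ascii_uppercase


def pick_letter(probabilities, min_confidence=0.8):
    """Return the most likely letter, or None if the model is not confident enough."""
    if len(probabilities) != len(LETTERS):
        raise ValueError("expected one probability per letter")
    # Index of the highest probability (an argmax without external libraries).
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best_index] < min_confidence:
        return None  # treat low-confidence frames as "no sign recognized"
    return LETTERS[best_index]
```

Returning None for low-confidence frames keeps uncertain guesses out of the captions, at the cost of occasionally dropping a real sign.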

Challenges we ran into

This was our first time working with the Zoom SDK and OAuth frameworks. As a result, establishing the Zoom connection itself using access tokens was a struggle. However, it expanded our understanding of how video conferencing software interacts with data servers and of how to integrate it with JavaScript.
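For readers new to this flow, the sketch below shows the general shape of an OAuth authorization-code token request as we understand it from Zoom's documentation: the client ID and secret go into a Basic Authorization header, and the authorization code goes into the form body. It only builds the request and does not send it. The function name and all argument values are placeholders, and this is not necessarily the exact code we ran.

```python
import base64
import urllib.parse
import urllib.request

# Token endpoint as described in Zoom's OAuth documentation (verify before use).
TOKEN_URL = "https://zoom.us/oauth/token"


def build_token_request(client_id, client_secret, auth_code, redirect_uri):
    """Build (but do not send) a request that exchanges an auth code for an access token."""
    # HTTP Basic credentials: base64("client_id:client_secret").
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode({
        "grant_type": "authorization_code",
        "code": auth_code,
        "redirect_uri": redirect_uri,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the exchange; the JSON response is expected to contain the access token used for later API calls.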

Additionally, we struggled to find ASL translation models that could be trained easily and could reliably recognize what the user is signing. We ended up using a cloned dataset and having the user spell out words and phrases letter by letter.
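Letter-by-letter signing means the per-frame predictions have to be assembled into words somehow. The sketch below shows one simple way this could work, assuming one prediction per video frame (a letter, or None when nothing is recognized). The frames_to_caption name and the min_run value of 3 frames are hypothetical choices for illustration.

```python
def frames_to_caption(frame_letters, min_run=3):
    """Collapse per-frame letter predictions into caption text.

    A letter is accepted once it has been seen on min_run consecutive frames.
    A run of min_run frames with no recognized sign (None) ends the current word.
    """
    words, current = [], []
    previous, run = object(), 0  # sentinel that never equals a real prediction
    for letter in frame_letters:
        run = run + 1 if letter == previous else 1
        previous = letter
        if run == min_run:  # fires once per run, so a held sign is not repeated
            if letter is None:
                if current:
                    words.append("".join(current))
                    current = []
            else:
                current.append(letter)
    if current:
        words.append("".join(current))
    return " ".join(words)
```

With this scheme a doubled letter, such as the two Ls in "HELLO", requires the signer to briefly drop the sign between the two letters so that a new run starts.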

Accomplishments that we're proud of

We successfully incorporated OpenCV to integrate the software with ASL and facial recognition. We developed an understanding of the Zoom SDK and of how to integrate OAuth frameworks to create Zoom meetings and authorize users into them. Finally, we used TensorFlow and Keras with a cloned repository to produce a pre-trained model that assists in predicting what the user is signing.

What we learned

This project taught us how to integrate various APIs and how to work with authentication frameworks to establish video conferencing meetings. We also developed a much better sense of source control with Git, which we had little experience with before this project.

What's next for EyeSign

Currently, users can only sign words and phrases letter by letter. We want to improve the model by using frameworks and libraries that recognize signs for entire words and phrases. We also want to expand our dataset and train models on it to yield more accurate results. Finally, we want to look further into software such as Recall.ai to improve how we read data from the Zoom user. We had to use workarounds because current methods require us to create our own domain and extract the raw video as I420 frames and the raw audio as 16-bit little-endian PCM (PCM 16LE). That would have required us to encode the data ourselves before training the model on it, which wasn't entirely feasible given our time constraints.
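To give a sense of what handling these raw formats involves, here is a minimal sketch based on the standard layouts: an I420 frame stores a full-resolution Y plane followed by quarter-resolution U and V planes, and PCM 16LE stores each audio sample as a signed 16-bit little-endian integer. The function names are our own, and the sketch assumes even frame dimensions and tightly packed buffers with no row padding.

```python
import struct


def i420_frame_size(width, height):
    """Bytes in one I420 frame: Y plane plus two quarter-size chroma planes."""
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even dimensions")
    y_size = width * height
    chroma_size = (width // 2) * (height // 2)
    return y_size + 2 * chroma_size


def split_i420(frame, width, height):
    """Split a raw I420 buffer into its Y, U and V planes (in that order)."""
    if len(frame) != i420_frame_size(width, height):
        raise ValueError("buffer length does not match the frame dimensions")
    y_size = width * height
    chroma_size = (width // 2) * (height // 2)
    y = frame[:y_size]
    u = frame[y_size:y_size + chroma_size]
    v = frame[y_size + chroma_size:]
    return y, u, v


def decode_pcm16le(raw):
    """Decode signed 16-bit little-endian PCM bytes into floats in [-1.0, 1.0)."""
    if len(raw) % 2:
        raise ValueError("PCM 16LE data must contain whole 2-byte samples")
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [s / 32768.0 for s in samples]
```

The Y plane alone is a grayscale image of the frame, which may already be enough input for a hand-shape classifier; the audio samples would feed a separate speech or captioning path.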
