Inspiration

There are over 300 deaf people at UMBC, two of whom are our friends. Moreover, the most used app for class communication at UMBC is Discord.

This is a huge problem, because Discord voice calls leave deaf students out of the conversation, so we sought to develop a not-so-hard solution!

What it does

Voicebox provides a ridiculously simple Discord interface that enables anyone to engage in Discord voice calls just as naturally as they would in real life.

1. Voice Transcription: An optional feature to process and convert voice audio into text, in real time.

2. Text to Speech: With VoiceBox TTS, your words find a voice! Our Text to Speech feature lets you turn simple text commands into spoken words, ensuring everyone can join the conversation on Discord.

3. Sign-Language to Speech: Communicate with simple gestures, and watch them convert to real-time speech on Discord. It’s all about making every conversation inclusive and every voice heard, no matter how it’s spoken!

How we built it

Challenges we ran into

Training the LLM on hand gestures was particularly difficult because of limits on both the amount of data it could process and the computing power we had available.

We had to prioritize a few reliable commands over a more diverse set of half-baked gestures.
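To illustrate that trade-off, here is a minimal sketch of how a small, fixed set of gestures could be mapped to spoken phrases, with low-confidence predictions dropped rather than guessed at. This is not our actual code: the gesture labels, the phrases, the `gesture_to_phrase` helper, and the 0.8 threshold are all hypothetical and chosen only for the example.

```python
from typing import Optional

# Hypothetical gesture labels and phrases, chosen only for illustration.
GESTURE_PHRASES = {
    "wave": "Hello!",
    "thumbs_up": "Yes.",
    "thumbs_down": "No.",
}

# Assumed cutoff, not a value taken from the project.
CONFIDENCE_THRESHOLD = 0.8


def gesture_to_phrase(label: str, confidence: float) -> Optional[str]:
    """Return the phrase to speak, or None if the prediction should be ignored."""
    if confidence < CONFIDENCE_THRESHOLD:
        # Staying silent is better than speaking the wrong phrase in a call.
        return None
    # Labels outside the supported set are also ignored.
    return GESTURE_PHRASES.get(label)
```

With a design like this, adding a gesture means adding one entry to the table, but only once the classifier recognizes it reliably enough to clear the threshold.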

Accomplishments that we're proud of

It was a fun experience, but at the end of the day we just wanted to share our Discord voice-channel memories with our deaf friends, and now we can.

What we learned

It was easy for us to get so wrapped up in our technical solution and what "could be" that we at times lost track of what would work best for the actual end users. With the time constraint, we were forced to think about what was absolutely necessary and most intuitive for those using Voicebox.

What's next for Voicebox

  1. Training our ATS model on a larger corpus of visual communication data
  2. Increased language support
  3. More customization features (voice selection)
  4. Multi-user voice Sink support

Built With
