In an era where many of our meetings are virtual, we wanted to improve the accessibility of common video conferencing. In particular, we wanted to provide a better video captioning experience for people with auditory impairments. Auto-captioning presents difficulties we wanted to solve, such as captions passing by too quickly to read, so our goal was to prototype a better user interface built on something more familiar and intuitive for group communication today.

What it does

speechto.text is a video conferencing platform that captions the conversation in the form of instant text messages. Simply enter your name, create or join a meeting, and say something into the mic to see your words on the right. Presenting the captions as a group text chat lets users keep track of who is speaking, read captions at their own pace, and ultimately regain agency in the conversation.

How we built it

We started with simple sketches on paper, which we eventually expanded into high-fidelity prototypes in Figma. From those components, the front end was built using React.

Our Figma prototypes

For the backend, we focused on how to send data to the Google speech-to-text API and how to handle video calls. We used Pion, which is written in Go, to handle video calls, so we decided to write the rest of our backend in Go as well.

Challenges we ran into

Speech-to-text Transcription

We ran into a lot of challenges converting speech to text. Google's speech-to-text API was difficult to use in our case, since we had to stream multiple audio tracks to it and collect their transcriptions in real time. Using Pion, a WebRTC library, we were able to create the interface needed for this two-way transmission of data.
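To illustrate the "collect their transcriptions in real time" part, here is a minimal Go sketch of merging one stream of transcribed text per speaker into a single chat-style feed. It is not our actual backend code: the Caption type and the mergeCaptions function are hypothetical names made up for this example, and plain channels of strings stand in for the results that would come back from the speech-to-text API for each audio track.

```go
package main

import (
	"fmt"
	"sync"
)

// Caption is one chat-style message: who spoke and what was transcribed.
// Hypothetical type, used only for this illustration.
type Caption struct {
	Speaker string
	Text    string
}

// mergeCaptions fans in one channel of transcribed text per speaker into a
// single feed. Order is preserved within each speaker's stream, but not
// across speakers. The returned channel is closed once every input channel
// has been closed and drained.
func mergeCaptions(tracks map[string]<-chan string) <-chan Caption {
	out := make(chan Caption)
	var wg sync.WaitGroup

	for speaker, texts := range tracks {
		wg.Add(1)
		go func(speaker string, texts <-chan string) {
			defer wg.Done()
			for t := range texts {
				out <- Caption{Speaker: speaker, Text: t}
			}
		}(speaker, texts)
	}

	// Close the merged feed after all per-speaker goroutines finish.
	go func() {
		wg.Wait()
		close(out)
	}()

	return out
}

func main() {
	// Stand-ins for per-track transcription results (assumed, not real API output).
	alice := make(chan string, 2)
	bob := make(chan string, 1)
	alice <- "Hi everyone"
	alice <- "Can you hear me?"
	bob <- "Yes, loud and clear"
	close(alice)
	close(bob)

	feed := mergeCaptions(map[string]<-chan string{"Alice": alice, "Bob": bob})
	for c := range feed {
		fmt.Printf("%s: %s\n", c.Speaker, c.Text)
	}
}
```

Each speaker gets their own goroutine, so a slow transcription on one track does not hold up captions from the others. The interleaving between speakers depends on scheduling, so a real chat view would likely need timestamps to sort messages.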

React troubles

Connecting our front end and backend became another issue, since we needed to stream video and audio as well as display transcriptions. Unfortunately, we ran into a dependency issue very late in the project and weren't able to solve it.

Accomplishments that we're proud of

Over the past day, some of us gained experience in a new language, some of us experimented with new technologies, and some of us got rid of pesky bugs that had disrupted us for hours. We are all proud that we were able to produce a (somewhat) working prototype in such a short time. While we weren't able to deliver the full intended functionality, we completed much of the planning for each screen in Figma, which can be seen in the image gallery.

What we learned

We learned a lot over the past day, ranging from the complexities of programming speech to text to the fundamental importance of flex boxes.
