Currently, Zoom only offers live closed captioning when a human transcriber manually transcribes a meeting. We believe that users would benefit greatly from closed captions in every meeting, so we created Cloud Caption.
What it does
Cloud Caption receives live system audio from a Zoom meeting or another video conferencing platform and transcribes it in real time into closed captions that are displayed in a floating window. The window is translucent and can be positioned on top of the Zoom meeting, so it does not block the call underneath.
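A floating, translucent caption window of this kind can be sketched with Python's standard tkinter module. This is a minimal illustration and not necessarily how Cloud Caption builds its window: the names visible_caption and run_overlay, the 0.75 opacity, and the two-line limit are all assumptions made for the example.

```python
import textwrap


def visible_caption(transcript: str, width: int = 60, max_lines: int = 2) -> str:
    """Return only the most recent wrapped lines of a running transcript."""
    lines = textwrap.wrap(transcript, width=width)
    return "\n".join(lines[-max_lines:])


def run_overlay(get_transcript, poll_ms: int = 200) -> None:
    """Show a translucent, always-on-top caption window (needs a display).

    get_transcript is a hypothetical zero-argument callable that returns
    the transcript text received so far.
    """
    import tkinter as tk  # imported lazily so visible_caption works headless

    root = tk.Tk()
    root.title("Captions")
    root.attributes("-topmost", True)  # keep the window above the meeting
    root.attributes("-alpha", 0.75)    # translucency; platform support varies

    label = tk.Label(root, font=("Helvetica", 20), fg="white", bg="black",
                     justify="left", padx=12, pady=8)
    label.pack(fill="both", expand=True)

    def refresh() -> None:
        # Redraw the newest caption lines, then schedule the next refresh.
        label.config(text=visible_caption(get_transcript()))
        root.after(poll_ms, refresh)

    refresh()
    root.mainloop()
```

Showing only the last few wrapped lines keeps the window small, which matters when it sits on top of the meeting.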
How we built it
Cloud Caption uses the Google Cloud Speech-to-Text API to automatically transcribe the audio streamed from Zoom or another video conferencing app.
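As a rough sketch of what streaming transcription with this API can look like, the example below uses the google-cloud-speech Python client. The encoding, 16 kHz sample rate, language code, and the helper names chunk_audio and stream_transcripts are assumptions for illustration, not details taken from the project.

```python
from typing import Iterable, Iterator, Tuple


def chunk_audio(pcm: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Split raw PCM bytes into fixed-size chunks for streaming.

    3200 bytes is about 100 ms of 16-bit mono audio at 16 kHz.
    """
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


def stream_transcripts(chunks: Iterable[bytes],
                       sample_rate: int = 16000) -> Iterator[Tuple[bool, str]]:
    """Yield (is_final, transcript) pairs for a stream of audio chunks.

    Requires the google-cloud-speech package and Google Cloud credentials.
    """
    from google.cloud import speech  # lazy import: optional dependency

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate,  # assumed; must match the audio source
        language_code="en-US",          # assumed language
    )
    streaming_config = speech.StreamingRecognitionConfig(
        config=config,
        interim_results=True,  # emit partial captions while someone is speaking
    )
    requests = (speech.StreamingRecognizeRequest(audio_content=chunk)
                for chunk in chunks)
    responses = client.streaming_recognize(config=streaming_config,
                                           requests=requests)
    for response in responses:
        for result in response.results:
            if result.alternatives:
                yield result.is_final, result.alternatives[0].transcript
```

Interim results let captions appear while a sentence is still being spoken; the is_final flag marks the point where the text for that utterance stops changing.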
Challenges we ran into
We went through a few iterations before we were able to get Cloud Caption working. First, we started with a browser-based app that would embed Zoom, but we discovered that the Google Cloud framework isn't compatible with browser-based environments. We then pivoted to an Electron-based desktop app, but the experimental web APIs that we needed did not work. Finally, we implemented a Python-based desktop app that relies on a third-party program like Loopback to route the audio.
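To illustrate the final approach, here is one way a Python app could read audio from a virtual device created by an audio-routing program. The writeup does not say which audio library was used; the sounddevice package, the device name fragment "Loopback", and the helper names below are assumptions for this sketch.

```python
import queue
from typing import Iterator, Mapping, Optional, Sequence


def find_input_device(devices: Sequence[Mapping], name_fragment: str) -> Optional[int]:
    """Return the index of the first input device whose name contains the fragment."""
    needle = name_fragment.lower()
    for index, device in enumerate(devices):
        if needle in device["name"].lower() and device["max_input_channels"] > 0:
            return index
    return None


def capture_chunks(name_fragment: str = "Loopback", sample_rate: int = 16000,
                   block_ms: int = 100) -> Iterator[bytes]:
    """Yield raw 16-bit mono PCM blocks from the routed virtual device."""
    import sounddevice as sd  # lazy import: assumed third-party dependency

    index = find_input_device(sd.query_devices(), name_fragment)
    if index is None:
        raise RuntimeError(f"No input device matching {name_fragment!r} found")

    blocks: "queue.Queue[bytes]" = queue.Queue()

    def on_audio(indata, frames, time, status) -> None:
        # Runs on the audio thread; copy the buffer and hand it off quickly.
        blocks.put(bytes(indata))

    with sd.RawInputStream(samplerate=sample_rate,
                           blocksize=sample_rate * block_ms // 1000,
                           device=index, channels=1, dtype="int16",
                           callback=on_audio):
        while True:
            yield blocks.get()
```

Selecting the device by name rather than by index is a deliberate choice here, since device indices can change when hardware is plugged in or removed.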
Accomplishments that we're proud of
We are proud of our ability to think and adapt quickly and collaborate efficiently during this remote event. We're also proud that our app is a genuinely useful accessibility tool for anyone who is deaf or hard-of-hearing, encouraging all students and learners to collaborate in real time despite any personal challenges they may face. Cloud Caption is also useful for students who aren't auditory learners and prefer to learn information by reading.
Finally, we're proud of the application's relative ease of use. Users only need Loopback (or another audio-routing program) installed on their computer to receive real-time speech-to-text transcription of a video conference, instead of being forced to wait and re-watch a recording later with closed captioning embedded.
What we learned
Our team learned that specifying, controlling, and linking audio input and output sources can be an incredibly difficult task, with poor support from browser and framework vendors. We also came to appreciate the value of building with accessibility as a major goal throughout the design and development process. Accessibility is often overlooked in applications and projects of every size, so all of us have learned to prioritize developing with inclusivity in mind for our projects moving forward.
What's next for Cloud Caption
Our next step is to integrate audio routing so that users won't need a third-party program. We would also like to explore further applications of our closed captioning tool in business and corporate use cases such as HR or training, especially for users who are deaf or hard-of-hearing.