The COVID-19 pandemic has created tremendous accessibility problems, especially for the Deaf and Hard of Hearing population. As everyone moved to online video conferencing apps, audio quality suffered, lipreading became completely contingent on having high-quality video streams, and captioning and transcription aren't always available. Accessibility has long been an issue, but the pandemic has made it all the worse, especially in the education sector.
What it does
Capti hopes to alleviate some of these issues by capturing audio from your chat client (be it Zoom, Discord, Google Meet, or anything else), adjusting that audio for improved quality, and sending it through Google Cloud's Speech-to-Text API for live captioning. It's easy for a user to set up and does not require the other speakers (such as a teacher) to do anything at all, making it helpful in environments where schools or workplaces unfortunately do not have appropriate accessibility measures already in place. Although it's intended for chat clients, because it processes your local audio, it can technically be extended to any program that plays audio, including video players, video games, and more!
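As a rough illustration of the kind of audio adjustment described above, here is a minimal Java sketch of one common preprocessing step: downmixing 16-bit little-endian stereo PCM to mono, since speech recognition services generally expect a single channel. The class and method names are mine, not Capti's actual code.

```java
public class Downmix {
    // Average each stereo frame (left, right) into one mono sample.
    // Input and output are 16-bit signed little-endian PCM bytes.
    static byte[] stereoToMono(byte[] stereo) {
        byte[] mono = new byte[stereo.length / 2];
        for (int i = 0, j = 0; i + 3 < stereo.length; i += 4, j += 2) {
            int left  = (short) ((stereo[i + 1] << 8) | (stereo[i] & 0xFF));
            int right = (short) ((stereo[i + 3] << 8) | (stereo[i + 2] & 0xFF));
            int mixed = (left + right) / 2;
            mono[j]     = (byte) (mixed & 0xFF);        // low byte
            mono[j + 1] = (byte) ((mixed >> 8) & 0xFF); // high byte
        }
        return mono;
    }

    public static void main(String[] args) {
        // One stereo frame: left = 1000, right = 2000 (little-endian shorts).
        byte[] frame = {(byte) 0xE8, 0x03, (byte) 0xD0, 0x07};
        byte[] out = stereoToMono(frame);
        int mixed = (short) ((out[1] << 8) | (out[0] & 0xFF));
        System.out.println(mixed); // prints 1500
    }
}
```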
How I built it
The GUI is written in Clojure using cljfx, a wrapper around JavaFX. I also use JACK to set up virtual audio devices, and the Google Cloud Speech-to-Text API for transcription.
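To give a sense of how the JVM side might discover the JACK virtual devices, here is a small sketch using the Java Sound API: it enumerates the mixers the JVM can see (with JACK running, the loopback device would appear in this list) and builds the capture format speech recognition typically expects. This is an assumed workflow, not Capti's exact code.

```java
import javax.sound.sampled.*;

public class AudioSetup {
    // A format LINEAR16-style speech recognition commonly expects:
    // 16 kHz sample rate, 16-bit samples, mono, signed, little-endian.
    static AudioFormat captureFormat() {
        return new AudioFormat(16000f, 16, 1, true, false);
    }

    public static void main(String[] args) {
        // List every mixer the Java Sound API can see; with JACK running,
        // the virtual loopback device should show up here.
        for (Mixer.Info info : AudioSystem.getMixerInfo()) {
            System.out.println(info.getName() + ": " + info.getDescription());
        }
        System.out.println(captureFormat());
    }
}
```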
Challenges I ran into
Learning new things is always a challenge!
Accomplishments that I'm proud of
Learned about the Java Sound API & clojure.core.async
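The pattern core.async enables here is a producer/consumer hand-off: a capture thread pushes audio buffers onto a channel while another thread drains them for transcription. Below is that same idea approximated in Java with a `BlockingQueue` (a stand-in for a core.async channel); the names and buffer counts are illustrative only.

```java
import java.util.concurrent.*;

public class Pipeline {
    // A "capture" thread puts n buffers on the queue;
    // the caller drains them in order, as a transcriber would.
    static int drain(int n) throws InterruptedException {
        BlockingQueue<byte[]> chan = new ArrayBlockingQueue<>(8);
        Thread producer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                try {
                    chan.put(new byte[]{(byte) i});
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        producer.start();
        int drained = 0;
        for (int i = 0; i < n; i++) {
            chan.take();   // here Capti would hand the buffer to the recognizer
            drained++;
        }
        producer.join();
        return drained;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(drain(3)); // prints 3
    }
}
```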