Inspiration
Over 5.3 million international students study in English, French, or Spanish speaking countries whilst having no native-fluency in those languages. In addition, over 35% of these students report "poor mental health". However, international students are reluctant to getting help--over 17% of these students claim "language anxiety" as the reason they don't seek adequate mental health resources.
What it does
STEVE.ai functions as a zoom meeting plugin tool to reduce language anxiety and increase clear communication in online mental health therapy sessions. Through a simple chat, the therapist or patient can initiate a minimal-delay captioning that translates the speaker's words, despite broken English, mixed languages, and/or incorrect grammar, into the desired language.
How we built it
To create this web app prototype, we focused on 3 steps: obtaining audio files from a zoom meeting, translating the mixed broken-language speech into clear interpretable language through chained models and processing, and presenting the final text to the user in a more easily interpretable language.
To obtain the audio files, we utilized an RTMP (Real-Time Messaging Protocol) of the zoom meeting on an ngrok server to utilize the FFMpeg library, allowing us to extract and selectively cut audio files mid-zoom call. We then ran the audio files through our program, utilizing chained data processing and manipulation as well as OpenAI's Whisper and GPT-4 LLMs (Large-Language Models) to translate the mixed/broken-language into clear single-language text. Finally, we took the output and ran it through Zoom's captioning API to present the output text in the zoom meeting chat.
Challenges we ran into
There were many issues related to interacting with the Zoom API, as utilizing tools like websockets and virtual chat agents required special permissions. Zoom also doesn't have any existing APIs that allow for downloading of audio files during a meeting, so we had to use an RTMP server to grab the audio files mid-meeting. In addition to this, Zoom APIs specifically did not allow for usage/access of the in-meeting chat, causing us to pivot to an alternate method of initiating the recording process using hot-keys.
Accomplishments that we're proud of
Our multilingual translation is quite smooth and accurate, even with background noise, complex input, and more. We're able to take speech mixes the desired language with multiple foreign languages and translate it into the grammar-corrected desired language within very few seconds.
What we learned
We've learned how to utilize Zoom APIs to create web app integrations, as well as a plethora of other WebDev skills.
What's next for STEVE.ai
Commercializing the prototype version of STEVE.ai will likely require the use of cloud-based hosting of Zoom meetings, as well as implementing all of our programs within a Zoom web app found on their marketplace, making it very easy for any user of Zoom meetings to add and utilize our product. All the complexities have already been fleshed out, so this expansion would only require some basic use of AWS and the Zoom App Marketplace. Our target market could also be expanded beyond international student mental health services to include foreign corporate vendor communication, virtual education support, and any case of international relations and communication.
Log in or sign up for Devpost to join the conversation.