Lectures in our new online classes are long and drawn out. Audio Alerter helps users never miss a key moment, archive the lecture as a transcript, and search through that transcript to find the right part of the lecture.
What it does
There are two main modes:
- Real Time Alert Mode, which is designed for transcribing live audio and receiving alerts when keywords you specify come up. This is great for live lectures or videos that you cannot download yourself. Simply specify the keywords, allow access to the microphone, play the video (or place the microphone so it can hear the lecture or presentation), then wait for Audio Alerter to send you a notification when one of your keywords comes up.
- Video Search Mode, which is designed for searching within videos that you have access to. Maybe your professor posted a lecture video and you need to find where in the lecture they mentioned something in particular. Never fear: simply upload the video, and a search box will come up that lets you find where in the transcript that keyword or key phrase occurred. You can also click on the transcript box to jump to that part of the video.
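As a rough illustration of Video Search Mode, the search step could look something like the sketch below. The `TranscriptSegment` shape, the `searchTranscript` function, and the sample data are all hypothetical and are not taken from the actual Audio Alerter code; the sketch only assumes that each piece of transcript carries a start time.

```typescript
// Hypothetical shape of one transcript entry; the real data model may differ.
interface TranscriptSegment {
  text: string;
  startSeconds: number; // offset into the video where this segment begins
}

// Return every segment whose text contains the query, ignoring case.
function searchTranscript(
  segments: TranscriptSegment[],
  query: string
): TranscriptSegment[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return []; // an empty query matches nothing
  return segments.filter((s) => s.text.toLowerCase().includes(needle));
}

// Made-up sample data for illustration only.
const demoSegments: TranscriptSegment[] = [
  { text: "Welcome back, today we cover recursion.", startSeconds: 0 },
  { text: "The midterm will include two recursion problems.", startSeconds: 95 },
  { text: "Office hours move to Thursday.", startSeconds: 240 },
];

const hits = searchTranscript(demoSegments, "Midterm");
// A UI could then seek the player to a hit's startSeconds,
// for example by setting an HTML video element's currentTime.
```

Clicking a result would then only need the segment's start time to jump to that part of the video.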
How we built it
The front end is built with React and Material-UI, so it is fully responsive and can be used on any device.
The back end for the video processing uses socket.io websockets for real-time communication. These connect to a server running on a Google Kubernetes Engine cluster, which passes the audio to the Google Speech-To-Text API. The transcript data is then sent back through the same cluster to the front end, where you can use it to receive alerts or search the video.
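Once transcript text arrives on the front end, the alerting step could be as simple as checking each piece of text against the user's keywords. The sketch below is a minimal version under that assumption; `findKeywordHits` and `escapeRegExp` are hypothetical helper names, and the socket.io event name in the comment is made up for illustration.

```typescript
// Escape characters that have special meaning in a regular expression,
// so a keyword like "C++" is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the keywords (or key phrases) that occur in a transcript chunk.
// Matching is case-insensitive and respects word boundaries, so the
// keyword "exam" does not fire on "examination".
function findKeywordHits(chunk: string, keywords: string[]): string[] {
  return keywords.filter((kw) => {
    const trimmed = kw.trim();
    if (trimmed === "") return false;
    const pattern = new RegExp(`\\b${escapeRegExp(trimmed)}\\b`, "i");
    return pattern.test(chunk);
  });
}

// Hypothetical wiring on the client (event name is an assumption):
//   socket.on("transcript", (text: string) => {
//     for (const kw of findKeywordHits(text, userKeywords)) notifyUser(kw);
//   });
```

Word-boundary matching is one possible design choice here; plain substring matching would be simpler but would raise more false alerts.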
The authentication and upload systems are powered by Firebase Authentication, Firestore, Cloud Functions, and Cloud Storage.
Challenges we ran into
The Google Speech-To-Text API works great, but for real-time streaming it requires raw audio data passed via gRPC. Unfortunately, browsers do not support gRPC, nor do they expose the raw data, so we spent a long time (about 2/3 of the hacking time) getting the microphone data re-encoded into the right format and sent to the server via websockets so that the server could make the gRPC API calls. We had no experience with real-time communication before this hack, and especially not with binary audio data that we couldn't debug by looking at it, so it took us a while to get everything figured out. In the end, though, we got it all working, and we're very proud of that!
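To give a sense of the re-encoding involved: the Web Audio API hands microphone samples to JavaScript as 32-bit floats in the range -1 to 1, while the Speech-To-Text API's LINEAR16 encoding expects 16-bit signed PCM. The sketch below assumes LINEAR16 was the target format, which the writeup does not state; `floatTo16BitPcm` is a hypothetical helper, not the project's actual code.

```typescript
// Convert Web Audio float samples (-1..1) to 16-bit signed PCM.
// Assumption: the server forwards this as LINEAR16 audio.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  samples.forEach((sample, i) => {
    // Clamp first so out-of-range samples do not wrap around.
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative values scale to -32768, positive values to 32767.
    out[i] = Math.round(clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff);
  });
  return out;
}

// The resulting bytes (pcm.buffer) could then be sent as a binary
// websocket message for the server to pass on over gRPC.
const pcm = floatTo16BitPcm(new Float32Array([0, 1, -1]));
```

The microphone's sample rate would also have to match what the server tells the API to expect, either by resampling or by passing the actual rate along.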
Accomplishments that we're proud of
We got pretty much all of the main features we wanted to support done. While we'd love to go deeper into the weeds on buffers and streams to support infinite real-time audio analysis, and to build out infinite streaming and larger file support, we got enough working to create our demo and prove the concept, and at the end of the day that's what we were going for.
What we learned
We learned a ton about websockets, real-time communication, audio encoding, and the Google Kubernetes Engine in the process of creating this hack.
What's next for Audio Alerter
- Infinite Streaming
- Larger File Sizes
- Native Apps for mobile devices (Flutter)