Throughout our Zoom university journey, our team noticed that we often forget to unmute our mics when we start talking, or forget to mute them when we don't want others to listen in. To solve this problem, we created speakCV, a desktop client that uses computer vision to detect when you are talking and automatically mutes and unmutes your mic for you.
💻 What it does
speakCV automatically unmutes a user when they are about to speak and mutes them when they have not spoken for a while. The user does not have to interact with the mute/unmute button, creating a more natural and fluid experience.
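The mute/unmute behavior described above can be sketched as a small state machine: unmute as soon as speech is detected, and re-mute once the user has been quiet for some timeout. The class name, timeout value, and fake clock below are illustrative, not speakCV's actual implementation.

```python
import time

class AutoMuter:
    """Minimal sketch of speech-driven mute toggling (illustrative only)."""

    def __init__(self, silence_timeout=3.0, clock=time.monotonic):
        self.silence_timeout = silence_timeout  # seconds of silence before re-muting
        self.clock = clock
        self.muted = True
        self.last_speech = clock()

    def update(self, is_speaking):
        """Call once per analyzed video frame with the detector's verdict."""
        now = self.clock()
        if is_speaking:
            self.last_speech = now
            self.muted = False              # unmute as the user starts talking
        elif not self.muted and now - self.last_speech > self.silence_timeout:
            self.muted = True               # re-mute after a quiet period

# Simulate two frames with a fake clock to show the transitions
t = [0.0]
muter = AutoMuter(silence_timeout=3.0, clock=lambda: t[0])
muter.update(True)
print(muter.muted)   # False: speech detected, so the mic is unmuted
t[0] = 5.0
muter.update(False)
print(muter.muted)   # True: silent past the timeout, so the mic is muted again
```

In the real application the `is_speaking` flag would come from the mouth-shape analysis, and the mute action would toggle Zoom's mic rather than a boolean.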
🔧 How we built it
The application was written in Python: scipy and dlib for the machine learning, pyvirtualcam for working with the virtual video feed, and Tkinter for the GUI. OBS gave the program access to a live Zoom call through a virtual camera, and the webpage for the application was built with Bootstrap.
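The core measurement in this pipeline is the mouth aspect ratio (MAR) computed from facial landmarks. Below is a hedged sketch of one common way to compute it with scipy, assuming dlib's 68-point landmark model (points 60–67 trace the inner lip); the exact contour points, formula, and threshold speakCV uses may differ, and the landmark coordinates here are made up for illustration.

```python
from scipy.spatial import distance

def mouth_aspect_ratio(mouth):
    """mouth: 8 (x, y) inner-lip landmarks, ordered like dlib points 60-67."""
    # Vertical distances between upper and lower inner lip
    a = distance.euclidean(mouth[1], mouth[7])
    b = distance.euclidean(mouth[2], mouth[6])
    c = distance.euclidean(mouth[3], mouth[5])
    # Horizontal distance between the mouth corners
    d = distance.euclidean(mouth[0], mouth[4])
    return (a + b + c) / (2.0 * d)

# Hypothetical landmark sets: a nearly closed mouth vs. a wide-open one
closed = [(0, 0), (1, 0.5), (2, 0.5), (3, 0.5), (4, 0),
          (3, -0.5), (2, -0.5), (1, -0.5)]
opened = [(0, 0), (1, 3), (2, 4), (3, 3), (4, 0),
          (3, -3), (2, -4), (1, -3)]

MAR_THRESHOLD = 0.6  # illustrative value; speakCV tuned its own threshold
print(mouth_aspect_ratio(closed) < MAR_THRESHOLD)  # True: closed mouth stays below
print(mouth_aspect_ratio(opened) > MAR_THRESHOLD)  # True: open mouth exceeds it
```

A real frame would first pass through dlib's face detector and shape predictor to obtain the landmark coordinates; only the ratio computation is shown here.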
⚙️ Challenges we ran into
A large challenge we ran into was fine-tuning the mouth aspect ratio threshold for the model, which determined the model's sensitivity for mouth shape recognition. With the threshold too low, the application could not detect when a person started speaking, while a threshold too high made it overly sensitive to small movements. We found an acceptable value through trial and error. Another problem we encountered was lag: the application could not handle both the Tkinter event loop and the mouth shape analysis at the same time. We removed the lag by moving each task into its own thread.
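The threading fix can be sketched with the standard `threading` and `queue` modules: the expensive analysis runs in a worker thread and hands results to the GUI thread through a queue, so the event loop never blocks. `analyze` below is a stand-in for the dlib mouth-shape analysis, and in the real app the consumer would be a Tkinter `after()` callback rather than a blocking loop.

```python
import queue
import threading

results = queue.Queue()

def analyze(frame):
    # Placeholder for the expensive landmark/MAR computation on one frame
    return frame % 2 == 0

def analysis_worker(frames):
    """Runs off the GUI thread; pushes (frame, is_speaking) results."""
    for frame in frames:
        results.put((frame, analyze(frame)))
    results.put(None)  # sentinel: no more frames

worker = threading.Thread(target=analysis_worker, args=(range(4),), daemon=True)
worker.start()

# GUI-side consumer; with Tkinter this polling would live in root.after(...)
while (item := results.get()) is not None:
    frame, speaking = item
    print(frame, speaking)
worker.join()
```

Using a queue avoids sharing mutable state between the threads, which matters because Tkinter widgets must only be touched from the main thread.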
⭐️ Accomplishments that we're proud of
We were proud to solve a problem involving a technology we use frequently in our daily lives. Coming up with a problem and finding a way to solve it was rewarding as well, especially integrating the different machine learning models, virtual video, and application together.
🧠 What we learned
- How to set up and use virtual environments in Anaconda so the program runs locally without issues.
- Working with virtual video/audio to access the streams from our own program.
- GUI creation for Python applications with Tkinter.
❤️ What's next for speakCV
- Improve the precision of the mouth shape recognition model, by further adjusting the mouth aspect ratio threshold or tweaking the contour points used to determine the user's mouth shape.
- Move the application to the Zoom App Marketplace by rebuilding it with the Zoom SDK, which requires migrating the application to C++.
- Alternatively, use the Zoom API and move the application onto the web.