Throughout our Zoom university journey, our team noticed that we often forget to unmute our mics when we start talking, or forget to mute them when we don't want others to listen in. To solve this problem, we created speakCV, a desktop client that uses computer vision to detect when you are talking and automatically mutes and unmutes your mic for you.
💻 What it does
speakCV automatically unmutes a user when they are about to speak and mutes them when they have not spoken for a while. The user does not have to interact with the mute/unmute button, creating a more natural and fluid experience.
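The mute/unmute behavior described above can be sketched as a small state machine: unmute as soon as speech is detected, and re-mute once the user has been quiet for some timeout. The class name, timeout value, and fake clock below are illustrative, not speakCV's actual implementation.

```python
import time

class AutoMuter:
    """Minimal sketch of speech-driven mute toggling (illustrative only)."""

    def __init__(self, silence_timeout=3.0, clock=time.monotonic):
        self.silence_timeout = silence_timeout  # seconds of silence before re-muting
        self.clock = clock
        self.muted = True
        self.last_speech = clock()

    def update(self, is_speaking):
        """Call once per analyzed video frame with the detector's verdict."""
        now = self.clock()
        if is_speaking:
            self.last_speech = now
            self.muted = False              # unmute as the user starts talking
        elif not self.muted and now - self.last_speech > self.silence_timeout:
            self.muted = True               # re-mute after a quiet period

# Simulate two frames with a fake clock to show the transitions
t = [0.0]
muter = AutoMuter(silence_timeout=3.0, clock=lambda: t[0])
muter.update(True)
print(muter.muted)   # False: speech detected, so the mic is unmuted
t[0] = 5.0
muter.update(False)
print(muter.muted)   # True: silent past the timeout, so the mic is muted again
```

In the real application the `is_speaking` flag would come from the mouth-shape analysis, and the mute action would toggle Zoom's mic rather than a boolean.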
🔧 How we built it
The application was written in Python: scipy and dlib for the machine learning, pyvirtualcam for working with the virtual video feed, and Tkinter for the GUI. OBS gave the program access to a live Zoom call through a virtual camera, and the webpage for the application was built with Bootstrap.
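The core measurement in this pipeline is the mouth aspect ratio (MAR) computed from facial landmarks. Below is a hedged sketch of one common way to compute it with scipy, assuming dlib's 68-point landmark model (points 60–67 trace the inner lip); the exact contour points, formula, and threshold speakCV uses may differ, and the landmark coordinates here are made up for illustration.

```python
from scipy.spatial import distance

def mouth_aspect_ratio(mouth):
    """mouth: 8 (x, y) inner-lip landmarks, ordered like dlib points 60-67."""
    # Vertical distances between upper and lower inner lip
    a = distance.euclidean(mouth[1], mouth[7])
    b = distance.euclidean(mouth[2], mouth[6])
    c = distance.euclidean(mouth[3], mouth[5])
    # Horizontal distance between the mouth corners
    d = distance.euclidean(mouth[0], mouth[4])
    return (a + b + c) / (2.0 * d)

# Hypothetical landmark sets: a nearly closed mouth vs. a wide-open one
closed = [(0, 0), (1, 0.5), (2, 0.5), (3, 0.5), (4, 0),
          (3, -0.5), (2, -0.5), (1, -0.5)]
opened = [(0, 0), (1, 3), (2, 4), (3, 3), (4, 0),
          (3, -3), (2, -4), (1, -3)]

MAR_THRESHOLD = 0.6  # illustrative value; speakCV tuned its own threshold
print(mouth_aspect_ratio(closed) < MAR_THRESHOLD)  # True: closed mouth stays below
print(mouth_aspect_ratio(opened) > MAR_THRESHOLD)  # True: open mouth exceeds it
```

A real frame would first pass through dlib's face detector and shape predictor to obtain the landmark coordinates; only the ratio computation is shown here.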
⚙️ Challenges we ran into
A large challenge we ran into was fine-tuning the mouth aspect ratio threshold for the model, which determined the model's sensitivity for mouth shape recognition. With the threshold too low, the application could not detect when a person started speaking, while a threshold too high made it overly sensitive to small movements. We found an acceptable value through trial and error. Another problem we encountered was lag: the application could not handle both the Tkinter event loop and the mouth shape analysis at the same time. We removed the lag by moving each task into its own thread.
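The threading fix can be sketched with the standard `threading` and `queue` modules: the expensive analysis runs in a worker thread and hands results to the GUI thread through a queue, so the event loop never blocks. `analyze` below is a stand-in for the dlib mouth-shape analysis, and in the real app the consumer would be a Tkinter `after()` callback rather than a blocking loop.

```python
import queue
import threading

results = queue.Queue()

def analyze(frame):
    # Placeholder for the expensive landmark/MAR computation on one frame
    return frame % 2 == 0

def analysis_worker(frames):
    """Runs off the GUI thread; pushes (frame, is_speaking) results."""
    for frame in frames:
        results.put((frame, analyze(frame)))
    results.put(None)  # sentinel: no more frames

worker = threading.Thread(target=analysis_worker, args=(range(4),), daemon=True)
worker.start()

# GUI-side consumer; with Tkinter this polling would live in root.after(...)
while (item := results.get()) is not None:
    frame, speaking = item
    print(frame, speaking)
worker.join()
```

Using a queue avoids sharing mutable state between the threads, which matters because Tkinter widgets must only be touched from the main thread.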
⭐️ Accomplishments that we're proud of
We were proud to solve a problem involving a technology we use frequently in our daily lives. Coming up with a problem and finding a way to solve it was rewarding as well, especially integrating the different machine learning models, virtual video, and application together.
🧠 What we learned
- How to set up and use virtual environments in Anaconda so the program runs locally without issues.
- Working with virtual video/audio to access the streams from our own program.
- GUI creation for Python applications with Tkinter.
❤️ What's next for speakCV
- Improve the precision of the mouth shape recognition model, by further adjusting the mouth aspect ratio threshold or tweaking the contour points used to determine the user's mouth shape.
- Move the application to the Zoom App Marketplace by rebuilding it with the Zoom SDK, which requires migrating the application to C++.
- Alternatively, use the Zoom API and move the application onto the web.