We originally wanted to create voice-triggered transitions for Google Slides. However, the Google Slides API was too limiting, so we opted for a more general solution. We ended up solving our original problem and then some by building a full-blown voice-activated command system.

What it does

When you speak, our app listens and transcribes your speech into text in real time. It then matches that text against configurable commands, each of which executes a sequence of keystrokes or mouse movements.
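The phrase-to-action step could look roughly like the sketch below. It is not the project's actual code: the JSON layout, the example phrases, and the names `load_commands` and `match_command` are all hypothetical. It normalizes the transcript and returns the key sequence for the longest configured phrase it contains, so that "previous slide" wins over a shorter phrase that happens to overlap it.

```python
import json
import re
from typing import Dict, List, Optional

# Hypothetical command file: each spoken phrase maps to key names tapped in order.
EXAMPLE_CONFIG = """
{
  "next slide": ["right"],
  "previous slide": ["left"],
  "go to start": ["home"]
}
"""


def normalize(text: str) -> str:
    """Lower-case and drop punctuation so 'Next slide!' matches 'next slide'."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def load_commands(raw_json: str) -> Dict[str, List[str]]:
    """Parse the JSON config into {normalized phrase: [key names]}."""
    data = json.loads(raw_json)
    return {normalize(phrase): list(keys) for phrase, keys in data.items()}


def match_command(transcript: str, commands: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the key sequence for the longest configured phrase found in the transcript."""
    # Pad with spaces so a phrase only matches on whole-word boundaries.
    padded = f" {normalize(transcript)} "
    for phrase in sorted(commands, key=len, reverse=True):
        if f" {phrase} " in padded:
            return commands[phrase]
    return None


if __name__ == "__main__":
    commands = load_commands(EXAMPLE_CONFIG)
    print(match_command("Okay, next slide please!", commands))  # ['right']
```

Keeping commands in a plain data file means new commands can be added without touching the matching code.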

How we built it

We used the Google Cloud Speech-to-Text API to transcribe what the user says and the pynput package to control the host machine's mouse and keyboard.
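On the pynput side, a matched command could be replayed as shown in this minimal sketch. It assumes key names like `"right"` refer to attributes of `pynput.keyboard.Key` and that single characters are typed as-is. `run_key_sequence`, `pynput_key`, and the dry-run `RecordingController` are hypothetical helpers. pynput is imported lazily because it needs a desktop session, which also lets the logic be exercised without one.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def pynput_key(name: str) -> Any:
    """Map a config key name to a pynput key: 'right' -> Key.right, 'a' -> 'a'."""
    from pynput.keyboard import Key  # lazy import: pynput needs a desktop session
    return getattr(Key, name) if len(name) > 1 else name


class RecordingController:
    """Dry-run stand-in with the same press/release methods as pynput's Controller."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Any]] = []

    def press(self, key: Any) -> None:
        self.events.append(("press", key))

    def release(self, key: Any) -> None:
        self.events.append(("release", key))


def run_key_sequence(
    names: Iterable[str],
    controller: Optional[Any] = None,
    resolve: Callable[[str], Any] = pynput_key,
) -> None:
    """Tap each key in order (press, then release) on the given controller."""
    if controller is None:
        from pynput.keyboard import Controller
        controller = Controller()  # sends real keystrokes to the host machine
    for name in names:
        key = resolve(name)
        controller.press(key)
        controller.release(key)


if __name__ == "__main__":
    dry_run = RecordingController()
    run_key_sequence(["right", "a"], controller=dry_run, resolve=str)
    print(dry_run.events)
    # [('press', 'right'), ('release', 'right'), ('press', 'a'), ('release', 'a')]
```

Mouse actions can follow the same pattern with `pynput.mouse.Controller`, which exposes a settable `position` and a `click(button, count)` method.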

Challenges we ran into

We developed this app on three different operating systems, which complicated the development environment. Also, the Google Cloud Speech-to-Text API only allows about one minute of continuous audio streaming, so we had to work around this limitation to provide a consistent, uninterrupted stream of audio.
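One common way to work around a per-stream duration cap is to end each stream shortly before the limit and immediately open a new one on the same microphone feed. The write-up does not say exactly how the team did this, so the sketch below is only an assumption. It splits an endless iterator of fixed-length audio chunks into back-to-back sessions, each short enough to be sent through its own `streaming_recognize` call. `stream_sessions` and the millisecond parameters are hypothetical.

```python
import itertools
from typing import Iterable, Iterator


def stream_sessions(
    chunks: Iterable[bytes], chunk_ms: int, limit_ms: int
) -> Iterator[Iterator[bytes]]:
    """Split one endless audio-chunk stream into back-to-back sessions.

    Each yielded session covers at most limit_ms of audio, so it can be sent
    as its own streaming request before the service's duration cap is hit.
    Sessions share the underlying iterator: consume one before asking for the next.
    """
    per_session = max(1, limit_ms // chunk_ms)
    it = iter(chunks)
    while True:
        try:
            first = next(it)  # stop cleanly when the audio source ends
        except StopIteration:
            return
        yield itertools.chain([first], itertools.islice(it, per_session - 1))


if __name__ == "__main__":
    fake_audio = (bytes([i]) for i in range(10))  # ten 100 ms chunks
    for session in stream_sessions(fake_audio, chunk_ms=100, limit_ms=300):
        print(len(list(session)))  # 3, 3, 3, 1
```

In practice `limit_ms` would be set somewhat below the service's cap. This sketch does not handle words that straddle a session boundary. Handling them would need some overlap or buffering between consecutive streams.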

Accomplishments that we're proud of

Despite the difficulty of getting the continuous audio stream working, we managed to make voice recognition fairly consistent. (It actually works!) Also, this was the best use of Git I've had on a collaborative project, with very few merge conflicts.

What we learned

We learned how to use GCP and how to simulate keyboard and mouse input in Python.

What's next for VOICE

More commands, and a user-friendly front-end that lets end users configure their own voice commands.
