We originally wanted to create voice-triggered transitions for Google Slides. The Google Slides API, however, was too limiting, so we opted for a more general solution. We ended up solving our original problem and then some by creating a full-blown voice-activated command system.
What it does
When you speak, our app listens and transcribes your speech into text in real time. It then uses that text to trigger a configurable sequence of keystrokes or mouse movements.
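As a rough illustration of the "text to configurable keystrokes" step, here is a minimal sketch of a phrase lookup. The `COMMANDS` table, its phrases, the key names, and `match_command` are all hypothetical and only stand in for whatever configuration the app actually uses.

```python
# Hypothetical command table: spoken phrase -> key names to press, in order.
COMMANDS = {
    "next slide": ["right"],
    "previous slide": ["left"],
    "start presentation": ["ctrl", "f5"],
}


def match_command(transcript, commands=COMMANDS):
    """Return the key sequence for the longest phrase found in the transcript, or None."""
    # Lowercase and collapse whitespace so "NEXT   slide" still matches.
    text = " ".join(transcript.lower().split())
    # Try longer phrases first so a specific command is not shadowed by a shorter one.
    for phrase in sorted(commands, key=len, reverse=True):
        if phrase in text:
            return commands[phrase]
    return None
```

Plain substring matching keeps the sketch short; a real configuration would probably want word-boundary or fuzzy matching to cope with transcription noise.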
How we built it
We used the Google Cloud Speech-to-Text API to transcribe what the user says and the pynput package to control the host machine's mouse and keyboard.
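For readers unfamiliar with pynput, the sketch below shows roughly how a key combination can be sent with its keyboard `Controller`. The helper `press_combo` and the `demo` function are hypothetical names for illustration, not code from the project; only `Controller`, `Key`, `press`, and `release` come from pynput itself.

```python
def press_combo(keyboard, keys):
    """Hold every key in `keys` in order, then release them in reverse order."""
    for key in keys:
        keyboard.press(key)
    for key in reversed(keys):
        keyboard.release(key)


def demo():
    # Imported lazily because pynput needs a desktop session to load.
    from pynput.keyboard import Controller, Key

    keyboard = Controller()
    press_combo(keyboard, [Key.right])      # e.g. advance one slide
    press_combo(keyboard, [Key.ctrl, "c"])  # a Ctrl+C style shortcut
```

Passing the controller in as an argument means the same helper can be exercised with a stand-in object on a machine without a display.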
Challenges we ran into
We developed this app on three separate OSes, which complicated the development environment. Also, the Google Cloud Speech-to-Text API only allows for ~1 minute of continuous audio streaming, so we had to work around this limitation to provide a consistent, continuous stream of audio.
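One common way to handle a per-stream time limit is to cut the microphone feed into sessions that each stay under the limit and open a fresh streaming request for every session. The sketch below shows only that splitting step and is not necessarily how our workaround is implemented; `split_into_sessions`, `chunk_ms`, and `limit_ms` are hypothetical names, and the chunk source is assumed to be any iterable of fixed-length audio chunks.

```python
from itertools import chain, islice

_END = object()  # sentinel marking an exhausted chunk source


def split_into_sessions(chunks, chunk_ms, limit_ms):
    """Yield successive iterators over `chunks`, each covering at most `limit_ms` of audio.

    Each yielded session must be fully consumed before the next one is requested,
    because all sessions read from the same underlying iterator.
    """
    per_session = max(1, limit_ms // chunk_ms)  # whole chunks that fit in one session
    source = iter(chunks)
    while True:
        first = next(source, _END)
        if first is _END:
            return  # no audio left, so no further sessions
        yield chain([first], islice(source, per_session - 1))
```

In use, each session would be passed to a new streaming recognition call, with `limit_ms` set a little below the API's limit (for example 55000, which is an assumed safety margin rather than a documented value).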
Accomplishments that we're proud of
Despite the difficulty of getting the continuous audio stream working, we managed to make the voice recognition experience fairly consistent. (It actually works!) Also, this was the best use of git I've had on a collaborative project, with very few merge conflicts.
What we learned
We learned how to use GCP and how to simulate keyboard input and mouse clicks in Python.
What's next for VOICE
More commands and a user-friendly front end where users can configure their own voice commands.