Speech Computer Control

List of commands pt.1
List of commands pt.2
Example of the chrome extension in action

Inspiration

https://www.youtube.com/watch?v=lxuOxQzDN3Y Robbie's story stuck out to me at the endless limitations of technology. He was diagnosed with muscular dystrophy which prevented him from having full control of his arms and legs. He was gifted with a Google home that crafted his home into a voice controlled machine. We wanted to take this a step further and make computers more accessible for people such as Robbie.

What it does

We use a Google Cloud based API that helps us detect words and phrases captured from the microphone input. We then convert those phrases into commands for the computer to execute. Since the python script is run in the terminal it can be used across the computer and all its applications.

How I built it

The first (and hardest) step was figuring out how to leverage Google's API to our advantage. We knew it was able to detect words from an audio file but there was more to this project than that. We started piecing libraries to get access to the microphone, file system, keyboard and mouse events, cursor x,y coordinates, and so much more. We build a large (~30) function library that could be used to control almost anything in the computer

Challenges I ran into

Configuring the many libraries took a lot of time. Especially with compatibility issues with mac vs. windows, python2 vs. 3, etc. Many of our challenges were solved by either thinking of a better solution or asking people on forums like StackOverflow. For example, we wanted to change the volume of the computer using the fn+arrow key shortcut, but python is not allowed to access that key.

Accomplishments that I'm proud of

We are proud of the fact that we had built an alpha version of an application we intend to keep developing, because we believe in its real-world applications. From a technical perspective, I was also proud of the fact that we were able to successfully use a Google Cloud API.

What I learned

We learned a lot about how the machine interacts with different events in the computer and the time dependencies involved. We also learned about the ease of use of a Google API which encourages us to use more, and encourage others to do so, too. Also we learned about the different nuances of speech detection like how to tell the API to pick the word "one" over "won" in certain context, or how to change a "one" to a "1", or how to reduce ambient noise.

What's next for Speech Computer Control

At the moment we are manually running this script through the command line but ideally we would want a more user friendly experience (GUI). Additionally, we had developed a chrome extension that numbers off each link on a page after a Google or Youtube search query, so that we would be able to say something like "jump to link 4". We were unable to get the web-to-python code just right, but we plan on implementing it in the near future.