Commercial voice assistants like Alexa and Siri are amazing tools for productivity and automation, but they're also closed-source 'black boxes' with the ability to record everything in your home. With no way to verify whether your voice data is being recorded on their servers, this raises concerns for privacy-conscious automation enthusiasts. Our goal is to create a solution that doesn't send your voice data off your device while still being a helpful tool for hands-free productivity and automation.
What it does
Randall is a voice assistant that doesn't send your voice data to any server for processing. All voice recognition is done on-device, easing concerns for privacy-conscious folks.
How we built it
We're using Mozilla's DeepSpeech project as our voice recognition library. The text it produces is processed by the Natural Language Toolkit (NLTK) and spaCy for tokenization before being sent through our logic to determine what the user wants. If a response is merited, the platform's native text-to-speech is invoked. The whole pipeline is held together with Python.
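The flow described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: the function names, the keyword rules in `decide`, and the replies are hypothetical, and the regex tokenizer is a plain stand-in for the NLTK and spaCy step so the example runs with only the standard library. The DeepSpeech call assumes the 0.7+ Python API (`Model(path).stt(audio)`) and is imported lazily so the rest works without it installed.

```python
import re
from typing import Callable, List, Optional


def transcribe_with_deepspeech(model_path: str, audio) -> str:
    """Speech-to-text step. Assumes `audio` is a 16-bit, 16 kHz mono NumPy array."""
    # Imported lazily so the rest of the sketch runs without DeepSpeech installed.
    from deepspeech import Model
    return Model(model_path).stt(audio)


def tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (lowercase + regex); the project uses NLTK and spaCy here."""
    return re.findall(r"[a-z0-9']+", text.lower())


def decide(tokens: List[str]) -> Optional[str]:
    """Hypothetical intent logic: return a reply, or None if no response is merited."""
    if "lights" in tokens:
        return "Turning on the lights."
    if "hello" in tokens:
        return "Hello!"
    return None


def handle_utterance(text: str, speak: Callable[[str], None]) -> Optional[str]:
    """Run recognized text through tokenization and logic, speaking only when needed."""
    reply = decide(tokenize(text))
    if reply is not None:
        speak(reply)  # in the real pipeline this would be the platform's native TTS
    return reply
```

Passing `speak` in as a callable keeps the text-to-speech step swappable, so the logic can be tried out with `print` or a list's `append` before any audio hardware is involved.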
Challenges we ran into
This project is composed of many technologies we had never used before, including DeepSpeech and NLTK. Learning to implement them, as well as wrangling Python to do what we needed, proved challenging. However, the biggest challenge was designing the logic that works out what the user wants.
Accomplishments that we're proud of
A functional voice assistant without the use of cloud-based voice recognition APIs is definitely something we're proud of.
What we learned
Since it was our first time working with these speech recognition technologies, we had to get familiar with DeepSpeech, a voice-to-text project which uses TensorFlow to map voice data to text, and NLTK, an extremely helpful tool for processing natural language.
What's next for Randall
Randall needs more skills to become even more useful. Using our interface, new skills can be added without dealing with DeepSpeech or NLTK. Randall could also get better at recognizing speech, which would involve training a new, more accurate DeepSpeech model. Randall would also be a perfect fit for a discrete hardware product which could be developed using a small single-board computer.
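To give a feel for what adding a skill without touching DeepSpeech or NLTK could look like, here is one possible shape for such an interface. Everything in it is an assumption made for illustration: the `skill` decorator, the `SKILLS` registry, and the `dispatch` function are hypothetical names, not Randall's actual API. A skill only ever sees the already-tokenized words.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical registry mapping a trigger keyword to the skill that handles it.
SKILLS: Dict[str, Callable[[List[str]], str]] = {}


def skill(*keywords: str):
    """Register a function as the handler for one or more trigger keywords."""
    def register(func: Callable[[List[str]], str]):
        for keyword in keywords:
            SKILLS[keyword] = func
        return func
    return register


@skill("timer", "countdown")
def start_timer(tokens: List[str]) -> str:
    # Use the first number in the utterance as the duration, defaulting to 1.
    minutes = next((t for t in tokens if t.isdigit()), "1")
    return f"Starting a {minutes} minute timer."


def dispatch(tokens: List[str]) -> Optional[str]:
    """Return the reply of the first skill whose keyword appears, else None."""
    for token in tokens:
        if token in SKILLS:
            return SKILLS[token](tokens)
    return None
```

With a layout like this, a new skill is a single decorated function, and the speech recognition and tokenization stages stay untouched.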