Inspiration
We were inspired by [e.g., the daily frustration of switching between multiple productivity apps / the need for a more accessible way for visually impaired users to interact with smart home devices / the lack of a voice agent that truly understood niche gaming commands]. We realized that current voice assistants [e.g., were too generic / lacked deep integration with X / weren't privacy-focused enough]. We envisioned a voice agent that could [e.g., seamlessly manage our digital workflow / provide truly hands-free control in specific environments / offer a more personalized and secure experience].
What it does
Our voice agent allows users to [e.g., manage their to-do lists, schedule meetings, and draft emails across different platforms using natural language commands]. It can also [e.g., control smart lights and thermostats from various brands through a unified voice interface] and [e.g., fetch real-time stock information and set price alerts]. For example, a user can simply say, "[Your Agent's Wake Word], add 'finish report' to my Trello board for tomorrow and schedule a follow-up meeting with Sarah next Tuesday."
How we built it
- Core Logic: Python (Flask/Django for backend API, custom NLU logic)
- Speech-to-Text (STT): Google Cloud Speech-to-Text API / Vosk / DeepSpeech
- Natural Language Understanding (NLU): Rasa NLU / Dialogflow / spaCy / Custom intent parsing
- Text-to-Speech (TTS): Google Cloud Text-to-Speech API / eSpeak NG / Coqui TTS
- Wake Word Detection: Porcupine / Snowboy / Custom model
- Frontend (if any): React / Vue.js / Simple HTML/JS for a demo interface
- Database (if any): PostgreSQL / SQLite / Firebase
- Integrations: [e.g., Trello API, Google Calendar API, Philips Hue API]
- Platform/Hardware: [e.g., Deployed on AWS EC2 / Runs locally on a Raspberry Pi 4 / Desktop application]

Our agent uses a [e.g., modular architecture with a core service handling STT, NLU, and TTS, and separate modules for different skills/integrations].
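To make the modular layout concrete, here is a minimal Python sketch of one way a core service could chain STT, NLU, and TTS and hand intents to pluggable skill modules. It is an illustration under assumptions, not the project's actual code: `VoiceAgent`, `register`, `handle`, `build_demo_agent`, and the stub stages are all hypothetical names, and the stubs stand in for whichever real STT/NLU/TTS service is chosen.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical type: a skill takes extracted entities and returns a reply string.
SkillHandler = Callable[[dict], str]


@dataclass
class VoiceAgent:
    """Core service: audio -> text -> (intent, entities) -> skill -> audio."""
    transcribe: Callable[[bytes], str]            # STT stage (stubbed below)
    parse: Callable[[str], Tuple[str, dict]]      # NLU stage (stubbed below)
    speak: Callable[[str], bytes]                 # TTS stage (stubbed below)
    skills: Dict[str, SkillHandler] = field(default_factory=dict)

    def register(self, intent: str, handler: SkillHandler) -> None:
        # Each integration plugs in as a handler keyed by intent name.
        self.skills[intent] = handler

    def handle(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        intent, entities = self.parse(text)
        handler = self.skills.get(intent)
        reply = handler(entities) if handler else "Sorry, I can't help with that yet."
        return self.speak(reply)


def build_demo_agent() -> VoiceAgent:
    """Wire the agent with trivial stand-ins so the flow can be run offline."""
    def fake_stt(audio: bytes) -> str:
        return audio.decode("utf-8")              # pretend the audio is already text

    def fake_nlu(text: str) -> Tuple[str, dict]:
        prefix = "add task "
        if text.lower().startswith(prefix):
            return "add_task", {"title": text[len(prefix):]}
        return "unknown", {}

    def fake_tts(text: str) -> bytes:
        return text.encode("utf-8")               # pretend the bytes are synthesized speech

    agent = VoiceAgent(fake_stt, fake_nlu, fake_tts)
    agent.register("add_task", lambda e: f"Added task: {e['title']}")
    return agent
```

Because the three stages are passed in as plain callables, swapping one STT or TTS backend for another would only change the wiring in the factory function, not the skill modules.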
Challenges we ran into
- Accurately transcribing noisy audio in real-world environments was a significant challenge. We experimented with different STT services and implemented pre-processing filters to improve results.
- Distinguishing between similar-sounding intents (e.g., 'add task' vs. 'ask about task') required careful NLU model training and fine-tuning of our entity extraction.
- Integrating with [X proprietary API] was difficult due to limited documentation, so we had to do a lot of reverse engineering and trial-and-error.
- Managing context across multi-turn conversations proved complex, leading us to implement a more robust dialogue management system.
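As an illustration of the similar-sounding intent problem ('add task' vs. 'ask about task'), the sketch below uses anchored regular expressions as a simple rule-based fallback. This is a stand-in technique, not the trained NLU model described above, and the intent names (`add_task`, `query_task`), the patterns, and `parse_intent` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical intents and patterns, chosen only for illustration.
# Anchoring on the leading verb keeps "add ..." and "ask about ..." apart
# even though the rest of the utterance is identical.
INTENT_PATTERNS = [
    ("add_task", re.compile(
        r"^(?:please\s+)?add\s+(?:a\s+)?task\s+(?P<title>.+)$", re.IGNORECASE)),
    ("query_task", re.compile(
        r"^(?:please\s+)?ask\s+about\s+(?:the\s+)?task\s+(?P<title>.+)$", re.IGNORECASE)),
]


def parse_intent(utterance: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (intent, task title), or (None, None) if no rule matches."""
    text = utterance.strip()
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.match(text)
        if match:
            return intent, match.group("title").strip()
    return None, None
```

A rule layer like this only helps once the transcript is already correct; if the STT stage mishears "add" as "ask", the distinction has to be recovered from context or confirmed with the user.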
Accomplishments that we're proud of
- We're particularly proud of the [e.g., natural language understanding for complex scheduling commands, achieving over 90% intent accuracy in our tests].
- We successfully integrated [e.g., three different smart home ecosystems] into a single, coherent voice interface.
- We developed a custom wake word detection model that is both lightweight and highly accurate on our target hardware.
- The agent can maintain conversation context for up to [X] turns, allowing for more natural follow-up questions.
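One simple way to keep context for a fixed number of turns is a bounded buffer that drops the oldest turn automatically. The sketch below assumes this approach purely for illustration; `DialogueContext`, `add_turn`, `resolve`, and the `max_turns` parameter (standing in for the unspecified [X]) are hypothetical and do not describe the project's dialogue manager.

```python
from collections import deque
from typing import Deque, Dict, Optional


class DialogueContext:
    """Remembers entities from the most recent turns only."""

    def __init__(self, max_turns: int) -> None:
        # deque(maxlen=...) silently discards the oldest turn once full.
        self._turns: Deque[Dict[str, str]] = deque(maxlen=max_turns)

    def add_turn(self, entities: Dict[str, str]) -> None:
        # Store a copy so later changes by the caller do not alter history.
        self._turns.append(dict(entities))

    def resolve(self, slot: str) -> Optional[str]:
        # Search newest to oldest so the latest mention of a slot wins.
        for entities in reversed(self._turns):
            if slot in entities:
                return entities[slot]
        return None
```

With this shape, a follow-up such as "move it to Wednesday" could look up the missing person or task via `resolve` instead of asking the user to repeat it, and anything older than `max_turns` is forgotten.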
What we learned
- We learned a tremendous amount about the intricacies of STT and NLU, and the importance of high-quality training data.
- This project deepened our understanding of API design and integration patterns.
- We gained valuable experience in designing intuitive voice user interfaces and handling ambiguity in user speech.
- The importance of iterative development and user testing became very clear, especially for a voice-first product.
What's next for
- We want to improve the personalization by allowing users to train custom voice commands and responses.
- Exploring on-device NLU processing to enhance privacy and reduce latency is a key future goal.
- We'd like to conduct more extensive user testing to refine the user experience and identify new use cases.
- Potentially, we could look into [e.g., building a mobile app companion / open-sourcing parts of the framework].
Built With
- database
- javascript
- python
- rag