Inspiration
Modern AI assistants often rely on text input and manual interaction, which interrupts workflow and reduces efficiency. We wanted to create a more natural and intuitive way for users to interact with their computers — using conversation instead of clicks.
Buddy was inspired by the idea that speaking should be the fastest interface. Our goal was to build an assistant that listens, understands intent, and executes tasks seamlessly in the background.
What it does
Buddy is a voice-first AI assistant for Windows that enables hands-free control of your computer through natural language commands. Instead of navigating menus or typing requests, users simply speak. Buddy interprets the request and performs the action in real time.
Key capabilities include:
- Managing calendars and scheduling events
- Launching applications and navigating to websites
- Sending emails and messages
- Playing music and handling everyday tasks
- Multilingual voice recognition across a wide range of languages
- Creating files and directories in the file system
Buddy functions as an intelligent copilot that streamlines daily computer use.
How we built it
Buddy is built as a modular, voice-to-action pipeline that converts natural speech directly into executable system and API commands.
Our architecture follows this flow:
Speech → Transcription → Intent Detection → API Routing → Execution
1. Speech Capture – The user speaks naturally through their microphone.
2. Speech-to-Text – Audio is transcribed into text using a speech recognition engine (Whisper).
3. Intent Detection (LLM) – The transcript is sent to OpenAI, where a language model extracts the user's intent and key parameters (e.g., action, target app, time, or content).
4. Command Routing – A decision engine maps the detected intent to the appropriate service or API (Gmail, Spotify, Calendar, system controls, etc.).
5. Execution – The selected API or automation script executes the task in real time.
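The intent-detection step can be sketched as follows. In the real pipeline the JSON reply comes from the OpenAI API; here a canned response is parsed to show the shape of the data. The `Intent` fields and prompt wording are illustrative assumptions, not Buddy's actual schema:

```python
import json
from dataclasses import dataclass, field

# Illustrative intent schema; Buddy's actual fields may differ.
@dataclass
class Intent:
    action: str               # e.g. "send_email", "play_song", "open_app"
    target: str               # contact, app, or resource the action applies to
    params: dict = field(default_factory=dict)  # extras: time, subject, body, ...

# Hypothetical system prompt asking the model to reply with structured JSON.
INTENT_PROMPT = (
    "Extract the user's intent from the transcript. "
    'Reply with JSON: {"action": ..., "target": ..., "params": {...}}'
)

def parse_intent(llm_reply: str) -> Intent:
    """Parse the model's JSON reply into a structured Intent."""
    data = json.loads(llm_reply)
    return Intent(
        action=data["action"],
        target=data.get("target", ""),
        params=data.get("params", {}),
    )

# A canned model reply standing in for a live OpenAI API response.
reply = '{"action": "send_email", "target": "Aaron", "params": {"subject": "Hi"}}'
intent = parse_intent(reply)
```

Having the model emit JSON keeps the downstream routing deterministic even though the input phrasing is free-form.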
For example:
- “Send an email to Aaron” → Gmail API
- “Play a song by the Beatles” → Spotify API
- “Open Chrome” → Windows system command
This structured, decision-tree approach ensures reliable and deterministic behavior while still allowing flexible, natural language input.
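The decision-tree routing described above can be sketched as a dispatch table mapping each detected action to a handler. The handler names and return strings are hypothetical stand-ins for Buddy's real API wrappers:

```python
def send_email(target: str, params: dict) -> str:
    # Stand-in for a Gmail API call.
    return f"email -> {target}"

def play_song(target: str, params: dict) -> str:
    # Stand-in for a Spotify call via spotipy.
    return f"spotify -> {target}"

def open_app(target: str, params: dict) -> str:
    # Stand-in for launching a Windows process.
    return f"launch -> {target}"

# Dispatch table: detected action -> handler function.
ROUTES = {
    "send_email": send_email,
    "play_song": play_song,
    "open_app": open_app,
}

def route(action: str, target: str, params: dict) -> str:
    """Look up the handler for a detected action and execute it."""
    handler = ROUTES.get(action)
    if handler is None:
        raise ValueError(f"Unknown action: {action}")
    return handler(target, params)
```

A flat dictionary like this makes adding a new integration a one-line change, which is what keeps the architecture extensible.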
We implemented Buddy using Python, OpenAI’s API for intent understanding, and third-party service APIs for task execution, creating a scalable and extensible architecture.
Challenges we ran into
One of our primary challenges was achieving reliable, real-time voice recognition. Background noise, accents, and variations in phrasing often reduced transcription accuracy, which directly impacted downstream task execution.
Another challenge was translating flexible natural language into deterministic system actions. Users can express the same intent in many different ways, so we had to design robust intent detection and parameter extraction to consistently map requests to the correct APIs.
We also worked to minimize latency across the entire pipeline — transcription, AI processing, and API calls — to ensure Buddy felt instantaneous rather than delayed.
Finally, integrating multiple external services (Gmail, Spotify, calendar tools, and system controls) required careful handling of authentication, rate limits, and error management to maintain reliability during live use.
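One common pattern for handling the transient failures mentioned above (rate limits, timeouts) is retry with exponential backoff. This is a generic sketch of the technique, not Buddy's exact error-handling code:

```python
import time

def with_retries(fn, max_attempts: int = 3, base_delay: float = 0.5):
    """Call fn(), retrying on exception with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))

# Example: a flaky call that succeeds on the second attempt.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise TimeoutError("transient failure")
    return "ok"

result = with_retries(flaky)
```

Wrapping each external API call this way keeps a single rate-limited request from failing the whole voice command.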
Accomplishments that we're proud of
Some notable accomplishments include:
- Built a fully functional, end-to-end voice assistant within a hackathon timeframe
- Achieved real-time speech-to-action execution with low latency
- Successfully integrated OpenAI with multiple external APIs
- Enabled hands-free control of real applications, not just chatbot responses
- Designed a modular architecture that can easily support new services
We’re especially proud that Buddy performs practical tasks rather than simply generating text responses.
What we learned
Through building Buddy, we gained hands-on experience with:
- Speech recognition and audio processing
- Natural language understanding and intent classification using LLMs
- API integration and automation workflows
- Designing reliable decision trees for task routing
- Rapid prototyping and debugging under time constraints
We also learned that user experience is critical for voice interfaces — responsiveness and simplicity matter just as much as model accuracy.
What's next for Buddy
We plan to continue expanding Buddy’s capabilities by:
- Adding wake-word activation (“Hey Buddy”)
- Improving intent accuracy and contextual awareness
- Supporting additional integrations and APIs
- Reducing latency further for near-instant responses
- Exploring cross-platform support beyond Windows
- Personalizing actions based on user behavior and preferences
Our goal is to evolve Buddy into a dependable, everyday AI copilot that makes interacting with computers faster, more natural, and completely hands-free.
Built With
- gmailapi
- google-cloud-console
- openai
- python
- spotipy
- whisper