Inspiration
Cooking is when a screen is least usable and most needed—your hands are messy, the phone keeps locking, and the recipe's buried under a cutting board. We wanted a recipe you never have to touch: just talk to your kitchen and have it talk back. That's Handful—hands-free cooking, for when your stomach is hungry and your hands are full.
Handful is especially useful for college students whose fridge is running low but whose stomachs are still hungry. Suggesting new ideas for the limited ingredients they have teaches them how much creativity cooking allows.
What it does
Handful is a voice-native cooking assistant. You talk, it cooks with you—completely hands-free.
- Say what you've got or what you're craving—"I've got flour, eggs, and butter" or "I wanna make donuts"—and Handful suggests real recipes, confirming before it commits so it never gets ahead of you.
- Once you confirm, the recipe locks in and becomes spoken notecards: Handful reads each step aloud and shows it as a clean display you never have to touch.
- Voice-controlled timers start, pause, and tell you when something's done—no phone alarms.
- Recipes adapt mid-cook. Say "I'm out of feta" and it rewrites itself with a substitution in real time.
- It moves at your pace, only advancing when you say so.
- It remembers your preferences, diet, likes, and dislikes, so recommendations fit you.
- Wake-word activation ("Hey Chef") means it only responds when you mean it—built for a noisy kitchen.
How we built it
- Voice is the entire product, so we built directly on Deepgram's Voice Agent stack over a single real-time WebSocket: Nova-3 for high-accuracy, real-time speech-to-text, and Aura-2 for natural, low-latency text-to-speech that doesn't make you want to mute it.
- The Voice Agent API orchestrates the full STT → reasoning → TTS conversational loop.
- On top of that, a fast intent-parsing layer using Gemma3 turns free-form speech into structured cooking actions (generate recipe, confirm, next step, set timer, log a mistake, swap an ingredient), backed by a recipe state machine that gates the flow with natural confirmations. The frontend is a custom-rendered, kinetic card-stack UI that stays in sync with the conversation—driven by the same WebSocket that carries the audio.
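As a rough illustration of the confirmation-gated flow described above, here is a minimal sketch of a recipe state machine. It is not our actual implementation: the `Phase` and `RecipeSession` names, the intent strings, and the spoken replies are all hypothetical, and it assumes the intent parser has already turned speech into an intent name plus an optional list of steps.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Phase(Enum):
    IDLE = auto()      # nothing proposed yet
    PROPOSED = auto()  # recipe suggested, waiting for a spoken confirmation
    COOKING = auto()   # recipe locked in, stepping through cards
    DONE = auto()      # last step finished


@dataclass
class RecipeSession:
    # Hypothetical session object; field names are illustrative only.
    phase: Phase = Phase.IDLE
    steps: list = field(default_factory=list)
    index: int = 0

    def handle(self, intent: str, payload: Optional[list] = None) -> str:
        """Apply one parsed intent and return the line to speak."""
        can_propose = self.phase in (Phase.IDLE, Phase.PROPOSED)
        if intent == "generate_recipe" and payload and can_propose:
            # Store the suggestion but do NOT start cooking yet.
            self.steps = list(payload)
            self.index = 0
            self.phase = Phase.PROPOSED
            return "Want to make this?"
        if intent == "confirm" and self.phase is Phase.PROPOSED:
            # The gate: only an explicit confirmation locks the recipe in.
            self.phase = Phase.COOKING
            return self.steps[0]
        if intent == "next_step" and self.phase is Phase.COOKING:
            if self.index + 1 < len(self.steps):
                self.index += 1
                return self.steps[self.index]
            self.phase = Phase.DONE
            return "That was the last step. Enjoy!"
        # Any intent that is not valid in the current phase is rejected.
        return "Sorry, I can't do that yet."
```

In this sketch, saying "next step" before confirming a recipe is simply refused, which is one way to keep the agent from getting ahead of the cook.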
Challenges we ran into
- Knowing when the user is done talking. Listing ingredients out loud has natural pauses; we tuned turn-taking and added confirmation gates so Handful never commits to a recipe before you've finished your thought.
- Keeping the UI and the voice in lockstep. The card on screen always reflects the exact step the agent is on — getting real-time audio and visual state to agree took a clean, single-source-of-truth architecture.
- Latency. A cooking assistant that lags is useless when your pan's already smoking. We leaned on Deepgram's real-time stack to keep responses snappy.
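The single-source-of-truth idea can be sketched as follows: both the spoken line and the on-screen card are derived from one immutable state snapshot, so they cannot disagree about the current step. This is an illustrative sketch only; `StepState`, `render_outputs`, and the JSON message shape are assumptions, not the actual wire format we send over the WebSocket.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StepState:
    # Hypothetical snapshot of where the cook is in the recipe.
    index: int   # zero-based position of the current step
    total: int   # number of steps in the locked-in recipe
    text: str    # instruction for the current step


def render_outputs(state: StepState) -> tuple:
    """Derive the text to speak and the UI message from the same state."""
    spoken = f"Step {state.index + 1} of {state.total}. {state.text}"
    card = json.dumps(
        {
            "type": "card",  # assumed message type for the frontend
            "index": state.index,
            "total": state.total,
            "text": state.text,
        }
    )
    return spoken, card


spoken, card = render_outputs(StepState(index=1, total=3, text="Whisk the eggs."))
```

Because neither output keeps its own step counter, advancing the recipe means producing a new `StepState` and re-rendering both channels from it.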
Accomplishments that we're proud of
- Voice is more than just a feature in Handful—it's the whole interface. You can cook an entire recipe start to finish without ever touching the screen.
- Real-time ingredient swaps that rewrite the recipe live.
- A confirmation-aware agent that genuinely waits for you and moves at your pace.
- A demo that feels less like an app and more like having a chef next to you.
What we learned
- Building around WebSockets and Deepgram's low-latency streaming APIs taught us that voice-first experiences depend on responsiveness above all else. Real-time transcription, smooth turn-taking, and knowing when to stop talking and listen are what make an agent feel like a patient, human conversation.
What's next for Handful
- Integration with tuned models such as ChefGPT.
- A pantry scanner for mobile devices.
- User-uploaded recipes.
Built With
- css
- deepgram
- fastapi
- gemma3
- python
- websockets