Inspiration
Cooking with digital recipes is often far more frustrating than it should be. Recipe videos force users to constantly pause and rewind, while written recipes require endless scrolling up and down the page to find the right step again. In real kitchens, this becomes even more inconvenient because your hands are messy, your attention is split, and the device that is supposed to help you ends up disrupting the flow of cooking.
We wanted to solve a problem we have all experienced ourselves: recipes are not designed for the physical reality of being in the kitchen. When you are cooking for the first time, trying to manage timing, ingredients, and unfamiliar steps, even something as simple as checking what comes next can feel annoying and disruptive. That frustration adds up quickly and makes cooking feel more stressful than enjoyable.
That is what inspired us to build CookPilot. Instead of making the user adapt to the recipe, we wanted the recipe to adapt to the user. CookPilot takes recipe text or video input and transforms it into a hands-free guided cooking experience. Rather than forcing the cook to constantly look down at a screen, scroll, or scrub through a video, CookPilot becomes an AI chef that can walk you through the process step by step, answer your questions in context, and help you stay focused on the food itself.
At a time when AI is often framed as something abstract or detached from daily life, we wanted to create something deeply practical. CookPilot brings AI into one of the most universal everyday activities, making cooking less frustrating, more accessible, and more enjoyable.
What it does
CookPilot turns recipe text, uploaded files, or video recipes into a hands-free cooking experience. Once a recipe is loaded, it guides the user step by step with voice output, keeps track of progress, answers contextual questions, and helps the user move through the recipe naturally without constantly touching their phone or laptop.
It also supports practical cooking help in real time. Users can ask questions like what ingredient comes next, how much to add, or what a specific tool means. The assistant can suggest substitutions, handle step-specific timers, and remember where the user left off so the cooking flow does not reset if the page reloads.
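As a rough illustration of how "remember where the user left off" could work, here is a minimal sketch of a session record that is saved to disk and restored after a reload. The class name `CookingSession`, its fields, and the JSON file format are assumptions made for this example, not the project's actual storage layer.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class CookingSession:
    """Hypothetical session record; the field names are illustrative only."""

    recipe_id: str
    steps: List[str]
    current_step: int = 0

    def advance(self) -> str:
        """Move to the next step (staying on the last one) and return its text."""
        if self.current_step < len(self.steps) - 1:
            self.current_step += 1
        return self.steps[self.current_step]

    def save(self, path: Path) -> None:
        """Write the session as JSON so it survives a page reload or restart."""
        path.write_text(json.dumps(asdict(self)), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "CookingSession":
        """Rebuild a session from a file written by save()."""
        return cls(**json.loads(path.read_text(encoding="utf-8")))
```

Calling `save()` after every step change and `load()` when the page reconnects would let the assistant resume at the same step instead of starting over.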
How we built it
We built CookPilot as a full-stack web app using FastAPI for the backend and JavaScript, HTML, and CSS for the frontend. The backend handles recipe ingestion, parsing, cooking session state, timer logic, and conversational reasoning, while the frontend handles the guided cooking interface, recipe browsing, voice interaction, and playback.
For the AI layer, we used the OpenAI API for recipe parsing, contextual question answering, and recipe understanding. This allows the assistant to stay grounded in the recipe while still responding naturally to user questions during the cooking process. For voice output, we integrated ElevenLabs to create a more natural, assistant-like speaking experience.
To support recipe videos, we built a pipeline using yt-dlp and FFmpeg to process media, Whisper for transcription, and OCR plus vision-based extraction to recover information shown on screen, such as ingredients or measurements.
We also added persistent session storage so users can reload the app and continue where they left off, which is especially important in a real cooking environment.
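The media-processing part of the video pipeline described above might look roughly like the sketch below. It only builds the command lines for yt-dlp and FFmpeg. The function names, output file names, and the five-second frame interval are assumptions for illustration, and the transcription and OCR stages are left out.

```python
from pathlib import Path
from typing import List


def download_video_cmd(url: str, out_dir: Path) -> List[str]:
    """yt-dlp command that downloads the video, named after its video id."""
    return ["yt-dlp", "-o", str(out_dir / "%(id)s.%(ext)s"), url]


def extract_audio_cmd(video: Path, audio: Path) -> List[str]:
    """FFmpeg command that drops the video stream and writes 16 kHz mono audio,
    the sample rate Whisper models operate on."""
    return [
        "ffmpeg", "-y",      # overwrite the output file if it exists
        "-i", str(video),
        "-vn",               # no video in the output
        "-ac", "1",          # one audio channel (mono)
        "-ar", "16000",      # 16 kHz sample rate
        str(audio),
    ]


def sample_frames_cmd(video: Path, out_dir: Path, every_seconds: int = 5) -> List[str]:
    """FFmpeg command that saves one frame every `every_seconds` seconds,
    so on-screen ingredients and measurements can be passed to OCR."""
    return [
        "ffmpeg", "-y",
        "-i", str(video),
        "-vf", f"fps=1/{every_seconds}",
        str(out_dir / "frame_%04d.png"),
    ]
```

Each list can be run with `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the pipeline easy to test without the tools installed.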
Challenges we ran into
One of the biggest challenges was making the experience feel truly hands-free instead of just adding a chatbot on top of recipes. We had to carefully design how the assistant should react to natural phrases like “next step,” “I’m done,” or “start the timer,” while keeping the behavior reliable enough for a live demo.
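One way to keep short spoken commands reliable is to match them with strict patterns first and send everything else to the language model. The sketch below illustrates that idea. The intent names, the patterns, and the fall-through behavior are assumptions for this example, not CookPilot's actual rules.

```python
import re
from typing import Optional

# Hypothetical intents and the short phrases that trigger them.
INTENT_PATTERNS = [
    ("start_timer", re.compile(r"(?:start|set) (?:the |a )?timer")),
    ("next_step", re.compile(r"(?:ok(?:ay)? )?(?:next(?: step)?|i'?m done|done)")),
    ("repeat_step", re.compile(r"repeat(?: that)?|say that again")),
]


def normalize(utterance: str) -> str:
    """Lowercase, straighten curly apostrophes, and trim punctuation at the edges."""
    return utterance.lower().replace("\u2019", "'").strip(" .!?,")


def detect_intent(utterance: str) -> Optional[str]:
    """Return an intent only when the whole utterance is a known command.

    Returning None means the utterance should be treated as a free-form
    question rather than a navigation command.
    """
    text = normalize(utterance)
    for intent, pattern in INTENT_PATTERNS:
        if pattern.fullmatch(text):
            return intent
    return None
```

Matching the whole utterance, rather than searching inside it, keeps a question such as "What ingredient comes next?" from being mistaken for a "next step" command.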
Another major challenge was handling recipe videos. Recipe information is often split across narration, on-screen text, and visuals, so extracting useful structure from videos was much harder than handling plain text recipes. We also ran into issues with voice integration, browser audio behavior, microphone handling, and API credential setup across multiple services.
Accomplishments that we're proud of
We’re proud that CookPilot feels like a real kitchen companion rather than just a recipe viewer. It can guide users step by step, answer questions in context, help with substitutions and timers, and create an experience that actually feels useful while cooking.
We’re also proud of bringing together multiple input types, conversational AI, and voice-based interaction into one cohesive product. For a hackathon project, it pushed us to think beyond individual features and build something that feels like a complete experience.
What we learned
We learned that building a good AI product is not just about using a model. A lot of the real work is in the interaction design: how the assistant responds, when it should speak, how much initiative it should take, and how to make the overall experience feel natural.
We also learned a lot about multimodal processing, speech interfaces, and the complexity of coordinating frontend behavior with backend AI systems. Working on CookPilot showed us how much small UX decisions matter when building something people will actually use in real life.
What's next for CookPilot
Next, we want to make CookPilot even more agentic and adaptive. That includes improving how it interprets natural cooking actions, making timer handling more seamless, and giving more proactive guidance during longer or more complex recipes.
We also want to improve video understanding, strengthen personalization for dietary needs and user preferences, and make the assistant feel even more like a true kitchen copilot instead of a reactive helper.
Built With
- auth0
- css
- docker
- elevenlabs
- fastapi
- ffmpeg
- html
- javascript
- openai
- paddleocr
- python
- websockets
- whisper
- yt-dlp