Inspiration
Moving far away from home for work brought an unexpected challenge: survival cooking. I quickly grew tired of expensive, unhealthy takeout from nearby restaurants and wanted to save money. However, with zero cooking skills, I found traditional recipes and videos frustrating. I couldn't keep touching my phone with messy hands to pause or rewind, and I needed someone patient to guide me in real time. I built Remy's Kitchen to be the sous-chef I didn't have: a live, voice-enabled assistant that watches me cook and talks me through every step.
What it does
Remy's Kitchen is an AI cooking assistant that listens, talks, and watches while you cook. Users paste a YouTube link to a recipe they want to try. Remy uses the Gemini 2.5 Flash-Lite model to analyze the video metadata and summarize the ingredients and steps. Once the "Kitchen" is active, Remy becomes a live companion using the Gemini Live API. He can:
- See your progress: Remy processes live webcam frames to understand where you are in the recipe.
- Talk in real time: Remy provides hands-free, interruptible audio instructions.
- Manage the kitchen: Users can ask Remy to "set a timer for 10 minutes," and he will execute the tool via function calling.
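The timer request above can be sketched as a tool declaration plus a small dispatcher. This is a minimal illustration, not the project's actual code: the name `set_timer`, its parameters, and the `dispatch_tool_call` helper are all hypothetical, and the declaration is a plain dictionary in the JSON-schema style commonly used for function calling.

```python
# Hypothetical tool declaration (assumption: not taken from the project).
SET_TIMER_DECLARATION = {
    "name": "set_timer",
    "description": "Start a kitchen timer.",
    "parameters": {
        "type": "object",
        "properties": {
            "minutes": {"type": "number", "description": "Timer length in minutes."},
            "label": {"type": "string", "description": "What the timer is for."},
        },
        "required": ["minutes"],
    },
}


def set_timer(minutes: float, label: str = "timer") -> dict:
    """Validate the request and return a result the model can read back."""
    if minutes <= 0:
        return {"ok": False, "error": "minutes must be positive"}
    return {"ok": True, "label": label, "seconds": int(minutes * 60)}


# Registry mapping tool names to Python handlers.
TOOLS = {"set_timer": set_timer}


def dispatch_tool_call(name: str, args: dict) -> dict:
    """Route a function call requested by the model to the matching handler."""
    handler = TOOLS.get(name)
    if handler is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return handler(**args)
    except TypeError as exc:
        # Raised when the model sends missing or unexpected arguments.
        return {"ok": False, "error": str(exc)}


print(dispatch_tool_call("set_timer", {"minutes": 10}))
```

Returning a structured result rather than raising lets the assistant explain a failed request out loud instead of silently dropping it.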
How we built it
The project is built with a FastAPI backend and a vanilla JavaScript frontend.
- Core Logic: Python serves as the bridge between the user and Google's Generative AI models.
- Multimodal Interaction: We use WebSockets to stream low-latency PCM audio and JPEG image frames to the Gemini 2.5 Flash Native Audio model.
- Infrastructure: The application is containerized using Docker and hosted on Google Cloud Run to ensure scalability and reliability.
- Tools: We integrated Google Cloud services and the YouTube Data API to fetch and process video context dynamically.
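Sending audio and images over one WebSocket means the backend has to tell the two apart. One possible approach, shown here as a sketch only, is to prefix each binary message with a one-byte tag. The tag values, the MIME labels, and the `pack_frame` / `unpack_frame` helpers are assumptions made for illustration; the project may frame its messages differently.

```python
# Hypothetical one-byte tags for the two media types on a shared socket.
AUDIO_TAG = 0x01
IMAGE_TAG = 0x02

# Labels the backend could attach before forwarding a chunk to the model.
MIME_BY_TAG = {AUDIO_TAG: "audio/pcm", IMAGE_TAG: "image/jpeg"}


def pack_frame(tag: int, payload: bytes) -> bytes:
    """Prefix a raw payload with its media tag (sender side)."""
    if tag not in MIME_BY_TAG:
        raise ValueError(f"unknown tag: {tag}")
    return bytes([tag]) + payload


def unpack_frame(message: bytes) -> tuple:
    """Split a tagged message into (mime_type, payload) (receiver side)."""
    if not message:
        raise ValueError("empty message")
    tag, payload = message[0], message[1:]
    if tag not in MIME_BY_TAG:
        raise ValueError(f"unknown tag: {tag}")
    return MIME_BY_TAG[tag], payload


# Round trip: the first three bytes of a JPEG file, tagged as an image.
jpeg_start = b"\xff\xd8\xff"
print(unpack_frame(pack_frame(IMAGE_TAG, jpeg_start)))
```

A single tag byte adds almost no overhead and avoids the size increase of base64-encoding media inside JSON text messages.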
Challenges we ran into
The biggest hurdle was the steep learning curve of WebSockets and asynchronous programming in Python. Connecting a frontend HTML environment to a complex AI backend required precise handling of data buffers to keep latency low. Initially, I struggled with integrating the HTML with FastAPI, but by leveraging AI Studio for prototyping and Gemini for debugging, I was able to bridge the gap between my ideas and the code.
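To illustrate the kind of buffer handling involved, here is one common pattern for keeping latency low: a bounded `asyncio.Queue` that discards the oldest item when full, so a slow consumer always sees recent frames. This is a sketch under assumptions, not the project's implementation; the `put_latest` helper and the queue size are hypothetical.

```python
import asyncio


def put_latest(queue: asyncio.Queue, item) -> None:
    """Enqueue an item, dropping the oldest one if the queue is full."""
    if queue.full():
        queue.get_nowait()  # Discard the stalest frame to bound latency.
    queue.put_nowait(item)


async def demo() -> list:
    # A queue of size 2 stands in for a small buffer of webcam frames.
    queue = asyncio.Queue(maxsize=2)
    for frame in ["frame-1", "frame-2", "frame-3", "frame-4"]:
        put_latest(queue, frame)

    # Drain what the consumer would actually see.
    received = []
    while not queue.empty():
        received.append(queue.get_nowait())
    return received


print(asyncio.run(demo()))  # → ['frame-3', 'frame-4']
```

Dropping stale video frames is usually acceptable because only the current view of the pan matters. Audio is less forgiving, since discarded chunks are audible as gaps.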
Furthermore, we encountered a significant roadblock when deploying to the cloud: YouTube’s automated bot-detection systems. When hosted on Cloud Run, our initial scraping methods (using yt-dlp) were flagged as bot traffic because they originated from data center IP addresses. This prevented Chef Remy from accessing video descriptions for recipe extraction. To resolve this, we pivoted from scraping to a more robust, 'production-grade' architecture by integrating the official YouTube Data API v3. This not only bypassed the bot-detection hurdle but also improved the stability and professional standard of our data pipeline.
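A lookup against the YouTube Data API v3 can be sketched as follows. The helper names and the set of accepted hosts are assumptions for illustration; the endpoint is the API's `videos` resource with `part=snippet`, whose response carries the description under `items[0].snippet.description`. The sketch only builds the request URL and parses a response dictionary, so it runs without a network call or a real API key.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

API_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Pull the video ID out of common YouTube URL shapes."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/shorts/", "/embed/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None
    return None


def build_request_url(video_id: str, api_key: str) -> str:
    """Build the videos.list request URL for one video's snippet."""
    query = urlencode({"part": "snippet", "id": video_id, "key": api_key})
    return f"{API_ENDPOINT}?{query}"


def description_from_response(payload: dict) -> Optional[str]:
    """Read the description from a parsed videos.list JSON response."""
    items = payload.get("items", [])
    return items[0]["snippet"]["description"] if items else None


video_id = extract_video_id("https://www.youtube.com/watch?v=abc123XYZ_-&t=10s")
print(build_request_url(video_id, "YOUR_API_KEY"))  # Placeholder key.
```

An empty `items` list is how the API reports an unknown or private video, so the parser returns `None` in that case instead of raising.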
Accomplishments that we're proud of
I am incredibly proud of achieving a real-time, low-latency experience. There is something magical about speaking to an AI, having it respond instantly while you are busy chopping vegetables, and knowing it actually "understands" the recipe you just gave it. Successfully deploying a full-stack application to Google Cloud was a major milestone for me as a developer.
What we learned
This project was a massive learning journey. I went from having a basic idea to mastering:
- Backend Development: Building robust APIs with FastAPI.
- Cloud Infrastructure: Dockerization and Google Cloud SDK deployment.
- Multimodal AI: Implementing the Gemini Live API for audio/vision synchronization.
- Frontend Basics: Using the MediaDevices API and AudioWorklets for real-time streaming.
What's next for Remy's Kitchen: The AI Assistance cooker
In the future, I want to give Remy a "memory" so he can remember my favorite ingredients or allergies. I also plan to implement more advanced vision-language features, so that Remy can look at a pan and tell me if the steak is seared enough or if the water is boiling, making him the ultimate tutor for anyone starting their cooking journey from zero.
Built With
- css3
- docker
- fastapi
- functioncalling
- gemini2.5flash
- geminilivemodel
- google-cloud
- googlegenaisdk
- html5
- javascript
- python
- youtubedataapi