Inspiration

The inspiration came from seeing endless recipe videos while browsing the internet, but never feeling confident enough to get into the kitchen and try.

Cooking is a very practical skill, so when recipe videos move too quickly and don't explain things, they leave you feeling deflated as a beginner.

What it does

YesChef takes blog and video links and turns them into an ingredients list and step-by-step instructions with handy tips.

Most importantly, YesChef lets you connect via voice and video to be guided along the way. YesChef talks to you and guides you through the recipe, helping you tune things to your taste, learn techniques, and visually confirm whether things look right.

How I built it

These are the main tools, libraries and frameworks I used:

  • Gemini (3 Flash Preview & Native Audio Preview 12-2025)
  • FastAPI
  • Python
  • React w/ Vite
  • Lucide
  • Zustand
  • LiveKit
  • Supabase
  • Firecrawl
  • Supadata
  • Render
  • GitHub Copilot

Challenges I ran into

There was a big challenge in making sure the realtime agent had video data at the right time and wasn't hallucinating from stale video frames, or being steered too hard by the recipe into "seeing" things that weren't in the frame.
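One way to keep the agent off stale frames is a freshness gate that drops any frame older than a small threshold. This is a minimal sketch of that idea, not the actual implementation; the class name, threshold, and API are all made up for illustration:

```python
import time

FRAME_MAX_AGE_S = 2.0  # hypothetical threshold: frames older than this count as stale


class FrameGate:
    """Holds the most recent video frame and refuses to hand out stale ones."""

    def __init__(self, max_age_s: float = FRAME_MAX_AGE_S):
        self.max_age_s = max_age_s
        self._frame = None
        self._stamp = 0.0

    def push(self, frame) -> None:
        """Record a new frame along with the time it arrived."""
        self._frame = frame
        self._stamp = time.monotonic()

    def latest(self):
        """Return the newest frame, or None if it has gone stale."""
        if self._frame is None:
            return None
        if time.monotonic() - self._stamp > self.max_age_s:
            return None
        return self._frame
```

With a gate like this, the agent can be prompted to say it can't see anything (rather than guess) whenever `latest()` returns None.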

Another challenge was figuring out how to get the video/blog data. Initially I wanted to build my own pipeline using tools like recipe-scrapers and yt-dlp, but due to time constraints and technical challenges, like having to navigate around the social networks' bot-detection systems, I decided to use existing APIs (Supadata and Firecrawl).
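The split between the two APIs comes down to routing by link type: video platforms go to one extractor, everything else is treated as a blog. A rough sketch of that routing, with a hypothetical host list and function name (the real app's logic may differ):

```python
from urllib.parse import urlparse

# Assumed set of video hosts; the real routing table is likely longer.
VIDEO_HOSTS = {
    "youtube.com", "www.youtube.com", "youtu.be",
    "tiktok.com", "www.tiktok.com",
}


def pick_extractor(url: str) -> str:
    """Route a recipe link to an extraction service.

    Returns "supadata" for known video platforms (transcripts) and
    "firecrawl" for everything else (blog scraping).
    """
    host = urlparse(url).netloc.lower()
    return "supadata" if host in VIDEO_HOSTS else "firecrawl"
```

Usage: `pick_extractor("https://youtu.be/abc123")` routes to Supadata, while an ordinary blog URL falls through to Firecrawl.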

Accomplishments that I'm proud of

Getting the realtime audio/video working to a good level, where YesChef would not constantly hallucinate about what it was shown yet stayed free enough to help the user with cooking in general, was a big challenge, so I'm proud to have that working.

I'm also proud to have the entire flow working. Although it's still a work in progress with things to refine and fix, I'm happy that I can paste a recipe link and be talking and sharing realtime video with the YesChef agent in under a few minutes.

What I learned

I mainly learned how to build a realtime agent using the Gemini Live models, and how to use the LiveKit integration to make it all work smoothly. I also learned how to set up an agent that not only uses the text data we gathered but is flexible enough to react to "sensory" data, like the user talking and showing their surroundings, as it starts and stops.

What's next for YesChef

YesChef doesn't yet handle videos whose ingredients or recipe appear only as "subtitles" embedded in the video, since we're not downloading and processing the video itself. This would be the main next step; it wasn't done because of the trade-off between its complexity and the time I had.

Other than the main step outlined, I would also like to introduce user accounts and let both users and YesChef rate a cooking session. These ratings could then feed into later sessions of the same (or different) recipes, so users feel like they're improving their skills, to their taste, as they carry on using the app.
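The rating idea boils down to storing per-recipe session feedback and folding it into the agent's context on the next attempt. A minimal sketch of that data model, with entirely hypothetical names and fields (in practice this would live in Supabase rather than in memory):

```python
from dataclasses import dataclass, field


@dataclass
class SessionRating:
    """Feedback from one cooking session (names/fields are illustrative)."""
    recipe_id: str
    user_score: int   # e.g. 1-5, from the user
    agent_notes: str  # free-text observations from YesChef


@dataclass
class CookingHistory:
    """In-memory stand-in for a per-user ratings store."""
    ratings: list = field(default_factory=list)

    def add(self, rating: SessionRating) -> None:
        self.ratings.append(rating)

    def context_for(self, recipe_id: str) -> str:
        """Build a prompt snippet from past sessions of this recipe."""
        past = [r for r in self.ratings if r.recipe_id == recipe_id]
        if not past:
            return "First time cooking this recipe."
        return "\n".join(
            f"Past attempt scored {r.user_score}/5: {r.agent_notes}" for r in past
        )
```

The string returned by `context_for` could be prepended to the agent's instructions so later sessions start with what the user liked last time.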

Smaller improvements would include testing the agent and ingestion systems for more robustness.
