Inspiration

We’ve all been there: scrolling through TikTok or YouTube Shorts, finding a delicious recipe, and saving it—only to never cook it because constantly pausing and rewinding a fast-paced video while your hands are covered in flour is a nightmare.

We also noticed a common "pantry paralysis"—staring at a fridge full of ingredients but having no idea what to make. We wanted to build a bridge between inspiration and action, creating an intelligent assistant that doesn't just store recipes, but actively helps you cook them.

What it does

PlateIt is an AI-powered sous-chef that transforms how you find, manage, and cook food:

  1. Universal Recipe Extraction: Paste a link from YouTube (including Shorts), Instagram, or a blog, and PlateIt uses Gemini 3 to "watch" the video or read the site. It automatically extracts a structured recipe with ingredients, amounts, and step-by-step instructions.
  2. Smart Pantry & Vision: Snap a photo of your open fridge or groceries. Our Gemini 3 integration analyzes the image to identify ingredients and auto-populate your digital pantry, estimating quantities instantly.
  3. Intelligent Cooking Mode: When you're ready to cook, PlateIt enters a distraction-free, step-by-step mode.
  4. Multimodal Chef Agent: Stuck on a step? Ask our AI Chef. You can even send a photo of your pan and ask, "Is this caramelized enough?" or "Does this look consistent?", and the agent will analyze the visual data to give you real-time culinary advice.
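
The pantry flow in item 2 implies a merge step between what the vision model detects and what is already stored. The sketch below assumes the model returns a list of objects with `name`, `quantity`, and `unit` keys. That format, and the `PantryItem` and `merge_detections` names, are illustrative assumptions rather than PlateIt's actual code.

```python
from dataclasses import dataclass


@dataclass
class PantryItem:
    name: str
    quantity: float
    unit: str


def normalize_name(name: str) -> str:
    """Lowercase and collapse whitespace so 'Red  Onion' and 'red onion' match."""
    return " ".join(name.lower().split())


# Hypothetical helper: assumes each detection looks like
# {"name": "Eggs", "quantity": 6, "unit": "count"}.
def merge_detections(pantry: dict[tuple[str, str], PantryItem],
                     detections: list[dict]) -> dict[tuple[str, str], PantryItem]:
    """Return a new pantry with detections added, summing quantities per (name, unit)."""
    merged = dict(pantry)  # shallow copy so the caller's pantry is not modified
    for det in detections:
        key = (normalize_name(det["name"]), det.get("unit", "count"))
        qty = float(det.get("quantity", 1))
        if key in merged:
            old = merged[key]
            merged[key] = PantryItem(old.name, old.quantity + qty, old.unit)
        else:
            merged[key] = PantryItem(key[0], qty, key[1])
    return merged
```

Keying on both name and unit avoids silently adding grams to counts; converting between units would be a separate step.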

How we built it

We built PlateIt as a native Android application (Java) backed by a robust FastAPI (Python) server.

  • The AI Core: We used Google Gemini 3 both for high-speed video and text processing and for its vision capabilities in pantry scanning and cooking analysis.
  • Agent Orchestration: We used LangGraph to build a complex, stateful agent workflow. This allows our backend to intelligently route tasks—deciding whether to scrape a website, download a video using yt-dlp, or perform an OCR scan on an image.
  • Data & Search: We integrated Spoonacular API for structured food data and nutrient information, and SerpApi to fetch trending food blogs and YouTube videos dynamically based on user preferences.
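
The routing decision described under Agent Orchestration can be sketched as a plain Python function. The node names (`download_video`, `scrape_website`, `scan_image`) and the state keys `url` and `image_path` are hypothetical, not PlateIt's actual graph. In LangGraph, a function of this shape could serve as the routing callable passed to `StateGraph.add_conditional_edges`, since it takes the current state and returns the name of the next node.

```python
from urllib.parse import urlparse

# Hypothetical host list covering the sources named in this writeup.
VIDEO_HOSTS = {
    "youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be",
    "instagram.com", "www.instagram.com",
}
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp", ".heic")


def route_input(state: dict) -> str:
    """Pick the next (hypothetical) node for a user submission."""
    if state.get("image_path"):
        return "scan_image"          # uploaded photo: vision / OCR path
    parsed = urlparse(state.get("url", ""))
    host = (parsed.hostname or "").lower()
    if host in VIDEO_HOSTS:
        return "download_video"      # e.g. fetch with yt-dlp, then send to Gemini
    if parsed.path.lower().endswith(IMAGE_EXTENSIONS):
        return "scan_image"          # direct link to an image file
    return "scrape_website"          # default: treat it as a recipe blog
```

Keeping the router deterministic and cheap means the model is only called once the right tool has been chosen.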

The LangGraph architecture is visualized in agent_workflow.png, located in BackEnd/Agent/.

Challenges we ran into

  • Video Processing: Handling the wide range of video formats (Shorts vs. long-form, different codecs) was tricky. We had to implement a robust pipeline that downloads the video, optimizes it, and then feeds it to Gemini for multimodal analysis without hitting timeout limits.
  • Agent State Management: Managing the state for a multi-step cooking conversation—remembering context from "Step 3" while answering a question about "Step 1"—required careful design of our LangGraph nodes.
  • Pantry Accuracy: Distinguishing between a "dish" (cooked food) and "ingredients" (raw food) from a single user upload required careful prompt tuning so the AI would route the request to the correct processing logic.
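
One way to keep earlier steps in view during a cooking conversation is to store the whole instruction list in the agent state and pull in any step the user mentions. The sketch below is an assumption about how such state could look; `CookingState` and `build_context` are hypothetical names, and the "last four turns" window is an arbitrary choice.

```python
import re
from typing import TypedDict


class CookingState(TypedDict):
    steps: list[str]                 # full instruction list for the recipe
    current_step: int                # 1-based index of the step the user is on
    history: list[tuple[str, str]]   # (role, text) turns of the conversation


# Hypothetical helper: builds the text context sent to the model for one question.
def build_context(state: CookingState, question: str) -> str:
    """Include the current step plus any step number the question mentions."""
    referenced = {int(n) for n in re.findall(r"\bstep\s+(\d+)", question, re.IGNORECASE)}
    referenced.add(state["current_step"])
    lines = [
        f"Step {n}: {state['steps'][n - 1]}"
        for n in sorted(referenced)
        if 1 <= n <= len(state["steps"])  # ignore out-of-range step numbers
    ]
    recent = state["history"][-4:]  # keep only the last few turns to bound prompt size
    lines += [f"{role}: {text}" for role, text in recent]
    lines.append(f"user: {question}")
    return "\n".join(lines)
```

A user on Step 3 asking "How long was step 1 again?" gets a prompt containing both Step 1 and Step 3, without resending every step.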

Accomplishments that we're proud of

  • Seamless Multimodality: Successfully implementing a flow where a user can snap a picture of their stove, send it to the backend, have Gemini analyze it, and get a text-to-speech response in seconds.
  • The "Watch to Cook" Pipeline: Taking a raw YouTube Short URL and converting it into a clean, interactive UI with ingredients and timers feels magical every time we use it.
  • Complex Agentic Workflow: Building a modular backend where the agent effectively "thinks" about the best tool to use (Search vs. Vision vs. Knowledge Base) rather than just following a hardcoded script.
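
Turning extracted steps into timers requires finding durations in the instruction text. A simple approach, assumed here rather than taken from PlateIt, is a regular expression over phrases such as "10 minutes" or "2-3 mins". `extract_timers` is a hypothetical helper, and taking the upper bound of a range is an arbitrary design choice.

```python
import re

# Matches "10 minutes", "1 hour", "30 secs", "2-3 mins", "2 to 3 minutes".
DURATION_RE = re.compile(
    r"(\d+)(?:\s*(?:-|to)\s*(\d+))?\s*(hours?|hrs?|minutes?|mins?|seconds?|secs?)\b",
    re.IGNORECASE,
)
SECONDS_PER_UNIT = {"h": 3600, "m": 60, "s": 1}  # keyed by the unit's first letter


# Hypothetical helper for the "Watch to Cook" timers.
def extract_timers(step: str) -> list[int]:
    """Return each duration mentioned in a step, in seconds (ranges use the upper bound)."""
    timers = []
    for low, high, unit in DURATION_RE.findall(step):
        amount = int(high or low)
        timers.append(amount * SECONDS_PER_UNIT[unit[0].lower()])
    return timers
```

Numbers not followed by a time unit, such as an oven temperature, are ignored.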

What we learned

  • Gemini's Vision capabilities are production-ready: The speed and accuracy of Gemini 3 in identifying obscure ingredients from blurry fridge photos exceeded our expectations.
  • Structured Outputs are Key: Getting an LLM to output "vibes" is easy; getting it to output strict JSON that leads to a crash-free Android app requires rigorous prompt engineering and validation layers.
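
A validation layer of the kind described above could sit between the model and the Android client. This sketch uses only the standard `json` module. The required fields, the `RecipeValidationError` class, and `parse_recipe` are illustrative assumptions, not PlateIt's actual schema.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built this way so it does not clash with this block

# Hypothetical schema: PlateIt's real required fields are not documented here.
REQUIRED_FIELDS = {"title": str, "ingredients": list, "steps": list}


class RecipeValidationError(ValueError):
    """Raised when model output cannot be turned into a usable recipe."""


def parse_recipe(raw: str) -> dict:
    """Parse raw model output into a recipe dict, rejecting anything malformed."""
    text = raw.strip()
    # Models often wrap JSON in a Markdown code fence; strip it if present.
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise RecipeValidationError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise RecipeValidationError("top-level value must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise RecipeValidationError(f"missing or mistyped field: {field}")
    for ingredient in data["ingredients"]:
        if not isinstance(ingredient, dict) or not isinstance(ingredient.get("name"), str):
            raise RecipeValidationError("every ingredient needs a string 'name'")
    if not data["steps"] or not all(isinstance(s, str) and s.strip() for s in data["steps"]):
        raise RecipeValidationError("steps must be a non-empty list of non-empty strings")
    return data
```

Because anything that reaches the client has passed these checks, the backend can catch `RecipeValidationError` and retry the model call instead of shipping a payload that might crash the app.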

What's next for PlateIt

  • Social Cooking: Sharing your "Cooked It" results and tweaked recipes with friends.
  • Dietary Architect: Adding a global filter (Vegan, Keto, GF) that automatically modifies every imported recipe to fit your needs before you even see it.
  • Instacart/Amazon Integration: One-tap ordering for the missing items in your generated grocery list.
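
Ordering "the missing items" first requires comparing a recipe against the pantry. A minimal sketch, assuming recipe ingredients carry `name`, `amount`, and `unit` keys and that pantry amounts are already in the same unit as the recipe's; `missing_items` is a hypothetical helper.

```python
# Hypothetical helper: assumes matching units between recipe and pantry.
def missing_items(recipe: list[dict], pantry: dict[str, float]) -> list[dict]:
    """Return ingredients the pantry lacks or has too little of, with the shortfall."""
    shortfall = []
    for ing in recipe:
        name = " ".join(ing["name"].lower().split())  # same normalization as pantry keys
        needed = float(ing.get("amount", 0))
        unit = ing.get("unit", "")
        if name not in pantry:
            # Absent entirely: list it even if no amount was given (e.g. "salt to taste").
            shortfall.append({"name": name, "amount": needed, "unit": unit})
        elif pantry[name] < needed:
            shortfall.append({"name": name, "amount": needed - pantry[name], "unit": unit})
    return shortfall
```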
