Inspiration
We've all had that moment: standing in front of a fridge full of random ingredients, unsure of what to make. In an era defined by short-form, visual content on platforms like TikTok and Instagram Reels, cooking inspiration is still largely confined to static, text-based blogs. We wanted to bridge that gap by reimagining how people discover and create food. The question that inspired us was simple: What if you could just take a photo of your ingredients and instantly get a cinematic, TikTok-style cooking video showing you what to make? From that idea, we set out to combine the power of multimodal AI with an everyday challenge, transforming recipe discovery into a complete, end-to-end culinary experience.
What it does
foogle is an AI-powered culinary assistant that turns a single photo of your ingredients into a fully immersive cooking experience. When a user uploads an image of their fridge or counter, the app analyzes it to identify edible ingredients and then generates five unique, easy-to-make recipes. Each recipe comes with a title, cooking time, and detailed step-by-step instructions.
foogle then automatically creates short, cinematic cooking videos with Google’s Veo 3.1, each 10 seconds long and in a vertical 9:16 format for mobile viewing. Each video includes a realistic AI voiceover generated by ElevenLabs, narrating a script written by Gemini.
At the heart of the app is the Intelligent Cooking Assistant Agent. Unlike a traditional chatbot, this agent is an autonomous system capable of reasoning and taking actions through six specialized tools. It can generate categorized shopping lists, find substitutions for allergies or dietary needs, modify recipes to adjust servings or make them vegan/gluten-free, create meal plans, offer cooking tips, and estimate nutritional information. Together, these features make foogle not just a recipe generator, but a true personal cooking companion.
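As a rough illustration of how an agent's tool choice can be routed to code, here is a minimal sketch of a tool registry and dispatcher. The tool identifiers, argument names, and handler bodies are hypothetical placeholders chosen to mirror the six capabilities listed above; they are not foogle's actual tool definitions, and the model call that produces the tool request is left out.

```typescript
// Hypothetical identifiers for the six capabilities described above.
type ToolName =
  | "shopping_list"
  | "substitutions"
  | "modify_recipe"
  | "meal_plan"
  | "cooking_tips"
  | "nutrition";

// Shape of a tool request as the model might emit it (assumed, not the SDK's type).
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type ToolHandler = (args: Record<string, unknown>) => string;

// Placeholder handlers: a real app would build lists, plans, and estimates here.
const tools: Record<ToolName, ToolHandler> = {
  shopping_list: (args) => `Shopping list for ${String(args["recipe"])}`,
  substitutions: (args) => `Substitutes for ${String(args["ingredient"])}`,
  modify_recipe: (args) => `Modified ${String(args["recipe"])}`,
  meal_plan: (args) => `Meal plan for ${String(args["days"])} days`,
  cooking_tips: (args) => `Tips for ${String(args["recipe"])}`,
  nutrition: (args) => `Nutrition estimate for ${String(args["recipe"])}`,
};

// Guard against tool names the model invents or misspells.
function isToolName(name: string): name is ToolName {
  return Object.prototype.hasOwnProperty.call(tools, name);
}

// Route a tool request to its handler, or report an unknown tool.
function dispatch(call: ToolCall): string {
  if (!isToolName(call.name)) {
    return `Unknown tool: ${call.name}`;
  }
  return tools[call.name](call.args);
}
```

Keeping the registry keyed by a closed union type means the compiler flags a missing handler, while the runtime guard covers tool names that only show up in model output.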
How we built it
foogle is built on a modern, AI-first stack using React, TypeScript, and Vite, styled with Tailwind CSS for a clean, responsive experience.
The backend logic runs on the Google GenAI SDK (@google/genai) and chains multiple models together for multimodal reasoning. Gemini 2.5 Flash serves as the system’s central brain: it handles image analysis, generates structured recipes, and powers the Cooking Assistant Agent, autonomously deciding which tools to execute. Veo 3.1 converts Gemini’s text-based storyboards into cinematic 720p vertical cooking videos. Finally, ElevenLabs takes Gemini’s voiceover scripts and generates natural, high-quality audio narrations to complete the experience.
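The chain described above can be sketched as a sequence of injected stages. Everything named here (the `Stages` interface, its method names, the `Recipe` fields) is an assumption made for illustration; the real calls to Gemini, Veo, and the voice service are deliberately left behind the interface rather than guessed at.

```typescript
interface Recipe {
  title: string;
  cookingTimeMinutes: number;
  steps: string[];
}

interface RecipeMedia {
  recipe: Recipe;
  videoUrl: string;
  audioUrl: string;
}

// Hypothetical stage boundaries; each would wrap one model or service call.
interface Stages {
  identifyIngredients(imageBase64: string): Promise<string[]>;
  generateRecipes(ingredients: string[]): Promise<Recipe[]>;
  writeStoryboard(recipe: Recipe): Promise<string>;
  writeVoiceover(recipe: Recipe): Promise<string>;
  renderVideo(storyboard: string): Promise<string>;
  synthesizeAudio(script: string): Promise<string>;
}

// Photo -> ingredients -> recipes -> (storyboard, script) -> (video, audio).
async function photoToVideo(
  imageBase64: string,
  stages: Stages,
): Promise<RecipeMedia[]> {
  const ingredients = await stages.identifyIngredients(imageBase64);
  const recipes = await stages.generateRecipes(ingredients);

  return Promise.all(
    recipes.map(async (recipe) => {
      // The two text artifacts are independent, so they can run concurrently.
      const [storyboard, script] = await Promise.all([
        stages.writeStoryboard(recipe),
        stages.writeVoiceover(recipe),
      ]);
      // Likewise for the two media renders.
      const [videoUrl, audioUrl] = await Promise.all([
        stages.renderVideo(storyboard),
        stages.synthesizeAudio(script),
      ]);
      return { recipe, videoUrl, audioUrl };
    }),
  );
}
```

Injecting the stages keeps the orchestration testable with fakes and makes it easy to swap a single model without touching the rest of the chain.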
Challenges we ran into
Our biggest challenge was handling video generation quotas with the Veo API. Video generation is resource-intensive, and we often hit daily limits. To solve this, we implemented robust error handling and graceful fallbacks so the app still functions fully even when videos can’t be generated.
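One way to express this kind of graceful fallback is to turn a failed video request into a value the UI can render, rather than letting the error propagate. The sketch below assumes a hypothetical `generate` function that rejects when the quota is exhausted; the result type and the card text are illustrative, not taken from foogle's code.

```typescript
// Either a playable video or a reason the UI should fall back to text.
type VideoResult =
  | { kind: "video"; url: string }
  | { kind: "fallback"; reason: string };

// Wrap any video generator so a failure (quota or otherwise) becomes data.
async function generateVideoOrFallback(
  generate: () => Promise<string>,
): Promise<VideoResult> {
  try {
    const url = await generate();
    return { kind: "video", url };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { kind: "fallback", reason };
  }
}

// What a recipe card might show in each case (placeholder wording).
function describeCard(title: string, result: VideoResult): string {
  return result.kind === "video"
    ? `${title}: play ${result.url}`
    : `${title}: recipe steps only (${result.reason})`;
}
```

Because the fallback is an ordinary value, the rest of the app never needs a try/catch around video state, and the recipe itself is always available.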
We also had to manage long-running async operations, since generating videos can take several minutes. We built a polling system that tracks progress, updates users in real time, and keeps them engaged with rotating fun facts and cooking tips.
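A polling loop of this kind might look like the following sketch. The `JobStatus` shape, the `check` callback, and the sample tips are assumptions for illustration; in a real app `check` would query the video job, and `onTick` would update the progress UI.

```typescript
type JobStatus =
  | { state: "running"; progress: number }
  | { state: "done"; url: string }
  | { state: "failed"; message: string };

interface PollOptions {
  intervalMs: number;
  maxAttempts: number;
  // Injectable so tests do not have to wait on real timers.
  sleep?: (ms: number) => Promise<void>;
  // Called on every poll with the latest status and a rotating tip.
  onTick?: (status: JobStatus, tip: string) => void;
}

// Placeholder content shown while the user waits.
const TIPS: string[] = [
  "Salt pasta water generously.",
  "Let meat rest before slicing.",
];

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the job finishes, fails, or the attempt budget runs out.
async function pollUntilDone(
  check: () => Promise<JobStatus>,
  opts: PollOptions,
): Promise<JobStatus> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    const status = await check();
    opts.onTick?.(status, TIPS[attempt % TIPS.length]);
    if (status.state !== "running") {
      return status;
    }
    await sleep(opts.intervalMs);
  }
  return { state: "failed", message: "Timed out waiting for video" };
}
```

Capping the number of attempts guarantees the loop ends even if the job never reports completion, so the UI can move to its fallback instead of spinning indefinitely.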
Finally, agentic reliability took extensive prompt engineering and system tuning: the assistant had to consistently select the correct tools and use the right context. With iteration, we achieved reliable, context-aware behavior that feels natural and autonomous.
Accomplishments that we're proud of
We’re proud to have built a complete “photo-to-video” pipeline that transforms a single image into a narrated cooking video. This seamless chain of vision, reasoning, video, and audio models represents a fully multimodal AI experience.
We’re also proud of developing a truly agentic system. The Intelligent Cooking Assistant goes beyond simple chat by reasoning, planning, and executing multi-step tasks autonomously.
To ensure great UX, we designed graceful fallback behavior, so the app works beautifully whether videos are generated or not. We also built a Test Mode that skips video generation for instant development feedback, which saved us hours during iteration and made testing far more efficient.
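A Test Mode like this can be as small as one flag consulted before any video request is made. The sketch below is a guess at the shape, with a hypothetical `AppConfig` and `planVideo` helper; it is not foogle's actual implementation.

```typescript
interface AppConfig {
  // When true, no video request is issued at all.
  testMode: boolean;
}

type VideoPlan =
  | { kind: "generate"; storyboard: string }
  | { kind: "skip"; placeholder: string };

// Decide up front whether a storyboard should be sent for rendering.
function planVideo(storyboard: string, config: AppConfig): VideoPlan {
  if (config.testMode) {
    return { kind: "skip", placeholder: "Video skipped (test mode)" };
  }
  return { kind: "generate", storyboard };
}
```

Since the skipped case resolves immediately, the rest of the flow (recipes, assistant, UI states) can be exercised without spending video quota or waiting minutes per run.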
What we learned
We learned that multimodal AI achieves its full potential when models are chained, not isolated. Gemini for reasoning, Veo for video, and ElevenLabs for audio together created something more powerful than any single model could.
Prompt engineering proved to be the backbone of reliability: the clarity of our JSON schemas and system prompts directly determined the quality of our recipes, storyboards, and tool use. Finally, we learned how critical it is to manage user expectations during long-running AI processes: consistent feedback, progress indicators, and fun visuals kept users engaged and satisfied.
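To illustrate why a clear schema matters, here is a minimal sketch of validating a model's JSON output before it reaches the UI. The `Recipe` field names are assumptions based on the description of a title, cooking time, and step-by-step instructions; the real schema may differ.

```typescript
interface Recipe {
  title: string;
  cookingTimeMinutes: number;
  steps: string[];
}

// Runtime check that an unknown value matches the assumed recipe shape.
function isRecipe(value: unknown): value is Recipe {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const r = value as Record<string, unknown>;
  const steps = r["steps"];
  return (
    typeof r["title"] === "string" &&
    typeof r["cookingTimeMinutes"] === "number" &&
    Array.isArray(steps) &&
    steps.length > 0 &&
    steps.every((s) => typeof s === "string")
  );
}

// Parse model output, keeping only entries that satisfy the schema.
// Malformed JSON or a non-array payload yields an empty list.
function parseRecipes(jsonText: string): Recipe[] {
  let data: unknown;
  try {
    data = JSON.parse(jsonText);
  } catch {
    return [];
  }
  if (!Array.isArray(data)) {
    return [];
  }
  return data.filter(isRecipe);
}
```

For example, `parseRecipes('[{"title":"Omelette","cookingTimeMinutes":10,"steps":["Whisk","Cook"]},{"title":"Bad"}]')` keeps the first entry and drops the second, which lacks a cooking time and steps.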
What's next for foogle
Looking ahead, we plan to expand foogle’s capabilities and personalization even further. We’re developing a grocery delivery integration, allowing the agent to connect with APIs like Instacart or Kroger to automatically order missing ingredients.
Next, we’re building personalized recipe recommendations using ingredient tags and user likes. As users engage with videos and save favorites, foogle will automatically curate their home feed to show new recipes and videos that match their tastes, preferred cuisines, and frequently used ingredients.
As video generation models become faster and quotas expand, we aim to generate full-length videos covering every step of a recipe. Finally, we plan to introduce user accounts and dietary profiles so the app can learn from preferences and create increasingly personalized, intelligent culinary experiences.
Built With
- elevenlabs
- gemini
- google-cloud
- javascript
- node.js
- openai
- react
- tailwind-css
- tsx
- typescript
- veo
- vite
