Inspiration
The spark for OmniChef didn't come from a tech lab; it came from my kitchen in Mauritania. Every day, I watched my mother carry the heavy mental load of deciding what to cook. But it wasn't just about finding any meal: our family has strict health requirements because my father suffers from hypertension.
Mauritania and the broader MENA region are rich in highly nutritious traditional ingredients, but the knowledge of how to use them for modern, health-conscious diets is often lost. Furthermore, many existing tech solutions demand high digital literacy. I realized we needed a solution as natural as talking to a friend. I built OmniChef to bridge this gap: a conversational AI layer so that anyone can access personalized, healthy, and culturally authentic culinary guidance simply by speaking in their native language.
What it does
OmniChef is an AI-powered, completely hands-free culinary assistant. It removes the friction of modern cooking through two core multimodal features:
Visual Ingredient Analysis: Using your phone's camera, OmniChef identifies exactly what ingredients you have on your counter.
Real-Time Voice Guidance: Powered by Google Gemini Live, it converses with you naturally. You don't need to touch your screen with messy hands. You just ask, "What can I make with these that is safe for my dad's high blood pressure?"
OmniChef instantly cross-references your ingredients against our curated database of 100+ authentic Mauritanian and MENA recipes, guiding you step-by-step through the cooking process.
How we built it
We engineered OmniChef for real-time performance, extremely low latency, and cross-platform scale:
Frontend: Built with Flutter (Dart 3.0) for seamless deployment across Web, Android, and Windows from a single codebase. We implemented custom audio capture (record and mp_audio_stream) and a bidirectional web_socket_channel connection for instant AI communication.
Backend & Database: A stateless Python/FastAPI architecture deployed on Google Cloud Run via Docker. It connects asynchronously via asyncpg to a PostgreSQL database housing our recipes, nutritional info, and user data. The API is cleanly organized into specific routing endpoints (e.g., /recipes, /voice, /vision) to handle real-time multimodal streams.
The AI Layer: We built the entire application on the cutting-edge Gemini 2.5 family: Gemini 2.5 Flash for instantaneous text and vision processing, and Gemini Live 2.5 Flash Native Audio (via Vertex AI) for bidirectional voice streaming at 16 kHz in and 24 kHz out.
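A minimal sketch of the Live hand-off, assuming the google-genai SDK surface for Vertex AI; the model id is a placeholder and the sequential send-then-receive flow is a simplification of the concurrent pipeline we actually run:

```python
# Hypothetical sketch: stream 16 kHz PCM up to the Live API and yield the
# model's 24 kHz PCM replies back to the client.
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="my-gcp-project", location="us-central1")

async def relay_audio(pcm_16k_chunks):
    """Send user audio up, yield the model's spoken replies back."""
    config = types.LiveConnectConfig(response_modalities=["AUDIO"])
    async with client.aio.live.connect(
        model="gemini-live-2.5-flash-native-audio",  # placeholder model id
        config=config,
    ) as session:
        async for chunk in pcm_16k_chunks:
            await session.send_realtime_input(
                audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
            )
        # In production, sending and receiving run concurrently.
        async for message in session.receive():
            if message.data:  # inline 24 kHz PCM from the model
                yield message.data
```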
Zero Hallucinations: We built a custom Function Calling architecture with 8+ registered tools:

```python
# Example of our tool registry forcing database queries
tools = ["find_recipe", "get_recipe_details", "set_timer", "advance_cooking_step"]
```

The AI is structurally required to query our database before speaking, making hallucinated recipes impossible.
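To illustrate what "structurally required" means in practice, here is a rough, hypothetical sketch of the dispatch side: a tool call from the model resolves to a real asyncpg query before anything is spoken. The schema, column names, and filters shown are illustrative, not our exact implementation:

```python
# Hypothetical tool dispatcher: every recipe-related tool resolves to a real
# database query, so the model can only speak about rows that actually exist.
import asyncpg

async def find_recipe(pool: asyncpg.Pool, *, ingredients: list[str], max_sodium_mg: int | None = None):
    # Illustrative schema: recipes with a text[] ingredients column and sodium info.
    query = """
        SELECT id, name, sodium_mg
        FROM recipes
        WHERE ingredients && $1::text[]                 -- uses at least one detected ingredient
          AND ($2::int IS NULL OR sodium_mg <= $2)      -- hypertension-friendly filter
        LIMIT 5
    """
    rows = await pool.fetch(query, ingredients, max_sodium_mg)
    return [dict(r) for r in rows]

# Registry maps the tool names exposed to Gemini onto these handlers.
TOOL_HANDLERS = {
    "find_recipe": find_recipe,
    # "get_recipe_details", "set_timer", ... (remaining tools omitted)
}

async def execute_tool_call(pool: asyncpg.Pool, name: str, args: dict):
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Unknown tool: {name}")
    # The tool result, not free-form model text, becomes the grounding
    # context for the spoken answer.
    return await handler(pool, **args)
```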
Challenges we ran into
Brutal Web Audio Latency: Capturing raw PCM audio in a browser, streaming it via WebSockets to Cloud Run, routing it to Vertex AI, and piping it back with sub-3-second latency was a massive hurdle. We had to build custom voice activity detection (VAD) and audio resampling pipelines from scratch.
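For a flavor of what "from scratch" involved, here is a simplified, hypothetical energy-based VAD over 16-bit PCM frames; the thresholds and frame sizes are placeholders rather than the values we shipped:

```python
# Hypothetical sketch: energy-based voice activity detection on 16-bit PCM.
import numpy as np

FRAME_MS = 20
SAMPLE_RATE = 16_000
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000
RMS_THRESHOLD = 500.0  # placeholder; tuned empirically in practice

def is_speech(frame: bytes) -> bool:
    """Return True if a 20 ms PCM16 frame looks like speech."""
    samples = np.frombuffer(frame, dtype=np.int16).astype(np.float32)
    if samples.size == 0:
        return False
    rms = float(np.sqrt(np.mean(samples ** 2)))
    return rms > RMS_THRESHOLD

def split_utterances(pcm: bytes, hangover_frames: int = 15):
    """Group consecutive speech frames into utterances, tolerating short pauses."""
    frame_bytes = SAMPLES_PER_FRAME * 2  # 2 bytes per 16-bit sample
    buffer, silence = bytearray(), 0
    for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        frame = pcm[i : i + frame_bytes]
        if is_speech(frame):
            buffer.extend(frame)
            silence = 0
        elif buffer:
            silence += 1
            if silence > hangover_frames:
                yield bytes(buffer)
                buffer, silence = bytearray(), 0
    if buffer:
        yield bytes(buffer)
```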
Cultural & Linguistic Nuance: OmniChef seamlessly auto-detects and switches between English, French, Arabic, and Darija mid-conversation. Tuning the system prompts to maintain dialect sensitivity and accurate cultural substitutions pushed the limits of the model.
Barge-in Recovery: Humans interrupt each other constantly. We engineered custom state-recovery logic so users can cut the AI off mid-sentence without losing their active timer, step position, or recipe context.
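Conceptually, it looks something like this hypothetical sketch: the session state lives outside the speaking task, so cancelling playback on an interruption never touches timers or step position (the names and the audio_out interface are illustrative):

```python
# Hypothetical sketch: session state survives barge-in because only the
# playback task is cancelled, never the state object.
import asyncio
from dataclasses import dataclass, field

@dataclass
class CookingSession:
    recipe_id: int
    step_index: int = 0
    timers: dict[str, float] = field(default_factory=dict)  # timer name -> deadline (epoch seconds)

async def speak(audio_out, pcm: bytes):
    """Play model audio; may be cancelled at any moment by a barge-in."""
    await audio_out.play(pcm)  # audio_out is an assumed playback interface

async def handle_barge_in(session: CookingSession, speaking_task: asyncio.Task) -> CookingSession:
    # Stop the assistant mid-sentence...
    speaking_task.cancel()
    try:
        await speaking_task
    except asyncio.CancelledError:
        pass
    # ...but recipe_id, step_index, and timers are untouched, so the next
    # user turn resumes exactly where the cook left off.
    return session
```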
Infrastructure Constraints: Cloud Run's default concurrency model isn't designed for long-lived WebSocket streaming sessions. We had to restructure the connection lifecycle and engineer GCP credential bootstrapping at container startup to keep Vertex AI auth seamless.
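One piece of that, sketched here under assumptions (the environment variable name and reliance on standard ADC discovery are illustrative, not our exact configuration): a FastAPI lifespan hook that materializes a service-account key at container startup so Vertex AI auth is ready before the first long-lived WebSocket session arrives:

```python
# Hypothetical sketch: bootstrap GCP credentials when the Cloud Run
# container starts, before any WebSocket session needs Vertex AI.
import os
import tempfile
from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    sa_json = os.environ.get("SERVICE_ACCOUNT_JSON")  # illustrative variable name
    if sa_json and "GOOGLE_APPLICATION_CREDENTIALS" not in os.environ:
        # Write the key to a temp file so Application Default Credentials can find it.
        with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
            f.write(sa_json)
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = f.name
    yield  # the app serves long-lived WebSocket sessions from here on

app = FastAPI(lifespan=lifespan)
```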
Accomplishments that we're proud of
We are incredibly proud to have built an app that solves a highly technical problem (real-time multimodal AI streaming) while remaining deeply human and accessible. By successfully engineering a voice-first pipeline, we removed the "screen barrier," allowing anyone to cook hands-free. Preserving our cultural culinary heritage while adapting it for modern health needs is a massive win for our team.
What we learned
We learned that the future of UI is no UI. Voice and Vision combined create an entirely new paradigm of human-computer interaction. We also deepened our expertise in Google Cloud architecture, specifically learning how to optimize Vertex AI endpoints and WebSockets to handle continuous, real-time audio streams without bottlenecking.
What's next for OmniChef
This is just the beginning. Our future roadmap includes:
Local Chef Collaborations: Partnering with Mauritanian and regional MENA chefs to digitize their signature dishes via a revenue-sharing model.
E-Commerce Integration: Connecting our ingredient recognition directly to local grocery delivery APIs to automatically order what you are missing.
Global Cuisine Expansion: Scaling our curated dataset methodology to preserve and adapt indigenous cuisines worldwide, ensuring healthy, accessible cooking for every culture.
Built With
- dart
- docker
- fastapi
- flutter
- google-cloud
- google-cloud-run
- python
- sqlalchemy
- vertex-ai
- websockets