Inspiration
Every home cook knows the frustration: hands covered in flour, a sauce reducing on the stove, and the recipe you need is locked behind a screen you can't touch. Traditional cooking apps demand constant visual attention and manual interaction — exactly what you can't give when you're actually cooking.
We asked ourselves: what if your kitchen had an AI sous-chef that could listen to you, see what you're doing, know what's in your pantry, respect your allergies, and guide you step by step — all without you ever touching your phone?
That question became Chefeze — a real-time, voice-first cooking copilot powered by Gemini Live and Google ADK. The name blends "chef" with "eze" (ease), capturing the mission: make cooking with AI genuinely effortless.
Three things drove the design from day one:
- Hands-free is non-negotiable. A cooking assistant you have to type into isn't solving the real problem.
- Grounded, not hallucinated. Recipe suggestions must come from real data — actual pantry contents, verified allergen databases, real ingredient prices — not fabricated convenience.
- Trust through transparency. When the agent warns you about cross-contamination or tells you a dish is over budget, it must show its evidence, not just assert confidence.
What it does
Chefeze is a multimodal live cooking agent that operates through voice and vision in real time:
- Talk naturally — Push-to-Talk or fully hands-free Cook Mode with barge-in (interrupt the AI mid-sentence when your timer goes off).
- Show your kitchen — Point your camera at your pantry for instant ingredient extraction, or enable Cook Mode's continuous camera stream where the AI monitors your cooking for safety hazards (dangerous temperatures, cross-contamination, knife safety) at up to 2 frames per second.
- Get grounded answers — Every recipe suggestion passes through a 5-step hybrid GraphRAG pipeline: vector search (pgvector HNSW), full-text search (tsvector + trigram), knowledge graph expansion (ltree, 1-2 hops), cross-retriever fusion (Reciprocal Rank Fusion, \(k=60\)), and a confidence gate (\(\theta = 0.15\)) — all running on PostgreSQL 17.
- Stay safe — Real-time allergen detection across 14 EU allergen categories using Open Food Facts data, with STOP/WARN/TIP alert levels. A dedicated food safety sentinel checks internal temperatures, danger zone time limits, and cross-contamination risks.
- Play the Budget Challenge — Set a budget and number of people, and the AI gamifies meal planning with real ingredient prices from the Open Prices API. Earn badges like `under_budget`, `zero_waste`, or `pantry_master`.
- See your food before you cook it — Gemini generates photorealistic hero images and illustrated step cards, uploaded to Cloud Storage and displayed inline.
- Save, remix, and fork recipes — Build a personal cookbook. Remix any recipe into a variant (e.g., "make it vegetarian") with visible lineage and a "View original" link.
- Cook in your language — Full 4-language support (English, Portuguese BR, Portuguese PT, Spanish) across voice, UI, units, and currency — with 775 i18n keys per locale.
How we built it
Architecture
Chefeze is a monorepo with three main layers:
```
┌─────────────────────────────────────────────────┐
│ PWA / Android (Ionic React + Capacitor + Vite)  │
│ Audio capture (16kHz PCM) · Camera frames       │
│ 14-type card renderer · Audio visualizer orb    │
└───────────────────┬─────────────────────────────┘
                    │ WebSocket
┌───────────────────▼─────────────────────────────┐
│ FastAPI WebSocket Gateway (state machine)       │
│ Resume tokens · Sequence validation · Barge-in  │
│ Session compression at 70% context capacity     │
└───────────────────┬─────────────────────────────┘
                    │ ADK streaming bridge
┌───────────────────▼─────────────────────────────┐
│ Chef Agent (Google ADK Hub-Specialist Pattern)  │
│                                                 │
│ ┌──────────┐ ┌──────────┐ ┌──────────────────┐  │
│ │ Safety   │ │ Budget   │ │ Creative         │  │
│ │Specialist│ │Specialist│ │Specialist        │  │
│ └──────────┘ └──────────┘ └──────────────────┘  │
│ ┌──────────┐ ┌──────────┐                       │
│ │Retrieval │ │ Game     │ 9 ADK Tools           │
│ │Specialist│ │ Master   │ 55 Cuisine Skills     │
│ └──────────┘ └──────────┘                       │
└───────────────────┬─────────────────────────────┘
                    │
┌───────────────────▼─────────────────────────────┐
│ Data Layer                                      │
│ PostgreSQL 17 (pgvector + ltree + pg_trgm)      │
│ Redis 7.4 (sessions, rate limits, flags)        │
│ Google Cloud Storage (images)                   │
│ MCP: Open Food Facts · Open Prices              │
└─────────────────────────────────────────────────┘
```
Multi-Agent Orchestration
The core of Chefeze is a hub-specialist agent pattern built on Google ADK. A coordinator agent receives every user turn and routes it to the right specialist using confidence-scored intent detection:
| Specialist | Responsibility | Key Tools |
|---|---|---|
| Safety | Allergen checks, food safety, cross-contamination | `check_allergens`, `food_safety_sentinel` |
| Budget | Cost estimation, budget challenge scoring | `estimate_cost` (MCP Open Prices) |
| Creative | Recipe composition, image generation | `compose_recipe`, `generate_images` |
| Retrieval | Knowledge base search, recipe discovery | `retrieve_recipes` (GraphRAG) |
| Game Master | Challenge gamification, badge scoring | `score_challenge`, `ui_action_plan` |
Intent routing uses multi-signal confidence scoring with ambiguity detection (threshold 0.35, gap 0.10). Safety always gets priority weight \(1.5\times\) — because catching a peanut allergy matters more than suggesting a garnish.
The GraphRAG Pipeline
Recipe retrieval is a 5-step hybrid pipeline, not a single vector lookup:
- Query Normalization — Keyword-based intent extraction (diet, allergens, equipment, budget, cuisine) across 4 languages. No LLM call needed.
- Embedding — `gemini-embedding-2-preview` (3072-dim, multimodal). Content-hash cache prevents redundant API calls.
- Vector Retrieval — HNSW cosine search over pgvector `halfvec(3072)`, top-40 candidates, similarity threshold \(\geq 0.30\).
- Lexical Retrieval — Dual-path: PostgreSQL `tsvector` full-text search + `pg_trgm` trigram similarity (\(> 0.3\)), both `unaccent`-normalized for diacritics. Run in parallel, union-deduplicated.
- Graph Expansion — 1-2 hop traversal of an ontology graph (ingredients, substitutions, cuisines, techniques, allergen flags, diet tags) using `ltree` paths.
- Scoring Fusion — Reciprocal Rank Fusion (\(k=60\)) across retrieval channels, then:
$$ \text{score} = 0.40 \cdot s_{\text{vec}} + 0.15 \cdot s_{\text{lex}} + 0.20 \cdot c_{\text{pantry}} - 0.10 \cdot p_{\text{cost}} - 0.05 \cdot p_{\text{equip}} + 0.10 \cdot b_{\text{personal}} $$
Allergen-matched recipes are hard-blocked (\(\text{score} = -\infty\)). A confidence gate at \(\theta = 0.15\) filters noise.
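The fusion step can be sketched in a few lines (simplified — the real pipeline runs inside PostgreSQL; only the \(k=60\) constant, the formula weights, and the hard-block semantics come from the system, the function names are ours):

```python
# Hedged sketch of RRF fusion plus the weighted scoring formula.
# Reciprocal Rank Fusion merges ranked lists from the vector, lexical,
# and graph channels; the final score adds pantry coverage, cost and
# equipment penalties, and personalization. Allergen matches are
# hard-blocked regardless of every other signal.
import math

RRF_K = 60  # standard RRF constant

def rrf_fuse(*ranked_lists: list[str]) -> dict[str, float]:
    """RRF: score(d) = sum over lists of 1 / (k + rank_of_d_in_list)."""
    fused: dict[str, float] = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (RRF_K + rank)
    return fused

def final_score(s_vec: float, s_lex: float, pantry_cov: float,
                cost_pen: float, equip_pen: float, personal: float,
                has_allergen: bool) -> float:
    if has_allergen:
        return -math.inf  # hard block: never surface an allergen match
    return (0.40 * s_vec + 0.15 * s_lex + 0.20 * pantry_cov
            - 0.10 * cost_pen - 0.05 * equip_pen + 0.10 * personal)
```

A recipe ranked in two channels beats one ranked high in a single channel, which is exactly the behavior RRF buys you without tuning per-channel score scales.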
Gemini Models Used
| Purpose | Model |
|---|---|
| Live voice streaming | gemini-2.5-flash-native-audio-preview |
| Text/vision reasoning | gemini-3-flash-preview |
| Multimodal embeddings (3072-dim) | gemini-embedding-2-preview |
| Image generation (hero) | gemini-3-pro-image-preview |
| Image generation (steps) | gemini-2.5-flash-image |
| Cook mode frame safety | gemini-3-flash-preview (vision) |
Privacy and Safety
- Consent-gated memory: user preferences, allergies, and taste profiles are only persisted when the user opts in. Three consent levels: `private` (session-only), `personalized` (turns persisted), `contribute` (full memory graph).
- Guardrail engine: `ALLOW / REDACT / REWRITE / BLOCK / ESCALATE` policies on both input and output, enforced via ADK callbacks.
- Semantic log redaction: 17 sensitive field patterns globally redacted from structured logs.
- GDPR deletion: `DELETE /auth/me` with FK-ordered cascade across 6 child tables.
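The FK-ordered cascade boils down to deleting children before the parent so no foreign-key constraint fires mid-wipe. A minimal sketch (the table names below are hypothetical placeholders, not our actual schema):

```python
# Illustrative FK-ordered GDPR deletion: emit DELETE statements for the
# six child tables first, then the parent users row last. Table names
# are made up for the example; only the ordering principle is real.
DELETION_ORDER = [
    "challenge_results",  # children first ...
    "recipe_images",
    "saved_recipes",
    "pantry_items",
    "consent_records",
    "sessions",
    "users",              # ... parent last
]

def deletion_statements(user_id: str) -> list[tuple[str, tuple]]:
    """Build parameterized DELETEs in FK-safe order."""
    stmts: list[tuple[str, tuple]] = []
    for table in DELETION_ORDER[:-1]:
        stmts.append((f"DELETE FROM {table} WHERE user_id = %s", (user_id,)))
    stmts.append(("DELETE FROM users WHERE id = %s", (user_id,)))
    return stmts
```

Running these inside one transaction means the wipe is all-or-nothing: either every trace of the user is gone or nothing changed.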
Testing and Quality
We invested heavily in layered validation:
| Layer | Count | What it proves |
|---|---|---|
| Backend unit/integration | 3,594 | Domain logic, DB writes, WS auth, GraphRAG pipeline, tool side effects |
| Frontend unit/component | 1,942 | Component behavior, hooks, state, accessibility (axe-core) |
| Ingestion pipeline | 110 | Embedding, seeding, ontology graph correctness |
| E2E (Playwright) | 114 | Cross-boundary journeys: login → live → pantry → recipe → challenge → safety |
| Security/abuse | 40+ | Auth bypass, BOLA/IDOR, brute force, prompt injection, secret scanning |
| i18n parity | 775 × 4 | Key coverage across all 4 locales |
Total: 5,760+ automated tests, with a 75% backend coverage gate enforced in CI.
Challenges we ran into
1. Real-time interruption safety. When a user says "wait, stop" while the agent is mid-sentence describing a recipe, you need true barge-in — not a polite queue. We implemented server-owned audio queues with drop-oldest backpressure (4 bounded asyncio queues: audio_in=32, audio_out=64, control=32, tool_results=16) and a WebSocket state machine that handles interruption as a first-class event, not an edge case.
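The drop-oldest policy is simple but load-bearing. A minimal sketch (queue sizes are the real ones from above; the helper names are ours, not the actual Chefeze code):

```python
# Drop-oldest backpressure on bounded asyncio queues: when a queue is
# full, evict the stalest item so fresh audio always wins -- in a live
# conversation, a stale chunk is worthless.
import asyncio

def make_queues() -> dict[str, asyncio.Queue]:
    """The four bounded queues with the capacities described above."""
    return {
        "audio_in": asyncio.Queue(maxsize=32),
        "audio_out": asyncio.Queue(maxsize=64),
        "control": asyncio.Queue(maxsize=32),
        "tool_results": asyncio.Queue(maxsize=16),
    }

def put_drop_oldest(q: asyncio.Queue, item) -> bool:
    """Enqueue without blocking; evict the oldest entry if full.
    Returns True if something was dropped."""
    dropped = False
    if q.full():
        try:
            q.get_nowait()  # drop the stalest item, not the newest
            dropped = True
        except asyncio.QueueEmpty:
            pass
    q.put_nowait(item)
    return dropped
```

The alternative, blocking on a full queue, would stall the producer and turn one slow consumer into end-to-end latency for the whole voice session.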
2. Grounding vs. hallucination in live voice. It's tempting to let the LLM freestyle recipe suggestions. But a cooking copilot that invents ingredients you don't have or misses your peanut allergy is worse than useless — it's dangerous. Building the full GraphRAG pipeline with allergen hard-blocks and pantry coverage scoring was the hardest engineering investment, but it's what makes the agent trustworthy.
3. Vision at kitchen speed. Cook Mode captures camera frames at up to 2 FPS for safety analysis. But network hiccups, slow model responses, and frame backlogs can cascade. We added a circuit breaker (3 failures → open), rate limiting, a vision queue with max depth 8 and drop-oldest policy, and per-session failure tracking. The system degrades gracefully instead of crashing.
4. Multilingual voice + UI + data consistency. Supporting 4 languages isn't just translating strings — it means locale-aware voice prompts, unit conversions (metric/imperial), currency formatting, recipe retrieval fallback chains (requested locale → en → any), and unaccent-normalized search across Portuguese diacritics. We wrote 775 i18n keys per locale and enforced parity in CI.
5. Privacy-safe observability. We needed operator telemetry (session durations, tool call rates, error patterns) without leaking user content into logs. The solution was semantic-key global redaction in structlog (17 field patterns), protected /metrics behind admin authentication, and span attribute scrubbing before OpenTelemetry export.
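A structlog processor is just a callable `(logger, method_name, event_dict) -> event_dict`, which is what makes global redaction cheap to enforce. A sketch of the idea (the pattern list below is illustrative and shorter than the 17 real ones):

```python
# Semantic-key redaction processor in structlog style: any key matching
# a sensitive pattern is replaced with a placeholder, recursively, before
# the event is ever serialized or shipped.
import re

SENSITIVE_KEY_RE = re.compile(
    r"(password|token|secret|api[_-]?key|authorization|email|phone|"
    r"allerg|address|session[_-]?id)",
    re.IGNORECASE,
)

def redact_sensitive(logger, method_name, event_dict: dict) -> dict:
    def scrub(value):
        if isinstance(value, dict):
            return {k: ("[REDACTED]" if SENSITIVE_KEY_RE.search(k)
                        else scrub(v))
                    for k, v in value.items()}
        if isinstance(value, list):
            return [scrub(v) for v in value]
        return value
    return {k: ("[REDACTED]" if SENSITIVE_KEY_RE.search(k) else scrub(v))
            for k, v in event_dict.items()}
```

Wired in as the first processor (e.g. `structlog.configure(processors=[redact_sensitive, ...])`), redaction happens before any renderer or exporter sees the event, so no downstream sink can leak what was never there.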
Accomplishments that we're proud of
- A working live voice agent that you can actually interrupt, that sees your kitchen, and that grounds every suggestion in real pantry and allergen data — not a demo with pre-recorded responses.
- 5,760+ tests with zero mocks at the integration layer — every DB write, WebSocket handshake, and GraphRAG query hits real PostgreSQL and Redis.
- 14 allergen safety regression tests covering edge cases like casein detection, cross-contamination warnings, and multilingual allergen names.
- The scoring fusion formula that balances vector similarity, lexical relevance, pantry coverage, cost penalty, equipment penalty, and personalization — making recipe suggestions feel genuinely tailored.
- 55 cuisine skills with structured flavor principles, safety notes, and guardrails — from Brazilian to Japanese to Mediterranean — that activate progressively per session.
What we learned
Real-time agent quality comes from state handling, not prompt engineering. The difference between a demo and a product is how the agent behaves when the WebSocket drops, the user interrupts, or the MCP circuit breaker opens. We spent more time on retry/offline/interruption logic than on prompt tuning.
Specialist agents work best when the live bridge owns state and boundaries. Letting each specialist run independently caused chaos. The hub-specialist pattern with explicit tool allowlists, turn limits, and timeout budgets per specialist made the system predictable.
Compose + targeted browser proof beats broad mock-heavy confidence. Running tests inside Docker Compose against real PostgreSQL and Redis caught bugs that mocked tests would have missed — including a migration that worked in SQLite but failed on real Postgres.
The last-mile hackathon risk is proof packaging, not missing features. We had a working product days before the deadline. The hard part was capturing evidence, recording the demo, and making the submission tell a coherent story.
Privacy and safety aren't features you bolt on — they're architecture decisions. Consent-gated memory, semantic log redaction, and allergen hard-blocks had to be designed into the data model and agent routing from the start. Retrofitting them would have been a rewrite.
What's next for Chefeze
- Pantry intelligence — Auto-compose meals from what you already have, with expiration-aware prioritization.
- Personal recipe library — Upload your own PDFs and handwritten recipe photos; the agent parses and indexes them into your private knowledge base.
- Google Identity Platform — Replace identity-key auth with Google Sign-In for seamless mobile login.
- Real-device accessibility — VoiceOver and TalkBack validation on physical iOS and Android devices.
- Community recipes — Let users publish and discover recipes from other Chefeze cooks.