Inspiration

Every home cook knows the frustration: hands covered in flour, a sauce reducing on the stove, and the recipe you need is locked behind a screen you can't touch. Traditional cooking apps demand constant visual attention and manual interaction — exactly what you can't give when you're actually cooking.

We asked ourselves: what if your kitchen had an AI sous-chef that could listen to you, see what you're doing, know what's in your pantry, respect your allergies, and guide you step by step — all without you ever touching your phone?

That question became Chefeze — a real-time, voice-first cooking copilot powered by Gemini Live and Google ADK. The name blends "chef" with "eze" (ease), capturing the mission: make cooking with AI genuinely effortless.

Three things drove the design from day one:

  1. Hands-free is non-negotiable. A cooking assistant you have to type into isn't solving the real problem.
  2. Grounded, not hallucinated. Recipe suggestions must come from real data — actual pantry contents, verified allergen databases, real ingredient prices — not from plausible-sounding details the model invents.
  3. Trust through transparency. When the agent warns you about cross-contamination or tells you a dish is over budget, it must show its evidence, not just assert confidence.

What it does

Chefeze is a multimodal live cooking agent that operates through voice and vision in real time:

  • Talk naturally — Push-to-Talk or fully hands-free Cook Mode with barge-in (interrupt the AI mid-sentence when your timer goes off).
  • Show your kitchen — Point your camera at your pantry for instant ingredient extraction, or enable Cook Mode's continuous camera stream where the AI monitors your cooking for safety hazards (dangerous temperatures, cross-contamination, knife safety) at up to 2 frames per second.
  • Get grounded answers — Every recipe suggestion passes through a 5-step hybrid GraphRAG pipeline: vector search (pgvector HNSW), full-text search (tsvector + trigram), knowledge graph expansion (ltree, 1-2 hops), cross-retriever fusion (Reciprocal Rank Fusion, \(k=60\)), and a confidence gate (\(\theta = 0.15\)) — all running on PostgreSQL 17.
  • Stay safe — Real-time allergen detection across 14 EU allergen categories using Open Food Facts data, with STOP/WARN/TIP alert levels. A dedicated food safety sentinel checks internal temperatures, danger zone time limits, and cross-contamination risks.
  • Play the Budget Challenge — Set a budget and number of people, and the AI gamifies meal planning with real ingredient prices from the Open Prices API. Earn badges like under_budget, zero_waste, or pantry_master.
  • See your food before you cook it — Gemini generates photorealistic hero images and illustrated step cards, uploaded to Cloud Storage and displayed inline.
  • Save, remix, and fork recipes — Build a personal cookbook. Remix any recipe into a variant (e.g., "make it vegetarian") with visible lineage and a "View original" link.
  • Cook in your language — Full 4-language support (English, Portuguese BR, Portuguese PT, Spanish) across voice, UI, units, and currency — with 775 i18n keys per locale.

How we built it

Architecture

Chefeze is a monorepo with three main layers:

┌─────────────────────────────────────────────────┐
│  PWA / Android (Ionic React + Capacitor + Vite) │
│  Audio capture (16kHz PCM) · Camera frames      │
│  14-type card renderer · Audio visualizer orb   │
└───────────────────┬─────────────────────────────┘
                    │ WebSocket
┌───────────────────▼─────────────────────────────┐
│  FastAPI WebSocket Gateway (state machine)      │
│  Resume tokens · Sequence validation · Barge-in │
│  Session compression at 70% context capacity    │
└───────────────────┬─────────────────────────────┘
                    │ ADK streaming bridge
┌───────────────────▼─────────────────────────────┐
│  Chef Agent (Google ADK Hub-Specialist Pattern) │
│                                                 │
│  ┌──────────┐ ┌──────────┐ ┌──────────────────┐│
│  │ Safety   │ │ Budget   │ │ Creative         ││
│  │Specialist│ │Specialist│ │Specialist        ││
│  └──────────┘ └──────────┘ └──────────────────┘│
│  ┌──────────┐ ┌──────────┐                     │
│  │Retrieval │ │  Game    │   9 ADK Tools       │
│  │Specialist│ │ Master   │   55 Cuisine Skills │
│  └──────────┘ └──────────┘                     │
└───────────────────┬─────────────────────────────┘
                    │
┌───────────────────▼─────────────────────────────┐
│  Data Layer                                     │
│  PostgreSQL 17 (pgvector + ltree + pg_trgm)     │
│  Redis 7.4 (sessions, rate limits, flags)       │
│  Google Cloud Storage (images)                  │
│  MCP: Open Food Facts · Open Prices             │
└─────────────────────────────────────────────────┘

Multi-Agent Orchestration

The core of Chefeze is a hub-specialist agent pattern built on Google ADK. A coordinator agent receives every user turn and routes it to the right specialist using confidence-scored intent detection:

| Specialist | Responsibility | Key Tools |
|---|---|---|
| Safety | Allergen checks, food safety, cross-contamination | check_allergens, food_safety_sentinel |
| Budget | Cost estimation, budget challenge scoring | estimate_cost (MCP Open Prices) |
| Creative | Recipe composition, image generation | compose_recipe, generate_images |
| Retrieval | Knowledge base search, recipe discovery | retrieve_recipes (GraphRAG) |
| Game Master | Challenge gamification, badge scoring | score_challenge, ui_action_plan |

Intent routing uses multi-signal confidence scoring with ambiguity detection (threshold 0.35, gap 0.10). Safety always gets priority weight \(1.5\times\) — because catching a peanut allergy matters more than suggesting a garnish.
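
As a rough illustration of the routing rule above, here is a minimal Python sketch. It assumes that "threshold 0.35" is the minimum confidence needed to route at all and that "gap 0.10" is the minimum lead over the runner-up before a match counts as unambiguous; the source does not spell out either reading. The names route and RoutingDecision are hypothetical and not taken from the Chefeze codebase.

```python
from dataclasses import dataclass
from typing import Optional

# Values quoted in the text; how they are applied is an assumption.
CONFIDENCE_THRESHOLD = 0.35
AMBIGUITY_GAP = 0.10
SAFETY_WEIGHT = 1.5


@dataclass(frozen=True)
class RoutingDecision:
    specialist: Optional[str]  # None means no specialist cleared the threshold
    ambiguous: bool            # True when the top two scores are too close
    scores: dict               # weighted scores, kept for transparency


def route(raw_scores: dict[str, float]) -> RoutingDecision:
    """Pick a specialist from per-specialist confidence scores."""
    # Apply the safety priority weight before ranking.
    weighted = {
        name: score * (SAFETY_WEIGHT if name == "safety" else 1.0)
        for name, score in raw_scores.items()
    }
    ranked = sorted(weighted.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < CONFIDENCE_THRESHOLD:
        return RoutingDecision(None, False, weighted)
    top_name, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    ambiguous = (top_score - runner_up) < AMBIGUITY_GAP
    return RoutingDecision(top_name, ambiguous, weighted)
```

With equal raw scores for safety and creative, the 1.5 weight lets safety win the turn; an ambiguous result could be used to ask the user a clarifying question instead of guessing.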

The GraphRAG Pipeline

Recipe retrieval is a multi-stage hybrid pipeline, not a single vector lookup:

  1. Query Normalization — Keyword-based intent extraction (diet, allergens, equipment, budget, cuisine) across 4 languages. No LLM call needed.
  2. Embedding — gemini-embedding-2-preview (3072-dim, multimodal). A content-hash cache prevents redundant API calls.
  3. Vector Retrieval — HNSW cosine search over pgvector halfvec(3072), top-40 candidates, similarity threshold \(\geq 0.30\).
  4. Lexical Retrieval — Dual-path: PostgreSQL tsvector full-text search + pg_trgm trigram similarity (\(> 0.3\)), both unaccent-normalized for diacritics. Run in parallel, union-deduplicated.
  5. Graph Expansion — 1-2 hop traversal of an ontology graph (ingredients, substitutions, cuisines, techniques, allergen flags, diet tags) using ltree paths.
  6. Scoring Fusion — Reciprocal Rank Fusion (\(k=60\)) across retrieval channels, then:

$$ \text{score} = 0.40 \cdot s_{\text{vec}} + 0.15 \cdot s_{\text{lex}} + 0.20 \cdot c_{\text{pantry}} - 0.10 \cdot p_{\text{cost}} - 0.05 \cdot p_{\text{equip}} + 0.10 \cdot b_{\text{personal}} $$

Allergen-matched recipes are hard-blocked (\(\text{score} = -\infty\)). A confidence gate at \(\theta = 0.15\) filters noise.
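
The fusion and scoring steps can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's implementation: the RRF function uses the standard 1 / (k + rank) form with ranks starting at 1, the weights are copied from the formula above, and the gate is assumed to keep scores greater than or equal to theta. How the RRF output feeds the weighted terms is not stated in the source, so the two functions are shown separately. All function names are hypothetical.

```python
import math

RRF_K = 60
CONFIDENCE_GATE = 0.15


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = RRF_K) -> dict[str, float]:
    """Sum 1 / (k + rank) for each document over every ranked list."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused


def final_score(
    s_vec: float,
    s_lex: float,
    c_pantry: float,
    p_cost: float,
    p_equip: float,
    b_personal: float,
    allergen_match: bool = False,
) -> float:
    """Weighted score from the formula above, with the allergen hard block."""
    if allergen_match:
        return -math.inf  # hard block: can never pass the gate
    return (
        0.40 * s_vec
        + 0.15 * s_lex
        + 0.20 * c_pantry
        - 0.10 * p_cost
        - 0.05 * p_equip
        + 0.10 * b_personal
    )


def passes_gate(score: float, theta: float = CONFIDENCE_GATE) -> bool:
    """Confidence gate; the >= comparison is an assumption."""
    return score >= theta
```

Because a blocked recipe scores negative infinity, it is excluded by the gate no matter how well it matches on every other signal.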

Gemini Models Used

| Purpose | Model |
|---|---|
| Live voice streaming | gemini-2.5-flash-native-audio-preview |
| Text/vision reasoning | gemini-3-flash-preview |
| Multimodal embeddings (3072-dim) | gemini-embedding-2-preview |
| Image generation (hero) | gemini-3-pro-image-preview |
| Image generation (steps) | gemini-2.5-flash-image |
| Cook mode frame safety | gemini-3-flash-preview (vision) |

Privacy and Safety

  • Consent-gated memory: user preferences, allergies, and taste profiles are only persisted when the user opts in. Three consent levels: private (session-only), personalized (turns persisted), contribute (full memory graph).
  • Guardrail engine: ALLOW / REDACT / REWRITE / BLOCK / ESCALATE policies on both input and output, enforced via ADK callbacks.
  • Semantic log redaction: 17 sensitive field patterns globally redacted from structured logs.
  • GDPR deletion: DELETE /auth/me with FK-ordered cascade across 6 child tables.
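
One way to picture the three consent levels is as an enum with explicit persistence checks. The sketch below assumes the levels are cumulative (contribute includes everything personalized allows), which the description suggests but does not state outright. The class and function names are hypothetical.

```python
from enum import Enum


class ConsentLevel(Enum):
    PRIVATE = "private"            # session-only, nothing persisted
    PERSONALIZED = "personalized"  # conversation turns persisted
    CONTRIBUTE = "contribute"      # full memory graph


def may_persist_turns(level: ConsentLevel) -> bool:
    """Turns are stored only once the user has opted in at some level."""
    return level in (ConsentLevel.PERSONALIZED, ConsentLevel.CONTRIBUTE)


def may_write_memory_graph(level: ConsentLevel) -> bool:
    """Only the highest consent level unlocks the memory graph."""
    return level is ConsentLevel.CONTRIBUTE
```

Routing every write through checks like these keeps the default (private) path free of any persistence side effects.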

Testing and Quality

We invested heavily in layered validation:

| Layer | Count | What it proves |
|---|---|---|
| Backend unit/integration | 3,594 | Domain logic, DB writes, WS auth, GraphRAG pipeline, tool side effects |
| Frontend unit/component | 1,942 | Component behavior, hooks, state, accessibility (axe-core) |
| Ingestion pipeline | 110 | Embedding, seeding, ontology graph correctness |
| E2E (Playwright) | 114 | Cross-boundary journeys: login → live → pantry → recipe → challenge → safety |
| Security/abuse | 40+ | Auth bypass, BOLA/IDOR, brute force, prompt injection, secret scanning |
| i18n parity | 775 × 4 | Key coverage across all 4 locales |

Total: 5,760+ automated tests, with a 75% backend coverage gate enforced in CI.

Challenges we ran into

1. Real-time interruption safety. When a user says "wait, stop" while the agent is mid-sentence describing a recipe, you need true barge-in — not a polite queue. We implemented server-owned audio queues with drop-oldest backpressure (4 bounded asyncio queues: audio_in=32, audio_out=64, control=32, tool_results=16) and a WebSocket state machine that handles interruption as a first-class event, not an edge case.
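
A minimal sketch of drop-oldest backpressure on bounded asyncio queues, using the four queue sizes quoted above. The helper names are hypothetical, and flushing pending output audio on barge-in is an assumption about how interruption could be handled, not a description of the actual gateway.

```python
import asyncio

# Queue sizes as quoted in the text.
QUEUE_SIZES = {"audio_in": 32, "audio_out": 64, "control": 32, "tool_results": 16}


def make_queues() -> dict[str, asyncio.Queue]:
    """Create one bounded queue per channel."""
    return {name: asyncio.Queue(maxsize=size) for name, size in QUEUE_SIZES.items()}


def put_drop_oldest(queue: asyncio.Queue, item) -> bool:
    """Enqueue without blocking; if the queue is full, discard the oldest item.

    Returns True when an older item had to be dropped.
    """
    dropped = False
    if queue.full():
        try:
            queue.get_nowait()
            dropped = True
        except asyncio.QueueEmpty:
            pass
    queue.put_nowait(item)
    return dropped


def flush(queue: asyncio.Queue) -> int:
    """Discard everything pending (for example, queued audio on barge-in)."""
    removed = 0
    while True:
        try:
            queue.get_nowait()
        except asyncio.QueueEmpty:
            return removed
        removed += 1
```

Dropping the oldest chunk favors freshness over completeness, which suits live audio: a stale frame is worth less than the one that just arrived.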

2. Grounding vs. hallucination in live voice. It's tempting to let the LLM freestyle recipe suggestions. But a cooking copilot that invents ingredients you don't have or misses your peanut allergy is worse than useless — it's dangerous. Building the full GraphRAG pipeline with allergen hard-blocks and pantry coverage scoring was the hardest engineering investment, but it's what makes the agent trustworthy.

3. Vision at kitchen speed. Cook Mode captures camera frames at up to 2 FPS for safety analysis. But network hiccups, slow model responses, and frame backlogs can cascade. We added a circuit breaker (3 failures → open), rate limiting, a vision queue with max depth 8 and drop-oldest policy, and per-session failure tracking. The system degrades gracefully instead of crashing.
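
The circuit breaker and the bounded vision queue can be sketched as follows. Two details are assumptions: the three failures are treated as consecutive (a success resets the count), and the breaker lets calls through again after a cool-down period, here a hypothetical reset_after of 30 seconds. The text only states "3 failures → open", max depth 8, and a drop-oldest policy.

```python
import time
from collections import deque

VISION_QUEUE_DEPTH = 8  # max depth quoted in the text


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # hypothetical cool-down, not from the source
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def allow(self) -> bool:
        """Return True if a call may proceed."""
        if self._opened_at is None:
            return True
        # After the cool-down, allow a trial call; a failure re-opens the breaker.
        return self._clock() - self._opened_at >= self.reset_after

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def new_vision_queue() -> deque:
    """A deque with maxlen silently discards the oldest frame when full."""
    return deque(maxlen=VISION_QUEUE_DEPTH)
```

While the breaker is open, frames can be skipped outright instead of piling up behind a failing model call.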

4. Multilingual voice + UI + data consistency. Supporting 4 languages isn't just translating strings — it means locale-aware voice prompts, unit conversions (metric/imperial), currency formatting, recipe retrieval fallback chains (requested locale → en → any), and unaccent-normalized search across Portuguese diacritics. We wrote 775 i18n keys per locale and enforced parity in CI.
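
The retrieval fallback chain (requested locale → en → any) reduces to a small ordering function. This is a sketch only: the locale codes in the usage note are assumptions, and both function names are hypothetical.

```python
from typing import Optional


def fallback_chain(requested: str, available: list[str]) -> list[str]:
    """Order available locales: requested first, then 'en', then the rest."""
    chain: list[str] = []
    for locale in [requested, "en", *available]:
        if locale in available and locale not in chain:
            chain.append(locale)
    return chain


def pick_recipe(recipes_by_locale: dict[str, dict], requested: str) -> Optional[dict]:
    """Return the best available translation of a recipe, or None if there is none."""
    chain = fallback_chain(requested, list(recipes_by_locale))
    return recipes_by_locale[chain[0]] if chain else None
```

For example, a request for "pt-BR" against a recipe stored only in "es" and "en" would return the English version, and a recipe stored only in "es" would still be returned instead of nothing.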

5. Privacy-safe observability. We needed operator telemetry (session durations, tool call rates, error patterns) without leaking user content into logs. The solution combined semantic-key global redaction in structlog (17 field patterns), a /metrics endpoint protected by admin authentication, and span attribute scrubbing before OpenTelemetry export.
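
Key-based log redaction of this kind can be sketched as a function in the shape of a structlog processor, which receives (logger, method_name, event_dict) and returns the event dict. The key pattern below is a short hypothetical stand-in: the actual 17 field patterns are not listed in this write-up.

```python
import re

# Hypothetical stand-in for the real pattern list, which is not given in the text.
SENSITIVE_KEY = re.compile(r"(password|token|secret|email|transcript)", re.IGNORECASE)
REDACTED = "[REDACTED]"


def _scrub(value):
    """Recursively replace values whose key name looks sensitive."""
    if isinstance(value, dict):
        return {
            key: REDACTED if SENSITIVE_KEY.search(str(key)) else _scrub(inner)
            for key, inner in value.items()
        }
    if isinstance(value, list):
        return [_scrub(inner) for inner in value]
    return value


def redact_sensitive(logger, method_name, event_dict):
    """Processor-shaped entry point: scrub the whole event before it is rendered."""
    return _scrub(event_dict)
```

Matching on key names means nested payloads are covered without each call site having to remember to redact.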

Accomplishments that we're proud of

  • A working live voice agent that you can actually interrupt, that sees your kitchen, and that grounds every suggestion in real pantry and allergen data — not a demo with pre-recorded responses.
  • 5,760+ tests with zero mocks at the integration layer — every DB write, WebSocket handshake, and GraphRAG query hits real PostgreSQL and Redis.
  • 14 allergen safety regression tests covering edge cases like casein detection, cross-contamination warnings, and multilingual allergen names.
  • The scoring fusion formula that balances vector similarity, lexical relevance, pantry coverage, cost penalty, equipment penalty, and personalization — making recipe suggestions feel genuinely tailored.
  • 55 cuisine skills with structured flavor principles, safety notes, and guardrails — from Brazilian to Japanese to Mediterranean — that activate progressively per session.

What we learned

  1. Real-time agent quality comes from state handling, not prompt engineering. The difference between a demo and a product is how the agent behaves when the WebSocket drops, the user interrupts, or the MCP circuit breaker opens. We spent more time on retry/offline/interruption logic than on prompt tuning.

  2. Specialist agents work best when the live bridge owns state and boundaries. Letting each specialist run independently caused chaos. The hub-specialist pattern with explicit tool allowlists, turn limits, and timeout budgets per specialist made the system predictable.

  3. Docker Compose runs plus targeted browser tests beat broad, mock-heavy confidence. Running tests inside Docker Compose against real PostgreSQL and Redis caught bugs that mocked tests would have missed — including a migration that worked in SQLite but failed on real Postgres.

  4. The last-mile hackathon risk is proof packaging, not missing features. We had a working product days before the deadline. The hard part was capturing evidence, recording the demo, and making the submission tell a coherent story.

  5. Privacy and safety aren't features you bolt on — they're architecture decisions. Consent-gated memory, semantic log redaction, and allergen hard-blocks had to be designed into the data model and agent routing from the start. Retrofitting them would have been a rewrite.

What's next for Chefeze

  • Pantry intelligence — Auto-compose meals from what you already have, with expiration-aware prioritization.
  • Personal recipe library — Upload your own PDFs and handwritten recipe photos; the agent parses and indexes them into your private knowledge base.
  • Google Identity Platform — Replace identity-key auth with Google Sign-In for seamless mobile login.
  • Real-device accessibility — VoiceOver and TalkBack validation on physical iOS and Android devices.
  • Community recipes — Let users publish and discover recipes from other Chefeze cooks.
