Menu To Food Tour

Menu-to-food-tour Multi-Agent AI Experience
[Figure: Deployment diagram]
[Figure: Sequence diagram]

Inspiration

We've all been there — sitting in a restaurant in a foreign country, staring at a menu we can't read. Google Translate gives you a flat, literal translation that strips away everything that makes food exciting. "Tonkotsu Ramen" becomes "pork bone noodle soup." Accurate? Sure. But it tells you nothing about the 12-hour simmered broth born in Fukuoka's late-night street stalls, or how to actually pronounce what you're about to order.

We realized that eating abroad isn't a translation problem — it's a storytelling problem. Travelers don't just need words converted; they need a food tour guide in their pocket. Someone who explains the history, shows them what the dish looks like, and teaches them how to say it without embarrassing themselves.

When we saw that Google ADK supports multi-agent orchestration with ParallelAgent, the idea clicked: what if we built a team of specialized AI agents — one that reads the menu, one that tells the story, one that generates the photo, one that checks allergens, and one that teaches pronunciation — all working simultaneously? Not a single monolithic prompt, but a coordinated squad of experts, each doing what they do best, streaming results to the user the instant they're ready.

That's when the Menu-to-Food-Tour Converter was born.

What it does

The Menu-to-Food-Tour Converter transforms a restaurant menu photo into an immersive, multi-sensory culinary experience.

The user flow is dead simple:

1. Snap or upload a photo of any restaurant menu (any language).
2. Watch the magic happen: dish cards animate onto the screen in real time via SSE streaming, each one progressively filling with:
   - A cultural narrative explaining the dish's history and significance (grounded with Google Search)
   - An AI-generated photorealistic food photograph (Gemini image generation)
   - A pronunciation audio guide with native-language speech (Gemini TTS)
   - An allergy safety check with dietary flags
3. Explore and learn: tap any dish to hear the pronunciation, read the story, check allergens, and chat with the AI about any question.

What happens behind the scenes

Six AI agents work together through Google ADK:

- VisionAgent: reads every dish from the menu photo using Gemini's multimodal vision
- StoryAgent: generates cultural narratives and English translations with Google Search grounding (to reduce hallucinations)
- AllergyAgent: runs real-time allergen detection with dietary flag analysis (vegetarian, vegan, gluten-free, nut-free, etc.)
- ImageAgent: creates professional food photography using Gemini image generation (Imagen 3)
- AudioAgent: produces native-language pronunciation guides using Gemini TTS (gemini-2.5-flash-preview-tts)
- ChatAgent: conversational Q&A about any dish with ADK session memory

The story, image, and audio agents run in parallel for each dish via ADK's ParallelAgent. Results stream to the browser via Server-Sent Events — the fastest agent's output appears first, so the user never waits for the slowest one.
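
The fastest-first behavior described above can be sketched with plain asyncio. This is a minimal illustration and not the project's code: `run_agent`, `stream_dish`, and the simulated delays are hypothetical stand-ins for the ADK sub-agents, and `asyncio.as_completed` plays the role that ParallelAgent plus the SSE layer play in the real system.

```python
import asyncio
from typing import AsyncIterator


async def run_agent(kind: str, dish: str, delay: float) -> dict:
    """Hypothetical stand-in for one sub-agent; the delay simulates model latency."""
    await asyncio.sleep(delay)
    return {"dish": dish, "kind": kind, "payload": f"{kind} for {dish}"}


async def stream_dish(dish: str) -> AsyncIterator[dict]:
    """Start all three agents at once and yield each result as soon as it finishes."""
    tasks = [
        asyncio.create_task(run_agent("story", dish, 0.02)),
        asyncio.create_task(run_agent("image", dish, 0.20)),
        asyncio.create_task(run_agent("audio", dish, 0.10)),
    ]
    # as_completed yields awaitables in completion order, not submission order.
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect(dish: str) -> list[str]:
    """Gather the order in which the agent results arrived."""
    return [event["kind"] async for event in stream_dish(dish)]


if __name__ == "__main__":
    print(asyncio.run(collect("Tonkotsu Ramen")))
```

With these simulated delays the story arrives first and the image last, so the slowest agent never blocks the others.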

How we built it

Architecture

We built a modular multi-agent system using Google ADK (Agent Development Kit) in Python, scaffolded with agent-starter-pack:

[Figure: Menu to Food Tour architecture diagram]

Key Technical Decisions

- Google ADK's ParallelAgent: Instead of writing custom async orchestration, we used ADK's native ParallelAgent to run story/image/audio generation concurrently. This gave us true parallelism with clean agent isolation.
- Google Search Grounding: StoryAgent narratives are grounded with real-time Google Search data, which ties cultural facts to current web sources and reduces hallucinations.
- Structured Output with Pydantic: Every agent produces typed, structured data through function tools. The LLM calls the tool; the tool returns clean, predictable JSON every time.
- SSE Real-Time Streaming: Results are pushed to the frontend via Server-Sent Events as soon as each agent finishes. The user sees stories appear, then images load, then audio becomes available, all progressively.
- Zero API Keys in Production: The Cloud Run service account provides OAuth2 tokens automatically via google.auth.default(). No API keys to rotate or leak.
- Resilient Model Fallback: ImageAgent races multiple Gemini image models to bypass rate limits. All agents have retry logic with exponential backoff.
- ADK Chat Sessions: ChatAgent uses ADK's Runner with InMemorySessionService for conversational memory. Users can ask follow-up questions about any dish.
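
To make the structured-output decision concrete, here is a rough sketch of a function tool that validates what the model passes in and returns JSON. It is an assumption-laden illustration: it uses a standard-library dataclass in place of Pydantic so the snippet has no dependencies, and `AllergyReport`, `report_allergy`, and the safety-level labels are hypothetical names, not the project's schema.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical labels; the real project may use different values.
ALLOWED_SAFETY_LEVELS = {"safe", "caution", "danger"}


@dataclass(frozen=True)
class AllergyReport:
    dish: str
    allergens: list[str] = field(default_factory=list)
    dietary_flags: list[str] = field(default_factory=list)
    safety_level: str = "caution"

    def __post_init__(self) -> None:
        # Reject malformed model output instead of passing it to the frontend.
        if not self.dish.strip():
            raise ValueError("dish must not be empty")
        if self.safety_level not in ALLOWED_SAFETY_LEVELS:
            raise ValueError(f"unknown safety_level: {self.safety_level!r}")


def report_allergy(
    dish: str, allergens: list[str], dietary_flags: list[str], safety_level: str
) -> str:
    """Hypothetical function tool: the model supplies arguments, the tool returns JSON."""
    report = AllergyReport(
        dish=dish,
        allergens=sorted(set(allergens)),  # de-duplicate and order for stable output
        dietary_flags=sorted(set(dietary_flags)),
        safety_level=safety_level,
    )
    return json.dumps(asdict(report), ensure_ascii=False)
```

The point of the pattern is that free-form model text never reaches the client directly; only values that pass the tool's checks do.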

Tech Stack

| Layer | Technology |
| --- | --- |
| Agent Framework | Google ADK (google-adk) with Agent, ParallelAgent |
| LLM | Gemini 3 Flash Preview (gemini-3-flash-preview) on Vertex AI |
| Image Generation | Gemini Image Generation (Imagen 3) |
| Text-to-Speech | Gemini TTS (gemini-2.5-flash-preview-tts) |
| Search Grounding | Google Search via Gemini |
| Backend | FastAPI + Uvicorn, Python 3.11 |
| Frontend | Next.js 16 + React 19 + TypeScript + Tailwind CSS + Framer Motion |
| Deployment | Google Cloud Run (backend + frontend) |
| Auth | Identity-Aware Proxy (IAP) |
| Observability | OpenTelemetry + Cloud Trace + Cloud Logging + BigQuery |
| CI/CD | Cloud Build (PR checks, staging deploy, prod deploy) |
| Infrastructure | Terraform + agent-starter-pack |
| Package Manager | uv (Python), npm (Node.js) |

Google Cloud Services Used

| Service | Purpose |
| --- | --- |
| Cloud Run | Hosts both backend (FastAPI) and frontend (Next.js) as serverless containers |
| Vertex AI | Gemini 3 Flash model inference, image generation, TTS |
| Cloud Build | CI/CD pipeline: PR checks, staging deploy, production deploy with approval gates |
| Artifact Registry | Docker image storage |
| Cloud Trace | Distributed tracing across agent calls |
| Cloud Logging | Structured logging with GenAI log sinks |
| BigQuery | Telemetry data warehouse (completions, latency, costs) |
| Cloud Storage | Artifact storage (generated images/audio), log exports |
| IAP | Identity-Aware Proxy for Google-managed authentication |
| Secret Manager | Secure credential management |

Cloud Deployment

Backend Deployment (Cloud Run)

cd my-food-tour

gcloud run deploy food-tour-backend \
  --source . \
  --region us-central1 \
  --memory 4Gi \
  --cpu 4 \
  --min-instances 1 \
  --max-instances 10 \
  --service-account food-tour-app@hackathon-demo.iam.gserviceaccount.com \
  --no-allow-unauthenticated \
  --no-cpu-throttling \
  --session-affinity \
  --set-env-vars "GOOGLE_GENAI_USE_VERTEXAI=TRUE,GOOGLE_CLOUD_LOCATION=global,MODEL=gemini-3-flash-preview"

Service Name: food-tour-backend
Region: us-central1
URL: https://food-tour-backend-demo.us-central1.run.app
Auth: IAP (Identity-Aware Proxy) — Google-managed login wall, no code changes needed
Dockerfile: Python 3.11 slim + uv package manager

Frontend Deployment (Cloud Run)

cd my-food-tour/frontend

gcloud run deploy food-tour-frontend \
  --source . \
  --region us-central1 \
  --memory 512Mi \
  --cpu 1 \
  --min-instances 0 \
  --max-instances 5 \
  --allow-unauthenticated \
  --set-env-vars "NEXT_PUBLIC_API_URL=$BACKEND_URL" \
  --set-build-env-vars "NEXT_PUBLIC_API_URL=$BACKEND_URL"

Service Name: food-tour-frontend
Region: us-central1
URL: https://food-tour-frontend-demo.us-central1.run.app
Dockerfile: Node 20 slim, Next.js standalone output
CORS: Backend updated with ALLOW_ORIGINS pointing to frontend URL

Infrastructure as Code

The project includes Terraform configs in deployment/terraform/ and Cloud Build pipelines in .cloudbuild/:

Pipeline: git push -> PR Checks -> Merge -> Staging Deploy -> Load Test -> Prod Deploy [APPROVAL]

Verified Endpoints

| Endpoint | Status | Response |
| --- | --- | --- |
| `GET /health` | 200 OK | `{"status":"ok","agents":6,"model":"gemini-3-flash-preview"}` |
| `POST /api/story` | 200 OK | Cultural narrative + translation + cuisine origin |
| `POST /api/allergy-check` | 200 OK | Allergens, dietary flags, safety level |
| `POST /api/chat` | 200 OK | Conversational response with session memory |
| `POST /api/audio` | 200 OK | Pronunciation audio (Gemini TTS) |
| `POST /api/tour/json` | 200 OK | Full SSE streaming pipeline |
| `POST /feedback` | 200 OK | Feedback collection (Cloud Logging) |
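
A client consuming the streaming endpoint has to split the response into events. The sketch below shows one way to do that for a text/event-stream body. It assumes each `data:` payload is JSON and uses made-up event names; the actual event names and payload shapes of `/api/tour/json` are not documented here.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, data) pairs from the lines of an SSE response body."""
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; emit it if it carried data.
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, ignored by clients
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, one leading space after the colon is dropped.
            data_lines.append(value[1:] if value.startswith(" ") else value)


if __name__ == "__main__":
    sample = ["event: story\n", 'data: {"dish": "Pho"}\n', "\n"]
    for name, data in parse_sse(sample):
        print(name, data)
```

Events without an explicit `event:` line fall back to the default name `message`, which matches how browsers treat them.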

Challenges we ran into

1. Getting ParallelAgent Context Right

ADK's ParallelAgent runs all sub-agents simultaneously, but each sub-agent needs to know which dish to process. We learned that ADK passes the full conversation history to each sub-agent, so the instructions needed to clearly tell each agent to focus on the most recently mentioned dish.
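
One way to express the "focus on the most recent dish" rule is to share a single instruction fragment across all sub-agents. The snippet below is only a sketch: `FOCUS_RULE`, `build_instruction`, and the message shape used by `latest_dish` are hypothetical and do not reflect ADK's internal history format.

```python
from typing import Optional

# Shared wording so every parallel sub-agent resolves the target dish the same way.
FOCUS_RULE = (
    "The conversation may mention several dishes. "
    "Work only on the most recently mentioned dish and ignore earlier ones."
)


def build_instruction(role: str) -> str:
    """Compose a sub-agent instruction from its role plus the shared focus rule."""
    return f"You are the {role}. {FOCUS_RULE}"


def latest_dish(history: list[dict]) -> Optional[str]:
    """Return the dish named in the newest message that carries one, if any."""
    for message in reversed(history):
        dish = message.get("dish")
        if dish:
            return dish
    return None
```

Keeping the rule in one constant means a wording change reaches the story, image, and audio agents at the same time.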

2. Rate Limiting with Parallel Execution

Running 3 agents in parallel means 3 simultaneous Gemini API calls per dish, which makes rate limits easy to hit. We mitigated this by:

- Implementing resilient model fallback: ImageAgent races multiple Gemini image models
- Adding auto-retry with exponential backoff to all agents
- Using Vertex AI with service accounts (higher quotas than API key access)
- Caching responses to avoid redundant calls
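
Retry with exponential backoff and a response cache can be sketched as follows. This is an illustrative version under stated assumptions: `RateLimited` stands in for whatever error the model client raises on a quota hit, and `with_backoff` and `cached` are hypothetical helper names. The `sleep` parameter is injectable only so the logic can be exercised without real waiting.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error standing in for a rate-limit response from the model API."""


async def with_backoff(
    call: Callable[[], Awaitable[T]],
    *,
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait before each new attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await call()
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of retries, surface the error
            delay = base_delay * (2 ** attempt)
            # Jitter keeps parallel agents from retrying in lockstep.
            await sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("loop always returns or raises")


_cache: dict[str, object] = {}


async def cached(key: str, call: Callable[[], Awaitable[T]]) -> T:
    """Reuse an earlier result for the same key instead of calling the API again."""
    if key not in _cache:
        _cache[key] = await with_backoff(call)
    return _cache[key]  # type: ignore[return-value]
```

A cache key such as `"story:<dish name>"` would let a repeated dish skip the model call entirely, though the key scheme here is a guess.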

3. Gemini Image Generation in Preview

Gemini image generation is in Preview and can be unpredictable. We built a multi-model race strategy where ImageAgent tries multiple models and returns the first successful result. This dramatically improved reliability.
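
The race strategy can be approximated with `asyncio.wait`. The following is a sketch and not the ImageAgent implementation: `first_success` is a hypothetical helper, and each entry in `attempts` is assumed to be a zero-argument coroutine function that calls one image model and returns image bytes.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def first_success(attempts: Sequence[Callable[[], Awaitable[bytes]]]) -> bytes:
    """Run every attempt concurrently and return the first one that succeeds."""
    if not attempts:
        raise ValueError("at least one attempt is required")
    pending = {asyncio.create_task(attempt()) for attempt in attempts}
    errors: list[BaseException] = []
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            winners = []
            for task in done:
                error = task.exception()  # also marks the exception as retrieved
                if error is None:
                    winners.append(task)
                else:
                    errors.append(error)
            if winners:
                return winners[0].result()
        raise RuntimeError(f"all {len(errors)} image models failed: {errors}")
    finally:
        # Cancel the slower models once a winner is found.
        for task in pending:
            task.cancel()
```

A failed model does not end the race; the function only gives up once every attempt has raised.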

4. SSE Streaming Through Cloud Run + IAP

Getting Server-Sent Events to work through IAP required specific headers (X-Accel-Buffering: no, Cache-Control: no-cache) and session affinity on Cloud Run to prevent connection drops during long-running tour generation.
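
For reference, the headers mentioned above and the wire format of a single event might look like this. The snippet is a minimal sketch: `SSE_HEADERS` and `sse_frame` are hypothetical names, and the `Content-Type` entry is included because the SSE format requires it, not because the source lists it.

```python
import json

# Cache-Control and X-Accel-Buffering come from the text above;
# text/event-stream is the media type the SSE format requires.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no",
}


def sse_frame(event: str, data: dict) -> str:
    """Serialize one event: an `event:` line, a `data:` line, then a blank line."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    payload = json.dumps(data, ensure_ascii=False)
    return f"event: {event}\ndata: {payload}\n\n"


if __name__ == "__main__":
    print(sse_frame("story", {"dish": "Pho", "text": "line one\nline two"}), end="")
```

In a FastAPI backend, frames like these would typically be yielded from a generator passed to `StreamingResponse` with `media_type="text/event-stream"` and these headers, though how the project wires this up is not shown here.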

5. Zero API Keys Architecture

Moving from API key auth to service account auth required restructuring how all agents initialize their Gemini clients. On Cloud Run we use google.auth.default(), which auto-detects the service account credentials; for local development, the app falls back to API keys loaded from .env.
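
The credential selection could be organized roughly as below. This is a sketch under assumptions: `resolve_auth_mode` is a hypothetical helper, and `GOOGLE_API_KEY` is assumed to be the variable holding the local key. `K_SERVICE` is an environment variable that Cloud Run sets in its containers, and `GOOGLE_GENAI_USE_VERTEXAI` appears in the deploy command above.

```python
import os
from typing import Mapping


def resolve_auth_mode(env: Mapping[str, str] = os.environ) -> str:
    """Decide how Gemini clients should authenticate (hypothetical helper)."""
    on_cloud_run = "K_SERVICE" in env
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").upper() == "TRUE"
    if on_cloud_run or use_vertex:
        # Application Default Credentials: the caller would obtain them
        # with google.auth.default() at this point.
        return "adc"
    if env.get("GOOGLE_API_KEY"):
        # Local development path: key loaded from .env (assumed variable name).
        return "api_key"
    raise RuntimeError("no credentials: set GOOGLE_API_KEY or enable Vertex AI")
```

Centralizing the decision in one function means every agent builds its client the same way, which is the restructuring the paragraph above describes.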

Accomplishments that we're proud of

1. True Multi-Agent Parallelism with Real Media Generation

This isn't a chain of sequential prompts pretending to be "agents." We built a genuine multi-agent system using ADK's ParallelAgent where story generation, Imagen 3 food photography, and Gemini TTS pronunciation audio all happen concurrently for each dish.

2. Five Modalities from One Photo

From a single menu photo input, we generate: text narratives, food photography images, pronunciation audio, allergy analysis, and interactive chat — five distinct modalities, all powered by Gemini.

3. Production-Grade Architecture

This isn't a hackathon demo that only works locally. It's deployed on Cloud Run with IAP authentication, OpenTelemetry tracing, BigQuery telemetry, CI/CD pipelines, Terraform infrastructure, and auto-scaling. The same architecture scales directly to production.

4. Google Search Grounding to Reduce Hallucinations

StoryAgent uses Google Search grounding so that cultural narratives are anchored in current web sources. We don't just trust the LLM's training data; we check it against real-time web data.

5. $0.013 Per Tour

With Gemini 3 Flash on Vertex AI, each full food tour (6 agents, multiple dishes) costs approximately $0.013. That's production-viable economics for a consumer app.

What we learned

About Google ADK

- ADK's agent types are composable. ParallelAgent nests inside Agent sub-agent trees, enabling complex workflows with minimal code.
- transfer_to_agent handles context automatically. ADK's transfer mechanism carries the full conversation context.
- Function tools are the key to reliable output. LLMs are creative; tools enforce structure. The combination is powerful.
- ADK Sessions enable stateful conversations. The Runner + InMemorySessionService pattern gives ChatAgent conversation memory for free.
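
What "in-memory session" means in practice can be shown with a tiny stand-in. The class below is not ADK's InMemorySessionService; `InMemorySessions` and its methods are hypothetical and only illustrate the idea of keeping chat turns per session id in process memory.

```python
from collections import defaultdict


class InMemorySessions:
    """Hypothetical stand-in: stores chat turns per session id in process memory."""

    def __init__(self) -> None:
        self._turns: dict[str, list[dict]] = defaultdict(list)

    def append(self, session_id: str, role: str, text: str) -> None:
        """Record one turn of the conversation for the given session."""
        self._turns[session_id].append({"role": role, "text": text})

    def history(self, session_id: str) -> list[dict]:
        """Return a copy of the turns so callers cannot mutate stored history."""
        return list(self._turns.get(session_id, []))


if __name__ == "__main__":
    sessions = InMemorySessions()
    sessions.append("tour-1", "user", "Is the gyoza vegetarian?")
    sessions.append("tour-1", "model", "Usually not; it often contains pork.")
    print(sessions.history("tour-1"))
```

Because the data lives in one process, it is lost on restart and not shared between Cloud Run instances, which is presumably why the backend deploy uses session affinity.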

About Multi-Agent Design

- Single responsibility per agent works. Each agent does one thing. StoryAgent doesn't generate images. VisionAgent doesn't tell stories. Debugging is straightforward.
- Parallelism is about UX, not just speed. Running agents in parallel with SSE streaming fundamentally changes the user experience. Results appear progressively.
- Shared configuration prevents drift. All agents read model from config.py. One change updates everything.
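
A shared config module along these lines would give every agent the same model name. This is a guess at the shape of config.py, not its actual contents: the `Settings` class is hypothetical, while the `MODEL` and `GOOGLE_CLOUD_LOCATION` variables and their default values are taken from the backend deploy command.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Immutable settings object that all agents import instead of hardcoding values."""

    model: str
    location: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Read settings from the environment, falling back to the deploy defaults."""
        return cls(
            model=env.get("MODEL", "gemini-3-flash-preview"),
            location=env.get("GOOGLE_CLOUD_LOCATION", "global"),
        )


if __name__ == "__main__":
    print(Settings.from_env())
```

Changing the `MODEL` environment variable on the service would then switch every agent at once, without code edits.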

About Google Cloud

- Service accounts eliminate API key management. google.auth.default() + Cloud Run metadata server = auto-rotating credentials.
- IAP is the easiest auth. Zero code changes: just enable IAP and users must sign in with Google.
- agent-starter-pack accelerates everything. The scaffolding gave us Terraform, CI/CD, telemetry, and deployment configs out of the box.

What's next for food-tour

Immediate Improvements

- AR overlay: point your camera at a menu and see dish photos overlaid on each item
- Dietary filtering: "show me only vegetarian dishes" as a natural language filter
- Offline mode with Gemini Nano: on-device text processing for areas with poor connectivity
- Social sharing: share your food tour as a visual story card

Production Features

- Session persistence with Cloud SQL: save and revisit past food tours
- Multi-language UI: the app itself in the traveler's language
- Custom domain with Cloud DNS and Cloud Armor (WAF + DDoS protection)
- Restaurant partnerships: restaurants can enhance their menus with AI-generated content