Inspiration

We've all been there — sitting in a restaurant in a foreign country, staring at a menu we can't read. Google Translate gives you a flat, literal translation that strips away everything that makes food exciting. "Tonkotsu Ramen" becomes "pork bone noodle soup." Accurate? Sure. But it tells you nothing about the 12-hour simmered broth born in Fukuoka's late-night street stalls, or how to actually pronounce what you're about to order.

We realized that eating abroad isn't a translation problem — it's a storytelling problem. Travelers don't just need words converted; they need a food tour guide in their pocket. Someone who explains the history, shows them what the dish looks like, and teaches them how to say it without embarrassing themselves.

When we saw that Google ADK supports multi-agent orchestration with ParallelAgent, the idea clicked: what if we built a team of specialized AI agents — one that reads the menu, one that tells the story, one that generates the photo, one that checks allergens, and one that teaches pronunciation — all working simultaneously? Not a single monolithic prompt, but a coordinated squad of experts, each doing what they do best, streaming results to the user the instant they're ready.

That's when the Menu-to-Food-Tour Converter was born.

What it does

The Menu-to-Food-Tour Converter transforms a restaurant menu photo into an immersive, multi-sensory culinary experience.

The user flow is dead simple:

  1. Snap or upload a photo of any restaurant menu (any language).
  2. Watch the magic happen — dish cards animate onto the screen in real time via SSE streaming, each one progressively filling with:
    • A cultural narrative explaining the dish's history and significance (grounded with Google Search)
    • An AI-generated photorealistic food photograph (Gemini image generation)
    • A pronunciation audio guide with native-language speech (Gemini TTS)
    • An allergy safety check with dietary flags
  3. Explore and learn — tap any dish to hear the pronunciation, read the story, check allergens, and chat with the AI about any question.

What happens behind the scenes

Six AI agents work together through Google ADK:

  • VisionAgent: reads every dish from the menu photo using Gemini's multimodal vision
  • StoryAgent: generates cultural narratives and English translations with Google Search grounding (to reduce hallucinations)
  • AllergyAgent: runs real-time allergen detection with dietary flag analysis (vegetarian, vegan, gluten-free, nut-free, etc.)
  • ImageAgent: creates professional food photography using Gemini image generation (Imagen 3)
  • AudioAgent: produces native-language pronunciation guides using Gemini TTS (gemini-2.5-flash-preview-tts)
  • ChatAgent: answers conversational questions about any dish, with ADK session memory

The story, image, and audio agents run in parallel for each dish via ADK's ParallelAgent. Results stream to the browser via Server-Sent Events: the fastest agent's output appears first, so nothing already finished is held back by the slowest agent.
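
To make the "fastest result first" behavior concrete, here is a minimal sketch that uses plain asyncio instead of ADK's ParallelAgent. The function names and the simulated delays are hypothetical stand-ins for real agent calls; this is an illustration, not the project's code.

```python
import asyncio


async def run_agent(name: str, dish: str, delay: float) -> dict:
    """Stand-in for one agent call; `delay` simulates model latency."""
    await asyncio.sleep(delay)
    return {"agent": name, "dish": dish}


async def stream_dish(dish: str, delays: dict[str, float]) -> list[dict]:
    """Start every agent at once and collect results in completion order."""
    tasks = [
        asyncio.create_task(run_agent(name, dish, delay))
        for name, delay in delays.items()
    ]
    events = []
    for finished in asyncio.as_completed(tasks):
        # In a real app, each result would be pushed to the browser here.
        events.append(await finished)
    return events


def completion_order(dish: str, delays: dict[str, float]) -> list[str]:
    """Convenience wrapper: names of the agents in the order they finished."""
    return [event["agent"] for event in asyncio.run(stream_dish(dish, delays))]
```

Because results are consumed with `asyncio.as_completed`, the order in which agents are declared does not matter; only their latency does.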

How we built it

Architecture

We built a modular multi-agent system using Google ADK (Agent Development Kit) in Python, scaffolded with agent-starter-pack:

[Figure: Menu to Food Tour architecture diagram]

Key Technical Decisions

  • Google ADK's ParallelAgent: Instead of writing custom async orchestration, we used ADK's native ParallelAgent to run story/image/audio generation concurrently. This gave us concurrent execution with clean agent isolation.
  • Google Search Grounding: StoryAgent narratives are grounded with real-time Google Search data, which keeps cultural facts anchored to sources and reduces hallucinations.
  • Structured Output with Pydantic: Every agent produces typed, structured data through function tools. The LLM calls the tool; the tool returns validated, predictable JSON.
  • SSE Real-Time Streaming: Results are pushed to the frontend via Server-Sent Events as soon as each agent finishes. The user typically sees stories appear, then images load, then audio become available, all progressively.
  • Zero API Keys in Production: The Cloud Run service account provides OAuth2 tokens automatically via google.auth.default(). No API keys to rotate or leak.
  • Resilient Model Fallback: ImageAgent races multiple Gemini image models to work around rate limits. All agents have retry logic with exponential backoff.
  • ADK Chat Sessions: ChatAgent uses ADK's Runner with InMemorySessionService for conversational memory. Users can ask follow-up questions about any dish.
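
The structured-output idea can be sketched as a function tool that validates its arguments before returning JSON. The project uses Pydantic; this sketch uses standard-library dataclasses so it runs without dependencies. The field names follow the allergy endpoint description (allergens, dietary flags, safety level), while the allowed safety levels and the tool name are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field

# Assumption: the real set of safety levels is not documented here.
ALLOWED_SAFETY_LEVELS = {"safe", "caution", "danger"}


@dataclass
class AllergyReport:
    dish: str
    allergens: list[str] = field(default_factory=list)
    dietary_flags: list[str] = field(default_factory=list)
    safety_level: str = "caution"

    def __post_init__(self) -> None:
        if not self.dish.strip():
            raise ValueError("dish must be non-empty")
        if self.safety_level not in ALLOWED_SAFETY_LEVELS:
            raise ValueError(f"unknown safety level: {self.safety_level!r}")
        # Normalize free-form model output: trim, lowercase, de-duplicate, sort.
        self.allergens = sorted(
            {a.strip().lower() for a in self.allergens if a.strip()}
        )


def report_allergens(
    dish: str, allergens: list[str], dietary_flags: list[str], safety_level: str
) -> str:
    """Hypothetical function tool: the LLM supplies arguments, the tool returns clean JSON."""
    report = AllergyReport(dish, allergens, dietary_flags, safety_level)
    return json.dumps(asdict(report), ensure_ascii=False)
```

The model stays free to phrase things however it likes; anything that reaches the frontend has already passed through the typed constructor.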

Tech Stack

Layer | Technology
Agent Framework | Google ADK (google-adk) with Agent, ParallelAgent
LLM | Gemini 3 Flash Preview (gemini-3-flash-preview) on Vertex AI
Image Generation | Gemini Image Generation (Imagen 3)
Text-to-Speech | Gemini TTS (gemini-2.5-flash-preview-tts)
Search Grounding | Google Search via Gemini
Backend | FastAPI + Uvicorn, Python 3.11
Frontend | Next.js 16 + React 19 + TypeScript + Tailwind CSS + Framer Motion
Deployment | Google Cloud Run (backend + frontend)
Auth | Identity-Aware Proxy (IAP)
Observability | OpenTelemetry + Cloud Trace + Cloud Logging + BigQuery
CI/CD | Cloud Build (PR checks, staging deploy, prod deploy)
Infrastructure | Terraform + agent-starter-pack
Package Manager | uv (Python), npm (Node.js)

Google Cloud Services Used

Service | Purpose
Cloud Run | Hosts both backend (FastAPI) and frontend (Next.js) as serverless containers
Vertex AI | Gemini 3 Flash model inference, image generation, TTS
Cloud Build | CI/CD pipeline: PR checks, staging deploy, production deploy with approval gates
Artifact Registry | Docker image storage
Cloud Trace | Distributed tracing across agent calls
Cloud Logging | Structured logging with GenAI log sinks
BigQuery | Telemetry data warehouse (completions, latency, costs)
Cloud Storage | Artifact storage (generated images/audio), log exports
IAP | Identity-Aware Proxy for Google-managed authentication
Secret Manager | Secure credential management

Cloud Deployment

Backend Deployment (Cloud Run)

cd my-food-tour

gcloud run deploy food-tour-backend \
  --source . \
  --region us-central1 \
  --memory 4Gi \
  --cpu 4 \
  --min-instances 1 \
  --max-instances 10 \
  --service-account food-tour-app@hackathon-demo.iam.gserviceaccount.com \
  --no-allow-unauthenticated \
  --no-cpu-throttling \
  --session-affinity \
  --set-env-vars "GOOGLE_GENAI_USE_VERTEXAI=TRUE,GOOGLE_CLOUD_LOCATION=global,MODEL=gemini-3-flash-preview"

  • Service Name: food-tour-backend
  • Region: us-central1
  • URL: https://food-tour-backend-demo.us-central1.run.app
  • Auth: IAP (Identity-Aware Proxy) — Google-managed login wall, no code changes needed
  • Dockerfile: Python 3.11 slim + uv package manager

Frontend Deployment (Cloud Run)

cd my-food-tour/frontend

gcloud run deploy food-tour-frontend \
  --source . \
  --region us-central1 \
  --memory 512Mi \
  --cpu 1 \
  --min-instances 0 \
  --max-instances 5 \
  --allow-unauthenticated \
  --set-env-vars "NEXT_PUBLIC_API_URL=$BACKEND_URL" \
  --set-build-env-vars "NEXT_PUBLIC_API_URL=$BACKEND_URL"

  • Service Name: food-tour-frontend
  • Region: us-central1
  • URL: https://food-tour-frontend-demo.us-central1.run.app
  • Dockerfile: Node 20 slim, Next.js standalone output
  • CORS: Backend updated with ALLOW_ORIGINS pointing to frontend URL

Infrastructure as Code

The project includes Terraform configs in deployment/terraform/ and Cloud Build pipelines in .cloudbuild/:

Pipeline: git push -> PR Checks -> Merge -> Staging Deploy -> Load Test -> Prod Deploy [APPROVAL]

Verified Endpoints

Endpoint | Status | Response
GET /health | 200 OK | {"status":"ok","agents":6,"model":"gemini-3-flash-preview"}
POST /api/story | 200 OK | Cultural narrative + translation + cuisine origin
POST /api/allergy-check | 200 OK | Allergens, dietary flags, safety level
POST /api/chat | 200 OK | Conversational response with session memory
POST /api/audio | 200 OK | Pronunciation audio (Gemini TTS)
POST /api/tour/json | 200 OK | Full SSE streaming pipeline
POST /feedback | 200 OK | Feedback collection (Cloud Logging)

Challenges we ran into

1. Getting ParallelAgent Context Right

ADK's ParallelAgent runs all sub-agents simultaneously, but each sub-agent needs to know which dish to process. We learned that ADK passes the full conversation history to each sub-agent, so the instructions needed to clearly tell each agent to focus on the most recently mentioned dish.
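
One way to apply that lesson is to append the same focus rule to every sub-agent's instruction, so no agent drifts back to an earlier dish in the shared history. The sketch below is illustrative only: the role prompts and the wording of the rule are hypothetical, not the project's actual prompts.

```python
# Hypothetical focus rule shared by all parallel sub-agents.
FOCUS_RULE = (
    "The conversation may mention several dishes. "
    "Work only on the most recently mentioned dish and ignore earlier ones."
)

# Hypothetical role prompts, one per parallel sub-agent.
ROLE_PROMPTS = {
    "StoryAgent": "Write a short cultural narrative for the dish.",
    "ImageAgent": "Describe and generate a photorealistic photo of the dish.",
    "AudioAgent": "Produce a native-language pronunciation guide for the dish name.",
}


def build_instruction(role_prompt: str) -> str:
    """Combine an agent's own role prompt with the shared focus rule."""
    return f"{role_prompt.strip()}\n\n{FOCUS_RULE}"


INSTRUCTIONS = {name: build_instruction(p) for name, p in ROLE_PROMPTS.items()}
```

Keeping the rule in one constant means a wording change reaches every sub-agent at once.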

2. Rate Limiting with Parallel Execution

Running 3 agents in parallel means 3 simultaneous Gemini API calls per dish. We solved this by:

  • Implementing resilient model fallback — ImageAgent races multiple Gemini image models
  • Auto-retry with exponential backoff on all agents
  • Using Vertex AI with service accounts (higher quotas than API key access)
  • Caching responses to avoid redundant calls
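
The retry behavior can be sketched as a small wrapper with exponential backoff and jitter. This is a generic illustration under assumptions: `RateLimitError` is a hypothetical stand-in for whatever quota error the model client raises, and the default delays are illustrative, not the values used in the project.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a 429 / quota error from the model API."""


def retry_with_backoff(
    call: Callable[[], T],
    *,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Call `call`, retrying on RateLimitError with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Scale the wait to 50-100% of `delay` so parallel agents
            # do not all retry at the same instant.
            sleep(delay * (0.5 + 0.5 * jitter()))
    raise AssertionError("unreachable")
```

`sleep` and `jitter` are injectable so the wrapper can be tested without real waiting.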

3. Gemini Image Generation in Preview

Gemini image generation is in Preview and can be unpredictable. We built a multi-model race strategy where ImageAgent tries multiple models and returns the first successful result. This dramatically improved reliability.
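
A race of this kind can be sketched with asyncio: start every candidate at once, return the first success, and cancel the rest. The model names and the fake model function below are hypothetical placeholders; the real ImageAgent calls Gemini image models instead.

```python
import asyncio
from typing import Any, Callable, Coroutine

ImageCall = Callable[[], Coroutine[Any, Any, bytes]]


async def race_first_success(candidates: dict[str, ImageCall]) -> tuple[str, bytes]:
    """Run all candidates concurrently; return (name, result) of the first success."""
    names = {asyncio.create_task(call()): name for name, call in candidates.items()}
    pending = set(names)
    errors: dict[str, BaseException] = {}
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                error = task.exception()
                if error is None:
                    return names[task], task.result()
                errors[names[task]] = error  # keep racing the others
        raise RuntimeError(f"all image models failed: {errors}")
    finally:
        for task in pending:  # stop calls whose result is no longer needed
            task.cancel()


async def _fake_model(delay: float, fail: bool) -> bytes:
    """Hypothetical model call: sleeps, then fails or returns placeholder bytes."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError("429 rate limited")
    return b"image-bytes"


def demo() -> tuple[str, bytes]:
    return asyncio.run(race_first_success({
        "model-a": lambda: _fake_model(0.01, fail=True),   # fastest, but fails
        "model-b": lambda: _fake_model(0.05, fail=False),  # first success
        "model-c": lambda: _fake_model(0.5, fail=False),   # cancelled
    }))
```

A failure from the fastest model does not end the race; only the first successful result does. The trade-off is extra quota use, since several models are called for the same image.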

4. SSE Streaming Through Cloud Run + IAP

Getting Server-Sent Events to work through IAP required specific headers (X-Accel-Buffering: no, Cache-Control: no-cache) and session affinity on Cloud Run to prevent connection drops during long-running tour generation.
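
For reference, the wire format and headers involved are small enough to sketch. The header values are the ones named above; the helper name and the event name are hypothetical. With FastAPI, a dictionary like this could be passed as the `headers` argument of a `StreamingResponse`.

```python
import json

# Response headers for an SSE endpoint behind a buffering proxy.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",   # do not cache the stream
    "X-Accel-Buffering": "no",     # ask proxies not to buffer chunks
}


def format_sse(event: str, payload: dict) -> str:
    """Encode one Server-Sent Event frame: event name, JSON data, blank line."""
    data = json.dumps(payload, ensure_ascii=False)
    return f"event: {event}\ndata: {data}\n\n"
```

The trailing blank line is what tells the browser's EventSource that the frame is complete.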

5. Zero API Keys Architecture

Moving from API key auth to service account auth required restructuring how all agents initialize their Gemini clients. We used google.auth.default(), which auto-detects the service account credentials on Cloud Run. For local development, where those credentials may be absent, the app falls back to API keys loaded from .env.
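
The fallback order can be sketched as follows. `google.auth.default()` is the real google-auth call and returns a `(credentials, project_id)` tuple; everything else is an assumption made for illustration, including the function names, the returned dictionary shape, and the use of a `GOOGLE_API_KEY` variable for the local key.

```python
import os
from typing import Any, Callable, Mapping


def _load_adc() -> tuple[Any, Any]:
    """Load Application Default Credentials (needs the google-auth package)."""
    import google.auth  # imported lazily so the fallback works without it

    return google.auth.default()


def resolve_auth(
    env: Mapping[str, str] = os.environ,
    load_adc: Callable[[], tuple[Any, Any]] = _load_adc,
) -> dict:
    """Prefer service-account credentials; fall back to an API key from the environment."""
    try:
        _credentials, project = load_adc()
    except Exception:
        # Broad on purpose for this sketch: a missing package or missing
        # credentials both mean "try the local API key instead".
        if env.get("GOOGLE_API_KEY"):
            return {"mode": "api_key"}
        raise RuntimeError("no service account credentials and no API key found")
    return {"mode": "adc", "project": project}
```

Making the credential loader injectable keeps the decision logic testable without touching real credentials.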

Accomplishments that we're proud of

1. True Multi-Agent Parallelism with Real Media Generation

This isn't a chain of sequential prompts pretending to be "agents." We built a genuine multi-agent system using ADK's ParallelAgent where story generation, Imagen 3 food photography, and Gemini TTS pronunciation audio all happen concurrently for each dish.

2. Five Outputs from One Photo

From a single menu photo input, we generate: text narratives, food photography images, pronunciation audio, allergy analysis, and interactive chat. That is five distinct kinds of output spanning text, image, and audio, all powered by Gemini.

3. Production-Grade Architecture

This isn't a hackathon demo that only works locally. It's deployed on Cloud Run with IAP authentication, OpenTelemetry tracing, BigQuery telemetry, CI/CD pipelines, Terraform infrastructure, and auto-scaling. The same architecture scales directly to production.

4. Google Search Grounding = Fewer Hallucinations

StoryAgent uses Google Search grounding to anchor every cultural narrative in real-time web data. We don't just trust the LLM's training data; we check it against what search returns, which greatly reduces the risk of invented facts.

5. $0.013 Per Tour

With Gemini 3 Flash on Vertex AI, each full food tour (6 agents, multiple dishes) costs approximately $0.013. That's production-viable economics for a consumer app.

What we learned

About Google ADK

  • ADK's agent types are composable. ParallelAgent nests inside Agent sub-agent trees, enabling complex workflows with minimal code.
  • transfer_to_agent handles context automatically. ADK's transfer mechanism carries the full conversation context.
  • Function tools are the key to reliable output. LLMs are creative; tools enforce structure. The combination is powerful.
  • ADK Sessions enable stateful conversations. The Runner + InMemorySessionService pattern gives ChatAgent conversation memory for free.
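
To show what in-memory session memory amounts to, here is a conceptual sketch in plain Python. It is not ADK's InMemorySessionService and does not mirror its API; the class and method names are hypothetical. It only illustrates the idea of per-session history that lives in process memory.

```python
from collections import defaultdict, deque


class InMemoryChatHistory:
    """Keeps the last `max_turns` messages per session, in process memory only."""

    def __init__(self, max_turns: int = 20) -> None:
        self._turns = defaultdict(lambda: deque(maxlen=max_turns))

    def add(self, session_id: str, role: str, text: str) -> None:
        """Record one message; the oldest message is dropped when the limit is hit."""
        self._turns[session_id].append({"role": role, "text": text})

    def history(self, session_id: str) -> list[dict]:
        """Return the stored turns for a session (empty list if unknown)."""
        return list(self._turns.get(session_id, ()))
```

Anything stored this way disappears when the instance restarts, so follow-up requests need to reach the same instance that holds the session.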

About Multi-Agent Design

  • Single responsibility per agent works. Each agent does one thing. StoryAgent doesn't generate images. VisionAgent doesn't tell stories. Debugging is straightforward.
  • Parallelism is about UX, not just speed. Running agents in parallel with SSE streaming fundamentally changes the user experience. Results appear progressively.
  • Shared configuration prevents drift. All agents read model from config.py. One change updates everything.
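
A shared config module along these lines might look like the sketch below. The `MODEL` and `GOOGLE_CLOUD_LOCATION` variable names and their values come from the backend deploy command; the `Settings` class and `load_settings` function are hypothetical names, not the contents of the project's config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    model: str
    location: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read shared settings once so every agent uses the same model name."""
    return Settings(
        model=env.get("MODEL", "gemini-3-flash-preview"),
        location=env.get("GOOGLE_CLOUD_LOCATION", "global"),
    )
```

Each agent would import the resulting settings object instead of hard-coding a model string, so switching models is a one-line change or a single environment variable.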

About Google Cloud

  • Service accounts eliminate API key management. google.auth.default() + Cloud Run metadata server = auto-rotating credentials.
  • IAP is the easiest auth. Zero code changes — just enable IAP and users must sign in with Google.
  • agent-starter-pack accelerates everything. The scaffolding gave us Terraform, CI/CD, telemetry, and deployment configs out of the box.

What's next for food-tour

Immediate Improvements

  • AR overlay — point your camera at a menu and see dish photos overlaid on each item
  • Dietary filtering — "show me only vegetarian dishes" as a natural language filter
  • Offline mode with Gemini Nano — on-device text processing for areas with poor connectivity
  • Social sharing — share your food tour as a visual story card

Production Features

  • Session persistence with Cloud SQL — save and revisit past food tours
  • Multi-language UI — the app itself in the traveler's language
  • Custom domain with Cloud DNS and Cloud Armor (WAF + DDoS protection)
  • Restaurant partnerships — restaurants can enhance their menus with AI-generated content

Built With

  • agent-starter-pack
  • artifactregistry
  • bigquery
  • cloudbuild
  • cloudlogging
  • cloudrun
  • cloudstorage
  • cloudtrace
  • docker
  • fastapi
  • gemini-2.5-flash-preview-tts
  • gemini-3-flash-preview
  • google-adk
  • imagen
  • next.js
  • oauth2
  • opentelemetry
  • pydantic
  • python
  • react
  • secretmanager
  • server-sent-event
  • tailwind
  • terraform
  • typescript
  • uvicorn
  • vertexai