Inspiration

Food logging has been stuck in the spreadsheet era. Search a database, pick an entry, estimate a portion, repeat. It reduces one of the most shared human experiences - eating together - to a solo data-entry chore. Most people quit within two weeks.

We didn't want to build a better calorie counter. We wanted to build a live food log - something that captures meals as they happen, sees what you see, hears what you say, and works whether you're eating alone or with your whole family.

Gemini's native audio model made this possible for the first time: a single model processes camera frames and voice simultaneously, in real time, and calls structured tools to log what you're eating. No typing, no searching, no after-the-fact data entry. Just point, talk, eat.

What it does

FoodLog is a multimodal food logging app powered by a Gemini Live agent called Chef.

The core flow:

  1. Sign in with Google, tap the capture button, and point your camera at your meal
  2. Describe what you're eating in natural speech - "That's a chicken Caesar salad with extra parmesan"
  3. Chef sees the food through your camera and hears your description simultaneously
  4. When Chef has enough context, it calls the log_food_event tool to create a structured food entry with identified items, meal type, estimated nutrition, and optional location
  5. The food event appears in your Track Log with a summary, nutrition card, and timestamp
  6. Events persist in Firestore, scoped to your authenticated account

Five ADK tools:

  • log_food_event - log a meal from what the agent sees and hears
  • get_daily_summary - answer "What did I eat today?" from Firestore data
  • edit_food_event - voice corrections ("remove the fries", "replace white rice with brown")
  • lookup_nutrition - USDA FoodData Central lookup for grounded calorie/macro data
  • lookup_location - Google Places nearby search for restaurant identification

Two distinct personas:

  • Chef Dee - direct capture, protein-focused kitchen commander. Fast, one question max.
  • Chef Jay - guided capture, curious sous chef. Step-by-step, asks about prep methods.

Beyond the basics:

  • Interruption handling - speak over Chef mid-response and the system cancels server-side generation, then resumes listening
  • Session context - today's meal history is injected at session start so Chef knows what you've already eaten
  • Nutritional coaching - a one-liner insight after every logged meal
  • Image-only nudge - if you point the camera with no audio for 5 seconds, Chef prompts you
  • Text input + quick action chips - type during live sessions or tap quick actions
  • Voice-only capture - works without a camera
  • Offline capture - if the server is unreachable, audio and camera frames are bundled locally and batch-processed when connectivity returns
  • Adaptive image throttle - camera frame rate adjusts based on network latency (1-5s between frames with backoff/recovery)
  • Content safety - a regex-based content safety filter runs on all Gemini responses before they reach the user, backed by Gemini's built-in safety settings
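
The adaptive image throttle above can be sketched as a small state machine. The client is written in Dart; this sketch uses Python only to show the logic. The 1-5 second range comes from the description above, while the latency threshold, backoff factor, and recovery step are assumed values.

```python
class AdaptiveFrameThrottle:
    """Hypothetical throttle: widen the gap between frames when latency is high."""

    def __init__(
        self,
        min_interval: float = 1.0,    # seconds between frames on a good network
        max_interval: float = 5.0,    # upper bound under sustained high latency
        slow_threshold: float = 0.5,  # assumed: latency above this counts as slow
        backoff: float = 2.0,         # assumed: multiply the interval when slow
        recovery: float = 0.5,        # assumed: subtract this many seconds when fast
    ) -> None:
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.slow_threshold = slow_threshold
        self.backoff = backoff
        self.recovery = recovery
        self.interval = min_interval

    def record_latency(self, latency_s: float) -> float:
        """Update and return the interval after one latency measurement."""
        if latency_s > self.slow_threshold:
            # Back off quickly so a congested link is not flooded with frames.
            self.interval = min(self.max_interval, self.interval * self.backoff)
        else:
            # Recover gradually to avoid oscillating on a flaky connection.
            self.interval = max(self.min_interval, self.interval - self.recovery)
        return self.interval
```

Multiplicative backoff with additive recovery is one common choice here: it reacts fast to congestion and returns to the normal frame rate cautiously.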

How we built it

Architecture: Thin Client, Smart Backend

FoodLog follows a strict separation: the client is "eyes and ears" (camera, microphone, speaker, UI), and all intelligence lives on the server.

Flutter/Dart App (iOS / Web)
    <-> WebSocket (bidirectional)
Cloud Run - FastAPI + ADK Agent ("Chef")
    <-> run_live() streaming
Gemini 2.5 Flash Native Audio
    -> 5 function tool calls
    -> Firestore persistence
    -> USDA + Places grounding

Client - Built with Flutter/Dart for iOS and web. Five screens: Launch (Google Sign-In), Capture (live session with camera + audio streaming), Track Log (event feed), Event Detail (nutrition card + image preview), and Settings. The capture orchestrator manages the full session lifecycle: session creation, WebSocket relay connection, camera/audio streaming, timer, and cleanup. 599 Flutter tests.

Backend - A FastAPI server on Cloud Run with a WebSocket relay that bridges the client to Gemini via Google ADK's Runner.run_live(). The relay handles auth verification, frame size guards, session timeouts, rate limiting, content safety filtering, and structured telemetry. Five function tools are registered with the ADK Agent, producing structured FoodEvent objects with grounding from USDA and Google Places.
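
As an illustration of the content safety step in the relay, a regex-based filter can be as small as the sketch below. The patterns and the fallback message are placeholders invented for this example; the project's actual rules are not shown in this write-up.

```python
import re

# Placeholder patterns (assumptions): a real deployment would maintain its own list.
BLOCKED_PATTERNS = [
    re.compile(r"\b(?:starve yourself|stop eating entirely)\b", re.IGNORECASE),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # looks like a US social security number
]

FALLBACK_MESSAGE = "Sorry, I can't help with that. Let's get back to your meal."


def filter_response(text: str) -> str:
    """Return the text unchanged, or a safe fallback if any pattern matches."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            return FALLBACK_MESSAGE
    return text
```

Running a filter like this in the relay, after the model's own safety settings, means a blocked response never leaves the server.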

Persistence - Firestore with user-scoped collections (users/{uid}/events/{event_id}). Firebase Auth with Google Sign-In provides token verification on all HTTP and WebSocket endpoints. Security rules restrict data access to the authenticated user.

Key Technical Decisions:

Decision | Choice | Why
Agent framework | Google ADK | Native run_live() for bidirectional Gemini streaming; function tool auto-schema
Model | gemini-2.5-flash-native-audio-preview-12-2025 | Simultaneous vision + native audio in one model
Client framework | Flutter/Dart | Cross-platform (iOS + web) with platform camera/mic APIs
ASGI server | Uvicorn | Stable WebSocket support for streaming on Cloud Run
Testing approach | Protocol-based fakes | Every external dependency has a Protocol seam; fakes over mocks
Type checker | ty (Rust) | Zero # type: ignore in source; fast feedback

Infrastructure as Code:

The entire GCP environment is provisioned and deployed via scripts - no manual console clicks required. A multi-stage Dockerfile builds a minimal production image running as a non-root user. The GitHub Actions CI/CD pipeline (deploy-backend.yml) automates the full release cycle: build Docker image with BuildKit caching, push to Artifact Registry, deploy to Cloud Run, deploy Firestore security rules, and run automated smoke tests (health check, auth enforcement, CORS validation, Swagger lockdown). Eleven infrastructure scripts (scripts/) provision the full environment from scratch: service account creation with minimal IAM roles, log-based metrics and uptime checks, daily Firestore backups with 30-day GCS lifecycle policies, container vulnerability scanning, Cloud Armor WAF rules, API key restrictions, and Secret Manager validation. A complete self-hosting guide (docs/self-hosting.md) takes a reader from git clone to a running instance on their own GCP project.

Testing & Quality:

We enforced strict quality gates from day one:

  • 1,357 backend tests across 3,052 statements with 100% line coverage enforced - no exceptions, no pragmas
  • 599 Flutter tests covering all screens, capture orchestrator, and service layers
  • Zero # type: ignore in source code
  • Protocol-based fakes instead of unittest.mock - every external service (Firestore, Firebase Auth, Gemini, USDA, Places) has a Protocol interface with a test double
  • Pre-push hooks run the full quality gate; CI mirrors the same checks
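
To make the Protocol-based approach concrete, here is a minimal sketch of one seam and its fake. EventStore, InMemoryEventStore, and daily_calories are hypothetical names for this example; only the users/{uid}/events/{event_id} path layout is taken from the persistence description.

```python
from typing import Protocol


class EventStore(Protocol):
    """Seam for persistence; a Firestore-backed class would satisfy this too."""

    def save(self, uid: str, event_id: str, event: dict) -> None: ...

    def list_events(self, uid: str) -> list[dict]: ...


class InMemoryEventStore:
    """Test double that mirrors the users/{uid}/events/{event_id} layout."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def save(self, uid: str, event_id: str, event: dict) -> None:
        self._docs[f"users/{uid}/events/{event_id}"] = event

    def list_events(self, uid: str) -> list[dict]:
        prefix = f"users/{uid}/events/"
        return [doc for path, doc in self._docs.items() if path.startswith(prefix)]


def daily_calories(store: EventStore, uid: str) -> float:
    """Business logic depends only on the Protocol, never on Firestore."""
    return sum(event.get("calories", 0.0) for event in store.list_events(uid))
```

Because the fake is a real object with real behavior, a test asserts on outcomes (what was stored, what was returned) rather than on which methods were called.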

Google Cloud Services Used:

Service | How it's used
Cloud Run | Hosts the FastAPI + ADK agent backend. WebSocket relay for Gemini Live streaming.
Firestore | User-scoped event persistence (users/{uid}/events). Composite index for created_at descending. Security rules enforce authenticated access.
Firebase Auth | Google Sign-In on the client. Server-side token verification via firebase_admin SDK. All endpoints require valid tokens.
Firebase Storage | Meal photo uploads from capture sessions.
Gemini Live API | Real-time bidirectional audio + vision streaming via ADK run_live(). Native audio model for voice interaction. Function calling for structured food event creation.
Secret Manager | Stores gemini-api-key and firebase-client-config for Cloud Run deployment.
Artifact Registry | Docker image storage for the foodlog-api container.
Cloud Trace | Distributed tracing via OpenTelemetry exporter for request-level latency visibility.

Challenges we ran into

Bidirectional streaming is hard. The WebSocket relay between the Flutter client and Gemini Live needs to handle concurrent send/receive loops, client interruptions, session timeouts, and graceful cleanup - all asynchronously. Getting the asyncio.TaskGroup lifecycle right (cancel upstream on downstream error, re-raise exceptions properly) took significant iteration.

Audio format negotiation. Gemini's native audio model is particular about input format. We had to add audio configuration negotiation, minimum frame size validation, and MIME type constants to prevent silent audio processing failures.
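
A guard of this kind might look like the sketch below. The Live API documents 16-bit PCM at 16 kHz for input audio; the 20 ms minimum frame length and the function name are assumptions for illustration, not the project's actual constants.

```python
INPUT_AUDIO_MIME = "audio/pcm;rate=16000"  # 16-bit PCM, 16 kHz input
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit samples
MIN_FRAME_MS = 20     # assumed minimum frame duration

# 16,000 samples/s * 2 bytes * 20 ms / 1000 = 640 bytes
MIN_FRAME_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * MIN_FRAME_MS // 1000


def validate_audio_frame(mime_type: str, payload: bytes) -> None:
    """Raise ValueError instead of forwarding audio the model would ignore."""
    if mime_type != INPUT_AUDIO_MIME:
        raise ValueError(f"unsupported audio MIME type: {mime_type!r}")
    if len(payload) < MIN_FRAME_BYTES:
        raise ValueError(f"audio frame too short: {len(payload)} bytes")
    if len(payload) % BYTES_PER_SAMPLE != 0:
        raise ValueError("audio frame is not aligned to 16-bit samples")
```

Failing loudly at the relay turns a silent downstream failure into an error the client can log and act on.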

Offline resilience. When the server is unreachable during a capture, we needed a clean fallback: save audio + camera frames + metadata as a local bundle, then batch-process through Gemini's generate_content() API when connectivity returns. The BundleWriter -> BundleReader -> SyncService -> GeminiBatchClient pipeline handles this.
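
The write and read ends of such a bundle can be sketched as follows. The on-device code is Dart; this Python version only illustrates the idea. The file names, manifest fields, and function names are assumptions and do not describe the real BundleWriter or BundleReader.

```python
import json
from pathlib import Path


def write_bundle(
    root: Path, session_id: str, audio: bytes, frames: list[bytes], metadata: dict
) -> Path:
    """Persist one offline capture as a directory of raw files plus a manifest."""
    bundle = root / session_id
    bundle.mkdir(parents=True)
    (bundle / "audio.pcm").write_bytes(audio)
    for index, frame in enumerate(frames):
        (bundle / f"frame_{index:04d}.jpg").write_bytes(frame)
    manifest = {
        "session_id": session_id,
        "frame_count": len(frames),
        "metadata": metadata,
    }
    # Write the manifest last: a bundle without one is known to be incomplete.
    (bundle / "manifest.json").write_text(json.dumps(manifest))
    return bundle


def read_bundle(bundle: Path) -> dict:
    """Load a bundle back into memory for batch processing."""
    manifest = json.loads((bundle / "manifest.json").read_text())
    frames = [
        (bundle / f"frame_{index:04d}.jpg").read_bytes()
        for index in range(manifest["frame_count"])
    ]
    return {
        "audio": (bundle / "audio.pcm").read_bytes(),
        "frames": frames,
        "metadata": manifest["metadata"],
    }
```

A sync step could then scan the root directory for bundles that have a manifest, submit each one for batch processing, and delete it once the resulting food event is stored.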

100% coverage with Protocol fakes. Maintaining 100% line coverage without mocks required designing every external integration as a Protocol interface from the start. This paid off in test reliability (no flaky mock assertions) but demanded discipline - every new feature needs its fake.

Grounding LLM outputs. Gemini occasionally hallucinates implausible nutrition data - negative calories, macro values that exceed total calories, or wildly long food names. We added domain-level sanity checks (calorie bounds, macro cross-checks, item count limits, name length validation) that catch and reject these before they reach the user.
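
A minimal sketch of checks along these lines. The limits and the tolerance are assumed values, not the project's thresholds; the macro cross-check uses the standard 4/4/9 kcal-per-gram factors for protein, carbohydrate, and fat.

```python
MAX_CALORIES = 5000.0   # assumed upper bound for a single meal
MAX_ITEMS = 20          # assumed item count limit
MAX_NAME_LENGTH = 80    # assumed name length limit
MACRO_TOLERANCE = 1.2   # assumed: macros may exceed stated calories by 20%


def validate_nutrition(
    items: list[str],
    calories: float,
    protein_g: float,
    carbs_g: float,
    fat_g: float,
) -> list[str]:
    """Return a list of problems; an empty list means the estimate is plausible."""
    errors: list[str] = []
    if not 0 <= calories <= MAX_CALORIES:
        errors.append("calories out of bounds")
    if min(protein_g, carbs_g, fat_g) < 0:
        errors.append("negative macro value")
    # Energy implied by the macros: 4 kcal/g protein, 4 kcal/g carbs, 9 kcal/g fat.
    macro_kcal = 4 * protein_g + 4 * carbs_g + 9 * fat_g
    if macro_kcal > calories * MACRO_TOLERANCE:
        errors.append("macros exceed total calories")
    if not 1 <= len(items) <= MAX_ITEMS:
        errors.append("item count out of range")
    if any(len(name) > MAX_NAME_LENGTH for name in items):
        errors.append("item name too long")
    return errors
```

For example, 30 g protein, 20 g carbs, and 25 g fat imply 4*30 + 4*20 + 9*25 = 425 kcal, which is consistent with a stated 450 kcal, so that estimate passes.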

Accomplishments that we're proud of

  • The capture experience feels magical. Point, talk, done. Chef's voice persona is warm and concise - it confirms what it sees without repeating back the full item list.

  • 1,956 tests (1,357 backend + 599 Flutter) with 100% backend coverage enforced and zero type safety escapes. Mutation testing revealed 19 test gaps that had survived despite full coverage, and all were fixed.

  • Clean architecture with Protocol-based seams everywhere. Swapping Firestore for local JSON, or Firebase Auth for a test double, is a one-line change. No import-time side effects, no global state.

  • Fully automated cloud deployment. A single scripts/deploy-backend.sh production command builds, deploys, and health-checks the entire backend. Eleven infrastructure scripts provision the full GCP environment from scratch, and a complete self-hosting guide takes anyone from git clone to a running instance on their own GCP project.

  • Content safety wired end-to-end. Gemini responses pass through both the model's built-in safety settings and a regex-based content filter before reaching the user.

  • Grounding validation on every tool call. Domain-level sanity checks reject implausible outputs - negative calories, absurd macro values, excessive item counts, and overlong names. The model can hallucinate; the pipeline catches it.

What we learned

  • ADK's run_live() is powerful but opaque. The streaming event model requires careful handling of partial responses, tool call lifecycles, and turn completion signals. Documentation is sparse for the live streaming path.

  • Protocol-based testing scales. What started as architectural purism became a productivity multiplier. Adding the offline capture pipeline went fast because every seam was already testable.

  • Gemini's multimodal understanding is genuinely impressive. The model correctly identifies foods from camera frames even with poor lighting, partial views, and overlapping items. Combined with the user's verbal description, accuracy is high.

  • Cloud Run + WebSocket works well for real-time agent sessions, but you need to handle connection lifecycle carefully (timeouts, reconnection, session state).

What's next for FoodLog Live

The Food Feed

Calorie counting is the starting point, not the destination. The real vision is the Food Feed - a social, real-time food experience built on shared Gemini Live sessions.

Shared Meals - A family sits down for dinner. Each person opens FoodLog and joins the same shared session. Chef sees the table from multiple camera angles simultaneously - Dad's POV of the pasta, Mom's close-up of the salad, the kid's view of their plate. Chef correlates all perspectives, identifies what each person is eating, and logs individual food events for everyone. One meal, multiple viewpoints, personalized nutrition. The data model already supports this: every FoodEvent has a shared_session_id and guests field designed for multi-user capture sessions.

Live Cooking Shows - A chef streams a cooking session. Followers watch the Food Feed in real time and see Chef's running commentary: identifying ingredients as they're added, estimating nutrition for the finished dish, logging recipe steps. Think "watch party" for food content. Viewers get structured food data from content they're already watching - no manual entry.

Smarter Agent

  • Meal pattern recognition - Chef learns your habits ("You usually have oatmeal on weekdays") and pre-fills likely items, reducing the conversation to a quick confirmation
  • Dietary goal tracking - Set a protein target or calorie budget and Chef coaches in real time ("You're 30g short on protein today - this chicken bowl will get you there")
  • Allergy and restriction awareness - Chef flags ingredients that conflict with stored dietary restrictions before logging
  • Multi-turn recipe capture - Follow along as you cook, logging ingredients step by step, and produce a complete recipe card with nutrition at the end
  • Restaurant menu integration - When Chef identifies a restaurant via Google Places, pull the menu and cross-reference what the camera sees with known dishes for more accurate nutrition

Platform Expansion

  • Android build - Flutter supports Android natively; iOS and web are working, Android packaging is next
  • Wearable companion - A Wear OS / watchOS app that handles voice-only capture from your wrist when pulling out your phone isn't practical
  • Smart display mode - A kitchen counter view (Nest Hub, tablet) that passively watches meal prep and logs what you're cooking hands-free

Data and Insights

  • Daily and weekly summaries - Aggregate nutrition across meals to show trends, streaks, and macro balance over time
  • USDA enrichment during live captures - The USDA FoodData Central integration is wired for batch processing; enriching live sessions would further improve accuracy
  • Export and interoperability - Export food logs to Apple Health, Google Fit, or CSV for use with dietitians and health apps
  • Photo timeline - A visual gallery of every meal, searchable by date, restaurant, food item, or nutrition range

Social Features

  • Meal sharing - Share a food event card to social media or messaging apps with photo, items, and nutrition summary
  • Household accounts - A family plan where one subscription covers multiple users who can see each other's shared meals
  • Dietitian collaboration - A read-only view that lets a registered dietitian review your food log and leave feedback directly in the app
