Inspiration

Tracking what you eat is tedious: manual entry, guessing portions, and generic apps that don’t understand your meal. We wanted a nutrition assistant that feels natural — show your food or describe it in your own words and get real-time, context-aware advice. We built Nutrition Assistant so logging food is as simple as taking a photo or having a quick voice conversation, with AI doing the heavy lifting.

What it does

  • Photo-based food analysis: Point your camera at a meal. The app detects foods, estimates portions, and returns calories, protein, carbs, and fats, plus a short nutrition summary.
  • Voice conversation: Talk to the assistant in real time. Describe what you ate or ask questions; you get spoken and text responses powered by Gemini Live, with low latency.
  • Recipe generation: From a food photo, get a full recipe (ingredients, steps, notes) tailored to servings and preferences, with optional nutrition estimates.
  • Nutrition logging: Save analyses to your history so you can track intake over time.
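For the curious, the macro math behind the summary is simple: stated calories can be cross-checked against macros using the standard 4/4/9 kcal-per-gram factors. A sketch of that sanity check (field names are illustrative, not our actual response schema):

```typescript
// Hypothetical shape of one analyzed meal; the app's real schema may differ.
interface MealEstimate {
  calories: number; // kcal
  proteinG: number;
  carbsG: number;
  fatG: number;
}

// Standard Atwater factors: 4 kcal/g protein, 4 kcal/g carbs, 9 kcal/g fat.
function caloriesFromMacros(m: MealEstimate): number {
  return m.proteinG * 4 + m.carbsG * 4 + m.fatG * 9;
}

// Flag model outputs whose stated calories disagree wildly with their macros.
function isPlausible(m: MealEstimate, tolerance = 0.25): boolean {
  const implied = caloriesFromMacros(m);
  if (implied === 0) return m.calories === 0;
  return Math.abs(m.calories - implied) / implied <= tolerance;
}
```

A check like this catches the occasional estimate where the model's calorie number and its macro breakdown don't add up.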

How we built it

  • Mobile (Expo): React Native 0.83, Expo SDK 55, file-based routing, NativeWind for styling, Zustand for state. Camera screen for food photos; dedicated voice screen with chunked audio streaming and real-time playback of AI speech.
  • Backend: NestJS 11 on Fastify, Prisma 6, PostgreSQL 16. REST APIs for auth, meal analysis, recipe generation, and logging; WebSocket for the live voice session with Gemini.
  • AI: Google Gemini (vision for images, Live API for real-time voice). One model family handles both modes: image → dish detection + nutrition estimate; voice → turn-by-turn conversation with text and audio out.
  • Infrastructure: Cloud Run for the API, Cloud SQL (PostgreSQL), Cloud Storage for images and audio. CI/CD with GitHub Actions and Turborepo; EAS for mobile builds.

Challenges we ran into

  • Gemini Live integration: Wiring bidirectional audio (streaming user audio in, playing model audio out) with chunking and WebSocket backpressure took careful work: we had to align chunk sizes, handle reconnects, and manage session lifecycle.
  • Structured output from vision: Getting consistent JSON for food detection and nutrition from the vision model required clear prompts, schema validation (Zod), and fallbacks when the model returned free text.
  • Fusing user context with the image: Making the assistant feel truly personal meant combining what we know about the user (goals, history, preferences) with each photo they send — so advice and recipes aren’t generic but tailored. Getting that context into the right prompts and keeping vision + user data in sync was a key challenge.
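To make the free-text fallback concrete: when the model answers in prose or wraps its JSON in a markdown fence, we first recover the JSON substring, then validate its shape. A simplified sketch (plain type guards standing in for our actual Zod schemas; field names are illustrative):

```typescript
// Expected shape of the vision model's nutrition output (illustrative).
interface FoodAnalysis {
  dish: string;
  calories: number;
}

// Models sometimes wrap JSON in markdown fences or surrounding prose;
// try to recover the JSON object before parsing.
function extractJson(raw: string): string | null {
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : raw;
  const start = candidate.indexOf("{");
  const end = candidate.lastIndexOf("}");
  return start >= 0 && end > start ? candidate.slice(start, end + 1) : null;
}

// Minimal structural check; the real app validates with a Zod schema instead.
function parseAnalysis(raw: string): FoodAnalysis | null {
  const json = extractJson(raw);
  if (!json) return null;
  try {
    const obj = JSON.parse(json);
    if (typeof obj.dish === "string" && typeof obj.calories === "number") {
      return obj as FoodAnalysis;
    }
  } catch {
    /* malformed JSON: fall through to null */
  }
  return null;
}
```

When parsing fails, we retry with a stricter prompt rather than show the user a broken result.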

Accomplishments that we're proud of

  • Multimodal input in one app: The same backend serves both “photo of my plate” and “voice conversation about my meal” with a clean, consistent experience.
  • User profile + current session in one flow: We combine the user profile (built from previous discussions and choices) with the current conversation and image in a single multimodal context — so every response is informed by both who they are and what they’re showing or saying right now.
  • Real-time voice with Gemini Live: Low-latency voice-in, text + audio-out conversation that feels responsive and is usable for quick logging and questions.
  • Production-ready backend: Auth, tokens, Prisma migrations, Cloud Run deploy, and health checks so the app can scale beyond a demo.
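Conceptually, that profile + session fusion is careful context assembly before each model call. A stripped-down sketch of the idea (field names hypothetical; the real app attaches this text alongside the image as multimodal content):

```typescript
// Hypothetical user profile accumulated from past sessions and choices.
interface UserProfile {
  goal: string;            // e.g. "cut to 2,000 kcal/day"
  restrictions: string[];  // e.g. ["vegetarian"]
}

interface Turn {
  role: "user" | "assistant";
  text: string;
}

// Fold the profile, recent conversation, and the current request into one
// prompt so every response reflects both who the user is and what they
// are showing or saying right now.
function buildContext(profile: UserProfile, history: Turn[], current: string): string {
  const lines = [
    `User goal: ${profile.goal}`,
    `Restrictions: ${profile.restrictions.join(", ") || "none"}`,
    "Recent conversation:",
    ...history.slice(-6).map((t) => `${t.role}: ${t.text}`),
    `Current request: ${current}`,
  ];
  return lines.join("\n");
}
```

Keeping this assembly in one place made it easy to feed the same context to both the vision path and the live voice session.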

What we learned

  • How to integrate Gemini’s multimodal APIs (vision + Live) in a single product and when to use REST vs WebSocket.
  • Practical patterns for streaming audio (chunking, playback queue, session teardown) in a React Native app.
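The playback-queue pattern we landed on boils down to: buffer incoming AI audio chunks and drain them one at a time so playback never overlaps. A minimal sketch, with the native audio player abstracted as an async play callback:

```typescript
// Serialize playback of incoming AI audio chunks: chunks can arrive faster
// than they play, so queue them and drain sequentially.
class PlaybackQueue {
  private queue: Uint8Array[] = [];
  private draining = false;

  constructor(private play: (chunk: Uint8Array) => Promise<void>) {}

  enqueue(chunk: Uint8Array): void {
    this.queue.push(chunk);
    void this.drain();
  }

  private async drain(): Promise<void> {
    if (this.draining) return; // only one drain loop at a time
    this.draining = true;
    while (this.queue.length > 0) {
      const chunk = this.queue.shift()!;
      await this.play(chunk); // real app: hand the chunk to the audio player
    }
    this.draining = false;
  }

  // Session teardown: drop anything not yet played.
  clear(): void {
    this.queue = [];
  }
}
```

The same structure handles teardown cleanly: on disconnect we call `clear()` so stale audio from a dead session never plays.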

What's next for nutrition-assistant

  • Richer nutrition history and simple trends (e.g. weekly summaries, macro breakdowns).
  • More personalization: dietary goals, restrictions, and preferences feeding into analysis and recipes.
  • Optional social or sharing (e.g. share a recipe or a day’s log) and better offline support for logging.

Built With

expo, fastify, google-gemini, nestjs, postgresql, prisma, react-native, zod