Inspiration
Tracking what you eat is tedious: manual entry, guessing portions, and generic apps that don’t understand your meal. We wanted a nutrition assistant that feels natural — show your food or describe it in your own words and get real-time, context-aware advice. We built Nutrition Assistant so logging food is as simple as taking a photo or having a quick voice conversation, with AI doing the heavy lifting.
What it does
- Photo-based food analysis: Point your camera at a meal. The app detects foods, estimates portions, and returns calories, protein, carbs, and fats, plus a short nutrition summary.
- Voice conversation: Talk to the assistant in real time. Describe what you ate or ask questions; you get spoken and text responses powered by Gemini Live, with low latency.
- Recipe generation: From a food photo, get a full recipe (ingredients, steps, notes) tailored to servings and preferences, with optional nutrition estimates.
- Nutrition logging: Save analyses to your history so you can track intake over time.
How we built it
- Mobile (Expo): React Native 0.83, Expo SDK 55, file-based routing, NativeWind for styling, Zustand for state. Camera screen for food photos; dedicated voice screen with chunked audio streaming and real-time playback of AI speech.
- Backend: NestJS 11 on Fastify, Prisma 6, PostgreSQL 16. REST APIs for auth, meal analysis, recipe generation, and logging; WebSocket for the live voice session with Gemini.
- AI: Google Gemini (vision for images, Live API for real-time voice). One model handles: image → dish detection + nutrition estimate; voice → turn-by-turn conversation with text and audio out.
- Infrastructure: Cloud Run for the API, Cloud SQL (PostgreSQL), Cloud Storage for images and audio. CI/CD with GitHub Actions and Turborepo; EAS for mobile builds.
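The chunked audio streaming mentioned above can be sketched as a simple fixed-size splitter run before each WebSocket send. This is an illustrative sketch, not our production code: the chunk size shown (100 ms of 16 kHz 16-bit mono PCM) is an assumption for the example, not a requirement of the Gemini Live API.

```typescript
// Illustrative chunk size: 100 ms of 16 kHz, 16-bit mono PCM
// = 16000 samples/s * 0.1 s * 2 bytes = 3200 bytes.
const CHUNK_BYTES = 3200;

// Split a recorded audio buffer into fixed-size chunks for streaming.
// The final chunk may be shorter; the receiver must tolerate that.
function chunkAudio(buffer: Uint8Array, size: number = CHUNK_BYTES): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < buffer.length; offset += size) {
    chunks.push(buffer.subarray(offset, Math.min(offset + size, buffer.length)));
  }
  return chunks;
}
```

Keeping the chunk size constant on both ends is what makes backpressure and playback timing predictable.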
Challenges we ran into
- Gemini Live integration: Wiring bidirectional audio (streaming user audio in, playing model audio out) with chunking and WebSocket backpressure required careful handling; we had to align chunk sizes, survive reconnects, and manage the session lifecycle.
- Structured output from vision: Getting consistent JSON for food detection and nutrition from the vision model required clear prompts, schema validation (Zod), and fallbacks when the model returned free text.
- Fusing user context with the image: Making the assistant feel truly personal meant combining what we know about the user (goals, history, preferences) with each photo they send — so advice and recipes aren’t generic but tailored. Getting that context into the right prompts and keeping vision + user data in sync was a key challenge.
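The free-text fallback for structured vision output can be sketched as below. We use a hand-rolled type check here in place of Zod to keep the example dependency-free; the field names (`dish`, `calories`, etc.) are illustrative, not our exact schema.

```typescript
// Hypothetical shape of the vision model's nutrition output.
interface NutritionResult {
  dish: string;
  calories: number;
  protein: number;
  carbs: number;
  fats: number;
}

// The model sometimes wraps JSON in prose or a code fence;
// pull out the outermost {...} span before parsing.
function extractJson(text: string): string | null {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  return start >= 0 && end > start ? text.slice(start, end + 1) : null;
}

function parseNutrition(raw: string): NutritionResult | null {
  const candidate = extractJson(raw);
  if (!candidate) return null;
  try {
    const obj = JSON.parse(candidate);
    const numericFields = ["calories", "protein", "carbs", "fats"] as const;
    if (typeof obj.dish !== "string") return null;
    if (!numericFields.every((k) => typeof obj[k] === "number")) return null;
    return obj as NutritionResult;
  } catch {
    return null; // caller can retry with a stricter prompt
  }
}
```

Returning `null` rather than throwing lets the caller decide whether to re-prompt the model or fall back to a manual-entry flow.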
Accomplishments that we're proud of
- Multimodal input in one app: Same backend serves both “photo of my plate” and “voice conversation about my meal” with a clean, consistent experience.
- User profile + current session in one flow: We combine the user profile (built from previous discussions and choices) with the current conversation and image in a single multimodal context — so every response is informed by both who they are and what they’re showing or saying right now.
- Real-time voice with Gemini Live: Low-latency voice-in, text + audio-out conversation that feels responsive and is usable for quick logging and questions.
- Production-ready backend: Auth, tokens, Prisma migrations, Cloud Run deploy, and health checks so the app can scale beyond a demo.
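The profile-plus-session fusion described above amounts to folding stored user data into every prompt sent with a new image or utterance. A minimal sketch, with hypothetical field names and wording:

```typescript
// Illustrative profile shape; the real one is richer.
interface UserProfile {
  goals: string[];
  restrictions: string[];
  recentMeals: string[];
}

// Build the text portion of a multimodal request so the model sees
// who the user is alongside what they are showing or saying right now.
function buildPrompt(profile: UserProfile, userMessage: string): string {
  return [
    "You are a nutrition assistant.",
    `User goals: ${profile.goals.join(", ") || "none stated"}.`,
    `Dietary restrictions: ${profile.restrictions.join(", ") || "none"}.`,
    `Recent meals: ${profile.recentMeals.slice(-3).join("; ") || "none logged"}.`,
    "Analyze the attached photo in this context.",
    `User says: ${userMessage}`,
  ].join("\n");
}
```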
What we learned
- How to integrate Gemini’s multimodal APIs (vision + Live) in a single product and when to use REST vs WebSocket.
- Practical patterns for streaming audio (chunking, playback queue, session teardown) in a React Native app.
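The playback-queue pattern mentioned above can be sketched as follows: model audio chunks arrive over the WebSocket faster than they can play, so they are queued and drained one at a time. `playChunk` stands in for the real player (e.g. an expo-av wrapper); the names are illustrative.

```typescript
type AudioChunk = Uint8Array;

class PlaybackQueue {
  private queue: AudioChunk[] = [];
  private playing = false;

  // playChunk resolves when the chunk has finished playing.
  constructor(private playChunk: (c: AudioChunk) => Promise<void>) {}

  enqueue(chunk: AudioChunk): void {
    this.queue.push(chunk);
    if (!this.playing) void this.drain();
  }

  // Play queued chunks back-to-back with no overlap.
  private async drain(): Promise<void> {
    this.playing = true;
    while (this.queue.length > 0) {
      const chunk = this.queue.shift()!;
      await this.playChunk(chunk);
    }
    this.playing = false;
  }

  // Session teardown: drop anything not yet played.
  clear(): void {
    this.queue = [];
  }
}
```

Calling `clear()` on session end or interruption is what keeps stale model speech from playing after the user has moved on.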
What's next for nutrition-assistant
- Richer nutrition history and simple trends (e.g. weekly summaries, macro breakdowns).
- More personalization: dietary goals, restrictions, and preferences feeding into analysis and recipes.
- Optional social or sharing (e.g. share a recipe or a day’s log) and better offline support for logging.
Built With
- cloud-storage
- gemini
- github-actions
- google-cloud-run
- nestjs
- react-native