Building FitMind — My AI Fitness Coach Story
Real-time posture correction, voice-guided workouts, and personalized plans — powered by Gemini 2.5 Flash, Firebase, and Google Cloud Run.
The Spark — What Inspired Me
I've always struggled with consistency in my fitness journey. Not because of a lack of motivation, but because I never had someone watching, correcting, and encouraging me in real time. Personal trainers are expensive. Fitness apps are passive. They show you a plan but never actually see you.
The question that started everything: What if your phone could be your personal trainer — one that sees you, hears you, and responds instantly?
When I discovered that Gemini 2.5 Flash could process live video frames AND voice in the same context window, the idea crystallised. I wanted to build something that felt like a real coaching conversation — not a chatbot, not a timer app — but an AI coach that genuinely reacts to what you're doing in the moment.
The Vision
FitMind is a real-time AI fitness coach that:
- 👁 Sees you — analyses your posture and form via webcam, frame by frame
- 🎙 Hears you — listens continuously through the Web Speech API, no button required
- 🧠 Thinks about you — generates a personalised 7-day fitness and nutrition plan based on your goals
- 📊 Remembers you — stores every session in Firestore, tracks streaks, and generates post-session insights
- 🎬 Captures your best moments — saves highlight video clips from peak form moments to Cloud Storage
How I Built It
The Stack
I chose a stack that could handle real-time AI workloads at scale while staying within Google Cloud's ecosystem:
| Layer | Technology |
|---|---|
| Frontend | Next.js 14, React, Tailwind CSS |
| Backend | Node.js, Express, TypeScript |
| AI | Vertex AI — Gemini 2.5 Flash |
| Database | Firestore (Firebase Admin SDK) |
| Storage | Google Cloud Storage |
| Speech | Web Speech API (STT) + Google Text-to-Speech (TTS) |
| Deployment | Google Cloud Run + Firebase Hosting |
The Agent Pipeline
The backend is organised as three specialised AI agents, each with a focused prompt:
- GoalPlanAgent — takes the user's fitness profile and generates a structured 7-day workout + nutrition plan via a single Gemini call
- LiveCoachingAgent — receives video frames (base64) + voice transcripts and returns real-time corrections, motivational cues, and rep counts
- FeedbackAgent — analyses the completed session's highlights and stats to produce a personalised post-workout summary with coach insights
The Real-Time Loop
The coaching loop is driven from the browser, which captures frames and speech and sends them to the backend:
    Camera frame (every 4s)     ──► /api/coaching/frame ──► Gemini (vision + text)
                                                                   │
                                                                   ▼
    Web Speech API (continuous) ──► /api/coaching/voice ──► coach response
                                                                   │
                                                                   ▼
                                                            Google TTS ──► audio plays
                                                            Coach Feed updated
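The frame side of the loop above needs some throttling so that requests do not pile up. Here is a minimal sketch of one way to do it, assuming the 4-second interval from the diagram and at most one frame request in flight. `FrameSampler`, `tryAcquire`, and `release` are hypothetical names for illustration, not FitMind's actual code.

```typescript
// Hypothetical helper: decides when the browser should send the next camera frame.
// Assumptions: a 4 s interval (from the diagram) and at most one request in flight.
class FrameSampler {
  private lastSentAt = -Infinity;
  private inFlight = false;
  private readonly intervalMs: number;

  constructor(intervalMs: number = 4000) {
    this.intervalMs = intervalMs;
  }

  // Returns true if a frame may be sent at time `nowMs`, and marks it as in flight.
  tryAcquire(nowMs: number): boolean {
    if (this.inFlight) return false; // previous frame is still being analysed
    if (nowMs - this.lastSentAt < this.intervalMs) return false; // too soon
    this.inFlight = true;
    this.lastSentAt = nowMs;
    return true;
  }

  // Call when the backend response (or an error) for the last frame arrives.
  release(): void {
    this.inFlight = false;
  }
}
```

In the browser, a capture callback would call `tryAcquire(Date.now())`, POST the frame to `/api/coaching/frame` only when it returns `true`, and call `release()` once the response comes back. Skipping frames while a request is pending keeps slow responses from queuing up behind each other.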
UI Design — Inspired by Lovable
The visual design of FitMind was inspired by Lovable — a tool I used as a reference for modern, clean AI product aesthetics.
I translated that into FitMind's own design system:
- Warm blush landing (#F5EEE8): approachable and human, not clinical
- Cool lavender coaching screens (#EDE9F6): focused and calm during workouts
- Brand purple (#7C5CFC): confidence and energy, used for all primary actions
- Brand pink (#E879A0): used in gradients for feature badges and highlights
- DM Sans for headings, Nunito for body: geometric boldness balanced with readability
Every card, button radius, shadow, and gradient was designed to feel like a premium consumer app, not a hackathon prototype.
What I Learned
1. Vertex AI vs Google AI Studio
Using Vertex AI with Application Default Credentials (no API key) was a revelation. Once `gcloud auth application-default login` is set up, Firebase Admin, Vertex AI, and Cloud Storage all work with the same identity. No secrets to manage, no key files to leak.
2. Multimodal prompting is an art
Sending a base64 video frame alongside a voice transcript to Gemini and getting a coherent, contextual coaching response required careful prompt engineering. The model needed to know the exercise context, the user's fitness level, and the coach's personality — all in one prompt.
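To make the "all in one prompt" idea concrete, here is a rough sketch of how a frame, a transcript, and the coaching context could be packed into a single request. The `contents` / `parts` / `inlineData` layout follows the Gemini request structure. The `CoachingContext` fields, the function names, and the prompt wording are assumptions for illustration, not the prompt FitMind actually uses.

```typescript
// One multimodal turn: text parts plus an inline base64 image part.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

// Hypothetical context fields; the real profile shape may differ.
interface CoachingContext {
  exercise: string;
  fitnessLevel: "beginner" | "intermediate" | "advanced";
  coachPersonality: string;
}

function stripDataUrlPrefix(data: string): string {
  // canvas.toDataURL() yields "data:image/jpeg;base64,..."; the model expects raw base64.
  const comma = data.indexOf(",");
  return data.startsWith("data:") && comma !== -1 ? data.slice(comma + 1) : data;
}

function buildCoachingRequest(
  ctx: CoachingContext,
  frameBase64: string,
  transcript: string
) {
  // Illustrative prompt text only.
  const instructions = [
    `You are a ${ctx.coachPersonality} fitness coach.`,
    `The user is doing ${ctx.exercise} at a ${ctx.fitnessLevel} level.`,
    "Look at the frame, consider what they said, and reply with one short spoken cue.",
  ].join("\n");

  const parts: Part[] = [
    { text: instructions },
    { inlineData: { mimeType: "image/jpeg", data: stripDataUrlPrefix(frameBase64) } },
    { text: `User said: "${transcript || "(nothing)"}"` },
  ];
  return { contents: [{ role: "user", parts }] };
}
```

The returned object is shaped to be passed to a `generateContent` call on the model. Keeping exercise, level, and personality in the same turn as the image means each frame request is self-contained and does not depend on earlier chat history.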
3. React state is not synchronous
The most painful bug: multiple timers running simultaneously because the `isActive` React state hadn't propagated before a second button click fired. The fix was switching to `useRef` for synchronous guards, a lesson in the difference between React state (for rendering) and refs (for imperative logic).
4. Cloud Run cold starts matter
On the first request after inactivity, Cloud Run takes a few seconds to spin up. For a live coaching app, that felt jarring. Setting `--min-instances 1` on the backend service keeps one instance warm, which was enough for demos.
5. Firebase Hosting ≠ Cloud Run
Firebase Hosting is a CDN that serves the static frontend and proxies API requests to Cloud Run, so deploying Cloud Run doesn't automatically update what Firebase serves. In my setup I had to run `firebase deploy --only hosting` after every change to publish the new frontend and flush the CDN cache. This tripped me up many times.
Challenges I Faced
Firestore rejecting undefined
Firestore throws a hard error if any field value is `undefined`; it won't silently skip it. The goals form had optional fields (`dietaryPreferences`, `injuries`) that arrived as `undefined` instead of empty arrays. The fix: `?? []` defaults on every optional field, plus `ignoreUndefinedProperties: true` in the Firestore settings of the Admin SDK.
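A small sketch of the defaulting step, assuming a form payload with the two optional fields named above. The `goal` field, the type names, and the `omitUndefined` helper are hypothetical and only there to make the example self-contained.

```typescript
// Form payload as it arrives from the client; optional fields may be missing.
interface GoalsFormInput {
  goal: string; // hypothetical required field
  dietaryPreferences?: string[];
  injuries?: string[];
}

// Shape written to Firestore: every field has a concrete value.
interface GoalsDoc {
  goal: string;
  dietaryPreferences: string[];
  injuries: string[];
}

function toGoalsDoc(input: GoalsFormInput): GoalsDoc {
  return {
    goal: input.goal,
    dietaryPreferences: input.dietaryPreferences ?? [], // undefined becomes []
    injuries: input.injuries ?? [],
  };
}

// Generic safety net: drop top-level keys whose value is undefined.
function omitUndefined(obj: object): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(obj).filter(([, value]) => value !== undefined)
  );
}
```

Explicit defaults keep the stored documents uniform, so later reads never have to check whether `injuries` exists. The `omitUndefined` helper only handles top-level keys, which is one reason to also enable `ignoreUndefinedProperties` as a backstop.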
Continuous speech without button fatigue
The original design required pressing a "Talk" button to speak to the coach. During a real workout, when your hands are busy, that's unusable. Switching to the Web Speech API's continuous mode (`SpeechRecognition.continuous` set to `true`) made the interaction hands-free, but required careful handling to pause recognition while the coach's TTS audio was playing (otherwise the AI would hear itself).
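The pause-while-speaking logic can be sketched as a tiny state machine. This is an illustration under assumptions, not the app's code: `Recognizer` models only the `start()` and `stop()` methods of the browser's `SpeechRecognition` object, and `HandsFreeListener` and its method names are hypothetical.

```typescript
// Minimal surface of SpeechRecognition that this sketch relies on.
interface Recognizer {
  start(): void;
  stop(): void;
}

class HandsFreeListener {
  private enabled = false; // user is in a session and wants hands-free mode
  private coachSpeaking = false; // TTS audio is currently playing
  private listening = false; // recognizer has been started
  private readonly recognizer: Recognizer;

  constructor(recognizer: Recognizer) {
    this.recognizer = recognizer;
  }

  enable(): void { this.enabled = true; this.sync(); }
  disable(): void { this.enabled = false; this.sync(); }
  onCoachAudioStart(): void { this.coachSpeaking = true; this.sync(); }
  onCoachAudioEnd(): void { this.coachSpeaking = false; this.sync(); }

  // Start or stop the recognizer so it only runs when the coach is silent.
  private sync(): void {
    const shouldListen = this.enabled && !this.coachSpeaking;
    if (shouldListen && !this.listening) {
      this.recognizer.start();
      this.listening = true;
    } else if (!shouldListen && this.listening) {
      this.recognizer.stop();
      this.listening = false;
    }
  }
}
```

In the browser, `onCoachAudioStart` and `onCoachAudioEnd` would be wired to the audio element's `play` and `ended` events. A real `SpeechRecognition` instance stops asynchronously and throws if `start()` is called while it is still running, so a production version would likely wait for the recognizer's `end` event before restarting.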
Multiple sessions from one click
Rapid clicks created 6+ simultaneous coaching sessions and timers, making the timer race to show 36 minutes after 36 seconds. The root cause: React state updates are asynchronous, so `isActive` wasn't yet `true` by the time a second click arrived. Solved with a synchronous `useRef` guard.
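The guard pattern itself is framework-independent, so it can be shown without React. In this sketch the `{ current: false }` object stands in for what `useRef(false)` would return inside a component. `createSessionStarter` and the handler names are hypothetical.

```typescript
// Same shape as the object returned by React's useRef.
interface Ref<T> {
  current: T;
}

function createSessionStarter(startSession: () => void) {
  // Inside a React component this would be: const startedRef = useRef(false);
  const startedRef: Ref<boolean> = { current: false };

  return {
    handleStartClick(): void {
      // The ref is read and written synchronously, so a second click in the
      // same tick already sees `true`, unlike state that updates on re-render.
      if (startedRef.current) return;
      startedRef.current = true;
      startSession();
    },
    handleStop(): void {
      startedRef.current = false; // allow a new session after stopping
    },
  };
}
```

State such as `isActive` can still be set alongside the ref to drive rendering; the ref only decides whether the session and its timer get created.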
What's Next
- 🏋 Exercise library — detect specific exercises automatically from video, not just user-selected
- 📱 Mobile PWA — make it installable on phone for workouts away from a desk
- 👥 Coach personalities — different voice styles, motivation levels, and specialties
- 🏆 Social streaks — share milestones and challenge friends
- 🎞 Highlight reel — stitch the best-form clips across sessions into a progress video
Built with ☕, Claude, curiosity, and a lot of `gcloud builds submit` during the Gemini Live Agent Challenge 2026.
Built With
- express.js
- firebase
- firestore
- google-cloud
- node.js
- react
- tailwindcss
- typescript
- vertex-ai
- webspeechapi