Building FitMind — My AI Fitness Coach Story

Real-time posture correction, voice-guided workouts, and personalised plans — powered by Gemini 2.5 Flash, Firebase, and Google Cloud Run.

The Spark — What Inspired Me

I've always struggled with consistency in my fitness journey. Not because of a lack of motivation, but because I never had someone watching, correcting, and encouraging me in real time. Personal trainers are expensive. Fitness apps are passive. They show you a plan but never actually see you.

The question that started everything: What if your phone could be your personal trainer — one that sees you, hears you, and responds instantly?

When I discovered that Gemini 2.5 Flash could take a camera frame and what I was saying in the same request, the idea crystallised. I wanted to build something that felt like a real coaching conversation (not a chatbot, not a timer app): an AI coach that genuinely reacts to what you're doing in the moment.

The Vision

FitMind is a real-time AI fitness coach that:

  • 👁 Sees you — analyses your posture and form from webcam frames sampled every few seconds
  • 🎙 Hears you — listens continuously through the Web Speech API, no button required
  • 🧠 Thinks about you — generates a personalised 7-day fitness and nutrition plan based on your goals
  • 📊 Remembers you — stores every session in Firestore, tracks streaks, and generates post-session insights
  • 🎬 Captures your best moments — saves highlight video clips from peak form moments to Cloud Storage
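Streak tracking is the part of "Remembers you" that is easiest to get subtly wrong, so here is a minimal sketch of one way to compute it. It assumes each session is stored with a UTC calendar date as a "YYYY-MM-DD" string; the function names and that date format are hypothetical, not FitMind's actual Firestore schema.

```typescript
// Hypothetical streak helper. Assumes session dates are "YYYY-MM-DD" strings in UTC.
const DAY_MS = 24 * 60 * 60 * 1000;

// Convert a calendar date to a whole-day index so consecutive days differ by 1.
function toDayNumber(isoDate: string): number {
  const [y, m, d] = isoDate.split("-").map(Number);
  return Math.floor(Date.UTC(y, m - 1, d) / DAY_MS);
}

// Count consecutive training days ending today, or ending yesterday if the
// user simply hasn't trained yet today.
function currentStreak(sessionDates: string[], today: string): number {
  const days = new Set(sessionDates.map(toDayNumber)); // de-duplicates same-day sessions
  let cursor = toDayNumber(today);
  if (!days.has(cursor)) cursor -= 1;
  let streak = 0;
  while (days.has(cursor)) {
    streak += 1;
    cursor -= 1;
  }
  return streak;
}
```

Letting yesterday anchor the streak means a user who hasn't worked out yet today still sees their run intact until the day rolls over.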

How I Built It

The Stack

I chose a stack that could handle real-time AI workloads at scale while staying within Google Cloud's ecosystem:

| Layer | Technology |
|---|---|
| Frontend | Next.js 14, React, Tailwind CSS |
| Backend | Node.js, Express, TypeScript |
| AI | Vertex AI — Gemini 2.5 Flash |
| Database | Firestore (Firebase Admin SDK) |
| Storage | Google Cloud Storage |
| Speech | Web Speech API (STT) + Google Text-to-Speech (TTS) |
| Deployment | Google Cloud Run + Firebase Hosting |

The Agent Pipeline

The backend is organised as three specialised AI agents, each with a focused prompt:

  1. GoalPlanAgent — takes the user's fitness profile and generates a structured 7-day workout + nutrition plan via a single Gemini call
  2. LiveCoachingAgent — receives video frames (base64) + voice transcripts and returns real-time corrections, motivational cues, and rep counts
  3. FeedbackAgent — analyses the completed session's highlights and stats to produce a personalised post-workout summary with coach insights
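A "single Gemini call" for the plan only stays reliable if the reply is checked before it reaches Firestore or the UI. Below is a minimal sketch, assuming the GoalPlanAgent prompt asks for JSON with a days array of seven entries; the DayPlan and WeeklyPlan shapes and parseWeeklyPlan are hypothetical. Setting responseMimeType: "application/json" in the Gemini generationConfig is one way to make plain JSON the likely output.

```typescript
// Hypothetical shape of the GoalPlanAgent reply; field names are illustrative.
interface DayPlan {
  day: number;       // 1..7
  workout: string[]; // exercise descriptions
  meals: string[];   // nutrition suggestions
}

interface WeeklyPlan {
  days: DayPlan[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

// Parse the model's JSON text and reject anything that is not a complete 7-day plan.
function parseWeeklyPlan(raw: string): WeeklyPlan {
  const data: unknown = JSON.parse(raw);
  const days = (data as { days?: unknown } | null)?.days;
  if (!Array.isArray(days)) throw new Error("plan must contain a days array");
  if (days.length !== 7) throw new Error(`expected 7 days, got ${days.length}`);

  return {
    days: days.map((entry: unknown, i: number): DayPlan => {
      if (typeof entry !== "object" || entry === null) {
        throw new Error(`day ${i + 1} is not an object`);
      }
      const { day, workout, meals } = entry as Record<string, unknown>;
      if (day !== i + 1 || !isStringArray(workout) || !isStringArray(meals)) {
        throw new Error(`day ${i + 1} is malformed`);
      }
      return { day: i + 1, workout, meals };
    }),
  };
}
```

Failing loudly here lets the backend retry the Gemini call instead of saving a half-formed plan.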

The Real-Time Loop

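The coaching loop is driven from the browser: it samples a camera frame every 4 seconds and sends recognised speech as text, the backend forwards both to Gemini, and each coach reply is spoken with Google TTS and added to the Coach Feed: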

Camera frame (every 4s) ──► /api/coaching/frame ──► Gemini (vision + text)
                                                          │
                                                          ▼
Web Speech API (continuous) ──► /api/coaching/voice ──► coach response
                                                          │
                                                          ▼
                                              Google TTS ──► audio plays
                                              Coach Feed updated
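The camera side of this loop can be sketched as a timer that captures a frame every 4 seconds but skips a tick while the previous request is still in flight, so slow Gemini responses never stack up. The CaptureFrame and SendFrame callbacks and the FrameLoop class are hypothetical stand-ins for canvas capture and the POST to /api/coaching/frame, and the skip-if-busy policy is an assumption rather than necessarily what FitMind does.

```typescript
// Hypothetical callbacks: capture a base64 JPEG from the <video> via a canvas,
// and POST it to /api/coaching/frame, resolving with the coach's reply text.
type CaptureFrame = () => string;
type SendFrame = (frameBase64: string) => Promise<string>;

class FrameLoop {
  private readonly capture: CaptureFrame;
  private readonly send: SendFrame;
  private readonly onCoachText: (text: string) => void;
  private readonly intervalMs: number;
  private timer: ReturnType<typeof setInterval> | null = null;
  private inFlight = false;

  constructor(capture: CaptureFrame, send: SendFrame, onCoachText: (text: string) => void, intervalMs = 4000) {
    this.capture = capture;
    this.send = send;
    this.onCoachText = onCoachText;
    this.intervalMs = intervalMs;
  }

  start(): void {
    if (this.timer !== null) return; // already running; never stack intervals
    this.timer = setInterval(() => {
      this.tick().catch((err) => console.error("frame request failed", err));
    }, this.intervalMs);
  }

  stop(): void {
    if (this.timer !== null) clearInterval(this.timer);
    this.timer = null;
  }

  // Returns false when the tick was skipped because a request is still pending.
  async tick(): Promise<boolean> {
    if (this.inFlight) return false;
    this.inFlight = true;
    try {
      const reply = await this.send(this.capture());
      this.onCoachText(reply);
      return true;
    } finally {
      this.inFlight = false;
    }
  }
}
```

Dropping a frame is usually better than queueing it: by the time a stale frame is analysed, the user has already moved on.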

UI Design — Inspired by Lovable

The visual design of FitMind was inspired by Lovable — a tool I used as a reference for modern, clean AI product aesthetics.

I translated that into FitMind's own design system:

  • Warm blush landing (#F5EEE8) — approachable and human, not clinical
  • Cool lavender coaching screens (#EDE9F6) — focused and calm during workouts
  • Brand purple (#7C5CFC) — confidence and energy, used for all primary actions
  • Brand pink (#E879A0) — used in gradients for feature badges and highlights
  • DM Sans for headings, Nunito for body — geometric boldness balanced with readability

Every card, button radius, shadow, and gradient was designed to feel like a premium consumer app, not a hackathon prototype.
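One way to keep this palette consistent across cards, buttons, and gradients is to define it once as design tokens. This is a hypothetical sketch: the token names and the 135° purple-to-pink gradient are assumptions. In a Tailwind project the object would typically be spread into theme.extend in tailwind.config.ts, which turns colors.brand.purple into classes such as bg-brand-purple.

```typescript
// Hypothetical design tokens for the FitMind palette and type scale.
// Intended to be spread into theme.extend in tailwind.config.ts.
const fitmindTheme = {
  colors: {
    blush: "#F5EEE8",    // warm landing background
    lavender: "#EDE9F6", // calm coaching screens
    brand: {
      purple: "#7C5CFC", // primary actions -> bg-brand-purple
      pink: "#E879A0",   // gradient accents and feature badges
    },
  },
  fontFamily: {
    heading: ['"DM Sans"', "sans-serif"], // -> font-heading
    body: ["Nunito", "sans-serif"],       // -> font-body
  },
  backgroundImage: {
    // Assumed direction and stops -> bg-brand-gradient
    "brand-gradient": "linear-gradient(135deg, #7C5CFC, #E879A0)",
  },
};
```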

What I Learned

1. Vertex AI vs Google AI Studio

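Google AI Studio authenticates Gemini calls with an API key; Vertex AI can use Application Default Credentials (ADC) instead, and that was a revelation. Locally, once gcloud auth application-default login is set up, Firebase Admin, Vertex AI, and Cloud Storage all just work with the same identity; on Cloud Run, ADC resolves to the service's attached service account. No secrets to manage, no key files to leak.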

2. Multimodal prompting is an art

Sending a base64 video frame alongside a voice transcript to Gemini and getting a coherent, contextual coaching response required careful prompt engineering. The model needed to know the exercise context, the user's fitness level, and the coach's personality — all in one prompt.
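The "all in one prompt" requirement can be made concrete with the body of a Gemini generateContent request: a system instruction carrying the exercise, fitness level, and coach personality, plus one user turn holding the frame as an inlineData part and the transcript as a text part. This is a minimal sketch with locally declared types (no SDK import); CoachingContext, buildCoachingRequest, and the prompt wording are hypothetical. It also strips the data-URL prefix that canvas.toDataURL() adds, because inlineData.data expects raw base64.

```typescript
// Locally declared subset of the Gemini request shape (no SDK import).
interface Part {
  text?: string;
  inlineData?: { mimeType: string; data: string };
}

interface Content {
  role: "user" | "model";
  parts: Part[];
}

interface GenerateContentRequest {
  systemInstruction: { parts: Part[] };
  contents: Content[];
}

// Hypothetical per-session context the coach prompt depends on.
interface CoachingContext {
  exercise: string;     // e.g. "squat"
  fitnessLevel: string; // e.g. "beginner"
  personality: string;  // e.g. "upbeat, short sentences"
}

function buildCoachingRequest(ctx: CoachingContext, frame: string, transcript: string): GenerateContentRequest {
  // canvas.toDataURL() yields "data:image/jpeg;base64,...", but inlineData
  // expects only the raw base64 payload.
  const match = /^data:(image\/[\w.+-]+);base64,/.exec(frame);
  const mimeType = match ? match[1] : "image/jpeg";
  const data = match ? frame.slice(match[0].length) : frame;

  const system =
    `You are a live fitness coach. Personality: ${ctx.personality}. ` +
    `The user is doing ${ctx.exercise} at a ${ctx.fitnessLevel} level. ` +
    `Look at the frame, correct their form in one or two short sentences, and count reps if visible.`;

  const said = transcript.trim();
  const spoken = said ? `The user just said: "${said}"` : "The user did not say anything.";

  return {
    systemInstruction: { parts: [{ text: system }] },
    contents: [{ role: "user", parts: [{ inlineData: { mimeType, data } }, { text: spoken }] }],
  };
}
```

Keeping the stable context in the system instruction and only the frame and transcript in the user turn keeps each per-frame request small and easy to inspect.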

3. React state is not synchronous

The most painful bug: multiple timers running simultaneously because the isActive React state hadn't updated before a second button click fired. A state setter only schedules a re-render, and a handler keeps reading the value from the render it was created in. The fix was switching to useRef for synchronous guards, a lesson in the difference between React state (for rendering) and refs (for imperative logic).
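The ref-based guard can be shown without React by letting a plain { current } object stand in for useRef(false). This is a minimal sketch; createSessionStarter and its startSession and setIsActive parameters are hypothetical names. The ref is read and written synchronously, so a second click arriving before the next render is rejected, while the state setter is still called so the UI re-renders.

```typescript
// Stand-in for the object React's useRef returns.
interface MutableRef<T> {
  current: T;
}

// Hypothetical click-handler factory. In the component, activeRef would be
// useRef(false) and setIsActive the useState setter.
function createSessionStarter(
  activeRef: MutableRef<boolean>,
  startSession: () => void,
  setIsActive: (active: boolean) => void,
): () => boolean {
  return () => {
    // Refs are plain mutable objects: this read sees the write below
    // immediately, even before React re-renders.
    if (activeRef.current) return false;
    activeRef.current = true;
    setIsActive(true); // drives rendering only; never used as the guard
    startSession();
    return true;
  };
}
```

Ending a session would reset activeRef.current to false alongside setIsActive(false).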

4. Cloud Run cold starts matter

On the first request after inactivity, Cloud Run takes a few seconds to spin up. For a live coaching app, that felt jarring. Setting --min-instances 1 on the backend service keeps it warm for demos.

5. Firebase Hosting ≠ Cloud Run

Firebase Hosting is a CDN that proxies to Cloud Run — deploying Cloud Run doesn't automatically update what Firebase serves. You need firebase deploy --only hosting every time to flush the CDN cache. This tripped me up many times.

Challenges I Faced

Firestore rejecting undefined

Firestore throws a hard error if any field is undefined; it won't silently skip it. The goals form had optional fields (dietaryPreferences, injuries) that arrived as undefined instead of empty arrays. The fix: ?? [] defaults on every optional field, plus ignoreUndefinedProperties: true passed to Firestore's settings() right after the Admin SDK is initialised.
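A minimal sketch of the ?? [] defaults, assuming the goals form arrives as an object whose dietaryPreferences and injuries may be missing. GoalsForm, GoalsDoc, toGoalsDoc, and findUndefinedFields are hypothetical names; the second helper is a debugging aid that lists the path of any remaining undefined value before a write.

```typescript
// With firebase-admin, ignoreUndefinedProperties is a Firestore setting
// applied once, before any other Firestore call:
//   import { getFirestore } from "firebase-admin/firestore";
//   getFirestore().settings({ ignoreUndefinedProperties: true });

// Hypothetical shape of the goals form as it reaches the backend.
interface GoalsForm {
  goal: string;
  dietaryPreferences?: string[];
  injuries?: string[];
}

// What actually gets written: every field present, optionals as arrays.
interface GoalsDoc {
  goal: string;
  dietaryPreferences: string[];
  injuries: string[];
}

function toGoalsDoc(form: GoalsForm): GoalsDoc {
  return {
    goal: form.goal,
    dietaryPreferences: form.dietaryPreferences ?? [],
    injuries: form.injuries ?? [],
  };
}

// Debugging aid: list dotted paths of any undefined values left in a payload.
function findUndefinedFields(value: unknown, path = ""): string[] {
  if (value === undefined) return [path || "(root)"];
  if (value === null || typeof value !== "object") return [];
  return Object.entries(value).flatMap(([key, child]) =>
    findUndefinedFields(child, path ? `${path}.${key}` : key),
  );
}
```

Normalising at the edge keeps the stored documents uniform, so readers never have to handle a missing array.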

Continuous speech without button fatigue

The original design required pressing a "Talk" button to speak to the coach. During a real workout — when your hands are busy — that's unusable. Switching to the Web Speech API's continuous: true mode made the interaction hands-free, but required careful handling to pause recognition while the coach's TTS audio was playing (otherwise the AI would hear itself).
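Pausing recognition during TTS playback is easiest to reason about as a small state machine. This sketch relies only on the parts of the browser SpeechRecognition object it declares (continuous, start, stop, onend; Chrome exposes the constructor as webkitSpeechRecognition), and the HandsFreeListener class with its coachStarted/coachFinished hooks is hypothetical. Because recognition can end on its own, for example after a stretch of silence, it restarts from onend whenever listening is still wanted and the coach is not speaking.

```typescript
// Minimal subset of the browser SpeechRecognition object used here.
interface Recognizer {
  continuous: boolean;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

// Hypothetical controller: hands-free listening that pauses while the coach talks.
class HandsFreeListener {
  private readonly rec: Recognizer;
  private wantListening = false;
  private coachSpeaking = false;
  private running = false;  // true from start() until onend fires
  private stopping = false; // stop() requested, onend not yet received

  constructor(rec: Recognizer) {
    this.rec = rec;
    this.rec.continuous = true;
    this.rec.onend = () => {
      this.running = false;
      this.stopping = false;
      this.resumeIfAllowed(); // auto-restart after silence or a pause
    };
  }

  enable(): void { this.wantListening = true; this.resumeIfAllowed(); }
  disable(): void { this.wantListening = false; this.pause(); }

  // Call around TTS playback, e.g. from the audio element's play/ended events.
  coachStarted(): void { this.coachSpeaking = true; this.pause(); }
  coachFinished(): void { this.coachSpeaking = false; this.resumeIfAllowed(); }

  private pause(): void {
    if (this.running && !this.stopping) {
      this.stopping = true;
      this.rec.stop();
    }
  }

  private resumeIfAllowed(): void {
    // Only start once the previous session has fully ended.
    if (this.wantListening && !this.coachSpeaking && !this.running) {
      this.running = true;
      this.rec.start();
    }
  }
}
```

Waiting for onend before restarting avoids calling start() on a recognizer that is still shutting down, which can throw.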

Multiple sessions from one click

Rapid clicks created 6+ simultaneous coaching sessions, each with its own timer, so the clock raced ahead to show 36 minutes after 36 seconds. The root cause: React state updates are asynchronous, so isActive wasn't true by the time a second click arrived. Solved with a useRef guard, which updates synchronously.

What's Next

  • 🏋 Exercise library — detect specific exercises automatically from video, not just user-selected
  • 📱 Mobile PWA — make it installable on phone for workouts away from a desk
  • 👥 Coach personalities — different voice styles, motivation levels, and specialties
  • 🏆 Social streaks — share milestones and challenge friends
  • 🎞 Highlight reel — stitch the best-form clips across sessions into a progress video

Built with ☕, Claude, curiosity, and a lot of gcloud builds submit during the Gemini Live Agent Challenge 2026.
