FitSenseAI

Screenshots: calendar, metrics, different exercises, powered by, homepage, get ready, start detecting movements and talk with the coach in real time, start new workout

Inspiration

Many people want to work out with correct form but can't afford a personal trainer, and generic YouTube videos can't see YOU. We set out to build something that actually watches you, understands your movement in real time, and calls you out, loudly, when you're doing it wrong. Like a desi trainer who doesn't let you get away with lazy squats.

What it does

FitSenseAI is a real-time AI gym trainer that:

📷 Watches your form using MoveNet Thunder pose detection (17 keypoints at 15+ FPS) running entirely in your browser via TensorFlow.js
📐 Scores every rep 0–100% using a custom angle-constraint engine per movement phase (top/bottom/eccentric/concentric) across 22 exercises
🎙️ Coaches you with voice — Sarvam AI (STT + TTS) gives you spoken feedback in Hindi by default, switchable to English
🚨 Interrupts mid-exercise when form score drops below 40% — like a trainer physically stopping you to correct your form
💬 Answers questions naturally: say "bhai meri form kaisi hai?" ("bro, how is my form?") and the AI coach responds in context using Gemini 2.5 Flash
📊 Tracks your history — sessions, rep scores, streak calendar stored in Prisma + SQLite
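
To make the rep-scoring idea concrete, here is a minimal sketch of how a 0–100 score could be derived from per-phase angle constraints. Everything in it is an assumption for illustration: the type names, the `tolerance` field, the linear fall-off outside the allowed range, and the sample squat numbers are hypothetical, not FitSenseAI's actual engine.

```typescript
// Hypothetical sketch: names, fields and the linear fall-off are assumptions.
type Phase = "top" | "bottom" | "eccentric" | "concentric";

interface AngleConstraint {
  joint: string;     // e.g. "knee"
  min: number;       // lowest acceptable angle, in degrees
  max: number;       // highest acceptable angle, in degrees
  tolerance: number; // degrees outside [min, max] at which the score hits 0 (must be > 0)
}

type PhaseConstraints = Partial<Record<Phase, AngleConstraint[]>>;

// 100 inside the allowed range, falling linearly to 0 at `tolerance` degrees outside it.
function scoreConstraint(angle: number, c: AngleConstraint): number {
  if (angle >= c.min && angle <= c.max) return 100;
  const deviation = angle < c.min ? c.min - angle : angle - c.max;
  return Math.max(0, 100 * (1 - deviation / c.tolerance));
}

// Average the scores of every constraint in the current phase that has a measured angle.
// Returns null when nothing can be scored (no constraints, or no matching angles).
function scorePhase(
  angles: Record<string, number>,
  phase: Phase,
  constraints: PhaseConstraints,
): number | null {
  let total = 0;
  let count = 0;
  for (const c of constraints[phase] ?? []) {
    const angle = angles[c.joint];
    if (angle === undefined) continue;
    total += scoreConstraint(angle, c);
    count += 1;
  }
  return count === 0 ? null : Math.round(total / count);
}

// Illustrative values only: a squat whose bottom phase wants the knee at 70 to 100 degrees.
const squat: PhaseConstraints = {
  bottom: [{ joint: "knee", min: 70, max: 100, tolerance: 40 }],
};
const shallowSquatScore = scorePhase({ knee: 120 }, "bottom", squat); // 20 degrees too shallow
```

Averaging the constraints keeps one bad joint from zeroing the whole rep; a stricter engine could take the minimum instead.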

How we built it

Frontend (React 18 + TypeScript + Vite + TailwindCSS):

MoveNet Thunder loaded via @tensorflow-models/pose-detection — runs in-browser, no server round-trip for pose inference
Custom useFormScoring hook calculates joint angles using the law of cosines and compares them against per-exercise, per-phase angle constraints
useVoiceAgent hook: MediaRecorder buffers audio → streams over WebSocket to backend → plays back TTS audio via Web Audio API
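
The joint-angle calculation mentioned above can be sketched as follows. This is an illustrative version only: the `Point` type and the `jointAngle` name are assumptions, and the real `useFormScoring` hook may differ. For a knee angle, `a`, `b` and `c` would be the hip, knee and ankle keypoints.

```typescript
// Hypothetical sketch: `Point` and `jointAngle` are illustrative names, not the project's code.
interface Point {
  x: number;
  y: number;
}

// Angle at vertex b (in degrees) of the triangle a-b-c, via the law of cosines:
//   |ac|^2 = |ab|^2 + |cb|^2 - 2 * |ab| * |cb| * cos(angle at b)
function jointAngle(a: Point, b: Point, c: Point): number {
  const ab = Math.hypot(a.x - b.x, a.y - b.y);
  const cb = Math.hypot(c.x - b.x, c.y - b.y);
  const ac = Math.hypot(a.x - c.x, a.y - c.y);
  if (ab === 0 || cb === 0) return NaN; // overlapping keypoints: the angle is undefined
  const cosine = (ab * ab + cb * cb - ac * ac) / (2 * ab * cb);
  // Clamp to [-1, 1] so floating-point rounding cannot push acos out of its domain.
  const clamped = Math.min(1, Math.max(-1, cosine));
  return (Math.acos(clamped) * 180) / Math.PI;
}

// Example: a fully extended leg (hip, knee and ankle on one vertical line).
const straightLeg = jointAngle({ x: 0, y: 0 }, { x: 0, y: 1 }, { x: 0, y: 2 });
```

Because the function only needs `x` and `y`, any keypoint object carrying those two fields can be passed in directly.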

Backend (Node.js + Express + TypeScript):

CoachEngine: rate-limited proactive coaching triggers, bilingual Hindi/English messages, checkUrgentFormInterruption() fires when score < 40 (4s cooldown)
geminiClient: Gemini 2.5 Flash via Google GenAI SDK — system prompt with [LANG] tag for bilingual coaching, multi-model fallback chain (2.5-flash → 2.5-flash-lite → 2.0-flash)
sarvamClient: Sarvam AI STT (saarika:v2.5) and TTS (bulbul:v2, speaker anushka) — default hi-IN
WebSocket /ws/voice: duplex audio stream, each connection tracks language from ?lang= param
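
The urgent-interruption rule (score below 40, 4-second cooldown) can be sketched as a small gate. The class and method names here are hypothetical; only the threshold and cooldown values come from the description of `checkUrgentFormInterruption()` above.

```typescript
// Hypothetical sketch of a cooldown-gated trigger; not the actual CoachEngine code.
class UrgentInterruptionGate {
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private lastFiredAt = Number.NEGATIVE_INFINITY;

  constructor(threshold = 40, cooldownMs = 4000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // Returns true when the coach should interrupt now.
  // `nowMs` is injectable so the gate can be exercised without real timers.
  shouldInterrupt(score: number, nowMs: number = Date.now()): boolean {
    if (score >= this.threshold) return false; // form is acceptable
    if (nowMs - this.lastFiredAt < this.cooldownMs) return false; // still cooling down
    this.lastFiredAt = nowMs;
    return true;
  }
}

// Example timeline (milliseconds): a bad rep at t=0 fires, a repeat at t=2000 is suppressed.
const gate = new UrgentInterruptionGate();
const firstFire = gate.shouldInterrupt(35, 0);
const suppressed = gate.shouldInterrupt(30, 2000);
```

The cooldown keeps the coach from repeating the same correction on every frame while the score stays low.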

Challenges we ran into

Rep counting accuracy: MoveNet detects both sides of the body — we had to score the "best-side" angle at each frame to avoid counting half-reps
Hindi voice understanding: Sarvam's STT returned transliterated Roman Hindi ("bhai kitna bacha hai", roughly "bro, how much is left?"), which required keyword matching in our fallback feedback engine
Voice interruption timing: making the coach interrupt mid-rep without cutting off the user's own speech required WebSocket state tracking and an AudioContext cancel mechanism
Phase detection threshold calibration: we loosened joint-angle thresholds significantly from textbook values, because real people at home vary in body proportions and camera angle
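
One plausible reading of the "best-side" idea is to measure the same joint on both sides of the body and keep the reading whose keypoints the model is most confident about. The sketch below assumes exactly that; the `SideReading` type, the `pickBestSide` name and the 0.3 confidence floor are illustrative assumptions, and the project may define "best" differently (for example, by form score).

```typescript
// Hypothetical sketch: types, names and the 0.3 floor are assumptions.
interface SideReading {
  angle: number;      // joint angle in degrees for this side
  confidence: number; // lowest keypoint score among the joints used, in [0, 1]
}

// Keep the more confident side for this frame; return null if neither side is usable.
function pickBestSide(
  left: SideReading | null,
  right: SideReading | null,
  minConfidence = 0.3,
): SideReading | null {
  const usable: SideReading[] = [];
  for (const side of [left, right]) {
    if (side !== null && side.confidence >= minConfidence) usable.push(side);
  }
  if (usable.length === 0) return null;
  return usable.reduce((best, side) => (side.confidence > best.confidence ? side : best));
}

// Example: the right side faces the camera, so its reading wins.
const chosen = pickBestSide(
  { angle: 95, confidence: 0.4 },
  { angle: 88, confidence: 0.8 },
);
```

Returning `null` for a frame with no trustworthy side lets the rep counter skip that frame rather than count a partial movement.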

Accomplishments that we're proud of

A coach that yells at you in Hindi when your squat form breaks: "Arre bhai! Ghutne bahar rakho!" ("Hey bro! Keep your knees out!"). It genuinely feels like a real trainer
22 exercises covering home workouts and gym/dumbbell movements, all with custom biomechanical angle constraints
The voice agent being interruptible — you can cut off the coach mid-sentence and ask your own question
Full end-to-end voice pipeline: mic → WebSocket → WAV → Sarvam STT → Gemini prompt → Gemini response → Sarvam TTS → WAV → speaker in under 2 seconds

What we learned

TensorFlow.js MoveNet runs remarkably well in-browser for a model this size — pose inference at 15+ FPS on a standard laptop with no GPU
Sarvam AI's Hindi voice output (bulbul:v2) sounds genuinely natural — far more so than any multilingual TTS we tested
Gemini 2.5 Flash's short-context coaching responses (150 tokens) are fast enough for real-time use when the system prompt is tightly constrained
Building bilingual AI products for Indian users requires more than just language translation — tone, idioms, and energy all need to match the cultural context

What's next for FitSenseAI

Google Cloud Run deployment for scalable global access
Gemini Live API integration — replace the current request-response pattern with true streaming live audio for sub-500ms latency
Workout plan generation — Gemini creates a weekly plan based on your history and goals
Multi-person support — track multiple athletes in the same frame
Mobile PWA — phone camera as portable gym trainer

Built With

docker
express.js
google-gemini-2.5-flash
google-genai-sdk
movenet-thunder
node.js
prisma
react-18
sarvam-ai
sqlite
tailwindcss
tensorflow.js
typescript
vite
websocket

Updates

Amit Kumar started this project — Mar 17, 2026 07:12 AM EDT
