Inspiration

The modern wellness landscape is undergoing a silent crisis. Millions of people are moving away from centralized gyms to the convenience of home-based workouts, yet they do so without a safety net. According to sports science statistics, over 73% of self-trained gym-goers exercise with incorrect biomechanical form, risking joint wear, muscle tears, and chronic injuries. At the same time, professional coaching is prohibitively expensive, typically costing between $\text{INR } 2,000$ and $\text{INR } 5,000$ per session, which leaves high-quality personal training a premium luxury reserved for the elite.

Furthermore, traditional health applications are fundamentally broken:

  • They offer static, non-adaptive PDFs masquerading as "personalized plans."
  • They rely on tedious manual nutrition entry, leading to a 68% abandonment rate within the first two weeks.
  • They completely ignore the digital health accessibility needs of over 600 million Hindi-speaking users by offering services strictly in English.

We asked ourselves: What if we could build an affordable, autonomous fitness intelligence ecosystem that runs directly inside a standard web browser? An application that can SEE your body, COACH your movements, and AUTOMATE your health tracking through natural language conversation. This vision inspired us to build GenFit AI—democratizing elite personal training and putting a bionic coach in the pocket of 1.4 billion people.

What it does

GenFit AI is a comprehensive, closed-loop, AI-driven wellness platform that bridges the physical and digital worlds. It functions as an active health advisor that handles the "thinking" (nutritional planning and workout scheduling), the "watching" (computer-vision-based real-time form correction), and the "logging" (automated data bookkeeping).

Key Capabilities:

  1. Virtual Training Assistant (VTA): A browser-native pose coach powered by TensorFlow.js (MoveNet Thunder) that tracks 17 skeletal keypoints at 30 FPS. It guides the user through over 60 exercises, validates rep depth and joint angles, and rejects half-reps or bad posture.
  2. Bilingual Voice Narration: The VTA doesn't just display visual metrics; it coaches you out loud in real time, delivering biomechanical cues and calorie counts in both English and Hindi (e.g., "Keep your back straight / Apni peeth seedhi rakhein").
  3. Agentic AI Chatbot (FitBot): A Groq-powered AI agent equipped with 6 custom database tools. Users can type or speak in Hindi or English (speech is transcribed with Whisper), and FitBot performs actions autonomously, such as updating their BMI, logging a meal, generating a workout plan, or updating a profile metric.
  4. Smart Calorie Tracker with Multimodal Vision: Rather than searching databases and weighing portions, users can upload a photo of their plate. The Vision AI classifies the items, estimates each item's volume and calories, and logs the macronutrients automatically.
  5. Google Fit Wearable Integration: Users can link their Google Fit account in their profile page to passively sync daily step counts, calorie burn, and wearable biometrics into our central system.
  6. Dynamic Plan Generators: Leverages Gemini 2.0 Flash to instantly generate highly detailed, weekly workout plans and diet charts linked to the user's current BMI, goals, and medical conditions.
  7. Gamification & Community: Gamifies consistency through streaks, rank badges, a global leaderboard, weekly challenges, and a community messaging hub.
  8. Razorpay Monetization & Admin Dashboard: Enforces a credit-based limit on the free plan and offers a seamless Pro upgrade via Razorpay (test mode), with server-side cryptographic signature verification. An Admin Mode lets administrators manage users, monitor transaction statistics, post weekly challenges, and resolve support tickets.

How we built it

We architected a decoupled, high-performance web and mobile ecosystem utilizing the MERN stack (frontend + backend) along with a dedicated python_ai subsystem and a fitsync_flutter mobile application.

graph TD
    A[React Client / Flutter App] -->|OAuth| B[Google Authentication]
    A -->|Text/Voice Speech| C[Groq LPU / Whisper API]
    C -->|Agentic Tool-Calling| D[Express.js / Node.js Backend]
    A -->|Image Upload| E[Vision AI Model]
    A -->|Camera Feed at 30FPS| F[TF.js MoveNet Engine]
    D -->|HMAC-SHA256 Verification| G[Razorpay Gateway]
    D -->|Query & Write| H[(MongoDB Atlas)]
    D -->|SMS Trigger| I[Nexmo API]
    D -->|AI Orchestration| J[Gemini 2.0 Flash]
    J -->|Structured JSON Plans| H

1. The Biomechanical Math (Virtual Training Assistant)

Rather than offloading video frames to a high-cost server GPU, we executed TensorFlow.js MoveNet Thunder entirely client-side. The system calculates joint vertex angles in the browser. Let three keypoints be $A(x_a, y_a)$, $B(x_b, y_b)$ (the vertex), and $C(x_c, y_c)$. The vectors are: $$\vec{BA} = (x_a - x_b, y_a - y_b), \quad \vec{BC} = (x_c - x_b, y_c - y_b)$$

The angle $\theta$ is determined dynamically via the dot product cosine rule: $$\cos(\theta) = \frac{\vec{BA} \cdot \vec{BC}}{|\vec{BA}| |\vec{BC}|}$$ $$\theta = \arccos\left( \frac{(x_a - x_b)(x_c - x_b) + (y_a - y_b)(y_c - y_b)}{\sqrt{(x_a - x_b)^2 + (y_a - y_b)^2} \sqrt{(x_c - x_b)^2 + (y_c - y_b)^2}} \right)$$

For exercises like squats, our state-machine logic enforces depth:

  • State "Neutral": $\theta > 160^\circ$ (standing tall).
  • State "Active Down": $\theta \le 90^\circ$ (full depth reached).
  • State "Rep Complete": Transition from "Active Down" back to "Neutral" ($\theta > 160^\circ$).

If $\theta$ fails to cross the $90^\circ$ threshold during the eccentric phase, the rep is rejected, and a verbal correction is voiced.
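
The angle formula and the three-state rule above can be sketched as follows. This is a minimal illustration in Python rather than the browser-side TensorFlow.js code; the names `joint_angle` and `SquatCounter`, and the intermediate "descending" state used to notice a shallow rep, are assumptions made for the example and are not taken from the project.

```python
import math


def joint_angle(a, b, c):
    """Angle at vertex b (in degrees) between rays b->a and b->c; points are (x, y)."""
    bax, bay = a[0] - b[0], a[1] - b[1]
    bcx, bcy = c[0] - b[0], c[1] - b[1]
    norm = math.hypot(bax, bay) * math.hypot(bcx, bcy)
    if norm == 0:
        raise ValueError("degenerate keypoints: vertex coincides with an endpoint")
    cos_theta = (bax * bcx + bay * bcy) / norm
    # Clamp to [-1, 1] so floating-point drift cannot break acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


class SquatCounter:
    """Hypothetical rep counter using the 160 / 90 degree thresholds from the text."""

    STAND_DEG = 160.0  # above this the user counts as standing ("Neutral")
    DEPTH_DEG = 90.0   # at or below this, full depth is reached ("Active Down")

    def __init__(self):
        self.state = "neutral"
        self.reps = 0
        self.rejected = 0

    def update(self, knee_angle):
        """Feed one frame's knee angle; returns 'rep', 'rejected', or None."""
        if self.state == "neutral":
            if knee_angle <= self.STAND_DEG:
                self.state = "descending"  # assumed helper state, not in the source
            return None
        if knee_angle <= self.DEPTH_DEG:
            self.state = "active_down"
        if knee_angle > self.STAND_DEG:
            reached_depth = self.state == "active_down"
            self.state = "neutral"
            if reached_depth:
                self.reps += 1
                return "rep"
            self.rejected += 1
            return "rejected"
        return None
```

In this sketch a caller would compute the knee angle from the hip, knee, and ankle keypoints on each frame and pass it to `update`. A real pipeline would likely also smooth the angle over a few frames, since this version treats any brief dip below $160^\circ$ as the start of a rep.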

2. Metabolic Calorie Calculation

During VTA training sessions, the calorie burn rate is modeled in real time utilizing Metabolic Equivalents (METs) relative to the user's weight: $$\text{Calories Burned per Minute} = \text{MET} \times 3.5 \times \frac{W}{200}$$ where $W$ is the body weight in kilograms (extracted dynamically from the profile). For squats ($\text{MET} \approx 5.0$), the system calculates energy expenditure based on active movement time and rep velocity.
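
As a rough illustration of the MET formula above, the following sketch computes the per-minute burn rate and scales it by active time. The function names are hypothetical, and the sketch ignores the rep-velocity adjustment mentioned in the text because its exact form is not given.

```python
def calories_per_minute(met, weight_kg):
    """kcal burned per minute: MET x 3.5 x W / 200, with W in kilograms."""
    return met * 3.5 * weight_kg / 200.0


def session_calories(met, weight_kg, active_seconds):
    """Total kcal for a session, counting only active movement time."""
    return calories_per_minute(met, weight_kg) * active_seconds / 60.0
```

For a 70 kg user doing squats at $\text{MET} = 5.0$, this gives $5.0 \times 3.5 \times 70 / 200 = 6.125$ kcal per minute, or 61.25 kcal over ten active minutes.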

3. Agentic FitBot Architecture

The chatbot follows a ReAct (Reasoning and Acting) pattern using Groq's high-speed Language Processing Unit (LPU) architecture, running Llama 3.3 at over 500 tokens/second. We built a JSON-based tool schema that lets the LLM call 6 distinct backend controllers:

  • update_bmi(value)
  • log_food(food_item, calories)
  • create_workout_plan(params)
  • create_diet_chart(params)
  • update_profile(field, value)
  • fetch_user_plan()

For example, if a user speaks in Hindi: "Mera weight update karo taaki mera BMI 50 ho jaye" (Update my weight so that my BMI becomes 50), the system:

  1. Transcribes the voice input to text using Whisper.
  2. Extracts intent using Groq.
  3. Invokes update_bmi(50).
  4. Calculates the required body weight from the target BMI and the user's height $H$ in meters: $$W = \text{BMI} \times H^2$$
  5. Updates the MongoDB Atlas database via Express backend.
  6. Returns a synchronized bilingual confirmation out loud.
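
The tool-calling steps above can be sketched as a small dispatcher that accepts only a JSON payload naming a registered tool. This is an assumption-laden illustration in Python, not the project's Express code: the payload shape (`name` / `arguments`), the in-memory `profile` dict standing in for MongoDB, and the accepted BMI range of 10 to 60 are all hypothetical.

```python
import json


def weight_for_bmi(bmi, height_m):
    """Body weight in kg implied by a BMI at a given height: W = BMI x H^2."""
    return bmi * height_m ** 2


def update_bmi(profile, value):
    """Validate the target BMI, then update the profile (a dict standing in for the DB)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("value must be a number")
    if not 10 <= value <= 60:  # assumed sanity range, not from the source
        raise ValueError("BMI outside accepted range")
    profile["bmi"] = float(value)
    profile["weight_kg"] = round(weight_for_bmi(value, profile["height_m"]), 1)
    return {"bmi": profile["bmi"], "weight_kg": profile["weight_kg"]}


# Only tools registered here can ever be invoked by the model.
TOOLS = {"update_bmi": update_bmi}


def dispatch(profile, tool_call_json):
    """Parse a model-produced tool call and route it to a whitelisted handler."""
    call = json.loads(tool_call_json)
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    handler = TOOLS.get(call.get("name"))
    if handler is None:
        raise ValueError("unknown tool")
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return handler(profile, **args)
```

With a profile height of 1.70 m, the call `{"name": "update_bmi", "arguments": {"value": 50}}` yields a target weight of about 144.5 kg. The point of the design is that the model never touches the data store directly; it can only name a tool and supply arguments that the handler then validates.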

4. Multimodal Food Logging

The image calorie tracker estimates total calories from photos using a spatial food density model: $$\text{Total Calories} = \sum_{i=1}^{N} \Big( \rho_i \cdot V_i \cdot C_i \Big)$$ where $N$ is the number of identified food items, $\rho_i$ is the density of item $i$ ($\text{g/cm}^3$), $V_i$ is its estimated volume ($\text{cm}^3$) based on image bounding boxes, and $C_i$ is its energy density ($\text{kcal/g}$).
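
The summation above translates directly into code. The sketch below assumes the vision model has already produced a density, volume, and energy density for each detected item; the `FoodItem` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FoodItem:
    name: str
    density_g_per_cm3: float  # rho_i
    volume_cm3: float         # V_i, e.g. estimated from a bounding box
    kcal_per_g: float         # C_i


def total_calories(items):
    """Sum of rho_i * V_i * C_i over all detected items, in kcal."""
    return sum(i.density_g_per_cm3 * i.volume_cm3 * i.kcal_per_g for i in items)
```

Each term multiplies density by volume to get mass in grams, then by kcal per gram, so the result is a calorie total for the plate rather than a density.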

Challenges we ran into

  1. Client-Side Biomechanical Occlusion & Barrel Distortion: Camera feeds suffer from varying lens distortion, which shifts the measured joint angles. A $90^\circ$ squat on a wide-angle mobile lens occasionally registered as $85^\circ$ or $95^\circ$ depending on camera tilt. We overcame this by implementing a T-Pose Spatial Calibration sequence during session initialization, normalizing the coordinate system relative to the user's apparent height in the frame.
  2. LLM Hallucinations in Database Queries: During agentic tool calling, early iterations of FitBot tried to write raw SQL/Mongoose commands directly, posing a critical security injection risk. We resolved this by decoupling database access entirely; the agent can only output structured JSON payloads containing strictly validated parameter values, which are subsequently parsed and checked by our Node.js middleware.
  3. Razorpay Signature Spoofing: We had to secure our payment flow against forged payment confirmations. We solved this by implementing strict server-side cryptographic signature verification using HMAC-SHA256: $$\text{Generated Signature} = \text{HMAC-SHA256}(\text{order\_id} + "|" + \text{payment\_id}, \text{Razorpay\_Secret\_Key})$$ The account is only upgraded to "Pro" when the generated signature exactly matches the signature in the Razorpay transaction payload.
  4. WebSocket Latency in Voice Narration: Real-time voice prompts suffered from network lag when generated on the server. We solved this by using native client-side speech synthesis engines (Web Speech API) and locally caching translation models, keeping feedback latency under $40\text{ms}$.
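
The signature check described in item 3 can be sketched with Python's standard `hmac` module. This is an illustration of the formula only, not the project's Node.js middleware; the function names are hypothetical, and the hex-encoded digest is an assumption about how the signature is transmitted.

```python
import hashlib
import hmac


def expected_signature(order_id, payment_id, secret):
    """HMAC-SHA256 over 'order_id|payment_id', keyed with the secret, as hex."""
    message = f"{order_id}|{payment_id}".encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def is_valid_signature(order_id, payment_id, received_signature, secret):
    """True only if the received signature matches the one computed server-side."""
    expected = expected_signature(order_id, payment_id, secret)
    # compare_digest runs in constant time, which avoids leaking match position.
    return hmac.compare_digest(
        expected.encode("utf-8"), received_signature.encode("utf-8")
    )
```

Because the secret key never leaves the server, a client that fabricates a payment confirmation cannot produce a matching signature, and the upgrade is refused.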

Accomplishments that we're proud of

  • True Browser Autonomy: Running a 30 FPS biomechanical movement engine on a standard browser tab at zero server GPU cost. This client-side execution makes our platform scalable to millions of concurrent users with an operational cost of less than $\text{INR } 40$ ($\$0.50$) per user/month.
  • Bilingual Accessibility: Developing the first agentic fitness engine that operates smoothly in both English and Hindi, bringing digital health parity to underserved regional communities.
  • Robust MERN + Python/Flutter Parity: Syncing user profiles, Google Fit metrics, and VTA results flawlessly across the React web client, Python analysis core, and Flutter mobile interfaces.
  • Zero-Friction Calorie Tracking: Building a working multimodal food recognizer that converts a simple camera snap into a logged meal, saving users over 15 minutes of manual food logging daily.

What we learned

  • AI Agent Security is Paramount: AI agents must never have direct write access to the database. They should act as intent parsers, leaving authorization and data validation to secure, well-structured API endpoints.
  • Edge Inference is the Future: Offloading compute to the client (TensorFlow.js) is not only cost-efficient but also ensures instantaneous feedback, which is critical for safety-critical exercises where a split-second delay could result in form failure.
  • Atomic Git Craftsmanship: We learned that maintaining clean, atomic commits with rich documentation allows team members to collaborate on complex systems—like payment signatures and posture estimation state machines—without experiencing integration conflicts.

What's next for GenFit AI

  1. Predictive Biomechanics (Injury Pre-emption): Expanding the VTA to track long-term posture trends. If the AI detects a gradual 2-degree hip deviation over 3 weeks, it will proactively flag a potential joint imbalance and recommend a specialized recovery plan.
  2. Zero-Knowledge Proofs for Health Insurance: Implementing ZKPs to let users verify their physical activity consistency (e.g., a Reputation Score $> 90$) with medical insurance companies to claim lower premium rates, without revealing their private videos or raw biometric data.
  3. Augmented Reality Integration: Deploying the VTA on AR glasses (such as Apple Vision Pro or Meta Quest) to project a 3D "Ghost Trainer" overlay in the user's field of view.
  4. Regional Language Expansion: Integrating other Indian languages (Tamil, Telugu, Bengali, Marathi) to expand our coverage to a wider regional population.
