Inspiration
🏋️ We all love to stay fit and hit our goals 🏃, but what exactly does it take to turn a home workout into a professional training session? 🏠
Home athletes face a massive "Feedback Void." Without a coach to watch their form, they risk chronic injury and lose the motivation to continue. 📉 That makes it nearly impossible for solo trainers to focus on what they do best: pushing their limits. 🦾
Beginners, too, have to wait for expensive personal trainers or search through endless videos for form tips, and their specific questions about posture, rep counts, and calorie burn are never answered in real time. 🤔 Planning a workout schedule around available equipment and energy levels is another major roadblock. 🚧
The Main Problems we are solving:
Form Management 🧘 – Bridging the "Feedback Void" by providing real-time visual and audio posture correction.
Performance Tracking 📊 – Keeping track of the "hard stats" like total volume, reps, and precise calorie expenditure.
Injury Prevention 🛑 – Identifying dangerous form "Roadblocks" (like rounded backs or caving knees) before they lead to pain.
Motivation Management 🕺 – Using a human-like 3D Emoji Avatar that mirrors the user, making a lonely workout feel like a shared game.
Knowledge Access 📖 – Providing instant answers to questions regarding exercise mechanics, muscle activation, and athletic guidelines.
Customization Management ⚙️ – Adjusting the coaching intensity and feedback style based on the user's real-time fatigue and performance.
What it does
Fit Live Frame AI is a real-time, vision-powered fitness mentor. Instead of a static video or a text-based coach, it uses the Gemini 3 Flash Multimodal Live API to "watch" your workout. It syncs your movements with a 3D human-emoji avatar that mirrors your form, providing instant verbal corrections and live performance stats (precision, reps, and calories).
🚀 Features
Main features of Fit Live Frame AI are:
Live Motion Shadowing 🧘 – Fit Live Frame AI creates a real-time "Digital Twin" of the user using the Gemini 3 Flash Multimodal Live API. The agent observes the user’s posture and synchronizes it with a 3D human-emoji avatar, providing a visual mirror for perfect form.
Precision HUD & Scoring 📊 – Powered by Gemini 3.1 Pro’s advanced spatial reasoning, the app calculates a "Precision Score" (0-100%) for every rep. It compares the user's real-time joint angles against professional athletic standards stored in a Google Cloud Vertex AI Vector Search index.
Roadblock Detection 🚧 – Utilizing the ADK (Agent Development Kit) Bidi-streaming, the agent identifies dangerous form "Roadblocks" (like rounded backs or caving knees). When a risk is detected, the 3D avatar physically stops and points to the error while the agent provides immediate audio correction.
Biometric Analytics 📈 – Fit Live Frame AI tracks real-time "hard stats" including rep counts, total volume, and metabolic burn. It uses Google Cloud Run to process these metrics and provide a live calorie-burn counter on the user's dashboard.
Interactive Coaching 🎙️ – Backed by the Gemini 2.5 Flash Native Audio model, the agent supports natural, bidirectional conversation. Users can interrupt the coach mid-exercise to ask questions like "Is my back straight?" and receive an instant, context-aware response.
Session Summarization 📝 – At the end of every workout, the agent utilizes Gemini 3 Flash to generate a "Shadow Report." This summary includes average precision, total calories, and a personalized "Key Takeaway" for the next session, all stored in Google Cloud Firestore.
Visual Progress Reports 🎨 – Using Nano Banana 2 (Gemini 3.1 Flash Image), the app generates a custom "Victory Poster" for the user after a high-performance session, featuring their animated avatar and key achievements to share on social media.
Audio Atmosphere 🎵 – Powered by Lyria 3, Fit Live Frame AI generates high-fidelity, 30-second workout tracks that dynamically adjust their tempo and energy based on the intensity of the user's movement.
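To make the Precision Score idea above more concrete, here is a minimal Python sketch of how a joint angle could be turned into a 0-100% score. It assumes joints arrive as (x, y) or (x, y, z) tuples. The function names, the target angle, and the linear 45-degree tolerance are illustrative assumptions, not the project's actual scoring logic or the standards stored in Vector Search.

```python
import math


def joint_angle(a, b, c):
    """Return the angle at point b (in degrees) formed by the segments b->a and b->c."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0:
        raise ValueError("joint points must be distinct")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def precision_score(measured_deg, target_deg, tolerance_deg=45.0):
    """Hypothetical linear score: 100 at the target angle, 0 once the error
    reaches tolerance_deg or more."""
    error = abs(measured_deg - target_deg)
    return max(0.0, 100.0 * (1.0 - error / tolerance_deg))


# Example: hip, knee, and ankle positions forming a right angle at the knee.
knee_angle = joint_angle((0.0, 1.0), (0.0, 0.0), (1.0, 0.0))
score = precision_score(knee_angle, target_deg=90.0)
```

With these assumed defaults, a knee angle of 112.5 degrees against a 90-degree target would score 50%. A real system would likely combine several joints per rep and weight them by exercise.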
🏗️ How we built the app
BACKEND
Python 🐍 – We chose Python as our core language for its robust support of asynchronous streaming and AI integration.
FastAPI 🚀 – We utilized FastAPI to manage high-speed WebSocket connections, ensuring sub-second latency between the user's camera and the AI.
Agent Development Kit (ADK) 🛠️ – We used the Google ADK to build the bidirectional (Bidi) streaming agent, managing the complex orchestration of live video, audio, and tool use.
Google Cloud Run ☁️ – The entire backend is containerized and hosted on Cloud Run, providing a scalable, serverless environment for real-time processing.
Cloud Storage & Firestore 📂 – We used Google Cloud Storage to persist session recordings and Firestore to track long-term precision trends and calorie data.
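As an illustration of the kind of metric the backend could compute for the live calorie counter, here is a minimal Python sketch based on the widely used MET approximation (kcal per minute = MET × 3.5 × body weight in kg / 200). This is one common approach and not necessarily the method used in this project. The class name, the MET value, and the body weight in the example are assumptions.

```python
def kcal_per_minute(met, weight_kg):
    """Standard MET approximation of energy expenditure per minute."""
    if met <= 0 or weight_kg <= 0:
        raise ValueError("met and weight_kg must be positive")
    return met * 3.5 * weight_kg / 200.0


class LiveCalorieCounter:
    """Hypothetical running total that is updated once per processed interval."""

    def __init__(self, weight_kg):
        self.weight_kg = weight_kg
        self.total_kcal = 0.0

    def add_interval(self, met, seconds):
        """Add the calories burned over `seconds` at the given MET intensity."""
        if seconds < 0:
            raise ValueError("seconds must be non-negative")
        self.total_kcal += kcal_per_minute(met, self.weight_kg) * seconds / 60.0
        return self.total_kcal


# Example: a 70 kg user exercising for 10 minutes at an assumed MET of 5.0.
counter = LiveCalorieCounter(weight_kg=70.0)
total = counter.add_interval(met=5.0, seconds=600)
```

Keeping the per-minute formula separate from the running total means the intensity estimate can change between intervals as the detected exercise changes.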
AI MODELS
Gemini 3.1 Flash 🤖 – Our primary "Brain." We used the Multimodal Live API for real-time vision analysis to identify exercises and calculate form precision.
Gemini 3.1 Pro 🧠 – We leveraged the "Thinking" mode of the Pro model for deep post-session analysis and to generate personalized "Shadow Reports."
Nano Banana 2 🎨 – We used this lightning-fast image model to generate custom "Victory Posters" and visual HUD elements for the user's dashboard.
Veo 3.1 🎥 – For users who hit a "Roadblock," Veo generates 8-second, high-fidelity 4K instructional clips showing the perfect version of that specific exercise.
Lyria 3 🎵 – We integrated the Lyria RealTime API to generate dynamic, 48kHz stereo workout music that adjusts its BPM and intensity based on the user's movement speed.
FRONTEND
ReactJS ⚛️ – Our main frontend framework, used to build the interactive HUD (Heads-Up Display) and handle the live webcam stream.
Three.js 🧊 – We used Three.js to render the 3D Human Emoji Avatar, mapping the joint coordinates received from Gemini directly onto the 3D model.
Tailwind CSS 🎨 – Our choice for styling, creating a futuristic, "Sports-Lab" aesthetic for the user interface.
Google OAuth 🔑 – We integrated Google Identity for secure user authentication and to link workout data to the user's Google account.
Challenges we ran into
Challenges 🏋️ and Hurdles 🚧 are the most important part of any journey; they test our enthusiasm and patience while pushing us to innovate.
Real-Time Latency ⏱️ – We faced significant issues with "Turn Detection" and latency spikes (5-10s) when streaming continuous video. To solve this, we moved from standard unary requests to the ADK Bidi-streaming mode, allowing the AI to process frames asynchronously without waiting for the user to "finish" their turn.
Spatial Reasoning Accuracy 📐 – Initially, the model struggled to differentiate between a "shallow squat" and a "perfect squat" due to camera angles. We overcame this by implementing Thinking Level: High for specific pose-check intervals and using Media Resolution: High to ensure Gemini could clearly see joint alignment.
Audio-Visual Sync 🎙️ – Syncing the Lyria 3 generated music with the user's physical rep tempo was a massive hurdle. We had to build a custom "Tempo-Sync" logic that adjusted the music’s BPM in real-time based on the velocity data returned from the Gemini 3 Flash vision analysis.
Session Resumption 🔄 – In a gym environment, Wi-Fi can be unstable. We struggled with sessions dropping mid-workout until we implemented the ADK’s Session Resumption handle, allowing the "Coach" to pick up exactly where it left off without losing the user's rep count or calorie data.
3D Avatar Mapping 🧊 – Converting Gemini's raw text descriptions of movement into Three.js coordinates for our human emoji was a complex math challenge. We solved this by forcing Gemini to output Strict JSON Thought Signatures, which provided the precise XYZ coordinates needed for the 3D model's skeletal rig.
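The "Tempo-Sync" idea described above could be sketched roughly as follows. This is a minimal Python illustration, assuming the vision analysis yields a seconds-per-rep figure. The class name, the BPM range, the beats-per-rep ratio, and the smoothing factor are all hypothetical, and the sketch does not call any Lyria API.

```python
class TempoSync:
    """Hypothetical mapping from rep cadence to a smoothed target BPM."""

    def __init__(self, min_bpm=90.0, max_bpm=170.0, beats_per_rep=4,
                 smoothing=0.3, start_bpm=120.0):
        self.min_bpm = min_bpm
        self.max_bpm = max_bpm
        self.beats_per_rep = beats_per_rep
        self.smoothing = smoothing  # 0 = never change, 1 = jump immediately
        self.bpm = start_bpm

    def update(self, seconds_per_rep):
        """Move the current BPM part of the way toward the cadence-derived target."""
        if seconds_per_rep <= 0:
            raise ValueError("seconds_per_rep must be positive")
        # Reps per minute times beats per rep gives the raw target tempo.
        raw_bpm = 60.0 / seconds_per_rep * self.beats_per_rep
        target = max(self.min_bpm, min(self.max_bpm, raw_bpm))
        # Exponential smoothing avoids audible tempo jumps between reps.
        self.bpm += self.smoothing * (target - self.bpm)
        return self.bpm


sync = TempoSync()
steady = sync.update(seconds_per_rep=2.0)  # cadence matches the starting tempo
faster = sync.update(seconds_per_rep=1.0)  # user speeds up, tempo rises gradually
```

Clamping the target keeps the music within a playable range, and the smoothing step means a single fast or slow rep only nudges the tempo rather than resetting it.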
🏆 Accomplishments that we're proud of
Multimodal Synergy 🤖 – We are proud of the fact that we were able to use various Google AI models collaboratively—Gemini 3 Flash for vision, Nano Banana 2 for imaging, Veo for instructional video, and Lyria 3 for dynamic audio—to create a seamless, unified coaching experience.
Bidi-Streaming Mastery 📡 – We are proud that we successfully implemented the ADK (Agent Development Kit) to handle complex bidirectional streaming. Achieving sub-second latency for the 3D human-emoji mirroring was a major technical milestone.
Grounding & Safety 🛡️ – We are proud of our "Athletic Intelligence" system. By grounding the agent in professional fitness guidelines using Vertex AI Vector Search, we ensured the coach provides safe, medically sound advice rather than generic AI responses.
Fit Live Frame AI 🦾 – We are proud of the fact that we created a fully functional "Sports Lab in a Pocket" that leverages Gemini 3.1 and the Multimodal Live API to help home athletes clear form roadblocks, track precision, and stay motivated through a synchronized digital twin.
Adaptive Music Engine 🎵 – A standout achievement was our ability to use Lyria 3 to generate real-time workout tracks that physically "breathe" with the user, increasing in BPM as the user’s rep speed increases.
Cloud Infrastructure 🌐 – We are proud of the fact that we were able to host the entire project on Google Cloud Platform (Cloud Run), making it highly available, scalable, and near market-ready for global users.
Precision Engineering 📐 – We are proud that we were able to reduce the "Spatial Gap" between AI vision and human movement, allowing for a 95% accuracy rate in detecting common form errors like "knee cave" or "rounded spine."
Prompt Architecture 📜 – We are proud of our sophisticated "Thought Signatures"—complex prompt templates that force the AI to output structured JSON for the 3D avatar while maintaining a supportive, high-energy coaching persona.
Accessibility 🌍 – We are proud that by using only a smartphone camera and Gemini’s Vision, we have made elite-level personal training accessible to everyone, regardless of their proximity to a physical gym or an expensive human trainer.
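Since the avatar depends on the model returning structured JSON with XYZ joint coordinates, it helps to validate every payload before it reaches the 3D rig. The following is a minimal Python sketch of such a check. The payload shape, the joint names, and the function name are assumptions for illustration and do not reflect the project's actual schema.

```python
import json
import math

# Hypothetical joint names; the real rig would define its own full list.
REQUIRED_JOINTS = ("left_knee", "right_knee")


def parse_pose(payload):
    """Parse a JSON string into {joint_name: (x, y, z)}, raising ValueError
    if the payload is malformed or a required joint is missing."""
    data = json.loads(payload)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or not isinstance(data.get("joints"), dict):
        raise ValueError("payload must be an object with a 'joints' object")
    pose = {}
    for name in REQUIRED_JOINTS:
        coords = data["joints"].get(name)
        if not isinstance(coords, list) or len(coords) != 3:
            raise ValueError(f"joint '{name}' must be a list of three numbers")
        for value in coords:
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not math.isfinite(value):
                raise ValueError(f"joint '{name}' has a non-numeric coordinate")
        pose[name] = tuple(float(v) for v in coords)
    return pose


pose = parse_pose(
    '{"joints": {"left_knee": [0.1, 0.5, 0], "right_knee": [0.3, 0.5, 0]}}'
)
```

Rejecting a bad frame outright and keeping the avatar on its last valid pose is usually safer than rendering partially parsed coordinates.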
What we learned
Advanced Prompt Engineering 📜
Agentic Vision & ADK 🛠️
Real-Time Multimodal Integration 🤖
Spatial Reasoning & Logic 📐
Scalable GCP Deployment ☁️
Human-Centric Design 🕺
Antigravity 🛠️
What's next for Fit Live Frame AI
Google Cloud Vertex AI Vision 🤖 – We would like to integrate Vertex AI Vision's serverless streaming ingestion to handle thousands of simultaneous gym streams globally, reducing latency even further for a "zero-lag" experience.
Android Health Connect 🏥 – We plan to migrate from legacy APIs to Health Connect, allowing users to sync their Fit Live Frame AI precision and calorie data directly with Fitbit, Samsung Health, and Google Health on-device.
ARCore for Jetpack XR 👓 – We would like to bring the 3D human emoji into the real world. By using ARCore, users could wear AI Smart Glasses and see their "Shadow Mentor" standing right next to them in their actual living room or gym.
Multi-User "Squad" Mode 👥 – We would like to create a synchronized training system where a group of friends can work out together, with their 3D avatars competing on a live "Precision Leaderboard" powered by Firebase Realtime Database.
Wear OS Integration ⌚ – We plan to use Health Services on Wear OS to pull high-frequency heart rate and oxygen data from the user’s watch, combining biometric data with Gemini’s vision data for the most accurate fitness report ever created.
Pro-Athlete Benchmarking 🏆 – We would like to introduce a feature where users can "load" the motion-captured data of professional athletes into their 3D emoji, allowing them to train side-by-side with a digital twin of their favorite sports stars.
