Coach Aura: The Gemini-Powered Multimodal Fitness Coach

💡 Inspiration

This winter, I set a "stretch" goal: a sub-7-minute 2000m row. As I approach 40, hitting that mark—the rowing equivalent of a 6-minute mile—is a brutal test of physiology and technique. My previous best was a 7:42 plateau. To break through, I turned to Gemini.

The results were transformative. Gemini didn't just provide a static plan; it provided data-driven technical optimization:

  • Technical Tuning: With Gemini's guidance, I tuned my Concept2 drag factor to 125.
  • Physiological Awareness: I used heart rate monitoring to avoid "engine overheating," protecting my nervous system from burnout.
  • Adaptive Recovery: When I reported lower back soreness or knee issues, Gemini pivoted my training in real time to include core activation and stability work.

But a true coach needs eyes and ears. I realized the future of fitness isn't a PDF—it’s a live agent that sees your form, hears your breathing, and adjusts your environment (like your Spotify BPM) in the moment.

🚀 What it does

Coach Aura is a Gemini-powered live agent designed to bridge the gap between high-level athletic programming and real-time execution. For this MVP, I focused on the Handstand Protocol.

The agent:

  • Analyzes Form: Leverages Gemini’s multimodal capabilities to watch the user’s alignment and provide instant verbal cues.
  • Integrates Biometrics: Syncs with heart rate monitors to ensure the user is in the optimal zone for neurological learning.
  • Adapts Programming: Dynamically updates the workout based on the user's fatigue levels, reported pain, or progress speed.
  • Controls Atmosphere: Automatically adjusts Spotify playback to match the intensity of the current set.
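
To make the biometric and atmosphere pieces concrete, here is a minimal sketch of how a heart rate reading could be mapped to a training zone and a target music tempo. The zone boundaries, tempo values, and the names `Zone`, `classify`, and `coaching_action` are all assumptions for illustration; they are not the thresholds or code used in Coach Aura, and the sketch makes no Spotify or Gemini calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    low: float       # lower bound as a fraction of max heart rate (inclusive)
    high: float      # upper bound as a fraction of max heart rate (exclusive)
    target_bpm: int  # music tempo to request for this zone (assumed value)


# Hypothetical zones: the real thresholds and tempos are not given in this write-up.
ZONES = [
    Zone("recovery", 0.0, 0.6, 90),
    Zone("skill", 0.6, 0.75, 110),
    Zone("hard", 0.75, 0.9, 140),
    Zone("max", 0.9, float("inf"), 160),
]


def classify(heart_rate: int, max_heart_rate: int) -> Zone:
    """Return the zone whose range contains heart_rate / max_heart_rate."""
    if max_heart_rate <= 0:
        raise ValueError("max_heart_rate must be positive")
    fraction = heart_rate / max_heart_rate
    for zone in ZONES:
        if zone.low <= fraction < zone.high:
            return zone
    raise ValueError(f"heart rate out of range: {heart_rate}")


def coaching_action(heart_rate: int, max_heart_rate: int, skill_zone: str = "skill") -> dict:
    """Summarize what the agent might do with one heart rate reading."""
    zone = classify(heart_rate, max_heart_rate)
    return {
        "zone": zone.name,
        "target_bpm": zone.target_bpm,
        "in_skill_zone": zone.name == skill_zone,
    }
```

In a setup like this, the agent would pass `target_bpm` to whatever music control it has and use `in_skill_zone` to decide whether to cue another attempt or a rest.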

🛠️ How we built it

Coach Aura was built using the Gemini Live API and Google AI Studio. By leveraging Gemini's native multimodality, I created a loop where video frames and biometric telemetry are processed simultaneously. The frontend handles real-time streams, allowing for a low-latency "Live" coaching experience that reacts as the user moves.
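
One way to picture the loop is as a pairing step: each video frame is bundled with the most recent heart rate sample before it is sent to the model. The sketch below shows only that pairing logic under stated assumptions. The types `Frame` and `HeartRateSample`, the `build_turn` payload shape, and the 5-second staleness limit are hypothetical, and the actual Gemini Live API call is deliberately left out.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HeartRateSample:
    timestamp: float  # seconds since session start
    bpm: int


@dataclass(frozen=True)
class Frame:
    timestamp: float  # seconds since session start
    jpeg: bytes       # encoded image data from the camera


def latest_sample_before(
    samples: list[HeartRateSample], t: float, max_age: float = 5.0
) -> Optional[HeartRateSample]:
    """Return the most recent sample at or before time t.

    samples must be sorted by timestamp. Returns None if there is no
    earlier sample or if the newest one is older than max_age seconds.
    """
    times = [s.timestamp for s in samples]
    i = bisect_right(times, t)
    if i == 0:
        return None
    sample = samples[i - 1]
    return sample if t - sample.timestamp <= max_age else None


def build_turn(frame: Frame, samples: list[HeartRateSample]) -> dict:
    """Bundle one frame with a short telemetry note (hypothetical payload shape)."""
    sample = latest_sample_before(samples, frame.timestamp)
    note = f"heart_rate_bpm={sample.bpm}" if sample else "heart_rate_bpm=unknown"
    return {"image": frame.jpeg, "text": note, "timestamp": frame.timestamp}
```

Marking stale telemetry as unknown, rather than reusing an old reading, keeps the model from coaching against a heart rate that no longer reflects the athlete's state.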

🚧 Challenges we ran into

The biggest challenge was scope. Initially, I wanted to solve every fitness modality at once. However, I realized that for an AI coach to be effective, it needs deep "domain expertise." I pivoted to a niche—the handstand—to refine a specific feedback protocol. This allowed me to perfect the prompt engineering required for technical form analysis before scaling to other movements.

🏆 Accomplishments that we're proud of

  • Hardware Sync: Successfully bridged a heart rate monitor and Spotify API with the Gemini agent's logic.
  • Real-time Feedback: Achieved latency low enough that the agent can tell you to "tuck your ribs" while you are still upside down.
  • Functional Prototype: Moved from a conceptual "chat" to a working tool that reacts to physical movement.

🧠 What we learned

The power of these models isn't just in their knowledge, but in their ability to act as a reasoning engine for unstructured data (like a video of a wobbly handstand). I learned that AI can shorten the feedback loop of physical learning by providing the "external cue" usually reserved for expensive 1-on-1 coaching.

⏩ What's next for Coach Aura

The long-term vision for Coach Aura is to create the "Matrix-style" learning moment for physical skills: the "I know Kung Fu" moment, when Neo loads the Kung Fu program and suddenly knows Kung Fu. I want to explore how AI can accelerate "proprioception," the sense of where your body is in space. Future iterations will include:

  • Multi-angle Analysis: Using multiple camera feeds for 3D form correction.
  • Predictive Fatigue Modeling: Warning users of potential injury before it happens based on subtle changes in movement velocity.
  • The Skill Marketplace: A platform where elite athletes can "digitize" their coaching logic into Coach Aura protocols.
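
As a rough illustration of the fatigue idea, the sketch below compares the average movement velocity of the most recent reps against the first few reps of a session and flags a large drop. It is a simple heuristic under assumed inputs, not a validated injury predictor: the per-rep velocity list, the function names `velocity_drop` and `should_warn`, the 3-rep windows, and the 20% threshold are all hypothetical.

```python
from statistics import mean


def velocity_drop(
    velocities: list[float], baseline_reps: int = 3, recent_reps: int = 3
) -> float:
    """Fractional drop of the recent mean velocity relative to the baseline mean.

    velocities holds one movement velocity per rep, oldest first.
    Returns 0.0 when there are too few reps to compare, when the baseline
    is not positive, or when the athlete is moving faster than at the start.
    """
    if len(velocities) < baseline_reps + recent_reps:
        return 0.0
    baseline = mean(velocities[:baseline_reps])
    recent = mean(velocities[-recent_reps:])
    if baseline <= 0:
        return 0.0
    return max(0.0, (baseline - recent) / baseline)


def should_warn(velocities: list[float], threshold: float = 0.2) -> bool:
    """Flag the session when velocity has dropped by at least the threshold (assumed 20%)."""
    return velocity_drop(velocities) >= threshold
```

A real version would need velocities extracted from video or a sensor, and a threshold tuned per movement and per athlete.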

Eventually, I'd like to shorten the time it takes to learn a new physical skill. Can I take this a step further and integrate this type of training with a brain-computer interface? What happens at the neural level when someone learns through muscle memory, and how can we accelerate that learning?
