About the Project
Inspiration
We built this project because there is a huge gap between personal training and working out alone. Personal trainers can be genuinely helpful, but they are expensive and out of reach for many people. On the other hand, workout videos and fitness apps are convenient, but they are passive: they can tell you what to do, but they cannot actually watch you, respond in real time, or adjust to what is happening during your workout.
We wanted to build something in between those two extremes: a live AI fitness coach that lets people keep the flexibility and independence of solo workouts while still getting feedback in the moment. The idea was to make fitness guidance feel more accessible, interactive, and personal without needing a trainer physically there.
What We Built
We built a real-time AI fitness coach that can watch a user through the camera, track workout activity, count reps, check whether the user is properly in frame, and respond conversationally in real time.
Instead of feeling like a basic chatbot or a passive workout app, we wanted it to feel more like a live coach that is aware of what the user is doing. The goal was not just to answer questions, but to create an experience where the system could actually react to movement, speak at the right moments, and make the workout feel more guided.
How We Built It
We built the project using Google AI models and Google Cloud.
On the interaction side, we used Gemini Live to power the real-time conversational experience. That gave us the live agent layer: the system could respond naturally while the workout was happening, instead of relying on a turn-by-turn, text-based exchange.
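To make that concrete, here is a minimal sketch of opening a live session with the @google/genai SDK. The model id, system prompt, and handler are illustrative placeholders rather than our exact setup:

```typescript
import { GoogleGenAI, Modality, LiveServerMessage } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Hypothetical handler: in the real app this routes audio to playback
// and text to the UI; here we just log any text the coach sends back.
function playOrDisplay(msg: LiveServerMessage): void {
  const text = msg.serverContent?.modelTurn?.parts?.[0]?.text;
  if (text) console.log('coach:', text);
}

// Open a live session: the model streams responses while input keeps flowing.
const session = await ai.live.connect({
  model: 'gemini-2.5-flash-live-preview', // illustrative model id
  config: {
    responseModalities: [Modality.AUDIO],
    systemInstruction:
      'You are a live fitness coach. Be brief and encouraging, and only speak when it helps.',
  },
  callbacks: {
    onopen: () => console.log('live session open'),
    onmessage: (msg) => playOrDisplay(msg),
    onerror: (e) => console.error('live session error', e),
    onclose: () => console.log('live session closed'),
  },
});
```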
On the activity side, we fed live workout signals, such as rep count, frame awareness, and form-related events, into the application. The system takes those signals and decides which ones are important enough to surface to the agent. That way, the agent is not constantly reacting to every tiny input; it responds when it is actually useful.
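As a rough illustration of that filtering step (the event types and thresholds below are simplified stand-ins for our actual logic, not a library API):

```typescript
// Simplified workout events flowing out of the tracking layer.
type WorkoutEvent =
  | { kind: 'rep'; count: number }
  | { kind: 'out_of_frame'; seconds: number }
  | { kind: 'form_issue'; detail: string };

// Decide whether an event is important enough to surface to the agent.
function isWorthSurfacing(e: WorkoutEvent): boolean {
  switch (e.kind) {
    case 'rep':
      return e.count % 5 === 0; // surface milestones, not every single rep
    case 'out_of_frame':
      return e.seconds > 2;     // ignore brief drift out of frame
    case 'form_issue':
      return true;              // form problems are always worth coaching on
  }
}
```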
We also used Google Cloud to support the backend side of the project and make the system feel more like a real deployed product rather than just a local prototype.
At a high level, the project works like this:
- The user starts a workout in front of the camera.
- The system tracks movement and workout state in real time.
- Important events, like rep milestones or frame issues, are sent into the app logic (see the sketch after this list).
- Gemini Live handles the conversational coaching layer.
- The user can ask questions, get feedback, and interact with the coach naturally while exercising.
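To show how an event crosses from the tracking loop into the coaching layer, here is a sketch building on the snippets above (it assumes the `session` and `WorkoutEvent` defined earlier; the summary phrasing is illustrative):

```typescript
import { Session } from '@google/genai';

// Turn a filtered event into a short text turn the agent can react to.
function surfaceToCoach(session: Session, e: WorkoutEvent): void {
  const summary =
    e.kind === 'rep' ? `The user just completed rep ${e.count}.`
    : e.kind === 'out_of_frame' ? `The user has been out of frame for ${e.seconds}s.`
    : `Possible form issue: ${e.detail}.`;

  // sendClientContent injects a normal conversational turn into the live session.
  session.sendClientContent({ turns: summary, turnComplete: true });
}
```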
Challenges We Faced
One of the biggest challenges was dealing with noisy live movement data. In a real workout, people move around, drift out of frame, change pace, or perform reps inconsistently. When too much raw data was pushed directly into the agent, the experience became messy and sometimes glitchy.
That forced us to rethink the design. We realized that a good live agent is not the one that talks the most. It is the one that knows when to talk. So instead of sending everything into the agent, we moved toward a more selective event-driven design where the system mainly responds when the user asks for feedback or when something important happens, like a rep goal being reached or a major tracking issue.
Another challenge was making the system feel live and helpful without becoming distracting. In fitness, timing matters. Feedback has to come at the right moment or it starts to feel like interruption instead of coaching.
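One way to encode that restraint is a simple timing guard in front of the agent. This is a minimal sketch of the idea; the cooldown length and priority labels are assumptions we tuned by feel, not values from any library:

```typescript
type Priority = 'urgent' | 'routine';

const COOLDOWN_MS = 8_000; // assumed value, tuned by feel during testing
let lastSpokeAt = 0;

// Even a filtered event has to clear a timing check before the coach speaks.
function maySpeak(priority: Priority, now: number = Date.now()): boolean {
  // Urgent events (e.g. a major tracking failure) bypass the cooldown;
  // routine encouragement waits, so feedback lands as coaching, not interruption.
  if (priority !== 'urgent' && now - lastSpokeAt < COOLDOWN_MS) return false;
  lastSpokeAt = now;
  return true;
}
```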
What We Learned
The biggest thing we learned is that building a live AI experience is not just about model capability. It is also about timing, restraint, and good system design.
We learned that:
- more data is not always better
- real-time experiences need filtering and prioritization
- a live agent feels smarter when it is selective
- multimodal systems become much more convincing when they respond to real context instead of just generating text
We also learned a lot about how to design around the difference between a demo and a usable product. It is one thing to make something look impressive for a few seconds, and another thing to make it feel stable, natural, and helpful during an actual interaction.
Why This Project Matters
What makes this project exciting to us is that it shows how AI can move beyond passive chat and become part of a real interactive experience. Instead of just giving information, the system can observe, respond, and support the user in the moment.
For fitness specifically, that creates a middle ground that barely exists today: something more responsive than a workout app, but more accessible than traditional one-on-one training.
Final Thoughts
This project started as an idea around making workouts feel less passive, but it turned into a broader exploration of what live multimodal AI can actually do. We are proud of how much more interactive and aware the experience became, and we think it points toward a future where AI systems are not just tools you type at, but assistants that can genuinely participate in what you are doing.
Built With
- Bun
- Gemini 2.5 Flash Live
- GenAI SDK
- Google Cloud Run
- Preact
- Tailwind CSS
- TypeScript
- WebSockets