Inspiration
Home fitness is often a lonely and uncertain experience. Most users struggle with two things: the technical risk of improper form and the emotional hurdle of staying motivated. Existing apps are either passive videos that can't see you, or simple counters that don't understand biomechanics.
We were inspired to build Hygiea AI to bridge this gap, not just as a tracker but as a coaching duo. We wanted to create a "Live" partner that could actually see your movement and speak corrections instantly, while providing a safe space to ask questions and learn.
The Hygiea Duo
Hygiea introduces two distinct AI personalities:
- Sarge (The Training Floor): Powered by Gemini 2.5 Flash Native Audio through the Live API, Sarge delivers technical audio feedback with sub-second latency. He monitors your joint angles via your webcam and tells you to "get deeper" or "lock your core" exactly when you need it.
- Lyra (The Coach's Office): Powered by Gemini 3 Flash. Lyra is the supportive mentor who explains the "why" behind the exercises, helps beginners find their footing, and keeps the athlete encouraged between sessions.
How We Built It
We built Hygiea AI using a modern, high-performance stack designed for low-latency interaction:
- Biomechanical Vision: We integrated MediaPipe’s Pose Landmarker to extract 33 skeletal landmarks in real time. We then calculated specific joint angles (knee flexion, hip levelness, spine alignment) in the browser.
- The Live Loop: These metrics are fed into the Gemini Live API as technical state updates. Sending the model state updates this way gave us sub-second latency for audio feedback, creating a truly responsive "human-in-the-loop" feel.
- Dual-Model Strategy: We used Gemini 2.5 Flash Native Audio for its low latency in live sessions and Gemini 3 Flash for the Coach's Office to provide deep, reasoning-based answers to user questions.
- Audio Engineering: We implemented custom raw PCM decoding and scheduling using the Web Audio API to ensure gapless, professional-grade audio playback.
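The joint-angle step from the vision bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: landmarks are taken to be normalized `{x, y}` points in MediaPipe's 33-point pose order (indices 23, 25, 27 for the left hip, knee, and ankle), and the helper names `jointAngle`, `toPixels`, and `leftKneeAngle` are hypothetical. Normalized coordinates are scaled to pixels first so that a non-square video frame does not distort the angle.

```typescript
// A pose landmark in normalized image coordinates (0..1 on each axis).
interface Landmark {
  x: number;
  y: number;
}

// Indices in MediaPipe's 33-landmark pose topology.
const LEFT_HIP = 23;
const LEFT_KNEE = 25;
const LEFT_ANKLE = 27;

// Angle at vertex b between the segments b->a and b->c, in degrees (0..180).
// Returns NaN when either segment has zero length.
function jointAngle(a: Landmark, b: Landmark, c: Landmark): number {
  const v1x = a.x - b.x;
  const v1y = a.y - b.y;
  const v2x = c.x - b.x;
  const v2y = c.y - b.y;
  const magnitude = Math.hypot(v1x, v1y) * Math.hypot(v2x, v2y);
  if (magnitude === 0) return NaN;
  // Clamp to guard against floating-point drift just outside [-1, 1].
  const cosine = Math.min(1, Math.max(-1, (v1x * v2x + v1y * v2y) / magnitude));
  return (Math.acos(cosine) * 180) / Math.PI;
}

// Scale a normalized landmark to pixel space for the given frame size.
function toPixels(p: Landmark, width: number, height: number): Landmark {
  return { x: p.x * width, y: p.y * height };
}

// Knee flexion proxy: the hip-knee-ankle angle of the left leg.
// About 180 degrees means a straight leg; smaller values mean a deeper bend.
function leftKneeAngle(landmarks: Landmark[], width: number, height: number): number {
  const hip = toPixels(landmarks[LEFT_HIP], width, height);
  const knee = toPixels(landmarks[LEFT_KNEE], width, height);
  const ankle = toPixels(landmarks[LEFT_ANKLE], width, height);
  return jointAngle(hip, knee, ankle);
}
```

In a frame loop, `leftKneeAngle(result, video.videoWidth, video.videoHeight)` would be called on each detected pose, where `result` stands for whatever landmark array the pose detector returns. This sketch works in 2D only; depth is ignored, so angles are most reliable when the user is side-on to the camera.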
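The raw PCM decoding and scheduling from the audio bullet might look roughly like this. The sketch assumes the model's audio arrives as 16-bit little-endian mono PCM (the Live API documents 24 kHz output, but that should be checked against the current docs), and the names `pcm16ToFloat32` and `PlaybackScheduler` are hypothetical. The scheduler only does the timing arithmetic, so it can be exercised without a browser.

```typescript
// Convert little-endian signed 16-bit PCM bytes to floats in [-1, 1).
// A trailing odd byte, if any, is ignored.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// Tracks where the previously scheduled chunk ends so that each new chunk
// starts exactly there, which avoids gaps and overlaps between chunks.
class PlaybackScheduler {
  private nextStartTime = 0;

  // currentTime: the audio clock's current time in seconds.
  // durationSec: length of the chunk about to be played.
  // Returns the time at which the chunk should start.
  schedule(currentTime: number, durationSec: number): number {
    // If playback has fallen behind (an underrun), restart from "now".
    const startAt = Math.max(this.nextStartTime, currentTime);
    this.nextStartTime = startAt + durationSec;
    return startAt;
  }

  // Forget queued timing, for example after the user interrupts the coach.
  reset(): void {
    this.nextStartTime = 0;
  }
}
```

In the browser, each decoded chunk would be copied into an `AudioBuffer` created with `AudioContext.createBuffer(1, samples.length, sampleRate)`, attached to a node from `createBufferSource()`, and started with `start(startAt)`, where `startAt` comes from `schedule(ctx.currentTime, buffer.duration)`.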
Challenges Faced
The biggest challenge was latency and synchronization. In a workout, a correction that arrives five seconds late is useless. We spent significant time optimizing the data flow from MediaPipe to the Gemini Live session so that Sarge’s voice triggers as soon as a form error occurs. Another challenge was the UI/UX balance: creating an interface that feels "high-intensity" during a workout but "calm and supportive" during a consultation.
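One plausible way to keep the MediaPipe-to-Live data flow lean is to gate state updates so the model only hears about meaningful changes. The sketch below is an assumption about how such a gate could work, not a description of what the project does: `FormState`, `StateUpdateGate`, and the 500 ms and 5 degree thresholds are all hypothetical, and the `send` callback stands in for whatever call forwards text to the live session.

```typescript
// Snapshot of the metrics forwarded to the model (hypothetical shape).
interface FormState {
  exercise: string;
  kneeAngleDeg: number;
  hipTiltDeg: number;
}

// Forwards a state update only if enough time has passed since the last one
// AND the state has changed by a meaningful amount.
class StateUpdateGate {
  private send: (message: string) => void;
  private minIntervalMs: number;
  private minChangeDeg: number;
  private lastSentAt = Number.NEGATIVE_INFINITY;
  private lastSent: FormState | null = null;

  constructor(send: (message: string) => void, minIntervalMs = 500, minChangeDeg = 5) {
    this.send = send;
    this.minIntervalMs = minIntervalMs;
    this.minChangeDeg = minChangeDeg;
  }

  // Offer the latest per-frame state; returns true if it was forwarded.
  offer(state: FormState, nowMs: number): boolean {
    if (nowMs - this.lastSentAt < this.minIntervalMs) return false;
    if (this.lastSent !== null && !this.differs(this.lastSent, state)) return false;
    this.send(JSON.stringify(state));
    this.lastSentAt = nowMs;
    this.lastSent = { ...state };
    return true;
  }

  private differs(a: FormState, b: FormState): boolean {
    return (
      a.exercise !== b.exercise ||
      Math.abs(a.kneeAngleDeg - b.kneeAngleDeg) >= this.minChangeDeg ||
      Math.abs(a.hipTiltDeg - b.hipTiltDeg) >= this.minChangeDeg
    );
  }
}
```

Updates rejected during the quiet interval are dropped rather than queued. Because the pose loop offers a fresh state on every frame, the next accepted update always reflects the current posture and never a stale one.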
What We Learned
We learned that the future of AI isn't just "chatting"; it's multimodal agency. By giving Gemini "eyes" (via pose coordinates) and a "voice" (via Live Audio), we transformed a static web app into a living coach. We also learned that personality matters: having two distinct AI characters made the experience feel more like a real fitness facility and less like a piece of software.
Built With
- Core: React, TypeScript, Tailwind CSS
- AI models: Gemini 2.5 Flash Native Audio Preview 12-2025 (live coaching), Gemini 3 Flash Preview (mentor Q&A)
- Computer vision: MediaPipe Pose Landmarker (Google AI Edge)
- SDKs: @google/genai
- APIs: Web Audio API (raw PCM processing), MediaDevices API (camera/mic)
- Design: Lucide React icons, DiceBear