Inspiration
The fitness industry often lacks accessible, personalized coaching that focuses on proper form and technique. Many people exercise at home without guidance, which leads to poor form, low motivation, and potential injuries. We wanted to make safe, effective exercise available to everyone, with expert guidance in real time.
What it does
FitWise is an AI-powered fitness trainer that provides real-time form correction and personalized coaching during workouts. Using your webcam, the system counts reps, analyzes form accuracy, and gives instant feedback. Users can talk to "Coach Mike," an AI voice assistant powered by Google's Gemini, who provides motivation, answers questions, and offers technique tips, all through natural spoken conversation using ElevenLabs' voice technology.
How we built it
Frontend: Next.js 15 with React, Tailwind CSS, and Socket.io for real-time communication
Backend: Python with Google MediaPipe for pose detection, running on two threads so that video processing and Socket.io connections are handled simultaneously
AI Integration:
- Google Gemini 2.0 Flash for conversational AI responses
- ElevenLabs for speech-to-text recognition and text-to-speech synthesis with a custom fitness coach voice
Architecture: Real-time pose landmark detection feeds into our form analysis algorithm, which calculates rep counts and accuracy scores, then streams feedback to the frontend via WebSockets
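To make the pipeline above more concrete, here is a minimal sketch of angle-based rep counting. It assumes landmarks arrive as normalized (x, y) pairs, which is how MediaPipe Pose reports them. The `joint_angle` helper, the `RepCounter` class, and the 160/70 degree thresholds are hypothetical and chosen for illustration; this is not necessarily how FitWise's form analysis works.

```python
import math


def joint_angle(a, b, c):
    """Angle at point b, in degrees, formed by the segments b->a and b->c.

    Each point is an (x, y) pair, e.g. shoulder, elbow, wrist for a curl.
    """
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    # Fold reflex angles back into the 0..180 range.
    return 360.0 - ang if ang > 180.0 else ang


class RepCounter:
    """Counts one rep per extended -> flexed -> extended cycle.

    The thresholds are assumptions; a real exercise would need its own values.
    """

    def __init__(self, extended_deg=160.0, flexed_deg=70.0):
        self.extended_deg = extended_deg
        self.flexed_deg = flexed_deg
        self.stage = "extended"
        self.reps = 0

    def update(self, angle):
        """Feed one per-frame joint angle and return the running rep count."""
        if self.stage == "extended" and angle <= self.flexed_deg:
            self.stage = "flexed"
        elif self.stage == "flexed" and angle >= self.extended_deg:
            self.stage = "extended"
            self.reps += 1
        return self.reps
```

Using two thresholds instead of one gives the counter some hysteresis, so small jitter in the landmark positions near a single cutoff does not register as extra reps.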
Challenges we ran into
- Audio format compatibility: Browser-recorded audio formats (WebM, OGG) had to be converted to formats compatible with speech recognition APIs, requiring server-side audio processing with FFmpeg
- Real-time performance: Balancing MediaPipe's processing load against Socket.io's event loop required a careful threading architecture to prevent frame drops
- State synchronization: Managing workout state across camera feeds, AI responses, and audio playback without conflicts
- Cross-browser audio handling: Different browsers support different MediaRecorder MIME types, requiring dynamic format detection
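The audio conversion step mentioned above could look roughly like the sketch below. It assumes the `ffmpeg` binary is installed on the server and that the speech recognition API accepts 16 kHz mono WAV; the target format, the function names, and the file paths are assumptions for illustration, not details taken from FitWise.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src, dst, sample_rate=16000):
    """Build an ffmpeg command that converts src to mono audio at sample_rate.

    ffmpeg picks the output container from dst's extension (e.g. .wav).
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file, e.g. a browser-recorded .webm or .ogg
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample
        str(dst),
    ]


def convert_to_wav(src, dst, sample_rate=16000):
    """Run the conversion and return the output path; raises if ffmpeg fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(
        build_ffmpeg_cmd(src, dst, sample_rate),
        check=True,
        capture_output=True,
    )
    return Path(dst)
```

Because ffmpeg detects the input container on its own, the same call should handle whichever MIME type the browser's MediaRecorder ended up producing.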
Accomplishments that we're proud of
- Built a fully functional computer vision system that accurately detects and counts reps across 30 different exercises
- Created a seamless voice interaction system where users can naturally talk to their AI coach during workouts
- Implemented real-time form feedback with minimal latency using a dual-threaded Python backend
- Designed an intuitive UI that works smoothly even with multiple simultaneous data streams (video, audio, WebSocket events)
- Successfully integrated three complex AI systems (MediaPipe, Gemini, ElevenLabs) into one cohesive application
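One common way to keep latency low in a two-thread design like the one described above is a "latest result wins" hand-off: the vision thread overwrites any result the network thread has not picked up yet, so stale frames never pile up. The sketch below shows that pattern with a size-1 queue. It is an illustrative assumption, not a description of FitWise's actual code, and `vision_worker` uses a plain dictionary as a stand-in for real pose detection.

```python
import queue
import threading


def put_latest(q, item):
    """Put item into a size-1 queue, discarding a stale item if one is waiting.

    Safe for a single producer thread with any number of consumers.
    """
    try:
        q.put_nowait(item)
    except queue.Full:
        try:
            q.get_nowait()  # drop the stale result
        except queue.Empty:
            pass            # a consumer took it in the meantime
        q.put_nowait(item)


def vision_worker(frames, results, stop):
    """Producer thread: processes frames and publishes only the newest result."""
    for frame in frames:
        if stop.is_set():
            break
        put_latest(results, {"frame": frame})  # stand-in for pose detection output


def run_demo(n_frames=5):
    """Run the producer to completion and return whatever result is left waiting."""
    results = queue.Queue(maxsize=1)
    stop = threading.Event()
    worker = threading.Thread(
        target=vision_worker, args=(range(n_frames), results, stop), daemon=True
    )
    worker.start()
    worker.join()
    return results.get_nowait()
```

In a real server the consumer would be the thread that emits Socket.io events; it would read from `results` and always see the most recent analysis rather than a backlog.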
What we learned
- Advanced real-time video processing techniques and optimization strategies for MediaPipe
- Complex audio pipeline management across different formats and APIs
- Effective prompt engineering for creating a consistent AI personality (Coach Mike)
- WebSocket architecture for low-latency bidirectional communication
- The importance of graceful degradation when handling multiple API dependencies
- Browser API limitations and workarounds for MediaRecorder and audio playback
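As a small illustration of the graceful degradation point above, the sketch below wraps a call to a conversational model so that a failure yields a canned line instead of breaking the workout session. The `coach_reply` function, the `ask_model` callable, and the fallback text are all hypothetical; no real Gemini or ElevenLabs API calls are shown.

```python
import logging

logger = logging.getLogger("fitwise.sketch")

FALLBACK_REPLY = "Keep going, you're doing great!"


def coach_reply(question, ask_model, fallback=FALLBACK_REPLY):
    """Return the model's answer, or a canned line if the call fails or is empty.

    ask_model is any callable that takes the question and returns a string.
    """
    try:
        reply = ask_model(question)
    except Exception as exc:  # broad on purpose: any upstream failure should degrade
        logger.warning("coach model unavailable: %s", exc)
        return fallback
    # Treat None or whitespace-only answers as failures too.
    return (reply or "").strip() or fallback
```

The same wrapper shape could be applied to the speech services, for example by showing the reply as on-screen text when voice synthesis is unavailable.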
What's next for FitWise
- Workout programs: Pre-built routines and progressive training plans
- Social features: Share workouts, compete with friends, community challenges
- Advanced analytics: Detailed form breakdown, progress tracking over time, injury risk assessment
- Mobile app: Native iOS/Android apps for better performance and offline capability
- Wearable integration: Connect with fitness trackers for heart rate and calorie data
- Multi-user sessions: Group workouts with synchronized feedback
- Exercise library expansion: Add yoga, pilates, and sport-specific training modules
