Project Name: coach.ai
Elevator Pitch
An intelligent, real-time basketball shooting coach that uses computer vision to analyze your form and Generative AI to provide instant, encouraging voice feedback—just like a real trainer.
Inspiration
We love basketball, but improving your shooting form is hard without an expert watching you. Personal trainers are expensive, and filming yourself to watch later doesn't give you the instant correction you need to build muscle memory. We wanted to democratize elite coaching by building an AI that watches you play and speaks to you in real-time, helping you fix your form before the bad habits set in.
What it does
coach.ai is a full-stack application that turns your webcam or phone camera into a professional shooting coach.
- Real-time Analysis: It streams video to our backend where we use computer vision to track key body joints (elbow, shoulder, knee, wrist).
- Form Scoring: It calculates shooting angles and biomechanics to give you a "Form Score" (0-100) on every shot.
- Instant Feedback: If your elbow is flying out or your knees aren't bending enough, it detects the error immediately.
- AI Voice Coach: Instead of robotic alerts, we use "Google Gemini 2.5 Flash" to generate a "coach personality" that gives specific, encouraging advice (e.g., "Tuck that elbow in, you'll get more power!"). This text is instantly converted to speech using "Eleven Labs" and streamed back to you.
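The form-scoring idea above can be sketched in a few lines of Python. This is an illustrative assumption of how a single metric might work, not the app's exact scoring rules: the 90-degree "ideal" elbow angle, the tolerance, and the function names are made up for the example.

```python
# Sketch: scoring one aspect of shooting form from 2D pose landmarks.
# The ideal angle (90 deg) and tolerance are illustrative assumptions.
import math

def joint_angle(a, b, c):
    """Angle at point b (degrees) formed by segments b->a and b->c."""
    ang1 = math.atan2(a[1] - b[1], a[0] - b[0])
    ang2 = math.atan2(c[1] - b[1], c[0] - b[0])
    deg = abs(math.degrees(ang1 - ang2))
    return 360 - deg if deg > 180 else deg

def elbow_score(shoulder, elbow, wrist, ideal=90.0, tolerance=45.0):
    """Map deviation from the ideal elbow angle to a 0-100 score."""
    deviation = abs(joint_angle(shoulder, elbow, wrist) - ideal)
    return max(0.0, 100.0 * (1 - deviation / tolerance))
```

A per-shot "Form Score" can then combine several such metrics (elbow, knee bend, release angle) into one 0-100 number.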
How we built it
We built a high-performance, low-latency pipeline to make the experience feel "live":
- Frontend: Built with "React" and "TypeScript", using HTML5 Canvas for efficient video frame capture and rendering. We implemented a custom WebSocket client to stream binary video data and receive JSON analysis results.
- Backend: A FastAPI server that orchestrates the entire pipeline.
- Machine Learning: We communicate with a specialized "Basketball Coach ML" service to extract pose landmarks and detect the ball.
- Generative AI Pipeline:
- Reasoning: We filter raw technical data (e.g., "Elbow angle 110°") and pass it to "Google Gemini 2.5 Flash". We prompted Gemini to act as a supportive but firm basketball coach.
- Voice: The generated text is streamed to "Eleven Labs" via WebSocket for ultra-low-latency Text-to-Speech (TTS).
- Optimization: We implemented debouncing (2.5s delay) and caching to prevent the coach from talking over itself or repeating the same advice too often.
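The debouncing-and-caching step can be sketched as a small gate on the backend. This is a minimal illustration of the idea (suppress feedback within the 2.5 s cooldown, and skip advice identical to the last line spoken); the class and method names are our own for this example, not the actual implementation.

```python
# Sketch: debounce (2.5 s cooldown) plus a one-message cache so the
# coach never talks over itself or repeats the same advice back-to-back.
import time

class FeedbackGate:
    def __init__(self, cooldown=2.5, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock                 # injectable for testing
        self.last_spoken_at = float("-inf")
        self.last_message = None

    def allow(self, message):
        now = self.clock()
        if now - self.last_spoken_at < self.cooldown:
            return False                   # still inside the cooldown window
        if message == self.last_message:
            return False                   # same advice as last time
        self.last_spoken_at = now
        self.last_message = message
        return True
```

Only messages that pass the gate get sent on to Gemini/Eleven Labs, which also keeps API usage down.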
Challenges we ran into
- Making the model work with WebSockets: The model initially took an uploaded video, processed it, and generated an output video. To make the coaching real-time, we had to feed video frames into the model at intervals and process them incrementally instead of whole videos at once.

- Latency: The biggest hurdle was the lag between a player moving and hearing the feedback. If the coach speaks 5 seconds later, the moment is gone. We solved this by using WebSockets for everything (video up, audio down) and choosing the fastest models available (Gemini Flash + Eleven Labs Turbo).
- Audio Synchronization: Managing the audio queue on the frontend so messages wouldn't overlap or pile up was tricky. We built a custom queue system in React to play messages sequentially.
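The frame-interval idea from the first challenge can be sketched as a simple sampler: instead of sending every frame of the live stream to the model, pick frames at a fixed rate and forward only those. The 5 fps target here is an illustrative assumption, not the rate the app uses.

```python
# Sketch: sample frames off a live stream at a target rate, so the ML
# service analyzes snapshots in real time rather than a full video.
class FrameSampler:
    def __init__(self, target_fps=5.0):
        self.interval = 1.0 / target_fps
        self.next_due = 0.0

    def should_process(self, timestamp):
        """True if the frame at this timestamp (seconds) should be analyzed."""
        if timestamp >= self.next_due:
            self.next_due = timestamp + self.interval
            return True
        return False
```

The WebSocket handler checks each incoming frame against the sampler and drops the rest, which keeps latency bounded even if the camera runs at 30 or 60 fps.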
Accomplishments that we're proud of
- Sub-1.5 Second Latency: Achieving a near-instant feedback loop where the AI "sees" a mistake and speaks to you almost immediately.
- Personality: The "Coach" genuinely feels like a person watching you, thanks to the dynamic vocabulary of the LLM.
- Architecture: A clean, modular architecture that separates the ML analysis, the API server, and the Generative AI services, allowing us to scale or swap components easily.
What we learned
- Async Python: We deepened our knowledge of asyncio in Python to handle multiple WebSocket connections simultaneously (User <-> Server <-> ML Model <-> Eleven Labs).
- Multimodal AI: We learned how to bridge the gap between numerical data (joint angles) and natural language generation (Gemini) to create a "Multimodal" experience.
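The asyncio pattern we leaned on looks roughly like this: independent coroutines for each hop of the chain, connected by queues and run concurrently with asyncio.gather. The stub coroutines below stand in for the real WebSocket handlers and API calls; names and the sentinel convention are illustrative, not our production code.

```python
# Sketch: asyncio pipeline with one coroutine per hop, joined by queues.
# A None sentinel signals end-of-stream down the chain.
import asyncio

async def analyze(frames, results):
    while (frame := await frames.get()) is not None:
        await results.put(f"analysis:{frame}")   # stand-in for the ML service call
    await results.put(None)                      # propagate shutdown

async def speak(results, spoken):
    while (text := await results.get()) is not None:
        spoken.append(text)                      # stand-in for Eleven Labs TTS

async def pipeline(raw_frames):
    frames, results, spoken = asyncio.Queue(), asyncio.Queue(), []
    for f in raw_frames:
        frames.put_nowait(f)
    frames.put_nowait(None)                      # end-of-stream sentinel
    await asyncio.gather(analyze(frames, results), speak(results, spoken))
    return spoken
```

Because each stage only awaits its own queue, a slow TTS call never blocks frame ingestion, which is what keeps the whole loop feeling live.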
What's next for coach.ai
- Shot Tracking: Integrating a "Made/Missed" detector to correlate form changes with actual shooting percentage.
- 3D Pose Analysis: Moving from 2D to 3D pose estimation for more accurate depth analysis.
- Gamification: Adding challenges ("Make 10 shots with perfect form") and leaderboards to compete with friends.
- Mobile App: Porting the frontend to React Native for easier use on the court.
Built With
- eleven-labs
- fastapi
- google-gemini (2.5 flash)
- mediapipe
- python
- react
- typescript
- websockets
- yolov8