Inspiration
We play a lot of recreational basketball with our friends and thought that it would be super cool to have some type of commentary over our games. Our goal was to build an agent that could be like a commentator for our rec games.
What it does
Court Vision captures a live feed of a basketball game, analyzes the visual content using Claude, and produces dynamic audio commentary using ElevenLabs. It mimics an NBA-style announcer by generating commentary that's both timely and relevant to the gameplay, streamed out loud during the game.
How we built it
- Screenshot capture: We use PIL.ImageGrab to continuously capture a portion of the screen showing the basketball court.
- Visual reasoning: The captured frame is encoded in base64 and passed to Claude 3 Sonnet via Anthropic’s API. A structured prompt provides context like player descriptions and current commentary state.
- Commentary queue: The agent determines whether to add a new comment, skip, or reset the queue. It avoids stale or irrelevant commentary using explicit logic.
- Audio rendering: Commentary is streamed via ElevenLabs with configurable voice and tone settings. We built a queueing system to handle multiple comments smoothly with overrides for urgent plays.
Challenges we ran into
- Getting it to respond quickly enough to plays as they were happening was super hard
Built With
- 11labs
- claude
- python
Log in or sign up for Devpost to join the conversation.