Inspiration
While watching professional tennis commentary, we realised that the same experience could be extended to players of every level. Not only would this allow anyone to engage with their own gameplay regardless of ability, but it could also have a profound impact from an accessibility perspective.
What it does
Ball Knowledge generates AI commentary for user-supplied tennis footage, based on the presets the user selects to customise the commentator's style and tone.
- We support two input methods: both a direct video upload and a YouTube link can be analysed.
- The method of input is simple and intuitive.
- A final layer of customisation allows the user to tailor the commentator's style to better suit their needs.
How we built it
We split the problem into individual, self-contained sections, each of which was solved by a pair.
- Vision - a tracker for each entity of interest, based on its absolute position on the screen in each frame.
  - In particular, we track the players, the ball and the court boundary box.
- Logic - normalise the position of objects relative to the court and create events that fire when the appropriate movement is detected across a collection of frames.
  - The physics of the game: detecting bounces, net-crosses, and shots.
  - Player intent: recognising when a player is "out of position" or "attacking the net".
- Processing - this list of events is passed as a JSON file to Claude, which writes a commentary transcript of the events happening on screen. Finally, ElevenLabs is used to synthesise the spoken narration.
- Frontend - React & Flask were used to create a minimalist UI that is intuitive for the user. The customisation options that change the tone of the commentator also appear on this screen.
Creating a robust specification allowed us to consider the inputs and outputs of each section and to merge them seamlessly in the final product.
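As a rough illustration of the Logic and Processing hand-off, the sketch below fires a net-cross event from per-frame ball positions and serialises the result to JSON. It is a minimal sketch, not our actual code: the names `Event`, `detect_net_crosses` and `events_to_json` are hypothetical, and it assumes the ball position has already been normalised to court coordinates in metres, with y measured from one baseline so that the net sits halfway along a 23.77 m court.

```python
import json
from dataclasses import dataclass, asdict

NET_Y = 11.885  # metres from the baseline; half of a 23.77 m court


@dataclass
class Event:
    frame: int   # index of the frame in which the event fired
    kind: str    # event type, e.g. "net_cross"
    detail: str  # free-form description for the commentary step


def detect_net_crosses(ball_y):
    """Emit an event whenever the ball moves from one side of the net to the other.

    ball_y holds one court-normalised y coordinate per frame, or None for
    frames in which the tracker lost the ball (an assumed convention).
    """
    events = []
    last_side = None
    for frame, y in enumerate(ball_y):
        if y is None:
            continue  # skip missed detections and keep the last known side
        side = "near" if y < NET_Y else "far"
        if last_side is not None and side != last_side:
            events.append(Event(frame, "net_cross", f"{last_side}_to_{side}"))
        last_side = side
    return events


def events_to_json(events):
    """Serialise the event list so it can be handed to the transcript step."""
    return json.dumps([asdict(e) for e in events])
```

Bounces and shots could follow the same pattern: scan a short window of frames, compare consecutive positions, and append an `Event` when the condition is met.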
Challenges we ran into
Vision: Tracking the movement of objects was initially challenging, especially when the ball was far from the camera or behind the net. To remedy this, we tracked the region around the ball's last known location to get a better estimate of where the ball was likely to be, so that we could focus the search on those pixels. This changed our tracking from jittery to smooth, which gave the logic team far more usable data.
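One way to narrow the search region can be sketched as follows. This is an illustrative assumption rather than a description of our tracker: it predicts the next ball position with a simple constant-velocity guess from the last two detections and returns a pixel box clamped to the frame. The function name `predict_search_window` and the default `radius` are hypothetical.

```python
def predict_search_window(prev, curr, frame_size, radius=40):
    """Return (x0, y0, x1, y1), a pixel box centred on the predicted ball position.

    prev, curr: (x, y) ball centres in the last two frames.
    frame_size: (width, height) of the video frame.
    radius: half-width of the box in pixels (an arbitrary illustrative value).
    """
    # Constant-velocity guess: the ball keeps moving as it did between the last two frames.
    pred_x = curr[0] + (curr[0] - prev[0])
    pred_y = curr[1] + (curr[1] - prev[1])
    width, height = frame_size
    # Clamp the box so it never extends past the edges of the frame.
    x0 = max(0, int(pred_x - radius))
    y0 = max(0, int(pred_y - radius))
    x1 = min(width, int(pred_x + radius))
    y1 = min(height, int(pred_y + radius))
    return x0, y0, x1, y1
```

The detector would then be run only on that crop, falling back to a full-frame search if nothing is found there.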
Logic: Initially, recognising events using the absolute position of pixels on the screen was difficult, so we implemented a solution using a planar homography to normalise the position of the entities relative to the playing field. We applied corrections to minimise the impact of parallax on processing, enabling us to fire events appropriately.
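The planar mapping described above (a homography) can be illustrated with a small, dependency-free sketch. It assumes four known image points, for example the court corners from the boundary box, paired with their real-world court coordinates in metres, and solves the standard eight-unknown linear system. The helper names `solve_linear`, `fit_homography` and `to_court` are hypothetical, and the parallax corrections are not covered here. OpenCV's `cv2.findHomography` offers a ready-made alternative.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_homography(image_pts, court_pts):
    """Fit the 3x3 homography mapping four image points onto four court points."""
    a, b = [], []
    for (x, y), (u, v) in zip(image_pts, court_pts):
        # Two equations per correspondence, with the bottom-right entry fixed to 1.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def to_court(h, point):
    """Map a pixel (x, y) to court coordinates using homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v
```

Once the mapping is fitted for a camera angle, every tracked pixel position can be passed through `to_court` so that events are detected in court space rather than screen space.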
Accomplishments that we're proud of
We're proud of the social impact that this project can have on the amateur sporting world. Everyone will now be able to generate professional-grade commentary on their games, which can be stylised to the individual's preferences.
In particular, this can enable the 2.2 billion people who are visually impaired to engage better with sports without the need for a sighted companion.
What we learned
Under time pressure, we learned to delegate tasks between team members and quickly plan and implement suggestions with a robust specification so the layers could be developed independently.
Managing git effectively was key to our success in merging all the components together at the end.
What's next for Ball Knowledge
This project serves as a framework that can be extended to other sports. In the future, we could also get the AI to critique a player's style of play and suggest improvements.
One of us is a member of Imperial Taekwondo, where this approach could be used to analyse sparring footage with YOLOv8-pose, generating events in a similar way in order to detect weaknesses in kicking style and prepare better for competitions.