Inspiration
Most sports content gets watched vertically now. TikTok, Reels, Shorts. But almost every sports camera is still shooting landscape, which leaves creators with two annoying options: center-crop and lose half the action, or sit in Premiere keyframing pan-and-scan by hand. We figured a decent CV pipeline could just do this for you.
What it does
You hand Supersonics a landscape sports clip, tell it what sport it is, and it spits out a 9:16 vertical version that follows the ball on its own. The virtual camera pans and tilts the way a human editor would. Drop it into a reel and it looks like someone shot it that way.
How we built it
It's a five-stage pipeline. Each stage writes to a JSONL file on disk. Sounds overkill but it's the reason three of us could work on different stages at the same time without stepping on each other.
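The contract between stages is just "one JSON object per frame." Something like this, where the field names are illustrative rather than our exact schema:

```python
import json

def read_stage(path):
    """Stream records from the previous stage, one per frame."""
    with open(path) as f:
        for line in f:
            yield json.loads(line)

def write_stage(path, records):
    """Append-style output: one JSON object per line."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

# a detection record might look like:
# {"frame": 412, "ball": [963, 402, 14, 14], "players": [[880, 350, 40, 90]]}
```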
YOLO11 via Ultralytics finds the ball and players in every frame. A greedy IoU tracker stitches those per-frame detections into tracks. A target estimator picks one "where should the camera look right now" point per frame, using the center of the ball if we see it, a forward prediction from recent ball velocity if we don't, or the centroid of the visible players as a last resort. A camera controller turns that stream of points into a smooth 9:16 crop box with speed and acceleration caps on the virtual camera. Finally, a renderer applies the crop and ffmpeg encodes the result.
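The target estimator's fallback chain is the heart of it. A minimal sketch, with illustrative names and a one-frame velocity extrapolation standing in for the real prediction:

```python
def estimate_target(ball, prev_balls, players):
    """Pick the (x, y) point the virtual camera should look at this frame."""
    if ball is not None:
        return ball                        # ball visible: look straight at it
    if len(prev_balls) >= 2:
        (x1, y1), (x2, y2) = prev_balls[-2], prev_balls[-1]
        return (2 * x2 - x1, 2 * y2 - y1)  # ball lost: extrapolate velocity
    if players:
        xs, ys = zip(*players)
        return (sum(xs) / len(xs), sum(ys) / len(ys))  # last resort: centroid
    return None                            # nothing detected: hold last target
```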
Basketball and football needed really different tuning. Basketball: tight caps (70 px/frame speed, 8 px/frame² accel) because the action is short-range and bouncy. Football: much looser (180 and 28) so the camera can keep up with a 60-yard pass.
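Those caps live in per-sport config. The real pipeline loads them from YAML; the shape below is just an illustration:

```python
from dataclasses import dataclass

@dataclass
class CameraParams:
    max_speed: float   # px/frame, cap on virtual-camera pan speed
    max_accel: float   # px/frame^2, cap on how fast that speed can change

PARAMS = {
    "basketball": CameraParams(max_speed=70, max_accel=8),
    "football":   CameraParams(max_speed=180, max_accel=28),
}
```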
Challenges we ran into
All the interesting problems showed up once we ran on real clips.
YOLO drops the ball constantly on dunks. It's small, moving fast, usually occluded by the rim or a hand. For 6 to 10 frames at a time we just don't see it, and a naive camera drifts off to the centroid of the players and loses the hoop. Fix: ball-carrier tracking. While the ball is visible, we remember which player it's closest to. When it disappears we follow that player until it comes back. The Harvey dunk clip went from 25% ball-in-crop to 62% after that landed.
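The fallback is simple enough to sketch (names are illustrative): while the ball is visible, remember the nearest player track; when it vanishes, follow that track.

```python
import math

def update_carrier(ball, player_tracks, carrier_id):
    """ball: (x, y) or None. player_tracks: {track_id: (x, y)}."""
    if ball is not None:
        if player_tracks:  # remember whoever is currently nearest the ball
            carrier_id = min(player_tracks,
                             key=lambda tid: math.dist(ball, player_tracks[tid]))
        return ball, carrier_id
    if carrier_id in player_tracks:        # ball lost: follow the carrier
        return player_tracks[carrier_id], carrier_id
    return None, carrier_id                # fall through to the centroid fallback
```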
The phantom velocity bug is the one we're most proud of catching. On long football passes the crop would sit pinned against the frame edge for five or six frames after the ball had clearly changed direction. We were storing the velocity wrong: we saved the attempted delta each frame, not the achieved delta after the crop got clamped at the frame boundary, so clamped frames were silently accumulating momentum that didn't actually exist. A four-line fix. A couple of hours of frame-by-frame traces and a lot of print() to find it.
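In condensed form, with illustrative names, the bug and the fix look like this:

```python
# buggy: velocity was the delta we asked for, stored before the clamp
#   velocity = desired_x - prev_x
#   crop_x = clamp(prev_x + velocity, 0, frame_w - crop_w)

# fixed: derive velocity from the position the clamp actually allowed
def step_x(prev_x, desired_x, frame_w, crop_w):
    new_x = min(max(desired_x, 0), frame_w - crop_w)
    velocity = new_x - prev_x   # achieved delta, not attempted delta
    return new_x, velocity
```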
And then there was the more obvious lesson that one set of parameters was never going to work for two sports. That ate more time than we thought it would.
Accomplishments that we're proud of
Basketball end-to-end ball-in-crop went from 70% to 74%. The number is less exciting than where the gain is: it's almost all on dunks and fast breaks, which are exactly the clips where a bad camera is most visible. Easy clips were already fine. We moved the hard ones.
The whole pipeline is streaming. Nothing loads a full video into RAM, so it works on full games, not just short highlight clips.
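Every stage is effectively a generator over the previous one. A sketch of the pattern with OpenCV (not our exact code):

```python
import cv2

def frames(path):
    """Yield frames one at a time; never hold the whole video."""
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        yield frame
    cap.release()

# downstream stages consume this one frame at a time, so a full game
# streams through the pipeline without the video ever sitting in RAM
```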
We also built a side-by-side demo page showing the landscape input next to the vertical output for every clip in our test set, so judges don't have to take our word for any of this.
What we learned
We started the camera controller with a Kalman filter because that's the textbook answer for smoothing a noisy signal. It added hundreds of pixels of visible lag; the camera was always a half second behind the ball on fast plays. We threw it out and replaced it with a 20-line block that just caps speed and acceleration, and the result was simultaneously smoother and more responsive. Not what we expected.
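Condensed and with illustrative names, that block is basically:

```python
def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def smooth_step(pos, vel, target, max_speed, max_accel):
    """One axis of the virtual camera. State is just (pos, vel)."""
    desired_vel = target - pos
    # velocity may change by at most max_accel per frame...
    vel = clamp(desired_vel, vel - max_accel, vel + max_accel)
    # ...and may never exceed max_speed
    vel = clamp(vel, -max_speed, max_speed)
    return pos + vel, vel
```

No state beyond position and velocity, and no tuning beyond the two caps per sport.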
The JSONL-between-every-stage thing sounded like overkill early on. It turned out to be the reason we could iterate on the camera without re-running detection for the 30th time, and the reason three of us could actually work in parallel.
Most of the real work was in the last 10%. A rough pipeline came together quickly. Making it actually look good under a 21-hour deadline took the rest of the hackathon.
What's next for Supersonics
Adding more sports is mostly a config exercise. Soccer and hockey should each be a new YAML file and a few hundred clips to tune on.
We process clips offline right now. The algorithm only uses a handful of frames of lookahead, so with some work it could run live on a broadcast stream.
The camera still fights scene cuts. A transition looks to it like a massive ball jump and it takes a second or two to recover. Detecting cuts explicitly and resetting camera state on each one is the next thing we'd fix.
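A histogram-difference detector would probably be enough. A sketch of what we have in mind (not implemented yet):

```python
import cv2

def is_cut(prev_frame, frame, threshold=0.5):
    """Flag a scene cut when consecutive grayscale histograms diverge."""
    gray = lambda f: cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
    h1 = cv2.calcHist([gray(prev_frame)], [0], None, [64], [0, 256])
    h2 = cv2.calcHist([gray(frame)], [0], None, [64], [0, 256])
    cv2.normalize(h1, h1)
    cv2.normalize(h2, h2)
    return cv2.compareHist(h1, h2, cv2.HISTCMP_BHATTACHARYYA) > threshold
```

On a cut, the camera would drop its velocity and snap to the new target instead of panning across a jump that was never physical motion.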
At scale
Sports media is a volume business (thousands of clips per day), and $2/clip manual conversion adds up fast. Our pipeline is stateless and writes intermediate results to disk, so it fans out trivially: split clips into chunks, run YOLO across GPUs in parallel, merge the outputs. On a single RTX 4090 we process a 60-second clip in about 22 seconds. A small cluster of 8 GPUs could push 1300 clips/hour at roughly $0.003 each on spot instances.