## Inspiration

Sports broadcasts are shot in landscape (16:9), but social media consumption is almost entirely in portrait (9:16). Manually cropping these clips is time-consuming and often results in losing the ball or the main action. We built HighlightHub to automate this conversion by using computer vision to track the ball and key players, ensuring the action stays centered in a vertical crop.

## How we built it

The backend is a Python application using FastAPI. For tracking, we used YOLOv8m with high-resolution inference (1280px) to detect the ball and players. We integrated ByteTrack for temporal consistency and a 1D Kalman Filter to maintain tracking during occlusions. The frontend is built with Next.js 16 and React 19, featuring a side-by-side review tool to compare the original broadcast with the AI-generated output.

## Technical Details

To handle moments where the ball is blocked by players or moves too fast for detection, we use a linear motion model for path prediction:

$$ x_{k} = x_{k-1} + v_{k-1}\Delta t $$
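
As a minimal sketch, the prediction step is a one-liner (assuming a fixed per-frame $\Delta t$ and a velocity estimated from the previous two detections):

```python
def predict_position(x_prev: float, v_prev: float, dt: float = 1 / 30) -> float:
    """Constant-velocity prediction for frames where the ball detection is missing."""
    return x_prev + v_prev * dt
```

In practice this runs once per missing frame, with each prediction fed back in as the next $x_{k-1}$.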

We also applied Gaussian smoothing to the crop centers to prevent micro-jitter in the camera movement. The amount of smoothing is controlled by a standard deviation $\sigma$ that varies by sport:

$$ G(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{x^2}{2\sigma^2}} $$
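
As an illustration, the smoothing can be sketched as a plain NumPy convolution with that kernel (the per-sport sigma values below are hypothetical placeholders, not our tuned numbers):

```python
import numpy as np

# Hypothetical per-sport smoothing strengths (in frames); illustrative only.
SPORT_SIGMA = {"soccer": 8.0, "basketball": 5.0, "tennis": 3.0}

def gaussian_kernel(sigma: float) -> np.ndarray:
    """Discrete, normalized Gaussian kernel truncated at 3 sigma."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def smooth_centers(centers, sigma: float) -> np.ndarray:
    """Smooth per-frame crop centers; edge-pad so output length matches input."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(np.asarray(centers, dtype=float), pad, mode="edge")
    return np.convolve(padded, k, mode="valid")
```

A larger $\sigma$ trades responsiveness for stability, which is why slower sports tolerate heavier smoothing.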

### Core Tracking Logic

The following snippet shows how we invoke tracking with a low confidence threshold tuned for small, fast-moving balls (the threshold is adjusted per sport at runtime):

```python
from ultralytics import YOLO

MODEL = YOLO("yolov8m.pt")  # medium variant: our speed/accuracy sweet spot

results = MODEL.track(
    source=input_path,
    persist=True,                      # keep track IDs across frames
    imgsz=1280,                        # high-resolution inference for small balls
    conf=0.1,                          # lower threshold for ball detection
    tracker="custom_bytetrack.yaml",   # ByteTrack config tuned for our footage
)
```

## Challenges we faced
Tracking a small ball moving at high speed is difficult. Most standard models lose the ball several times per clip. We solved this by using a "coasting" algorithm where the Kalman filter predicts the ball's position for up to 30 frames when detections are missing. We also had to implement "look-ahead" logic so the camera anticipates the direction of the play rather than just reacting to it.
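
A minimal sketch of the coasting idea (the `BallCoaster` class and its structure are illustrative; the real logic lives inside our Kalman filter update loop):

```python
MAX_COAST_FRAMES = 30  # stop predicting after 30 consecutive missed detections

class BallCoaster:
    """Coast through occlusions with a constant-velocity estimate (1D sketch)."""

    def __init__(self):
        self.pos = None     # last known/estimated x position
        self.vel = 0.0      # pixels per frame
        self.misses = 0     # consecutive frames without a detection

    def update(self, detection):
        """detection: measured x position, or None when the ball is occluded."""
        if detection is not None:
            if self.pos is not None:
                self.vel = detection - self.pos  # crude per-frame velocity
            self.pos = detection
            self.misses = 0
            return self.pos
        self.misses += 1
        if self.pos is None or self.misses > MAX_COAST_FRAMES:
            return None  # give up: hold the last crop or wait to re-acquire
        self.pos += self.vel  # constant-velocity coast
        return self.pos
```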

## What we learned
We learned that raw tracking data is too "noisy" for a good viewing experience. We spent significant time implementing velocity caps and dead-zones—regions where the camera stays still unless the ball moves beyond a certain percentage of the frame—to make the output look like it was shot by a professional cameraman.
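
A simplified version of the dead-zone and velocity-cap logic (the parameter values are illustrative, not our tuned settings):

```python
def next_crop_center(current: float, target: float, frame_w: int,
                     dead_zone: float = 0.05, max_step: float = 12.0) -> float:
    """Advance the crop center one frame toward the tracked target.

    dead_zone: fraction of frame width the ball may drift before the camera moves.
    max_step:  maximum pan speed in pixels per frame (the velocity cap).
    """
    if abs(target - current) < dead_zone * frame_w:
        return current  # inside the dead-zone: camera holds still
    step = max(-max_step, min(max_step, target - current))
    return current + step
```

The dead-zone suppresses small corrections, and the cap turns large corrections into a steady pan instead of a snap.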

## What's Next?
We plan to add support for multi-point tracking (e.g., following both the quarterback and the receiver) and real-time processing using edge-optimized models to allow for instant highlights during live games.

## Built With

Python, FastAPI, YOLOv8, ByteTrack, Next.js, React
