Inspiration
As students at Georgia Tech, we are surrounded by a culture of elite collegiate athletics and cutting-edge engineering. We saw how the Yellow Jackets and other top-tier programs rely on high-end, expensive motion-capture systems and specialized hardware to gain a competitive edge. Our goal was to democratize this technology. We wanted to lower the barrier to entry for 3D extrapolation, proving that with smart spatial geometry and lightweight ML, you don't need a multi-million dollar stadium install to get professional-grade "Next Gen" stats. We built this for the coaching staffs and analysts who need 3D insights without the 3D price tag.
What it does
Gridiron Gameview ingests multiple 2D video feeds and transforms them into a live-action 3D reconstruction.
3D Mapping: It deconstructs video frames to triangulate player positions in a 3D coordinate system.
Skeletal Tracking: It breaks every player down into a 17-point skeletal rig (x, y, z).
Identification: A lightweight ML model identifies team affiliation and jersey numbers in real time.
Immersive Viewing: The data is fed into Unreal Engine, allowing users to rotate, zoom, and view the "game" from any angle, including the perspective of any player on the field.
How we built it
Computer Vision: We used OpenCV for frame deconstruction and camera calibration (intrinsic/extrinsic) to handle the geometry of the field.
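The geometry the calibration step recovers can be sketched with a toy pinhole model: once OpenCV's `calibrateCamera`/`solvePnP` give us intrinsics K and extrinsics [R|t], projecting a world point to pixels is just a matrix product and a perspective divide. All numeric values below are assumed for illustration, not our actual calibration.

```python
import numpy as np

# Assumed intrinsics for a 1920x1080 camera (fx, fy, principal point).
K = np.array([[1400.0,    0.0, 960.0],
              [   0.0, 1400.0, 540.0],
              [   0.0,    0.0,   1.0]])
R = np.eye(3)                           # camera aligned with world axes
t = np.array([[0.0], [0.0], [10.0]])    # camera 10 m from the field origin

def project(point_3d):
    """Project a world-space point (metres) to pixel coordinates."""
    p_cam = R @ point_3d.reshape(3, 1) + t   # world -> camera frame
    p_img = K @ p_cam                        # camera frame -> image plane
    return (p_img[:2] / p_img[2]).ravel()    # perspective divide

# The field origin, straight ahead, lands on the principal point.
px = project(np.array([0.0, 0.0, 0.0]))
```

Inverting this projection across several calibrated cameras is what lets us lift 2D detections back into field coordinates.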
The ML Stack: We implemented a lightweight ML detector for player bounding boxes, plus classifiers for team colors and jersey numbers.
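The team-affiliation idea can be illustrated with a nearest-colour rule: compare the mean jersey colour of a detected crop against each team's reference colour. Our actual pipeline uses a learned classifier; the team names and RGB values here are made up for the sketch.

```python
import numpy as np

# Assumed reference jersey colours (RGB) for two hypothetical teams.
TEAM_COLORS = {
    "home": np.array([179.0, 163.0, 105.0]),  # gold-ish jersey
    "away": np.array([ 30.0,  30.0, 140.0]),  # navy jersey
}

def classify_team(crop_rgb):
    """crop_rgb: (H, W, 3) array of a player's torso region.
    Returns the team whose reference colour is closest to the crop's mean."""
    mean_color = crop_rgb.reshape(-1, 3).mean(axis=0)
    return min(TEAM_COLORS,
               key=lambda team: np.linalg.norm(TEAM_COLORS[team] - mean_color))

# A solid navy crop should match the away team.
navy_crop = np.full((32, 16, 3), [25.0, 25.0, 150.0])
team = classify_team(navy_crop)
```

In practice shadows and turf spray shift the mean colour, which is why a learned model with more context than raw pixel means holds up better.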
Pose Estimation: We utilized a pose estimation library (SMPL) to extract joint data, which we then lifted from 2D to 3D using multi-view triangulation.
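The 2D-to-3D lifting step can be sketched as linear (DLT) triangulation of one joint seen by two calibrated cameras; `cv2.triangulatePoints` does the same job in our pipeline. The projection matrices below are toy values, not our calibrated cameras.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """DLT: solve for the 3D point whose projections under 3x4
    projection matrices P1, P2 are the pixel coords uv1, uv2."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)     # null vector of A is the homogeneous point
    X = vt[-1]
    return X[:3] / X[3]             # dehomogenize

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras with a 1 m baseline along x.
P1 = np.hstack([np.eye(3), np.array([[ 0.0], [0.0], [5.0]])])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [5.0]])])

joint = np.array([0.5, 1.0, 2.0])   # ground-truth joint position
recovered = triangulate(P1, P2, project(P1, joint), project(P2, joint))
```

With noiseless observations the DLT solution recovers the joint exactly; with real detections it is a least-squares estimate, which is where the jitter described below comes from.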
Graphics Engine: We used Unreal Engine 5 to render the 3D player meshes and the virtual stadium environment from the joint (x, y, z) data.
Challenges we ran into
Occlusion: In football, players are constantly huddling or tackling. Maintaining a "lock" on a player’s skeletal joints when they are buried under a pile was our biggest hurdle.
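One mitigation we leaned on can be sketched simply: when a joint's detection confidence collapses during a pile-up, discard those frames and linearly interpolate its 3D position between the last and next reliable detections. The threshold and track format here are illustrative, not our exact implementation.

```python
import numpy as np

def fill_occluded(track, conf, threshold=0.3):
    """track: (T, 3) per-frame joint positions; conf: (T,) detector
    confidences. Returns a copy with low-confidence frames replaced by
    linear interpolation between neighbouring reliable frames."""
    track = np.asarray(track, dtype=float).copy()
    good = np.asarray(conf) >= threshold
    t = np.arange(len(track))
    for axis in range(3):
        track[~good, axis] = np.interp(t[~good], t[good], track[good, axis])
    return track

# Frame 1 is a bad detection during a tackle; it gets interpolated.
track = [[0.0, 0.0, 0.0], [9.0, 9.0, 9.0], [2.0, 0.0, 0.0]]
conf = [0.9, 0.1, 0.9]
fixed = fill_occluded(track, conf)
```

This keeps the skeleton coherent through short occlusions; long pile-ups still need the track to re-acquire once the player is visible again.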
Temporal Sync: Aligning frames from different cameras with millisecond precision was vital; even a one-frame offset caused the 3D models to "jitter" or distort during triangulation.
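The sync step can be sketched as nearest-timestamp matching: pair each frame from camera A with the closest-in-time frame from camera B, and reject any pair whose offset exceeds roughly half a frame interval (at 60 fps, about 8 ms). The timestamps below are hypothetical.

```python
import numpy as np

def match_frames(ts_a, ts_b, max_offset=0.008):
    """Pair frame indices from two cameras by timestamp proximity,
    dropping pairs whose offset exceeds max_offset seconds."""
    pairs = []
    for i, ta in enumerate(ts_a):
        j = int(np.argmin(np.abs(np.asarray(ts_b) - ta)))  # nearest frame in B
        if abs(ts_b[j] - ta) <= max_offset:
            pairs.append((i, j))
    return pairs

ts_a = [0.0000, 0.0167, 0.0333]   # camera A, ~60 fps
ts_b = [0.0020, 0.0180, 0.0400]   # camera B drifts slightly late
pairs = match_frames(ts_a, ts_b)
```

Any pair that fails the offset check is better dropped than triangulated, since a stale frame is exactly what produces the jitter described above.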
Accomplishments that we're proud of
Live Triangulation: Successfully converting 2D pixels into accurate (x, y, z) coordinates that map onto a scale-accurate football field.
What we learned
We learned the hard way that "clean data" doesn't exist in sports; shadows, turf spray, and similar jersey colors require robust error-correction algorithms rather than just "perfect" ML models.
What's next for Gridiron Gameview
AR Integration: Bringing the 3D "map" to augmented reality glasses so coaches can see player routes hovering over the field in real-time.
Predictive Analytics: Using the insights gained from the 3D reconstruction to power deeper play analysis and prediction.