Inspiration

For me, this project is the culmination of a lifelong passion. I love soccer, and I have always dreamed of building intelligent systems that can decode the beautiful game. Last year, I built a project called Expected Pressing Success (xPS), which analyzed defensive strategies and pressing intensity using data analytics.

This work was presented at several venues, and you can see the depth of that research in my previous pieces:

-> Data Science at Penn State - xPS Analysis

-> Pressure Analysis & Sports Analytics

What it does

AI Director’s Box is an autonomous production truck in the cloud. It transforms raw, unedited sports footage into a professional broadcast experience. The system:

1) Analyzes video in real time to identify tactical events and atmosphere.

2) Narrates the action with persona-driven, emotional commentary.

3) Visualizes complex plays using dynamic tactical overlays.

4) Directs a cohesive production by interleaving media layers autonomously.

5) Synthesizes the entire match into a beautifully illustrated "Storybook" recap.

How we built it

We architected the system as a collaborative multi-agent production environment hosted on Google Cloud:

-> The Analyst (Gemini 1.5 Pro): Acts as the video room, processing multimodal inputs to extract tactical state.

-> The Commentator (Gemini 1.5 Flash): Generates narrative scripts, which are synthesized into high-cadence speech via Google Cloud Text-to-Speech.

-> The Director (Orchestrator): Uses Gemini's interleaved output capabilities to coordinate video, audio, and Mermaid.js overlays.

-> The Storyteller (Gemini 2.5 Flash): Compiles the match history and uses Vertex AI Imagen 3 to generate cinematic recaps.
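
The hand-off between the four agents can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: every name here (TacticalEvent, Cue, stub_analyst, stub_commentator, direct, storyteller) is hypothetical, the model calls are replaced by stubs that return canned data, and the Mermaid string is only an example payload.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TacticalEvent:
    timestamp_s: float   # position in the footage, in seconds
    kind: str            # e.g. "press" or "goal" (hypothetical labels)
    description: str


@dataclass
class Cue:
    timestamp_s: float
    layer: str           # "audio" or "overlay"
    payload: str         # commentary text or a Mermaid diagram source


def stub_analyst(video_uri: str) -> List[TacticalEvent]:
    """Placeholder for the Analyst's video-understanding call (canned data)."""
    return [
        TacticalEvent(12.0, "press", "High press wins the ball"),
        TacticalEvent(15.5, "goal", "Low finish into the corner"),
    ]


def stub_commentator(event: TacticalEvent) -> str:
    """Placeholder for the Commentator's script generation."""
    return f"{event.description}!"


def direct(events: List[TacticalEvent],
           commentator: Callable[[TacticalEvent], str]) -> List[Cue]:
    """Director: interleave audio and overlay cues on one timeline."""
    cues: List[Cue] = []
    for event in events:
        cues.append(Cue(event.timestamp_s, "audio", commentator(event)))
        if event.kind == "goal":
            # Assumption: only goals get a tactical overlay in this sketch.
            cues.append(Cue(event.timestamp_s, "overlay",
                            "graph LR; A[Winger] --> B[Striker]"))
    # sorted() is stable, so audio stays ahead of an overlay at the same time.
    return sorted(cues, key=lambda cue: cue.timestamp_s)


def storyteller(cues: List[Cue]) -> str:
    """Storyteller: condense the spoken layer into recap text."""
    return " ".join(cue.payload for cue in cues if cue.layer == "audio")


cues = direct(stub_analyst("gs://example-bucket/match.mp4"), stub_commentator)
recap = storyteller(cues)
```

Modeling each agent as a plain function keeps the orchestration testable without any cloud credentials; a real model call could replace a stub without changing the Director.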

Challenges we ran into

-> Managing Production Latency: Generating high-quality TTS and analyzing video frames introduces natural lag. We overcame this by implementing a Buffer-and-Sync strategy, where "anticipatory" buildup commentary is fired early to mask the processing time needed for highlight clipping.

-> Media Persistence on Cloud Run: Moving to a containerized environment meant dealing with ephemeral filesystems. We had to integrate Google Cloud Storage (GCS) mid-development to ensure that AI-generated assets survived server restarts and were shareable.

-> Multimodal Synchronization: Ensuring the "Director" triggered overlays at the exact visual climax required significant prompt tuning and strict JSON schema enforcement.
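
The Buffer-and-Sync idea from the first challenge can be sketched with asyncio: start the slow highlight work in the background, play buildup commentary while it runs, then wait at a sync point before showing the clip. This is a simplified sketch under assumptions; the function names, the timeline list, and the sleep-based stand-ins for TTS and clipping are all hypothetical.

```python
import asyncio
from typing import List, Tuple


async def clip_highlight(event_id: str, processing_s: float) -> str:
    """Stand-in for the slow step (frame analysis, clipping, TTS)."""
    await asyncio.sleep(processing_s)
    return f"highlight:{event_id}"


async def play_buildup(event_id: str, duration_s: float,
                       timeline: List[Tuple[str, str]]) -> None:
    """Stand-in for playing anticipatory commentary to the viewer."""
    timeline.append(("buildup", event_id))
    await asyncio.sleep(duration_s)


async def buffer_and_sync(event_id: str, processing_s: float,
                          buildup_s: float) -> List[Tuple[str, str]]:
    timeline: List[Tuple[str, str]] = []
    # Kick off the slow work first so it overlaps with the buildup audio.
    clip_task = asyncio.create_task(clip_highlight(event_id, processing_s))
    await play_buildup(event_id, buildup_s, timeline)
    # Sync point: the highlight is shown only once the clip is ready.
    clip = await clip_task
    timeline.append(("highlight", clip))
    return timeline


timeline = asyncio.run(buffer_and_sync("goal-1", 0.05, 0.05))
```

Because the two steps overlap, the viewer waits roughly the longer of the two durations rather than their sum, which is what makes the lag feel hidden.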

Accomplishments that we're proud of

-> Full-Stack Automation: We built a unified deployment pipeline that sets up GCS buckets, builds containers, and deploys everything with a single script.

-> Zero-Latency Feel: Despite the complex processing, the application feels like a live broadcast because our audio orchestration fires anticipatory commentary early to mask the time spent on analysis and clipping.

-> Interactive Storybooks: The post-match recap isn't just a summary; it's a dynamic social-ready archive of the game's legend.

What we learned

We learned that Interleaved Multimodality is the key to creating "human" AI agents. Gemini 1.5 doesn't just see video; it understands context and timing. We also learned that the most robust AI applications are those that treat infrastructure (like GCS and Cloud Run) as first-class citizens in the agentic flow.

What's next for The AI Director's Box

In the future, we plan to add:

-> Multiple Persona Channels: Allow users to toggle between different commentary styles (e.g., "Hype" vs. "Tactical Analysis") in real time.

-> Direct YouTube/Twitch Streaming: Integrate RTMP streaming to output the produced broadcast directly to major platforms.

-> Advanced 3D Replays: Use Gemini to generate 3D spatial reconstructions of key goals based on multiple camera angles.
