Inspiration

People watch the same game, but they have different interests within it. Another parallel we can draw: Instagram and TikTok build a user persona and show us Reels/TikToks based on that persona.

What it does

We model what a viewer might want from a game by building their persona. We also analyze the game footage in depth and dynamically select the preferred camera view based on the user persona and what is happening live.

How we built it

We built a recommendation system that acts like a real-time director using Foundation Models and mathematical mapping.

  1. Visual Perception (TwelveLabs) We don't just tag videos; we align them mathematically. We feed the raw footage from multiple camera angles (4CAM, 2CAM, 5CAM, 11CAM, BREP) into TwelveLabs Marengo. This generates video embeddings for every moment of the game, creating a semantic understanding of the visual feed.

  2. User Persona (Google Gemini) We use Google Gemini to process natural language descriptions of the user (e.g., "I love high-intensity defense and crowd reactions") and generate User Embeddings. This maps the user's abstract preferences into the same vector space as the video.

  3. The Director Agent (Cosine Similarity) As the game progresses, our Agent calculates the Cosine Similarity between the User Persona Embedding and the Video Embeddings of all available camera angles.

The Agent dynamically routes the view to the camera with the highest similarity score for that specific user at that specific second.
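The routing step can be sketched in a few lines of Python. This is a minimal illustration, not our production code: it assumes the user and video embeddings have already been computed and live in the same vector space, and the function names (`cosine_similarity`, `pick_camera`, `direct`) and the tiny 3-dimensional vectors are hypothetical stand-ins for the real embeddings.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as "no match"
    return dot / (norm_a * norm_b)


def pick_camera(user: Sequence[float],
                cameras: Dict[str, Sequence[float]]) -> str:
    """Return the camera whose embedding is most similar to the user's."""
    return max(cameras, key=lambda cam: cosine_similarity(user, cameras[cam]))


def direct(user: Sequence[float],
           timeline: List[Dict[str, Sequence[float]]]) -> List[str]:
    """Choose one camera per time step (one dict of embeddings per second)."""
    return [pick_camera(user, frame) for frame in timeline]


# Toy data: made-up 3-dimensional vectors, for illustration only.
user = [1.0, 0.0, 0.5]
timeline = [
    {"4CAM": [0.9, 0.1, 0.4], "2CAM": [0.0, 1.0, 0.0], "BREP": [0.1, 0.2, 0.9]},
    {"4CAM": [0.0, 1.0, 0.1], "2CAM": [0.2, 0.1, 0.0], "BREP": [1.0, 0.0, 0.6]},
]
choices = direct(user, timeline)
print(choices)
```

With these toy vectors the sketch picks 4CAM for the first second and BREP for the second. A real deployment would likely add smoothing so the view does not flip between cameras every second.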

Challenges we ran into

Finding candidates for multi-view live footage was difficult, which is why we use low-res clips of a game.

TwelveLabs Marengo does not accept low-resolution videos, which made video understanding difficult. As a workaround, we fed a combined video of all cameras to the Marengo model to generate embeddings.

Accomplishments that we're proud of

We created a shared embedding space between the User Persona Embeddings and the Video Feed Embeddings to find the best camera angle at the right time. We also developed an agent that takes action as the game progresses, choosing the right camera based on cosine similarity.

What we learned

We learned how to build a recommendation system that acts like a real-time director with the help of Foundation Models.

What's next for Owngoal

Next, we plan to learn the User Persona organically, for example from the high-stakes scenes where the user turns up the volume or the scenes where the user seems distracted (watching Instagram?).
