Inspiration

AISR grew out of our passion for sports and the difficulty of keeping up with the sheer number of press conferences and interviews. With the International Cricket World Cups under way, we were watching press conferences regularly and saw a clear use case for AI. Fans and media professionals alike need a way to grasp key takeaways and memorable moments without spending hours watching entire videos. That led us to build a tool that distills lengthy press conferences into concise summaries and engaging highlight reels.

What it does

AISR is an AI-powered tool that distills lengthy press conferences into concise summaries and engaging highlights. At a high level, it offers three functionalities: (1) post-game recaps, (2) player/coach interviews, and (3) shareable social media content.

How we built it

AI Sports Recap is a Streamlit-based application that generates video highlights and textual summaries of sports press conferences from a YouTube video link. The app uses OpenAI's GPT-4o and Pegasus1 by TwelveLabs for the AI components, and Docker for integration and deployment.

Features

  1. Highlight Extraction: Identifies and extracts key video segments that are relevant to the user's query.
  2. Summary Generation: Creates a concise summary of the press conference.
  3. Social Media Sharing: Allows users to share the generated summary and highlights on social media platforms.

Challenges we ran into

  1. Latency: Because our pipeline works on video, the biggest issue we faced was the latency of the Pegasus1 API when uploading, indexing, and processing videos.
  2. Hallucinations: Although the LLM we used is a state-of-the-art model, we saw many hallucinations in its output, which made parsing that output for our use case a significant challenge.
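
One common way to contain hallucinated output is to ask the model for structured JSON and validate it before use. The following is a minimal sketch of that idea, not the project's actual parser. It assumes the model is prompted to return a JSON array of highlights with `start` and `end` (in seconds) and a `title`; those field names and the function `parse_highlights` are hypothetical.

```python
import json
from typing import Any


def parse_highlights(raw: str, video_duration: float) -> list[dict[str, Any]]:
    """Parse a model reply into validated highlight segments.

    Hypothetical schema (an assumption, not from the project): a JSON array
    of objects with numeric "start"/"end" in seconds and a string "title".
    Entries that do not fit the schema or the video length are discarded.
    """
    # Models often wrap JSON in prose or code fences, so keep only the
    # outermost [...] span before parsing.
    left, right = raw.find("["), raw.rfind("]")
    if left == -1 or right <= left:
        return []
    try:
        items = json.loads(raw[left:right + 1])
    except json.JSONDecodeError:
        return []

    valid: list[dict[str, Any]] = []
    for item in items:
        if not isinstance(item, dict):
            continue
        start, end, title = item.get("start"), item.get("end"), item.get("title")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(start, bool) or isinstance(end, bool):
            continue
        if not isinstance(start, (int, float)) or not isinstance(end, (int, float)):
            continue
        if not isinstance(title, str) or not title.strip():
            continue
        # Drop timestamps that fall outside the video or run backwards.
        if not (0 <= start < end <= video_duration):
            continue
        valid.append({"start": float(start), "end": float(end), "title": title.strip()})
    return valid
```

Checking timestamps against the known video duration catches one easily detectable class of hallucination (segments that cannot exist); it does not verify that a segment's title matches what was actually said.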

Accomplishments that we're proud of

By overcoming these challenges, we developed a powerful tool that not only saves time but also enhances how sports content is consumed and shared.

What we learned

We gained expertise in natural language processing, speech recognition, and video analysis. We mastered advanced NLP for concise summaries, explored speech-to-emotion algorithms for diverse accents, and optimized video processing for highlight extraction. Efficiently integrating AI models and designing a user-friendly interface were the key challenges. Iterative testing, handling edge cases, and in-team feedback were crucial for refining our tool. These experiences enhanced our technical skills and teamwork, leading to a powerful AI solution for sports content.

What's next for AISR: AI Sports Recap

  1. We attempted to add emotion analysis, classifying speech signals into nine emotions (e.g., sad, angry, excited, happy), to make the summaries more relatable. However, due to computational limitations and time constraints, we couldn't integrate it: the 500M-parameter Hugging Face model we used slowed the entire pipeline. In the future, we plan to develop a custom lightweight emotion detector that works on audio signals.
  2. We will add caching to our pipeline to improve efficiency. Currently, if a user queries the same video again, it takes as long as processing a new video. Caching will enable faster processing and enhance the customer experience.
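
The planned caching could look roughly like the sketch below, which keys results on the YouTube video ID so that different link formats for the same video share one cache entry. This is an illustration under stated assumptions, not the project's implementation: `video_id_from_url`, `cached_recap`, the `.aisr_cache` directory, and the `process` callable (standing in for the existing pipeline) are all hypothetical names, and the result is assumed to be JSON-serializable.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> str:
    """Return a cache key for a YouTube link, falling back to the full URL."""
    parsed = urlparse(url)
    # Short links look like https://youtu.be/<id>
    if parsed.netloc.lower().endswith("youtu.be"):
        short_id = parsed.path.strip("/")
        if short_id:
            return short_id
    # Standard links look like https://www.youtube.com/watch?v=<id>
    ids = parse_qs(parsed.query).get("v")
    if ids:
        return ids[0]
    return url


def cached_recap(
    url: str,
    process: Callable[[str], dict[str, Any]],
    cache_dir: Path = Path(".aisr_cache"),  # hypothetical location
) -> dict[str, Any]:
    """Run `process` once per video and reuse the stored result afterwards."""
    # Hash the key so it is always a safe file name.
    key = hashlib.sha256(video_id_from_url(url).encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = process(url)  # slow path: upload, index, summarize
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Because highlights depend on the user's query, a real cache would likely need the query text in the key as well; the sketch covers only per-video results such as the summary. Streamlit's built-in `st.cache_data` decorator is another option worth evaluating.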
