PROBLEM: Billions of people watch sports for entertainment, but the majority can't even follow along with what's happening. Broadcasting platforms do a terrible job of onboarding new fans to a sport's rules. They're focused on promoting the culture, growing the fan base, and making money.

With the 2026 FIFA World Cup kicking off alongside the NBA Finals, and MLB having its busiest season yet, people around the globe are more inclined to join watch parties with their friends. But who wants to kill the fun by asking questions after every single play?

SOLUTION: An expandable overlay that captures your screen in real time, intelligently selects key moments, and uses multimodal AI to answer any question about the game in both text and audio formats at the user's discretion.

BUILD DETAILS: We used OpenCV to select key frames. cv2.absdiff measures the difference between consecutive grayscale frames to detect motion, while cv2.compareHist quantifies visual similarity so that redundant frames can be filtered out. The selected key frames are compiled into short MP4 clips with cv2.VideoWriter, which reduces token usage and makes Gemini's analysis more efficient.

  • Real-time screen analysis: Captures the user's screen at 3 fps in the background, while a circular frame buffer holds the last 30 s of gameplay (about 90 frames) for instant context. Frame analysis, motion detection, scene-cut detection, and YOLO pose inference identify key moments and condense the 30-second capture window into only the frames that matter, maximizing AI context quality while minimizing token usage.

  • Multimodal Analysis: Google Gemini 2.5 Flash processes the curated video frames and generated context to deliver real-time answers about what's happening on screen. The system prompt directs it to explain sports jargon in plain language.

  • Interactions: Hold to record microphone input, transcribed with ElevenLabs Speech-to-Text (STT), or type in the chat box. The answer is returned as text, with an option to hear it read back via ElevenLabs Text-to-Speech (TTS). The transparent, non-distracting overlay floats over any media on screen and can be shrunk, expanded, and dragged around to improve the viewing experience.

  • Cross-Platform Compatibility: Native macOS (Quartz capture, NSPanel carrier, CoreAnimation) and Windows (mss, Win32, WinMM) support from a single codebase.
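
The circular frame buffer from the real-time screen analysis bullet could look something like the sketch below. It assumes the 3 fps capture rate and 30-second window stated above, giving a capacity of 90 frames; the FrameBuffer class and its method names are hypothetical, and the screen-capture call itself is left out.

```python
import time
from collections import deque

CAPTURE_FPS = 3      # capture rate stated in the write-up
BUFFER_SECONDS = 30  # context window stated in the write-up


class FrameBuffer:
    """Fixed-size ring buffer that keeps only the most recent frames."""

    def __init__(self, fps=CAPTURE_FPS, seconds=BUFFER_SECONDS):
        # deque with maxlen silently drops the oldest item once it is full.
        self.frames = deque(maxlen=fps * seconds)

    def push(self, frame, timestamp=None):
        """Store a frame with its capture time (monotonic clock by default)."""
        ts = time.monotonic() if timestamp is None else timestamp
        self.frames.append((ts, frame))

    def snapshot(self):
        """Return a copy of the buffered frames, oldest first."""
        return [frame for _, frame in list(self.frames)]


if __name__ == "__main__":
    buffer = FrameBuffer()
    for i in range(100):   # stand-in for 100 captured frames
        buffer.push(i)
    print(len(buffer.snapshot()))
```

Because the deque has a fixed maximum length, memory use stays bounded no matter how long the overlay runs, and a question from the user can be answered from whatever the last 30 seconds contain without waiting for new capture.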
