Dota Intel
Inspiration
Every sport produces hundreds of hours of broadcast footage, but the moments fans care about are buried in endless VODs. Editors spend days scrubbing footage for highlights, and stat sheets tell you what happened but never how it felt. We wanted to build a system that watches footage like a superfan — reading commentator excitement, on-screen graphics, and crowd energy — then connects those moments to the statistical record. Not just "here's a cool clip," but "here's why it mattered for the match, the player's career, and the tournament." We started with Dota 2 because we love it, but the architecture is sport-agnostic by design.
What it does
Dota Intel is AI-powered highlight discovery tied to structured competition data. The core insight: highlights without context are just clips, but highlights anchored to leaderboards, player stats, and match outcomes become stories.
The system discovers highlight moments from tournament VODs using TwelveLabs' video understanding models and ranks pro players by AI Impact Score — a 0–100 metric blending traditional stats (KDA, GPM, win rate) with AI-extracted signals like commentator excitement and highlight density. Users browse a global leaderboard, drill into player profiles with match histories and curated clips, and perform semantic search across all indexed footage (e.g., "exciting teamfight," "clutch play") to find moments by meaning, not keywords.
The pattern generalizes to any sport — basketball (box scores + broadcast footage), soccer (Opta data + match broadcasts), MMA (fight stats + crowd reactions). The magic isn't the clips or the stats alone — it's the join. When you see a player ranked #1 and can instantly watch the plays that got them there, with AI-measured excitement confirming the crowd went wild, that's fundamentally different from a spreadsheet or a random YouTube compilation.
How we built it
Three layers:
- Ingestion Pipeline (Python): Downloads match segments from Twitch VODs via yt-dlp, uploads to TwelveLabs, and patches each video with structured match metadata from the OpenDota API. The metadata source swaps out (NBA API, Opta, ESPN) but the pipeline shape stays the same.
- AI Highlight Discovery (TwelveLabs Marengo 3.0 + Pegasus 1.2): Marengo searches for candidate moments across visual and audio modalities. Pegasus classifies each clip, extracting a play type, excitement score (0–10), and natural language description. This layer is sport-agnostic: it detects excitement, not Dota-specific events.
- Dashboard (FastAPI + React 19): A "Premium Obsidian" themed frontend with leaderboard, player profiles, and a clip player streaming HLS video directly from TwelveLabs. Thumbnails are captured frame-by-frame using offscreen canvas.
Challenges we ran into
- Rate limiting at scale. TwelveLabs API rate limits would crash the event loop during burst Pegasus calls. We implemented exponential backoff with jitter and restructured discovery to process clips sequentially per player.
- Game-start calibration. Twitch VODs include draft and pre-game — timestamps don't align to game time. We built a "horn calibration" system using Marengo to find the First Blood announcement or game-start horn, then offset all timestamps accordingly. Every sport has this dead-air problem.
- HLS thumbnail capture. We needed frame-accurate thumbnails at specific clip moments. We solved this with offscreen <video> elements via HLS.js, seeking to the exact second and capturing a canvas frame, lazily loaded on scroll.
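The retry strategy from the rate-limiting challenge can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `RateLimitError`, `call_with_backoff`, and the default delays are hypothetical names and values, and the "full jitter" variant (sleeping a random time up to the exponential cap) is one common choice among several.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever the API client raises on a rate limit."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter.

    The delay cap doubles on each attempt (base_delay, 2x, 4x, ...) up to
    max_delay; the actual sleep is drawn uniformly from [0, cap] so that
    concurrent callers do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

Processing clips sequentially per player, as described above, then amounts to wrapping each classification call in `call_with_backoff` inside an ordinary loop instead of firing them all at once.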
Accomplishments we're proud of
- AI Impact Score works. The 35% AI weighting — from commentator excitement and highlight density — surfaces players who feel dominant on screen, not just those with the best stat lines. It catches electric, high-impact players that raw numbers miss.
- Semantic search works across modalities. "Clutch play" returns genuinely clutch moments because Marengo understands what a clutch play looks and sounds like simultaneously. No sport-specific classifiers needed — just multimodal understanding.
- End-to-end automation. From a raw broadcast URL to a ranked, highlight-annotated player profile in a single script. No manual labeling, no human curation. Swap the stats API, point at new footage, same result.
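To make the 35% AI weighting concrete, here is a rough sketch of how such a blended score could be computed. Only the 35% weight, the 0–100 output range, and the 0–10 excitement scale come from this write-up. The `PlayerSignals` fields, the equal split between excitement and highlight density, the `density_cap` value, and the assumption that traditional stats arrive pre-normalized to 0–1 are all illustrative guesses.

```python
from dataclasses import dataclass

AI_WEIGHT = 0.35                 # stated in the write-up
STATS_WEIGHT = 1.0 - AI_WEIGHT   # remainder goes to traditional stats


@dataclass
class PlayerSignals:
    stats_score: float           # assumed: KDA/GPM/win rate already blended to 0..1
    avg_excitement: float        # mean excitement score per clip, 0..10
    highlights_per_match: float  # highlight density


def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def ai_component(p: PlayerSignals, density_cap: float = 5.0) -> float:
    """Blend excitement and density into 0..1 (equal split is an assumption)."""
    excitement = _clamp01(p.avg_excitement / 10.0)
    density = _clamp01(p.highlights_per_match / density_cap)
    return 0.5 * excitement + 0.5 * density


def impact_score(p: PlayerSignals) -> float:
    """Return a 0-100 score: 65% traditional stats, 35% AI-extracted signals."""
    blended = STATS_WEIGHT * _clamp01(p.stats_score) + AI_WEIGHT * ai_component(p)
    return round(100 * blended, 1)
```

Under these assumptions, a player with slightly weaker stats but consistently exciting, frequent highlights can outrank a player with a better stat line, which is the behavior described above.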
What we learned
- Commentator excitement is a universal signal. Pegasus' excitement scores match what humans rate as "hype" regardless of sport or language — the insight that makes AI Impact Score work and the system generalizable.
- The stat-to-footage join is the product. Neither leaderboards nor clips are novel alone. The value is the connection: seeing a #1 ranking and immediately watching the AI-surfaced plays that earned it, with excitement scores confirming the moments were electric.
- Video AI pipelines need aggressive deduplication. A single kill streak generates 3–4 overlapping clips from different search strategies. Without dedup (10-second overlap threshold, best-excitement-wins), the feed fills with near-duplicates.
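The deduplication rule can be sketched as a greedy pass over clips sorted by excitement. This is an illustration under stated assumptions: the `Clip` fields are hypothetical, and "10-second overlap threshold" is interpreted here as "two clips from the same video are duplicates if their time ranges overlap by at least 10 seconds", which may differ from the project's exact rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    video_id: str
    start: float       # seconds from the start of the video
    end: float
    excitement: float  # 0..10 excitement score


def overlap_seconds(a: Clip, b: Clip) -> float:
    """Length of the time range shared by two clips (0 if they are disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def dedupe(clips, min_overlap=10.0):
    """Keep the most exciting clip from each group of overlapping clips."""
    kept = []
    # Visit clips from most to least exciting so the best one always wins.
    for clip in sorted(clips, key=lambda c: c.excitement, reverse=True):
        is_duplicate = any(
            k.video_id == clip.video_id
            and overlap_seconds(k, clip) >= min_overlap
            for k in kept
        )
        if not is_duplicate:
            kept.append(clip)
    # Return in playback order for display.
    return sorted(kept, key=lambda c: (c.video_id, c.start))
```

For a kill streak surfaced three times with slightly shifted boundaries, only the highest-scoring version survives, while clips from other videos or distant timestamps are untouched.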
What's next
- Multi-sport expansion: Point the same pipeline at NBA League Pass + NBA API, or Premier League + Opta data, and generate the same experience without rewriting the core engine.
- Live tournament mode: Real-time highlight discovery via WebSocket during ongoing matches — plays surfaced within seconds of happening on stream.
- Team and matchup analytics: Aggregate AI Impact Scores at the team level — which teams generate the most highlights? Which matchups produce the most excitement?
- Fan engagement layer: Viewer voting, personal highlight reels, and shareable clips with embedded AI context (rank, excitement score, match stakes).
Built With
- fastapi
- python
- react
- typescript
- vite
