WorldCup26AI

Groups Overview — 12 groups × 4 teams with model-projected advance probabilities (P to knockout) from 10K Monte Carlo runs.
Champion Probabilities — top-20 teams by P(Win). Argentina #1 at 26.1% across 10K Monte Carlo tournament simulations.
Authentic-first jersey — Claude surfaces the $180 official on-pitch Brazil kit (not the cheaper Tee), with promo code + Fanatics link.
Fully responsive — same model dashboard works cleanly on mobile for fans on the go.
What-If Simulator — drag teams to set 1st/2nd/3rd/4th order. Medal-colored slots (gold/silver/bronze/gray) auto-update on drag.
Hero: the model's largest disagreements with Polymarket (France -13%, Argentina +17%), plus +7.0% Brier skill on the WC 2018 + 2022 holdout.

Inspiration

The 2026 FIFA World Cup is the first 48-team tournament in history — a brand-new format with no historical priors. That's exactly the gap where a calibrated AI model can earn an edge over the betting market. Polymarket has $700M in winner-market liquidity, but those prices reflect a thin set of opinion-makers, not a model trained on every men's international since 1872. We saw an opportunity: build a transparent probability engine for the World Cup, surface where the market is wrong, let football fans interrogate it in their own language — and turn that insight into something they can actually act on.

What it does

WorldCup26AI is a Dixon-Coles bivariate-Poisson model fit on 49,287 men's international matches since 1872, run through 10,000 Monte Carlo tournament simulations, and live-compared against Polymarket winner markets for every team in the field.
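As a rough illustration of the low-score correction named above, the sketch below follows the standard Dixon-Coles form: two Poisson goal counts multiplied by a factor tau that adjusts only the 0-0, 0-1, 1-0 and 1-1 scorelines. The function names, the example goal rates and the value of rho are assumptions for illustration, not the project's fitted parameters. The exponential shape of the time-decay weight and its xi value are also assumed, since the write-up only says that recent matches are weighted more heavily.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def dc_tau(x, y, lam, mu, rho):
    """Dixon-Coles adjustment; differs from 1 only for low scorelines."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def scoreline_prob(x, y, lam, mu, rho):
    """P(team 1 scores x, team 2 scores y) with expected goals lam and mu."""
    return dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)


def time_weight(days_ago, xi=0.001):
    """Assumed exponential decay: older matches count for less in the fit."""
    return math.exp(-xi * days_ago)


# Illustrative values only: lam, mu and rho are not the project's estimates.
p_goalless = scoreline_prob(0, 0, lam=1.4, mu=1.1, rho=-0.1)
```

With a negative rho, the 0-0 and 1-1 draws receive more probability than independent Poisson counts would give them, while the scoreline probabilities still sum to one.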

Champion probabilities — full 48-team chart with stage-reach breakdown (R32 → Final).
Mispricing leaderboard — ranked by |edge| × √(liquidity), every row backed by a data-citable reason.
What-If Simulator — drag teams into any 1st / 2nd / 3rd / 4th finishing order in any group, rerun 500–5,000 Monte Carlos conditional on your picks, watch the bracket and dumbbell chart respond in real time. Only groups you actually drag get locked (🔒); the rest stay automatic so the simulator shows real conditional shifts rather than restating the baseline.
Ask the Model — a Claude-powered chat that answers in the user's UI language and grounds every claim in the model's own data. Claude can autonomously call tools to surface real Fanatics-licensed merchandise — Authentic kit first (~$130–180), with honest fallback to Replica (~$90) or Tee (~$35) when the premium tier isn't in stock.
Team Explorer, Schedule, Stage Reach, Calibration, Methodology — every number traceable back to the source.
5 languages baked in from day one: English / 中文 / Español / Português / Français.
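The leaderboard ranking key described above, |edge| × √(liquidity), can be sketched in a few lines. The row layout (model_p, market_p, liquidity_usd) and the function names are assumptions for illustration; only the formula itself comes from the write-up.

```python
import math


def mispricing_score(model_p, market_p, liquidity_usd):
    """Ranking key: absolute edge scaled by the square root of liquidity."""
    if liquidity_usd < 0:
        raise ValueError("liquidity must be non-negative")
    return abs(model_p - market_p) * math.sqrt(liquidity_usd)


def rank_mispricings(rows):
    """Sort rows so the largest liquidity-weighted edge comes first."""
    return sorted(
        rows,
        key=lambda r: mispricing_score(r["model_p"], r["market_p"], r["liquidity_usd"]),
        reverse=True,
    )
```

The square root dampens liquidity, so a deep market raises a row's rank without letting a tiny edge in a very large market dominate the board outright.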

Real partnership unlocked mid-hackathon: Fanatics × WorldCup26AI

We applied to the Fanatics Affiliate Program through Impact.com mid-hackathon and were approved (Publisher ID 7225697), which gave us live access to the global Fanatics catalogue (~1.5M SKUs across 8 storefronts). We filtered that down to 1,812 World Cup-relevant SKUs across 34 teams, which serve as the live source for Claude's shoppable product cards.

This turned the project from "analytics demo" into "analytics → real commerce." The model doesn't just show you that it has Argentina at 26.1% to win, 17.1 pp above the Polymarket-implied price (in other words, under-priced by the market); it can also offer you the official $184.99 Authentic Messi #10 jersey to back the call. A single chat turn delivers probability, market edge, and shoppable kit, all grounded in the same source of truth.

How I built it

Data: scraped men's international results since 1872 plus FIFA squad lists for the 48 confirmed teams. Total parquet ~7 MB, base64-embedded into the deployable Streamlit bundle for portability.
Model: Dixon-Coles bivariate Poisson with low-score correction and time-decay weighting (recent matches count for more than older results such as those from 2014). Calibration verified on the 2018 + 2022 World Cups (128 matches, +7.0% Brier skill vs. uniform baseline).
Simulation: 10,000 Monte Carlo tournaments — full 48-team group stage + 32-team knockout — checkpointed to fit AWS Lambda's 15-minute timeout on Zerve.
Live market: Polymarket Gamma API polled for 43 of the 48 winner markets (≥$500K liquidity), with the overround stripped to recover implied probabilities before computing edges.
Chat: Claude (Anthropic) as the primary analyst with custom tools recommend_team_merch and check_team_merch_pricing that hit the filtered Fanatics product feed. The model autonomously decides when to surface a shoppable card vs. stay purely analytical, and answers in the user's UI language.
UX: Streamlit + custom CSS + streamlit-sortables for the drag-to-rank What-If picker. Medal-colored slots use :nth-child(1..4) so colors auto-update on drag — zero JS, the browser repaints when SortableJS reorders the DOM.
Infra — Zerve serverless: the entire data pipeline ran as composable Zerve notebook blocks (ingest_fanatics → clean_skus → stage_parquet_files), each with its own runtime budget — no AWS console, no Docker, no infra to manage. Zerve's block model let us stage even the 10K Monte Carlo simulations as independent checkpointed runs rather than one monolithic job; we just wrote the math, Zerve handled the orchestration. One-click Streamlit deploy on Zerve hub, versioned on GitHub, public demo on Streamlit Cloud.
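The overround step in the live-market item can be sketched as a proportional normalization: divide each team's quoted price by the sum of all quoted prices so the implied probabilities add up to one. This is one common way to remove the vig and is an assumption here, since the write-up does not say which method the project uses. The function names are illustrative.

```python
def strip_overround(prices):
    """Rescale quoted winner prices so they sum to 1 (proportional method)."""
    total = sum(prices.values())
    if total <= 0:
        raise ValueError("prices must sum to a positive number")
    return {team: price / total for team, price in prices.items()}


def edge_pp(model_p, implied_p):
    """Model probability minus market-implied probability, in percentage points."""
    return 100.0 * (model_p - implied_p)
```

A positive edge means the model rates the team higher than the market does. Skipping the normalization would bias every edge downward, because raw prices that sum to more than 100% overstate each team's implied chance.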

Challenges I ran into

48-team format means no historical priors — backtesting had to be done on 32-team WC 2018 + 2022; the 2026 confidence intervals are wider by design, and we surface that honestly rather than pretending otherwise.
Polymarket overround is non-trivial — winner markets sum to >100% (vig); had to strip overround per-market before computing edges or the leaderboard would have been junk.
Impact API caps every catalog at 20 pages × 1,000 items: the 21st page returns HTTP 400, which had to be caught as "end of catalog" rather than treated as an error. We had to loop across multiple Fanatics catalogs (US, UK, EU storefronts) to assemble enough soccer SKUs.
Streamlit native widgets don't drag — the What-If picker needed streamlit-sortables + :nth-child CSS to give users an intuitive ranking UI without rebuilding from scratch.
i18n purity across LLM prompts — 5 languages × every UI string × system prompts in each language. Found and fixed 4 mid-hackathon mixing bugs (router hint, quick-pick prompts, etc.) where English labels paired with Chinese chat triggers leaked through.
Authentic-first jersey logic — first version recommended whatever was cheapest, surfacing $35 cotton Tees instead of $184 stitched on-pitch kits. Fixed with three layers: Python tier ranking (Authentic > Replica > Jersey > Tee), reversed price tiebreaker (most premium wins within tier), and explicit Claude system-prompt rules to always include authentic in keyword calls unless the user explicitly asks for cheaper.
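The pagination quirk described above (a hard cap of 20 pages, with page 21 answering HTTP 400) can be handled by treating that specific status as the end of the catalog while still raising on anything else. The sketch below is transport-agnostic: fetch_page and HttpError are hypothetical stand-ins for whatever HTTP client and error type the real pipeline uses, and are not part of the Impact API.

```python
class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def fetch_catalog(fetch_page, first_page=1):
    """Collect items page by page until the API signals there are no more.

    fetch_page(page_number) is assumed to return a list of items, or to
    raise HttpError when the server rejects the request.
    """
    items = []
    page = first_page
    while True:
        try:
            batch = fetch_page(page)
        except HttpError as err:
            # A 400 after at least one good page means we ran past the cap.
            if err.status == 400 and page > first_page:
                break
            raise  # a 400 on the first page, or any other status, is a real error
        if not batch:
            break  # an empty page also ends the catalog
        items.extend(batch)
        page += 1
    return items
```

Calling this once per storefront catalog and concatenating the results matches the multi-catalog loop described above.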

Accomplishments I'm proud of

+7.0% Brier-skill improvement over uniform baseline on holdout (WC 2018 + 2022, 128 matches).
Closed the loop from model → market → merchandise — a single chat turn surfaces the probability, the Polymarket edge, and a real shoppable Authentic kit.
Mid-hackathon Fanatics partnership approval — went from "I wish I could affiliate" to publisher-ID approved while still building.
Drag-and-drop What-If picker with auto-coloring medals — gold / silver / bronze / gray slots that re-paint instantly when teams are dragged, with zero JS state sync.
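For readers unfamiliar with the metric behind the first item, a Brier skill score compares the model's mean squared probability error with that of a reference forecast; positive values mean the model beats the reference. The sketch below assumes three-way match outcomes (home win, draw, away win) and a uniform 1/3 reference. Whether the project scores matches exactly this way is an assumption, and the function names are illustrative.

```python
def brier_score(forecasts, outcomes):
    """Mean multi-class Brier score.

    forecasts: list of probability tuples, one per match.
    outcomes:  list of indices of the outcome that actually happened.
    """
    if not outcomes or len(forecasts) != len(outcomes):
        raise ValueError("need one forecast per outcome")
    total = 0.0
    for probs, outcome in zip(forecasts, outcomes):
        for index, p in enumerate(probs):
            hit = 1.0 if index == outcome else 0.0
            total += (p - hit) ** 2
    return total / len(outcomes)


def brier_skill(forecasts, outcomes):
    """1 - BS_model / BS_uniform; 0 means no better than guessing 1/3 each."""
    reference = [(1 / 3, 1 / 3, 1 / 3)] * len(outcomes)
    return 1.0 - brier_score(forecasts, outcomes) / brier_score(reference, outcomes)
```

Under this definition a skill of +7.0% would mean the model's Brier score is 7% lower than the uniform forecast's score on the same matches.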

What I learned

Calibration > leaderboards. Showing a clean reliability diagram and Brier numbers built more trust than any single dramatic prediction.
Tools are the right abstraction for shopping. Claude calls check_team_merch_pricing(team, keywords="authentic men's Messi") and the tool returns a deterministic, auditable product — much cleaner than asking the LLM to hallucinate URLs.
:nth-child + DOM-reordering libraries are a magic combo. Wrote position-based CSS once; SortableJS's drag handler reorders the DOM and the browser repaints colors for free. The code I didn't write was the best part of this UI.
Three-layer defense for high-stakes recommendations. Python ranking + LLM system prompt + tool-schema description, all carrying the same rule (Authentic first, fall back honestly), so any one failing doesn't drop the user to a $35 Tee when they wanted a $185 jersey.
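The Python layer of that defense can be sketched as a sort key: tier first (Authentic > Replica > Jersey > Tee), then the higher price within a tier. The product dictionaries, their name and price fields, and the substring-based tier detection are assumptions for illustration; the real feed schema is not shown in this write-up.

```python
# Most premium tier first; a lower index wins.
TIER_ORDER = ("authentic", "replica", "jersey", "tee")


def tier_rank(name):
    """Return the index of the first tier keyword found in the product name.

    Substring matching is a crude stand-in for real catalog metadata.
    """
    lowered = name.lower()
    for rank, tier in enumerate(TIER_ORDER):
        if tier in lowered:
            return rank
    return len(TIER_ORDER)  # unknown tier sorts last


def pick_product(products):
    """Best tier wins; within a tier the most expensive item wins."""
    if not products:
        return None
    return min(products, key=lambda p: (tier_rank(p["name"]), -p["price"]))
```

Because the fallback is built into the ordering, a catalog with no Authentic item simply yields the best Replica, and so on down the tiers, without any special-case branch.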

What's next

Live tournament tracking during June–July 2026: re-fit Dixon-Coles after every match, dynamic Brier scoring vs. our published probabilities.
More catalogs: pull Fanatics FR / ES / DE / IT / CA storefronts for an additional ~5K SKUs and broader regional coverage.
Cross-platform chat — WeChat + WhatsApp + LINE bots: the same model brain wrapped as a native chatbot inside the messaging app fans already use every day. WeChat for Greater China (Mandarin), WhatsApp for Latin America / Iberia / India / Africa, LINE for Japan / Taiwan / Thailand. Football is a global sport; the AI should meet fans wherever they already chat.
Per-tournament endpoints: add the same engine to the 2027 Women's World Cup, AFCON, Copa, Euros — the framework is tournament-agnostic.

Live app: https://wc26apppy-sr8pkhgj4ibnhyhu4gpafq.streamlit.app/
GitHub: https://github.com/AIoOS-67/worldcup26iq
Built for: Zerve AI Hackathon 2026

Built With

anthropic-claude
dixon-coles
fanatics
fastapi
fifa-world-cup-2026
impact-radius
monte-carlo-simulation
pandas
plotly
polymarket
python
sortablejs
sports-analytics
streamlit
zerve

Updates

AIoOS-67 Liao started this project — Apr 29, 2026 10:40 AM EDT
