The Problem

$336 million is riding on World Cup 2026 prediction markets. Analysts cite these odds. Journalists report them. Fantasy players draft from them. Data scientists use them as ground truth.

But the market is systematically wrong — not randomly, not slightly, but directionally and measurably, in the same way, about the same teams, every time.

We identified three distinct biases:

  • Prestige Tax — 8 elite nations overpriced because the crowd pays for reputation, not results. France sits at 16.35% market odds. The football data says 2.91%. That gap is:

$$\Delta P = P_{market} - P_{model} = 16.35\% - 2.91\% = 13.44\%$$

  • Dark Horse Discount — 17 teams with genuine statistical merit trading at near-zero odds simply because they lack brand recognition

  • Host Nation Discount — USA, Mexico, and Canada underpriced despite measurable home advantage in a 48-team format


What We Built

OddsAutopsy — a live prediction-market calibration engine that compares market-implied probabilities against a statistical reference model built purely from football data.

The model scores each team using:

$$\text{model score} = (xG \times 0.5) + (\text{pedigree} \times 0.35) + (\text{host bonus} \times 0.15 \times 10)$$

Probabilities are then normalized across all 50 teams:

$$P_{model}(team_i) = \frac{\text{score}_i}{\sum_{j=1}^{50} \text{score}_j} \times 100$$

The calibration gap for each team is then:

$$\text{edge}_i = P_{model}(team_i) - P_{market}(team_i)$$

Positive edge = market underpricing. Negative edge = market overpricing.
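The whole scoring-normalization-edge pipeline fits in a few lines. Here is a minimal sketch; the team names, xG, pedigree, and market figures are illustrative placeholders (the real pool has 50 teams, so normalized numbers here will not match the article's):

```python
# Illustrative inputs only -- not the real dataset.
teams = {
    # team: (xG, pedigree, host_bonus)
    "France": (2.10, 9.0, 0.0),
    "Japan":  (1.80, 5.0, 0.0),
    "USA":    (1.60, 4.0, 1.0),
}
market_pct = {"France": 16.35, "Japan": 1.20, "USA": 3.00}  # implied %, placeholder

def model_score(xg: float, pedigree: float, host: float) -> float:
    # Weighted blend from the formula above.
    return xg * 0.5 + pedigree * 0.35 + host * 0.15 * 10

scores = {t: model_score(*v) for t, v in teams.items()}
total = sum(scores.values())
p_model = {t: s / total * 100 for t, s in scores.items()}  # sums to 100%

# Positive edge -> market underprices the team; negative -> overprices.
edge = {t: p_model[t] - market_pct[t] for t in teams}
```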


What We Deployed

This is not a notebook. This is a production system:

Live App → oddsautopsy.hub.zerve.cloud

  • Interactive leaderboard of all 50 teams ranked by mispricing
  • Filter by bias type: Prestige Tax, Dark Horse, Host Nation
  • Visual calibration map — market vs model for every team
  • Team deep dive with plain English verdict for every country

Live API → oddsautopsy.hub.zerve.cloud/api

  • GET /api/summary — full market bias statistics
  • GET /api/calibration — all 50 teams ranked by edge
  • GET /api/calibration/{team} — single team deep dive
  • GET /api/overpriced — all overpriced teams
  • GET /api/underpriced — all underpriced teams
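A client can hit these endpoints with any HTTP library. A minimal sketch with `requests` (the JSON response shape shown in the comment is an assumption, not the documented schema):

```python
import requests

BASE = "https://oddsautopsy.hub.zerve.cloud/api"

def team_url(team: str) -> str:
    # Builds the single-team deep-dive endpoint from the list above.
    return f"{BASE}/calibration/{team}"

# Live call (network required):
# resp = requests.get(team_url("France"), timeout=10)
# resp.raise_for_status()
# data = resp.json()  # assumed shape: {"team": ..., "p_market": ..., "p_model": ..., "edge": ...}
```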

What Inspired Us

Everyone predicts who wins the World Cup. We wanted to ask a harder question — is the market doing its job correctly?

Prediction markets are supposed to be the gold standard of collective intelligence. With $336M in volume they feel authoritative. But collective intelligence breaks down when the crowd is emotional rather than analytical — and football fans are nothing if not emotional about their favorite nations.

The inspiration was simple: what if you treated the prediction market itself as the subject of analysis rather than the source of truth?


How We Built It

Built entirely inside Zerve — from first prompt to production deployment — in under 24 hours.

  • Block 1: Live Polymarket API ingestion — 50 teams, real odds
  • Block 2: Statistical reference model — xG, pedigree, host bonus
  • Block 3: Calibration engine — edge computed for every team
  • Block 4: Static matplotlib dashboard
  • Block 5: Interactive Plotly dashboard saved as HTML
  • Block 6: Deployed as live Dash app with public URL
  • Block 7: Flask API routes added — 6 live endpoints

The Zerve agent wrote the core pipeline from natural language prompts, debugged errors autonomously, and enabled deployment without leaving the platform.


Challenges We Faced

Market structure complexity — Polymarket uses a negRisk categorical structure where each team is a separate Yes/No market. Parsing implied probabilities correctly required understanding that the Yes price IS the implied probability, not a raw odds figure.
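The key parsing insight reduces to a one-line conversion. A sketch, using a simplified payload shape (an assumption, not the exact Polymarket schema):

```python
# Simplified stand-in for one Yes/No market per team in a negRisk structure.
markets = [
    {"team": "France", "yes_price": 0.1635},  # Yes share price in [0, 1]
    {"team": "Japan",  "yes_price": 0.0120},
]

def implied_probability_pct(market: dict) -> float:
    # The Yes price already IS the implied probability, so converting to
    # percent is a plain multiplication -- no decimal/fractional odds formula.
    return market["yes_price"] * 100

probs = {m["team"]: implied_probability_pct(m) for m in markets}
```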

Model gap interpretation — Large calibration gaps (France at -13.44%) required careful framing. The finding is not that France will lose — it is that the market over-concentrates probability on prestige names, which creates a measurable and systematic bias.

Deployment limits — Free plan allows only 1 active deployment. Solved by embedding Flask API routes directly into the Dash server using @app.server.route() — giving us both a visual app and a REST API from a single deployment.


What We Learned

The prediction market is not a neutral signal. It is a reflection of crowd psychology — and crowd psychology in sports is heavily biased toward famous names and against unknown ones.

The most important technical lesson: Zerve's agent dramatically accelerates the path from question to deployed production system. What would have taken days of setup, debugging, and DevOps was compressed into hours by describing intent in plain English and letting the agent handle execution.


What's Next

  • Connect to live Polymarket API for real-time odds refresh
  • Add match-level calibration as the tournament begins June 11
  • Track calibration drift — does the market get more accurate as kickoff approaches?
  • Expand to individual match markets across all 104 games
  • Build a calibration score for Polymarket accuracy over time

Built With

  • Zerve (agent, notebook, deployment)
  • Python (pandas, plotly, dash, flask)
  • Polymarket API
  • FBref football statistics
  • Dash + Flask for app and API
