The Problem
$336 million is riding on World Cup 2026 prediction markets. Analysts cite these odds. Journalists report them. Fantasy players draft from them. Data scientists use them as ground truth.
But the market is systematically wrong: not randomly, not slightly, but directionally and measurably wrong in the same way about the same teams every time.
We identified three distinct biases:
- Prestige Tax: 8 elite nations overpriced because the crowd pays for reputation, not results. France sits at 16.35% market odds. The football data says 2.91%. That gap is:
$$\Delta P = P_{market} - P_{model} = 16.35\% - 2.91\% = 13.44\%$$
- Dark Horse Discount: 17 teams with genuine statistical merit trading at near-zero odds simply because they lack brand recognition
- Host Nation Discount: USA, Mexico, and Canada underpriced despite measurable home advantage in a 48-team format
What We Built
OddsAutopsy: a live prediction market calibration engine that compares market implied probabilities against a statistical reference model built purely from football data.
The model scores each team using:
$$\text{model score} = (xG \times 0.5) + (\text{pedigree} \times 0.35) + (\text{host bonus} \times 0.15 \times 10)$$
Probabilities are then normalized across all 50 teams:
$$P_{model}(team_i) = \frac{\text{score}_i}{\sum_{j=1}^{50} \text{score}_j} \times 100$$
The calibration gap for each team is then:
$$\text{edge}_i = P_{model}(team_i) - P_{market}(team_i)$$
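Positive edge = market underpricing. Negative edge = market overpricing. Note that the edge has the opposite sign to the gap $\Delta P$ defined earlier, so France's edge is 2.91% - 16.35% = -13.44%.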
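The scoring, normalization, and edge steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `Team` fields, the input scales, the assumption that the host bonus is 1 for host nations and 0 otherwise, and the three example teams are all made up for demonstration. Only the weights (0.5, 0.35, 0.15 × 10) come from the formula above.

```python
from dataclasses import dataclass

# Weights copied from the model-score formula in the text.
W_XG, W_PEDIGREE, W_HOST = 0.5, 0.35, 0.15
HOST_SCALE = 10  # the formula multiplies the host term by 10


@dataclass
class Team:
    name: str
    xg: float          # expected-goals input (scale is an assumption)
    pedigree: float    # pedigree input (scale is an assumption)
    host_bonus: float  # assumed 1.0 for host nations, 0.0 otherwise
    market_pct: float  # market implied probability, in percent


def model_score(team: Team) -> float:
    """Weighted score for one team, following the formula in the text."""
    return (team.xg * W_XG
            + team.pedigree * W_PEDIGREE
            + team.host_bonus * W_HOST * HOST_SCALE)


def calibrate(teams):
    """Normalize scores to percentages and compute edge = model - market."""
    total = sum(model_score(t) for t in teams)
    rows = []
    for t in teams:
        model_pct = model_score(t) / total * 100
        rows.append({
            "team": t.name,
            "model_pct": model_pct,
            "market_pct": t.market_pct,
            "edge": model_pct - t.market_pct,
        })
    # Most overpriced (most negative edge) first.
    return sorted(rows, key=lambda r: r["edge"])


# Made-up inputs, purely to exercise the functions.
EXAMPLE_TEAMS = [
    Team("Team A", xg=2.0, pedigree=4.0, host_bonus=0.0, market_pct=60.0),
    Team("Team B", xg=1.6, pedigree=1.0, host_bonus=1.0, market_pct=15.0),
    Team("Team C", xg=1.8, pedigree=0.5, host_bonus=0.0, market_pct=25.0),
]

for row in calibrate(EXAMPLE_TEAMS):
    print(f"{row['team']}: model {row['model_pct']:.2f}%, edge {row['edge']:+.2f}")
```

Because the model probabilities are normalized to sum to 100 while market prices need not be, the edges across all teams will not generally sum to zero.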
What We Deployed
This is not a notebook. This is a production system:
Live App: oddsautopsy.hub.zerve.cloud
- Interactive leaderboard of all 50 teams ranked by mispricing
- Filter by bias type: Prestige Tax, Dark Horse, Host Nation
- Visual calibration map: market vs model for every team
- Team deep dive with plain English verdict for every country
Live API: oddsautopsy.hub.zerve.cloud/api
- GET /api/summary: full market bias statistics
- GET /api/calibration: all 50 teams ranked by edge
- GET /api/calibration/{team}: single team deep dive
- GET /api/overpriced: all overpriced teams
- GET /api/underpriced: all underpriced teams
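A small client for these endpoints might look like the sketch below. It uses only the standard library. Several things here are assumptions rather than documented behavior: the `https` scheme, that responses are JSON, and that a team name containing spaces is accepted when percent-encoded in the path. The helper names `endpoint_url` and `fetch_json` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Host and /api prefix come from the writeup; the https scheme is an assumption.
BASE_URL = "https://oddsautopsy.hub.zerve.cloud/api"


def endpoint_url(*parts: str) -> str:
    """Join path segments onto BASE_URL, percent-encoding each segment."""
    return "/".join([BASE_URL, *(quote(p, safe="") for p in parts)])


def fetch_json(*parts: str, timeout: float = 10.0):
    """GET an endpoint and decode the body, assuming it is JSON."""
    with urlopen(endpoint_url(*parts), timeout=timeout) as resp:
        return json.load(resp)


# Building URLs does not touch the network.
print(endpoint_url("summary"))
print(endpoint_url("calibration", "France"))
```

Calling `fetch_json("calibration", "France")` would then request the single-team deep dive, provided the deployment is reachable and keys teams by name.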
What Inspired Us
Everyone predicts who wins the World Cup. We wanted to ask a harder question: is the market doing its job correctly?
Prediction markets are supposed to be the gold standard of collective intelligence. With $336M in volume they feel authoritative. But collective intelligence breaks down when the crowd is emotional rather than analytical, and football fans are nothing if not emotional about their favorite nations.
The inspiration was simple: what if you treated the prediction market itself as the subject of analysis rather than the source of truth?
How We Built It
Built entirely inside Zerve, from first prompt to production deployment, in under 24 hours.
| Block | What it does |
|---|---|
| Block 1 | Live Polymarket API ingestion: 50 teams, real odds |
| Block 2 | Statistical reference model: xG, pedigree, host bonus |
| Block 3 | Calibration engine: edge computed for every team |
| Block 4 | Static matplotlib dashboard |
| Block 5 | Interactive Plotly dashboard saved as HTML |
| Block 6 | Deployed as live Dash app with public URL |
| Block 7 | Flask API routes added: 6 live endpoints |
The Zerve agent wrote the core pipeline from natural language prompts, debugged errors autonomously, and enabled deployment without leaving the platform.
Challenges We Faced
Market structure complexity: Polymarket uses a negRisk categorical structure where each team is a separate Yes/No market. Parsing implied probabilities correctly required understanding that the Yes price IS the implied probability, not a raw odds figure.
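To make the parsing point concrete, here is a minimal sketch of reading the Yes price from one team's binary market. The payload shape is an assumption: it supposes `outcomes` and `outcomePrices` fields that may arrive either as lists or as JSON-encoded strings, and the `question` text is invented. The 0.1635 price mirrors the France figure quoted earlier. The function name `yes_probability` is hypothetical.

```python
import json


def yes_probability(market: dict) -> float:
    """Return the Yes price of a binary market as a probability in [0, 1]."""
    outcomes = market["outcomes"]
    prices = market["outcomePrices"]
    # Assumption: these fields may be JSON-encoded strings instead of lists.
    if isinstance(outcomes, str):
        outcomes = json.loads(outcomes)
    if isinstance(prices, str):
        prices = json.loads(prices)
    price_by_outcome = {
        name.lower(): float(price) for name, price in zip(outcomes, prices)
    }
    # The Yes price is used directly; no odds-to-probability conversion.
    return price_by_outcome["yes"]


# Hypothetical payload for illustration only.
sample_market = {
    "question": "Will France win the 2026 World Cup?",
    "outcomes": '["Yes", "No"]',
    "outcomePrices": '["0.1635", "0.8365"]',
}

print(f"{yes_probability(sample_market) * 100:.2f}%")
```

By contrast, a decimal-odds figure would need to be inverted (1 / odds) before it could be read as a probability, which is the mistake the Yes-price reading avoids.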
Model gap interpretation: Large calibration gaps (France at -13.44%) required careful framing. The finding is not that France will lose. It is that the market over-concentrates probability on prestige names, which creates a measurable and systematic bias.
Deployment limits: The free plan allows only 1 active deployment. We solved this by embedding Flask API routes directly into the Dash server using @app.server.route(), giving us both a visual app and a REST API from a single deployment.
What We Learned
The prediction market is not a neutral signal. It is a reflection of crowd psychology, and crowd psychology in sports is heavily biased toward famous names and against unknown ones.
The most important technical lesson: Zerve's agent dramatically accelerates the path from question to deployed production system. What would have taken days of setup, debugging, and DevOps was compressed into hours by describing intent in plain English and letting the agent handle execution.
What's Next
- Connect to live Polymarket API for real-time odds refresh
- Add match-level calibration as the tournament begins June 11
- Track calibration drift: does the market get more accurate as kickoff approaches?
- Expand to individual match markets across all 104 games
- Build a calibration score for Polymarket accuracy over time
Built With
- Zerve (agent, notebook, deployment)
- Python (pandas, plotly, dash, flask)
- Polymarket API
- FBref football statistics
- Dash + Flask for app and API
