Inspiration
The 2026 FIFA World Cup will be the first to feature 48 teams, dramatically expanding the competitive landscape. While this makes the tournament more exciting, it also exposes a core weakness in many prediction systems: they are forced to guess, even when information is limited, teams are unfamiliar, or match context is noisy.
What inspired this project was a simple question:
Can a prediction system be honest enough to say “I don’t know” — and be more useful because of it?
Rather than optimizing for a single headline accuracy number, I wanted to build a system that behaves like a cautious human analyst: confident when signals are strong, restrained when uncertainty is high, and transparent about how its beliefs compare to the betting market.
How We Built It
The project is structured as a probability-first decision system, implemented as a stacking ensemble that combines complementary models and signals.
1. Feature engineering with common-sense signals and market expectations
All features are computed strictly before kickoff to avoid leakage. Instead of relying on technical football details, the features are based on intuitive ideas that apply to any competitive event:
- Team strength over time: how strong each team has been historically, updated as teams improve or decline
- Match importance: high-stakes games behave differently from exhibition matches
- Past matchups: whether two teams have faced each other before, and how balanced those games were
- Rest and readiness: how long each team had to recover since their previous match
- Context: whether the match is played on neutral ground or gives one side an advantage
- Market expectations: fair implied probabilities derived from bookmaker odds that I scraped myself, representing collective market belief
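The write-up does not say how "team strength over time" is computed. One common choice, assumed here, is an Elo-style rating. In this scheme each match's feature row stores both teams' ratings before kickoff, and the rating is updated with the result only afterwards, which keeps the feature leakage-free. The function names, the K-factor of 20, the starting rating of 1500 and the 100-point home bonus are all hypothetical, illustrative values.

```python
from collections import defaultdict

def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def pre_match_elo(matches, k=20.0, start=1500.0, home_bonus=100.0):
    """Hypothetical helper: one (home_rating, away_rating) pair per match.

    `matches` must be sorted by date and hold
    (home, away, home_goals, away_goals, neutral) tuples.
    Elo is an assumed method; k, start and home_bonus are illustrative.
    """
    ratings = defaultdict(lambda: start)
    features = []
    for home, away, hg, ag, neutral in matches:
        r_home, r_away = ratings[home], ratings[away]
        # Record ratings BEFORE this match's result is used (no leakage).
        features.append((r_home, r_away))
        bonus = 0.0 if neutral else home_bonus
        exp_home = expected_score(r_home + bonus, r_away)
        actual_home = 1.0 if hg > ag else 0.5 if hg == ag else 0.0
        delta = k * (actual_home - exp_home)
        # Zero-sum update after the match.
        ratings[home] = r_home + delta
        ratings[away] = r_away - delta
    return features
```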
The odds-based features are treated as inputs, not labels. They capture how the market prices each outcome before kickoff and are fed directly into the machine-learning model alongside football signals.
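"Fair" implied probabilities require removing the bookmaker's margin (the overround) from the raw odds. The source does not state which method is used. The sketch below assumes decimal 1X2 odds and the simplest option, proportional normalization: each raw implied probability 1/odds is divided by the sum of all three. The function name is hypothetical.

```python
def fair_probabilities(odds_home: float, odds_draw: float, odds_away: float):
    """Hypothetical helper: margin-free (home, draw, away) probabilities.

    Uses proportional normalization, an assumed de-margining method.
    """
    raw = [1.0 / o for o in (odds_home, odds_draw, odds_away)]
    overround = sum(raw)  # exceeds 1.0 when the bookmaker builds in a margin
    return tuple(p / overround for p in raw)
```

Other de-margining schemes exist that treat favourites and longshots differently. Proportional scaling simply preserves the ratios between the three raw implied probabilities.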
Together, these features describe who is strong, how prepared they are, how important the match is, and what the market expects, using only information available before the match starts.
2. Core models: XGBoost and Dixon–Coles
At the core of the system are two complementary models, each solving a different part of the problem.
First, an XGBoost classifier serves as the main predictive engine. It learns how football signals and market expectations from scraped odds interact to produce win, draw, and loss probabilities. XGBoost is well-suited here because it handles nonlinear relationships and complex interactions between team strength, context, rest, and odds.
However, football has one outcome that behaves differently: draws. Draws usually come from close, low-scoring games, where small changes matter more than overall strength.
To explicitly model this behavior, I use a Dixon–Coles score model as a second core model. Instead of predicting outcomes directly, Dixon–Coles estimates how many goals each team is likely to score. From these expected goals, it derives probabilities for each scoreline—especially low-scoring draws like 0–0 or 1–1.
This gives the system a goal-based understanding of match balance, which complements the pattern-learning strength of XGBoost.
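A minimal sketch of the Dixon–Coles scoreline model follows. Home goals are modelled as Poisson with rate λ (`lam`) and away goals as Poisson with rate μ (`mu`). The four lowest scorelines are then reweighted by the standard Dixon–Coles factor τ with dependence parameter ρ (`rho`), and the grid is summed into win/draw/loss probabilities. In a fitted model, λ and μ would come from estimated attack, defence and home-advantage parameters. Here they are passed in directly, and the function names and `max_goals` cut-off are hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)

def dc_tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Dixon–Coles correction; only 0-0, 0-1, 1-0 and 1-1 are adjusted."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0

def outcome_probabilities(lam: float, mu: float, rho: float, max_goals: int = 10):
    """Hypothetical helper: (home win, draw, away win) from expected goals.

    lam/mu are illustrative inputs; rho must keep every tau non-negative.
    """
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalize the truncated score grid
    return home / total, draw / total, away / total
```

A negative ρ raises the probability of 0–0 and 1–1. This is how the model gives low-scoring draws more weight than two independent Poisson distributions would.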
3. Era-aware stacking and time-safe learning
International football evolves over time. Rather than training a single model on all history, I trained multiple era-specific models (e.g., post-1950, post-1990, post-2010).
These models—both XGBoost and Dixon–Coles—act as base learners in a time-based stacking framework. Using expanding-window training, the stacked model learns:
- when to trust recent data
- when longer history provides stability
- how to balance machine-learning predictions, score-based probabilities, and market signals
This design is especially important in a 48-team World Cup, where teams vary widely in data availability.
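The expanding-window scheme can be illustrated as follows. For each evaluation year, every era-specific base learner is refit on matches from its era start up to, but not including, that year. Its predictions on that year's matches then become out-of-fold inputs for the meta-learner. The `fit`/`predict` callables, yearly folds, one probability per match, and the names `expanding_window_oof` and `ERA_STARTS` are hypothetical simplifications. The source does not specify the meta-learner or the fold granularity.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical era boundaries, taken from the examples in the text.
ERA_STARTS = {"post_1950": 1950, "post_1990": 1990, "post_2010": 2010}

def expanding_window_oof(
    years: Sequence[int],
    fit: Callable[[List[int]], object],
    predict: Callable[[object, List[int]], List[float]],
    first_eval_year: int,
) -> Dict[int, Dict[str, float]]:
    """Hypothetical helper: out-of-fold base-learner predictions for stacking.

    years[i] is the year of match i. Each era model only ever sees matches
    from earlier years (and from its era onward) before predicting a year.
    """
    oof: Dict[int, Dict[str, float]] = {}
    for eval_year in sorted({y for y in years if y >= first_eval_year}):
        test_idx = [i for i, y in enumerate(years) if y == eval_year]
        for era, start in ERA_STARTS.items():
            train_idx = [i for i, y in enumerate(years) if start <= y < eval_year]
            if not train_idx:
                continue  # this era has no past data yet
            model = fit(train_idx)
            for i, p in zip(test_idx, predict(model, test_idx)):
                oof.setdefault(i, {})[era] = p
    return oof
```

Because every row of `oof` is produced by models that never saw that row's year, the meta-learner trained on these columns is evaluated under the same time constraints as live predictions.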
4. Odds as both signal and benchmark
The scraped odds play a dual role:
- As features, they inform XGBoost about market expectations before kickoff
- As a benchmark, they provide a strong baseline for evaluating whether the model adds value beyond the market
This ensures the system learns with the market while still being evaluated against it.
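One way to check whether the model adds value beyond the market is to score both sets of probabilities on the same held-out matches with multiclass log loss, where lower is better. This metric is an assumption; the source does not name the one it uses. The function names and the outcome encoding (0 = home win, 1 = draw, 2 = away win) are hypothetical.

```python
import math
from typing import Sequence, Tuple

Probs = Tuple[float, float, float]  # (home win, draw, away win)

def log_loss(probs: Sequence[Probs], outcomes: Sequence[int], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the observed outcomes (assumed metric)."""
    total = 0.0
    for p, o in zip(probs, outcomes):
        total -= math.log(max(p[o], eps))  # clip to avoid log(0)
    return total / len(outcomes)

def skill_vs_market(model: Sequence[Probs], market: Sequence[Probs], outcomes: Sequence[int]) -> float:
    """Hypothetical helper: positive values mean the model beats the market."""
    return log_loss(market, outcomes) - log_loss(model, outcomes)
```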
What I Learned
The most important lesson was that confidence matters more than coverage.
Allowing the system to abstain when uncertainty is high exposes a clear trade-off: as the confidence threshold increases, accuracy rises sharply while coverage decreases. This reframes prediction from “guess every match” to “act only when the signal is strong.”
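The abstention rule can be made concrete with a threshold on the top predicted probability. The system predicts only when that probability is at least the threshold and abstains otherwise. For each threshold, the sketch below reports coverage (the share of matches predicted) and accuracy on the predicted matches alone. The confidence measure, the function name and the toy thresholds are assumptions; the source does not state which ones are used.

```python
from typing import List, Sequence, Tuple

def accuracy_coverage(
    probs: Sequence[Sequence[float]],
    outcomes: Sequence[int],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Hypothetical helper: (threshold, coverage, accuracy) per threshold.

    Uses max probability as the (assumed) confidence measure; accuracy is
    computed only on predicted matches and is NaN when nothing is predicted.
    """
    rows = []
    for t in thresholds:
        hits = total = 0
        for p, o in zip(probs, outcomes):
            best = max(range(len(p)), key=lambda k: p[k])
            if p[best] >= t:  # confident enough to predict
                total += 1
                hits += int(best == o)
        coverage = total / len(outcomes)
        accuracy = hits / total if total else float("nan")
        rows.append((t, coverage, accuracy))
    return rows
```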
I also learned the value of stacking diverse signals: machine learning captures complex patterns, Dixon–Coles models match structure, and odds reflect collective market belief. Together, they produce more reliable probabilities than any single approach.
Challenges
- Modeling draws realistically: Treating draws as first-class outcomes required explicitly modeling goal-scoring, not just classification
- Balancing signals: Integrating XGBoost, Dixon–Coles, and odds features without letting market information dominate
- Maintaining time realism: Enforcing strict time-based stacking to avoid optimistic evaluation
Closing Thought
FIFA Predictions: When Not to Bet is not about beating the market on every match. It is about building a stacked decision system that combines common-sense signals, machine learning, score-based modeling, and market expectations—and knows when to speak, when to stay silent, and how confident it really is.
Built With
- dixon-coles-model
- hex
- playwright
- python
- scikit-learn
- scraped-odds-data
- sql
- xgboost