Inspiration
With the World Cup around the corner, I was genuinely excited to explore what data could say about it. When I came across this hackathon and the Hex platform, it felt like the perfect opportunity to finally build a model that attempts to predict the World Cup outcome. This project was driven purely by curiosity—how close can we get using data, probability, and simulation?
What it does
RoadTo26 uses a Monte Carlo simulation (10,000 runs) to predict the most likely World Cup winner, along with probable semi-finalists and runners-up. Each run simulates the entire tournament, so every execution produces a different outcome—just like real football. The model also supports live re-simulation, allowing users to explore alternate tournament scenarios.
How we built it
The model combines datasets already available on Hex with additional football datasets sourced from Kaggle. These datasets are cleaned, merged, and structured into a single consolidated dataset.
Each team is assigned a strength score based on:
- Recent team form (higher weight)
- Historical World Cup and international performance
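As a rough illustration of the weighting described above, the two components can be blended into a single score. This is only a sketch: the function name `strength_score`, the 0.7 default weight, and the assumption that both inputs are already on a comparable scale are hypothetical and not taken from the project.

```python
def strength_score(recent_form: float, historical: float,
                   recent_weight: float = 0.7) -> float:
    """Blend recent form and historical performance into one score.

    recent_weight is a hypothetical value; the only stated constraint
    is that recent form carries the higher weight (i.e. > 0.5).
    """
    if not 0.0 <= recent_weight <= 1.0:
        raise ValueError("recent_weight must be between 0 and 1")
    return recent_weight * recent_form + (1.0 - recent_weight) * historical


# Example: a team in strong recent form but with a modest history
score = strength_score(recent_form=0.9, historical=0.4)
```

Keeping the weight as a parameter makes it easy to see how sensitive the final predictions are to this choice.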
Match outcomes are determined using a logistic probability model, similar to Elo-based systems. The probability of Team A defeating Team B is calculated using the rating difference between the two teams:
$$ P(A\ \text{wins}) = \frac{1}{1 + e^{-(R_A - R_B)}} $$
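A minimal Python sketch of this formula is shown below. The function name `win_probability` is illustrative. Note that the formula as written has no scale factor (standard Elo uses base 10 and divides the rating difference by 400), so the sketch assumes ratings are already on a small scale where a difference of 1 corresponds to roughly a 73% win probability.

```python
import math


def win_probability(rating_a: float, rating_b: float) -> float:
    """Logistic probability that team A beats team B.

    Assumes ratings are on a small scale; very large negative
    differences would overflow math.exp in this simple version.
    """
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))


# Equal ratings give an even match
even = win_probability(1.2, 1.2)
```

The function is symmetric in the sense that `win_probability(a, b) + win_probability(b, a)` equals 1, so draws are not modeled here.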
To better reflect real-world uncertainty, controlled randomness is introduced into each match outcome. These probabilities are then used repeatedly across 10,000 Monte Carlo simulations to generate tournament-level predictions.
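The loop described above might look roughly like the following. This is a simplified sketch, not the project's actual code: it simulates only a single-elimination bracket (no group stage), interprets "controlled randomness" as drawing a uniform random number against the win probability, and uses hypothetical team names and ratings.

```python
import math
import random
from collections import Counter


def win_probability(rating_a, rating_b):
    """Logistic win probability from the rating difference."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b)))


def simulate_knockout(ratings, rng):
    """Play one single-elimination bracket and return the champion."""
    teams = list(ratings)
    if len(teams) < 2 or len(teams) & (len(teams) - 1):
        raise ValueError("number of teams must be a power of two")
    while len(teams) > 1:
        winners = []
        # Pair adjacent teams: (0, 1), (2, 3), ...
        for a, b in zip(teams[0::2], teams[1::2]):
            p = win_probability(ratings[a], ratings[b])
            winners.append(a if rng.random() < p else b)
        teams = winners
    return teams[0]


def title_odds(ratings, runs=10_000, seed=0):
    """Estimate each team's title probability over many simulated brackets."""
    rng = random.Random(seed)  # fixed seed keeps results reproducible
    counts = Counter(simulate_knockout(ratings, rng) for _ in range(runs))
    return {team: counts[team] / runs for team in ratings}


# Hypothetical four-team bracket
odds = title_odds({"A": 1.0, "B": 0.5, "C": 0.0, "D": -0.5}, runs=2_000)
```

Each individual bracket can crown a different champion; the stable prediction comes from the frequencies aggregated over all runs.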
Challenges we ran into
- Long runtime: Each Monte Carlo simulation initially took close to 150 minutes, significantly increasing development time and complexity.
- Bias in historical data: Models trained purely on historical records produced unrealistic results. To address this, data had to be manually reweighted to emphasize current form, increasing build time but improving prediction quality.
Accomplishments that we're proud of
- Running large-scale Monte Carlo simulations efficiently: Executing this simulation on a local CPU would have been extremely memory-intensive, but Hex made it scalable and manageable.
- Producing realistic tournament predictions: After multiple iterations and probability adjustments, the model began producing outcomes closely aligned with expert expectations, while also highlighting how biased data can influence predictions.
What we learned
- Data assumptions and weighting strongly influence predictions
- Small probability changes can dramatically affect long-term outcomes
- Monte Carlo simulations are powerful but computationally expensive
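The second point above can be made concrete with a small calculation. Assuming, purely for illustration, that a team has the same win probability in every knockout match and that matches are independent, its chance of winning several in a row is that probability raised to the number of rounds. The function name and the numbers are hypothetical.

```python
def title_probability(per_match_win_prob: float, rounds: int) -> float:
    """Chance of winning `rounds` consecutive independent matches."""
    return per_match_win_prob ** rounds


# A 5-point bump per match compounds over four knockout rounds
base = title_probability(0.60, 4)
bumped = title_probability(0.65, 4)
relative_gain = bumped / base - 1.0
```

Here a shift from 0.60 to 0.65 per match moves the four-round probability from about 0.13 to about 0.18, a relative increase of well over a third.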
What's next for RoadTo26
- Expanding RoadTo26 into an interactive platform where users can run and explore World Cup simulations in real time
- If the predicted champion turns out to be correct, it would be an encouraging sign, though only a single data point, that probabilistic models and data-driven approaches can say something useful about complex real-world events
Built With
- hex