Inspiration

We loved watching Messi win the last World Cup and can't wait for the next one. We wanted to predict the winner by combining World Cup and FIFA historical datasets with modern simulation techniques. We moved past "Paul the Octopus"-style win/loss predictions and created an engaging simulation game that captures the drama of football: the upsets and the stories.

What it does

This is a comprehensive tournament prediction engine inspired by the Elo rating system that chess uses for player matching and rankings.

  1. Simulation Modes: The engine simulates the entire 2026 FIFA World Cup according to the selected mode. The modes are:

    • "Heavyweights bully"
    • "Homecourt advantage"
    • "Upsets galore" (where chaos reigns and a new underdog world champion is born).
  2. Realistic Match Generation: Every match includes minute-by-minute goal events, possession stats correlated with Elo differences, and realistic foul/card distributions.

  3. Interactive Visualization: Users can explore the results through a clean Matplotlib tournament bracket or a fully gamified HTML/JS engine with click-to-advance functionality and engaging popups for lineups & player squads.
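
To make the mode idea concrete, here is a minimal sketch of how a mode could reshape match odds and produce minute-by-minute goal events. Everything named here is an assumption for illustration: the MODES table, its scale and host_bonus values, the 2.6 expected goals per match, and the simulate_match function are hypothetical, not the project's actual code.

```python
import random

# Hypothetical mode settings (illustrative values, not the project's).
# "scale" sharpens or flattens the Elo curve; "host_bonus" is added to a host's rating.
MODES = {
    "heavyweights_bully": {"scale": 250.0, "host_bonus": 0.0},
    "homecourt_advantage": {"scale": 400.0, "host_bonus": 100.0},
    "upsets_galore": {"scale": 800.0, "host_bonus": 0.0},
}


def win_probability(elo_a, elo_b, scale=400.0):
    """Elo expected score for team A; a larger scale flattens the curve."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / scale))


def simulate_match(elo_a, elo_b, mode, a_is_host=False, rng=None):
    """Return a list of (minute, team) goal events for one 90-minute match."""
    if rng is None:
        rng = random.Random()
    cfg = MODES[mode]
    if a_is_host:
        elo_a += cfg["host_bonus"]
    # Simplification: reuse the Elo expected score as team A's share of goals.
    share_a = win_probability(elo_a, elo_b, cfg["scale"])
    total_goals = 2.6  # assumed average goals per match
    events = []
    for minute in range(1, 91):
        if rng.random() < total_goals * share_a / 90:
            events.append((minute, "A"))
        if rng.random() < total_goals * (1 - share_a) / 90:
            events.append((minute, "B"))
    return events


if __name__ == "__main__":
    for minute, team in simulate_match(1800, 1500, "upsets_galore", rng=random.Random(7)):
        print(f"{minute}' goal for team {team}")
```

With these assumed values, a 300-point favourite gets a much smaller edge under "upsets_galore" than under "heavyweights_bully", which is one simple way the modes could produce different tournaments from the same ratings.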

How we built it

We built the core engine using Python and Pandas, processing over 47,000 historical international matches dating back to 1872 to build a custom Elo rating system.

  • The Engine: We implemented a custom, chess-inspired EloSimulator class with dynamic K-factors (adjusted for tournament importance) and home-field advantage calculations.

  • Data Pipeline: The architecture loads data, builds historical ratings for 200+ nations, and feeds into a 3-tier fallback system to generate player lineups. If 2022 World Cup squad data is missing, the system intelligently pulls from a historical goalscorers dataset to ensure 31/32 teams have real player names.

  • Visualization: We used Matplotlib for static, publication-ready brackets and injected custom HTML/CSS/JavaScript directly into the notebook to create the interactive "Game Engine" view. Yep, all of it was done inline within Hex.
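
For readers new to Elo, the rating update behind an engine like this can be sketched as follows. This is the standard Elo formula with a tournament-dependent K-factor and a home-field bonus. The K_FACTORS values, the function names, and the 400-point scale are assumptions for illustration, not the actual EloSimulator internals.

```python
# Hypothetical K-factors by match importance (illustrative values only).
K_FACTORS = {"friendly": 20, "qualifier": 40, "world_cup": 60}


def expected_score(rating_a, rating_b, home_advantage=0.0):
    """Expected score for team A; home_advantage is added to A's rating."""
    diff = rating_b - (rating_a + home_advantage)
    return 1.0 / (1.0 + 10 ** (diff / 400.0))


def update_ratings(rating_a, rating_b, score_a, tournament, home_advantage=0.0):
    """Return new (rating_a, rating_b) after one match.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    """
    k = K_FACTORS[tournament]
    delta = k * (score_a - expected_score(rating_a, rating_b, home_advantage))
    # Zero-sum update: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equal teams, A wins a World Cup match on neutral ground.
    print(update_ratings(1500, 1500, 1.0, "world_cup"))  # (1530.0, 1470.0)
```

Replaying every historical match in date order through update_ratings is one way to arrive at a current rating per nation.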

Challenges we ran into

  • Data Gaps: Finding current squad lists for all 32 teams was difficult. We had to engineer a 3-tier fallback architecture that prioritizes 2022 World Cup squads, then falls back to historical goalscorer data, ensuring we didn't end up with "Player 1 vs Player 2" scenarios.

  • Serialization Issues: Bridging the gap between the Python backend and the JavaScript frontend was tricky. We had to write a custom NumpyEncoder to handle int64/float64 types and ensure seamless JSON serialization for the game engine.

  • Encoding: Handling thousands of international player names led to severe UTF-8 errors. We had to standardize file encodings (using latin-1) to correctly render names like Müller and José without breaking the visualization.
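
The tiered lineup fallback from the Data Gaps item can be sketched roughly like this. The function name pick_lineup, the dictionary shapes, and the sample entries are hypothetical, and the third tier (generated placeholder names) is our assumption, since only the first two tiers are spelled out above.

```python
def pick_lineup(team, squads_2022, goalscorers, size=11):
    """Return (players, source) using the first tier that has data for the team."""
    # Tier 1: 2022 World Cup squad data.
    squad = squads_2022.get(team)
    if squad:
        return squad[:size], "2022_squad"
    # Tier 2: names from the historical goalscorers dataset.
    scorers = goalscorers.get(team)
    if scorers:
        return scorers[:size], "historical_goalscorers"
    # Tier 3 (assumed): placeholder names so the simulation can still run.
    return [f"{team} Player {i}" for i in range(1, size + 1)], "placeholder"


if __name__ == "__main__":
    # Tiny sample inputs for illustration only.
    squads_2022 = {"Argentina": ["Lionel Messi", "Emiliano Martínez"]}
    goalscorers = {"Italy": ["Roberto Baggio"]}
    for team in ("Argentina", "Italy", "Curaçao"):
        players, source = pick_lineup(team, squads_2022, goalscorers, size=2)
        print(team, source, players)
```

Returning the source label alongside the names makes it easy to count how many teams ended up on each tier.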

Accomplishments that we're proud of

  • The "Gamified" Notebook Experience: We are particularly proud of the Interactive HTML Game Engine. We pushed the boundaries of standard data science environments to build a fully responsive, dual-wing tournament bracket with CSS transitions and modal popups, all running entirely within the notebook.

  • Validating the Simulation Logic: We confirmed that our three simulation modes produce statistically distinct outcomes. Watching the "Upset Heavy" mode dismantle favorites while "House Wins" protected them showed that our underlying probability logic was sound.

  • Performance at Scale: We optimized the engine to process over 90,000 data points and execute 189 match predictions (across all three modes) in a matter of seconds.
