Inspiration
I built a system that uses historical data to predict outcomes, win probabilities, and performance metrics—similar to FiveThirtyEight's sports predictions but tailored for racing.
What it does
Pre-Race Statistics provides interactive tabs with:
- Driver Overview: Sortable driver cards with historical performance, track stats, and simulation projections
- Race Simulation: Monte Carlo predictions showing win probabilities, and finish position distributions The system runs 15,000 Monte Carlo simulations using Bayesian hierarchical modeling to predict race outcomes, stage finishes, and fastest lap winners before the green flag.
How we built it
Backend (Python/Django):
- Bayesian hierarchical model using PyMC with Plackett-Luce ranking for multi-head predictions
- Feature engineering with temporal decay and track similarity weighting, stored in Arrow format for efficient frontend consumption
Frontend (React/TypeScript):
- Tabbed interface with Arquero for client-side data processing and real-time simulation progress tracking
Challenges we ran into
- Model Complexity: Balancing sophistication with efficiency—reduced from 50k to 15k simulations while maintaining convergence
- Data Pipeline: Efficiently loading and processing large simulation datasets (15k simulations × 40 drivers) for frontend consumption
- Caching Strategy: Implementing robust cache invalidation based on model version, race entries, and simulation parameters
Accomplishments that we're proud of
- Predictive Accuracy: Bayesian model captures driver skill, team effects, track-specific performance, and momentum
Performance: Reduced simulation time by 3x (50k → 15k) while maintaining statistical precision
What we learned
Bayesian Modeling: Hierarchical models with proper priors can capture complex interactions while avoiding overfitting
Simulation Efficiency: Convergence analysis showed diminishing returns beyond 15k simulations, enabling faster iteration
Log in or sign up for Devpost to join the conversation.