AI-Powered Race Strategy Optimizer
Real-time pit strategy, lap-time prediction, and driver performance intelligence using TRD GR Cup data
Inspiration
Motorsports strategy is a high-pressure chess match at 200 km/h. Race engineers must juggle tire wear, fuel load, weather changes, traffic, and driver performance — all in real time. Most amateur and semi-pro racing series don’t have access to the kind of predictive tools used in Formula 1.
The inspiration was simple:
Why shouldn’t GR Cup engineers and drivers have the same level of predictive intelligence that elite motorsport teams use?
The TRD dataset offered a rare opportunity: real lap-level, weather, and telemetry data from multiple races and tracks. That opened the door to building an AI digital twin of the race, capable of predicting tire degradation, optimal pit windows, and even the performance drop-off of a driver over a stint.
What I Learned
Building this project taught me several core concepts in applied ML and motorsports analytics:
1. Lap-Time Modeling Is a Multi-Variable Problem
Lap time isn’t just a function of track and driver. It depends on:
- laps on tires
- stint progression
- track evolution
- temperature gradients
- driver consistency variance
- pit event timing
This turned the problem into a feature engineering challenge more than a modeling challenge.
2. Stint Segmentation Is the Backbone of Strategy
I learned how pit events create natural segments (“stints”) and how degradation curves can be modeled mathematically as:
$$ \text{LapTime}_{t} = \alpha + \beta t + \gamma e^{-\delta t} $$
This form helped approximate both linear and exponential degradation.
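As a rough illustration, the curve can be evaluated directly. The parameter values below are hypothetical and chosen only to show the shape; they are not fitted to the TRD data. With positive γ and δ, the exponential term fades over the first few laps and the linear βt term dominates the rest of the stint.

```python
import math


def lap_time_model(t, alpha, beta, gamma, delta):
    """Lap time (s) on lap t of a stint: alpha + beta*t + gamma*exp(-delta*t)."""
    return alpha + beta * t + gamma * math.exp(-delta * t)


# Hypothetical parameters for illustration only (not fitted values).
params = {"alpha": 98.0, "beta": 0.08, "gamma": 1.5, "delta": 0.6}

# Predicted lap times for the first 15 laps of a stint.
stint_curve = [lap_time_model(t, **params) for t in range(15)]
```

At t = 0 the prediction is simply α + γ, which gives a quick sanity check on any fitted parameters.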
3. Telemetry Aggregation Matters More Than Resolution
Instead of raw 1000 Hz telemetry, lap-level aggregated stats (mean/max/std) gave clean signals like:
- steering smoothness
- brake consistency
- stability before corners
These were far more ML-friendly.
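A minimal sketch of that kind of per-lap aggregation, using only the standard library. The record layout (a `lap` key plus one key per channel) and the `steering_angle` channel name are assumptions made for the example, not the dataset's actual schema.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_by_lap(samples, channel):
    """Collapse raw samples into per-lap mean/max/min/std for one channel."""
    by_lap = defaultdict(list)
    for sample in samples:
        by_lap[sample["lap"]].append(sample[channel])
    return {
        lap: {
            "mean": mean(values),
            "max": max(values),
            "min": min(values),
            "std": pstdev(values),  # population standard deviation
        }
        for lap, values in sorted(by_lap.items())
    }


# Hypothetical steering-angle samples (degrees) for two laps.
samples = [
    {"lap": 1, "steering_angle": 2.0},
    {"lap": 1, "steering_angle": 4.0},
    {"lap": 2, "steering_angle": -1.0},
    {"lap": 2, "steering_angle": 1.0},
    {"lap": 2, "steering_angle": 0.0},
]
lap_stats = aggregate_by_lap(samples, "steering_angle")
```

A low per-lap `std` on a channel like steering angle is one plausible proxy for smoothness; other definitions are equally possible.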
4. Weather Alone Doesn’t Explain Pace — But Trends Do
I learned that absolute temperature isn’t as useful as the rate of change:
$$ \Delta T = T_{t} - T_{t-3} $$
This became a key predictive feature.
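The feature is a plain lagged difference. The sketch below assumes one temperature reading per lap, so the index t counts laps; the sample values are made up.

```python
def temp_delta(temps, lag=3):
    """Return T_t - T_{t-lag} for each index; None where no lagged value exists."""
    return [
        None if i < lag else temps[i] - temps[i - lag]
        for i in range(len(temps))
    ]


# Hypothetical per-lap track temperatures in degrees Celsius.
track_temp = [30.0, 30.5, 31.0, 32.0, 32.5, 32.0]
delta_t = temp_delta(track_temp)
# delta_t == [None, None, None, 2.0, 2.0, 1.0]
```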
How I Built It
1. ETL Pipeline: Cleaning and Standardizing the Dataset
I built a full ETL pipeline that converted raw CSVs into compact Parquet files. Steps included:
- Parsing lap times: converting MM:SS.mmm → seconds
- Normalizing track and race naming
- Detecting pit stops using speed drops, lap time spikes, and sector deltas
- Assigning stint_id and computing laps_on_tires
- Aggregating telemetry (mean/max/min/std) per lap
- Aligning weather in 2-minute windows
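The lap-time parsing step from the list above could look roughly like this. The function name is hypothetical, and the sketch assumes every value follows the `MM:SS.mmm` pattern with exactly one colon.

```python
def lap_time_to_seconds(text):
    """Convert a lap time such as '1:42.357' (MM:SS.mmm) into seconds."""
    minutes, seconds = text.strip().split(":")
    return int(minutes) * 60 + float(seconds)


example = lap_time_to_seconds("1:42.357")  # about 102.357 seconds
```

Malformed or empty values would raise a `ValueError` here, so a real pipeline would need to decide whether to drop or flag those rows.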
Final outputs:
- race_analysis.parquet
- weather.parquet
- telemetry_sample.parquet
All clean, consistent, and ML-ready.
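For the stint assignment step, here is a minimal sketch of how `stint_id` and `laps_on_tires` can be derived once pit laps are known. It assumes one boolean flag per lap and that every stop includes a tire change, which may not hold for every stop in the data.

```python
def assign_stints(pit_flags):
    """Return (stint_id, laps_on_tires) for each lap.

    pit_flags holds one boolean per lap; True means the car pitted at
    the end of that lap.
    """
    rows = []
    stint_id, laps_on_tires = 1, 0
    for pitted in pit_flags:
        laps_on_tires += 1
        rows.append((stint_id, laps_on_tires))
        if pitted:
            # Assumption: the stop resets tire age, so the next lap
            # starts a new stint on fresh tires.
            stint_id += 1
            laps_on_tires = 0
    return rows


stints = assign_stints([False, False, True, False, False])
# stints == [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
```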
2. Feature Engineering
Added high-signal features such as:
- rolling 3-lap pace
- stint-relative lap delta
- degradation slope
- smoothed weather changes
- driver consistency index
- pit time normalization
These features are the core inputs to the predictive models.
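Two of these features can be sketched in a few lines. The definitions are assumptions for illustration: rolling pace is taken as the mean of the current and previous two laps, and stint-relative delta as the gap to the fastest lap of the same stint.

```python
from statistics import mean


def rolling_pace(lap_times, window=3):
    """Mean of the last `window` laps up to and including each lap."""
    return [
        mean(lap_times[max(0, i - window + 1): i + 1])
        for i in range(len(lap_times))
    ]


def stint_relative_delta(lap_times):
    """Gap (s) between each lap and the fastest lap of the same stint."""
    best = min(lap_times)
    return [t - best for t in lap_times]


# Hypothetical lap times (s) for a single stint.
stint = [101.0, 100.0, 100.5, 101.5]
pace = rolling_pace(stint)
delta = stint_relative_delta(stint)
# delta == [1.0, 0.0, 0.5, 1.5]
```

The first two rolling values use a shorter window because fewer than three laps are available at that point.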
3. Predictive Models
I built three models:
A. Lap Time Prediction Model
Uses LightGBM with features:
- laps_on_tires
- stint_id
- weather
- sector_times
- driver consistency
Predicts next-lap time in real time.
B. Tire Degradation Model
Fits a degradation curve and estimates:
$$ \text{Remaining Optimal Laps} = \frac{\Delta_{\text{threshold}}}{\beta} $$
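A small worked sketch of this estimate, reading β as the linear degradation slope in seconds per lap and Δ_threshold as the extra time loss that is still acceptable. Both readings are assumptions, and the slope here comes from an ordinary least-squares fit over the stint's lap times rather than the full curve fit.

```python
def fit_slope(lap_times):
    """Least-squares slope (s per lap) of lap time against lap index."""
    n = len(lap_times)
    x_mean = (n - 1) / 2
    y_mean = sum(lap_times) / n
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(lap_times))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


def remaining_optimal_laps(lap_times, threshold_s):
    """Laps until the accumulated pace loss reaches threshold_s."""
    beta = fit_slope(lap_times)
    if beta <= 0:
        # No measurable degradation in this stint.
        return float("inf")
    return threshold_s / beta


# Hypothetical stint losing 0.1 s per lap, with 0.5 s of loss allowed.
laps_left = remaining_optimal_laps([100.0, 100.1, 100.2, 100.3, 100.4], 0.5)
# laps_left is approximately 5
```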
C. Pit Window Optimization Engine
Runs 1000+ strategy simulations using Monte Carlo methods:
- pit on lap t
- simulate fuel load effects
- simulate traffic delays
- simulate degradation
Chooses the pit window with the best projected total race time.
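The loop can be sketched as a toy single-stop simulation. Every number below (base lap time, degradation rate, fuel effect, traffic noise, pit loss) is a made-up placeholder, and the race model is far simpler than a real one; the point is only the structure: simulate each candidate pit lap many times, then pick the lowest mean race time.

```python
import random


def simulate_race(pit_lap, total_laps, rng, base_lap=100.0,
                  deg_per_lap=0.08, fuel_gain_per_lap=0.03, pit_loss=25.0):
    """Total race time (s) for one simulated race with a single stop."""
    total, tire_age = 0.0, 0
    for lap in range(1, total_laps + 1):
        lap_time = base_lap
        lap_time += deg_per_lap * tire_age          # tire degradation
        lap_time -= fuel_gain_per_lap * (lap - 1)   # car lightens as fuel burns
        lap_time += max(0.0, rng.gauss(0.0, 0.3))   # random traffic delay
        total += lap_time
        tire_age += 1
        if lap == pit_lap:
            total += pit_loss   # time lost in the pit lane
            tire_age = 0        # fresh tires for the next stint
    return total


def best_pit_lap(total_laps=30, sims_per_candidate=50, seed=42):
    """Return the pit lap with the lowest mean simulated race time."""
    rng = random.Random(seed)
    mean_time = {}
    for pit_lap in range(2, total_laps):
        runs = [simulate_race(pit_lap, total_laps, rng)
                for _ in range(sims_per_candidate)]
        mean_time[pit_lap] = sum(runs) / len(runs)
    return min(mean_time, key=mean_time.get), mean_time


best_lap, mean_times = best_pit_lap()  # 28 candidates x 50 runs = 1400 sims
```

With purely linear degradation the optimum lands near mid-race, which makes this a useful sanity check before adding richer effects.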
4. Frontend Dashboard
Engineers and drivers get a race-control-style dashboard:
- real-time predicted vs actual lap times
- tire wear curve
- recommended pit window
- stint comparison
- driver consistency analysis
- undercut/overcut probability
Challenges I Faced
1. Pit Detection Was Messy
Real lap times fluctuate, so pit-lap detection needed:
- speed heuristics
- sector anomaly detection
- lap-time outlier suppression
- merging multiple heuristics
It took multiple passes to get all 27 pit stops correctly identified.
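One way to merge heuristics like these is a simple vote: each signal flags suspicious laps, and a lap counts as a pit lap only when enough signals agree. The thresholds, the input columns (`min_speeds`, `last_sector_times`), and the two-vote rule below are all hypothetical, not the values used in the project.

```python
from statistics import median


def flag_above_median(values, ratio):
    """Flag entries that exceed `ratio` times the median of the series."""
    m = median(values)
    return [v > ratio * m for v in values]


def detect_pit_laps(lap_times, min_speeds, last_sector_times,
                    stop_speed_kph=5.0, votes_needed=2):
    """Return 1-based lap numbers flagged by at least `votes_needed` signals."""
    votes = zip(
        flag_above_median(lap_times, 1.15),          # lap-time spike
        [s < stop_speed_kph for s in min_speeds],    # car nearly stationary
        flag_above_median(last_sector_times, 1.25),  # slow final sector
    )
    return [lap for lap, v in enumerate(votes, start=1)
            if sum(v) >= votes_needed]


# Hypothetical data: lap 3 is a pit lap, lap 5 is only a slow lap in traffic.
pit_laps = detect_pit_laps(
    lap_times=[100.0, 101.0, 130.0, 100.5, 118.0],
    min_speeds=[52.0, 50.0, 0.0, 51.0, 48.0],
    last_sector_times=[30.0, 30.2, 55.0, 30.1, 30.5],
)
# pit_laps == [3]
```

Requiring agreement keeps a single noisy signal, such as a slow lap in traffic, from being counted as a stop.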
2. Weather Data Was Noisy
Missing track temperatures forced me to:
- interpolate
- estimate using air temp gradients
- derive synthetic features
Surprisingly, these worked well.
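The interpolation step, for example, can be as simple as a linear fill between the nearest known readings. This sketch assumes evenly spaced samples with missing values stored as `None`; gaps at either edge copy the nearest reading.

```python
def interpolate_gaps(values):
    """Linearly fill None gaps; edge gaps copy the nearest known reading."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)  # nothing to interpolate from
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Hypothetical track temperatures (degrees Celsius) with missing readings.
track_temp = interpolate_gaps([30.0, None, 32.0, None, None])
# track_temp == [30.0, 31.0, 32.0, 32.0, 32.0]
```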
3. Telemetry Files Were Huge (17 GB Raw)
I couldn’t process the raw data in full within a reasonable time, so I:
- sampled
- aggregated
- compressed
This still kept the critical behavioral signals intact.
Built With
- docker
- fastapi
- google-cloud-run
- lightgbm
- pandas
- parquet
- python
- react + typescript
- recharts
- scikit-learn
- tailwind-css
- xgboost