AI-Powered Race Strategy Optimizer

Real-time pit strategy, lap-time prediction, and driver performance intelligence using TRD GR Cup data

Inspiration

Motorsports strategy is a high-pressure chess match at 200 km/h. Race engineers must juggle tire wear, fuel load, weather changes, traffic, and driver performance — all in real time. Most amateur and semi-pro racing series don’t have access to the kind of predictive tools used in Formula 1.

The inspiration was simple:

Why shouldn’t GR Cup engineers and drivers have the same level of predictive intelligence that elite motorsport teams use?

The TRD dataset offered a rare opportunity: real lap-level, weather, and telemetry data from multiple races and tracks. That opened the door to building an AI digital twin of the race, capable of predicting tire degradation, optimal pit windows, and even the performance drop-off of a driver over a stint.

What I Learned

Building this project taught me several core concepts in applied ML and motorsports analytics:

1. Lap-Time Modeling Is a Multi-Variable Problem

Lap time isn’t just a function of track and driver. It depends on:

laps on tires
stint progression
track evolution
temperature gradients
driver consistency variance
pit event timing

This turned the problem into a feature engineering challenge more than a modeling challenge.

2. Stint Segmentation Is the Backbone of Strategy

I learned how pit events create natural segments (“stints”) and how degradation curves can be modeled mathematically as:

$$ \text{LapTime}_{t} = \alpha + \beta t + \gamma e^{-\delta t} $$

This form helped approximate both linear and exponential degradation.
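
To make the shape of this curve concrete, here is a minimal sketch that simply evaluates it. The function name and every coefficient value are illustrative assumptions, not fitted values from the GR Cup data.

```python
import math

def lap_time_model(t, alpha, beta, gamma, delta):
    """Evaluate alpha + beta*t + gamma*exp(-delta*t) for lap t of a stint."""
    return alpha + beta * t + gamma * math.exp(-delta * t)

# Illustrative coefficients only (assumed, not fitted to any real stint).
params = dict(alpha=90.0, beta=0.05, gamma=1.5, delta=0.5)

for t in (0, 5, 20):
    print(t, round(lap_time_model(t, **params), 3))
```

With positive gamma and delta, the exponential term fades within the first few laps, after which the linear term beta * t dominates the trend.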

3. Telemetry Aggregation Matters More Than Resolution

Instead of raw 1000 Hz telemetry, lap-level aggregated stats (mean/max/std) gave clean signals like:

steering smoothness
brake consistency
stability before corners

These were far more ML-friendly.
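
As a rough illustration of this kind of lap-level aggregation, the sketch below collapses raw samples into one row of statistics per lap. It uses plain Python so it runs without dependencies; the field names (`lap`, `steering_angle`) and the sample values are hypothetical, and the real pipeline may compute its statistics differently.

```python
from collections import defaultdict
from statistics import mean, pstdev

def aggregate_per_lap(samples, channel):
    """Collapse raw telemetry samples into mean/max/min/std per lap."""
    by_lap = defaultdict(list)
    for sample in samples:
        by_lap[sample["lap"]].append(sample[channel])
    return {
        lap: {
            "mean": mean(values),
            "max": max(values),
            "min": min(values),
            "std": pstdev(values),  # population standard deviation
        }
        for lap, values in sorted(by_lap.items())
    }

# Hypothetical samples; real telemetry would have many more rows per lap.
samples = [
    {"lap": 1, "steering_angle": 2.0},
    {"lap": 1, "steering_angle": 4.0},
    {"lap": 2, "steering_angle": -1.0},
    {"lap": 2, "steering_angle": 1.0},
]
stats = aggregate_per_lap(samples, "steering_angle")
print(stats[1])
```

A low per-lap standard deviation on a channel such as steering angle is one possible proxy for smoothness.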

4. Weather Alone Doesn’t Explain Pace — But Trends Do

I learned that absolute temperature isn’t as useful as the rate of change:

$$ \Delta T = T_{t} - T_{t-3} $$

This became a key predictive feature.
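
A minimal sketch of this feature, assuming one temperature reading per time step (the function name and the sample readings are made up for illustration):

```python
def temp_delta(temps, lag=3):
    """Return T_t - T_{t-lag} per step; None where there is not enough history."""
    return [
        None if i < lag else temps[i] - temps[i - lag]
        for i in range(len(temps))
    ]

track_temps = [30.0, 30.5, 31.0, 32.0, 31.5]  # hypothetical readings
print(temp_delta(track_temps))  # [None, None, None, 2.0, 1.0]
```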

How I Built It

1. ETL Pipeline: Cleaning and Standardizing the Dataset

I built a full ETL pipeline that converted raw CSVs into compact Parquet files. Steps included:

Parsing lap times: converting MM:SS.mmm → seconds
Normalizing track and race naming
Detecting pit stops using speed drops, lap time spikes, and sector deltas
Assigning stint_id and computing laps_on_tires
Aggregating telemetry (mean/max/min/std) per lap
Aligning weather in 2-minute windows
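
Two of the steps above, lap-time parsing and stint assignment, can be sketched in a few lines. This is an illustration rather than the actual pipeline code: the function names are hypothetical, and it assumes the pit lap itself closes a stint so that the following lap starts a new one with laps_on_tires = 1.

```python
def parse_lap_time(text):
    """Convert a 'MM:SS.mmm' string into seconds."""
    minutes, seconds = text.strip().split(":")
    return int(minutes) * 60 + float(seconds)

def assign_stints(pit_flags):
    """Map one is-pit-lap flag per lap to (stint_id, laps_on_tires) per lap."""
    rows = []
    stint_id, laps_on_tires = 1, 0
    for is_pit in pit_flags:
        laps_on_tires += 1
        rows.append((stint_id, laps_on_tires))
        if is_pit:
            # Assumption: fresh tires are fitted at the end of the pit lap.
            stint_id += 1
            laps_on_tires = 0
    return rows

print(parse_lap_time("1:32.500"))
print(assign_stints([False, False, True, False, False]))
```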

Final outputs:

race_analysis.parquet
weather.parquet
telemetry_sample.parquet

All clean, consistent, and ML-ready.

2. Feature Engineering

Added high-signal features such as:

rolling 3-lap pace
stint-relative lap delta
degradation slope
smoothed weather changes
driver consistency index
pit time normalization

These features are the core inputs to the predictive models.
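
Two of these features are simple enough to sketch directly. The definitions here are assumptions for illustration: the rolling pace is a trailing mean over the last three laps, and the stint-relative delta is each lap's gap to the best lap of the same stint.

```python
def rolling_pace(lap_times, window=3):
    """Trailing mean over the last `window` laps; None until enough laps exist."""
    out = []
    for i in range(len(lap_times)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = lap_times[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

def stint_relative_delta(stint_lap_times):
    """Gap of each lap to the fastest lap in the same stint (assumed definition)."""
    best = min(stint_lap_times)
    return [t - best for t in stint_lap_times]

print(rolling_pace([90.0, 91.0, 92.0, 93.0]))
print(stint_relative_delta([90.5, 90.0, 91.0]))
```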

3. Predictive Models

I built three models:

A. Lap Time Prediction Model

Uses LightGBM with features:

laps_on_tires
stint_id
weather
sector_times
driver consistency

Predicts next-lap time in real time.

B. Tire Degradation Model

Fits a degradation curve and estimates:

$$ \text{Remaining Optimal Laps} = \frac{\Delta_{\text{threshold}}}{\beta} $$
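
A hedged sketch of that estimate: it takes beta to be the least-squares slope of lap time against laps on tires within a stint, and treats the threshold as a time-loss budget in seconds per lap. Both function names and the sample lap times are hypothetical.

```python
def degradation_slope(lap_times):
    """Least-squares slope (seconds per lap) of lap time vs. laps on tires."""
    n = len(lap_times)
    xs = range(1, n + 1)
    x_mean = sum(xs) / n
    y_mean = sum(lap_times) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, lap_times))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

def remaining_optimal_laps(delta_threshold, beta):
    """Laps until the time-loss budget is used up, assuming linear degradation."""
    if beta <= 0:
        return float("inf")  # no measurable degradation in this stint
    return delta_threshold / beta

stint = [90.0, 90.5, 91.0, 91.5]  # made-up stint losing 0.5 s per lap
beta = degradation_slope(stint)
print(beta, remaining_optimal_laps(2.0, beta))
```

The guard for non-positive slopes avoids a division by zero when a stint shows no degradation or is still getting faster.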

C. Pit Window Optimization Engine

Runs 1000+ strategy simulations using Monte Carlo methods:

pit on lap t
simulate fuel load effects
simulate traffic delays
simulate degradation

Chooses the pit window with the best projected total race time.
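
The loop above can be sketched as a toy Monte Carlo search. Everything numeric here is an assumption for illustration (base pace, degradation rate, fuel effect, pit loss, traffic probability), the race is limited to a single stop, and the real engine may be structured quite differently.

```python
import random

def simulate_race(pit_lap, total_laps, rng, base=90.0, deg_per_lap=0.3,
                  fuel_gain_per_lap=0.03, pit_loss=25.0,
                  traffic_prob=0.2, traffic_max=2.0):
    """Return the total time of one simulated race with a single stop."""
    total = 0.0
    laps_on_tires = 0
    for lap in range(1, total_laps + 1):
        laps_on_tires += 1
        # Tires get slower with age; the car gets faster as fuel burns off.
        lap_time = base + deg_per_lap * laps_on_tires - fuel_gain_per_lap * lap
        lap_time += rng.gauss(0.0, 0.2)  # lap-to-lap driver variation
        total += lap_time
        if lap == pit_lap:
            total += pit_loss
            if rng.random() < traffic_prob:
                total += rng.uniform(0.0, traffic_max)  # rejoin in traffic
            laps_on_tires = 0
    return total

def best_pit_lap(total_laps, n_sims=1000, seed=0):
    """Average n_sims races per candidate pit lap and pick the fastest."""
    rng = random.Random(seed)
    mean_time = {}
    for pit_lap in range(1, total_laps):
        runs = [simulate_race(pit_lap, total_laps, rng) for _ in range(n_sims)]
        mean_time[pit_lap] = sum(runs) / n_sims
    return min(mean_time, key=mean_time.get), mean_time

best, mean_time = best_pit_lap(total_laps=20)
print("best pit lap:", best)
```

With purely linear degradation the best single stop sits near mid-race, which makes a handy sanity check before adding more realistic effects.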

4. Frontend Dashboard

Engineers and drivers get a race-control-style dashboard:

real-time predicted vs actual lap times
tire wear curve
recommended pit window
stint comparison
driver consistency analysis
undercut/overcut probability

Challenges I Faced

1. Pit Detection Was Messy

Real lap times fluctuate, so pit-lap detection needed:

speed heuristics
sector anomaly detection
lap-time outlier suppression
merging multiple heuristics

It took multiple passes to get all 27 pit stops correctly identified.
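
One way to merge several noisy heuristics is a simple vote, sketched below. The field names, thresholds, and voting rule are all hypothetical and chosen only to illustrate the idea; they are not the values used in the project.

```python
from statistics import median

def detect_pit_laps(laps, min_votes=2, spike_ratio=1.15, speed_floor=60.0):
    """Flag a lap as a pit lap when at least `min_votes` heuristics agree."""
    med_time = median(lap["lap_time"] for lap in laps)
    med_s3 = median(lap["sector3"] for lap in laps)
    flagged = []
    for lap in laps:
        votes = 0
        votes += lap["lap_time"] > spike_ratio * med_time  # lap-time spike
        votes += lap["min_speed"] < speed_floor            # speed drop
        votes += lap["sector3"] > spike_ratio * med_s3     # sector anomaly
        if votes >= min_votes:
            flagged.append(lap["lap"])
    return flagged

# Made-up laps: lap 3 is a stop, lap 4 is merely slow at one point.
laps = [
    {"lap": 1, "lap_time": 92.0, "min_speed": 70.0, "sector3": 30.0},
    {"lap": 2, "lap_time": 92.4, "min_speed": 71.0, "sector3": 30.2},
    {"lap": 3, "lap_time": 118.0, "min_speed": 40.0, "sector3": 52.0},
    {"lap": 4, "lap_time": 97.0, "min_speed": 45.0, "sector3": 30.5},
    {"lap": 5, "lap_time": 92.1, "min_speed": 69.0, "sector3": 30.1},
]
print(detect_pit_laps(laps))
```

Requiring agreement between heuristics keeps a single slow corner or a yellow-flag lap from being counted as a stop.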

2. Weather Data Was Noisy

Missing track temperatures forced me to:

interpolate
estimate using air temp gradients
derive synthetic features

Surprisingly, these worked well.
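
For the interpolation step, a minimal sketch of linear gap filling could look like this. It assumes evenly spaced readings with missing values stored as None, and the function name is hypothetical.

```python
def interpolate_gaps(values):
    """Linearly fill None gaps between known neighbours; edge gaps stay None."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (out[b] - out[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = out[a] + step * (i - a)
    return out

track_temp = [30.0, None, None, 33.0, None]  # hypothetical readings
print(interpolate_gaps(track_temp))  # [30.0, 31.0, 32.0, 33.0, None]
```

Leaving the trailing gap empty is deliberate: extrapolating past the last known reading would need a separate estimate, such as one based on the air temperature trend.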

3. Telemetry Files Were Huge (17 GB Raw)

I couldn’t process the raw data in full within a reasonable time, so I:

sampled
aggregated
compressed

This still kept the critical behavioral signals intact.

Built With

docker
fastapi
google-cloud-run
lightgbm
pandas
parquet
python
react + typescript
recharts
scikit-learn
tailwind-css
xgboost

Updates

Venkatesh K started this project — Nov 24, 2025 03:54 PM EST
