AI-Powered Race Strategy Optimizer

Real-time pit strategy, lap-time prediction, and driver performance intelligence using TRD GR Cup data

Inspiration

Motorsports strategy is a high-pressure chess match at 200 km/h. Race engineers must juggle tire wear, fuel load, weather changes, traffic, and driver performance — all in real time. Most amateur and semi-pro racing series don’t have access to the kind of predictive tools used in Formula 1.

The inspiration was simple:

Why shouldn’t GR Cup engineers and drivers have the same level of predictive intelligence that elite motorsport teams use?

The TRD dataset offered a rare opportunity: real lap-level, weather, and telemetry data from multiple races and tracks. That opened the door to building an AI digital twin of the race, capable of predicting tire degradation, optimal pit windows, and even the performance drop-off of a driver over a stint.

What I Learned

Building this project taught me several core concepts in applied ML and motorsports analytics:

1. Lap-Time Modeling Is a Multi-Variable Problem

Lap time isn’t just a function of track and driver. It depends on:

  • laps on tires

  • stint progression

  • track evolution

  • temperature gradients

  • driver consistency variance

  • pit event timing

This turned the problem into a feature engineering challenge more than a modeling challenge.

2. Stint Segmentation Is the Backbone of Strategy

I learned how pit events create natural segments (“stints”) and how degradation curves can be modeled mathematically as:

$$ \text{LapTime}_{t} = \alpha + \beta t + \gamma e^{-\delta t} $$

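Here $t$ is the lap count within the stint, and $\alpha$, $\beta$, $\gamma$, and $\delta$ are fitted coefficients. This form helped approximate both the linear and the exponential components of degradation.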
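To make the shape of this curve concrete, here is a minimal sketch that simply evaluates it. The coefficient values are hypothetical and chosen only for illustration; they are not fitted values from the GR Cup data.

```python
import math

def stint_lap_time(t, alpha, beta, gamma, delta):
    """Evaluate LapTime_t = alpha + beta * t + gamma * exp(-delta * t)."""
    return alpha + beta * t + gamma * math.exp(-delta * t)

# Hypothetical coefficients (seconds), for illustration only.
params = {"alpha": 90.0, "beta": 0.08, "gamma": 1.5, "delta": 0.9}

# Predicted lap times for the first 15 laps of a stint.
curve = [stint_lap_time(t, **params) for t in range(15)]
```

With positive $\gamma$ and $\delta$ the exponential term fades after the first few laps, so the linear term $\beta t$ dominates later in the stint.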

3. Telemetry Aggregation Matters More Than Resolution

Instead of raw 1000 Hz telemetry, lap-level aggregated stats (mean/max/std) gave clean signals like:

  • steering smoothness

  • brake consistency

  • stability before corners

These were far more ML-friendly.
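A rough sketch of that lap-level aggregation, using only the Python standard library. The record layout and the `steering_angle` channel name are assumptions for illustration, not the dataset's actual schema.

```python
from collections import defaultdict
from statistics import mean, pstdev

def aggregate_per_lap(samples, channel):
    """Group raw samples by lap and summarise one channel per lap."""
    by_lap = defaultdict(list)
    for sample in samples:
        by_lap[sample["lap"]].append(sample[channel])
    return {
        lap: {
            "mean": mean(values),
            "max": max(values),
            "min": min(values),
            "std": pstdev(values),  # population standard deviation
        }
        for lap, values in sorted(by_lap.items())
    }

# Made-up samples; a real lap would contain thousands of rows.
samples = [
    {"lap": 1, "steering_angle": -2.0},
    {"lap": 1, "steering_angle": 2.0},
    {"lap": 2, "steering_angle": 5.0},
    {"lap": 2, "steering_angle": 5.0},
]
stats = aggregate_per_lap(samples, "steering_angle")
```

A low per-lap standard deviation on a channel like steering angle is one possible proxy for smoothness.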

4. Weather Alone Doesn’t Explain Pace — But Trends Do

I learned that absolute temperature isn’t as useful as the rate of change:

$$ \Delta T = T_{t} - T_{t-3} $$

This became a key predictive feature.
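The difference above can be computed in a few lines. This sketch assumes the temperatures arrive as an evenly spaced list; whether $t$ indexes laps or weather samples is not stated here, so the function works on list positions.

```python
def temp_trend(temps, lag=3):
    """Return T_t - T_{t-lag} per position; None where history is too short."""
    return [
        None if i < lag else temps[i] - temps[i - lag]
        for i in range(len(temps))
    ]

track_temps = [30.0, 30.5, 31.0, 32.0, 33.5]  # made-up readings in deg C
trend = temp_trend(track_temps)  # [None, None, None, 2.0, 3.0]
```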

How I Built It

1. ETL Pipeline: Cleaning and Standardizing the Dataset

I built a full ETL pipeline that converted raw CSVs into compact Parquet files. Steps included:

  • Parsing lap times: converting MM:SS.mmm → seconds

  • Normalizing track and race naming

  • Detecting pit stops using speed drops, lap time spikes, and sector deltas

  • Assigning stint_id and computing laps_on_tires

  • Aggregating telemetry (mean/max/min/std) per lap

  • Aligning weather in 2-minute windows
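Two of the steps above, lap-time parsing and stint assignment, can be sketched as follows. The function names are hypothetical, and the convention that the pit lap counts as the last lap of its stint is an assumption.

```python
def lap_time_to_seconds(text):
    """Convert an 'MM:SS.mmm' string to seconds as a float."""
    minutes, seconds = text.strip().split(":")
    return int(minutes) * 60 + float(seconds)

def assign_stints(pit_flags):
    """Map one boolean per lap (True = pitted at the end of that lap)
    to a (stint_id, laps_on_tires) pair per lap."""
    result = []
    stint_id, laps_on_tires = 1, 0
    for pitted in pit_flags:
        laps_on_tires += 1
        result.append((stint_id, laps_on_tires))
        if pitted:
            # Assumption: fresh tires, so the counter restarts next lap.
            stint_id += 1
            laps_on_tires = 0
    return result

seconds = lap_time_to_seconds("1:32.456")  # about 92.456
stints = assign_stints([False, False, True, False, False])
```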

Final outputs:

  • race_analysis.parquet

  • weather.parquet

  • telemetry_sample.parquet

All clean, consistent, and ML-ready.

2. Feature Engineering

Added high-signal features such as:

  • rolling 3-lap pace

  • stint-relative lap delta

  • degradation slope

  • smoothed weather changes

  • driver consistency index

  • pit time normalization

These features are the core inputs to the predictive models.
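As an illustration, two of these features might be computed like this. The exact definitions used in the project are not spelled out, so the trailing-window mean and the "delta to best lap in the stint" below are assumptions.

```python
def rolling_pace(lap_times, window=3):
    """Trailing mean over the last `window` laps (shorter at stint start)."""
    pace = []
    for i in range(len(lap_times)):
        recent = lap_times[max(0, i - window + 1): i + 1]
        pace.append(sum(recent) / len(recent))
    return pace

def stint_relative_delta(lap_times):
    """Each lap's time minus the fastest lap of the same stint."""
    best = min(lap_times)
    return [t - best for t in lap_times]

stint = [92.0, 91.0, 93.0, 94.0]  # made-up lap times in seconds
pace = rolling_pace(stint)
delta = stint_relative_delta(stint)  # [1.0, 0.0, 2.0, 3.0]
```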

3. Predictive Models

I built three models:

A. Lap Time Prediction Model

Uses LightGBM with features:

  • laps_on_tires

  • stint_id

  • weather

  • sector_times

  • driver consistency

Predicts next-lap time in real time.

B. Tire Degradation Model

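Fits a degradation curve and estimates how many laps remain before the tires lose more than an acceptable amount of pace, $\Delta_{\text{threshold}}$ (in seconds), given the per-lap degradation slope $\beta$: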

$$ \text{Remaining Optimal Laps} = \frac{\Delta_{\text{threshold}}}{\beta} $$
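A minimal sketch of this estimate, assuming $\beta$ is taken as the ordinary least-squares slope of lap time against lap index within a stint and $\Delta_{\text{threshold}}$ is the pace loss still considered acceptable. Both helper names are hypothetical.

```python
def degradation_slope(lap_times):
    """Least-squares slope (seconds per lap) of lap time vs. lap index."""
    n = len(lap_times)
    x_mean = (n - 1) / 2
    y_mean = sum(lap_times) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(lap_times))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def remaining_optimal_laps(threshold_s, beta):
    """Laps until linear degradation alone costs `threshold_s` seconds."""
    if beta <= 0:
        return float("inf")  # no measurable degradation
    return threshold_s / beta

stint = [90.0, 90.1, 90.2, 90.3, 90.4]  # made-up lap times in seconds
beta = degradation_slope(stint)          # roughly 0.1 s per lap
laps_left = remaining_optimal_laps(1.5, beta)
```

The slope needs at least two laps in the stint; shorter stints would have to be skipped or given a default.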

C. Pit Window Optimization Engine

Runs 1000+ strategy simulations using Monte Carlo methods:

  • pit on lap t

  • simulate fuel load effects

  • simulate traffic delays

  • simulate degradation

Chooses the pit window with the best projected total race time.
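The idea can be sketched with a deliberately simplified one-stop simulation. Everything numeric here (base lap time, degradation rate, pit loss, traffic noise) is a made-up assumption, and fuel effects are left out to keep the example short. This is not the engine's actual model.

```python
import random

def simulate_race(pit_lap, total_laps, rng,
                  base=90.0, deg=0.08, pit_loss=25.0, traffic_sd=0.3):
    """Total time (s) of one noisy race with a stop at the end of `pit_lap`."""
    total, tire_age = 0.0, 0
    for lap in range(1, total_laps + 1):
        # Linear degradation plus a non-negative random traffic delay.
        total += base + deg * tire_age + abs(rng.gauss(0.0, traffic_sd))
        tire_age += 1
        if lap == pit_lap:
            total += pit_loss
            tire_age = 0  # fresh tires after the stop
    return total

def best_pit_lap(total_laps, n_sims=1000, seed=0):
    """Return the pit lap with the lowest mean simulated race time."""
    mean_time = {}
    for pit_lap in range(1, total_laps):
        # Re-seeding gives every candidate the same traffic noise
        # (common random numbers), so only the strategy differs.
        rng = random.Random(seed)
        runs = [simulate_race(pit_lap, total_laps, rng) for _ in range(n_sims)]
        mean_time[pit_lap] = sum(runs) / n_sims
    best = min(mean_time, key=mean_time.get)
    return best, mean_time

best, times = best_pit_lap(total_laps=20, n_sims=200)
```

With purely linear degradation the best single stop lands at mid-distance. Adding fuel load, traffic that depends on track position, or non-linear wear is what would move the window in a more realistic model.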

4. Frontend Dashboard

Engineers and drivers get a race-control-style dashboard:

  • real-time predicted vs actual lap times

  • tire wear curve

  • recommended pit window

  • stint comparison

  • driver consistency analysis

  • undercut/overcut probability

Challenges I Faced

1. Pit Detection Was Messy

Real lap times fluctuate, so pit-lap detection needed:

  • speed heuristics

  • sector anomaly detection

  • lap-time outlier suppression

  • merging multiple heuristics

It took multiple passes to get all 27 pit stops correctly identified.
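One way to merge several heuristics is a simple vote, sketched below. The field names, the thresholds, and the assumption that the pit lane falls in sector 3 are all hypothetical.

```python
from statistics import median

def detect_pit_laps(laps, min_votes=2, time_factor=1.15, speed_floor_kph=70.0):
    """Flag laps where at least `min_votes` of three heuristics agree."""
    typical_time = median(lap["lap_time"] for lap in laps)
    typical_s3 = median(lap["sector3"] for lap in laps)
    flagged = []
    for lap in laps:
        votes = 0
        votes += lap["lap_time"] > time_factor * typical_time  # lap-time spike
        votes += lap["min_speed_kph"] < speed_floor_kph        # speed drop
        votes += lap["sector3"] > time_factor * typical_s3     # sector anomaly
        if votes >= min_votes:
            flagged.append(lap["lap"])
    return flagged

# Made-up laps: lap 3 is a stop, lap 4 is slow but not a stop.
laps = [
    {"lap": 1, "lap_time": 92.0, "min_speed_kph": 88.0, "sector3": 30.0},
    {"lap": 2, "lap_time": 91.5, "min_speed_kph": 90.0, "sector3": 30.2},
    {"lap": 3, "lap_time": 120.0, "min_speed_kph": 40.0, "sector3": 55.0},
    {"lap": 4, "lap_time": 110.0, "min_speed_kph": 85.0, "sector3": 31.0},
    {"lap": 5, "lap_time": 92.2, "min_speed_kph": 89.0, "sector3": 30.1},
]
pit_laps = detect_pit_laps(laps)  # [3]
```

Requiring agreement between heuristics keeps a single slow lap, such as lap 4 here, from being mistaken for a stop.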

2. Weather Data Was Noisy

Missing track temperatures forced me to:

  • interpolate

  • estimate using air temp gradients

  • derive synthetic features

Surprisingly, these worked well.
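The interpolation step might look like the sketch below, which fills gaps linearly between known readings and copies the nearest reading at the edges. The edge handling is an assumption, and the air-temperature-based estimate is not shown.

```python
def interpolate_gaps(values):
    """Fill None entries linearly between known neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)  # nothing to anchor on
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]   # leading gap: copy first reading
        elif right is None:
            filled[i] = values[left]    # trailing gap: copy last reading
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled

track_temp = [None, 30.0, None, None, 33.0, None]  # made-up readings
filled = interpolate_gaps(track_temp)
```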

3. Telemetry Files Were Huge (17 GB Raw)

I couldn’t process the raw data in full within a reasonable time, so I:

  • sampled

  • aggregated

  • compressed

This still kept the critical behavioral signals intact.

