Inspiration

The world of motorsports analytics is dominated by Formula 1, where nine-figure budgets and complex pit strategies rule. Spec-series racing, like the Toyota GR Cup, is a completely different beast. In the GR Cup, every driver races an identical Toyota GR86 Cup car. Victory isn't bought with better engineering; it is earned through racecraft, drafting, and tire management. We were inspired to build a tool that ignores the noise of "car performance" (since all cars are equal) and focuses entirely on the driver-centric variables that actually decide these races: qualifying pressure, draft partnerships, and mistake-free consistency.

What it does

The GR Cup Predictor is a dual-purpose Machine Learning engine:

Outcome Prediction: It predicts the probability of a driver winning a race based on pre-race conditions like Qualifying Position, Track Temperature, and Drafting availability.

Digital Race Engineer: Instead of just giving a percentage, it acts as a strategist. It analyzes the prediction to provide text-based, actionable advice tailored to the specific track type (e.g., "Speed" tracks like Indianapolis vs. "Technical" tracks like VIR). It tells a driver whether to defend the inside line, hunt for a drafting partner, or conserve tires for late-race attrition.

How we built it

We built the solution entirely in Python using Google Colab for accessibility.

Data Simulation: Since public telemetry for the GR Cup is scarce, we built a Synthetic Data Generator (generate_gr_cup_data) that creates thousands of race scenarios. The generator encodes rule-based racing logic drawn from domain knowledge, such as the advantage of drafting at Road America or the penalty of poor tire management in the rain.
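To make the idea concrete, here is a minimal sketch of what a generator like generate_gr_cup_data could look like. Only the function name, the Draft_Partner feature, and the track names come from this write-up; the signature, the other column names, the grid size, the temperature range, and every weight are illustrative assumptions, not our tuned values. Track temperature is recorded as a feature but left unweighted here to keep the sketch short.

```python
import math
import random

# Track names are from the write-up. Labelling Road America as a "Speed"
# track is an assumption based on its drafting advantage.
TRACKS = {"Indianapolis": "Speed", "Road America": "Speed", "VIR": "Technical"}


def generate_gr_cup_data(n_rows=1000, seed=0):
    """Return a list of dicts, one synthetic driver entry per row."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        track = rng.choice(sorted(TRACKS))
        track_type = TRACKS[track]
        weather = "Wet" if rng.random() < 0.25 else "Dry"  # assumed wet rate
        quali_pos = rng.randint(1, 20)                     # assumed grid size
        track_temp = round(rng.uniform(15.0, 50.0), 1)     # deg C, assumed range
        draft_partner = rng.randint(0, 1)
        tire_mgmt = rng.random()                           # 0 = poor, 1 = excellent

        # Hand-written scoring rules (all weights are placeholders).
        score = 2.0 - 0.45 * (quali_pos - 1)     # qualifying matters most
        if track_type == "Speed":
            score += 1.0 * draft_partner         # drafting only helps on speed tracks
        tire_weight = 2.5 if weather == "Wet" else 1.0
        score += tire_weight * (tire_mgmt - 0.5)  # tire skill dominates in the rain

        # Squash the score into a probability and sample the outcome,
        # so the label is noisy rather than deterministic.
        p_win = 1.0 / (1.0 + math.exp(-score))
        rows.append({
            "Track": track,
            "Track_Type": track_type,
            "Weather": weather,
            "Qualifying_Position": quali_pos,
            "Track_Temp": track_temp,
            "Draft_Partner": draft_partner,
            "Tire_Management": tire_mgmt,
            "Win": int(rng.random() < p_win),
        })
    return rows


rows = generate_gr_cup_data(n_rows=1000, seed=7)
```

Sampling the win label from a probability, rather than thresholding the score, is what leaves room for a lower-grid driver with strong tire management to occasionally come out on top.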

Machine Learning: We used Scikit-learn to implement a Random Forest Classifier. We chose this model for its ability to capture non-linear interactions between categorical features (Track Type, Weather) and numerical features (Qualifying Position).
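The training step can be sketched roughly as follows, using the Scikit-learn pieces listed under Built With (label encoding, a train/test split, and a Random Forest). The toy data, the label rule, and the hyperparameters are assumptions made only so the example runs on its own; they are not our actual dataset or settings.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the synthetic dataset (assumed columns and sizes).
rng = random.Random(42)
n = 600
track_types = [rng.choice(["Speed", "Technical"]) for _ in range(n)]
weather = [rng.choice(["Dry", "Wet"]) for _ in range(n)]
quali = [rng.randint(1, 20) for _ in range(n)]
# Toy label rule (assumption): front-row starters win far more often.
wins = [int(rng.random() < (0.6 if q <= 2 else 0.05)) for q in quali]

# Label-encode the categorical columns into integer codes.
track_enc = LabelEncoder().fit(track_types)
weather_enc = LabelEncoder().fit(weather)
X = [
    [t, w, q]
    for t, w, q in zip(track_enc.transform(track_types),
                       weather_enc.transform(weather),
                       quali)
]

X_train, X_test, y_train, y_test = train_test_split(
    X, wins, test_size=0.25, random_state=42, stratify=wins
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Win probability for a P2 start on a dry Speed track.
query = [[track_enc.transform(["Speed"])[0],
          weather_enc.transform(["Dry"])[0],
          2]]
win_prob = model.predict_proba(query)[0][1]
holdout_accuracy = model.score(X_test, y_test)
```

Because wins are rare in this kind of data, holdout accuracy alone can look flattering; the predicted probability is the more useful output for the strategy layer.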

Strategy Engine: We wrote a custom logic layer that interprets the model's confidence scores and feature inputs to output human-readable racing strategy commands.
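A logic layer of this kind might look like the sketch below. The advice wording echoes the examples in this write-up (defend the inside, hunt for a draft, attack, conserve tires), but the function name recommend_strategy, its parameters, the 0.70 threshold, and the ordering of the rules are hypothetical choices for illustration.

```python
def recommend_strategy(win_prob, quali_pos, track_type,
                       weather="Dry", has_draft_partner=False):
    """Map a win probability plus race context to one line of advice."""
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win_prob must be between 0 and 1")

    # In the wet, tire management outweighs everything else.
    if weather == "Wet":
        return "Conserve tires: smooth inputs now, attack as others fade late."

    # A strong favourite has more to lose than to gain.
    if win_prob >= 0.70:  # assumed confidence threshold
        return "Defend the inside line and avoid unnecessary risk."

    # Speed tracks reward the tow; technical tracks reward early aggression.
    if track_type == "Speed":
        if has_draft_partner:
            return "Draft/Patience: stay in the tow and time the pass late."
        return "Hunt for a drafting partner before committing to a move."

    if quali_pos <= 5:  # assumed cutoff for a realistic attack
        return "Attack: pressure the cars ahead through the technical sections."
    return "Conserve tires for late-race attrition."


# Same P3 start, different track type, different advice.
technical_advice = recommend_strategy(0.30, 3, "Technical")
speed_advice = recommend_strategy(0.30, 3, "Speed", has_draft_partner=True)
```

Keeping this layer as plain rules on top of the model output makes the advice easy to audit: each message traces back to one readable condition.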

Challenges we ran into

The "Data Desert": Unlike F1, the GR Cup does not have a public API for historical lap times. We had to research the series deeply to understand its format and rules (e.g., no pit stops, 45-minute sprints) so that our synthetic dataset would behave plausibly like real racing.

Quantifying "Drafting": Drafting is a physical phenomenon that is hard to capture in a spreadsheet. We had to engineer a specific feature (Draft_Partner) and weight it heavily only on "Speed" tracks to accurately reflect how a GR86 behaves at high velocity.

Balancing the Model: Initially, the model over-weighted Qualifying. In spec racing, qualifying is huge, but not everything. We had to tune the synthetic generation to ensure that "Tire Management" allowed for late-race comebacks, making the predictions less deterministic.

Accomplishments that we're proud of

Context-Aware Strategy: We didn't just build a calculator; we built a coach. We are proud that the model gives different advice for a P3 start at a technical track (Attack) versus a P3 start at a speed track (Draft/Patience).

Realistic Simulation: The model captures the "chaos" of wet-weather racing that we built into the simulation, identifying tire management skill as the dominant factor over raw speed when it rains.

Portability: The entire solution runs in a single Google Colab cell, making it accessible to any sim racer, engineer, or fan without complex installation.

What we learned

Feature Engineering is King: In a spec series where the hardware is identical, the subtle features (Track Temp, Draft Availability) matter far more than they do in other racing series.

The Psychology of Racing: Translating a probability (85% win chance) into human advice ("Defend the inside") required us to understand the psychology of the driver, not just the math of the car.

Synthetic Data Utility: We learned that well-constructed synthetic data based on domain knowledge can be a powerful proxy when real-world data is unavailable.

What's next for GR Cup Predictor

Real Telemetry Integration: We plan to allow users to upload CSV files from MoTeC (the data system used in real GR86 Cup cars) to replace our synthetic data with real-world lap traces.

Computer Vision: Implementing a module to analyze onboard video footage to automatically detect drafting opportunities or racing lines.

Live Weather API: Integrating a real-time weather API to pull track temperature and precipitation data for the specific circuit on race day.

Built With

  • python
  • googlecolab: used to host and run the solution without local installation
  • scikit-learn (sklearn): the core machine learning library used to build the Random Forest classifier, perform train/test splits, and handle label encoding for categorical variables (like weather and track type)
  • pandas: used for data manipulation and creating the structured dataset
  • numpy
  • matplotlib
  • seaborn