Inspiration
The world of motorsports analytics is dominated by Formula 1, where billion-dollar budgets and complex pit strategies rule. Spec-series racing, like the Toyota GR Cup, is a completely different beast. In the GR Cup, every driver races an identical Toyota GR86 Cup car. Victory isn't bought with better engineering; it is earned through racecraft, drafting, and tire management. We were inspired to build a tool that ignores the noise of "car performance" (since all cars are equal) and focuses entirely on the driver-centric variables that actually decide these races: qualifying pressure, draft partnerships, and mistake-free consistency.
What it does
The GR Cup Predictor is a dual-purpose Machine Learning engine:
Outcome Prediction: It predicts the probability of a driver winning a race based on pre-race conditions like Qualifying Position, Track Temperature, and Drafting availability.
Digital Race Engineer: Instead of just giving a percentage, it acts as a strategist. It analyzes the prediction to provide text-based, actionable advice tailored to the specific track type (e.g., "Speed" tracks like Indianapolis vs. "Technical" tracks like VIR). It tells a driver whether to defend the inside line, hunt for a drafting partner, or conserve tires for late-race attrition.
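As a rough illustration of the Digital Race Engineer idea, the sketch below maps a win probability and race context to a short strategy call. The function name race_advice, the 0.6 probability threshold, the top-5 grid cutoff, and the exact wording of the advice are all assumptions made for this example, not the project's actual logic.

```python
def race_advice(win_prob: float, track_type: str,
                quali_pos: int, has_draft_partner: bool) -> str:
    """Turn a predicted win probability and race context into a strategy call.

    Thresholds and messages are hypothetical; only the track types
    ("Speed" vs. "Technical") and the kinds of advice come from the write-up.
    """
    if track_type not in ("Speed", "Technical"):
        raise ValueError(f"Unknown track type: {track_type!r}")

    # Strong favorite: protect the position rather than take risks.
    if win_prob >= 0.6:
        return "Defend the inside line and keep the race mistake-free."

    if track_type == "Speed":
        # On speed tracks the draft decides most passes.
        if has_draft_partner:
            return "Stay in the draft and be patient; save the pass for late in the race."
        return "Hunt for a drafting partner before attempting a pass."

    # Technical track: passing depends on racecraft, so a good grid slot should attack.
    if quali_pos <= 5:
        return "Attack early while the tires are fresh."
    return "Conserve tires for late-race attrition."


# Same grid slot, different track type, different advice.
print(race_advice(0.30, "Technical", 3, False))
print(race_advice(0.30, "Speed", 3, True))
```

A plain rule table like this keeps the advice easy to audit, since each message can be traced to one condition.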
How we built it
We built the solution entirely in Python using Google Colab for accessibility.
Data Simulation: Since public telemetry for the GR Cup is scarce, we built a robust Synthetic Data Generator (generate_gr_cup_data) that creates thousands of realistic race scenarios. This engine simulates physics-based logic, such as the advantage of drafting at Road America or the penalty of poor tire management in the rain.
Machine Learning: We used scikit-learn to train a Random Forest Classifier. We chose this model for its ability to handle non-linear relationships between categorical data (Track Type, Weather) and numerical data (Qualifying Position).
Strategy Engine: We wrote a custom logic layer that interprets the model's confidence scores and feature inputs to output human-readable racing strategy commands.
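The data and modeling steps above can be sketched roughly as follows. Only the function name generate_gr_cup_data, the Draft_Partner feature, and the use of a Random Forest with label encoding and a train/test split come from this write-up; the other column names, the coefficients, and the per-row win simulation are assumptions chosen to keep the example small.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def generate_gr_cup_data(n_rows: int = 2000, seed: int = 42) -> pd.DataFrame:
    """Create synthetic driver-race rows (all weights are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Qualifying_Position": rng.integers(1, 31, n_rows),
        "Track_Type": rng.choice(["Speed", "Technical"], n_rows),
        "Weather": rng.choice(["Dry", "Wet"], n_rows, p=[0.8, 0.2]),
        "Track_Temp": rng.normal(30.0, 8.0, n_rows).round(1),
        "Draft_Partner": rng.integers(0, 2, n_rows),
        "Tire_Management": rng.uniform(0.0, 1.0, n_rows),
    })

    # Better grid slots start with a higher score.
    score = 3.0 - 0.35 * df["Qualifying_Position"]
    # A draft partner only helps on "Speed" tracks.
    score += 1.0 * df["Draft_Partner"] * (df["Track_Type"] == "Speed")
    # Tire management matters more in the wet.
    tire_weight = np.where(df["Weather"] == "Wet", 3.0, 1.0)
    score += tire_weight * (df["Tire_Management"] - 0.5)

    # Simplification: each row wins independently with a logistic probability.
    win_prob = 1.0 / (1.0 + np.exp(-score))
    df["Won"] = (rng.random(n_rows) < win_prob).astype(int)
    return df


df = generate_gr_cup_data()

# Label-encode the categorical columns before training.
X = df.drop(columns="Won").copy()
for col in ("Track_Type", "Weather"):
    X[col] = LabelEncoder().fit_transform(X[col])
y = df["Won"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)           # held-out accuracy
win_probs = model.predict_proba(X_test)[:, 1]    # probability of class 1 (win)
print(f"Held-out accuracy: {accuracy:.2f}")
```

The win probabilities from predict_proba are what a strategy layer would then interpret alongside the raw feature inputs.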
Challenges we ran into
The "Data Desert": Unlike F1, the GR Cup does not have a public API for historical lap times. We had to research the series deeply to understand the physics and rules (e.g., no pit stops, 45-minute sprints) to create a synthetic dataset that plausibly reflects real racing, since we had no real data to validate it against.
Quantifying "Drafting": Drafting is a physical phenomenon that is hard to capture in a spreadsheet. We had to engineer a specific feature (Draft_Partner) and weight it heavily only on "Speed" tracks to accurately reflect how a GR86 behaves at high velocity.
Balancing the Model: Initially, the model over-weighted Qualifying. In spec racing, qualifying is huge, but not everything. We had to tune the synthetic generation to ensure that "Tire Management" allowed for late-race comebacks, making the predictions less deterministic.
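One way to spot this kind of imbalance is to compare the trained model's feature importances before and after changing the generator. The sketch below is an assumption about how such a check could look, not the tuning procedure we actually followed; the importances_for helper, the tire_weight parameter, and the coefficients are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 3000

# Hypothetical feature table; only Draft_Partner is named in the write-up.
X = pd.DataFrame({
    "Qualifying_Position": rng.integers(1, 31, n),
    "Draft_Partner": rng.integers(0, 2, n),
    "Tire_Management": rng.uniform(0.0, 1.0, n),
})


def importances_for(tire_weight: float) -> pd.Series:
    """Simulate outcomes with a given tire-management weight and return
    the Random Forest's impurity-based feature importances."""
    score = (3.0
             - 0.35 * X["Qualifying_Position"]
             + tire_weight * (X["Tire_Management"] - 0.5))
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-score))).astype(int)
    # Larger leaves reduce the importance credited to pure-noise features.
    model = RandomForestClassifier(
        n_estimators=100, min_samples_leaf=50, random_state=0
    ).fit(X, y)
    return pd.Series(model.feature_importances_, index=X.columns)


low = importances_for(tire_weight=0.0)    # qualifying decides everything
high = importances_for(tire_weight=6.0)   # tire management allows comebacks

print(pd.DataFrame({"tire_weight=0": low, "tire_weight=6": high}).round(3))
```

If qualifying still takes nearly all of the importance after the generator change, the comeback effect is probably too weak to influence predictions.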
Accomplishments that we're proud of
Context-Aware Strategy: We didn't just build a calculator; we built a coach. We are proud that the model gives different advice for a P3 start at a technical track (Attack) versus a P3 start at a speed track (Draft/Patience).
Realistic Simulation: The model captures the "chaos" of wet-weather racing built into our simulation, identifying that tire management skill outweighs raw speed when it rains.
Portability: The entire solution runs in a single Google Colab cell, making it accessible to any sim racer, engineer, or fan without complex installation.
What we learned
Feature Engineering is King: In a spec series where the hardware is identical, the subtle features (Track Temp, Draft Availability) matter far more than they do in other racing series.
The Psychology of Racing: Translating a probability (85% win chance) into human advice ("Defend the inside") required us to understand the psychology of the driver, not just the math of the car.
Synthetic Data Utility: We learned that well-constructed synthetic data based on domain knowledge can be a powerful proxy when real-world data is unavailable.
What's next for GR Cup Predictor
Real Telemetry Integration: We plan to allow users to upload CSV files from MoTeC (the data system used in real GR86 Cup cars) to replace our synthetic data with real-world lap traces.
Computer Vision: We want to implement a module that analyzes onboard video footage to automatically detect drafting opportunities and racing lines.
Live Weather API: We plan to integrate a real-time weather API to pull track temperature and precipitation data for the specific circuit on race day.
Built With
- googlecolab: host and run the solution without local installation
- matplotlib
- numpy
- pandas: used for data manipulation, creating the structured dataset
- python
- scikit-learn (sklearn): the core machine learning library used to build the Random Forest Classifier, perform train/test splits, and handle label encoding for categorical variables (like Weather and Track Type)
- seaborn