Project Story — GR Cup Race Intelligence

About the project

This project is a real-time analytics and strategy engine for the GR Cup series. It processes split timing and telemetry data across 7 tracks (14 races total), computes lap times, consistency scores, and pit stop recommendations, and exposes both an interactive Streamlit dashboard (dashboard.py) and self-contained HTML reports (dashboard_report.html, race_report.html). The consolidated export is all_tracks_results.json.


Inspiration

I built this project because I wanted to explore how lightweight data engineering, time-series processing, and simple physics-based modeling can deliver actionable race strategy. Racing provides a compact, high-stakes environment where small insights (e.g., a lap time delta of 0.5s) can meaningfully change decisions. That combination of measurable performance signals, rich telemetry, and real-world impact inspired the project.


What I learned

  • Practical handling of split timing: merging lap starts and lap ends, aligning by vehicle and lap number.
  • Telemetry pivoting: converting long telemetry logs (1M+ rows) into per-lap wide-format summaries that are fast to analyze.
  • Data validation matters: a small filtering bug allowed 0.001s outliers to be reported as best laps; requiring a realistic minimum lap time and validating at multiple stages prevents this.
  • Building user-friendly deliverables: a Streamlit app for interactive exploration and static HTML reports for judges/stakeholders.
  • Tooling and reproducibility: packaging the full pipeline (process_all_tracks.py → all_tracks_results.json → generate_dashboard.py) and documenting it clearly.
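
The telemetry pivot mentioned in the list above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the column names (vehicle_id, lap, telemetry_name, telemetry_value) and the use of the mean as the per-lap aggregate are assumptions made for the example.

```python
import pandas as pd

def pivot_telemetry(long_df: pd.DataFrame) -> pd.DataFrame:
    """Collapse long-format telemetry into one row per (vehicle, lap).

    Column names are hypothetical; the mean is an assumed aggregate.
    """
    wide = long_df.pivot_table(
        index=["vehicle_id", "lap"],
        columns="telemetry_name",
        values="telemetry_value",
        aggfunc="mean",
    )
    wide.columns.name = None  # drop the leftover "telemetry_name" axis label
    return wide.reset_index()

# Tiny made-up sample: two channels, two vehicles, one lap each.
sample = pd.DataFrame({
    "vehicle_id": ["A", "A", "A", "A", "B", "B"],
    "lap": [1, 1, 1, 1, 1, 1],
    "telemetry_name": ["speed", "speed", "throttle", "throttle", "speed", "throttle"],
    "telemetry_value": [100.0, 120.0, 50.0, 70.0, 90.0, 40.0],
})
per_lap = pivot_telemetry(sample)
print(per_lap)
```

Aggregating while pivoting keeps the result at one row per vehicle and lap, which is what makes the wide table small compared with the raw log.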

How I built it

High-level pipeline:

  1. Load raw CSVs (lap starts, lap ends, telemetry, results, weather).
  2. Merge split timing by vehicle_id and lap.
  3. Compute lap durations and filter realistic values.
  4. Pivot telemetry to per-lap features (speed, throttle, brake, gear).
  5. Compute analytics: best lap, average lap, standard deviation, consistency score.
  6. Predict pit windows using a simple fuel/tire model.
  7. Produce outputs: all_tracks_results.json, race_report.html, dashboard_report.html, and the Streamlit app.
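
Steps 2 and 3 of the pipeline can be sketched as below. This is an illustrative version under stated assumptions, not the engine's real implementation: the column names (vehicle_id, lap, timestamp) are hypothetical, and the only rule taken from this write-up is the 30s minimum lap time.

```python
import pandas as pd

MIN_LAP_SECONDS = 30.0  # realistic minimum lap time described in this write-up

def compute_lap_times(starts: pd.DataFrame, ends: pd.DataFrame) -> pd.DataFrame:
    """Merge lap starts and ends, then keep only realistic lap durations."""
    merged = starts.merge(
        ends, on=["vehicle_id", "lap"], suffixes=("_start", "_end")
    )
    delta = merged["timestamp_end"] - merged["timestamp_start"]
    merged["lap_time_s"] = delta.dt.total_seconds()
    # A "> 0" filter would let the 0.001s split through; use a real minimum.
    valid = merged[merged["lap_time_s"] > MIN_LAP_SECONDS]
    return valid.reset_index(drop=True)

# Made-up sample: lap 1 is a 95s lap, lap 2 is a bogus 0.001s split.
starts = pd.DataFrame({
    "vehicle_id": ["A", "A"],
    "lap": [1, 2],
    "timestamp": pd.to_datetime(
        ["2025-01-01 12:00:00.000", "2025-01-01 12:01:35.000"]
    ),
})
ends = pd.DataFrame({
    "vehicle_id": ["A", "A"],
    "lap": [1, 2],
    "timestamp": pd.to_datetime(
        ["2025-01-01 12:01:35.000", "2025-01-01 12:01:35.001"]
    ),
})
laps = compute_lap_times(starts, ends)
print(laps[["vehicle_id", "lap", "lap_time_s"]])
```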

Key code files:

  • Stragery_engine.py — core engine: data loading, lap-time calculation, telemetry pivot, insight generation.
  • process_all_tracks.py — orchestrator that runs every track/race, consolidates results, and performs final verification.
  • generate_dashboard.py — generates dashboard_report.html from the consolidated JSON and validates data before output.
  • dashboard.py — interactive Streamlit UI for deeper exploration.

Mathematically:

  • Lap time for one lap is computed as the difference between end and start timestamps:

\[ \Delta t = t_{end} - t_{start} \]

  • Consistency score (0–100) used in the system is:

\[ \text{ConsistencyScore} = 100 - \left(\frac{\sigma_{lap}}{\mu_{lap}} \times 100\right) \]

where \(\sigma_{lap}\) is the standard deviation of lap times and \(\mu_{lap}\) is the mean lap time.
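
As a rough sketch, the consistency score can be computed with the standard library alone. Two details here are assumptions not stated in this write-up: the population standard deviation is used (a sample standard deviation would give slightly lower scores), and the result is clamped at 0 so it stays within the stated 0–100 range.

```python
import statistics

def consistency_score(lap_times):
    """Return 100 minus the coefficient of variation expressed in percent."""
    mean = statistics.fmean(lap_times)
    sigma = statistics.pstdev(lap_times)  # assumption: population std dev
    score = 100.0 - (sigma / mean) * 100.0
    return max(0.0, score)  # assumption: clamp to keep the score in 0-100

# Made-up lap times in seconds.
score = consistency_score([90.0, 92.0, 88.0, 90.0])
print(round(score, 2))
```

For these four laps the mean is 90s and the population standard deviation is about 1.41s, so the score comes out at roughly 98.43.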

  • Simple fuel/tire model used to propose a pit window (illustrative):

\[ \text{FuelRemaining} = 100 - (\text{current\_lap} \times r_{fuel}) \]

\[ \text{LapsUntilRefuel} = \frac{\text{FuelRemaining}}{r_{fuel}} \]

The engine uses a small adjustment to propose an optimal pit lap (e.g., \(\pm 2\) laps around the estimate) taking into account urgency and predicted performance drop.
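
The fuel formulas above translate into a few lines of code. The sketch below is illustrative only: the function name, the percent-per-lap unit for the fuel rate, and the fixed 2-lap margin are assumptions, and it does not model the urgency or performance-drop adjustments the engine applies.

```python
import math

def pit_window(current_lap, fuel_rate, margin=2):
    """Return (earliest, latest) pit lap from the simple fuel model.

    fuel_rate is assumed to be percent of a full tank burned per lap.
    """
    if fuel_rate <= 0:
        raise ValueError("fuel_rate must be positive")
    fuel_remaining = 100.0 - current_lap * fuel_rate
    laps_until_refuel = fuel_remaining / fuel_rate
    estimate = current_lap + math.floor(laps_until_refuel)
    # Never propose a pit lap that has already passed.
    earliest = max(current_lap, estimate - margin)
    return earliest, estimate + margin

# Made-up numbers: lap 10, burning 4% of the tank per lap.
window = pit_window(current_lap=10, fuel_rate=4.0)
print(window)
```

With 60% fuel left at lap 10 and 4% burned per lap, 15 laps remain, so the estimate is lap 25 and the proposed window is laps 23 to 27.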


Challenges faced

  • Data naming conventions: each track had slightly different CSV naming patterns. I implemented cascading filename glob patterns to locate files reliably (e.g., R{N}_{track}_lap_start.csv, {track}_lap_start_time_R{N}.csv, and a generic fallback *lap_start*R{N}*.csv).
  • Outlier lap-time bug: an initial filter allowed values > 0, which picked up invalid splits (0.001s). Fix: require laps to be greater than a realistic minimum (30s) and add multi-stage validation.
  • Scalability of telemetry pivot: converting long telemetry (millions of rows) into a compact wide per-lap summary required careful grouping and memory-aware pivoting.
  • Cross-platform encoding issues: the default Windows console encoding is not UTF-8, so printing non-ASCII characters could crash the script. Fix: explicitly encode output as UTF-8.
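
The cascading filename lookup described in the first challenge could look something like the sketch below. The three patterns come from this write-up; the function name, the directory layout, and the "demo" track name are hypothetical.

```python
import tempfile
from pathlib import Path

def find_lap_start_file(data_dir, track, race_number):
    """Try each naming pattern in order and return the first match, or None."""
    patterns = [
        f"R{race_number}_{track}_lap_start.csv",
        f"{track}_lap_start_time_R{race_number}.csv",
        f"*lap_start*R{race_number}*.csv",  # generic fallback
    ]
    for pattern in patterns:
        matches = sorted(Path(data_dir).glob(pattern))
        if matches:
            return matches[0]
    return None

# Demo in a throwaway directory with one file using the second convention.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "demo_lap_start_time_R1.csv").touch()
    found = find_lap_start_file(tmp, "demo", 1)
    found_name = found.name if found else None
    missing = find_lap_start_file(tmp, "demo", 2)

print(found_name, missing)
```

Ordering the patterns from most to least specific means the generic fallback only applies when neither known convention matches.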

Built with

  • Languages: Python 3.13
  • Libraries: pandas, numpy, streamlit
  • Tools: pathlib, datetime, json
  • Output: Static HTML/CSS reports and interactive Streamlit app

Cloud / deployment: The project is self-contained and runs locally — it does not require cloud infrastructure for the submission deliverable. The static HTML reports are fully portable for judges.

Data & scale: ~4.7M telemetry records processed, 3,500+ lap times calculated, 14 races, 349 drivers, and 60 generated insights.


How to run (quick)

  1. Process everything:
     python process_all_tracks.py
  2. Generate the HTML dashboard (validates before writing):
     python generate_dashboard.py
  3. Optional: run the interactive Streamlit app:
     pip install streamlit pandas
     streamlit run dashboard.py

What I would build next

  • Add a small scheduler to run incremental updates from live data feeds.
  • Incorporate a lightweight ML model for lap-time prediction per driver/track using historical features.
  • Add test coverage and CI (unit tests for parsing, validation, and metrics).

Acknowledgements

Thanks to the open-source Python ecosystem (pandas, Streamlit) and the GR Cup dataset used for this project.
