Inspiration
Formula 1 is one of the most data-rich sports in the world, yet most fans only see final race results. We wanted to go deeper, using real lap-time and race data to predict driver performance and to give fans (and teams) a new way to understand what's actually happening on track.
What it does
How we built it
We built a Python backend using pandas for data processing and scikit-learn's logistic regression to train a binary classifier on historical lap-time and race-result data. The model uses four engineered features (average lap time, best lap time, lap consistency measured as the standard deviation of lap times, and total laps) to predict the probability of a top-10 finish. We exposed the model through a Flask REST API with CORS support and connected it to a React frontend dashboard.
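A minimal sketch of the feature-engineering and training step, assuming a lap-level DataFrame with hypothetical columns Driver, RoundNumber and LapTime (in seconds) and a results DataFrame with a Position column. The column names, the helper functions and the toy data are illustrative, not taken from the project. Filling a single-lap NaN standard deviation with 0 is one simple choice among several.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical column names; adjust to the real CSV schema.
FEATURES = ["avg_lap", "best_lap", "lap_std", "total_laps"]
KEYS = ["Driver", "RoundNumber"]


def build_features(laps: pd.DataFrame) -> pd.DataFrame:
    """Aggregate lap-level rows (LapTime in seconds) into one row per driver and round."""
    feats = (
        laps.groupby(KEYS)["LapTime"]
        .agg(avg_lap="mean", best_lap="min", lap_std="std", total_laps="count")
        .reset_index()
    )
    # With a single recorded lap the sample std is NaN; here we treat it as 0.
    feats["lap_std"] = feats["lap_std"].fillna(0.0)
    return feats


def train_top10_model(feats: pd.DataFrame, results: pd.DataFrame) -> LogisticRegression:
    """Fit a binary classifier: label 1 if the driver finished in the top 10."""
    data = feats.merge(results, on=KEYS, how="inner")
    y = (data["Position"] <= 10).astype(int)
    model = LogisticRegression(max_iter=1000)
    model.fit(data[FEATURES], y)
    return model


# Toy data for illustration only (not real race data).
laps = pd.DataFrame({
    "Driver": ["VER", "VER", "SAR", "SAR", "VER", "VER", "SAR"],
    "RoundNumber": [1, 1, 1, 1, 2, 2, 2],
    "LapTime": [90.0, 91.0, 93.0, 95.0, 88.0, 88.5, 92.0],
})
results = pd.DataFrame({
    "Driver": ["VER", "SAR", "VER", "SAR"],
    "RoundNumber": [1, 1, 2, 2],
    "Position": [1, 18, 2, 19],
})

feats = build_features(laps)
model = train_top10_model(feats, results)
# Column 1 of predict_proba is the probability of class 1 (a top-10 finish).
top10_prob = model.predict_proba(feats[FEATURES])[:, 1]
print(feats.assign(top10_prob=top10_prob.round(3)))
```

Computing all four features in one named `groupby().agg()` call keeps the feature definitions in a single place, so the same function can be reused at prediction time behind the API.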
Challenges we ran into
We ran into data-quality problems: many driver-round combinations had only a single recorded lap, which left the standard deviation feature undefined (NaN) and caused model training to fail. We also had to carefully align the lap-time data with the race results, since the two datasets used different driver identifier formats. Merging them correctly on driver abbreviation, round, and event name took several iterations.
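One way to make that merge robust is sketched below, assuming the results file identifies drivers by a hypothetical driverId that must be mapped to abbreviations, and that keys can differ in case, whitespace or type. The `DRIVER_ABBR` lookup, column names and toy rows are illustrative assumptions. pandas' `validate` and `indicator` options on `merge` flag duplicate keys and surface rows that failed to match instead of silently dropping them.

```python
import pandas as pd

# Hypothetical lookup from the results file's driver IDs to three-letter abbreviations.
DRIVER_ABBR = {"max_verstappen": "VER", "sainz": "SAI"}
MERGE_KEYS = ["Abbreviation", "RoundNumber", "EventName"]


def normalize_keys(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce join keys to one canonical form so equal keys compare equal."""
    out = df.copy()
    out["Abbreviation"] = out["Abbreviation"].str.strip().str.upper()
    out["EventName"] = out["EventName"].str.strip()
    out["RoundNumber"] = pd.to_numeric(out["RoundNumber"]).astype("int64")
    return out


def align(lap_feats: pd.DataFrame, results: pd.DataFrame):
    """Return (matched rows, lap rows that found no race result)."""
    results = results.assign(Abbreviation=results["driverId"].map(DRIVER_ABBR))
    merged = normalize_keys(lap_feats).merge(
        normalize_keys(results),
        on=MERGE_KEYS,
        how="left",
        validate="one_to_one",  # raises MergeError if either side has duplicate keys
        indicator=True,         # adds a "_merge" column: "both" or "left_only"
    )
    matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
    unmatched = merged[merged["_merge"] == "left_only"]
    return matched, unmatched


# Toy data for illustration only; note the inconsistent case, spacing and types.
lap_feats = pd.DataFrame({
    "Abbreviation": ["VER", "sai ", "HAM"],
    "RoundNumber": [1, 1, 1],
    "EventName": ["Bahrain Grand Prix", "Bahrain Grand Prix ", "Bahrain Grand Prix"],
    "avg_lap": [95.1, 95.6, 95.9],
})
results = pd.DataFrame({
    "driverId": ["max_verstappen", "sainz"],
    "RoundNumber": ["1", "1"],
    "EventName": ["Bahrain Grand Prix", "Bahrain Grand Prix"],
    "Position": [1, 3],
})

matched, unmatched = align(lap_feats, results)
print(matched)
print("no result found for:", unmatched["Abbreviation"].tolist())
```

Returning the unmatched rows, rather than using an inner join, makes identifier mismatches visible while iterating on the alignment.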
Accomplishments that we're proud of
We're proud of building a fully working ML pipeline end to end, from raw CSV data to a live API that a frontend can query in real time. The driver growth slope feature, which detects whether a driver is improving or declining across the season, was a particularly satisfying addition that brings real analytical depth.
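The write-up does not say how the growth slope is computed. One plausible approach is the ordinary least-squares slope of a per-round metric against round number, sketched below under the assumption that the metric is finishing position (lower is better). The function names `growth_slope` and `classify_trend` and the 0.1 tolerance are hypothetical.

```python
from typing import Sequence


def growth_slope(rounds: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of a per-round metric against round number."""
    n = len(rounds)
    if n != len(values) or n < 2:
        raise ValueError("need at least two (round, value) pairs of equal length")
    mean_x = sum(rounds) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in rounds)
    if sxx == 0:
        raise ValueError("rounds must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rounds, values))
    return sxy / sxx


def classify_trend(slope: float, tol: float = 0.1) -> str:
    """Label a slope for a lower-is-better metric such as finishing position."""
    if slope < -tol:
        return "improving"  # positions are getting smaller, i.e. better
    if slope > tol:
        return "declining"
    return "flat"


# Toy finishing positions over four rounds (illustrative only).
slope = growth_slope([1, 2, 3, 4], [12, 10, 9, 7])
print(round(slope, 2), classify_trend(slope))  # -1.6 improving
```

If the metric were average lap time instead, the sign convention would be the same (a negative slope means faster laps), though lap times are only comparable within a single circuit.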
What we learned
We learned how to engineer meaningful features from time-series motorsport data, handle real-world data quality issues like NaNs and mismatched identifiers, and deploy a trained model as a REST API that a frontend can consume. We also gained a much deeper appreciation for how much goes into a single F1 lap time.
What's next for Throttle
We want to incorporate more features like pit stop strategy, weather conditions, and qualifying times to improve model accuracy. We'd also like to expand from logistic regression to a gradient boosting model for better predictions, and eventually add live race data so the dashboard updates in real time during a Grand Prix weekend.
Built With
- 3d-modeling
- css
- fast
- fastapi
- javascript
- machine-learning
- pandas
- python
- react
- scikit-learn
- tailwind
- three.js