Predicting the Running Elite

A sample output of the predicted 10k time over a training block, generated after the user provides their running metrics.

Inspiration

I really enjoy running and follow the professional world of running, so I was curious what measurable things make those pros elite.

What it does

Most of the code is data analysis of running metrics and their effect on running performance. It evaluates which metrics matter most in predicting personal bests and overall elite status, and it includes a few models that predict both. The main interactive program takes these models, asks the user for some easily accessible information, and estimates the user's running ability along with a short training plan to follow to improve fitness.

How I built it

I took the data analysis and machine learning methods I know, and I did pretty much whatever I could to locate trends in the data. Once I started developing a decent model to predict certain running capabilities, I tried to use real science and running methodologies to provide a simple user experience that can aid future running performance.

Challenges I ran into

I could not find a significant amount of open-source data related to measurable running biomechanics and performance, so I had to find any trends with the one available source. I also had to link the data-driven model to the user experience. A typical runner will not know most of the metrics seen in the data, so I had to limit the models to only use basic information.
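To show what a prediction from basic information alone can look like, here is a minimal sketch that estimates a 10k time from a recent 5k result using Riegel's formula, a widely used race-time rule of thumb. It is a stand-in for illustration, not the model used in this project; the function names and the example 5k time are assumptions.

```python
def predict_time(known_time_min, known_dist_km, target_dist_km, exponent=1.06):
    """Riegel's formula: T2 = T1 * (D2 / D1) ** exponent (1.06 is the usual default)."""
    if known_time_min <= 0 or known_dist_km <= 0 or target_dist_km <= 0:
        raise ValueError("times and distances must be positive")
    return known_time_min * (target_dist_km / known_dist_km) ** exponent


def format_time(minutes):
    """Format a duration in minutes as M:SS, rounded to the nearest second."""
    total_seconds = round(minutes * 60)
    whole_minutes, seconds = divmod(total_seconds, 60)
    return f"{whole_minutes}:{seconds:02d}"


# Hypothetical input: a runner who recently ran 5 km in 25 minutes.
predicted = predict_time(known_time_min=25.0, known_dist_km=5.0, target_dist_km=10.0)
print("Predicted 10k time:", format_time(predicted))
```

The only input here is a recent race result, which most runners know, unlike lab-measured biomechanics.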

Accomplishments that I'm proud of

I feel like I thoroughly explored some of the trends in the data and evaluated what metrics are actually important in determining (distance) running abilities. Also, I applied my data science knowledge to design a simple, yet solid user-driven application.

What I learned

Documenting code while working through the project is smarter than trying to remember everything and add comments later. Also, in the real world, developing data-driven, science-based models is difficult because access to quality data is not a given, and comprehensive, effective models are hard to produce.

What's next for Predicting the Running Elite

The model should be adapted so that the running-economy metric the user supplies is the standard VO2 max. A more interesting model would also come from more data with more features: the dataset I used had only a few quantitative variables, and some of them were likely less useful than metrics such as average force or power.

Credit

Burns, Geoffrey T.; Tam, Nicholas; Santos-Concejero, Jordan; Tucker, Ross; Zernicke, Ronald F. (2023). DataSheet1_Assessing spring-mass similarity in elite and recreational runners.ZIP. Frontiers. Dataset. https://doi.org/10.3389/fphys.2023.1224459.s001

Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.

Updates

j-introne Introne started this project — Apr 20, 2024 09:28 PM EDT
