Inspiration

I am a Pirates fan and an aspiring data scientist whose career trajectory has been pretty much in line with the Pirates' seasons over the past two or three decades. I wanted to try my hand at computer vision modeling and at designing a data-driven solution.

What it does

Generates Statcast data from old videos: a tool that extracts fundamental Statcast metrics (e.g., pitch speed, exit velocity) from archival game videos using computer vision.

How we built it

  1. Build a hit distance regression model from tabular data with advanced feature engineering.
  2. Build an exit velocity and launch angle prediction model from video. The goal is to limit the number of frames the video model has to process, since it only needs to see the frames in which the batter makes contact with the ball. This reduces the computational load and could potentially allow real-time distance forecasts!
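
The frame-limiting idea in step 2 can be sketched as a small helper that picks a fixed-length window of frame indices around the moment of contact. This is only an illustration: the function name `contact_window`, the `before`/`after` defaults, and the assumption that the contact frame index is already known are all hypothetical and not taken from the project code.

```python
from typing import List


def contact_window(n_frames: int, contact_idx: int,
                   before: int = 4, after: int = 11) -> List[int]:
    """Return frame indices for a fixed-length window around a contact frame.

    Hypothetical helper: `contact_idx` is assumed to be known already
    (for example from a manual label or a separate detector).
    Indices that would fall outside the clip are clamped to the first or
    last frame, so the window always has `before + after + 1` entries.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    if not 0 <= contact_idx < n_frames:
        raise ValueError("contact_idx must be inside the clip")
    if before < 0 or after < 0:
        raise ValueError("before and after must be non-negative")

    start = contact_idx - before
    length = before + after + 1
    # Clamp each index into [0, n_frames - 1] to keep the length constant.
    return [min(max(i, 0), n_frames - 1) for i in range(start, start + length)]
```

With a window like this, a video model would only ever see `before + after + 1` frames per clip regardless of how long the original clip is, which is where the memory savings would come from.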

Challenges we ran into

It turns out that some videos, even from 2024, do not have a distance calculated! Also, some players are exceptionally athletic and can make inside-the-park home runs through sheer speed. Computational demands quite often exceeded what a Colab notebook could handle.

Accomplishments that we're proud of

  1. Neural Network Regression Hit Distance Prediction Model. Key innovations:
       • Extract ball direction from the play-by-play title
       • Add physics kinematics properties from projectile motion

  2. Spatiotemporal Neural Network Video Exit Velocity & Launch Angle Prediction Model. Key innovations:
       • Designed a frame-efficient video data generator for the TensorFlow model
       • Utilized transfer learning on the lightweight MobileNetV2
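
As an illustration of the "physics kinematics properties from projectile motion" mentioned above, here is a minimal sketch of drag-free projectile features computed from exit velocity and launch angle. The function name, the feature names, and the default contact height of 3 ft are assumptions made for this example; the actual features used in the project may differ. Because it ignores air resistance, the range it returns should be treated as an upper-bound style input feature rather than a distance prediction.

```python
import math
from typing import Dict

G_FT_PER_S2 = 32.174                 # standard gravity in ft/s^2
MPH_TO_FT_PER_S = 5280.0 / 3600.0    # 1 mph expressed in ft/s


def projectile_features(exit_velocity_mph: float, launch_angle_deg: float,
                        contact_height_ft: float = 3.0) -> Dict[str, float]:
    """Drag-free projectile quantities for a batted ball (illustrative only)."""
    v = exit_velocity_mph * MPH_TO_FT_PER_S
    theta = math.radians(launch_angle_deg)
    vx = v * math.cos(theta)         # horizontal speed, ft/s
    vy = v * math.sin(theta)         # vertical speed, ft/s

    # Time until the ball returns to ground level: positive root of
    #   contact_height + vy * t - 0.5 * g * t^2 = 0
    hang_time = (vy + math.sqrt(vy * vy + 2.0 * G_FT_PER_S2 * contact_height_ft)) / G_FT_PER_S2

    # Highest point reached; a downward-hit ball never rises above contact height.
    apex = contact_height_ft + (max(vy, 0.0) ** 2) / (2.0 * G_FT_PER_S2)

    return {
        "vx_ft_s": vx,
        "vy_ft_s": vy,
        "hang_time_s": hang_time,
        "apex_ft": apex,
        "vacuum_range_ft": vx * hang_time,
    }
```

Features like these can be appended to the tabular inputs so that the regression model only has to learn the correction from the idealized trajectory to the measured distance.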

What we learned

The optimized NN model delivers nearly a 20% performance improvement over the Linear Regression model, which has a Mean Squared Error of 275.40. The pure physics model needs more inputs to account for the effects of air resistance and friction. Video processing models require a LOT of GPU memory.
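
To make the comparison concrete, the sketch below shows how a relative improvement over the Linear Regression baseline could be computed. Only the baseline MSE of 275.40 comes from the text above. The NN's actual MSE is not stated, and it is an assumption here that the improvement is measured on MSE, so the 20% figure is used purely as a hypothetical input.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fractional reduction in error relative to a baseline (0.2 means 20%)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


def error_for_improvement(baseline_error: float, improvement: float) -> float:
    """Error that corresponds to a given fractional improvement over a baseline."""
    return baseline_error * (1.0 - improvement)


LINEAR_REGRESSION_MSE = 275.40  # baseline reported in the write-up

# Hypothetical: the MSE an exact 20% reduction would imply.
# "Nearly 20%" means the real NN value would sit slightly above this.
hypothetical_nn_mse = error_for_improvement(LINEAR_REGRESSION_MSE, 0.20)
```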

What's next for Baseball Bill's Home Run Derby Distance Calculator

Possible Improvements:

  • Engineer more features related to game conditions, e.g., time of day, the month the game is played, the field it is played at, temperature, wind speed, etc.
  • More efficient frame extraction (e.g., another meta model that only captures frames when the bat is in the strike zone)
  • Image preprocessing techniques, such as masking to isolate the baseball and bat, or other enhancements to frame quality
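
As a rough idea of what the game-condition features in the first bullet might look like, here is a minimal sketch. The function name, its parameters, and the cyclic sine/cosine encoding of month and time of day are illustrative assumptions, not part of the current project. The venue is left out because it would need a categorical encoding of its own.

```python
import math
from datetime import datetime
from typing import Dict


def game_condition_features(first_pitch: datetime, temperature_f: float,
                            wind_speed_mph: float) -> Dict[str, float]:
    """Turn basic game conditions into numeric model inputs (hypothetical helper).

    Month and time of day are encoded as points on a circle so that
    December sits next to January and 23:00 sits next to 00:00.
    """
    hour = first_pitch.hour + first_pitch.minute / 60.0
    month_angle = 2.0 * math.pi * (first_pitch.month - 1) / 12.0
    hour_angle = 2.0 * math.pi * hour / 24.0
    return {
        "month_sin": math.sin(month_angle),
        "month_cos": math.cos(month_angle),
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "temperature_f": float(temperature_f),
        "wind_speed_mph": float(wind_speed_mph),
    }
```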
