Inspiration
Scouting and player development in MLB require massive investments, yet predicting a player’s long-term potential remains challenging. We were inspired by the idea of leveraging machine learning and cloud-based AI solutions to provide a data-driven approach to talent evaluation. By analyzing historical and real-time player performance data, we aim to enhance the decision-making process for teams, scouts, and analysts.
What It Does
Our project, MLB Prospect AI, predicts a player's future OPS (On-base Plus Slugging) based on historical performance data and real-time updates. Using machine learning models, we analyze key metrics such as batting average, home runs, strikeouts, and stolen bases to generate predictive insights. The platform offers:
- Individual player projections for future OPS
- Comparative analysis with historical MLB players
- Scouting insights to support team decision-making
How We Built It
Data Collection & Processing
- Aggregated MLB player statistics from multiple seasons via APIs.
- Cleaned and preprocessed data to handle missing values and inconsistencies.
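As a rough illustration of the cleaning step, the sketch below fills gaps in a single stat column with the median of the observed values. The record layout, the field names, and the choice of median imputation are assumptions made for this example; the imputation method actually used in the project is not stated here.

```python
from statistics import median

def impute_median(records, field):
    """Return copies of the records with missing `field` values set to the column median."""
    observed = [r[field] for r in records if r.get(field) is not None]
    if not observed:
        raise ValueError(f"no observed values for {field!r}")
    fill = median(observed)
    # Build new dicts so the raw records stay untouched.
    return [{**r, field: fill if r.get(field) is None else r[field]} for r in records]

# Hypothetical season lines; None marks a gap in the source data.
seasons = [
    {"player": "A", "home_runs": 12},
    {"player": "B", "home_runs": None},
    {"player": "C", "home_runs": 30},
    {"player": "D", "home_runs": 20},
]
cleaned = impute_median(seasons, "home_runs")
```

Returning new records rather than mutating in place keeps the raw pull available for comparison, which helps when checking how much imputation shifts a stat's distribution.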
Feature Engineering
- Extracted key performance indicators (KPIs).
- Created derived metrics like isolated power (ISO), walk-to-strikeout ratio (BB/K), and expected OPS.
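The derived metrics named above follow standard definitions: ISO is slugging percentage minus batting average, BB/K is walks divided by strikeouts, and OPS is on-base percentage plus slugging percentage. A minimal sketch of those calculations follows; the `BattingLine` class and its field names are hypothetical, and expected OPS is left out because its definition is not given here.

```python
from dataclasses import dataclass

@dataclass
class BattingLine:
    """One season of counting stats (hypothetical layout, assumes at_bats > 0)."""
    at_bats: int
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    strikeouts: int
    hit_by_pitch: int = 0
    sac_flies: int = 0

def derived_metrics(b):
    """Compute AVG, ISO, BB/K, and OPS from raw counting stats."""
    singles = b.hits - b.doubles - b.triples - b.home_runs
    total_bases = singles + 2 * b.doubles + 3 * b.triples + 4 * b.home_runs
    avg = b.hits / b.at_bats
    slg = total_bases / b.at_bats
    obp = (b.hits + b.walks + b.hit_by_pitch) / (
        b.at_bats + b.walks + b.hit_by_pitch + b.sac_flies
    )
    # BB/K is undefined for a hitter with no strikeouts.
    bb_k = b.walks / b.strikeouts if b.strikeouts else None
    return {"AVG": avg, "ISO": slg - avg, "BB/K": bb_k, "OPS": obp + slg}

# Made-up season line for illustration only.
line = BattingLine(at_bats=500, hits=150, doubles=30, triples=5,
                   home_runs=20, walks=50, strikeouts=100)
metrics = derived_metrics(line)
```

For this made-up line, batting average is .300 and slugging is .500, so ISO comes out to .200 and BB/K to 0.5.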
Modeling & Training
- Trained multiple machine learning models: Random Forest, XGBoost, Linear Regression, and a Neural Network (MLP).
- Leveraged Google Cloud Vertex AI to train and compare performance.
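One way to compare several model families on equal footing is a small harness that fits each candidate on the same training split and scores it on the same held-out split. The sketch below is an assumption about how such a comparison could look, not the project's actual training code. It relies only on each model exposing `fit` and `predict`, which scikit-learn regressors and `xgboost.XGBRegressor` both do; two trivial stand-in baselines and made-up numbers keep the example dependency-free.

```python
def compare_models(models, X_train, y_train, X_test, y_test, metric):
    """Fit each candidate on the training split and score it on the held-out split."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = metric(y_test, model.predict(X_test))
    # Sort ascending: lower is better for an error metric such as MSE.
    return dict(sorted(scores.items(), key=lambda kv: kv[1]))

class MeanBaseline:
    """Stand-in model that always predicts the training-set mean of the target."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self
    def predict(self, X):
        return [self.mean_ for _ in X]

class LastSeasonBaseline:
    """Stand-in model that predicts next-season OPS as last season's OPS (feature 0)."""
    def fit(self, X, y):
        return self
    def predict(self, X):
        return [row[0] for row in X]

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up rows: [previous-season OPS] -> next-season OPS.
X_train = [[0.700], [0.800], [0.750], [0.900]]
y_train = [0.720, 0.790, 0.760, 0.870]
X_test = [[0.650], [0.850]]
y_test = [0.680, 0.830]

ranking = compare_models(
    {"mean": MeanBaseline(), "last_season": LastSeasonBaseline()},
    X_train, y_train, X_test, y_test, metric=mse,
)
```

Keeping simple baselines in the ranking gives a floor that the heavier models have to beat before their extra complexity is justified.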
Evaluation & Deployment
- Compared model performance using Mean Squared Error (MSE) and R² Score.
- Deployed a prototype to generate 2024 season predictions for MLB prospects.
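For reference, both evaluation metrics can be written in a few lines. MSE is the average squared gap between predicted and actual OPS, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the actual values. The function names and sample numbers below are illustrative only, not project results.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Share of variance in y_true explained by the predictions (1.0 is a perfect fit)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up actual vs. predicted OPS values.
actual = [0.700, 0.800, 0.900]
predicted = [0.720, 0.790, 0.880]

error = mean_squared_error(actual, predicted)  # about 0.0003
fit = r2_score(actual, predicted)              # about 0.955
```

Because OPS values sit in a narrow range, MSE values look tiny in absolute terms, so R² is often the easier number to read when comparing models.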
Challenges We Ran Into
- API Rate Limits: Fetching large volumes of player data was slow under the API's rate limits, so we had to batch our requests.
- Missing Data Handling: Some historical data had gaps, requiring imputation techniques.
- Model Performance: Achieving high accuracy in OPS prediction was challenging due to the variability in player performance.
- Vertex AI Integration: Setting up permissions and authentication for Google Cloud services required troubleshooting.
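The batching approach mentioned for API rate limits can be sketched as a loop that requests players in fixed-size groups and pauses between groups. The `fetch_batch` callable, the batch size, and the pause length are all placeholders here; a real client call and the API's actual limits would replace them.

```python
import time

def fetch_in_batches(ids, fetch_batch, batch_size=50, pause_seconds=1.0, sleep=time.sleep):
    """Fetch records for `ids` in fixed-size batches, pausing between batches."""
    results = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        results.extend(fetch_batch(batch))
        # Pause only when another batch is still to come.
        if start + batch_size < len(ids):
            sleep(pause_seconds)
    return results

# Hypothetical stand-in for a real API call.
def fake_fetch(batch):
    return [{"player_id": pid} for pid in batch]

pauses = []  # records requested pauses instead of actually sleeping
rows = fetch_in_batches(list(range(7)), fake_fetch,
                        batch_size=3, pause_seconds=0.5, sleep=pauses.append)
```

Passing `sleep` as a parameter makes the pacing easy to test without waiting; in normal use the default `time.sleep` applies.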
Accomplishments That We're Proud Of
✅ Successfully aggregated and processed thousands of player records across multiple seasons.
✅ Trained and deployed multiple ML models and compared them to find the most accurate OPS predictor.
✅ Built a fully functional pipeline that scales with Google Cloud Vertex AI.
✅ Generated OPS predictions for the 2024 MLB season, providing insights for scouting and player development.
What We Learned
🎯 Feature Engineering Matters: Carefully chosen derived metrics such as BB/K and ISO had a clear effect on model accuracy.
🚀 Cloud AI Scaling: Using Google Cloud Vertex AI enabled more efficient model training and deployment.
📊 Interpreting ML Results: Understanding how model predictions align with real-world baseball performance is crucial for practical application.
🔧 Troubleshooting Google Cloud Issues: Managing permissions and API access was a key learning experience.
What’s Next for MLB Prospect AI
🔹 Expand Features: Incorporate defensive metrics and advanced sabermetrics for a holistic player evaluation.
🔹 Real-time Predictions: Implement live tracking of player stats to update forecasts dynamically.
🔹 Visualization Dashboard: Develop an interactive web app to visualize player comparisons and career projections.
🔹 Team-Specific Insights: Offer tailored recommendations for MLB teams based on their roster needs and player development strategies.
This project demonstrates how machine learning and cloud AI can support MLB player scouting and performance forecasting with data-driven OPS projections. 🚀⚾
Built With
- google-cloud
- google-colab
- mlb-stats-api
- pandas
- python
- scikit-learn
- tensorflow/keras
- vertex
- xgboost