Inspiration

The MLB™ Prospect Predictor grew out of a desire to move beyond purely subjective scouting reports toward a more data-driven way of evaluating baseball prospects. Scouts provide invaluable insights, but their assessments are prone to bias. This project aims to complement traditional scouting with objective, quantitative projections of future performance, helping teams, analysts, and fans make more informed decisions. The growing availability of detailed minor league statistics, together with advances in machine learning, makes this kind of predictive platform feasible and potentially impactful.

What it does

The MLB™ Prospect Predictor analyzes current performance metrics, compares prospects to historical players with similar trajectories, and uses machine learning algorithms to project future MLB performance. It provides users with a personalized dashboard to track their favorite prospects, receive updates, compare players, and access detailed reports. The platform offers insights into a prospect's potential career WAR, home runs, ERA, and other key statistics, giving users a data-driven perspective on their future in the league.
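
To make the idea of comparing a prospect to similar historical players concrete, here is a minimal sketch of a nearest-neighbor lookup. It is an illustration under stated assumptions, not the platform's actual model: the function names, the three features, and every number are invented, and the features are assumed to be standardized already so that no single one dominates the distance.

```python
from math import dist
from statistics import mean


def nearest_comps(prospect_features, historical, k=2):
    """Return the k historical players closest to the prospect (Euclidean distance)."""
    ranked = sorted(historical, key=lambda p: dist(prospect_features, p["features"]))
    return ranked[:k]


def project_from_comps(comps, outcome="career_war"):
    """Project an outcome as the plain average over the comparable players."""
    return mean(p[outcome] for p in comps)


# Invented, pre-standardized features: (age relative to level, walk rate, isolated power).
historical = [
    {"name": "Historical 1", "features": (0.1, 0.9, 0.4), "career_war": 12.0},
    {"name": "Historical 2", "features": (2.0, -1.0, 0.0), "career_war": 1.0},
    {"name": "Historical 3", "features": (-0.2, 1.2, 0.9), "career_war": 20.0},
]
prospect = (0.0, 1.0, 0.5)

comps = nearest_comps(prospect, historical, k=2)
print([p["name"] for p in comps])
print(project_from_comps(comps))
```

A real system would likely weight features, compare whole trajectories rather than a single season, and feed the comparables into a trained model instead of taking a plain average.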

How we built it

Python was the primary language for data analysis and machine learning, using libraries such as scikit-learn and TensorFlow/PyTorch. We used Cloud SQL (PostgreSQL) for data storage and GCP services such as Cloud Functions and App Engine for the backend. The front end was built with React, and the data visualizations were built with D3.js and Chart.js so that users can read complex data at a glance. We integrated publicly available baseball statistics and explored incorporating commercial scouting data.

Challenges we ran into

One of the biggest challenges was data quality and availability, especially for minor league players. Different leagues collect different statistics, and historical data can be incomplete. Another challenge was feature engineering – identifying the most predictive variables and creating new features from existing ones. Model selection and hyperparameter tuning also required significant effort. Finally, effectively communicating the uncertainty inherent in projections was crucial to avoid overstating the platform's predictive power.
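
One simple way to communicate uncertainty is to report a range alongside the point estimate. The sketch below assumes a list of career outcomes for comparable players is available and summarizes it as a median with an empirical 10th to 90th percentile range. The function name and the WAR values are invented for illustration; this is not the platform's actual method.

```python
from statistics import median, quantiles


def projection_range(outcomes):
    """Summarize outcomes as a median plus an empirical 10th-90th percentile range."""
    # n=10 yields nine cut points (10th, 20th, ..., 90th percentile).
    # method="inclusive" interpolates within the observed data and never extrapolates.
    deciles = quantiles(outcomes, n=10, method="inclusive")
    return {"low": deciles[0], "median": median(outcomes), "high": deciles[-1]}


# Invented career WAR totals for a set of comparable players.
comp_outcomes = [0.0, 0.5, 1.2, 3.0, 4.1, 7.5, 12.0, 20.3]

summary = projection_range(comp_outcomes)
print(
    f"Projected career WAR: {summary['median']:.1f} "
    f"(80% range: {summary['low']:.1f} to {summary['high']:.1f})"
)
```

Showing the range next to the median makes it clear that a prospect with a modest central projection can still have a wide spread of plausible careers.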

Accomplishments that we're proud of

We are proud of developing a functional platform that integrates diverse data sources and utilizes machine learning to generate meaningful projections. We successfully built a user-friendly dashboard that allows users to easily track and compare prospects. We also made progress in addressing the challenge of data heterogeneity by developing methods to normalize and compare statistics across different leagues.
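
As an illustration of normalizing statistics across leagues, the sketch below converts a raw statistic into a z-score relative to the player's own league, so that the same raw value counts for more in a league where it is harder to reach. This is one common approach and is assumed here for illustration; the write-up does not specify the method used. The league names, field names, and numbers are invented.

```python
from collections import defaultdict
from statistics import mean, pstdev


def league_z_scores(rows, stat):
    """Return each player's z-score for `stat` relative to their own league."""
    by_league = defaultdict(list)
    for row in rows:
        by_league[row["league"]].append(row[stat])

    # League-level mean and population standard deviation.
    baselines = {lg: (mean(vals), pstdev(vals)) for lg, vals in by_league.items()}

    scores = {}
    for row in rows:
        mu, sigma = baselines[row["league"]]
        # Guard against a league where every player has the same value.
        scores[row["player"]] = 0.0 if sigma == 0 else (row[stat] - mu) / sigma
    return scores


# Invented data: League X is hitter-friendly, League Y is pitcher-friendly.
rows = [
    {"player": "A", "league": "League X", "ops": 0.900},
    {"player": "B", "league": "League X", "ops": 0.800},
    {"player": "C", "league": "League X", "ops": 0.700},
    {"player": "D", "league": "League Y", "ops": 0.800},
    {"player": "E", "league": "League Y", "ops": 0.700},
    {"player": "F", "league": "League Y", "ops": 0.600},
]

z = league_z_scores(rows, "ops")
print(z["B"], z["D"])  # same raw OPS, different standing within each league
```

Players B and D post the same raw OPS, but D stands well above the League Y average while B sits at the League X average, which is the distinction the normalization is meant to surface.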

What we learned

Through this project, we learned the importance of careful data preprocessing and feature engineering in machine learning. We gained experience in building and deploying machine learning models on GCP. We also learned the value of iterative development and user feedback in building a successful product. Finally, we gained a deeper understanding of the complexities of baseball statistics and the challenges of predicting future performance.

What's next for MLB™ Prospect Predictor

Future development could include:

  • Integrating more data sources, such as scouting reports and biomechanical data.
  • Developing more sophisticated machine learning models, potentially incorporating deep learning techniques.
  • Personalizing the platform further based on user preferences and roles (e.g., scout, analyst, fan).
  • Expanding the platform to cover international prospects and amateur players.
  • Adding features for simulating player development and exploring different scenarios.
  • Developing an API to allow other applications to access the projections.
