Inspiration

Scouting and player performance prediction have always been essential for team management and recruitment in professional sports. Traditionally, these decisions relied on human intuition and past game statistics, which can be subjective and inconsistent. Inspired by the growing power of AI & Machine Learning, we aimed to build an automated, data-driven solution that can predict a player’s future performance based on historical trends and advanced analytics.

Our goal was to develop a scalable, accurate, and automated system that could help in projecting career impact, identifying future stars, and optimizing player scouting decisions.


What We Learned

Throughout this project, we explored cutting-edge AI and cloud computing techniques. Some key takeaways include:

  • AutoML vs. LSTM Models: Google’s AutoML provided a quick solution, but LSTM allowed for deeper customization in time-series forecasting.
  • Data Processing & Cloud Pipelines: Cleaning and handling large datasets is as crucial as model accuracy. We built an automated Google Cloud Storage pipeline to continuously fetch, clean, and store player data for real-time analysis.
  • Deploying ML Models: Google Vertex AI made it easier to deploy and scale models, allowing API-based predictions.
  • Full-Stack Development: Using Next.js + Tailwind CSS, we created an interactive and user-friendly web app that delivers insights dynamically.

How We Built the Project

1️⃣ Data Collection & Preprocessing

  • Gathered historical & real-time player stats from Google Cloud Storage.
  • Cleaned the data to remove inconsistencies, format player names, and normalize key attributes.
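
The cleaning step above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline code: the function names, the "Last, First" input format, and the choice of min-max scaling are all assumptions made for the example.

```python
import re
import unicodedata


def clean_player_name(raw: str) -> str:
    """Normalize a raw player name (hypothetical helper, not from the project).

    Assumes names may arrive with stray whitespace or in "Last, First" order.
    """
    name = unicodedata.normalize("NFKC", raw)
    name = re.sub(r"\s+", " ", name).strip()
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}".strip()
    return name


def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale values to the range [0, 1]; min-max scaling is an assumed choice."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `clean_player_name("  Ohtani,   Shohei ")` gives `"Shohei Ohtani"`, and the same scaling function could be applied to each numeric column (such as exit velocity) before training.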

2️⃣ Machine Learning Models

  • AutoML Regression Model (Google Vertex AI) → Trained to predict WAR (Wins Above Replacement) from Exit Velocity, Launch Angle, and Hit Distance.
  • LSTM Time-Series Model (TensorFlow) → Used for long-term career projection based on multi-year WAR trends.
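
A time-series model like the LSTM above is usually trained on sliding windows of past seasons, with the following season as the target. The sketch below shows that preparation step in plain Python. The function name and the window length are assumptions for illustration, and the write-up does not say how the project actually framed its training data.

```python
def make_windows(
    war_by_season: list[float], window: int
) -> tuple[list[list[float]], list[float]]:
    """Split one player's season-by-season WAR into (inputs, targets).

    Each input is `window` consecutive seasons; the target is the next season.
    Hypothetical helper: the window length is a tunable assumption.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    inputs: list[list[float]] = []
    targets: list[float] = []
    for start in range(len(war_by_season) - window):
        inputs.append(war_by_season[start:start + window])
        targets.append(war_by_season[start + window])
    return inputs, targets
```

Before being fed to a Keras LSTM layer, the inputs would typically be reshaped into a 3D array of shape (samples, window, 1), since that layer expects (batch, timesteps, features).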

3️⃣ Cloud-Based Data Pipeline

  • Google Cloud Storage: Stores live and historical player data.
  • BigQuery: Processes large datasets for better analytics.
  • Cloud Functions: Automates daily updates to keep predictions fresh.

4️⃣ Model Deployment & API

  • Google Vertex AI Endpoint: Hosted the trained model and allowed API-based real-time predictions.
  • Flask API: Built a backend service to expose ML predictions.
  • Postman Testing: Verified API functionality.

5️⃣ Web Application

  • Next.js + Tailwind CSS: Built an interactive UI for users to explore predictions.
  • GraphQL API: Enhanced data retrieval for faster and optimized queries.
  • CI/CD (GitHub Actions + Kubernetes): Automated deployments for reliability and scalability.

Challenges We Faced

🚧 Handling Large-Scale Data Processing → Optimized data pipeline using BigQuery & Cloud Storage to manage thousands of records.
🚧 Ensuring Real-Time Model Predictions → Integrated Google Vertex AI for scalable ML inference.
🚧 Frontend & Backend Synchronization → Used GraphQL APIs to enable fast & optimized queries for real-time player analytics.

Despite the challenges, we successfully built a robust ML-powered scouting system that predicts future player performance with high accuracy.

Built With

  • bigquery
  • cloud-functions
  • docker
  • fastapi
  • flask
  • github
  • google-automl
  • google-cloud-endpoints
  • google-cloud-storage
  • graphql
  • javascript
  • kubernetes
  • lstm
  • next.js
  • python
  • tailwind-css
  • tensorflow
  • typescript
  • vertex-ai