How we built it

We built PitchSafe as a full pipeline:

  1. Data Engineering

Using Statcast data from 2021–2023, we computed rolling acute (3-day) and chronic (14-day) performance metrics. We calculated deltas, slopes, release-point changes, velocity shifts, and recovery-time features. Missing values were imputed, and features were scaled.
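A minimal sketch of the acute/chronic windows, assuming one value per calendar day for a single pitcher. Pitch count is used here as a hypothetical workload metric, since the writeup does not list which metrics were rolled. The acute:chronic ratio is one common way to express a workload delta, not necessarily the exact feature PitchSafe computes.

```python
from collections import deque
from typing import List, Optional, Tuple


def rolling_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing mean over the last `window` days; None until the window is full."""
    out: List[Optional[float]] = []
    buf: deque = deque()
    total = 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the day that fell out of the window
        out.append(total / window if len(buf) == window else None)
    return out


def acute_chronic(
    daily_pitches: List[float], acute: int = 3, chronic: int = 14
) -> List[Tuple[Optional[float], Optional[float], Optional[float]]]:
    """Per day: (acute mean, chronic mean, acute:chronic ratio).

    The ratio is None while either window is incomplete or the chronic mean is 0.
    """
    a = rolling_mean(daily_pitches, acute)
    c = rolling_mean(daily_pitches, chronic)
    rows = []
    for am, cm in zip(a, c):
        ratio = am / cm if am is not None and cm else None
        rows.append((am, cm, ratio))
    return rows
```

For example, a pitcher idle for 11 days who then throws 30 pitches on three straight days ends with an acute mean of 30 against a chronic mean of about 6.4, a ratio near 4.7: exactly the kind of sudden trend a single-game view would miss.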

  2. Machine Learning (XGBoost)

We trained an XGBoost classifier on game-level data labeled by whether the pitcher was placed on the injured list (IL) the following day. It outputs an injury probability between 0 and 1.
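The label definition can be sketched as follows, assuming games arrive as hypothetical `(pitcher_id, game_date)` pairs and IL placements as a per-pitcher set of dates. Neither structure is specified in the writeup.

```python
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple


def label_games(
    games: List[Tuple[str, date]], il_placements: Dict[str, Set[date]]
) -> List[int]:
    """Label a game 1 if its pitcher was placed on the IL the following day, else 0."""
    labels = []
    for pitcher_id, game_day in games:
        next_day = game_day + timedelta(days=1)
        placed = next_day in il_placements.get(pitcher_id, set())
        labels.append(1 if placed else 0)
    return labels
```

Because only the game immediately before a placement gets a positive label, the resulting target is very sparse.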

  3. Backend

The backend, built with Node + Express:

  • stores game logs in Postgres
  • handles file uploads
  • calls our Python inference API
  • now includes a Claude service that generates explanations
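One plausible shape for the Node-to-Python inference contract is sketched below. The JSON field names and `FEATURE_ORDER` are hypothetical. The point is that the service should reject incomplete requests and pass features to the model in the same column order used at training time. `predict_proba` is any callable returning a probability. With an `xgboost.XGBClassifier` named `model`, it could be `lambda row: float(model.predict_proba(np.array([row]))[0][1])`.

```python
import json
from typing import Callable, List, Sequence

# Hypothetical feature names; the real training columns are not listed in the writeup.
FEATURE_ORDER: List[str] = ["acute_pitches", "chronic_pitches", "acwr", "velo_delta", "days_rest"]


def handle_predict(body: str, predict_proba: Callable[[Sequence[float]], float]) -> str:
    """Parse a JSON request from the Node backend and return a JSON risk response."""
    payload = json.loads(body)
    features = payload.get("features", {})
    missing = [name for name in FEATURE_ORDER if name not in features]
    if missing:
        return json.dumps({"error": f"missing features: {missing}"})
    # Build the row in training-column order so the model sees features where it expects them.
    row = [float(features[name]) for name in FEATURE_ORDER]
    return json.dumps({"pitcherId": payload.get("pitcherId"), "injuryProbability": predict_proba(row)})
```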

  4. Claude AI Layer

We added a /api/injury/:pitcherId/explanation endpoint that:

  • fetches the pitcher's recent starts
  • gets the injury risk from our ML API
  • sends all of the structured data to Claude
  • returns a natural-language scouting report
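The endpoint's four steps are sketched below in Python for consistency with the other examples, although the real endpoint runs in Node + Express. The three data sources are injected as hypothetical callables. In practice, `ask_llm` would wrap a call such as `client.messages.create(...)` from Anthropic's SDK. The prompt wording is illustrative, not PitchSafe's actual prompt.

```python
import json
from typing import Any, Callable, Dict, List


def build_prompt(pitcher_id: str, starts: List[Dict[str, Any]], risk: float) -> str:
    """Pack structured data into a short, tightly constrained prompt."""
    return (
        "You are a baseball performance analyst writing for a pitching coach.\n"
        "In at most 4 bullet points, explain the injury risk and suggest a workload action.\n"
        f"Pitcher: {pitcher_id}\n"
        f"Model injury probability: {risk:.2f}\n"
        f"Recent starts (JSON): {json.dumps(starts)}"
    )


def explain_pitcher(
    pitcher_id: str,
    fetch_recent_starts: Callable[[str], List[Dict[str, Any]]],
    get_risk: Callable[[str], float],
    ask_llm: Callable[[str], str],
) -> Dict[str, Any]:
    starts = fetch_recent_starts(pitcher_id)                   # 1. recent starts
    risk = get_risk(pitcher_id)                                # 2. probability from the ML API
    report = ask_llm(build_prompt(pitcher_id, starts, risk))   # 3. structured data to the LLM
    return {"pitcherId": pitcher_id, "risk": risk, "report": report}  # 4. scouting report
```

Passing the model's probability into the prompt, rather than asking the LLM to judge risk itself, keeps Claude in the role of explaining the model's output instead of replacing it.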

  5. Frontend

A React (Vite) dashboard showing:

  • a team-wide injury-risk spectrum
  • a detailed per-pitcher risk view
  • an “Explain with Claude” button that reveals the AI analysis
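The data behind a team-wide spectrum can be as simple as the roster sorted by predicted probability, with each pitcher tagged with a color band. This logic is sketched in Python to match the other examples. A React component would apply the same steps before handing the rows to a chart. The cut points are hypothetical.

```python
from typing import Dict, List, Tuple


def band(p: float, cuts: Tuple[float, float] = (0.33, 0.66)) -> str:
    """Map a probability to a color band; the default cut points are hypothetical."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"not a probability: {p}")
    if p < cuts[0]:
        return "low"
    if p < cuts[1]:
        return "elevated"
    return "high"


def spectrum(risks: Dict[str, float]) -> List[Tuple[str, float, str]]:
    """Roster ordered from highest to lowest risk, each pitcher tagged with a band."""
    rows = [(pid, p, band(p)) for pid, p in risks.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```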

Challenges we ran into

Building a clean pipeline from raw Statcast data: Daily game logs are messy, incomplete, and inconsistent across seasons, so we had to design a robust aggregation and rolling-metrics pipeline.
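One aggregation step a pipeline like this needs is sketched below, assuming pitch-level input (Statcast exports one row per pitch) reduced to `(game_date, value)` pairs. A 3-day window only means "the last 3 calendar days" if days without an appearance exist as explicit rows. This sketch treats those days as zero workload, which is a modeling assumption distinct from imputing genuinely missing measurements such as release point.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple


def daily_series(pitches: Iterable[Tuple[date, float]], start: date, end: date) -> List[float]:
    """Sum per-pitch values by date and emit one entry per calendar day, 0.0 on rest days."""
    totals: Dict[date, float] = defaultdict(float)
    for game_date, value in pitches:
        totals[game_date] += value
    n_days = (end - start).days + 1
    return [totals.get(start + timedelta(days=i), 0.0) for i in range(n_days)]
```

Passing a value of 1.0 per pitch yields a daily pitch count, which can feed straight into the rolling windows.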

Getting the ML model to generalize: Injury labels are extremely imbalanced, because IL placements are rare compared with games pitched. We tuned XGBoost and engineered features that capture workload trends instead of raw stats.
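The writeup does not say exactly how imbalance was handled. One standard XGBoost lever is the `scale_pos_weight` parameter, which XGBoost's documentation suggests starting near sum(negatives) / sum(positives). A minimal sketch of that computation:

```python
from typing import Sequence


def scale_pos_weight(labels: Sequence[int]) -> float:
    """Negative-to-positive ratio, a typical starting value for XGBoost's scale_pos_weight."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0:
        raise ValueError("no positive labels; cannot weight the minority class")
    return neg / pos
```

The result would be passed as `xgboost.XGBClassifier(scale_pos_weight=w)`. With rare positives, a precision-recall metric such as `eval_metric="aucpr"` is usually more informative than accuracy.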

LLM prompt engineering: Coaches need straight answers, not verbose paragraphs. Getting Claude to be concise, useful, and baseball-aware required several prompt iterations.

System integration complexity: We connected Python ML services, a Node backend, a React frontend, Postgres, and Claude. Getting all the parts to communicate reliably took careful API design.

Accomplishments that we're proud of

Built a full end-to-end injury-prediction system using real MLB data.

Created a visually intuitive injury-risk spectrum that lets staff see risk at a glance.

Integrated Claude AI to produce coach-ready workload recommendations.

Achieved a fast, production-ready inference pipeline deployable on any team laptop or cloud stack.

Designed a user experience that feels like a legitimate professional sports analytics tool.

What we learned

Injury prediction is fundamentally a trend-tracking problem, not a single-game problem; rolling windows are everything.

Clean feature engineering often matters more than model complexity.

LLMs are extremely powerful when used as explainability layers, not just chatbots.

Good UX matters: coaches won’t interpret raw CSVs, but they will trust a clean dashboard with clear explanations.

Integrating cloud-scale AI tools into traditional ML pipelines opens up new workflows we hadn’t considered before.

What's next for PitchSafe

  1. API-based data ingestion

Move from CSV uploads to fully automated Statcast API streaming.

  2. Predicting injury types

Extend model output to classify likely injury categories (elbow, shoulder, etc.).

  3. Claude-driven conversational analytics

Enable coaches to “ask questions” about their roster: “Which pitcher has rising fatigue after three consecutive high-velocity outings?”

  4. Extension + SDK

Turn PitchSafe into a library + dashboard plug-in for analytics departments.

  5. Team-level deployment

Develop live, in-dugout dashboards with real-time model updates.

Built With

  • React (component-based UI)
  • Vite (fast development build tool)
  • Recharts (data visualization)
  • Jest (testing framework; 300+ tests)
  • Node.js
  • Python
  • scikit-learn (injury-prediction models)
  • PyTorch
  • Supabase