How we built it
We built PitchSafe as a full pipeline:
- Data Engineering
Using Statcast data from 2021–2023, we computed rolling acute (3-day) and chronic (14-day) performance metrics
. We calculated deltas, slopes, release-point changes, velocity shifts, and recovery time features. Missing values were imputed and scaled.
- Machine Learning (XGBoost)
We trained an XGBoost classifier on game-level data with injury labels (whether a player was placed on the IL the following day). It outputs an injury probability from 0 to 1.
- Backend
Built with Node + Express:
stores game logs in Postgres
handles file uploads
calls our Python inference API
now includes a Claude service to generate explanations
- Claude AI Layer
We added a /api/injury/:pitcherId/explanation endpoint that:
fetches recent starts
gets injury risk from our ML API
sends all structured data to Claude
returns a natural-language scouting report
- Frontend
React (Vite) dashboard showing:
teamwide injury spectrum
per-pitcher detailed risk view
“Explain with Claude” button that reveals AI analysis
Challenges we ran into
Building a clean pipeline from raw Statcast data: Daily game logs are messy, missing, and inconsistent across seasons. We had to design a robust aggregation and rolling-metrics pipeline.
Getting the ML model to generalize: Injury prediction is extremely imbalanced. We tuned XGBoost and engineered features that captured workload trends instead of raw stats.
LLM prompt engineering: Coaches need straight answers, not verbose paragraphs. Getting Claude to be concise, useful, and baseball-aware required several prompt iterations.
System integration complexity: We connected Python ML services, a Node backend, React frontend, Postgres, and now Claude... getting all parts to communicate reliably took careful API design.
Accomplishments that we're proud of
Built a full end-to-end injury-prediction system using real MLB data.
Created a visually intuitive injury-risk spectrum that lets staff see risk at a glance.
Integrated Claude AI to produce coach-ready workload recommendations.
Achieved a fast, production-ready inference pipeline deployable on any team laptop or cloud stack.
Designed a user experience that feels like a legitimate professional sports analytics tool.
What we learned
Injury prediction is fundamentally a trend-tracking problem, not a single-game problem, rolling windows are everything.
Clean feature engineering often matters more than model complexity.
LLMs are extremely powerful when used as explainability layers, not just chatbots.
Good UX matters!! coaches won’t interpret raw CSVs, but they’ll trust a clean dashboard with clear explanations.
Integrating cloud-scale AI tools into traditional ML pipelines opens up new workflows we hadn’t considered before.
What's next for PitchSafe
- API-based data ingestion
Move from CSV uploads to fully automated Statcast API streaming.
- Predicting injury types
Extend model output to classify likely injury categories (elbow, shoulder, etc.).
- Claude-driven conversational analytics
Enable coaches to “ask questions” about their roster: “Which pitcher has rising fatigue after three consecutive high-velocity outings?”
- Extension + SDK
Turn PitchSafe into a library + dashboard plug-in for analytics departments.
- Team-level deployment
Develop live, in-dugout dashboards with real-time model updates.
Built With
- component-based-ui-vite-?-fast-development-build-tool-recharts-?-data-visualization-and-analytics-clean-architecture-?-separation-of-concerns
- jest
- node.js
- python
- pytorch
- scalable-code-jest-?-comprehensive-testing-framework-(300+-tests)-python/scikit-learn-?-machine-learning-injury-prediction-models-frontend-react-?-modern
- supabase
- testable
- vite
Log in or sign up for Devpost to join the conversation.