Inspiration
Baseball has a rich history, and many fans want to analyze past games to uncover hidden insights. While modern Statcast provides real-time advanced metrics, historical game footage lacks this level of detail. Our inspiration was to bridge the gap by using computer vision and AI to extract key Statcast metrics (such as pitch speed and exit velocity) from archival MLB game videos.
What it does
Our tool processes old baseball game footage to extract fundamental Statcast metrics. It detects the baseball in motion, estimates pitch speed and exit velocity from the ball's movement, and reads on-screen data using OCR. This allows fans, analysts, and historians to analyze past performances with modern analytics.
How we built it
We structured the project into multiple components:
- Video Processing: Using OpenCV and YOLOv8 for frame extraction and object detection.
- Statcast Metric Calculation: Tracking ball movement between frames to estimate speed.
- OCR Processing: Extracting on-screen data from broadcasts using Tesseract OCR.
- Integration: A Python-based pipeline that combines all components into a single workflow.
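The metric calculation step can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it assumes the detector has already produced one ball center per consecutive frame, and that a pixel-to-feet scale has been calibrated from a known distance in the frame (for example, the 60 ft 6 in between the pitching rubber and home plate). The function name `estimate_speed_mph` and the `feet_per_pixel` parameter are hypothetical.

```python
from math import hypot

FEET_PER_MILE = 5280.0
SECONDS_PER_HOUR = 3600.0


def estimate_speed_mph(centers, fps, feet_per_pixel):
    """Average speed over a run of per-frame ball centers.

    centers        -- list of (x, y) pixel positions, one per consecutive frame
    fps            -- frames per second of the source video
    feet_per_pixel -- assumed calibration factor (hypothetical parameter)
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if len(centers) < 2:
        raise ValueError("need at least two detections")

    # Sum the straight-line pixel distance between consecutive detections.
    pixels = sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(centers, centers[1:])
    )

    # N detections span N - 1 frame intervals.
    seconds = (len(centers) - 1) / fps
    feet_per_second = pixels * feet_per_pixel / seconds
    return feet_per_second * SECONDS_PER_HOUR / FEET_PER_MILE
```

Because this measures motion in the 2D image plane only, it will underestimate speed whenever the ball travels toward or away from the camera, so the result is best treated as a rough estimate.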
Challenges we ran into
- Retrieving Accurate FPS: Some videos had metadata issues, resulting in a zero FPS value.
- Ball Detection in Low-Quality Footage: Older footage lacks the clarity of modern broadcasts, making detection difficult.
- OCR Accuracy: Extracting text from on-screen graphics required optimizing preprocessing techniques.
- Tesseract Setup Issues: Ensuring Tesseract OCR was properly installed and configured on different systems.
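One way to guard against the zero-FPS problem described above is to validate the reported value and fall back to a default. The sketch below is an assumption about how such a guard could look, not the project's actual fix. `resolve_fps`, `read_fps`, and the 30 fps default are all hypothetical choices; `cv2.VideoCapture.get(cv2.CAP_PROP_FPS)` is the standard OpenCV call for reading the frame rate.

```python
import math

DEFAULT_FPS = 30.0  # assumed fallback; many NTSC broadcasts are closer to 29.97


def resolve_fps(reported_fps, fallback=DEFAULT_FPS):
    """Return reported_fps if it is usable, otherwise the fallback."""
    if reported_fps is None or math.isnan(reported_fps) or reported_fps <= 0:
        return fallback
    return float(reported_fps)


def read_fps(video_path, fallback=DEFAULT_FPS):
    """Read the frame rate from a video file, guarding against bad metadata."""
    import cv2  # imported lazily so resolve_fps works without OpenCV installed

    capture = cv2.VideoCapture(video_path)
    try:
        return resolve_fps(capture.get(cv2.CAP_PROP_FPS), fallback)
    finally:
        capture.release()
```

Since every speed estimate scales directly with the frame rate, a wrong fallback skews all results by the same factor, so it is worth passing the known broadcast rate as `fallback` whenever it is available.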
Accomplishments that we're proud of
- Successfully implemented a pipeline that extracts pitch speed and exit velocity from old game footage.
- Integrated real-time OCR processing to capture on-screen stats dynamically.
- Worked around video processing problems, such as videos reporting a zero FPS value, so that metrics could still be extracted.
What we learned
- Advanced video processing techniques for object tracking and movement detection.
- How to fine-tune OCR preprocessing to improve text extraction from broadcast graphics.
- Challenges of working with legacy sports footage and how to adapt AI models accordingly.
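As one illustration of the OCR lesson above, text read from broadcast graphics often needs cleanup after recognition as well as preprocessing before it. The sketch below is hypothetical: it assumes the graphic shows a speed followed by "MPH", and that the OCR engine sometimes confuses digits with similar-looking letters. The function name, the lookalike table, and the plausible speed range are all illustrative assumptions.

```python
import re

# Letters that OCR may confuse with digits in stylized broadcast fonts
# (an assumed, non-exhaustive mapping).
DIGIT_LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)

# Two or three digit-like characters, not glued to a preceding word,
# followed by "MPH" in any letter case.
SPEED_PATTERN = re.compile(
    r"(?<![A-Za-z0-9])([0-9OoIlSB]{2,3})\s*[Mm][Pp][Hh]"
)


def parse_pitch_speed(ocr_text, low=40, high=110):
    """Return the pitch speed in mph found in ocr_text, or None."""
    match = SPEED_PATTERN.search(ocr_text)
    if match is None:
        return None
    speed = int(match.group(1).translate(DIGIT_LOOKALIKES))
    # Reject values outside a plausible range for a pitch.
    return speed if low <= speed <= high else None
```

For example, `parse_pitch_speed("FASTBALL 9S MPH")` returns `95`, while text with no recognizable speed returns `None`.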
What's next for MLB_Stats_from_Old_Games
- Enhancing Object Detection: Fine-tuning YOLO models specifically on baseball footage.
- Automated Metadata Retrieval: Linking extracted data to official MLB records for validation.
- Expanding Metrics: Adding more Statcast metrics like launch angle and spin rate.
- User-Friendly Interface: Developing a web-based dashboard for easy access and visualization.
- Scaling to Other Sports: Adapting this approach to extract advanced stats from other historical sports footage.