Statcast Vision

Inspiration

Baseball has always been a game of numbers, and modern technology has changed how players and analysts evaluate performance. Inspired by advanced baseball analytics and the growing role of AI in sports, we set out to build a deep learning model that predicts three key hitting metrics (Exit Velocity, Launch Angle, and Hit Distance) from short MLB video clips alone. Our goal was to bridge the gap between video-based analysis and traditional baseball statistics, making it easier for players, coaches, and analysts to extract insights from raw footage.

What it does

Statcast Vision takes a short MLB video clip and predicts three hitting metrics: Exit Velocity, Launch Angle, and Hit Distance. The system samples frames from the clip, extracts visual features from each frame with a pretrained CNN, and passes the resulting sequence through an LSTM that outputs the three values.

How we built it

Dataset Collection & Preprocessing
- We compiled a dataset of 16.5K historical home run videos (~300GB in total). Each record has 5 fields: play_id, Exit Velocity, Hit Distance, Launch Angle, and video_url.
- Given storage and memory limitations, we processed videos in batches of 50 using multi-threading.
- Each video (20-40 sec, ~20MB) was converted into 60 equidistant frames, resized to (224, 224, 3), and normalized for feature extraction.
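
The sampling and batching steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: the helper names (`equidistant_indices`, `batched`, `plan_clip`, `process_in_batches`) and the worker count are hypothetical, and the per-clip worker only plans which frames to keep. A real worker would also download the clip, decode those frames, resize them to 224x224, and normalize the pixel values.

```python
from concurrent.futures import ThreadPoolExecutor


def equidistant_indices(total_frames, num_samples=60):
    """Return num_samples frame indices spread evenly from first to last frame."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples == 1:
        return [0]
    last = total_frames - 1
    return [round(i * last / (num_samples - 1)) for i in range(num_samples)]


def batched(items, batch_size=50):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def plan_clip(clip):
    """Hypothetical per-clip worker: (play_id, total_frames) -> (play_id, indices)."""
    play_id, total_frames = clip
    return play_id, equidistant_indices(total_frames)


def process_in_batches(clips, batch_size=50, workers=8):
    """Process clips one batch at a time so only one batch is in flight."""
    results = []
    for batch in batched(clips, batch_size):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results.extend(pool.map(plan_clip, batch))
    return results


if __name__ == "__main__":
    # 900 frames corresponds to a 30-second clip at an assumed 30 fps.
    clips = [(f"play_{i}", 900) for i in range(120)]
    planned = process_in_batches(clips)
    print(len(planned), planned[0][1][:3], planned[0][1][-1])
```

Finishing each batch before starting the next keeps peak memory bounded by the batch size rather than by the size of the dataset.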

Model Architecture

We leveraged a hybrid CNN-LSTM model:

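 from tensorflow.keras.applications import ResNet50
 from tensorflow.keras.layers import Dense, Dropout, Flatten, LSTM, TimeDistributed
 from tensorflow.keras.models import Model, Sequential
 from tensorflow.keras.optimizers import Adam

 # Pre-trained 2D CNN used as a frozen per-frame feature extractor
 base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
 base_model.trainable = False  # Freeze pre-trained layers
 # Flatten turns the (7, 7, 2048) ResNet50 output into one feature vector per frame
 feature_extractor = Model(inputs=base_model.input, outputs=Flatten()(base_model.output))

 # Define the LSTM-based regression model.
 # The sequence length (50) must match the number of frames sampled per clip.
 model = Sequential([
     TimeDistributed(feature_extractor, input_shape=(50, 224, 224, 3)),  # Apply CNN to each frame
     LSTM(256, return_sequences=False),  # LSTM processes the per-frame features in order
     Dense(512, activation='relu'),
     Dropout(0.3),
     Dense(3)  # Predict ExitVelocity, HitDistance, LaunchAngle
 ])

 # Mean squared error over the three regression targets
 model.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=['mae'])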

- We used ResNet50, a 2D CNN pretrained on ImageNet, as a per-frame feature extractor to avoid the high computational cost of 3D CNNs.
- The LSTM layer models temporal dependencies across the sequence of frame features.
- We trained the model on 550 videos; over the course of training, the MSE loss fell from about 55K to 131.
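
To put the loss figures in context, here is a rough back-of-the-envelope sketch, not an evaluation of the model. Because the model is compiled with `loss='mse'` over three unscaled targets, the loss is the mean of the squared errors across all three. The "typical" home run values below (103 mph, 400 ft, 28 degrees) and their units are illustrative assumptions, not numbers taken from the dataset.

```python
import math

# Assumed, illustrative home run values; NOT taken from the dataset.
typical = {"ExitVelocity": 103.0, "HitDistance": 400.0, "LaunchAngle": 28.0}


def mse(y_true, y_pred):
    """Mean squared error over the target values of a single sample."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


targets = list(typical.values())

# Loss of a model that predicts 0 for every target.
zero_guess_loss = mse(targets, [0.0, 0.0, 0.0])

# Root-mean-square error implied by the reported final loss of 131.
final_rmse = math.sqrt(131)

print(round(zero_guess_loss))  # 57131
print(round(final_rmse, 1))    # 11.4
```

Under these assumptions, a model that outputs values near zero would score in the same range as the reported starting loss of 55K, and a loss of 131 corresponds to a root-mean-square error of roughly 11 units averaged over the three targets. Hit Distance contributes most of the unscaled loss because its values are the largest, so standardizing the targets before training is one option worth considering.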

Deployment
- We built a React.js frontend for users to upload videos.
- The backend, built with Express.js, handled video processing and model inference.
- We deployed our trained model on Google Vertex AI, ensuring scalable and efficient inference.
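
Whichever service runs inference, the raw model output is a bare vector of three numbers, so the backend has to attach names before returning it to the frontend. The sketch below shows one way to do that. The output order follows the comment on the final `Dense(3)` layer; the function name `label_prediction`, the response shape, and the units (mph, feet, degrees) are assumptions made for illustration.

```python
import json

# Order follows the comment on the model's final Dense(3) layer.
METRIC_ORDER = ("ExitVelocity", "HitDistance", "LaunchAngle")

# Assumed units; the writeup does not state them.
ASSUMED_UNITS = {"ExitVelocity": "mph", "HitDistance": "ft", "LaunchAngle": "deg"}


def label_prediction(raw):
    """Map a raw 3-value model output to a named, JSON-serializable dict."""
    if len(raw) != len(METRIC_ORDER):
        raise ValueError(f"expected {len(METRIC_ORDER)} values, got {len(raw)}")
    return {
        name: {"value": round(float(value), 1), "unit": ASSUMED_UNITS[name]}
        for name, value in zip(METRIC_ORDER, raw)
    }


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(json.dumps(label_prediction([103.24, 410.0, 27.5]), indent=2))
```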

Kaggle Repo

https://www.kaggle.com/code/pushpenderindia/google-mlb-hackathon-2024

Challenges we ran into

- Massive Dataset Handling: With 300GB of video data, we had to optimize our pipeline for batch processing within Kaggle’s memory constraints.
- GPU Limitations: Training with even 10 videos per batch overwhelmed the available GPU, so we had to tune batch sizes and model complexity.
- Feature Extraction Efficiency: 3D CNNs were infeasible on our hardware, so we paired a 2D ResNet50 with an LSTM to balance accuracy and computational cost.
- Long Training Times: Training on only 550 videos took hours, and increasing the training set size remains an ongoing challenge.
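
The arithmetic below is a rough sketch of why a batch of 10 videos is already heavy for this architecture. It assumes float32 tensors and uses the shapes from the model definition: 50 frames of 224x224x3 per video, and a flattened ResNet50 output of 7x7x2048 per frame. It counts only the input tensor and the LSTM weights; the intermediate ResNet50 activations for every frame add considerably more.

```python
FRAMES, HEIGHT, WIDTH, CHANNELS = 50, 224, 224, 3  # input_shape of the model
BYTES_PER_FLOAT32 = 4                              # assumes float32 tensors

# ResNet50 (include_top=False) maps a 224x224 input to a 7x7x2048 feature map.
FEATURES_PER_FRAME = 7 * 7 * 2048


def input_bytes(videos_per_batch):
    """Size in bytes of the raw input tensor for one batch."""
    return videos_per_batch * FRAMES * HEIGHT * WIDTH * CHANNELS * BYTES_PER_FLOAT32


def lstm_params(input_dim, units):
    """Trainable weights in a standard LSTM layer: 4 gates, each with an
    input kernel, a recurrent kernel, and a bias."""
    return 4 * (units * (input_dim + units) + units)


print(input_bytes(10))                       # 301056000 bytes, about 0.3 GB
print(FEATURES_PER_FRAME)                    # 100352
print(lstm_params(FEATURES_PER_FRAME, 256))  # 103023616
```

Most of the roughly 103 million LSTM weights come from the 100,352-wide flattened input. Pooling the feature map before the LSTM (for example, global average pooling down to 2,048 features per frame) would shrink that layer substantially, at the cost of discarding spatial layout.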

Accomplishments that we're proud of

- Trained a hybrid CNN-LSTM model on video data for baseball analytics.
- Reduced the model's MSE loss from 55K to 131 while training on 550 videos.
- Deployed a working prototype with a frontend, backend, and cloud-based AI inference.
- Overcame significant GPU and memory constraints to make the project viable on Kaggle.

What we learned

- How to optimize video processing for deep learning models.
- How to manage GPU resources efficiently to train within tight limits.
- How CNN-LSTM hybrid models extract and process video-based features.
- How to deploy a model to the cloud with Google Vertex AI.

What's next for Statcast Vision

- Expanding the dataset: Training on more videos to improve accuracy.
- Real-time inference: Optimizing the pipeline for faster predictions.
- Additional metrics: Integrating further baseball metrics, such as bat speed and pitch velocity.
- Model architecture: Exploring Transformer-based approaches for better video understanding.
- Public API launch: Making the model accessible to coaches, analysts, and enthusiasts.