Inspiration
This project aims to make baseball more accessible to everyone, particularly those who are visually impaired and currently unable to enjoy the game's traditional visual experience. Cutting-edge technology is being used to create new ways for these fans to experience the excitement of baseball through sound.
Here's how it works:
- Immersive Audio: Detailed audio descriptions of the MLB season, leagues, teams, players, and games are provided. This includes information beyond just the game action, offering a richer understanding of the broader baseball landscape.
- Sensory Translation: Technology enables game details to be translated into rich auditory experiences, so fans can "hear" the action unfold. This focuses on capturing the nuances of the game and conveying them through sound.
- Real-Time Narratives: Comprehensive game narratives are tailored specifically for visually impaired audiences, providing a dynamic and engaging listening experience that replicates the excitement of watching a game.
Goals:
- Increase inclusivity in sports: Everyone should have the opportunity to enjoy baseball, regardless of their visual abilities.
- Enrich entertainment: Everyone deserves access to engaging and meaningful entertainment experiences.
- Showcase the power of technology: This project demonstrates how technology can be used to break down barriers and create a more inclusive world.
- Commitment to inclusion: This project aligns with Major League Baseball's ongoing commitment to diversity and inclusion, ensuring that the thrill of baseball can be shared by all.
What it does
This project revolutionizes MLB information access through an innovative audio interface:
Audio search:
Users can issue voice commands to retrieve up-to-date MLB data as audio. Gemini 2.0 Flash's audio processing capabilities interpret these commands and return real-time MLB data to fans in audio form.

Audio search gets its data from the MLB Stats API, ensuring real-time, accurate data on MLB leagues, seasons, teams, rosters, and games. Gemini 2.0 Flash's function calling capabilities integrate this data and present it to users in a clear, accessible format.
The system provides:
- Live game data and Clutch Plays
- Comprehensive player and team data
- MLB League and Season data
- Historical data for in-depth analysis
- Team Standings
This innovative approach makes MLB information readily available to all fans, including those with visual impairments, enhancing the baseball experience for everyone.
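Under the hood this is a standard Gemini function-calling loop: the model maps a spoken request to one of the registered MLB functions, the function queries the Stats API, and the result is folded back into the spoken answer. Below is a minimal sketch of that loop, assuming the google-genai Python SDK and an API key in the environment; `get_standings` is an illustrative tool, not the app's actual function name.

```python
# Minimal function-calling sketch (illustrative names, not the app's code).
import requests
from google import genai
from google.genai import types

def get_standings(league_id: int, season: int) -> dict:
    """Return MLB standings for a league (103 = AL, 104 = NL) and season."""
    resp = requests.get(
        "https://statsapi.mlb.com/api/v1/standings",
        params={"leagueId": league_id, "season": season},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

client = genai.Client()  # assumes GOOGLE_API_KEY is set in the environment
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="How are the American League teams doing this season?",
    # Passing a plain Python function lets the SDK auto-invoke it: Gemini
    # picks the tool and arguments, the SDK runs it, and the JSON result
    # is folded back into the final answer.
    config=types.GenerateContentConfig(tools=[get_standings]),
)
print(response.text)
```

The same pattern extends to schedules, rosters, and live game feeds by registering one function per Stats API endpoint.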
How we built it
Technologies used
- Gemini Multimodal Live API
- Gemini 2.0 Flash Exp
- Vertex Generative AI SDK
- Google Gen AI SDK
- React
- Google Material Design
- Python
- Flask
- WebSockets
Technology Architecture

Note - Web and video capabilities were added in the second iteration.
Frontend Architecture
- Frameworks -
- React - The frontend of the MLB Clutch Moments application is built with React.
- Google Material - The frontend design is powered by Google Material Design to provide a rich user experience.
- Deployment - Google Cloud Run is used to deploy the React application.
Backend Architecture
Frameworks -
- Python - Used Python to build the APIs for the MLB Clutch Moments app.
- Flask - Flask was used to build some of the API endpoints.
- WebSockets - Used WebSockets to enable two-way audio communication.
- Google Gen AI SDK - Used the Gen AI SDK to enable audio capabilities and function calling.
- Vertex Generative AI SDK - Used the SDK to enable generative AI capabilities for web and video summarization.
- Audio WebSocket server - This WebSocket server is hosted on Google Cloud Run. It receives audio input, converts it to text, and uses the text to determine which function needs to be called. Gemini accurately identifies the function and retrieves the MLB data the user is looking for; the individual functions within the server call different MLB Stats API endpoints to power the MLB use cases (a rough sketch of this relay follows the list below).
- The server sends the audio output back and supports the following MLB data:
- Seasons
- Leagues
- Teams
- Players
- Games
- https://statsapi.mlb.com/api/v1/schedule?sportId=1&season={season}&types=regular&date={gamedate}&teamIds={team_id}
- https://statsapi.mlb.com/api/v1/schedule?sportId=1&season={season}&types=regular&date={gamedate}&gamePk={Game ID}&hydrate=game(content(highlights(highlights)))
- https://statsapi.mlb.com/api/v1.1/game/{Game Id}/feed/live
- Team Standings
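A rough sketch of how such a relay can be wired together, assuming the `websockets` package and the google-genai Live API. The `relay` handler, port, and message shapes are illustrative, and the exact send/receive method names have shifted across SDK versions, so treat this as an outline rather than the app's actual server.

```python
# Illustrative audio relay: browser <-> this server <-> Gemini Live API.
import asyncio
import websockets
from google import genai

client = genai.Client()
MODEL = "gemini-2.0-flash-exp"
CONFIG = {"response_modalities": ["AUDIO"]}  # MLB tool functions would be added here

async def relay(browser_ws):
    # One Gemini Live session per connected browser client.
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:

        async def uplink():
            # Forward microphone audio chunks from the browser to Gemini.
            async for chunk in browser_ws:
                await session.send(input={"data": chunk, "mime_type": "audio/pcm"})

        async def downlink():
            # Stream Gemini's synthesized audio answer back to the browser.
            async for message in session.receive():
                if message.data:
                    await browser_ws.send(message.data)

        await asyncio.gather(uplink(), downlink())

async def main():
    # Cloud Run routes traffic to the port the container listens on.
    async with websockets.serve(relay, "0.0.0.0", 8080):
        await asyncio.Future()  # run until the container is stopped

if __name__ == "__main__":
    asyncio.run(main())
```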
- Video API - This endpoint is hosted as a Flask API. It accepts a video URL and returns a summary of the video (a condensed sketch appears after this list). It was built in the second iteration of the app.
- Web API - This endpoint is also hosted as a Flask API. It receives text input from the web; the text is analyzed, and Gemini identifies the appropriate function to call, which retrieves the matching data from the MLB Stats API. It was built in the second iteration of the app.
- This API supports the following MLB data:
- Seasons
- Leagues
- Teams
- Players
- Games
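A condensed sketch of the Video API endpoint, assuming Flask and the google-genai SDK. The `/summarize` route and request shape are illustrative, and the video URL is assumed to be one Gemini can read directly (for example a Cloud Storage URI when running against Vertex AI).

```python
# Illustrative Flask endpoint: accepts a video URL, returns a Gemini summary.
from flask import Flask, jsonify, request
from google import genai
from google.genai import types

app = Flask(__name__)
client = genai.Client()

@app.post("/summarize")
def summarize_video():
    video_url = request.json["video_url"]
    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",
        contents=[
            # The URL must point at a video the model is allowed to fetch.
            types.Part.from_uri(file_uri=video_url, mime_type="video/mp4"),
            "Summarize the key moments of this game highlight video.",
        ],
    )
    return jsonify({"summary": response.text})
```

The Web API follows the same shape, except the request body carries the user's text and the handler runs the function-calling flow shown earlier instead of video summarization.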
Deployment - APIs are deployed on Google Cloud Run.
Challenges we ran into
- Size of the datasets - Because of the size of the MLB datasets, model quotas were often exceeded. Adding filters to the API requests reduced the size of the datasets (see the sketch after this list).
- WebSockets deployment - Deploying WebSockets was somewhat challenging; hosting a dedicated WebSocket server on Cloud Run solved it.
- Function calling chains - After several iterations and challenges with chained function calls, I learnt to solve the problem by providing clear instructions for each function.
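As an illustration of the filtering fix, the MLB Stats API accepts a `fields` parameter that trims a response down to just the listed keys, which keeps the payload handed to the model well within its quota. The exact filters used in the app may differ.

```python
# Request only game IDs, team names, and scores instead of the full payload.
import requests

resp = requests.get(
    "https://statsapi.mlb.com/api/v1/schedule",
    params={
        "sportId": 1,
        "date": "2024-10-30",
        # Comma-separated whitelist of keys to keep in the response.
        "fields": "dates,date,games,gamePk,teams,away,home,team,name,score",
    },
    timeout=10,
)
print(resp.json())
```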
Accomplishments that we're proud of
- Building the audio experience - Leveraging Google's Multimodal Live API, I developed an inclusive audio experience for baseball fans, enabling visually impaired users and all other fans to enjoy the game through real-time audio description. This makes the sport more engaging for everyone, and I hope the app helps as many people as possible.
What we learned
- Google Gemini 2.0 Flash - Learnt about the power of the Multimodal Live API, which helped finalize the MLB audio search use case
- Google Cloud - Learnt Google Cloud from scratch
- MLB API - Learnt MLB terminology and studied the different APIs used to present the data
- WebSockets - Learnt a lot about WebSockets while building the audio API and interface
What's next for MLB Clutch Moments
- Identified MLB web search use cases that could be enabled using Gemini 2.0 and started building those features into the app, updating the architecture and design for these use cases.
Web search:
Features an auto-complete search function for quick and easy information retrieval.

- Started working on video summarization of game videos
- Get feedback and refine the solution
- Add video capabilities to the app to provide an in-stadium experience