Team USA Hometown Success Engine
Inspiration & Impact
We asked two questions: where do America's Olympic and Paralympic athletes come from, and which sports might I excel at?
This dual question addresses real needs:
- For Team USA: Understanding athlete distribution enables data-driven talent development: identifying emerging regions, allocating resources strategically, and building new athletic hubs
- For fans & communities: Celebrates Olympic heritage, shows where hometown heroes originated, and helps individuals discover sports that match their physical profile
- For aspiring athletes: Inspires them by showing that elite talent comes from communities just like theirs
How We Built It
Full-stack application on Google Cloud, combining geographic intelligence with personalized data-driven matching:
Architecture
- Frontend: React + TypeScript with interactive Google Maps, three exploration modes (Map View, Explore, Compare), parallel Olympic and Paralympic analysis
- Backend: Python Flask on Cloud Run (serverless, auto-scaling)
- Data Engine: BigQuery with 10,000+ athletes across 86 sports, spanning historical and modern Olympics/Paralympics
- AI Layer: Gemini powers regional storytelling, sport distribution analysis (with regional athlete breakdowns), and sport visual descriptions
- Analytics Layer: Data-driven similarity matching pairs users with sports by finding the athletes most similar to them in physical profile (height, weight) and location
Core Feature
Hometown Success Engine (Geographic Discovery)
- Interactive map showing where Team USA talent concentrates
- Filter by Olympic/Paralympic, explore by sport, or by state/city
- Discover regional sports traditions and talent hubs
- AI-generated regional insights via Gemini
Additional Feature
Find Your Matched Sports (Personalized Exploration)
- Optional: Input height, weight, hometown to discover personalized sport matches
- Algorithmic matching based on real athlete body characteristics in your region
- Adds a personalization layer to complement geographic insights
What We Learned
1. Agentic AI & Generative AI Integration
- Combining structured data queries (BigQuery) with generative storytelling (Gemini) creates powerful user experiences
- Prompt engineering is critical for reducing hallucinations and maintaining factual accuracy
- AI agents work best when given clear constraints and specific data context
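The idea of giving the model clear constraints and specific data context can be sketched as follows. This is a minimal illustration, not the project's actual prompt: the function name `build_region_prompt`, the row shape (a dict with a `sport` key), and the prompt wording are all assumptions. It only builds the prompt string from query results; the call to Gemini is left out.

```python
from collections import Counter
from typing import Dict, List


def build_region_prompt(region: str, athlete_rows: List[Dict[str, str]]) -> str:
    """Build a prompt that limits the model to facts taken from the query rows.

    `athlete_rows` is assumed to be the result of a structured query,
    one dict per athlete with at least a "sport" key (hypothetical schema).
    """
    sport_counts = Counter(row["sport"] for row in athlete_rows)
    # List sports from most to least common so the model sees the key facts first.
    facts = "\n".join(
        f"- {sport}: {count}" for sport, count in sport_counts.most_common()
    )
    return (
        f"Write a short summary of Team USA athletes from {region}.\n"
        "Use ONLY the athlete counts per sport listed below. Do not mention "
        "athletes, medals, or years that are not listed. If the data is too "
        "thin to support a claim, say so.\n\n"
        f"Athlete counts by sport:\n{facts}\n"
    )
```

Keeping the facts in a fixed, explicit list makes it easier to check the generated text against the data it was given.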
2. Data Quality & Normalization
- Geographic inconsistencies required careful mapping (spelling variations, historical name changes)
- Sport classification across 80+ Olympic disciplines demanded meticulous data cleaning
- Handling missing values (height/weight) required intelligent fallback logic
3. Multiple Perspectives, Multiple Needs
- Map View (geographic exploration) + Compare Mode (regional benchmarking) + Personalized Matching (individual discovery) serve complementary user needs
- Different users engage differently: some want exploration, others want comparison, and others want personalization
4. Google Cloud Integration
- Seamless connection between BigQuery (data), Gemini (AI storytelling), Google Maps (visualization), and Cloud Run (deployment) made it possible to build at this scale
- BigQuery's analytical power makes querying 10,000+ records fast and efficient
- Google Maps API enables interactive geographic exploration at scale
- Cloud Run's serverless nature eliminates infrastructure overhead
Technical Challenges & How We Solved Them
Challenge 1: Geographic Data Inconsistencies
- Athlete hometown data had spelling variations, abbreviations, missing values
- Solution: Normalized city/state naming, dropped records with missing sport classification, validated against known US geography
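A minimal sketch of this kind of normalization is shown below. The lookup tables are tiny illustrative subsets and the names (`STATE_CODES`, `CITY_ALIASES`, `normalize_hometown`) are hypothetical; the project's real mapping tables and validation source are not described here.

```python
import re
from typing import Optional, Tuple

# Illustrative subset only; a real table would cover every state and territory.
STATE_CODES = {
    "ca": "CA", "calif": "CA", "california": "CA",
    "mo": "MO", "missouri": "MO",
    "tx": "TX", "tex": "TX", "texas": "TX",
}

# Hypothetical alias table for common city spelling variations.
CITY_ALIASES = {
    "st louis": "Saint Louis",
    "saint louis": "Saint Louis",
}


def _clean(text: str) -> str:
    """Lowercase, drop periods, and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", text.replace(".", "")).strip().lower()


def normalize_hometown(city: str, state: str) -> Optional[Tuple[str, str]]:
    """Return (city, state_code), or None if the record cannot be validated."""
    if not city or not state:
        return None  # missing values are rejected rather than guessed
    state_code = STATE_CODES.get(_clean(state))
    if state_code is None:
        return None  # unknown state: fails validation against known geography
    city_key = _clean(city)
    return CITY_ALIASES.get(city_key, city_key.title()), state_code
```

Returning `None` for anything that cannot be validated keeps the decision about dropping or repairing a record with the caller.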
Challenge 2: Balancing Olympic & Paralympic Data
- Paralympic data is sparse compared to Olympic records
- Solution: Separate Olympic and Paralympic data streams allow users to explore each independently
Challenge 3: Low Prediction Accuracy with Naive Approach
- Initial attempt at direct sport classification (Random Forest) achieved only 23.7% validation accuracy
- Solution: Pivoted to similarity matching instead—find athletes most similar to the user, return their sports
- This is more interpretable, more honest about data limitations, and more engaging for users
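The similarity approach can be sketched roughly as a nearest-neighbor lookup. The version below is an assumption-laden illustration, not the project's algorithm: the record fields (`height_cm`, `weight_kg`, `sport`), the choice to scale each feature by its standard deviation, and the default `k` are all hypothetical.

```python
import math
import statistics
from collections import Counter
from typing import Dict, List


def match_sports(
    user_height_cm: float,
    user_weight_kg: float,
    athletes: List[Dict],
    k: int = 5,
) -> List[str]:
    """Return the sports of the k most similar athletes, most frequent first."""
    # Scale each feature by its spread so centimeters and kilograms are comparable.
    # Fall back to 1.0 when every athlete shares the same value.
    h_scale = statistics.pstdev(a["height_cm"] for a in athletes) or 1.0
    w_scale = statistics.pstdev(a["weight_kg"] for a in athletes) or 1.0

    def distance(a: Dict) -> float:
        # Euclidean distance in the scaled (height, weight) space.
        return math.hypot(
            (a["height_cm"] - user_height_cm) / h_scale,
            (a["weight_kg"] - user_weight_kg) / w_scale,
        )

    nearest = sorted(athletes, key=distance)[:k]
    counts = Counter(a["sport"] for a in nearest)
    return [sport for sport, _ in counts.most_common()]
```

Because the output is just the sports of real, similar athletes, each suggestion can be explained by pointing at the athletes behind it.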
Challenge 4: Handling Incomplete User Input
- Users might only provide height, or only city, or nothing at all
- Solution: Implemented tiered matching logic:
  - Height + weight + city: Combined Euclidean distance + geographic weighting
  - Only city/state: Random sample from that region
  - City not found: Expand to state level
  - No input: Random from all sports
- This graceful degradation ensures the feature works regardless of how much info users provide
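The tiers above could be wired together along these lines. This is a sketch under stated assumptions: the record fields, the function names, and the two weighting constants are hypothetical, and geographic weighting is modeled here as a simple discount on distance for athletes from the same city or state. Cases the tiers do not spell out (for example, height and weight without a city) are handled in whatever way was simplest for the sketch.

```python
import math
import random
from typing import Dict, List, Optional

# Hypothetical weights: smaller factor means a stronger pull toward local athletes.
SAME_CITY_FACTOR = 0.5
SAME_STATE_FACTOR = 0.8


def _first_unique_sports(ranked: List[Dict], n: int) -> List[str]:
    """Walk athletes in ranked order and collect the first n distinct sports."""
    sports: List[str] = []
    for athlete in ranked:
        if athlete["sport"] not in sports:
            sports.append(athlete["sport"])
        if len(sports) == n:
            break
    return sports


def _in_region(athletes: List[Dict], city: Optional[str], state: Optional[str]) -> List[Dict]:
    """Athletes from the city if any exist there, otherwise from the state."""
    if city:
        in_city = [a for a in athletes
                   if a["city"] == city and (state is None or a["state"] == state)]
        if in_city:
            return in_city
    if state:  # city missing or not found: expand to state level
        return [a for a in athletes if a["state"] == state]
    return []


def suggest_sports(athletes: List[Dict], height_cm: Optional[float] = None,
                   weight_kg: Optional[float] = None, city: Optional[str] = None,
                   state: Optional[str] = None, n: int = 3,
                   rng: Optional[random.Random] = None) -> List[str]:
    rng = rng or random.Random()
    if height_cm is not None and weight_kg is not None:
        # Tier 1: Euclidean distance, discounted for athletes from the same area.
        def score(a: Dict) -> float:
            d = math.hypot(a["height_cm"] - height_cm, a["weight_kg"] - weight_kg)
            if city and a["city"] == city:
                return d * SAME_CITY_FACTOR
            if state and a["state"] == state:
                return d * SAME_STATE_FACTOR
            return d
        return _first_unique_sports(sorted(athletes, key=score), n)
    # Remaining tiers: random sample from the region, or from everything.
    pool = _in_region(athletes, city, state) or athletes
    sports = sorted({a["sport"] for a in pool})
    return rng.sample(sports, min(n, len(sports)))
```

Passing a seeded `random.Random` as `rng` makes the random tiers reproducible, which is convenient for testing.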
Achievements
✅ Production-ready platform serving dual stakeholders
✅ Combines geographic analysis (10,000+ athletes in BigQuery), personalized exploration (similarity algorithm), and AI storytelling (Gemini)
✅ Supports 86 sports across 10,000+ athletes and 3,000+ cities
✅ Graceful degradation: Sport matching works with any level of user input (full profile, partial, or none)
✅ Deployed on Google Cloud with auto-scaling capability
Team USA gains strategic intelligence for athlete development; community members discover their connection to Olympic legacy and find sports aligned with their potential.
Future Improvements
- Richer Athlete Profiles: Add age, weight class, sport-specific metrics (jumping height, throwing distance) to improve matching accuracy
- Advanced Analytics: Correlation analysis between elevation/climate and sport success; identify regional talent pipelines
- Competitive Intelligence: Compare talent distribution with other Olympic nations; benchmark regional performance
- Community Engagement: Historical Olympic medals by hometown; predict future medal prospects by region
- Broader Data Integration: Include youth Olympic records, national championships, training facility locations
- Business Development:
  - Venue booking & facility integration
  - Local coaching & training marketplace
  - Sports community, events, and pickup game discovery