Inspiration

Team USA athletes inspire millions, but the story of where they come from often goes untold. I wanted to celebrate every Olympian and Paralympian by mapping their hometowns and uncovering the geographic, climatic, and community factors that create athlete "hubs." This project honors not just medal winners, but every athlete who has represented the United States, while providing actionable insights for youth sports programs, state athletic committees, and community planners looking to invest in the next generation.

What it does

Geo Athlete is an interactive data visualization and AI analytics platform that transforms 18 public datasets into a comprehensive atlas of American athletic talent:

Core Features

  • Interactive Geographic Explorer: Choropleth maps at state and county levels reveal athlete concentrations, with color intensity showing Olympic and Paralympic representation
  • Gemini-Enabled Chat Assistant: Natural language queries powered by Google Gemini AI let users ask complex questions like "Which small states punch above their weight in producing athletes per capita?" or "Does snowfall correlate with winter sport success?"
  • Multi-Dimensional Analysis: Correlate athlete production with:
    • Climate data (temperature, snowfall, sunshine)
    • Geographic features (elevation, coastline, water bodies)
    • Infrastructure (NCAA programs, ski resorts, recreation land)
    • Demographics (population, income, urbanization)
  • Paralympic Parity: Dedicated views ensuring Paralympic athletes receive equal visibility alongside their Olympic counterparts
  • County-Level Insights: Drill down to see specific cities and athletes from 3,143 counties
  • Sport-Specific Breakdowns: Analyze which states dominate in swimming, track and field, ice hockey, wheelchair basketball, and 100+ other sports

Technical Innovation

My Gemini-enabled agent doesn't just answer questions. It executes Python code in real time, performs statistical analysis, generates visualizations, and streams results back with full transparency. Users see the code, the output, and the reasoning behind every answer.
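As a rough illustration of the "execution cell" idea, the sketch below runs a code string, captures what it prints, and returns the code, output, and any error together so all three can be shown to the user. The function name `run_cell` and the returned dictionary shape are assumptions made for this example, not the project's actual implementation, and plain `exec` is not a sandbox.

```python
import contextlib
import io
import traceback


def run_cell(code: str) -> dict:
    """Execute a code string and return the code, captured stdout, and any error."""
    buffer = io.StringIO()
    error = None
    namespace = {}  # fresh namespace per cell; a real agent might share state
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # NOTE: not sandboxed; illustration only
    except Exception:
        error = traceback.format_exc()
    return {"code": code, "output": buffer.getvalue(), "error": error}


# Share of all athletes who come from the top county (58 of 8,526).
cell = run_cell("print(round(58 / 8526 * 100, 2))")
print(cell["output"])  # prints 0.68
```

Returning the error as text rather than raising it lets the agent read the traceback and try a corrected cell on its next step.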

How I built it

Architecture

Frontend (Next.js + TypeScript)
    ↓ Queries
Backend (FastAPI + Python)
    ↓ AI Processing
LangGraph Agent (Google Gemini)
    ↓ Data Access
18 Data Sources → 5 Analytics Datasets

Data Sources

All 18 data sources are publicly available and fully reproducible. Each includes a download.py script and attribution documentation:

Athlete Data:

  1. Team USA Official Website — 8,526 athletes (Olympic + Paralympic) with hometowns, sports, and medal counts
  2. Paralympic Medal Matching — Cross-referenced Team USA data with Wikipedia medalist lists to derive career timelines

Demographics:

  1. US Census Bureau - State Population — 2020 Census state populations for per-capita analysis
  2. US Census Bureau - County Population — County-level populations (3,143 counties)
  3. US Census Bureau - Income — State median household income
  4. US Census Bureau - Land Area — State land area in square miles
  5. US Census Bureau - Urbanization — Urban vs rural population percentages

Climate:

  1. NOAA Climate Normals — 1991-2020 temperature and precipitation averages
  2. NOAA Snowfall — Average annual snowfall by state
  3. NOAA Sunshine — Percent possible sunshine by state
  4. NOAA Seasonal Temperature — Summer/winter temperature patterns
  5. NOAA Coastline — Coastline and shoreline miles per state

Geography:

  1. USGS Elevation — Mean elevation, highest/lowest points per state
  2. USGS National Hydrography — Major rivers, lakes, and water access

Infrastructure:

  1. NCAA Programs — NCAA Division I/II/III program counts by state
  2. Public Recreation Land — Federal land, state parks, national parks, ski resorts

Derived Data:

  1. County Climate Estimates — Weighted estimates from state-level NOAA data using population density
  2. City Altitude — Derived from USGS elevation data for hometown analysis

Every source is US Government public domain or publicly published reference data. Total dataset: 8,526 athletes across 3,143 counties in the 50 states plus the District of Columbia, merged into 5 optimized JSON files via a custom ETL pipeline (build_data.py).
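To make the merge step concrete, here is a minimal sketch of one thing such a pipeline might do: count athletes per state, join the counts to Census populations, and emit a per-million rate as JSON. The function name, field names, and sample records are all hypothetical, and the sketch uses only the standard library, whereas the project lists Pandas for this kind of work.

```python
import json
from collections import Counter


def build_state_summary(athletes, populations):
    """Count athletes per state and attach an athletes-per-million rate."""
    counts = Counter(a["state"] for a in athletes)
    summary = []
    for state, population in populations.items():
        n = counts.get(state, 0)
        summary.append({
            "state": state,
            "athletes": n,
            "population": population,
            "athletes_per_million": round(n / population * 1_000_000, 1),
        })
    # Highest per-capita producers first.
    summary.sort(key=lambda row: row["athletes_per_million"], reverse=True)
    return summary


# Made-up records for illustration only; not the project's data.
athletes = [
    {"name": "A", "state": "VT"},
    {"name": "B", "state": "VT"},
    {"name": "C", "state": "CA"},
]
populations = {"VT": 500_000, "CA": 40_000_000}

summary = build_state_summary(athletes, populations)
print(json.dumps(summary, indent=2))
```

Iterating over the population table rather than the athlete counts keeps states with zero athletes in the output, which matters for a choropleth that needs a value for every region.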

Technologies

Frontend:

  • Next.js 16 with React 19 and TypeScript
  • react-simple-maps for SVG-based map rendering
  • Recharts for climate/elevation visualizations
  • Framer Motion for smooth transitions
  • Tailwind CSS for responsive design

Backend & AI:

  • FastAPI with Server-Sent Events (SSE) for real-time streaming
  • Pandas / NumPy for data manipulation and statistical analysis
  • LangGraph for agentic AI workflows with a ReAct loop
  • LangChain for tool orchestration
  • Google Gemini 3.1 Pro Preview (via Vertex AI) for natural language understanding, code generation, and multi-step reasoning
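The streaming piece can be sketched without any framework: Server-Sent Events are plain text frames, each ending in a blank line. The event names `cell` and `done` and the payload shape below are assumptions for illustration. In FastAPI, a generator like `stream_cells` could be wrapped in `StreamingResponse(..., media_type="text/event-stream")`, though whether the project does exactly that is not stated.

```python
import json
from typing import Iterable, Iterator


def sse_event(event: str, data: dict) -> str:
    """Frame one message in the Server-Sent Events wire format."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_cells(cells: Iterable[dict]) -> Iterator[str]:
    """Yield one SSE frame per execution cell, then a final 'done' frame."""
    for cell in cells:
        yield sse_event("cell", cell)
    yield sse_event("done", {})


# Hypothetical execution cell as the agent might produce it.
frames = list(stream_cells([{"code": "1 + 1", "output": "2"}]))
for frame in frames:
    print(frame, end="")
```

An explicit terminal event gives the browser a clear signal to close the connection instead of waiting for a timeout.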

Cloud Infrastructure:

  • Google Cloud Firestore — persists chat sessions and message history so conversations survive page reloads and can be resumed across tabs
  • Google Cloud Run — both frontend and backend are containerized with Docker and deployed as fully managed Cloud Run services
  • Google Artifact Registry — stores Docker images for CI/CD
  • Vertex AI — hosts the Gemini model with service-account authentication

AI Agent Architecture

My custom agent uses a ReAct (Reasoning + Acting) pattern:

  • 13 specialized tools for querying athletes, geography, climate, sports, and infrastructure
  • Streaming execution cells show Python code, outputs, and reasoning in real-time
  • State management via LangGraph for multi-turn conversations
  • Firestore persistence — every user query and assistant response is saved, enabling session history and tab switching
  • Context-aware responses that reference specific states, counties, and athletes
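The ReAct pattern listed above boils down to a loop: the model either requests a tool call or gives a final answer, and each tool result is appended to the history as an observation before the model is asked again. The sketch below shows that loop in plain Python with a scripted stand-in for the model. It does not use LangGraph or Gemini, and the tool, data, and message format are hypothetical.

```python
from typing import Callable

# Toy data and a single hypothetical tool; the real agent is described as
# having 13 tools backed by the merged datasets.
ATHLETES = [
    {"name": "A", "state": "VT"},
    {"name": "B", "state": "VT"},
    {"name": "C", "state": "CA"},
]


def count_athletes(state: str) -> str:
    """Tool: number of athletes whose home state matches."""
    return str(sum(1 for a in ATHLETES if a["state"] == state))


TOOLS: dict[str, Callable[[str], str]] = {"count_athletes": count_athletes}


def scripted_model(history: list[str]) -> dict:
    """Stand-in for the LLM: call one tool, then answer from its observation."""
    observations = [h for h in history if h.startswith("Observation: ")]
    if not observations:
        return {"type": "action", "tool": "count_athletes", "input": "VT"}
    result = observations[-1].removeprefix("Observation: ")
    return {"type": "final", "answer": f"Vermont has {result} athletes in the toy data."}


def react_loop(question, model, tools, max_steps=5):
    """Alternate between model decisions and tool calls until a final answer."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(history)
        if step["type"] == "final":
            return step["answer"], history
        history.append(f"Action: {step['tool']}({step['input']})")
        history.append(f"Observation: {tools[step['tool']](step['input'])}")
    return "No answer within the step limit.", history


answer, trace = react_loop("How many athletes are from Vermont?", scripted_model, TOOLS)
print(answer)
print(trace)
```

The `max_steps` cap is a design choice worth keeping in any version of this loop, since it bounds cost if the model keeps requesting tools without converging.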

My Findings

All insights below were extracted programmatically from the 18 merged datasets using the Gemini-powered agent. Every number is reproducible by running the same queries in the app.

Winter States Dominate Per-Capita Athlete Production

The top states by athletes per million residents are overwhelmingly cold-weather, high-elevation states:

Rank | State     | Athletes / Million | Climate  | Snowfall
1    | Vermont   | 46.7               | Cold     | Very Heavy
2    | Alaska    | 35.5               | Cold     | Very Heavy
3    | Colorado  | 31.5               | Moderate | Very Heavy
4    | Utah      | 22.3               | Moderate | Heavy
5    | Minnesota | 19.8               | Cold     | Very Heavy

There is a fairly strong positive correlation (r = 0.63) between average annual snowfall and per-capita athlete production. Eight of the top ten states experience "Heavy" or "Very Heavy" snowfall and have "Cold" or "Severe" winters. Colorado and Utah have mean elevations above 6,000 ft, and Vermont and Minnesota each have 20+ ski resorts, which provides the training infrastructure to develop Winter Olympians and Paralympians at scale.
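For readers who want to see what an r value measures, the sketch below computes the Pearson correlation coefficient from its definition: the covariance of the two variables divided by the product of their standard deviations. The sample numbers are invented to give a clean result and are not the project's snowfall or athlete data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of std devs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented toy values, e.g. a snowfall index vs. a per-capita rate index.
snowfall_index = [1, 2, 3, 4, 5]
athlete_index = [2, 1, 4, 3, 5]

r = pearson_r(snowfall_index, athlete_index)
print(round(r, 2))  # prints 0.8
```

Squaring r gives the share of variance a straight-line fit accounts for, so r = 0.63 corresponds to roughly 0.40, or about 40% of the state-to-state variation in per-capita production.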

Urbanization Drives Volume, Not Efficiency

Urbanization correlates moderately with total athlete count (r = 0.44) but has virtually no relationship with athletes per capita (r = −0.05). Highly urban states like California produce more athletes in absolute terms simply because of population size, but the most efficient athlete-producing state, Vermont, is only 38.9% urban. Rural access to outdoor terrain and winter infrastructure appears to matter more than dense urban sports programs.

Income Matters, but Snowfall Matters More

Median household income shows a moderate positive correlation with per-capita athlete production (r = 0.40). Wealthier states can fund youth sports, equipment, and travel for competition. However, snowfall (r = 0.63) is a stronger predictor than income, while elevation (r = 0.26) is a weaker one. Of the factors measured, snowfall is the one that clearly outweighs economics in determining where Olympic and Paralympic talent emerges.

Los Angeles County Is the #1 Athlete Hub

At the county level, Los Angeles County, CA leads the nation with 58 athletes (52 Olympians, 6 Paralympians) across sports including Artistic Gymnastics, Athletics, and Basketball. The combination of massive population, elite collegiate and professional training facilities, and year-round favorable weather creates an unmatched ecosystem.

Paralympic Representation Clusters in the Midwest and Southwest

Paralympians make up 11.3% of the total 8,526 athletes. The states with the highest Paralympic-to-Olympic ratio are not coastal powerhouses but Midwestern and Southwestern states: New Mexico (1.67:1), Oklahoma (0.89:1), Kansas (0.80:1), and Nebraska (0.75:1). This may reflect strong local adaptive sports programs and specialized rehabilitation/training centers in those regions.
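The ratio figures above are simple divisions of one count by the other. The sketch below shows the arithmetic with hypothetical counts (the per-state counts behind the published ratios are not given here), plus the approximate number of Paralympians implied by the 11.3% share.

```python
def para_to_olympic_ratio(paralympians: int, olympians: int) -> float:
    """Paralympians per Olympian, rounded to two decimals."""
    if olympians == 0:
        raise ValueError("ratio is undefined with zero Olympians")
    return round(paralympians / olympians, 2)


# Hypothetical counts chosen only to show the calculation.
ratio = para_to_olympic_ratio(5, 3)
print(f"{ratio}:1")  # prints 1.67:1

# Approximate Paralympian count implied by 11.3% of 8,526 athletes.
implied_paralympians = round(0.113 * 8526)
print(implied_paralympians)  # prints 963
```

Ratios built from small counts can swing sharply when a single athlete is added or removed, so it is worth checking the underlying counts in the app before drawing conclusions about any one state.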

The Top 5 Paralympic Sports

Rank | Sport                 | Athletes
1    | Para Track and Field  | 128
2    | Para Swimming         | 91
3    | Wheelchair Basketball | 60
4    | Para-Cycling          | 57
5    | Para Alpine Skiing    | 45

Coastal States Own the Water

Six of the top ten per-capita states have ocean coastlines or Great Lakes access (Hawaii, California, Massachusetts, Connecticut, Alaska, Minnesota). This geography feeds success in swimming, sailing, rowing, and surfing. Water sport access is a secondary but meaningful factor alongside snowfall and elevation.

Built With

  • cloud-run
  • fastapi
  • firestore
  • gemini
  • next-js
  • vertex-ai