PathFinder
Find yourself in 120 years of Team USA
Challenge 4 · Team USA × Google Cloud Hackathon — Athlete Archetype Agent
The Problem
Every four years, billions of fans watch Team USA compete. They feel something watching those athletes — a recognition, a pull, a question they can't quite articulate:
Where do I fit in that story?
120 years of Team USA athlete data exists. Thousands of athletes. Every body type, every sport, every decade from 1904 to Paris 2024. None of it has ever been made personal. Paralympic athletes — who represent some of Team USA's most compelling storylines — are almost entirely absent from fan-facing tools.
The Solution
PathFinder takes a fan's basic biometric profile — height, weight, dominant athletic trait — and runs it through a three-agent AI pipeline against real Team USA historical data stored in Google BigQuery.
In under 60 seconds it returns a personalized Athlete Archetype: a data-driven mirror that shows the fan which Olympic and Paralympic sports have historically drawn athletes who share their body's profile — with equal analytical depth on both sides.
Not a quiz. Not a chatbot. A reasoning pipeline with visible steps, grounded in real data, delivering a shareable result.
How It Works
The three-agent pipeline
         Fan inputs biometrics
                   │
                   ▼
┌─────────────────────────────────────────┐
│  Agent 1 — PROFILER                     │
│  Model: Gemini 2.5 Flash                │
│  • Normalizes height, weight, BMI       │
│  • Queries BigQuery cluster centroids   │
│  • Computes Euclidean distance × 4      │
│  • Gemini refines using dominant_trait  │
│  Output: archetype_id + confidence      │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│  Agent 2 — MATCHER                      │
│  Model: Gemini 2.5 Flash + BigQuery     │
│  • Queries olympic_sports array         │
│  • Queries paralympic_sports array      │
│    (same query depth, simultaneously)   │
│  • Joins athlete_profiles for counts    │
│  Output: top 5 Olympic + top 5 Para     │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│  GUARDRAIL LAYER (pure function)        │
│  • Strips "you will" → "you could"      │
│  • Strips "built for" → "athletes with  │
│    your profile have historically..."   │
│  • Blocks "guaranteed", "destined to"   │
│  No LLM call — enforced in code         │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│  Agent 3 — NARRATOR                     │
│  Model: Gemini 2.5 Pro                  │
│  • Generates Olympic narrative          │
│  • Generates Paralympic narrative       │
│    (equal length, equal depth)          │
│  • Historical callout (specific decade) │
│  • Shareable tagline                    │
│  Output: structured JSON story          │
└──────────────────┬──────────────────────┘
                   │
                   ▼
        Fan sees archetype card
       + sport grid (Oly + Para)
        + narrative + share card
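The guardrail box in the diagram can be read as a small pure function. The sketch below is one way it might look, not the project's actual guardrail.py. The name `apply_guardrail`, the exact patterns, and the choice to drop any sentence containing a blocked phrase are assumptions. The diagram elides the end of the "built for" replacement, so the wording used here is a placeholder.

```python
import re

# Hypothetical rule table based on the rewrites listed in the diagram.
# The "built for" replacement is elided in the diagram; this wording is a placeholder.
_REWRITES = [
    (re.compile(r"\byou will\b", re.IGNORECASE), "you could"),
    (re.compile(r"\b(?:you are|you're) built for\b", re.IGNORECASE),
     "athletes with your profile have historically competed in"),
]

# Phrases the diagram says are blocked outright.
_BLOCKED = re.compile(r"\b(?:guaranteed|destined to)\b", re.IGNORECASE)


def _keep_case(replacement: str):
    """Return a substitution callback that capitalizes when the match was capitalized."""
    def _sub(match: re.Match) -> str:
        if match.group(0)[0].isupper():
            return replacement[0].upper() + replacement[1:]
        return replacement
    return _sub


def apply_guardrail(text: str) -> str:
    """Rewrite deterministic phrasing and drop sentences with blocked phrases.

    Pure function: no I/O, no LLM call, same output for the same input.
    """
    for pattern, replacement in _REWRITES:
        text = pattern.sub(_keep_case(replacement), text)
    # Assumed blocking behaviour: remove the whole offending sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if s and not _BLOCKED.search(s)]
    return " ".join(kept)


print(apply_guardrail("You will thrive in rowing. Success is guaranteed."))
```

Because the function has no side effects, it can be unit-tested against a fixed list of phrases without calling Gemini.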
System architecture
Browser
   │  HTTPS
   ▼
Next.js 14 frontend ────────────────────────────── Google Cloud Run
   │  SSE stream  POST /analyze
   ▼
FastAPI backend ────────────────────────────────── Google Cloud Run
   │                      │                      │
   │ ADK orchestrates     │ BigQuery queries     │ Secret Manager
   ▼                      ▼                      ▼
Gemini 2.5 Flash      archetype_clusters     GEMINI_API_KEY
Gemini 2.5 Pro        athlete_profiles       (mounted at deploy)
Google ADK
The streaming difference
The /analyze endpoint uses FastAPI's StreamingResponse with Server-Sent Events. The frontend's agent step feed updates as each agent completes, so the fan sees a live reasoning trace of what the pipeline is doing instead of a spinner that resolves after 30 seconds.
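As a rough illustration of the streaming pattern, the sketch below builds SSE frames from an async generator using only the standard library. The event names (`agent_step`, `result`), the payload fields, and the function names are assumptions, and the agent work is replaced by a placeholder. In a FastAPI handler, a generator like this would typically be returned as `StreamingResponse(run_pipeline_stream(body), media_type="text/event-stream")`.

```python
import asyncio
import json
from typing import AsyncIterator


def format_sse(event: str, payload: dict) -> str:
    """Encode one SSE frame: event name, JSON data line, terminating blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


async def run_pipeline_stream(profile: dict) -> AsyncIterator[str]:
    """Yield one frame per completed agent, then a final result frame."""
    for agent in ("profiler", "matcher", "narrator"):
        # Placeholder: the real agents would await Gemini and BigQuery here.
        await asyncio.sleep(0)
        yield format_sse("agent_step", {"agent": agent, "status": "complete"})
    # Hypothetical final payload; the actual PipelineResult shape is not shown in this writeup.
    yield format_sse("result", {"archetype_id": "placeholder"})


async def _demo() -> None:
    async for frame in run_pipeline_stream({"height_cm": 182, "weight_kg": 95}):
        print(frame, end="")


asyncio.run(_demo())
```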
The Four Archetypes
| Archetype | Centroid | Dominant Trait | Olympic | Paralympic |
|---|---|---|---|---|
| Powerhouse | 185cm · 102kg | Strength | Weightlifting, Wrestling, Shot Put | Para Powerlifting, Wheelchair Rugby |
| Aerobic | 174cm · 68kg | Endurance | Marathon, Cycling, Triathlon | Wheelchair Racing, Para Cycling |
| Precision | 172cm · 70kg | Precision | Archery, Gymnastics, Diving | Para Archery, Para Shooting |
| Adaptive | 170cm · 72kg | Adaptive | Swimming, Athletics, Judo | Para Swimming, Para Athletics |
The Adaptive archetype is the widest cluster — it maps to the broadest range of Paralympic classifications and is PathFinder's most inclusive fan pathway.
What Makes This Different
1. It's a real multi-agent pipeline, not a prompt
Three specialized agents, each with a distinct role and its own tools, running on the Gemini model suited to its task. Profiler and Matcher use Flash for speed. Narrator uses Pro for narrative depth. The fan watches each agent complete in real time via SSE.
2. Paralympic parity is architectural, not cosmetic
Every archetype maps to Olympic and Paralympic sports through identical BigQuery queries, identical agent prompts, and identical UI rendering. The Matcher agent runs query_olympic_sports() and query_paralympic_sports() simultaneously — same code path, same depth.
3. The Guardrail layer enforces conditional phrasing in code
A dedicated pure function runs on every Narrator output before it reaches the frontend. This is not a prompt instruction the model can ignore: it is enforced in code, and it systematically replaces deterministic claims such as "you will" with conditional phrasing across every archetype result.
4. BigQuery grounds the reasoning in real data
The Profiler does not guess archetypes. It computes the Euclidean distance between the fan's biometrics and cluster centroids derived from 120 years of Team USA athlete records. The reasoning is explainable: here is the distance score, here is the closest cluster, here is why.
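The distance step in point 4 can be sketched with the four centroids listed in the archetype table. This is a simplified illustration under stated assumptions: it uses raw centimetres and kilograms, and skips the normalization, BMI feature, and Gemini refinement that the Profiler is described as applying. The `classify` name and the inverse-distance confidence formula are hypothetical.

```python
import math

# Centroids copied from the archetype table: (height in cm, weight in kg).
CENTROIDS = {
    "powerhouse": (185.0, 102.0),
    "aerobic": (174.0, 68.0),
    "precision": (172.0, 70.0),
    "adaptive": (170.0, 72.0),
}


def classify(height_cm: float, weight_kg: float) -> tuple[str, float, dict[str, float]]:
    """Return (closest archetype, confidence, distance to every centroid)."""
    fan = (height_cm, weight_kg)
    distances = {name: math.dist(fan, centroid) for name, centroid in CENTROIDS.items()}
    archetype_id = min(distances, key=distances.get)
    # Assumed confidence score: the winner's share of the inverse-distance weights.
    weights = {name: 1.0 / (d + 1e-9) for name, d in distances.items()}
    confidence = weights[archetype_id] / sum(weights.values())
    return archetype_id, confidence, distances


archetype, confidence, distances = classify(182, 95)
print(archetype, round(confidence, 2), {k: round(v, 1) for k, v in distances.items()})
```

Returning the full distance map alongside the winner is what makes the result explainable: the UI can show how far the fan sits from each cluster, not just the label.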
Agent Responsibilities
| Agent | Model | Input | Output |
|---|---|---|---|
| Profiler | Gemini 2.5 Flash | height_cm, weight_kg, dominant_trait | archetype_id, confidence_score |
| Matcher | Gemini 2.5 Flash + BigQuery | archetype_id | 5 Olympic sports + 5 Paralympic sports |
| Guardrail | Pure function | raw narrative inputs | conditional-safe inputs |
| Narrator | Gemini 2.5 Pro | archetype + matches + hint | headline, narratives, tagline |
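The Matcher row describes two lookups of equal depth that run at the same time. A minimal sketch of that shape with `asyncio.gather` follows. The function names `query_olympic_sports` and `query_paralympic_sports` come from this writeup, but their bodies, the shared `_query_sports` helper, and the in-memory sample data (taken from the Powerhouse row of the archetype table) are stand-ins for the real BigQuery calls.

```python
import asyncio

# Stand-in data for one archetype; the real values live in BigQuery.
_SAMPLE = {
    "olympic_sports": {"powerhouse": ["Weightlifting", "Wrestling", "Shot Put"]},
    "paralympic_sports": {"powerhouse": ["Para Powerlifting", "Wheelchair Rugby"]},
}


async def _query_sports(column: str, archetype_id: str, limit: int) -> list[str]:
    """Shared code path for both lookups, so neither side gets a shallower query."""
    await asyncio.sleep(0)  # placeholder for the BigQuery round trip
    return _SAMPLE[column].get(archetype_id, [])[:limit]


async def query_olympic_sports(archetype_id: str, limit: int = 5) -> list[str]:
    return await _query_sports("olympic_sports", archetype_id, limit)


async def query_paralympic_sports(archetype_id: str, limit: int = 5) -> list[str]:
    return await _query_sports("paralympic_sports", archetype_id, limit)


async def match(archetype_id: str) -> dict[str, list[str]]:
    """Run both lookups concurrently and return them side by side."""
    olympic, paralympic = await asyncio.gather(
        query_olympic_sports(archetype_id),
        query_paralympic_sports(archetype_id),
    )
    return {"olympic": olympic, "paralympic": paralympic}


print(asyncio.run(match("powerhouse")))
```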
Google Cloud Services Used
| Service | How used |
|---|---|
| Google Cloud Run | Two services — FastAPI backend (1Gi, 300s timeout) + Next.js frontend (512Mi) |
| Google BigQuery | Stores archetype_clusters (4 rows) and athlete_profiles (15,000+ rows). Profiler queries centroids. Matcher queries sport arrays. |
| Google Secret Manager | GEMINI_API_KEY stored as secret, mounted into backend Cloud Run service at deploy |
| Google ADK | Orchestrates the three-agent pipeline with named tool definitions per agent |
| Gemini 2.5 Flash | Profiler (classification) + Matcher (sport retrieval) |
| Gemini 2.5 Pro | Narrator (conditional story generation) |
| Google Container Registry | Docker images for both Cloud Run services |
| Google Cloud Build | CI/CD — gcloud builds submit for both services |
Data Sources
1. Olympedia — 120 Years of Olympic History
- Source: kaggle.com/datasets/heesoo37/120-years-of-olympic-history-athletes-and-results
- Filtered to US athletes only (NOC = USA)
- Loaded into BigQuery as the athlete_profiles table
- Used by Matcher agent for historical athlete counts and example names per sport
2. Custom Archetype Cluster Dataset
- Four cluster centroids (Powerhouse, Aerobic, Precision, Adaptive) modeled on sports science research on elite athlete anthropometrics by discipline
- Stored in BigQuery as the archetype_clusters table
- Used by Profiler agent for Euclidean distance classification
BigQuery project: gen-lang-client-0276213830 · dataset: pathfinder_dataset
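For orientation, a centroid lookup against this dataset might look like the sketch below. The project, dataset, and table names are the ones listed above. The column names (`archetype_id`, `height_cm`, `weight_kg`) and both function names are assumptions, since the table schema is not given here. `fetch_centroids` needs the google-cloud-bigquery package and valid credentials, so it is defined but not called.

```python
PROJECT = "gen-lang-client-0276213830"
DATASET = "pathfinder_dataset"


def centroid_query() -> str:
    """Build the SQL for reading all cluster centroids (column names are hypothetical)."""
    return (
        "SELECT archetype_id, height_cm, weight_kg "
        f"FROM `{PROJECT}.{DATASET}.archetype_clusters`"
    )


def fetch_centroids() -> dict[str, tuple[float, float]]:
    """Run the query and return {archetype_id: (height_cm, weight_kg)}."""
    # Imported lazily so this module loads even without google-cloud-bigquery installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=PROJECT)
    rows = client.query(centroid_query()).result()
    return {row["archetype_id"]: (row["height_cm"], row["weight_kg"]) for row in rows}


print(centroid_query())
```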
API
GET /health → {"status": "ok"}
POST /analyze → SSE stream
Body: {
"height_cm": 182,
"weight_kg": 95,
"dominant_trait": "strength"
}
Streams agent steps, returns PipelineResult
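The request body above could be validated before the pipeline starts. The sketch below uses a standard-library dataclass to stay self-contained. A FastAPI service would more typically declare a Pydantic model. The class name, the numeric bounds, and the set of allowed traits (inferred from the Dominant Trait column of the archetype table) are assumptions, not the project's actual schema.

```python
import json
from dataclasses import dataclass

# Inferred from the archetype table; the backend may accept other values.
ALLOWED_TRAITS = {"strength", "endurance", "precision", "adaptive"}


@dataclass(frozen=True)
class AnalyzeRequest:
    height_cm: float
    weight_kg: float
    dominant_trait: str

    def __post_init__(self) -> None:
        # Hypothetical sanity bounds, chosen only to reject obvious typos.
        if not 100 <= self.height_cm <= 250:
            raise ValueError("height_cm out of range")
        if not 30 <= self.weight_kg <= 250:
            raise ValueError("weight_kg out of range")
        if self.dominant_trait.lower() not in ALLOWED_TRAITS:
            raise ValueError(f"unknown dominant_trait: {self.dominant_trait!r}")


def parse_analyze_body(raw: str) -> AnalyzeRequest:
    """Parse a JSON request body into a validated AnalyzeRequest."""
    return AnalyzeRequest(**json.loads(raw))


body = '{"height_cm": 182, "weight_kg": 95, "dominant_trait": "strength"}'
print(parse_analyze_body(body))
```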
Why This Stack
- Gemini Flash for Profiler + Matcher — fast, cost-efficient for classification and lookup where latency matters
- Gemini Pro for Narrator — richer conditional storytelling grounded in specific historical decades
- Google ADK — explicit orchestration with named tool calls; pipeline is inspectable, not a black box
- BigQuery — centroid queries and sport array lookups in milliseconds; scales to full dataset without code changes
- SSE over REST — judges see the pipeline reason step-by-step, not a spinner
- Cloud Run — serverless, scale to zero, 300s timeout, separate frontend and backend services
Repository
pathfinder/
├── frontend/
│ ├── app/page.tsx # Biometric input form
│ ├── app/result/page.tsx # Archetype reveal + results
│ ├── components/
│ │ ├── AgentStepFeed.tsx # Live SSE reasoning trace
│ │ ├── ArchetypeCard.tsx # Archetype reveal card
│ │ ├── SportGrid.tsx # Olympic + Paralympic side by side
│ │ └── NarrativeSection.tsx # Conditional story display
│ └── Dockerfile
├── backend/
│ ├── main.py # FastAPI + SSE endpoint
│ ├── agents/
│ │ ├── profiler.py # Agent 1 — Gemini Flash + BigQuery
│ │ ├── matcher.py # Agent 2 — Gemini Flash + BigQuery
│ │ ├── narrator.py # Agent 3 — Gemini Pro
│ │ └── guardrail.py # Conditional phrasing enforcer
│ ├── data/seed.py # BigQuery seed script
│ └── Dockerfile
└── infra/
├── deploy.sh / deploy.ps1 # Cloud Run deploy
└── bigquery_setup.sh # Dataset + table creation
Local Setup
Backend:
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
uvicorn main:app --reload --port 8080
Frontend:
cd frontend
npm ci
echo "NEXT_PUBLIC_BACKEND_URL=http://localhost:8080" > .env.local
npm run dev
Test the pipeline:
curl -X POST http://localhost:8080/analyze \
-H "Content-Type: application/json" \
-d '{"height_cm": 182, "weight_kg": 95, "dominant_trait": "strength"}'
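The same request can be made from Python with the standard library. The sketch below assumes each SSE frame carries a JSON payload on its `data:` line, which this writeup does not specify. `parse_sse` and `stream_analyze` are illustrative names, and `stream_analyze` is defined but not called because it needs the backend running locally.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE frame; a blank line ends a frame."""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            # Drop the field name and the single optional space after the colon.
            data_parts.append(line[5:].removeprefix(" "))
        elif line == "" and data_parts:
            yield json.loads("\n".join(data_parts))
            data_parts = []


def stream_analyze(base_url: str, body: dict) -> Iterator[dict]:
    """POST the profile to /analyze and yield decoded events as they arrive."""
    request = urllib.request.Request(
        f"{base_url}/analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        yield from parse_sse(line.decode("utf-8") for line in response)


# Offline check of the parser with hand-written frames.
sample = ['event: agent_step\n', 'data: {"agent": "profiler"}\n', '\n']
print(list(parse_sse(sample)))
```

With the backend running, `for event in stream_analyze("http://localhost:8080", {...}): print(event)` would print each step as it is received.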
Submission Checklist
- ✅ Live frontend Cloud Run URL working end-to-end
- ✅ /health returns 200 on backend
- ✅ Olympic and Paralympic sport lists — equal visual weight side by side
- ✅ Agent trace feed shows Profiler → Matcher → Narrator via SSE
- ✅ All narratives use conditional phrasing ("could", "might", "appears")
- ✅ Guardrail layer enforces conditional phrasing in code, not just prompt
- ✅ GEMINI_API_KEY in Secret Manager, mounted to backend Cloud Run
- ✅ BigQuery archetype_clusters — 4 rows seeded
- ✅ BigQuery athlete_profiles — Olympedia data loaded
- ✅ Public GitHub repo with Apache 2.0 in About section
- ✅ 3-minute demo video — YouTube unlisted
PathFinder: Athlete Archetype Agent · Challenge 4 · Team USA × Google Cloud Hackathon
Built with Next.js 14 · FastAPI · Google ADK · Gemini 2.5 Flash/Pro · BigQuery · Cloud Run
Apache License 2.0
Built With
- docker
- fastapi
- gemini-2.5-flash
- gemini-2.5-pro
- google-cloud-run
- google-search-grounding
- next.js-14
- python
- server-sent
- tailwind-css
- typescript
- uvicorn