PathFinder

Find yourself in 120 years of Team USA

Challenge 4 · Team USA × Google Cloud Hackathon — Athlete Archetype Agent

🔗 Live App · Backend Health


The Problem

Every four years, billions of fans watch Team USA compete. They feel something watching those athletes — a recognition, a pull, a question they can't quite articulate:

Where do I fit in that story?

120 years of Team USA athlete data exists. Thousands of athletes. Every body type, every sport, every decade from 1904 to Paris 2024. None of it has ever been made personal. Paralympic athletes — who represent some of Team USA's most compelling storylines — are almost entirely absent from fan-facing tools.


The Solution

PathFinder takes a fan's basic biometric profile — height, weight, dominant athletic trait — and runs it through a three-agent AI pipeline against real Team USA historical data stored in Google BigQuery.

In under 60 seconds it returns a personalized Athlete Archetype: a data-driven mirror that shows the fan which Olympic and Paralympic sports have historically drawn athletes who share their body's profile — with equal analytical depth on both sides.

Not a quiz. Not a chatbot. A reasoning pipeline with visible steps, grounded in real data, delivering a shareable result.


How It Works

The three-agent pipeline

Fan inputs biometrics
        │
        ▼
┌─────────────────────────────────────────┐
│  Agent 1 — PROFILER                     │
│  Model: Gemini 2.5 Flash                │
│  • Normalizes height, weight, BMI       │
│  • Queries BigQuery cluster centroids   │
│  • Computes Euclidean distance × 4      │
│  • Gemini refines using dominant_trait  │
│  Output: archetype_id + confidence      │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│  Agent 2 — MATCHER                      │
│  Model: Gemini 2.5 Flash + BigQuery     │
│  • Queries olympic_sports array         │
│  • Queries paralympic_sports array      │
│    (same query depth, simultaneously)   │
│  • Joins athlete_profiles for counts    │
│  Output: top 5 Olympic + top 5 Para     │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│  GUARDRAIL LAYER (pure function)        │
│  • Strips "you will" → "you could"      │
│  • Strips "built for" → "athletes with  │
│    your profile have historically..."   │
│  • Blocks "guaranteed", "destined to"   │
│  No LLM call — enforced in code         │
└──────────────────┬──────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────┐
│  Agent 3 — NARRATOR                     │
│  Model: Gemini 2.5 Pro                  │
│  • Generates Olympic narrative          │
│  • Generates Paralympic narrative       │
│    (equal length, equal depth)          │
│  • Historical callout (specific decade) │
│  • Shareable tagline                    │
│  Output: structured JSON story          │
└──────────────────┬──────────────────────┘
                   │
                   ▼
        Fan sees archetype card
        + sport grid (Oly + Para)
        + narrative + share card

System architecture

Browser
   │ HTTPS
   ▼
Next.js 14 frontend  ──────────────────────────────  Google Cloud Run
   │ SSE stream POST /analyze
   ▼
FastAPI backend  ──────────────────────────────────  Google Cloud Run
   │                    │                    │
   │ ADK orchestrates   │ BigQuery queries   │ Secret Manager
   ▼                    ▼                    ▼
Gemini 2.5 Flash   archetype_clusters    GEMINI_API_KEY
Gemini 2.5 Pro     athlete_profiles      (mounted at deploy)
Google ADK

The streaming difference

The /analyze endpoint uses FastAPI's StreamingResponse with Server-Sent Events (SSE). The frontend's agent step feed updates as each agent completes, so instead of watching a spinner for 30 seconds, the fan sees a live reasoning trace of what the pipeline is doing at each step.
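As a rough illustration of the streaming approach, the sketch below formats pipeline steps as SSE frames using only the standard library. The function names (`sse_event`, `stream_pipeline`), the event names (`agent_step`, `done`), and the payload shape are assumptions made for this example, not the project's actual wire format.

```python
import json
from typing import Iterable, Iterator, Tuple


def sse_event(event: str, payload: dict) -> str:
    """Format one SSE frame: an 'event:' line, a 'data:' line, then a blank line."""
    # json.dumps emits a single line by default, so one 'data:' line is enough.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def stream_pipeline(steps: Iterable[Tuple[str, dict]]) -> Iterator[str]:
    """Yield one frame per completed agent step, then a final 'done' frame.

    `steps` stands in for the real agent calls (hypothetical shape:
    pairs of agent name and result dict).
    """
    for agent, result in steps:
        yield sse_event("agent_step", {"agent": agent, "result": result})
    yield sse_event("done", {})
```

In a FastAPI route, a generator like this would typically be wrapped as `StreamingResponse(stream_pipeline(steps), media_type="text/event-stream")`, so each frame reaches the browser as soon as it is yielded.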


The Four Archetypes

| Archetype  | Centroid        | Dominant Trait | Olympic                            | Paralympic                          |
|------------|-----------------|----------------|------------------------------------|-------------------------------------|
| Powerhouse | 185 cm · 102 kg | Strength       | Weightlifting, Wrestling, Shot Put | Para Powerlifting, Wheelchair Rugby |
| Aerobic    | 174 cm · 68 kg  | Endurance      | Marathon, Cycling, Triathlon       | Wheelchair Racing, Para Cycling     |
| Precision  | 172 cm · 70 kg  | Precision      | Archery, Gymnastics, Diving        | Para Archery, Para Shooting         |
| Adaptive   | 170 cm · 72 kg  | Adaptive       | Swimming, Athletics, Judo          | Para Swimming, Para Athletics       |

The Adaptive archetype is the widest cluster — it maps to the broadest range of Paralympic classifications and is PathFinder's most inclusive fan pathway.
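To make the distance step concrete, here is a minimal nearest-centroid sketch using the four centroids listed above. The scale factors, the confidence formula, and the function names are assumptions for illustration. The real Profiler also normalizes BMI and lets Gemini refine the result with dominant_trait, neither of which is modeled here.

```python
import math

# Centroids as (height_cm, weight_kg), taken from the archetype table.
CENTROIDS = {
    "Powerhouse": (185.0, 102.0),
    "Aerobic": (174.0, 68.0),
    "Precision": (172.0, 70.0),
    "Adaptive": (170.0, 72.0),
}

# Hypothetical scale factors so centimetres and kilograms contribute comparably.
HEIGHT_SCALE_CM = 10.0
WEIGHT_SCALE_KG = 15.0


def scaled_distance(height_cm: float, weight_kg: float, centroid: tuple) -> float:
    """Euclidean distance between a fan and one centroid in scaled units."""
    dh = (height_cm - centroid[0]) / HEIGHT_SCALE_CM
    dw = (weight_kg - centroid[1]) / WEIGHT_SCALE_KG
    return math.hypot(dh, dw)


def classify(height_cm: float, weight_kg: float):
    """Return (archetype, confidence, distances) for the closest centroid."""
    distances = {
        name: scaled_distance(height_cm, weight_kg, c)
        for name, c in CENTROIDS.items()
    }
    archetype = min(distances, key=distances.get)
    # Assumed confidence: the winner's share of the inverse-distance weights.
    weights = {name: 1.0 / (d + 1e-9) for name, d in distances.items()}
    confidence = weights[archetype] / sum(weights.values())
    return archetype, confidence, distances
```

With the sample request body from the API section (182 cm, 95 kg), this picks Powerhouse. The Aerobic, Precision, and Adaptive centroids sit within a few centimetres and kilograms of each other, which is one reason a second signal such as dominant_trait is useful for separating them.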


What Makes This Different

1. It's a real multi-agent pipeline, not a prompt
Three specialized agents, each with a distinct role, its own tools, and a Gemini model chosen for the job. Profiler and Matcher use Flash for speed. Narrator uses Pro for narrative depth. The fan watches each agent complete in real time via SSE.

2. Paralympic parity is architectural, not cosmetic
Every archetype maps to Olympic and Paralympic sports through identical BigQuery queries, identical agent prompts, and identical UI rendering. The Matcher agent runs query_olympic_sports() and query_paralympic_sports() simultaneously, through the same code path and at the same depth.

3. The Guardrail layer enforces conditional phrasing in code
A dedicated pure function runs on every Narrator output before it reaches the frontend. This isn't a prompt instruction that can be ignored: it is code-level enforcement that systematically replaces deterministic language with conditional phrasing across every archetype result.

4. BigQuery grounds the reasoning in real data
The Profiler doesn't guess archetypes. It computes the mathematical distance between fan biometrics and cluster centroids derived from 120 years of Team USA athlete records. The reasoning is explainable: here is the distance score, here is the closest cluster, here is why.
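The guardrail described in point 3 could be sketched as a small pure function like the one below. The names `enforce_conditional` and `GuardrailViolation` are hypothetical, and raising an exception is only one assumed way to "block" a phrase. The "built for" rewrite is left out because its replacement text is truncated in the pipeline diagram.

```python
import re

# Deterministic phrase -> conditional phrase, as listed in the pipeline diagram.
REPLACEMENTS = {"you will": "you could"}

# Phrases that must never reach the fan, as listed in the pipeline diagram.
BLOCKED_PHRASES = ("guaranteed", "destined to")


class GuardrailViolation(ValueError):
    """Raised when text contains a blocked phrase (hypothetical behavior)."""


def _match_case(replacement: str, original: str) -> str:
    """Capitalize the replacement if the matched text started with a capital."""
    if original[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def enforce_conditional(text: str) -> str:
    """Reject blocked phrases, then rewrite deterministic phrasing. No LLM call."""
    lowered = text.lower()
    hits = [phrase for phrase in BLOCKED_PHRASES if phrase in lowered]
    if hits:
        raise GuardrailViolation(f"blocked phrasing: {', '.join(hits)}")
    for phrase, replacement in REPLACEMENTS.items():
        pattern = re.compile(rf"\b{re.escape(phrase)}\b", re.IGNORECASE)
        text = pattern.sub(
            lambda m, r=replacement: _match_case(r, m.group(0)), text
        )
    return text
```

Because the function has no side effects and no model call, it is easy to unit test against every archetype's narrative. A production version might regenerate or redact blocked text rather than raise.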


Agent Responsibilities

| Agent     | Model                       | Input                                 | Output                                 |
|-----------|-----------------------------|---------------------------------------|----------------------------------------|
| Profiler  | Gemini 2.5 Flash            | height_cm, weight_kg, dominant_trait  | archetype_id, confidence_score         |
| Matcher   | Gemini 2.5 Flash + BigQuery | archetype_id                          | 5 Olympic sports + 5 Paralympic sports |
| Guardrail | Pure function               | raw narrative inputs                  | conditional-safe inputs                |
| Narrator  | Gemini 2.5 Pro              | archetype + matches + hint            | headline, narratives, tagline          |
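The Matcher's parallel lookup can be illustrated with `asyncio.gather`. This is a sketch under stated assumptions: the two query functions reuse the names mentioned in this README, but their bodies are stubs that return values from the archetype table instead of calling BigQuery, and `match_sports` is a hypothetical wrapper.

```python
import asyncio

# Stand-in data copied from the archetype table; the real agent reads BigQuery.
_OLYMPIC = {"powerhouse": ["Weightlifting", "Wrestling", "Shot Put"]}
_PARALYMPIC = {"powerhouse": ["Para Powerlifting", "Wheelchair Rugby"]}


async def query_olympic_sports(archetype_id: str) -> list:
    """Stub for the Olympic lookup; the sleep stands in for a query round trip."""
    await asyncio.sleep(0)
    return _OLYMPIC.get(archetype_id, [])


async def query_paralympic_sports(archetype_id: str) -> list:
    """Stub for the Paralympic lookup, with the same shape as the Olympic one."""
    await asyncio.sleep(0)
    return _PARALYMPIC.get(archetype_id, [])


async def match_sports(archetype_id: str, limit: int = 5) -> dict:
    """Run both lookups concurrently and cap each list at the same length."""
    olympic, paralympic = await asyncio.gather(
        query_olympic_sports(archetype_id),
        query_paralympic_sports(archetype_id),
    )
    return {"olympic": olympic[:limit], "paralympic": paralympic[:limit]}
```

Calling `asyncio.run(match_sports("powerhouse"))` returns both lists in one dictionary. Applying the same `limit` to both sides is one simple way to keep the two result sets at equal depth.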

Google Cloud Services Used

| Service | How used |
|---------|----------|
| Google Cloud Run | Two services: FastAPI backend (1Gi, 300s timeout) + Next.js frontend (512Mi) |
| Google BigQuery | Stores archetype_clusters (4 rows) and athlete_profiles (15,000+ rows). Profiler queries centroids. Matcher queries sport arrays. |
| Google Secret Manager | GEMINI_API_KEY stored as secret, mounted into backend Cloud Run service at deploy |
| Google ADK | Orchestrates the three-agent pipeline with named tool definitions per agent |
| Gemini 2.5 Flash | Profiler (classification) + Matcher (sport retrieval) |
| Gemini 2.5 Pro | Narrator (conditional story generation) |
| Google Container Registry | Docker images for both Cloud Run services |
| Google Cloud Build | CI/CD: gcloud builds submit for both services |

Data Sources

1. Olympedia — 120 Years of Olympic History

  • Source: kaggle.com/datasets/heesoo37/120-years-of-olympic-history-athletes-and-results
  • Filtered to US athletes only (NOC = USA)
  • Loaded into BigQuery as athlete_profiles table
  • Used by Matcher agent for historical athlete counts and example names per sport
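The US-only filter could look roughly like the sketch below. The column names follow the Kaggle dataset's athlete_events.csv as best recalled and should be checked against the downloaded file. The two sample rows are made up for the example, and `filter_usa` is a hypothetical helper, not the project's seed script.

```python
import csv
import io


def filter_usa(rows):
    """Keep only rows whose National Olympic Committee code is USA."""
    return [row for row in rows if row["NOC"] == "USA"]


# Made-up sample rows in the assumed athlete_events.csv column layout.
SAMPLE_CSV = """ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal
1,Example Athlete A,M,24,180,80,United States,USA,1992 Summer,1992,Summer,Barcelona,Swimming,Example Event,NA
2,Example Athlete B,F,23,170,60,Canada,CAN,1992 Summer,1992,Summer,Barcelona,Rowing,Example Event,NA
"""

usa_rows = filter_usa(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```

Here `usa_rows` contains only the first sample row. The filtered rows would then be loaded into the athlete_profiles table.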

2. Custom Archetype Cluster Dataset

  • Four cluster centroids (Powerhouse, Aerobic, Precision, Adaptive) modeled on sports science research on elite athlete anthropometrics by discipline
  • Stored in BigQuery as archetype_clusters table
  • Used by Profiler agent for Euclidean distance classification

BigQuery project: gen-lang-client-0276213830 · dataset: pathfinder_dataset


API

GET  /health   → {"status": "ok"}

POST /analyze  → SSE stream
Body: {
  "height_cm": 182,
  "weight_kg": 95,
  "dominant_trait": "strength"
}
Streams agent steps, returns PipelineResult
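A client consuming the /analyze stream needs to split it into events. The sketch below parses SSE `data:` lines into JSON payloads using only the standard library. It assumes each event carries a JSON object, which this README does not specify, and `iter_sse_payloads` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each SSE event in a stream of lines.

    Only 'data:' fields are collected. Other fields such as 'event:' and
    'id:' are ignored. A blank line marks the end of an event.
    """
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            # SSE allows one optional space after the colon.
            data_lines.append(line[5:].removeprefix(" "))
        elif line == "" and data_lines:
            yield json.loads("\n".join(data_lines))
            data_lines = []
```

Any iterable of text lines works as input, for example the decoded lines of a streaming HTTP response body.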

Why This Stack

  • Gemini Flash for Profiler + Matcher — fast, cost-efficient for classification and lookup where latency matters
  • Gemini Pro for Narrator — richer conditional storytelling grounded in specific historical decades
  • Google ADK — explicit orchestration with named tool calls; pipeline is inspectable, not a black box
  • BigQuery — centroid queries and sport array lookups in milliseconds; scales to full dataset without code changes
  • SSE instead of a single request/response call: judges see the pipeline reason step by step, not a spinner
  • Cloud Run — serverless, scale to zero, 300s timeout, separate frontend and backend services

Repository

pathfinder/
├── frontend/
│   ├── app/page.tsx              # Biometric input form
│   ├── app/result/page.tsx       # Archetype reveal + results
│   ├── components/
│   │   ├── AgentStepFeed.tsx     # Live SSE reasoning trace
│   │   ├── ArchetypeCard.tsx     # Archetype reveal card
│   │   ├── SportGrid.tsx         # Olympic + Paralympic side by side
│   │   └── NarrativeSection.tsx  # Conditional story display
│   └── Dockerfile
├── backend/
│   ├── main.py                   # FastAPI + SSE endpoint
│   ├── agents/
│   │   ├── profiler.py           # Agent 1 — Gemini Flash + BigQuery
│   │   ├── matcher.py            # Agent 2 — Gemini Flash + BigQuery
│   │   ├── narrator.py           # Agent 3 — Gemini Pro
│   │   └── guardrail.py          # Conditional phrasing enforcer
│   ├── data/seed.py              # BigQuery seed script
│   └── Dockerfile
└── infra/
    ├── deploy.sh / deploy.ps1    # Cloud Run deploy
    └── bigquery_setup.sh         # Dataset + table creation

Local Setup

Backend:

cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
uvicorn main:app --reload --port 8080

Frontend:

cd frontend
npm ci
echo "NEXT_PUBLIC_BACKEND_URL=http://localhost:8080" > .env.local
npm run dev

Test the pipeline:

curl -X POST http://localhost:8080/analyze \
  -H "Content-Type: application/json" \
  -d '{"height_cm": 182, "weight_kg": 95, "dominant_trait": "strength"}'

Submission Checklist

  • ✅ Live frontend Cloud Run URL working end-to-end
  • ✅ /health returns 200 on backend
  • ✅ Olympic and Paralympic sport lists — equal visual weight side by side
  • ✅ Agent trace feed shows Profiler → Matcher → Narrator via SSE
  • ✅ All narratives use conditional phrasing ("could", "might", "appears")
  • ✅ Guardrail layer enforces conditional phrasing in code, not just prompt
  • ✅ GEMINI_API_KEY in Secret Manager, mounted to backend Cloud Run
  • ✅ BigQuery archetype_clusters — 4 rows seeded
  • ✅ BigQuery athlete_profiles — Olympedia data loaded
  • ✅ Public GitHub repo with Apache 2.0 in About section
  • ✅ 3-minute demo video — YouTube unlisted

PathFinder — Athlete Archetype Agent · Challenge 4 · Team USA × Google Cloud Hackathon
Built with Next.js 14 · FastAPI · Google ADK · Gemini 2.5 Flash/Pro · BigQuery · Cloud Run
Apache License 2.0

Built With

  • docker
  • fastapi
  • gemini-2.5-flash
  • gemini-2.5-pro
  • google-cloud-run
  • google-search-grounding
  • next.js-14
  • python
  • server-sent
  • tailwind-css
  • typescript
  • uvicorn