Natural language query meets smart filters — 20 candidates ranked by AI match score in seconds.
Full execution trace — 5 agents, 4 tool calls, timing breakdown per agent layer.
90% similarity score — Gemini explains exactly why the profiles match tactically.
Side-by-side attribute radar, head-to-head stats, and an AI-generated scouting verdict.
Formation view with budget tracker and squad chemistry radar — one vacancy, agent-recommended fill.
Active shortlist with xG+xA, value, and an agent budget analysis at the bottom.
Scout → Compare → Act → Memory. Four steps from query to shortlist.
Multi-agent planning in real time — Planner and Scout agents decompose your brief live.

Scout Agent — Football Scouting, Powered by Gemini

Inspiration

Honest answer? It started as a hackathon idea. Both of us follow football closely — Savya's a Barca guy, Abhinav has his own opinions about that — and football data has always fascinated us. But we're not going to pretend we'd interviewed a dozen professional scouts before sitting down to build this. The real spark was simpler:

What if you could just ask "who can replace Pedri for under €40M" and get a genuinely reasoned answer, not a spreadsheet?

That question felt worth a weekend.

How We Built It

Scout Agent is a retrieval-augmented generation (RAG) pipeline on top of the EA FC 26 Player Database (over 18,000 players across all major leagues, each described by 110+ attributes), with Gemini handling the reasoning layer. Player profiles and stats are stored in MongoDB Atlas with two Atlas Search indexes, which handle fast document retrieval across the player corpus alongside our FAISS semantic search layer.

Step 1 — Normalise

Raw player attributes differ in typical level and spread: OVR sits around 75 while Stamina might be 85 and Acceleration 70. Before embedding anything, we z-score every metric, subtracting its dataset mean $\mu$ and dividing by its standard deviation $\sigma$, so no single attribute dominates the similarity calculation:

$$\tilde{s} = \frac{x - \mu}{\sigma}$$
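A minimal sketch of that normalisation, using only the standard library. The function name `zscore_columns`, the dict-based row layout, and the sample rows are assumptions made for illustration; they are not taken from the project's code, and the numbers are not real dataset values.

```python
from statistics import mean, pstdev


def zscore_columns(players, attributes):
    """Return copies of the player rows with each listed attribute z-scored."""
    # Compute mean and (population) standard deviation per attribute.
    stats = {}
    for attr in attributes:
        values = [p[attr] for p in players]
        stats[attr] = (mean(values), pstdev(values))

    normalised = []
    for p in players:
        row = dict(p)  # leave the input rows untouched
        for attr, (mu, sigma) in stats.items():
            # Guard against constant columns, where sigma would be zero.
            row[attr] = 0.0 if sigma == 0 else (p[attr] - mu) / sigma
        normalised.append(row)
    return normalised


# Made-up rows purely for illustration.
players = [
    {"name": "A", "stamina": 85, "acceleration": 70},
    {"name": "B", "stamina": 75, "acceleration": 80},
    {"name": "C", "stamina": 65, "acceleration": 90},
]
scaled = zscore_columns(players, ["stamina", "acceleration"])
```

After this step every attribute has mean 0 and unit spread, so a 10-point gap in Stamina and a 10-point gap in Acceleration only count the same if the two attributes are equally spread out in the data. With pandas or NumPy the same thing is a one-line vectorised operation.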

Step 2 — Retrieve

Player chunks are embedded with sentence-transformers and stored in FAISS. At query time, we score candidates by cosine similarity between the query embedding $\vec{q}$ and each player vector $\vec{d}_i$:

$$\text{sim}(q, d_i) = \frac{\vec{q} \cdot \vec{d}_i}{|\vec{q}| \cdot |\vec{d}_i|}$$

Only the top-$k$ results get passed forward — this is the step we got wrong early on (more on that below).
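The scoring and top-$k$ cut can be sketched in plain Python as below. This is an illustration of the cosine formula, not the project's retrieval code: the names `cosine_similarity` and `top_k`, the three-dimensional toy vectors, and the dict of player vectors are all assumptions.

```python
import math


def cosine_similarity(q, d):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_q * norm_d)


def top_k(query_vec, player_vecs, k):
    """Return the k (name, score) pairs with the highest similarity."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in player_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real sentence-transformers vectors have hundreds of dimensions.
player_vecs = {
    "player_a": [1.0, 0.0, 0.0],
    "player_b": [0.9, 0.1, 0.0],
    "player_c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], player_vecs, k=2)
```

A brute-force loop like this is fine for a handful of vectors. At 18,000 players a vector index does the same job faster; for example, an inner-product search over L2-normalised vectors produces the same ranking as cosine similarity.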

Step 3 — Reason

Retrieved candidates are handed to Gemini Pro with a structured scouting prompt. It produces a ranked report — tactical fit, stat breakdown, transfer value estimate — grounded strictly in the retrieved context.
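To make "structured scouting prompt" concrete, here is a hedged sketch of how such a prompt might be assembled from the retrieved candidates. The wording of the instructions, the function name `build_scouting_prompt`, and the candidate fields (`name`, `score`, `attributes`) are hypothetical; the actual prompt and the Gemini call itself are not shown in this write-up.

```python
def build_scouting_prompt(query, candidates):
    """Assemble a grounded prompt from the brief and the retrieved candidates."""
    lines = [
        "You are a football scout. Answer using ONLY the candidate data below.",
        "If the data is insufficient, say so instead of guessing.",
        "",
        f"Scouting brief: {query}",
        "",
        "Candidates:",
    ]
    for rank, cand in enumerate(candidates, start=1):
        # Sort attributes so the prompt is deterministic across runs.
        attrs = ", ".join(
            f"{key}={value}" for key, value in sorted(cand["attributes"].items())
        )
        lines.append(
            f"{rank}. {cand['name']} (similarity {cand['score']:.2f}): {attrs}"
        )
    lines += [
        "",
        "For each candidate, report: tactical fit, stat breakdown, "
        "transfer value estimate.",
        "Then rank the candidates from best to worst fit.",
    ]
    return "\n".join(lines)


# Placeholder candidates, not real players or ratings.
candidates = [
    {"name": "Player A", "score": 0.91, "attributes": {"vision": 84, "composure": 80}},
    {"name": "Player B", "score": 0.87, "attributes": {"vision": 79, "composure": 83}},
]
prompt = build_scouting_prompt("Replace Pedri for under 40M", candidates)
```

Keeping the candidate list inside the prompt, with an explicit instruction to use nothing else, is what lets the report stay grounded in the retrieved context rather than in whatever the model happens to remember about a player.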

Step 4 — Visualise

Output is rendered as a formation card (4-3-3 in our demo) with per-player stat cards showing key metrics and value estimates.

Layer	Technology
LLM	Gemini Pro
Embeddings	sentence-transformers
Vector Store	FAISS
Database	MongoDB Atlas
Backend	FastAPI + Google Cloud Run
Frontend	Streamlit
Infra	Vertex AI, Google Cloud Storage
Data	EA FC 26 Player Database (Kaggle, ~18K players)

What Surprised Us

The player replacement queries worked better than we expected. When we asked "find me a profile similar to Pedri but cheaper", Gemini didn't just pattern-match on stats; it reasoned about why the match made sense tactically. That was the moment it stopped feeling like a demo and started feeling like an actual tool.

The scale of the EA FC 26 dataset also helped here more than we anticipated. Having 18,000 players indexed meant the retrieval step consistently surfaced genuinely obscure candidates — players from the Eredivisie or the Brazilian Série A that a traditional scouting workflow might never reach.

The Hardest Part

Gemini API tokens were getting burned through faster than expected. Early on we were passing full player corpora as context instead of just the retrieved chunks, effectively ignoring the retrieval step entirely. The fix was obvious in hindsight: enforce a minimum similarity threshold and a strict top-$k$ cutoff before any API call:

$$\text{context} = \{d_i : \text{sim}(q, d_i) \geq \tau\}, \quad |\text{context}| \leq k$$

where $\tau$ is a minimum similarity threshold. Classic RAG mistake, painful lesson, won't make it again.
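A small sketch of that guard: filter by the threshold $\tau$, then cap at $k$, before anything is sent to the model. The function name `select_context`, the `(chunk, score)` tuple layout, and the example values of `tau` and `k` are assumptions for illustration only.

```python
def select_context(scored_chunks, tau, k):
    """Keep chunks with score >= tau, best first, and never more than k."""
    kept = [(chunk, score) for chunk, score in scored_chunks if score >= tau]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in kept[:k]]


# Toy similarity scores, not real retrieval output.
scored = [
    ("chunk_a", 0.92),
    ("chunk_b", 0.40),
    ("chunk_c", 0.81),
    ("chunk_d", 0.77),
]
context = select_context(scored, tau=0.75, k=2)  # ["chunk_a", "chunk_c"]
```

The two limits do different jobs: $\tau$ drops weak matches even when fewer than $k$ candidates are available, while $k$ bounds the prompt size (and therefore the token bill) no matter how many chunks clear the threshold.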

What We Learned

Retrieval quality matters more than the LLM. Gemini is good enough that if you give it the right context it will do the right thing — the hard work is making sure the right 5 players show up in that context window, not the wrong 50.

We also learned that game ratings are a surprisingly defensible proxy for real scouting profiles. EA's scouts rate over 18,000 players across 110+ attributes, including Composure, Vision, Positioning and Aggression, which are attributes that real scouts actually care about. It's not Opta. But for a weekend build, it was the right call.

What's Next

Live data pipeline from FBref and StatsBomb, voice queries, and proper PDF export for scout reports. And a transfer value model worth the name — right now we use player potential and age to estimate market value:

$$\hat{V} \propto \tilde{s} \cdot e^{-\lambda (a_p - a_{\text{peak}})}$$

where $\tilde{s}$ is the player's normalised potential, $\lambda$ is a decay rate, $a_p$ is the player's age and $a_{\text{peak}}$ is the position-adjusted peak age. For now we're eyeballing €40M. The next version won't be.
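The age-decay relation can be written out as a short function. This is a sketch of the proportionality only: the name `relative_value`, the default decay rate of 0.15, and the example ages are hypothetical, and the output is a unitless score that would still need calibrating against real fees before it meant anything in euros.

```python
import math


def relative_value(s_tilde, age, peak_age, decay=0.15):
    """Unitless value score: potential scaled by exponential age decay."""
    return s_tilde * math.exp(-decay * (age - peak_age))


# Same potential score, different ages (illustrative numbers only).
at_peak = relative_value(1.0, age=26, peak_age=26)
older = relative_value(1.0, age=30, peak_age=26)
younger = relative_value(1.0, age=22, peak_age=26)
```

As written, the formula rewards every year below peak age as well as penalising every year above it, so `younger` comes out higher than `at_peak`. It also inherits the sign of $\tilde{s}$, so a z-scored potential below the dataset mean would give a negative score; a fuller model would likely need a strictly positive input.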

Built by Savya and Abhinav at the Google Cloud Rapid Agent Hackathon.

Built With

faiss
fastapi
google-cloud
google-cloud-run
google-gemini-pro-api
mongodb-atlas
numpy
pandas
python
sentence-transformers
streamlit
vertex-ai

Updates

Savya Raj started this project — Jun 11, 2026 04:43 PM EDT
