Inspiration

Soccer analytics is full of vanity stats (possession percentage, shots on target) that describe a match without predicting it. We wanted to know whether a single, well-validated number actually tracks who controlled a game, independent of whether the ball happened to bounce in. Action-level possession-value models such as VAEP promised exactly that, but we had never seen one applied across a full tournament and checked against real results at scale. The World Cup, with its high stakes, dramatic comebacks, and upsets, was the perfect stress test.

What it does

UNDER PRESSURE scores every pass, carry, and shot from WC 2018 and WC 2022 using VAEP (Valuing Actions by Estimating Probabilities), then asks two questions. First, does the team with the higher match-average VAEP actually win? (Yes, in 95% of decisive matches.) Second, which teams hold up, or even elevate, when they're behind on the scoreboard, and which ones collapse? The app surfaces this through an interactive scatter-plot quadrant model (Elite, Pretenders, Grinders, Fragile), a per-team pressure-resilience breakdown, a match-by-match VAEP browser, and a live, if coarse, proxy for the in-progress 2026 tournament.
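The first question boils down to a simple count, sketched below. This is a minimal illustration rather than the project's pipeline code: the `MatchSummary` fields and the toy numbers are hypothetical, and "decisive" is assumed here to mean a match whose final score is not level.

```python
from dataclasses import dataclass


@dataclass
class MatchSummary:
    # Hypothetical fields: mean VAEP per action for each side, plus final goals.
    home_vaep: float
    away_vaep: float
    home_goals: int
    away_goals: int


def vaep_accuracy(matches):
    """Share of decisive matches won by the side with the higher average VAEP."""
    hits = 0
    decisive = 0
    for m in matches:
        if m.home_goals == m.away_goals:
            continue  # assumption: level scores are not "decisive" and are skipped
        decisive += 1
        if m.home_vaep == m.away_vaep:
            continue  # no VAEP favourite, counted as a miss
        vaep_favours_home = m.home_vaep > m.away_vaep
        home_won = m.home_goals > m.away_goals
        if vaep_favours_home == home_won:
            hits += 1
    return hits / decisive if decisive else float("nan")


# Toy data, invented purely for illustration.
toy_matches = [
    MatchSummary(0.020, 0.010, 2, 0),  # favourite wins
    MatchSummary(0.010, 0.030, 1, 1),  # draw, skipped
    MatchSummary(0.015, 0.020, 1, 0),  # favourite loses
    MatchSummary(0.010, 0.020, 0, 3),  # favourite wins
]
accuracy = vaep_accuracy(toy_matches)
```

On the toy data two of the three decisive matches go to the VAEP favourite. How a tie in VAEP or a shootout result should be treated is a design choice this sketch does not settle.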

How we built it

The pipeline pulls StatsBomb's open event data, converts it to SPADL format, and fits a VAEP model locally with socceraction (XGBoost under the hood) to score every on-ball action. From there we engineer team-level features: a possession-quality index, score-state-conditioned VAEP rates, and stage retention. Then we run the validation: match-VAEP-accuracy as the primary claim, and a logistic regression of pressure resilience (PRS) against FIFA rank as a secondary, honestly reported check. All of that runs offline in a five-stage pipeline; a FastAPI backend just loads the resulting parquet/JSON at startup and serves it with zero runtime model inference. The frontend is a React and Vite SPA with Recharts for the data viz, a fully custom dark design system, and a live 2026 proxy pulled from football-data.org.
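As a rough illustration of what a score-state-conditioned VAEP rate could look like, the sketch below averages per-action VAEP by team and score state. It makes several assumptions: actions are already scored and tagged, the rate is a plain per-action mean, and the state labels (`"leading"`, `"level"`, `"trailing"`) and tuple layout are hypothetical. It does not reproduce the project's actual feature definitions or its PRS formula.

```python
from collections import defaultdict


def vaep_by_score_state(actions):
    """Mean VAEP per action for each (team, score_state) pair.

    `actions` is an iterable of (team, score_state, vaep_value) tuples;
    this layout is an assumption made for the sketch.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for team, state, value in actions:
        totals[(team, state)] += value
        counts[(team, state)] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Toy actions, invented purely for illustration.
toy_actions = [
    ("A", "trailing", 0.04),
    ("A", "trailing", 0.02),
    ("A", "leading", 0.01),
    ("B", "level", 0.03),
]
rates = vaep_by_score_state(toy_actions)
```

Comparing a team's rate when trailing with its rate when level or leading is one way to see whether it holds up under pressure. A per-minute or per-possession rate would be an equally reasonable alternative.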

Challenges we ran into

The biggest one: our first hypothesis, that pressure resilience (PRS) predicts who survives a tournament better than FIFA rank, failed. Rigorously. We could have buried that or reframed PRS to look better; instead we reported it honestly as a secondary, descriptive metric and pivoted the headline claim to the metric that actually held up under scrutiny (match-VAEP-accuracy). On the technical side, three things bit us. socceraction's installed API didn't match what we expected: there was no pretrained model to load, so we had to fit our own. A subtle off-by-one in score-state tagging counted a team's own goal in the state attached to the goal action itself, which should reflect the score before the goal. And football-data.org's free tier only gives half-time and full-time scores, not a goal timeline, which limited how precisely we could detect in-match comebacks for the live 2026 data.
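The score-state bug is an ordering problem, and one way to avoid it is sketched below: tag each action with the score as it stood before the action, and only then update the score if the action was a goal. This is an illustrative reconstruction, not the project's code. The event dictionaries and their `team` and `is_goal` keys are hypothetical, and own goals are ignored for simplicity.

```python
from collections import defaultdict


def tag_score_states(events):
    """Return the acting team's score state for each event, in order.

    `events` is a chronological list of dicts with hypothetical keys
    "team" (str) and "is_goal" (bool).
    """
    score = defaultdict(int)
    states = []
    for event in events:
        team = event["team"]
        own = score[team]
        opponent = sum(goals for t, goals in score.items() if t != team)
        # Tag first, using the score as it stood before this action.
        if own > opponent:
            states.append("leading")
        elif own < opponent:
            states.append("trailing")
        else:
            states.append("level")
        # Only now does a goal change the score for later actions.
        if event["is_goal"]:
            score[team] += 1
    return states


# Toy event stream, invented purely for illustration.
toy_events = [
    {"team": "A", "is_goal": False},
    {"team": "A", "is_goal": True},
    {"team": "B", "is_goal": False},
    {"team": "B", "is_goal": True},
    {"team": "A", "is_goal": False},
]
states = tag_score_states(toy_events)
```

Swapping the two steps, so the score updates before the tag is written, would mark A's opening goal as scored while "leading", which is the kind of error described above.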

Accomplishments that we're proud of

Shipping an honest negative result alongside a strong positive one, instead of cherry-picking. Getting a real, statistically meaningful, cross-validated finding (95% consistent across two independent tournaments) rather than an overfit one-tournament fluke. And building a genuinely explorable interface around it: every number on the site clicks through to the match or team that produced it, rather than sitting in a static report.

What we learned

Validating a metric against ground truth, and being willing to discard the part that doesn't hold up, produces a much stronger story than forcing a single flashy number to do too much work. Also: free-tier live sports APIs are far coarser than they look on the marketing page, and designing around that limitation transparently, in the UI, matters more than hiding it.
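To make the coarseness concrete, here is a minimal sketch of the only kind of comeback that half-time and full-time scores alone can reveal: a side that trailed at the break and finished level or ahead. The function name and the `(home, away)` tuple layout are hypothetical and do not mirror football-data.org's response schema.

```python
from typing import Optional, Tuple

Score = Tuple[int, int]  # (home_goals, away_goals), a layout assumed for this sketch


def halftime_comeback(half_time: Score, full_time: Score) -> Optional[str]:
    """Return "home" or "away" if that side trailed at half-time and did
    not lose at full-time; return None otherwise."""
    ht_home, ht_away = half_time
    ft_home, ft_away = full_time
    if ht_home < ht_away and ft_home >= ft_away:
        return "home"
    if ht_away < ht_home and ft_away >= ft_home:
        return "away"
    return None


# Toy score pairs, invented purely for illustration.
home_turnaround = halftime_comeback((0, 1), (2, 1))
away_equaliser = halftime_comeback((1, 0), (1, 1))
no_comeback = halftime_comeback((0, 0), (1, 0))
```

A team that concedes and equalises within the same half is invisible to this check, which is exactly the limitation a goal timeline would remove.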

What's next for UNDER PRESSURE

Re-run the VAEP and PRS pipeline directly on 2026 event data once it's available (replacing the half-time and full-time proxy), extend the quadrant model with multi-tournament historical depth (we already have EURO 2024 and Copa America 2024 loaded as an exploratory check), and add head-to-head match prediction using match-level VAEP as the primary feature.
