PolyHedge: TradFi Risk Infrastructure for Prediction Markets

Inspiration

Prediction market traders are flying blind. Options desks have had hedge scanners, scenario replay, and Monte Carlo simulation for decades, but none of it exists for binary markets. Yet the math carries over directly: a YES contract bought at $0.60 is effectively a binary (cash-or-nothing) call option that pays $1 if the event occurs. We asked: what if you applied the full derivatives risk stack to Polymarket?

Specifically, we wanted to answer two questions:

First: I have a thesis. How do I find every market it touches?

Second: I have a position. How do I build a strategy around it, and does it actually hold up?


What We Built

A six-tab analytics pipeline, used left to right like a real trading workflow:

  1. Position Input — Enter a position (stake, entry price, side)
  2. Correlation Scanner — Scan 1,000 markets for statistically significant co-movement using a full correlation pipeline: Pearson on logit returns, rolling 20%-window correlation, ±10-day cross-correlation, Granger causality, and CUSUM structural break detection
  3. Market Comparator — Compare two markets side-by-side with price history, volume activity, and AI-powered spike investigation
  4. Hedge Scanner — Huber IRLS regression on spike events gives a minimum-variance hedge ratio robust to outliers, with 250-round bootstrap confidence intervals streaming live via SSE
  5. Strategy Builder — Build strategies using logit-space correlation-adjusted EV curves:

$$p_B = \sigma\!\left(\text{logit}(p_{B_0}) + \rho \cdot \bigl(\text{logit}(p_A) - \text{logit}(p_{A_0})\bigr)\right)$$

  6. Stress Test — Validate with three methods: paired-resample Monte Carlo (2,000 paths), historical scenario replay with Conditional Hedge Effectiveness, and walk-forward OOS validation.
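
The Strategy Builder formula in step 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the function names `implied_price_b` and `implied_price_curve` are hypothetical, and all prices are assumed to lie strictly between 0 and 1. It shows how a move in market A, measured in log-odds, is scaled by ρ and added to market B's starting log-odds.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit: maps a real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def implied_price_b(p_a: float, p_a0: float, p_b0: float, rho: float) -> float:
    """Shift market B's log-odds by rho times market A's log-odds move.

    p_a0, p_b0: current prices of A and B; p_a: hypothetical new price of A.
    """
    return sigmoid(logit(p_b0) + rho * (logit(p_a) - logit(p_a0)))


def implied_price_curve(p_a0: float, p_b0: float, rho: float, grid):
    """(p_a, implied p_b) pairs for each hypothetical price of A in grid."""
    return [(p_a, implied_price_b(p_a, p_a0, p_b0, rho)) for p_a in grid]


if __name__ == "__main__":
    # Hypothetical inputs: A at 0.60, B at 0.30, rho = 0.5.
    for p_a, p_b in implied_price_curve(0.60, 0.30, 0.5, [0.4, 0.5, 0.6, 0.7, 0.8]):
        print(f"p_A = {p_a:.2f} -> implied p_B = {p_b:.3f}")
```

Two sanity checks follow from the formula itself: with ρ = 0, or with A unchanged, B stays at its starting price, and because the shift happens in log-odds the implied price can never leave (0, 1).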

Key Technical Decision: Logit Space

Raw price differences on [0,1]-bounded processes are non-stationary near resolution. Naive Pearson on raw prices is nearly meaningless. Mapping with

$$\text{logit}(p) = \ln\frac{p}{1-p}$$

maps prices in (0, 1) onto the whole real line and makes correlations, regressions, and cumulative returns well-behaved across the full range. This single transform underpins every statistical method in the codebase.
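
As a rough sketch of what "Pearson on logit returns" could look like, the snippet below transforms a price series to log-odds, differences it, and correlates two such series. It uses only the Python standard library. The clipping bound `EPS` and the helper names are assumptions introduced for illustration; the source does not say how prices at exactly 0 or 1 are handled.

```python
import math

EPS = 1e-4  # assumed clipping bound; logit is undefined at exactly 0 or 1


def logit(p: float, eps: float = EPS) -> float:
    """Log-odds of p, with p clipped into [eps, 1 - eps] first."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def logit_returns(prices):
    """First differences of the logit-transformed price series."""
    z = [logit(p) for p in prices]
    return [b - a for a, b in zip(z, z[1:])]


def pearson(x, y) -> float:
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical hourly prices for two markets.
    a = [0.50, 0.55, 0.52, 0.60, 0.58, 0.65]
    b = [0.30, 0.33, 0.31, 0.37, 0.36, 0.40]
    print(pearson(logit_returns(a), logit_returns(b)))
```

The transform matters most near the bounds: a 5-cent move from 0.50 to 0.55 is a logit return of about 0.20, while the same 5-cent move from 0.94 to 0.99 is about 1.84.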


Challenges

The biggest challenge was spurious correlation from resolution convergence. When two unrelated markets both drift toward 0% or 100% near their end dates, naive Pearson flags them as highly correlated. We penalize this explicitly by discounting correlation mass that accumulates in the final 10% of a market's life.
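
One way such a penalty could work is sketched below. The exact discount rule is not given here, so everything in this snippet is an assumption: it measures what share of the total co-movement (the sum of cross-products behind the Pearson numerator) comes from the last 10% of observations, then shrinks the correlation by that share. The function name `tail_discounted_corr` and the `penalty` weight are hypothetical.

```python
import math


def tail_discounted_corr(x, y, tail_frac: float = 0.10, penalty: float = 1.0) -> float:
    """Pearson correlation shrunk by the share of co-movement in the final tail.

    x, y: aligned return series (e.g. logit returns), oldest first.
    tail_frac: fraction of the series treated as the pre-resolution tail.
    penalty: assumed weight in [0, 1]; 1.0 removes the full tail share.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    # Per-observation contributions to the covariance sum.
    cross = [(a - mx) * (b - my) for a, b in zip(x, y)]
    sxy = sum(cross)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0 or sxy == 0.0:
        return 0.0
    rho = sxy / math.sqrt(sxx * syy)
    tail_n = max(1, round(n * tail_frac))
    # Share of the total co-movement that sits in the tail, clamped to [0, 1].
    tail_share = min(max(sum(cross[-tail_n:]) / sxy, 0.0), 1.0)
    return rho * (1.0 - penalty * tail_share)


if __name__ == "__main__":
    # Two flat series that only move together in the last two observations.
    late = [0.0] * 18 + [5.0, 5.0]
    print(tail_discounted_corr(late, late))
    # Co-movement spread evenly across the whole series.
    even = [1.0, -1.0] * 10
    print(tail_discounted_corr(even, even))
```

In the first case the raw Pearson correlation is 1.0 but nearly all of it comes from the tail, so the adjusted value is heavily discounted. In the second case the tail holds only its proportional share and the adjusted value stays high.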

On the infrastructure side:

  • Railway's Nixpacks builder failed to locate Python, which we fixed with a custom Dockerfile
  • The CLOB API only returns ~24h of trade data, so we derived historical volume activity from hourly price history instead
  • Recharts click handlers silently fail before a first hover event, requiring an activeTooltipIndex fallback

What We Learned

Building this deepened our appreciation for how much the options world takes for granted: liquidity models, vol surfaces, portfolio-level simulation. The math transfers cleanly to prediction markets. The data infrastructure does not yet.
