Inspiration
Polymarket binary prediction markets on crypto assets sit at the intersection of behavioral finance and ML. We investigated whether algorithmic trading could systematically exploit mispricings, and whether these markets exhibit the efficiency evolution predicted by the adaptive markets hypothesis.
What it does
A two-layer strategy. Layer 1 is complete-set arbitrage: when YES + NO < $1, buying both sides locks in a risk-free profit, because exactly one side pays out $1 at resolution. Layer 2 is a selective logistic regression distilled from XGBoost. It fires only in the final 20% of a market's life, when yes_price is extreme, and was validated at a 93-97% win rate on 18,717 near-expiry ticks.
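The two entry conditions can be sketched in a few lines. This is a minimal illustration, not our deployed code: the function names, the `fee_per_set` parameter, and the 0.90 "extreme" threshold are assumptions made for the example. Only the final-20% window comes from the strategy described above.

```python
def complete_set_edge(yes_ask, no_ask, fee_per_set=0.0):
    """Profit per complete set bought at the asks.

    One of YES/NO pays out $1 at resolution, so buying both for less
    than $1 (after any fees, assumed zero here) leaves a positive edge.
    """
    return 1.0 - (yes_ask + no_ask) - fee_per_set


def near_expiry_gate(elapsed_frac, yes_price, life_cutoff=0.80, extreme=0.90):
    """True only in the final 20% of market life with an extreme yes_price.

    `extreme` is a hypothetical threshold chosen for illustration.
    """
    in_window = elapsed_frac >= life_cutoff
    is_extreme = yes_price >= extreme or yes_price <= 1.0 - extreme
    return in_window and is_extreme
```

Layer 1 would trade whenever `complete_set_edge` is positive. Layer 2 would only consult the model when `near_expiry_gate` returns True and abstain otherwise.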
How we built it
We started with exploratory analysis of 178 hours of tick data across 8,466 Polymarket markets and identified complete-set arbitrage as a risk-free baseline. We then built an XGBoost model on five features (time remaining, yes_price, momentum, order book imbalance, and spread), achieving 74.7% CV accuracy and 0.835 AUC. Since XGBoost is not allowed at inference, we distilled it into a logistic regression via probability matching, achieving 0.90+ correlation between teacher and student predictions. The student was trained selectively on only 1,095 high-confidence samples, forcing the model to abstain under uncertainty.
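A rough sketch of the distillation step, under stated assumptions: the student is a logistic regression fit by plain gradient descent on the teacher's soft probabilities (cross-entropy against soft targets), and "high-confidence" means the teacher's probability is at least `margin` away from 0.5. The function names, `margin`, learning rate, and epoch count are all hypothetical. The teacher probabilities would come from the trained XGBoost model, which is not reproduced here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def select_confident(X, teacher_probs, margin=0.3):
    """Keep only samples where the teacher is at least `margin` from 0.5."""
    kept = [(x, p) for x, p in zip(X, teacher_probs) if abs(p - 0.5) >= margin]
    return [x for x, _ in kept], [p for _, p in kept]


def predict(w, b, x):
    """Student probability for one feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def distill(X, teacher_probs, lr=0.5, epochs=500):
    """Fit logistic-regression weights to match the teacher's probabilities."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, p_teacher in zip(X, teacher_probs):
            # Gradient of soft-target cross-entropy with respect to the logit.
            err = predict(w, b, x) - p_teacher
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def pearson(a, b):
    """Pearson correlation, used to compare teacher and student outputs."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

The resulting `w` and `b` are plain numbers, so they can be shipped as deployed weights with no tree library needed at inference. Calling `select_confident` before `distill` is one way to realize the selective-training idea.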
Challenges we ran into
Two main ones. The first was partial-fill risk in the arbitrage layer. Early attempts with size=500 left naked YES or NO exposure when orders filled only partially on thin books. We solved this by sizing orders down to match actual book depth. The second was temporal non-stationarity. XGBoost trained on early, inefficient markets degraded badly on the mature validation period because the market had evolved. We solved this by abandoning learned directional signals entirely and relying only on the structural near-expiry signal, which remains robust across regimes.
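The depth-matching fix can be sketched as follows. This is an illustration under assumptions, not our exact sizing logic: each book is taken to be a list of `(price, size)` ask levels sorted from cheapest to most expensive, and `matched_set_size` and `min_edge` are hypothetical names. The idea is to walk both books together and count only the complete sets that are fillable on both sides while the pair still costs less than $1.

```python
def matched_set_size(yes_asks, no_asks, desired, min_edge=0.0):
    """Number of complete sets, up to `desired`, fillable on both books.

    Stops as soon as the next YES + NO pair would leave an edge of
    `min_edge` or less, so no leg is bought without its counterpart.
    """
    filled = 0
    i = j = 0
    yes_left = yes_asks[0][1] if yes_asks else 0
    no_left = no_asks[0][1] if no_asks else 0
    while filled < desired and i < len(yes_asks) and j < len(no_asks):
        cost = yes_asks[i][0] + no_asks[j][0]
        if 1.0 - cost <= min_edge:
            break  # deeper levels are no longer profitable
        take = min(yes_left, no_left, desired - filled)
        filled += take
        yes_left -= take
        no_left -= take
        if yes_left <= 0:  # YES level exhausted, move to the next one
            i += 1
            yes_left = yes_asks[i][1] if i < len(yes_asks) else 0
        if no_left <= 0:  # NO level exhausted, move to the next one
            j += 1
            no_left = no_asks[j][1] if j < len(no_asks) else 0
    return filled
```

With YES asks `[(0.48, 40), (0.50, 100)]` and NO asks `[(0.49, 60), (0.51, 200)]`, a desired size of 500 is cut to 60 sets: 40 at a combined cost of $0.97 and 20 at $0.99, after which the next pair costs $1.01.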
Accomplishments that we're proud of
Empirically documented the evolution of market efficiency. Built a complete ML pipeline: XGBoost → knowledge distillation → selective LR → deployed weights. Validation results: +$176.62, 84% win rate, Sharpe 4.42.
What we learned
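Simplicity is best. The structural near-expiry signal stayed robust across regimes, while the learned directional signals degraded as the market matured.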
What's next for Arb + momentum
Online learning for continuous retraining. Regime-adaptive sizing via HMM. Cross-asset stat arb exploiting 0.546 BTC→ETH correlation.