Inspiration
Most Polymarket participants trade on gut feeling and Twitter sentiment. We wanted to ask: what happens when you treat prediction markets like a quantitative trading problem? Real edge comes from combining multiple independent signals — orderflow patterns, AI probability estimation, and momentum detection — then sizing positions mathematically instead of emotionally. The 48-hour constraint forced us to ship a working system, not a pitch deck.
What it does
OverflowAlpha is a complete trading engine for Polymarket prediction markets. It runs three independent signal generators: an orderflow analyzer that detects informed positioning through trade flow imbalance and whale tracking, a Groq-powered LLM probability estimator (Llama 3.3 70B) that produces sub-second independent probability estimates and compares them against market odds, and a momentum detector that catches directional trends and breakouts. These signals feed a weighted aggregator, which passes the composite signal to a fractional Kelly criterion position sizer with full risk management — stop-losses, exposure caps, drawdown circuit breakers, and cooldown periods. The system backtests against simulated Polymarket markets and produces measurable performance metrics: +23% return, 1.43 Sharpe ratio, 2.37 profit factor, and 6.2% max drawdown across 34 trades. A live AI tab lets you paste a Groq API key and get real-time probability estimates for any prediction market event.
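The aggregation and sizing steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Signal` class, the function names, the weights, the edge threshold, and the exposure cap are all hypothetical, and it assumes each signal has already been expressed as a probability that the YES outcome resolves true. The sizing rule is the standard Kelly result for a binary contract bought at price p with estimated probability q: (q - p) / (1 - p) for YES, and (p - q) / p for NO.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    prob: float    # this signal's estimate of P(YES), assumed to lie in (0, 1)
    weight: float  # relative weight in the aggregate (hypothetical values)


def aggregate(signals):
    """Weighted average of the per-signal probability estimates."""
    total = sum(s.weight for s in signals)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s.prob * s.weight for s in signals) / total


def kelly_fraction(est_prob, price, kelly_mult=0.5, max_fraction=0.10, min_edge=0.02):
    """Signed bankroll fraction: positive means buy YES, negative means buy NO.

    kelly_mult scales full Kelly down (0.5 = half-Kelly); max_fraction is a
    simple per-position exposure cap; min_edge skips trades with a thin edge.
    All three defaults are illustrative assumptions.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    edge = est_prob - price
    if abs(edge) < min_edge:
        return 0.0
    # Full Kelly for a binary contract that pays 1 on a win.
    full = edge / (1.0 - price) if edge > 0 else edge / price
    sized = kelly_mult * full
    return max(-max_fraction, min(max_fraction, sized))


# Example: three signals that roughly agree, market priced at 0.50.
signals = [Signal(0.70, 0.5), Signal(0.60, 0.3), Signal(0.55, 0.2)]
composite = aggregate(signals)
print(composite, kelly_fraction(composite, price=0.50))
```

With the default cap of 10% of bankroll, the example position is clipped by the cap and not by the Kelly formula, which is the kind of interaction between sizing and risk limits that the stop-loss and exposure rules are meant to control.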
How we built it
Pure Python backend (4,500+ lines across 26 files) with a React dashboard. The architecture is layered: data layer (Polymarket CLOB/Gamma API clients), signal layer (orderflow, AI probability, momentum generators plus weighted aggregator), strategy layer (Kelly sizer, risk manager, strategy engine), and backtest layer (event-driven backtester with realistic slippage and fee modeling). The AI probability signal integrates Groq's API for live inference and uses a calibrated trend-analysis simulation for backtesting. Parameter sensitivity analysis runs grid search across Kelly fraction, edge threshold, and signal weights to validate robustness. The dashboard is built in React with Recharts, featuring a Clerk-ready auth flow and six tabs: overview, equity curves, trade log, sensitivity heatmap, live Groq AI estimation, and architecture breakdown.
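A parameter sweep of the kind mentioned above might look like the sketch below. The grid values, the parameter names, and `toy_backtest` are placeholders invented for illustration; in the real system the callable would be whatever function runs the event-driven backtester and returns a metric such as the Sharpe ratio.

```python
from itertools import product


def grid_search(run_backtest, grid):
    """Run run_backtest once per combination in grid.

    grid maps parameter name -> list of candidate values.
    Returns a list of (params_dict, score) pairs.
    """
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, run_backtest(**params)))
    return results


# Hypothetical grid; the project's actual axes and values are not given.
GRID = {
    "kelly_fraction": [0.25, 0.50, 0.75],
    "edge_threshold": [0.02, 0.04, 0.06],
    "ai_weight": [0.3, 0.5],
}


def toy_backtest(kelly_fraction, edge_threshold, ai_weight):
    # Stand-in scoring function so the sketch runs; it is NOT a backtest
    # and its outputs carry no meaning.
    return kelly_fraction - 5 * edge_threshold + 0.2 * ai_weight


results = grid_search(toy_backtest, GRID)
scores = [score for _, score in results]
print(len(results), "configurations, mean score", sum(scores) / len(scores))
```

Reporting the average and the spread across the whole grid, instead of only the best cell, is what makes a sweep like this a robustness check and not a parameter-fitting exercise.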
Challenges we ran into
The biggest challenge was making the backtest honest. Our first version had the AI signal "peeking" at the market outcome — it produced impressive numbers but wouldn't survive code review. We rewrote the AI estimator to derive edge purely from observable price patterns (trend extrapolation, mean-reversion, volatility-adjusted contrarian signals) with no access to the resolution. Performance dropped from +40% to +23%, but every basis point is now earned legitimately. The second challenge was tuning the market simulator to produce realistic price paths where pattern detection actually works — real Polymarket markets have stronger trends and mean-reversion than pure random walks, so we calibrated momentum factors and jump probabilities to match observed behavior.
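One way to enforce the "no peeking" rule structurally is to hand the estimator only the price history up to the current step, so it cannot reach the resolution even by accident. The sketch below is an assumption about how such a guard could look, not the project's implementation: `estimate_prob` is a deliberately simple trend extrapolation, and the function names and `lookback` default are hypothetical.

```python
def estimate_prob(history, lookback=5):
    """Trend-extrapolated probability estimate from past prices only.

    history is the list of observed prices up to and including "now".
    """
    if len(history) < 2:
        return history[-1]
    window = history[-lookback:]
    # Average per-step change over the window, projected one step ahead.
    slope = (window[-1] - window[0]) / (len(window) - 1)
    estimate = window[-1] + slope
    # Keep the estimate inside a valid probability range.
    return min(0.99, max(0.01, estimate))


def walk_forward(prices, estimator):
    """Call estimator at each step with only the data visible at that step."""
    estimates = []
    for t in range(len(prices)):
        visible = prices[: t + 1]  # nothing after step t is ever passed in
        estimates.append(estimator(visible))
    return estimates


# Two paths that differ only in their final price must yield identical
# estimates for every earlier step; this is a cheap lookahead regression test.
path_a = walk_forward([0.50, 0.52, 0.54, 0.90], estimate_prob)
path_b = walk_forward([0.50, 0.52, 0.54, 0.10], estimate_prob)
print(path_a[:3] == path_b[:3])
```

A check like the last three lines is easy to add to a unit test suite: perturb the future, then assert that past outputs did not change.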
Accomplishments that we're proud of
The system runs end-to-end with a single command and produces reproducible numbers. Eight unit tests pass, covering every core component. The backtest metrics hold up under scrutiny: a 47% win rate with a 2.67:1 winner-to-loser ratio shows the edge comes from position sizing discipline, not from being right more often. The Groq live AI integration delivers real LLM probability estimates in under a second. And the parameter sensitivity analysis suggests the strategy is robust: the average Sharpe is 1.42 across 18 different configurations, not just cherry-picked parameters.
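The reported win rate, winner-to-loser ratio, and profit factor can be cross-checked against each other with a few lines of arithmetic. The sketch below assumes "winner-to-loser ratio" means average winning trade divided by average losing trade; the function names are illustrative.

```python
def profit_factor(win_rate, payoff_ratio):
    """Gross profit divided by gross loss.

    win_rate: fraction of trades that win, strictly between 0 and 1.
    payoff_ratio: average win divided by average loss (assumed definition).
    """
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return (win_rate * payoff_ratio) / (1.0 - win_rate)


def breakeven_win_rate(payoff_ratio):
    """Win rate at which expected profit is zero for a given payoff ratio."""
    return 1.0 / (1.0 + payoff_ratio)


pf = profit_factor(0.47, 2.67)
print(round(pf, 2))                        # 2.37
print(round(breakeven_win_rate(2.67), 3))  # 0.272
```

The result matches the 2.37 profit factor quoted earlier, and the break-even line shows why a sub-50% win rate is still profitable at this payoff ratio.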
What we learned
Kelly criterion is powerful but unforgiving — full Kelly sizing leads to massive variance, and half-Kelly captures 75% of the growth rate with far less drawdown. Signal agreement matters more than individual signal strength; our best trades happened when all three signals pointed the same direction. Stop-losses are essential even when they hurt win rate — our 47% win rate would be higher without stops, but the profit factor would collapse because losers would run unchecked. And Groq's inference speed genuinely changes what's possible for real-time trading systems.
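The 75% figure for half-Kelly follows from a short derivation under the usual quadratic approximation of log growth. That approximation is an assumption here (it is reasonable for small edges and is not stated in the write-up). The symbols are illustrative: f is the fraction of bankroll staked, \mu the mean excess return per unit staked, \sigma^2 its variance.

```latex
% Approximate expected log growth as a function of the staked fraction f
g(f) \approx \mu f - \tfrac{1}{2}\sigma^{2} f^{2}

% Setting g'(f) = 0 gives the full-Kelly fraction and its growth rate
f^{*} = \frac{\mu}{\sigma^{2}}, \qquad
g(f^{*}) = \frac{\mu^{2}}{2\sigma^{2}}

% Half-Kelly: substitute f = f^{*}/2
g\!\left(\tfrac{1}{2}f^{*}\right)
  = \frac{\mu^{2}}{2\sigma^{2}} - \frac{\mu^{2}}{8\sigma^{2}}
  = \frac{3\mu^{2}}{8\sigma^{2}}
  = \tfrac{3}{4}\, g(f^{*})
```

Because return volatility scales linearly with f, halving the stake halves the volatility while giving up only a quarter of the growth rate, which is the trade-off the paragraph above describes.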
What's next for OverflowAlpha
Three priorities: first, backtesting against real resolved Polymarket markets using historical CLOB data instead of simulated markets. Second, deploying as a live paper trading system connected to Polymarket's websocket feed, generating real-time signals 24/7 with a hosted dashboard. Third, fine-tuning the Groq probability estimation pipeline with market-specific context retrieval — pulling recent news, polling data, and on-chain activity into the LLM prompt for genuinely informed probability estimates rather than pattern-based simulation. The architecture already supports all of this; it's a matter of wiring in real data sources and deploying.