Inspiration
Polymarket processed over $43 billion in trading volume in 2025. Most of it was placed manually. We wanted to know what happens when you replace human intuition with a structured AI debate. The Orderflow 001 hackathon gave us 48 hours to find out.
What It Does
Scans Polymarket every 60 seconds. Runs each market through a five-model AI ensemble. Trades when the models agree.
Each model returns a probability estimate and a confidence score. The system takes a weighted average across all five models: the forecaster carries 30%, the bull and news analysts 20% each, and the bear researcher and risk manager 15% each. A trade only fires when the weighted confidence crosses 40% AND the spread between model estimates is below 4% variance. If the models disagree too strongly, the trade is skipped regardless of average confidence.
Edge is the difference between the AI probability estimate and the current market price. A minimum edge of 2% is required to proceed. Position size uses quarter-Kelly, a balance between maximum growth and drawdown protection, with a hard cap of 5% of balance per trade.
Everything logs to SQLite and feeds a live Streamlit dashboard. The bot runs entirely on free Groq and Gemini keys, and paper trading needs no wallet.
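The gating and sizing rules above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the names `Opinion` and `decide` are hypothetical, the disagreement gate is assumed to be the population variance of the five probability estimates, and the Kelly fraction uses the standard formula for a binary contract that pays 1 on a win (cost `price`, so the full-Kelly stake is `edge / (1 - price)` for YES and `edge / price` for NO).

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import Dict, Optional

# Role weights from the description above; they sum to 1.0.
WEIGHTS = {"forecaster": 0.30, "bull": 0.20, "news": 0.20, "bear": 0.15, "risk": 0.15}

MIN_CONFIDENCE = 0.40  # weighted-confidence gate
MAX_VARIANCE = 0.04    # disagreement gate (assumption: variance of probabilities)
MIN_EDGE = 0.02        # minimum |AI probability - market price|
KELLY_FRACTION = 0.25  # quarter-Kelly
MAX_POSITION = 0.05    # hard cap as a fraction of balance


@dataclass
class Opinion:
    probability: float  # model's estimate that the market resolves YES
    confidence: float   # model's self-reported confidence, 0 to 1


def decide(opinions: Dict[str, Opinion], market_price: float) -> Optional[dict]:
    """Return a trade description, or None if any gate rejects the market."""
    if set(opinions) != set(WEIGHTS):
        raise ValueError("expected exactly one opinion per ensemble role")
    if not 0.0 < market_price < 1.0:
        return None  # nothing to trade at a degenerate price

    p = sum(WEIGHTS[r] * o.probability for r, o in opinions.items())
    conf = sum(WEIGHTS[r] * o.confidence for r, o in opinions.items())
    var = pvariance([o.probability for o in opinions.values()])

    # Skip on low confidence or strong disagreement, whatever the average says.
    if conf < MIN_CONFIDENCE or var >= MAX_VARIANCE:
        return None

    edge = p - market_price
    if abs(edge) < MIN_EDGE:
        return None

    if edge > 0:   # buy YES at market_price
        side, full_kelly = "YES", edge / (1.0 - market_price)
    else:          # buy NO at (1 - market_price)
        side, full_kelly = "NO", -edge / market_price

    size = min(KELLY_FRACTION * full_kelly, MAX_POSITION)
    return {"side": side, "probability": p, "edge": edge, "size": size}
```

Calling `decide` with five opinions and the current YES price returns either `None` (skip) or a dict whose `size` is the fraction of balance to stake. Checking the variance gate before the edge means a large apparent edge built on conflicting estimates never reaches the sizing step.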
How We Built It
Built the exchange layer around Polymarket's CLOB API and ERC-1155 token model. Each YES and NO outcome is a separate on-chain token, so prices are fetched per token ID, not per market ID. Getting that distinction right was the first real challenge.
Added a five-model AI ensemble with free-tier routing: Groq handles the forecaster, bull researcher, and risk manager roles, while Gemini covers the bear researcher and news analyst. Paid-tier models (Grok, Claude, GPT-4o, DeepSeek) slot in automatically when keys are present.
Built a backtesting engine that fetches real resolved Polymarket markets from the Gamma API and replays signals against actual outcomes. Category scores weight historical performance: 40% ROI, 25% recent trend, 20% sample size, 15% win rate. Categories scoring below 30 are hard-blocked regardless of AI confidence.
Stack: Python, aiosqlite, Streamlit, py-clob-client, Groq SDK, Gemini API, httpx, structlog.
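As a rough illustration of the category scoring described above, the sketch below combines four component scores with the stated weights and applies the hard block at 30. How each component is normalised is not stated in the write-up, so this assumes every component has already been mapped onto a 0 to 100 scale; the function names are hypothetical.

```python
# Weights from the description above: ROI 40%, recent trend 25%,
# sample size 20%, win rate 15%.
CATEGORY_WEIGHTS = {"roi": 0.40, "trend": 0.25, "sample": 0.20, "win_rate": 0.15}
BLOCK_THRESHOLD = 30.0  # categories scoring below this are never traded


def category_score(components: dict) -> float:
    """Weighted score for one category.

    Assumption: each component is pre-normalised to the range 0..100.
    """
    missing = set(CATEGORY_WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(w * components[name] for name, w in CATEGORY_WEIGHTS.items())


def is_blocked(components: dict) -> bool:
    """True if the category is hard-blocked, regardless of AI confidence."""
    return category_score(components) < BLOCK_THRESHOLD
```

Because the block is checked independently of the ensemble output, a market in a blocked category can be discarded before any model call is spent on it.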
Challenges We Ran Into
Price = 0 bug. The bot was exiting every position at effectively zero, logging guaranteed losses on every trade. Root cause: a unit mismatch. The code divided prices by 100, but Polymarket already returns prices in the range 0 to 1. So a price of 0.55 became 0.0055, producing a loss of approximately 0.54 per contract on every single trade regardless of outcome.
Windows emoji crashes. Logger calls containing emojis crashed the Windows cp1252 terminal mid-execution, right at the moment of logging trade approvals and before they were saved to the database. We wrote a script to strip every emoji from every Python file.
Free-tier rate limits. Five models across five markets means 25 API calls per loop. Groq allows roughly one request per 6 seconds; at that pace, 25 sequential calls put the minimum loop time around 150 seconds. We added per-provider throttling and exponential backoff on 429 errors to stabilise the loop.
Accomplishments That We're Proud Of
The bot actually works. Given the scope of what we built in 48 hours (a five-model ensemble, Kelly sizing, category scoring, a backtesting engine, and a live dashboard), getting it to the point where it finds real Polymarket markets, debates them with multiple AI models, and logs verified paper signals with correct PnL calculations is genuinely satisfying.
Backtest results over 50 resolved Polymarket markets:

| Metric | Value |
| --- | --- |
| ROI | +163.5% |
| Win rate | 83.8% |
| Sharpe ratio | 6.63 |
| Max drawdown | 5.2% |

The entire demo runs on two free API keys. Clone the repo, add Groq and Gemini keys, and have a live bot scanning real Polymarket markets in under ten minutes.
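Returning to the free-tier rate limits described under the challenges, per-provider throttling with exponential backoff could look roughly like the asyncio sketch below. It is an illustration under assumptions, not the project's code: `Throttle`, `call_with_backoff`, and the `RateLimited` exception are hypothetical names, and the caller is assumed to raise `RateLimited` when a provider answers with HTTP 429.

```python
import asyncio
import random
import time


class RateLimited(Exception):
    """Hypothetical: raised by a provider call when it receives HTTP 429."""


class Throttle:
    """Enforce a minimum interval between calls to one provider."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._next_ok = 0.0  # monotonic time of the next free slot

    async def wait(self) -> None:
        # Reserve the slot before awaiting, so concurrent callers on the
        # same event loop queue up behind each other without a lock.
        now = time.monotonic()
        slot = max(now, self._next_ok)
        self._next_ok = slot + self.min_interval
        if slot > now:
            await asyncio.sleep(slot - now)


async def call_with_backoff(fn, throttle, max_retries=5, base_delay=1.0):
    """Await fn() under the throttle, retrying on RateLimited.

    Delays grow as base_delay * 2**attempt, plus a little jitter.
    """
    for attempt in range(max_retries + 1):
        await throttle.wait()
        try:
            return await fn()
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the final retry
            jitter = random.uniform(0.0, base_delay * 0.1)
            await asyncio.sleep(base_delay * 2 ** attempt + jitter)
```

One `Throttle` per provider (for example `Throttle(6.0)` for a limit of about one request every 6 seconds) keeps a slow provider from stalling calls to the other.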
What We Learned
The model variance filter is the most important gate in the system. It rejects trades where models disagree by more than 20%, which turns out to be exactly where the bad trades live.
Category blocking matters more than confidence thresholds. An 80% confident bot still loses money on efficiently priced economic markets. The scoring system that identifies and blocks those categories is what keeps the strategy profitable over time.
What's Next for Autonomous Polymarket Bot
WebSocket streaming for real-time price updates. Live ensemble output wired directly into the backtester for honest edge measurement. Dynamic Kelly sizing that updates continuously as new information arrives, with both the AI probability estimate and the market odds refreshed in real time rather than at each 60-second scan interval.
The architecture is ready. The edge is real. The next step is scaling it.
Built With
- aiosqlite
- Groq/Gemini LLMs
- on-chain
- pandas/NumPy
- Polygon + eth-account
- Polymarket APIs (CLOB + Gamma)
- py-clob-client
- Python (asyncio)
- SQLite
- Streamlit + Plotly
- structlog