Inspiration
The idea that AI agents can have multiple layers of research — scanning, debating, judging, and monitoring — reminded us of how quantitative trading desks operate. We wanted to build an autonomous
multi-agent system that mirrors that process: agents that discover opportunities, argue both sides, make calibrated decisions, and continuously improve themselves using real production data.
What it does
DeepQuant is an autonomous prediction market trading swarm. 12 AI agents work together across 4 swarm architectures in a never-stopping loop:
- 3 Market Scanners run in parallel (MixtureOfAgents), pulling real prediction markets from the Kalshi and Polymarket APIs, enriched with live financial data from Airbyte's Twelve Data connector (stocks, crypto, forex, ETFs cached in DuckDB)
- 4 Debate Agents (GroupChat) argue YES/NO over 3 rounds: a Research Analyst cites base rates, a YES Advocate builds the bull case, a NO Advocate tears it apart, and a Sentiment Analyst brings Reddit and news signals
- Chief Judge + Risk Manager (SequentialWorkflow) render a calibrated verdict and apply half-Kelly position sizing
- Trades execute, portfolio updates, and a Portfolio Monitor (HierarchicalSwarm) reviews open positions against fresh data
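The half-Kelly sizing step can be sketched as follows. This is a minimal illustration, not the project's actual Risk Manager code: the function name and its arguments are hypothetical, and it assumes a binary contract bought at `price` that pays 1.0 if it resolves in our favor.

```python
def half_kelly_fraction(p_win: float, price: float) -> float:
    """Fraction of bankroll to stake on a binary contract (illustrative helper).

    p_win: estimated probability that the contract resolves in our favor.
    price: cost of one contract that pays 1.0 on a win (0 < price < 1).
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be between 0 and 1")
    # Full Kelly for a binary contract: f = (p - price) / (1 - price).
    full_kelly = (p_win - price) / (1.0 - price)
    # Stake half of full Kelly, and nothing at all when there is no edge.
    return max(0.0, 0.5 * full_kelly)


# Judge estimates 60% for YES while the market asks 0.50 per contract.
stake = half_kelly_fraction(p_win=0.60, price=0.50)
```

To size a NO position under the same assumptions, pass the NO probability and the NO price instead. Halving the Kelly stake trades some expected growth for lower variance, which helps when the probability estimate itself is uncertain.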
Every LLM call is traced by Overmind, which scores agent outputs with an LLM judge, identifies underperforming agents, and autonomously generates prompt improvement suggestions. In our runs, Overmind discovered 14 distinct agents, scored them, and flagged 6 agents with scores below 0.50 for prompt rewrites, generating specific fixes such as structured output formats and calibration anchoring that raised projected scores by 0.45.
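The flagging rule described above amounts to a threshold filter over per-agent scores. The sketch below is not Overmind's API; the function name is hypothetical and the score for "Chief Judge" is made up for illustration (the other two values are the ones quoted elsewhere in this write-up).

```python
from typing import Dict, List

SCORE_THRESHOLD = 0.50  # cutoff quoted in the text


def flag_underperformers(scores: Dict[str, float],
                         threshold: float = SCORE_THRESHOLD) -> List[str]:
    """Return the names of agents scoring below the threshold, worst first."""
    flagged = [name for name, score in scores.items() if score < threshold]
    return sorted(flagged, key=lambda name: scores[name])


example_scores = {
    "Ruthless No Advocate": 0.00,
    "Swarm Agent Team Template": 0.09,
    "Chief Judge": 0.82,  # illustrative value, not a measured score
}
needs_rewrite = flag_underperformers(example_scores)
```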
How we built it
- Swarms framework — MixtureOfAgents, GroupChat, SequentialWorkflow, HierarchicalSwarm for multi-agent orchestration
- Overmind (open source) — self-hosted via Docker Compose (FastAPI + Celery + PostgreSQL). Auto-traces every agent call, runs LLM judge evaluations, generates prompt improvements and model backtesting
suggestions. Built a full /overmind terminal command system with subcommands for cost breakdowns, agent scores, and improvement suggestions — all queryable without leaving the trading terminal
- Airbyte (PyAirbyte) — Twelve Data connector syncs stock quotes, crypto prices, and ETF data into a local DuckDB cache. Agents query this cache for real market context instead of hallucinating prices. The
connector is swappable — one config change to switch data providers without touching agent code
- Kalshi + Polymarket — Direct API integration for real prediction market listings (no Airbyte connectors exist for these yet). Agents trade on real markets with real prices and real volumes
- Reddit + RSS — Live sentiment from r/polymarket, r/wallstreetbets, r/cryptocurrency and breaking news from BBC/CNBC for agent context
- Aerospike — Optional persistent storage layer for portfolio state across restarts
- Auth0 — M2M client credentials flow for secure agent authentication
Everything is open source or self-hosted. No proprietary APIs required beyond LLM keys.
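For the Auth0 piece, a machine-to-machine token request under the client credentials grant might look like the sketch below. The helper name and the placeholder domain, credentials, and audience are assumptions, not values from the project. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_token_request(domain: str, client_id: str,
                        client_secret: str, audience: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request."""
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "audience": audience,
    }
    return urllib.request.Request(
        url=f"https://{domain}/oauth/token",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; real ones would come from environment variables.
request = build_token_request(
    "example-tenant.auth0.com", "CLIENT_ID", "CLIENT_SECRET", "https://example-api"
)
```

Sending the request with `urllib.request.urlopen(request)` should return a JSON body whose `access_token` field is then presented as a bearer token on subsequent calls.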
Challenges we ran into
- Overmind's Celery pipeline had async event loop conflicts with the forked worker processes: the prompt improvement task crashed with an "attached to a different loop" error. We worked around this by building a score-driven suggestion generator that analyzes agent evaluation data directly
- Airbyte connector compatibility: several PyAirbyte connectors (Yahoo Finance, RSS, CoinGecko) had build failures on Python 3.12 due to legacy dependencies. We pivoted to Twelve Data, which uses the modern CDK and works cleanly
- Getting real data to agents was harder than expected: the Swarms framework doesn't have a built-in data injection pattern, so we built a DataFeed layer that refreshes all sources at the top of each trading cycle and injects formatted context into agent prompts
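The DataFeed idea (refresh every source at the top of a cycle, then inject formatted context into prompts) can be sketched roughly as below. This is a simplified stand-in rather than the project's implementation: the method names, the prompt layout, and the placeholder source are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical contract: each source returns a list of ready-to-print lines.
Fetcher = Callable[[], List[str]]


@dataclass
class DataFeed:
    sources: Dict[str, Fetcher]
    snapshot: Dict[str, List[str]] = field(default_factory=dict)

    def refresh(self) -> None:
        """Re-pull every source; meant to run once at the top of a trading cycle."""
        self.snapshot = {name: fetch() for name, fetch in self.sources.items()}

    def as_context(self) -> str:
        """Render the latest snapshot as one text block, one section per source."""
        sections = []
        for name, lines in self.snapshot.items():
            body = "\n".join(f"- {line}" for line in lines) or "- (no data)"
            sections.append(f"[{name}]\n{body}")
        return "\n\n".join(sections)

    def inject(self, task: str) -> str:
        """Prepend the market context to an agent's task prompt."""
        return f"MARKET CONTEXT\n{self.as_context()}\n\nTASK\n{task}"


# Placeholder source; a real one would query the DuckDB cache or a market API.
feed = DataFeed({"prices": lambda: ["BTC quote placeholder"]})
feed.refresh()
prompt = feed.inject("Debate the market.")
```

Keeping the formatting inside the feed means agent prompts only ever see plain text, so a data provider can be swapped without touching agent definitions.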
Accomplishments that we're proud of
- Every sponsor serves a distinct, essential role: Overmind watches and improves the agents, Airbyte feeds them real market data, Kalshi/Polymarket provide the actual markets to trade, and they all work together in a single autonomous loop
- Overmind integration is deep: not just tracing, but a full terminal command system (/overmind cost, /overmind agents, /overmind suggest) that shows per-agent cost breakdowns, evaluation scores, and generated prompt improvements. It identified that our Ruthless No Advocate was scoring 0.00 and generated a specific fix: add a POSITION/EVIDENCE/PROBABILITY output format with calibration anchoring
- Agents debate with real data: the YES Advocate argues with actual BTC prices from Airbyte, real Polymarket odds, and live Reddit sentiment from r/wallstreetbets, not hallucinated numbers
- 12 agents, 4 swarm architectures, 6 data sources, fully autonomous: Ctrl+C to pause, /overmind suggest to see how agents are improving, then resume trading
What we learned
- Mixture of Agents architecture is powerful for discovery: having 3 scanners approach the same goal from different angles (user goal, contrarian, emerging trends) finds opportunities a single agent misses
- Observability changes everything: without Overmind, we had no idea our Swarm Agent Team Template was scoring 0.09. With it, we could see exactly which agents needed better prompts and what specifically to fix
- Real data dramatically improves agent reasoning — agents went from inventing prediction markets to debating actual Kalshi/Polymarket listings with real prices, which made the Judge's probability
estimates far more grounded
- Graceful degradation matters — every integration (Overmind, Airbyte, Kalshi, Reddit) fails silently. If Airbyte is down, agents use LLM knowledge. If Overmind isn't running, trades still execute. Nothing
crashes.
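One common way to get this kind of graceful degradation is to wrap each optional integration in a helper that returns a fallback instead of raising. The sketch below assumes that pattern; the helper and logger names are hypothetical, and the warning log is an addition for debuggability rather than something the write-up describes.

```python
import logging
from typing import Callable, List, TypeVar

T = TypeVar("T")
log = logging.getLogger("deepquant.sketch")  # hypothetical logger name


def with_fallback(fetch: Callable[[], T], fallback: T, label: str) -> T:
    """Run fetch(); on any failure, log it and return the fallback instead of raising."""
    try:
        return fetch()
    except Exception as exc:  # broad on purpose: an optional integration must not stop the loop
        log.warning("%s unavailable, using fallback: %s", label, exc)
        return fallback


def broken_airbyte_cache() -> List[str]:
    # Stand-in for a data source that is currently down.
    raise ConnectionError("cache not reachable")


# With the source down, agents receive an empty context and rely on LLM knowledge.
market_lines = with_fallback(broken_airbyte_cache, fallback=[], label="airbyte")
```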
What's next for DeepQuant