Inspiration
Most retail traders lose money because they chase confirmation bias — they find one bullish signal and ignore everything else. Professional trading desks avoid this
with structured debate: analysts pitch, risk managers challenge, and backtests ground-truth the thesis. We wanted to bring that same adversarial rigor to individual
investors using multi-agent AI.
What it does
Polybot runs six specialized AI agents in a cyclic pipeline to analyze any stock ticker. A Market Context agent pulls real price data and technicals, a Sentiment
agent classifies news headlines, an Alpha Generator proposes a trade, a Devil's Advocate tears it apart, a Backtest Validator checks if similar setups have
historically worked, and a Risk Manager sizes the position using Kelly criterion math. If the proposal is weak, the pipeline loops back for revisions — up to twice
before forcing a HOLD. Every agent's reasoning streams to a split-pane UI in real time so you can watch the debate unfold. Approved trades execute as bracket orders
on Alpaca's paper trading platform.
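The revision loop described above can be sketched as a few lines of Python. This is an illustrative simplification, not Polybot's actual LangGraph code; the function names and the `"APPROVE"`/`"HOLD"` strings are assumptions:

```python
# Hypothetical sketch of Polybot's revision loop: each cycle runs the
# debate, and after MAX_REVISIONS failed cycles the pipeline forces a HOLD.
MAX_REVISIONS = 2

def run_pipeline(evaluate_proposal, revise_proposal, initial_proposal):
    """Run the propose -> challenge -> revise cycle with a hard cap."""
    proposal = initial_proposal
    for attempt in range(MAX_REVISIONS + 1):
        verdict = evaluate_proposal(proposal)  # Devil's Advocate + Risk Manager
        if verdict == "APPROVE":
            return "EXECUTE", proposal
        if attempt < MAX_REVISIONS:
            # Loop back: the Alpha Generator revises its trade idea
            proposal = revise_proposal(proposal, verdict)
    # Revisions exhausted without approval: do nothing rather than force a trade
    return "HOLD", proposal
```

The hard cap is what keeps the adversarial design practical: the debate either converges within two revisions or defaults to the safest action.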
How we built it
The backend is a FastAPI server orchestrating agents through LangGraph with cyclic conditional edges. Google Gemini (via Vertex AI) powers the reasoning agents with
structured output schemas so responses are typed, not parsed from free text. Market data and execution flow through Alpaca's API. The backtest engine uses k-means
clustering and KNN pattern matching over historical bars replayed through vectorbt. The frontend is a Next.js app with React Flow for the agent graph visualization,
WebSocket streaming for live token output, and a split-pane "Cross-Examination Terminal" showing the Alpha vs. Adversarial debate side by side.
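The KNN pattern-matching idea behind the backtest engine can be sketched in plain Python. This is a minimal illustration of the nearest-neighbour step only; the window shape and function names are assumptions, and the real engine additionally clusters with k-means and replays matches through vectorbt:

```python
import math

def knn_similar_setups(history, query, k=5):
    """Find the k historical return-windows most similar to the current one.

    `history` is a list of (window, forward_return) pairs, where each window
    is a fixed-length sequence of normalized returns and forward_return is
    what the price did afterwards; `query` is the current window.
    """
    def dist(a, b):
        # Euclidean distance between two return-windows
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    ranked = sorted(history, key=lambda pair: dist(pair[0], query))
    matches = ranked[:k]
    # The average forward return of the nearest neighbours estimates
    # whether similar setups have historically worked.
    avg_forward = sum(r for _, r in matches) / len(matches)
    return matches, avg_forward
```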
Challenges we ran into
Getting the Adversarial agent calibrated was the hardest part. Too aggressive and it vetoes everything — too lenient and it becomes a rubber stamp. We had to tune the
system prompt, add deterministic escalation rules (e.g., earnings within the hold window automatically triggers a CRITICAL flag), and cap the revision loop to
prevent infinite debates. Backtest validity was another challenge: ensuring strict point-in-time data filtering so agents can't accidentally peek at future prices
during evaluation, and handling tickers with insufficient historical matches gracefully.
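The point-in-time guard amounts to one strict filter plus a graceful fallback. A minimal sketch, assuming bars carry a comparable `timestamp` field and an illustrative minimum-history threshold (not Polybot's actual schema or cutoff):

```python
def point_in_time_bars(bars, as_of, min_bars=30):
    """Return only bars already known at `as_of`, so an agent being
    evaluated can never peek at future prices.

    `bars` is a list of dicts with a comparable "timestamp" key; the
    30-bar minimum is an illustrative assumption.
    """
    visible = [b for b in bars if b["timestamp"] <= as_of]
    # Gracefully signal "not enough history" instead of trading on thin data
    return visible if len(visible) >= min_bars else None
```

Returning `None` instead of raising lets the pipeline treat a thin-history ticker the same way it treats a failed debate: no trade.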
Accomplishments that we're proud of
In our evaluation framework, Polybot's approved BUY signals averaged +2.45% excess return over the S&P 500, and the Risk Manager correctly rejected 75% of proposals that didn't meet its criteria, evidence that the adversarial architecture works as intended. The system rejected trades for concrete, auditable reasons (a stop too tight relative to ATR, insufficient backtest matches, a negative Kelly fraction) rather than vibes. The full agent reasoning is transparent and streamable, so you never have to trust a black box.
What we learned
Structured disagreement produces better decisions than consensus-seeking. Forcing a dedicated adversarial agent into the pipeline caught edge cases that a single
monolithic LLM would have confidently missed. We also learned that position sizing math (Kelly criterion, portfolio heat caps) matters more than signal accuracy — a
great signal with bad sizing still loses money. Finally, building evaluation infrastructure early (our run_eval backtester) was essential for iterating on agent
prompts with real feedback instead of guessing.
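The sizing math above can be made concrete with the textbook Kelly fraction, f* = p - (1 - p)/b, where p is the win probability and b the average win/loss ratio. A hedged sketch; the half-Kelly scaling and 2% heat cap are illustrative assumptions, not Polybot's actual parameters:

```python
def kelly_position_fraction(win_prob, win_loss_ratio, heat_cap=0.02):
    """Fractional-Kelly position size with a portfolio heat cap.

    Returns None when the Kelly fraction is negative (no edge: reject
    the trade), otherwise half-Kelly capped at `heat_cap` of equity.
    """
    f_star = win_prob - (1.0 - win_prob) / win_loss_ratio
    if f_star <= 0:
        return None  # negative Kelly fraction: the trade loses in expectation
    return min(f_star * 0.5, heat_cap)  # half-Kelly, capped by portfolio heat
```

Plugging in a 60% win rate with 2:1 payoffs gives f* = 0.4, which the half-Kelly scaling and heat cap shrink to 2% of equity; a 40% win rate at 1:1 gives a negative f* and an automatic rejection, which is exactly the "great signal, bad sizing" failure mode the lesson is about.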
What's next for Polybot
Adding a live earnings calendar integration via Finnhub so the Adversarial agent can flag upcoming catalysts automatically. Expanding the backtest engine with more
similarity algorithms and a Parquet cache for faster historical lookups. Building a watchlist mode that runs the pipeline nightly across a user-defined set of tickers
and surfaces only high-conviction opportunities. And eventually, once we've built enough confidence from paper trading results, exploring cautious live execution
with strict position limits.
Built With
- fastapi
- langgraph
- python
- react
- typescript