Crypto Hedge Fund — Autonomous Multi-Agent Trading Platform
A hackathon project: a fully autonomous, multi-agent crypto trading fund where Claude plays the role of every research analyst and decision-maker in a real hedge fund's org chart, running live, 24/7, on real market data, in paper-trading mode. We chose crypto because the competition took place on a weekend, when most other financial markets were closed. More asset classes will be added during the coming trading week to diversify.
1. The Pitch
We built a hedge fund that runs itself. Instead of one model "talking about" trading, we modeled a real fund's org chart as a pipeline of specialized agents (a Sentiment Analyst, an On-Chain Analyst, a CIO, a Portfolio Manager, a Risk Manager, a Compliance Officer), each with a narrow mandate and no authority outside its lane. The Sentiment Analyst, On-Chain Analyst, CIO, and Portfolio Manager are Claude agents, each with its own system prompt; the Risk Manager and Compliance Officer are deterministic code. Agents hand off structured JSON, not prose, to each other every 5 minutes, 24/7, against live Coinbase/OKX market data. Hard risk limits (drawdown halt, daily-loss halt, exposure caps, kill switch) are enforced in deterministic Python code, never by an LLM: the agents can recommend, but they cannot override a hard stop. It's currently live, paper-trading 10 crypto assets, with a real-time dashboard showing every signal, decision, and trade as it happens.
2. The Problem We're Solving
LLM trading demos are usually a single prompt: "here's some data, what should I buy?" That's a toy. A real fund has:
- Specialized roles — a macro strategist doesn't size positions, a risk officer doesn't generate ideas, a compliance officer doesn't take market views.
- Hard, non-negotiable limits — no fund lets a model freelance past a drawdown limit because it "felt confident."
- Auditability — every dollar moved has to trace back to why.
We wanted to know: can you actually architect a system of LLM agents that respects those constraints — where the boundaries of authority are as important as the intelligence — and have it run unattended, continuously, against real data?
3. System Architecture
3.1 The Agent Pipeline (runs every 5 minutes, fully automated)
┌─────────────────────────── PHASE 1: INGESTION (parallel) ───────────────────────────┐
│ Market Data (Coinbase) │ News Sentiment (NewsAPI) │ On-Chain (OKX) │
└───────────────────────────┴──────────────────────────────┴────────────────────────────┘
│
┌─────────────────────────── PHASE 2: RESEARCH (parallel) ─────────────────────────────┐
│ Momentum Analyst │ Sentiment Analyst (Claude) │ On-Chain Analyst (Claude) │
│ (deterministic: RSI, │ reads headlines, scores │ reads funding rates/OI/ │
│ MACD, Bollinger Bands) │ conviction per asset │ liquidations │
└───────────────────────────┴──────────────────────────────┴────────────────────────────┘
│
SignalBatch (typed JSON)
│
┌─────────────────────────── PHASE 3: DECISION CHAIN (sequential) ─────────────────────┐
│ CIO (Claude) → Portfolio Manager → Risk Manager → Compliance │
│ reads ALL signals, (Claude) proposes (deterministic, (deterministic, │
│ sets market regime & trades, sized by code-enforced code-enforced rule │
│ posture multiplier CIO's posture hard limits) book checks) │
└───────────────────────────┴──────────────────────────────┴────────────────────────────┘
│
Execution → Paper Fill → Portfolio State Update
│
Broadcast to live dashboard (WebSocket)
The key design decision: research and ideation are LLM-driven; capital preservation is not. A model can suggest a trade. It cannot approve its own trade, size it without a formula, or override a drawdown halt — those are plain Python.
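To make "typed JSON" concrete, here is a minimal sketch of what a hand-off between the research phase and the decision chain could look like. Only the name `SignalBatch` comes from the diagram above; the `Signal` class and every field name (`asset`, `analyst`, `direction`, `confidence`, `tick`) are assumptions for illustration, not the project's actual schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Signal:
    # All field names here are hypothetical.
    asset: str         # e.g. "BTC"
    analyst: str       # e.g. "sentiment"
    direction: str     # "long", "short", or "neutral"
    confidence: float  # 0.0 to 1.0

    def __post_init__(self):
        # Reject malformed signals at the boundary instead of downstream.
        if self.direction not in ("long", "short", "neutral"):
            raise ValueError(f"invalid direction: {self.direction!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


@dataclass(frozen=True)
class SignalBatch:
    tick: int
    signals: tuple[Signal, ...]

    def to_json(self) -> str:
        # asdict() recurses into the nested Signal dataclasses.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "SignalBatch":
        data = json.loads(raw)
        return cls(
            tick=data["tick"],
            signals=tuple(Signal(**s) for s in data["signals"]),
        )


batch = SignalBatch(tick=1, signals=(Signal("BTC", "sentiment", "long", 0.7),))
restored = SignalBatch.from_json(batch.to_json())
```

Validating in `__post_init__` means a downstream agent never sees a half-formed signal: it either receives a batch that parsed and validated, or the hand-off fails loudly.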
3.2 Hard Risk Rules (code, not prompts)
| Rule | Trigger | Action |
|---|---|---|
| RISK_001 | Drawdown from session high > 5% | Kill switch, force-close all positions |
| RISK_002 | Daily loss > 3% | Halt new positions for the rest of the day (sticky, survives P&L recovery) |
| RISK_003 | Single position loss > 2% of portfolio | Block adding to the loser |
| RISK_004 | Single-asset allocation > 20% | Resize down to the cap |
| RISK_005 | Total long exposure > 80% | Reject additional longs |
| RISK_006 | Kill switch active | Reject everything |
Plus a separate 6-point compliance rule book (macro-event blackout windows, exchange status checks, minimum liquidity, duplicate-order prevention, paper-trading confirmation, market-impact caps) — all deterministic, zero LLM involvement.
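The sketch below shows one way a subset of the rules in the table could be written as plain Python. The thresholds come from the table; everything else (the `PortfolioSnapshot` fields, the `check_buy` function, the order in which rules are evaluated, and treating RISK_005 as "reject if the order would push exposure past the cap") is an assumption made for illustration. RISK_003 and the force-close step of RISK_001 are left out for brevity.

```python
from dataclasses import dataclass

MAX_DRAWDOWN = 0.05       # RISK_001
MAX_DAILY_LOSS = 0.03     # RISK_002
MAX_ASSET_ALLOC = 0.20    # RISK_004
MAX_LONG_EXPOSURE = 0.80  # RISK_005


@dataclass
class PortfolioSnapshot:
    # Hypothetical state container; field names are illustrative.
    equity: float              # current portfolio value in USD
    session_high: float        # highest equity seen this session
    day_start_equity: float    # equity at the start of the trading day
    long_exposure_usd: float   # total USD currently held long
    kill_switch: bool = False
    daily_halt: bool = False   # sticky: never cleared by a P&L recovery


def check_buy(p: PortfolioSnapshot, asset_exposure_usd: float,
              order_usd: float) -> tuple[str, float, str]:
    """Return (verdict, approved_usd, rule_id) for a proposed long order."""
    if p.kill_switch:
        return ("reject", 0.0, "RISK_006")
    if (p.session_high - p.equity) / p.session_high > MAX_DRAWDOWN:
        p.kill_switch = True
        return ("reject", 0.0, "RISK_001")
    if (p.day_start_equity - p.equity) / p.day_start_equity > MAX_DAILY_LOSS:
        p.daily_halt = True
    if p.daily_halt:
        return ("reject", 0.0, "RISK_002")
    if (p.long_exposure_usd + order_usd) / p.equity > MAX_LONG_EXPOSURE:
        return ("reject", 0.0, "RISK_005")
    room = MAX_ASSET_ALLOC * p.equity - asset_exposure_usd
    if room <= 0:
        return ("reject", 0.0, "RISK_004")
    if order_usd > room:
        # Resize down to the per-asset cap instead of rejecting outright.
        return ("resize", room, "RISK_004")
    return ("approve", order_usd, "")
```

Because the function takes numbers and returns a verdict, there is no prompt an agent could write that changes the outcome; the only inputs are portfolio state and the proposed order size.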
3.3 Tech Stack
| Layer | Technology |
|---|---|
| Agents / LLM | Claude Sonnet 4.6, Anthropic Python SDK |
| Backend | Python 3.12, FastAPI, asyncio |
| Scheduling | APScheduler (5-min tick loop) |
| Data | TimescaleDB (OHLCV time-series), Redis (live state, signal cache) |
| Market data | ccxt → Coinbase (public, unauthenticated) |
| On-chain data | OKX public API (funding rate, open interest, liquidations) |
| News sentiment | NewsAPI.org |
| Frontend | React 18 + TypeScript + Vite, Tailwind, Recharts, Zustand, React Query |
| Live updates | WebSocket push from backend → dashboard |
| Testing | pytest, 232 tests, all passing |
4. What Makes This Non-Trivial (the engineering, not just the prompt)
A few things that bit us — and the fixes — are worth highlighting to judges, because they show this is a system, not a demo wrapper:
- LLMs don't reliably follow "JSON only." Every analyst occasionally wraps its output in markdown fences, or adds explanatory prose before/after the JSON, despite being told not to. We built a robust extractor (`strip_json_fences`) that scans for the actual JSON value anywhere in the response and discards the rest, applied uniformly across all four LLM-output parsing sites.
- A naive "fallback to safe defaults on any parse failure" silently breaks the system. Early on, the CIO agent was always defaulting to its most conservative regime, not because the market was bad, but because minor formatting hiccups kept triggering full fallback, discarding the model's real (and often good) judgment. We rebuilt the recovery path to normalize field-name variations and salvage the model's actual reasoning instead of nuking it.
- Paper-trading accounting bugs are real bugs. Our first implementation tracked positions by USD cost basis only — selling a position settled at the entry price, not the market price, silently destroying every realized gain or loss. We caught this with a deliberate buy → price-move → sell regression test and rebuilt the engine around base-asset quantity tracking (the correct way to model a position).
- Free data sources don't replace paid ones cleanly. Our on-chain data provider (CoinGlass) turned out to gate funding rate and open interest behind a paid plan we didn't have: it returned HTTP 200 with an "upgrade your plan" message disguised as data. We migrated to OKX's free public market-data API and verified real funding rate / open interest / liquidation data end-to-end.
- A safety-critical check that vanishes under `-O`. The "never place a live order" gate was originally a bare `assert`; Python silently strips those when run with optimizations enabled. We replaced it with explicit, non-strippable exceptions.
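For the JSON-extraction problem in the first bullet, the general technique can be sketched with the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores whatever follows it. This is not the project's extractor; the function name `extract_json` is hypothetical, and the sketch only looks for objects and arrays.

```python
import json

_DECODER = json.JSONDecoder()


def extract_json(text: str):
    """Return the first JSON object or array found anywhere in `text`."""
    for start, ch in enumerate(text):
        if ch not in "{[":
            continue
        try:
            # raw_decode parses one value at `start` and ignores trailing text.
            value, _end = _DECODER.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; keep scanning
        return value
    raise ValueError("no JSON object or array found in model output")


reply = 'Sure! Here is my assessment:\n{"regime": "RANGING", "posture": "NEUTRAL"}\nLet me know if you need more.'
parsed = extract_json(reply)
```

Markdown fences need no special handling in this approach, since the scan simply skips every character that does not start a valid JSON value. One known limitation: bracketed prose that happens to be valid JSON (a footnote marker like `[1]`, say) ahead of the real payload would be returned first, so a production version would likely also validate the result against the expected schema.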
We treat this list as a feature, not a confession: a hackathon project that ships with a real bug backlog and fixes is more credible than one that claims it "just worked."
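The accounting bug above is easiest to see in code. Here is a minimal sketch of quantity-based position tracking for a single asset, with the buy → price-move → sell check at the end. The `PaperAccount` class and its fields are assumptions for illustration, and the sketch ignores fees and slippage.

```python
from dataclasses import dataclass


@dataclass
class PaperAccount:
    # Hypothetical single-asset paper account.
    cash_usd: float
    qty: float = 0.0               # base-asset units held (e.g. BTC)
    cost_basis_usd: float = 0.0    # USD paid for the units still held
    realized_pnl_usd: float = 0.0

    def buy(self, usd: float, price: float) -> None:
        if usd <= 0 or usd > self.cash_usd:
            raise ValueError("invalid buy size")
        self.cash_usd -= usd
        self.qty += usd / price    # track units, not just dollars spent
        self.cost_basis_usd += usd

    def sell(self, qty: float, price: float) -> None:
        if qty <= 0 or qty > self.qty:
            raise ValueError("invalid sell size")
        released_basis = self.cost_basis_usd * (qty / self.qty)
        proceeds = qty * price     # settle at the market price, not entry
        self.cash_usd += proceeds
        self.realized_pnl_usd += proceeds - released_basis
        self.cost_basis_usd -= released_basis
        self.qty -= qty


acct = PaperAccount(cash_usd=10_000.0)
acct.buy(usd=1_000.0, price=100.0)   # acquires 10 units
acct.sell(qty=10.0, price=110.0)     # price moved up 10% before the sell
```

With cost-basis-only tracking, the sell would have returned exactly the 1,000 USD spent; tracking units makes the 100 USD gain show up in both cash and realized P&L.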
5. Live Demo — Real Numbers (as of this build)
- 232 automated tests, all passing, covering every agent boundary and risk rule
- 58 ticks completed autonomously today, 155 real signals generated, zero manual intervention
- 10 crypto assets tracked end-to-end: BTC, ETH, SOL, BNB, XRP, ADA, AVAX, DOT, POL, LINK
- Current regime: `RANGING` / posture `NEUTRAL`. The CIO has been correctly conservative because signals haven't corroborated across analysts yet (the system is designed to wait for agreement rather than force a trade).
- Full audit trail: every signal, regime call, proposed trade, risk decision, compliance check, and fill is logged with a complete provenance chain
Dashboard Pages
- Overview — live portfolio value, P&L, drawdown, kill-switch button (with a type-to-confirm modal), agent health strip
- Signal Feed — live stream of every signal as it's generated, filterable by asset/analyst/confidence
- Trade Log — every trade's full lifecycle (proposed → approved/rejected → filled) with expandable provenance
- Backtesting — replay the full agent pipeline against historical data, with Sharpe/Sortino/drawdown/win-rate metrics and equity curve charts
- Agent Monitor — per-agent latency, error counts, live activity log, current CIO regime
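For readers unfamiliar with the backtesting metrics named above, here is a short sketch of two of them, maximum drawdown and an annualized Sharpe ratio, using only the standard library. The function names and the `periods_per_year` parameter are assumptions for illustration; how the project annualizes 5-minute ticks is not stated here.

```python
import math
import statistics


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(returns: list[float], periods_per_year: int,
                 risk_free_per_period: float = 0.0) -> float:
    """Mean excess return over its sample standard deviation, annualized."""
    excess = [r - risk_free_per_period for r in returns]
    sd = statistics.stdev(excess)  # requires at least two returns
    if sd == 0:
        raise ValueError("returns have zero variance; Sharpe is undefined")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


curve = [100.0, 120.0, 90.0, 110.0]
dd = max_drawdown(curve)  # peak 120, trough 90
```

The drawdown function uses the same "distance from the running high" idea as RISK_001, applied to a historical equity curve.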
6. Safety Posture
- Paper trading only, enforced in code at two separate layers (execution agent + paper engine), not just configuration — and the check can't be silently disabled.
- No code path anywhere in the system is capable of submitting a real order; market data fetches are unauthenticated/read-only even where API keys exist.
- Kill switch is checkable three ways (file flag, Redis flag, API call) and once triggered cannot be cleared without a manual, deliberate reset.
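The difference between a strippable and a non-strippable paper-trading gate can be sketched as follows. The names `LiveTradingForbidden`, `require_paper_mode`, and `execute_order` are hypothetical, and the order and fill shapes are placeholders.

```python
class LiveTradingForbidden(RuntimeError):
    """Raised when anything other than paper mode is requested."""


def require_paper_mode(mode: str) -> None:
    # An explicit raise is used instead of `assert mode == "paper"`,
    # because assert statements are removed when Python runs with -O.
    if mode != "paper":
        raise LiveTradingForbidden(f"refusing to execute in mode {mode!r}")


def execute_order(order: dict, mode: str) -> dict:
    require_paper_mode(mode)  # runs on every call, with or without -O
    return {**order, "status": "paper_filled"}


fill = execute_order({"asset": "BTC", "usd": 500}, mode="paper")
```

Calling the same guard from both the execution layer and the paper engine gives the two-layer enforcement described above: a bug or misconfiguration in one layer still hits the check in the other.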
7. What We'd Build Next
- Persist full signal/trade provenance to TimescaleDB (currently Redis/WebSocket only — survives a restart, but isn't queryable historically yet)
- Unify the backtest and live execution code paths so they can never silently diverge
- A second on-chain data source for redundancy now that we've proven the pattern with OKX
- Real (not paper) execution behind an explicit, separately-gated feature flag — by design, currently impossible without a deliberate code change
- Add more asset classes to diversify away from crypto now that markets are reopening for trading
Built With
- claude
- docker
- python
- react
- redis
- timescaledb
- typescript