Inspiration

We kept seeing the same pattern in AI trading communities: bots post impressive backtests, then silently break the very risk rules they were supposed to follow.

Most platforms answer: “Did it make money?”

PaperPilot AI answers: “Did it follow its own strategy?”

That became the core idea behind the project.

What it does

PaperPilot AI is a behavior auditor and discipline coach for AI trading agents.

Agents register and submit paper trades through REST APIs, TradingView webhooks, or Google A2A. Every trade is evaluated against the agent’s declared strategy and assigned a deterministic 0–100 compliance score.
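A minimal sketch of what deterministic scoring can look like: check the trade against the declared strategy, collect violation codes, and subtract a fixed penalty per code. The field names, violation codes, and penalty weights below are hypothetical and chosen only for illustration; they are not PaperPilot AI's actual rule set.

```typescript
// Hypothetical shapes: the real PaperPilot AI schema is not shown in this write-up.
interface Strategy {
  maxPositionSize: number;  // largest allowed quantity per trade
  requireStopLoss: boolean; // every trade must carry a stop-loss
  allowedSymbols: string[]; // instruments the agent declared it would trade
}

interface Trade {
  symbol: string;
  quantity: number;
  stopLoss?: number;
}

// Illustrative violation codes and penalty weights (assumptions).
const PENALTIES: Record<string, number> = {
  POSITION_TOO_LARGE: 40,
  MISSING_STOP_LOSS: 30,
  SYMBOL_NOT_ALLOWED: 30,
};

function auditTrade(
  trade: Trade,
  strategy: Strategy,
): { score: number; violations: string[] } {
  const violations: string[] = [];
  if (trade.quantity > strategy.maxPositionSize) {
    violations.push("POSITION_TOO_LARGE");
  }
  if (strategy.requireStopLoss && trade.stopLoss === undefined) {
    violations.push("MISSING_STOP_LOSS");
  }
  if (strategy.allowedSymbols.indexOf(trade.symbol) === -1) {
    violations.push("SYMBOL_NOT_ALLOWED");
  }

  // Pure arithmetic: the same inputs always give the same score, clamped to 0..100.
  const deducted = violations.reduce(
    (sum, code) => sum + (PENALTIES[code] || 0),
    0,
  );
  const score = Math.max(0, Math.min(100, 100 - deducted));
  return { score, violations };
}
```

Because the function has no randomness and no model call, re-auditing the same trade against the same strategy always returns the same score and the same codes.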

The platform returns:

rule violation codes
a compliance score
AI-generated coaching feedback grounded in published finance literature

To keep the feedback grounded, we use retrieval over:

Advances in Financial Machine Learning — López de Prado
151 Trading Strategies — Kakushadze & Serur

A history modifier penalizes repeat offenses by lowering the score when the same violation recurs.
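One way such a history modifier could work is to count how often each current violation code already appears in the agent's earlier trades and subtract a capped penalty. The threshold, step size, and cap below are assumptions made for this sketch, not the values the platform uses.

```typescript
// Hypothetical parameters; the actual modifier is not specified in this write-up.
const REPEAT_THRESHOLD = 2;     // prior occurrences tolerated before the modifier applies
const PENALTY_PER_REPEAT = 5;   // points deducted per occurrence at or beyond the threshold
const MAX_HISTORY_PENALTY = 20; // upper bound on the total deduction

function applyHistoryModifier(
  baseScore: number,
  currentViolations: string[],
  pastViolations: string[], // violation codes from the agent's earlier trades
): number {
  // Count each violation code once, even if it is listed twice on the current trade.
  const uniqueCodes = currentViolations.filter(
    (code, index, all) => all.indexOf(code) === index,
  );

  let penalty = 0;
  for (const code of uniqueCodes) {
    const priorCount = pastViolations.filter((past) => past === code).length;
    if (priorCount >= REPEAT_THRESHOLD) {
      penalty += (priorCount - REPEAT_THRESHOLD + 1) * PENALTY_PER_REPEAT;
    }
  }

  // Cap the deduction and never let the score drop below zero.
  return Math.max(0, baseScore - Math.min(penalty, MAX_HISTORY_PENALTY));
}
```

Like the base score, this step is plain arithmetic over stored history, so it stays reproducible.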

We also built a shared multi-agent paper market where agents can trade against each other using LangGraph orchestration and Google’s A2A protocol.

How we built it

Next.js + TypeScript on Vercel
Supabase Postgres with RLS
Vitest-driven TDD

Our AI stack uses:

Claude for orchestration and reasoning
Lightning AI vLLM serving Qwen-Open-Finance-R-8B
Nia retrieval for finance citations

The multi-agent workflow runs through a LangGraph state machine: audit → clarify/match/reject → finalize.
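The flow above can be pictured as a small transition function. The sketch below is plain TypeScript rather than LangGraph code, and the routing conditions (what triggers a clarify, and the score needed to match) are assumptions made for illustration.

```typescript
type Step = "audit" | "clarify" | "match" | "reject" | "finalize";

// Hypothetical audit result, used here only to pick the next step.
interface AuditOutcome {
  score: number;
  needsClarification: boolean;
}

const MIN_SCORE_TO_MATCH = 50; // assumed threshold

// Returns the step that follows `current`, or null once the workflow is done.
function nextStep(current: Step, outcome: AuditOutcome): Step | null {
  if (current === "finalize") return null;     // terminal step
  if (current !== "audit") return "finalize";  // clarify, match, and reject all converge
  if (outcome.needsClarification) return "clarify";
  return outcome.score >= MIN_SCORE_TO_MATCH ? "match" : "reject";
}

// Walks the machine from "audit" and records every step visited.
function runWorkflow(outcome: AuditOutcome): Step[] {
  const path: Step[] = ["audit"];
  let step = nextStep("audit", outcome);
  while (step !== null) {
    path.push(step);
    step = nextStep(step, outcome);
  }
  return path;
}
```

For example, `runWorkflow({ score: 80, needsClarification: false })` visits audit, match, and finalize in that order.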

Agents communicate through A2A JSON-RPC + SSE.
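To make the transport concrete, here is a rough sketch of the two generic pieces involved: a JSON-RPC 2.0 request envelope and a parser that pulls JSON payloads out of a Server-Sent Events stream. The envelope fields follow the JSON-RPC 2.0 format and the `data:` framing follows the SSE format; the method name and payload contents in the usage note are placeholders, so check the A2A specification for the exact names and shapes.

```typescript
// JSON-RPC 2.0 request envelope.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: unknown;
}

function buildRequest(id: number, method: string, params: unknown): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method, params };
}

// Splits a raw SSE body into events (separated by a blank line), joins each
// event's "data:" lines, and JSON-parses the result. Other SSE fields such as
// "event:" or "id:" are ignored in this sketch.
function parseSseEvents(raw: string): unknown[] {
  return raw
    .split(/\r?\n\r?\n/)
    .map((block) =>
      block
        .split(/\r?\n/)
        .filter((line) => line.indexOf("data:") === 0)
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n"),
    )
    .filter((data) => data.length > 0)
    .map((data) => JSON.parse(data));
}
```

A client would send something like `buildRequest(1, "message/stream", { text: "BUY 10 AAPL" })` and feed the streamed response body to `parseSseEvents` as chunks complete; both the method name and the params here are illustrative only.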

Challenges we ran into

The biggest challenge was preventing the LLM from influencing compliance scores.

We solved this by computing scores deterministically before the model sees the trade, ensuring the AI can explain decisions but never modify them.
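The ordering described above can be sketched as follows: the audit result is frozen before it reaches the explainer, and the response is assembled from the frozen copy rather than from anything the model returns. The `Explainer` type is a stand-in for the LLM call and the names are hypothetical; a real model call would be asynchronous, and it is kept synchronous here for brevity.

```typescript
interface AuditResult {
  readonly score: number;
  readonly violations: readonly string[];
}

// Stand-in for the LLM: it receives the finished audit and returns text only.
type Explainer = (audit: AuditResult) => string;

function auditThenExplain(
  audit: AuditResult,
  explain: Explainer,
): { score: number; violations: readonly string[]; coaching: string } {
  // Copy and freeze so the explainer cannot mutate what it is given.
  const frozen: AuditResult = Object.freeze({
    score: audit.score,
    violations: Object.freeze(audit.violations.slice()),
  });

  const coaching = explain(frozen);

  // Score and violations come from the frozen audit, never from the model output.
  return { score: frozen.score, violations: frozen.violations, coaching };
}
```

The design choice is that the model's return type carries only text, so there is no field through which it could hand back a different score.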

We also spent significant time mapping LangGraph interrupts onto A2A’s INPUT_REQUIRED workflow and handling trust differences between TradingView alerts and direct signed submissions.