Inspiration

Every agent framework assumes payments are free. They're not.

Stripe's floor is $0.30. A useful agent-to-agent call (fetch this data, analyze this text, read this sensor) costs fractions of a cent. The math makes autonomous machine commerce impossible on traditional payment rails. You either batch everything into coarse monthly billing, which kills autonomy, or you don't charge at all, which kills the economy.

Lightning fixes the unit economics: 5–20 satoshis per request, settled in milliseconds, cryptographic proof of payment. But solving payments exposes a harder second problem. When Agent A pays Agent B, nothing stops B from taking the money and returning garbage. No chargeback exists for a 3-sat transaction.

We built SatsRouter to solve both at once: a routing layer where agents pay per request in satoshis and trust is enforced by math. The research backing it, TRACE (submitted to CIKM '26), gave us a result that made the stakes concrete: EigenTrust, the dominant trust algorithm for two decades, routes 81.7% of jobs to malicious agents under Sybil attack. Four out of five payments, stolen. We needed something that actually works.

The developer experience problem runs parallel. Every agent marketplace today forces you to manage API keys, sign-up flows, and billing integrations before a single request fires. We wanted a marketplace where any AI (Claude Desktop, Cursor, anything MCP-compatible) could discover providers, hire them, and pay them without any of that. Plug in the MCP server, and you're done.


What It Does

SatsRouter is a Lightning-powered agent marketplace with a full MCP interface. AI agents discover, hire, and pay specialist agents per request. Physical ESP32 hardware nodes sell real-world sensor data autonomously. Every endpoint is gated by an L402 paywall: no API keys, no sign-up, just a Lightning invoice.

The core loop:

  1. Riya (the orchestrator agent) receives a task and plans which providers to hire based on cost, reputation, and budget
  2. She calls each provider's L402-gated endpoint, and her MDK wallet auto-pays the invoice
  3. Providers execute (web scraping, market analysis, sensor readings)
  4. The LLM Judge intercepts every output and validates it against a strict JSON schema
  5. Good output: escrow released, provider paid. Bad output: escrow refunded, provider reputation penalized
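
The settlement step at the end of this loop can be sketched as follows. This is a minimal illustration under stated assumptions, not SatsRouter's actual code: the `Escrow` and `Reputation` shapes and the `validateOutput` and `settle` functions are hypothetical names, and the hand-written field check is only a stand-in for the strict JSON schema validation the Judge performs.

```typescript
// Hypothetical types for illustration; not SatsRouter's actual data model.
type EscrowState = "held" | "released" | "refunded";

interface Escrow {
  amountSats: number;
  state: EscrowState;
}

interface Reputation {
  successes: number; // jobs the Judge accepted
  failures: number; // jobs the Judge rejected
}

type FieldType = "string" | "number" | "boolean";

// Stand-in for schema validation: every required field must be present
// with the expected primitive type.
function validateOutput(
  output: unknown,
  required: Record<string, FieldType>
): boolean {
  if (typeof output !== "object" || output === null) return false;
  const record = output as Record<string, unknown>;
  return Object.keys(required).every(
    (key) => typeof record[key] === required[key]
  );
}

// Release escrow on valid output; refund and penalize reputation otherwise.
function settle(
  escrow: Escrow,
  rep: Reputation,
  outputIsValid: boolean
): { escrow: Escrow; rep: Reputation } {
  if (escrow.state !== "held") {
    throw new Error("escrow already settled");
  }
  return outputIsValid
    ? {
        escrow: { ...escrow, state: "released" },
        rep: { ...rep, successes: rep.successes + 1 },
      }
    : {
        escrow: { ...escrow, state: "refunded" },
        rep: { ...rep, failures: rep.failures + 1 },
      };
}
```

Returning new objects instead of mutating the inputs keeps each settlement easy to log next to its payment proof.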

The MCP server exposes 16 tools covering the full marketplace: list providers by capability, hire an agent, check wallet balance, manage escrow, inspect routing scores, run Operation Kavach, and connect federation peers. Any Claude Desktop or Cursor instance can call these directly. No SDK wrapper, no dashboard login.

Operation Kavach shows this at scale: a 47-minute autonomous decision chain where 9 specialist agents analyze a cyclone threat to Indian railway exam centers, chain their outputs across 3 rounds, and produce a legally-structured government advisory, triggered by a real ESP32 sensor AQI spike mid-run. No human touched it between start and output.

The trust routing uses TRACE:

$$u = \alpha \cdot s_{\text{trace}} - \beta \cdot d_{\text{risk}} - \gamma \cdot c_{\text{norm}} + \delta \cdot t_{\text{net}} + \epsilon \cdot m_{\text{cap}} - \lambda \cdot p_{\text{sybil}} - \mu \cdot p_{\text{clique}}$$

with $(\alpha, \beta, \gamma, \delta, \epsilon, \lambda, \mu) = (0.40, 0.30, 0.15, 0.10, 0.10, 0.35, 0.25)$.

Where $s_{\text{trace}}$ is a Bayesian Lower Confidence Bound (Beta-Binomial posterior, 95% LCB), $d_{\text{risk}}$ is exponentially-smoothed default probability with CUSUM change-point detection, $p_{\text{sybil}}$ penalizes high edge-to-job ratio, and $p_{\text{clique}}$ penalizes dense trust-edge neighborhoods.
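
As a worked illustration of the formula, the sketch below computes $u$ with the weights stated above. It is not the code in traceScore.ts: `traceUtility` and `approxBetaLcb` are hypothetical names, all feature inputs are assumed to be precomputed numbers, and the confidence bound uses a uniform Beta(1, 1) prior with a normal approximation to the one-sided 95% bound, where an exact implementation would use the Beta quantile.

```typescript
// Weights exactly as listed in the text.
const WEIGHTS = {
  alpha: 0.4,
  beta: 0.3,
  gamma: 0.15,
  delta: 0.1,
  epsilon: 0.1,
  lambda: 0.35,
  mu: 0.25,
};

// Inputs to the utility; each is assumed to be computed elsewhere.
interface TraceFeatures {
  sTrace: number; // Bayesian lower confidence bound on success rate
  dRisk: number; // smoothed default probability
  cNorm: number;
  tNet: number;
  mCap: number;
  pSybil: number; // Sybil penalty
  pClique: number; // clique penalty
}

// Composite utility u, term by term as in the equation above.
function traceUtility(f: TraceFeatures): number {
  return (
    WEIGHTS.alpha * f.sTrace -
    WEIGHTS.beta * f.dRisk -
    WEIGHTS.gamma * f.cNorm +
    WEIGHTS.delta * f.tNet +
    WEIGHTS.epsilon * f.mCap -
    WEIGHTS.lambda * f.pSybil -
    WEIGHTS.mu * f.pClique
  );
}

// ASSUMPTION: Beta(1 + successes, 1 + failures) posterior, approximated by a
// normal distribution; z = 1.645 is the one-sided 95% normal quantile.
function approxBetaLcb(successes: number, failures: number, z = 1.645): number {
  const a = 1 + successes;
  const b = 1 + failures;
  const n = a + b;
  const mean = a / n;
  const variance = (a * b) / (n * n * (n + 1));
  return Math.max(0, mean - z * Math.sqrt(variance));
}
```

Two providers with the same 90% success rate get different bounds: the one with 100 jobs scores higher than the one with 10, which is the point of scoring on a lower bound instead of the mean.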


How We Built It

Backend: Next.js 16 (App Router) with TypeScript. SQLite via Prisma for zero-setup portability. All agent endpoints, orchestration logic, and routing run server-side.

Lightning payments: MoneyDevKit (MDK) handles L402 paywalls, the autonomous agent wallet daemon, and LNURL-pay provider payouts. The flow is HTTP 402 → bolt11 invoice → pay → retry with Authorization: L402 <macaroon>:<preimage>. Payment proofs (hash + preimage) are stored on every transaction.
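
The 402 → pay → retry flow can be sketched as below. This is an illustration under assumptions, not MDK's API: `HttpGet` and `PayInvoice` are hypothetical injected functions standing in for the HTTP client and the wallet, and the challenge is assumed to arrive in a `WWW-Authenticate` header of the form `L402 macaroon="...", invoice="..."`.

```typescript
// Minimal response and client shapes so the sketch has no runtime dependencies.
interface HttpResponse {
  status: number;
  headers: { get(name: string): string | null };
}
type HttpGet = (url: string, headers: Record<string, string>) => Promise<HttpResponse>;

// Hypothetical wallet hook: pays a bolt11 invoice, resolves to the preimage (hex).
type PayInvoice = (bolt11: string) => Promise<string>;

interface L402Challenge {
  macaroon: string;
  invoice: string;
}

// Extract the macaroon and invoice from the 402 challenge header.
function parseL402Challenge(header: string): L402Challenge | null {
  if (!/^L402\s/i.test(header.trim())) return null;
  const macaroon = /macaroon="([^"]+)"/.exec(header)?.[1];
  const invoice = /invoice="([^"]+)"/.exec(header)?.[1];
  return macaroon && invoice ? { macaroon, invoice } : null;
}

// Build the retry header in the form quoted in the text.
function buildL402Authorization(macaroon: string, preimageHex: string): string {
  return `L402 ${macaroon}:${preimageHex}`;
}

// Request once; on 402, pay the invoice and retry with the proof attached.
async function getWithL402(
  url: string,
  httpGet: HttpGet,
  payInvoice: PayInvoice
): Promise<HttpResponse> {
  const first = await httpGet(url, {});
  if (first.status !== 402) return first;
  const challenge = parseL402Challenge(first.headers.get("WWW-Authenticate") ?? "");
  if (!challenge) throw new Error("402 response without a parsable L402 challenge");
  const preimage = await payInvoice(challenge.invoice);
  return httpGet(url, {
    Authorization: buildL402Authorization(challenge.macaroon, preimage),
  });
}
```

Injecting `httpGet` and `payInvoice` keeps the flow testable without a live node or wallet.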

Trust routing (TRACE): Three TypeScript modules:

  • traceScore.ts: Bayesian LCB scoring, CUSUM anomaly detection, composite utility
  • traceRouter.ts: Thompson sampling over Beta posteriors, exploration gating
  • trustGraph.ts: Personalized PageRank from honest seeds, Sybil edge-to-job ratio, clique penalty
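
To illustrate the Personalized PageRank idea named above, here is a minimal power-iteration sketch. It is not the contents of trustGraph.ts: the function name, the adjacency-list input, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```typescript
// edges[i] lists the nodes that node i extends trust to.
// seeds are the indices of nodes assumed honest; the walk restarts only there.
function personalizedPageRank(
  edges: number[][],
  seeds: number[],
  damping = 0.85,
  iterations = 50
): number[] {
  const n = edges.length;
  const uniqueSeeds = Array.from(new Set(seeds));
  if (uniqueSeeds.length === 0) throw new Error("need at least one seed");

  // Restart distribution: uniform over the honest seeds, zero elsewhere.
  const restart = new Array<number>(n).fill(0);
  for (const s of uniqueSeeds) restart[s] = 1 / uniqueSeeds.length;

  let rank = restart.slice();
  for (let it = 0; it < iterations; it++) {
    const next = restart.map((r) => (1 - damping) * r);
    edges.forEach((out, i) => {
      if (out.length === 0) {
        // Dangling node: send its mass back to the seeds so total mass stays 1.
        for (let j = 0; j < n; j++) next[j] += damping * rank[i] * restart[j];
      } else {
        const share = (damping * rank[i]) / out.length;
        for (const j of out) next[j] += share;
      }
    });
    rank = next;
  }
  return rank;
}
```

Because trust mass only enters through the seeds, a cluster that no seed-reachable node points to ends up with zero rank however densely its members endorse each other.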

MCP server: 16 tools over stdio, compatible with Claude Desktop, Cursor, and any MCP client. This is the primary integration surface: it's how external applications talk to the marketplace without touching any internals.

ZK-Judge: RISC Zero ZK-VM proves Judge execution correctness. The circuit runs schema validation and quality checks inside the VM and generates a cryptographic receipt. Falls back to the LLM Judge if ZK is unavailable.

Hardware: Custom ESP32 PCB designed in EasyEDA, BME280 for temperature/humidity, MQ135 for air quality. Simulated in Wokwi, with firmware capable of running on physical hardware and autonomously selling sensor readings via L402. First PCB we've ever designed.

Federation: Multiple SatsRouter instances connect as peers via JSON-RPC over HTTP. Registry sync, cross-instance routing, and blacklist propagation work across the federation.


Challenges We Ran Into

The exploration-cost finding broke our original design. We built TRACE with Thompson sampling for a principled exploration-exploitation tradeoff. The CIKM paper's core result was that Thompson sampling is net-negative under adversarial load on sparse graphs: disabling it reduced fraud by 56–86% across all four attack types.

The mechanism is counterintuitive. Thompson sampling preferentially routes to high-uncertainty providers — exactly the tail where adversaries who haven't been caught yet live. On a sparse graph with thin posteriors, exploration is just handing money to strangers. The fraud reduction isn't from avoiding adversaries broadly; it comes from avoiding the high-uncertainty tail.

But greedy selection introduces a new problem: it concentrates routing onto a small proven set. In our concentration probe, TRACE-no-bandit served only 66/700 honest providers versus 125 for full TRACE — a 47% drop in diversity. Two recovery attempts (loose round-robin, tight round-robin with an evidence floor) both failed. The tension between fraud defense and newcomer discovery appears fundamental.

The density boundary. The greedy advantage holds on sparse graphs and disappears above roughly 1.2% edge density, confirmed on three independent topologies (synthetic-dense, Bitcoin OTC at 1.85%, Bitcoin Alpha at 1.20%). The transition is sharp: fraud stays at 6.2 sats through 0.8% density, then jumps roughly 7.6x to 47.1 sats at 1.2%. Pre-existing trust edges let coordinated adversaries reach high LCB scores before going malicious.

The visibility inversion. Under partitioned views (60 jobs per buyer, no shared state), greedy produces 6.8x higher fraud than Thompson sampling on strategic-default. Sixty jobs isn't enough for Bayesian posteriors to converge. Greedy locks onto whoever succeeded first in the local history — and with 30% adversaries, that's a malicious provider roughly one time in three.

Real Lightning on mainnet. Getting L402 flows, autonomous wallet payments, LNURL-pay provider payouts, and webhook signature verification to all work together without race conditions took significant debugging. The MDK wallet daemon on localhost:3456, payment proof storage, and webhook verification had to be airtight.


Accomplishments That We're Proud Of

The TRACE paper. SatsRouter isn't just a demo — it's the deployment substrate for a full research paper submitted to CIKM 2026. The experimental evaluation uses SatsRouter's architecture (provider registry, job routing, reputation tracking, payment settlement) as the simulation substrate. Every routing policy in the paper — reputation-only, price-only, EigenTrust, and TRACE — is implemented as a configurable module in the live system. The experimental finding that drove the whole paper:

| Policy | Sybil-cluster MR | Collusion fraud (sats) |
|---|---|---|
| TRACE-no-bandit | 24.6% | 6.4 ± 3.3 |
| EigenTrust | 81.7% | 183.4 |

81.7% malicious routing rate across all 10 seeds. Not a fluke — the eigenvector concentrates on the densest cluster, and under Sybil attack that cluster is adversarial. At N=5,000, EigenTrust collusion fraud averages 244 sats versus TRACE-no-bandit's 17.6 — 14x lower.

The adversarial falsification held. We designed a Behavioural GT adversary specifically to break the greedy finding — an agent that monitors its own routing share and defects when share exceeds 1.5x fair-share AND job price exceeds 1.2x median. If greedy concentration triggered this adversary more than Thompson sampling, the exploration-cost finding would invert. It didn't: TRACE-no-bandit at 3.4 sats, full TRACE at 27.8.
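
The defection rule described above can be written down directly. The sketch below is illustrative only: `shouldDefect` and `median` are hypothetical names, fair share is assumed to mean an equal split across providers, and the median is assumed to be taken over a window of recent job prices.

```typescript
// Median of a non-empty list of prices.
function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Defect only when BOTH conditions from the text hold:
//   routing share > 1.5x fair share  AND  job price > 1.2x median price.
function shouldDefect(
  myJobs: number,
  totalJobs: number,
  providerCount: number,
  jobPrice: number,
  recentPrices: number[]
): boolean {
  if (totalJobs === 0 || providerCount === 0 || recentPrices.length === 0) {
    return false; // not enough history to evaluate either condition
  }
  const share = myJobs / totalJobs;
  const fairShare = 1 / providerCount; // ASSUMPTION: equal split across providers
  return share > 1.5 * fairShare && jobPrice > 1.2 * median(recentPrices);
}
```

With 10 providers, fair share is 10%, so an agent holding 20% of jobs defects on a 15-sat job when the median price is 10 sats, but cooperates on an 11-sat job.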

Real money moves. Every transaction stores paymentHash + paymentPreimage. Every provider payout stores payoutHash + payoutPreimage. Wallet balance drops in real-time. This is mainnet Lightning, not testnet, not mocks.
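
A stored hash and preimage pair can be checked offline, because Lightning defines the payment hash as the SHA-256 digest of the preimage. The sketch below shows that check; `verifyPaymentProof` is a hypothetical helper name, and both values are assumed to be stored as 64-character hex strings.

```typescript
import { createHash } from "node:crypto";

const HEX_32_BYTES = /^[0-9a-f]{64}$/i;

// True when sha256(preimage) equals the payment hash.
// ASSUMPTION: both values are hex-encoded 32-byte strings.
function verifyPaymentProof(paymentHashHex: string, preimageHex: string): boolean {
  if (!HEX_32_BYTES.test(paymentHashHex) || !HEX_32_BYTES.test(preimageHex)) {
    return false;
  }
  const digest = createHash("sha256")
    .update(Buffer.from(preimageHex, "hex"))
    .digest("hex");
  return digest === paymentHashHex.toLowerCase();
}
```

The same check applies to payout records, since a payoutHash and payoutPreimage pair has the same relationship.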

A physical sensor changed a policy recommendation. Mid-Operation-Kavach, the ESP32 returns an anomalous AQI spike. The orchestrator autonomously updates its risk model. No human triggered it. A rooftop sensor influenced a government advisory. That's the whole thesis made concrete.

The MCP surface works end-to-end. Any Claude Desktop instance with the SatsRouter MCP server installed can list providers, check reputation scores, hire an agent, and watch a Lightning payment settle — without touching a dashboard, API key, or billing form. That's the developer experience we wanted.


What We Learned

Trust propagation is dangerous in adversarial settings. EigenTrust was built for file-sharing networks where honest nodes held a structural majority. Agent marketplaces have different adversary economics: an attacker can buy trust edges by completing cheap jobs, and 30% adversary penetration is realistic. The eigenvector concentrates mass on the densest cluster — and the adversary controls which cluster that is.

Exploration's value depends entirely on evidence volume. With global visibility and converged posteriors, Thompson sampling routes to the adversary tail. With local visibility and thin evidence, greedy locks onto the first provider that succeeded and gets drained. The optimal policy isn't a fixed choice between greedy and explore — it should be conditioned on the gap between top candidates relative to cohort variance.

Fraud defense and newcomer discovery can't share the same selection mechanism. Every attempt to recover provider diversity without regressing fraud failed. They require separate tracks with independent budgets.

Hardware is underrated as a demo primitive. The ESP32 node is 10 lines of C++ that reads two sensors and returns JSON. But it demonstrates something pure-software demos can't: the economic primitive generalizes to physical infrastructure. An IoT node with a Lightning wallet is a self-sustaining data business.

The density phase transition is abrupt. We expected gradual degradation as pre-existing trust edges accumulated. Instead: 6.2 sats at 0.8% density, 47.1 sats at 1.2%. Something flips between those two values. Understanding this transition mechanistically — not just empirically — is an open question.


What's Next for SatsRouter

Visibility-conditioned exploration. The exploration-visibility trade-off from Section 7.3 of the paper points to a concrete fix: gate Thompson sampling on the utility gap between the top two candidates. Large gap (relative to cohort variance) → clear leader, skip exploration. Small gap → posteriors haven't separated, explore as insurance. The gap statistic is already computed in the scoring pipeline. We haven't wired it into selection logic yet.

$$\text{Explore if: } \frac{u_1 - u_2}{\sigma_{\text{cohort}}} < \tau$$
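
A minimal sketch of this gate follows, assuming $\sigma_{\text{cohort}}$ is the population standard deviation of the candidate utilities and that $\tau$ is a tunable threshold. The function name `shouldExplore` is hypothetical, and this is not wired into any selection logic.

```typescript
// Explore when the normalized gap between the top two utilities is below tau.
function shouldExplore(utilities: number[], tau: number): boolean {
  if (utilities.length < 2) return false; // nothing to explore between

  const sorted = utilities.slice().sort((a, b) => b - a);
  const gap = sorted[0] - sorted[1];

  // ASSUMPTION: sigma_cohort is the population standard deviation of the cohort.
  const mean = utilities.reduce((s, u) => s + u, 0) / utilities.length;
  const variance =
    utilities.reduce((s, u) => s + (u - mean) * (u - mean), 0) / utilities.length;
  const sigma = Math.sqrt(variance);

  // All candidates tied: no leader exists, so treat it as "explore".
  if (sigma === 0) return true;

  return gap / sigma < tau;
}
```

For utilities `[0.9, 0.3, 0.2, 0.1]` the leader is far ahead of the cohort spread and the gate says exploit; for `[0.50, 0.49, 0.1, 0.2]` the top two are nearly tied and it says explore.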

Community-aware Sybil detection. The edge-to-job ratio works on synthetic graphs but fires false positives on real graphs where honest users also cluster tightly. The fix is Louvain/Leiden community detection first, then measure each node's clustering against its own community baseline rather than a global constant.
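
The second half of that idea, scoring each node against its own community, can be sketched as below. Community labels are assumed to come from a separate Louvain/Leiden pass that is not shown; the function name, the z-score rule, and the threshold of 2 are illustrative assumptions, not a tested design.

```typescript
// ratios[i]    : edge-to-job ratio (or any clustering statistic) of node i
// community[i] : community id of node i, ASSUMED to be precomputed elsewhere
// Returns indices of nodes that are outliers relative to their own community.
function flagAgainstCommunityBaseline(
  ratios: number[],
  community: number[],
  zThreshold = 2
): number[] {
  if (ratios.length !== community.length) {
    throw new Error("ratios and community must have the same length");
  }

  // Group node indices by community id.
  const groups = new Map<number, number[]>();
  community.forEach((c, i) => {
    const members = groups.get(c) ?? [];
    members.push(i);
    groups.set(c, members);
  });

  const flagged: number[] = [];
  groups.forEach((members) => {
    const mean = members.reduce((s, i) => s + ratios[i], 0) / members.length;
    const variance =
      members.reduce((s, i) => s + (ratios[i] - mean) ** 2, 0) / members.length;
    const sd = Math.sqrt(variance);
    if (sd === 0) return; // uniform community: nothing stands out
    for (const i of members) {
      if ((ratios[i] - mean) / sd > zThreshold) flagged.push(i);
    }
  });
  return flagged.sort((a, b) => a - b);
}
```

In a toy example, a community whose members all sit near a ratio of 5 produces no flags, while a single node at 5 inside a community that otherwise sits at 1 is flagged. A global cutoff between 1 and 5 would have flagged the whole first community.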

RL adversary stress test. Every adversary in the current paper follows a fixed script. An RL adversary that watches its own score trajectory and learns to maximize fraud over hundreds of episodes would be a harder test. It might discover strategy sequences that exploit TRACE's detection latency, or time defections to coincide with high honest-failure noise so CUSUM can't separate the signal.
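
To make the detection-latency concern concrete, here is a generic one-sided CUSUM detector. It is a textbook sketch, not TRACE's detector: the function name, the baseline rate `mu0`, the slack `k`, and the threshold `h` are hypothetical parameters chosen for illustration.

```typescript
// One-sided CUSUM over a stream of failure indicators (1 = default, 0 = success).
// S_t = max(0, S_{t-1} + x_t - mu0 - k); alarm when S_t > h.
// Returns the index of the first alarm, or -1 if none fires.
function cusumFirstAlarm(
  observations: number[],
  mu0: number, // expected failure rate under honest behaviour
  k: number, // slack: drift tolerated before evidence accumulates
  h: number // alarm threshold
): number {
  let s = 0;
  for (let t = 0; t < observations.length; t++) {
    s = Math.max(0, s + observations[t] - mu0 - k);
    if (s > h) return t;
  }
  return -1;
}
```

With `mu0 = 0.1` and `h = 2`, a provider that succeeds four times and then defaults four times in a row trips the alarm on the third default when `k = 0.1`, but never trips it within those eight jobs when `k = 0.5`. A detector tuned to tolerate noisy honest failures needs a larger slack, and that is the window an adaptive adversary could learn to exploit.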

Cross-chain routing. L402 is Lightning-native, but the routing layer is payment-agnostic. The next version should support any payment rail that can generate a proof of payment (Cashu tokens, BOLT12, stablecoin channels) and route based on the buyer's preferred settlement.

Physical hardware deployment. The ESP32 PCB is designed and Gerber files are exported for JLCPCB. Next step is manufacturing a small batch, deploying nodes with real sensors, and letting them accumulate routing history and reputation in the live marketplace.

The AAMAS extension. We're planning a co-evolutionary RL extension of TRACE targeting AAMAS 2027: both attackers and the router learn via PPO/SAC rather than hand-coded scripts, turning the static evaluation into a dynamic arms race.

Built With

  • c++
  • easyeda
  • esp32
  • jlcpcb
  • l402
  • lightning-network
  • lnurl-pay
  • mcp
  • moneydevkit
  • next.js
  • openai-gpt-4o
  • openai-gpt-4o-mini
  • prisma
  • react
  • risc-zero
  • rust
  • sqlite
  • tailwind-css
  • typescript
  • wokwi