Inspiration

By 2026, agentic commerce had shipped: ChatGPT Instant Checkout, Mastercard Agent Pay, Visa Intelligent Commerce, Stripe's machine-payments work. Everyone is racing to solve agent identity and payment credentials. Almost nobody is solving the load-bearing problem underneath: merchant backends were never built for software that buys at machine speed, with concurrent, retry-heavy, multi-region traffic. At that speed they oversell scarce stock, double-charge when a timed-out request is retried, and silently blow past spend limits.

We wanted to build the correctness layer the agentic-commerce wave actually runs on — and to build it on the one database whose model is genuinely made for it: Amazon Aurora DSQL.

What it does

ZeroRace is a checkout kernel: one idempotent ACID transaction that, atomically, claims inventory, debits a spend mandate, writes a balanced double-entry ledger, and emits a transactional outbox event. Under a 1,000,000-agent swarm against 100 units it sells exactly 100, and four invariants hold — each provable with a single SQL query, not a dashboard counter:

$$\text{oversells} = \text{duplicate settlements} = \text{ledger drift} = \text{mandate breaches} = 0$$

Preventing oversell is structural. Stock is sharded across random-keyed buckets and claimed with a conditional decrement; the column carries CHECK (available_qty >= 0), so the database refuses to take a bucket negative and unit 101 of 100 is impossible. Ledger drift is provably zero because every commit writes two signed legs that sum to zero:

$$\sum_{i} \text{signed amount}_i = 0$$
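To make the two mechanisms above concrete, here is a minimal single-threaded, in-memory model. It is only an illustration and not the kernel itself: the function names, the even split across buckets, and the "probe the next bucket when one is empty" strategy are assumptions, and it does not model concurrency or OCC retries. `conditionalDecrement` stands in for a conditional `UPDATE … WHERE available_qty >= 1` and returns the number of rows it would change.

```typescript
// In-memory model of the claim step. All names here are illustrative.
type Bucket = { id: number; availableQty: number };
type LedgerLeg = {
  account: "MANDATE_RESERVED" | "MERCHANT_PAYABLE";
  signedAmount: number; // integer minor units, e.g. cents
};

// Split `total` units across `n` buckets as evenly as possible.
function splitStock(total: number, n: number): Bucket[] {
  return Array.from({ length: n }, (_, id) => ({
    id,
    availableQty: Math.floor(total / n) + (id < total % n ? 1 : 0),
  }));
}

// Models a conditional decrement: returns 1 if a unit was taken,
// 0 if the bucket was already empty. The quantity never goes negative.
function conditionalDecrement(bucket: Bucket): number {
  if (bucket.availableQty < 1) return 0;
  bucket.availableQty -= 1;
  return 1;
}

// Start at a random bucket and probe the rest; fail once all are empty.
function claimOne(buckets: Bucket[], rand: () => number = Math.random): boolean {
  const start = Math.floor(rand() * buckets.length);
  for (let i = 0; i < buckets.length; i++) {
    const bucket = buckets[(start + i) % buckets.length];
    if (conditionalDecrement(bucket) === 1) return true;
  }
  return false;
}

// Every sale writes two legs whose signed amounts cancel.
function ledgerPair(amount: number): LedgerLeg[] {
  return [
    { account: "MANDATE_RESERVED", signedAmount: -amount },
    { account: "MERCHANT_PAYABLE", signedAmount: amount },
  ];
}

function simulate(attempts: number, stock: number, bucketCount: number, price: number) {
  const buckets = splitStock(stock, bucketCount);
  const ledger: LedgerLeg[] = [];
  let sold = 0;
  for (let i = 0; i < attempts; i++) {
    if (claimOne(buckets)) {
      sold += 1;
      ledger.push(...ledgerPair(price));
    }
  }
  const drift = ledger.reduce((sum, leg) => sum + leg.signedAmount, 0);
  const remaining = buckets.reduce((sum, b) => sum + b.availableQty, 0);
  return { sold, drift, remaining };
}
```

Calling `simulate(10000, 100, 64, 1999)` sells 100 units no matter how many attempts arrive, and the ledger sum stays at zero because each sale contributes `-price + price`.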

And it isn't only a demo — it's a real, self-serve product: sign up, get an API key, POST /v1/purchases/commit, with a tenant-scoped console to create your own merchants, mandates, and inventory. You can drive the kernel yourself in a live playground and watch which constraint binds while the invariants stay at zero.

How we built it

  • Truth core: Aurora DSQL. PostgreSQL-16-compatible, optimistic concurrency control (OCC) with snapshot isolation. The kernel uses raw node-postgres for explicit BEGIN/COMMIT/ROLLBACK and conditional UPDATE … WHERE statements with rowCount checks. The rowCount check is the no-oversell and mandate-limit logic: if the conditional UPDATE changes no row, the purchase is declined.
  • Sharded everything. Inventory pre-split across 64 random buckets; the spend mandate split across 20 budget buckets. No hot row anywhere, which is exactly what DSQL asks for.
  • Idempotent + double-entry. Keyed on (merchant_id, idempotency_key) under a UNIQUE index; every commit writes a MANDATE_RESERVED -amount and MERCHANT_PAYABLE +amount ledger pair.
  • Read plane — DynamoDB. A transactional outbox row is written in the same DSQL txn; a separate projector ships it to DynamoDB exactly-once, feeding the live Mission Control dashboard over SSE.
  • Multi-region. A real peered cluster — Tokyo ap-northeast-1 + Seoul ap-northeast-2, witness Osaka — two strongly-consistent write endpoints with app-layer failover.
  • Deployed on Vercel via the AWS Marketplace integration using OIDC federation — no static credentials; DSQL IAM auth tokens are minted from a Vercel-issued OIDC token at runtime.
  • A real agent. An MCP server + the Anthropic SDK (claude-opus-4-8) shops a limited drop and is declined at the 9th buy by the same kernel.
  • The product layer. tenants + api_keys, Bearer auth, tenant-scoped /v1 resource APIs, a Next.js console, and a self-resetting public sandbox — all additive on the same schema.
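The kernel steps listed above might be sketched roughly as follows. This is an assumption-heavy illustration, not the project's actual code: every table and column name is hypothetical, the `Queryable` interface is a minimal shape that a node-postgres client is expected to satisfy structurally, and the caller is assumed to pick the inventory and budget buckets and to retry on serialization failures (SQLSTATE 40001 in PostgreSQL, which is how OCC conflicts are expected to surface).

```typescript
// Minimal client shape (assumption); a node-postgres PoolClient should fit it.
interface Queryable {
  query(text: string, params?: unknown[]): Promise<{ rowCount: number | null; rows: unknown[] }>;
}

type Outcome = "COMMITTED" | "DUPLICATE" | "SOLD_OUT" | "MANDATE_EXCEEDED";

interface Purchase {
  merchantId: string;
  idempotencyKey: string;
  inventoryBucketId: string; // chosen by the caller, e.g. at random
  budgetBucketId: string;
  amountCents: number;
}

// All table and column names below are hypothetical.
async function commitPurchase(client: Queryable, p: Purchase): Promise<Outcome> {
  const decline = async (outcome: Outcome): Promise<Outcome> => {
    await client.query("ROLLBACK");
    return outcome;
  };

  await client.query("BEGIN");
  try {
    // Idempotency check on the same client that holds the transaction.
    const seen = await client.query(
      "SELECT 1 FROM purchases WHERE merchant_id = $1 AND idempotency_key = $2",
      [p.merchantId, p.idempotencyKey],
    );
    if (seen.rows.length > 0) return await decline("DUPLICATE");

    // Conditional decrement: zero rows changed means this bucket is empty.
    const stock = await client.query(
      "UPDATE inventory_buckets SET available_qty = available_qty - 1 " +
        "WHERE bucket_id = $1 AND available_qty >= 1",
      [p.inventoryBucketId],
    );
    if (stock.rowCount !== 1) return await decline("SOLD_OUT");

    // Same pattern for the spend mandate: debit only if enough budget remains.
    const budget = await client.query(
      "UPDATE mandate_buckets SET remaining_cents = remaining_cents - $2 " +
        "WHERE bucket_id = $1 AND remaining_cents >= $2",
      [p.budgetBucketId, p.amountCents],
    );
    if (budget.rowCount !== 1) return await decline("MANDATE_EXCEEDED");

    await client.query(
      "INSERT INTO purchases (merchant_id, idempotency_key, amount_cents) VALUES ($1, $2, $3)",
      [p.merchantId, p.idempotencyKey, p.amountCents],
    );
    // Two ledger legs with opposite signs, so the pair sums to zero.
    await client.query(
      "INSERT INTO ledger_entries (merchant_id, idempotency_key, account, signed_amount_cents) " +
        "VALUES ($1, $2, 'MANDATE_RESERVED', $3), ($1, $2, 'MERCHANT_PAYABLE', $4)",
      [p.merchantId, p.idempotencyKey, -p.amountCents, p.amountCents],
    );
    // Outbox row written in the same transaction as the purchase.
    await client.query(
      "INSERT INTO outbox_events (merchant_id, idempotency_key, event_type) " +
        "VALUES ($1, $2, 'purchase.committed')",
      [p.merchantId, p.idempotencyKey],
    );
    await client.query("COMMIT");
    return "COMMITTED";
  } catch (err) {
    await client.query("ROLLBACK").catch(() => undefined);
    // A unique-index violation (SQLSTATE 23505) means another commit won the key.
    if ((err as { code?: string }).code === "23505") return "DUPLICATE";
    throw err; // e.g. SQLSTATE 40001: the caller may retry the whole function
  }
}
```

In this sketch a SOLD_OUT result only says that the chosen bucket is empty; a caller would typically try another bucket before reporting the item as sold out.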

Challenges we ran into

  • The budget hot row. First honest 10k run: 100 sold, but only 2 commits/sec with 303 RETRY_EXHAUSTED. We'd sharded inventory but left every winning commit decrementing one mandate row — under OCC those serialize and conflict-storm. Fix: shard the budget too. Result: 0 exhausted, 0 errors, ~390 OCC retries (surfaced, not hidden).
  • DSQL has no TRUNCATE and enforces a per-transaction row limit. Re-seeding between runs blew through the cap with a single DELETE. Fix: batched deletes, each kept under the limit.
  • A pool-exhaustion deadlock our own adversarial review caught — the kernel re-read the idempotency registry on a second pooled connection while holding the first; under same-key contention every slot waited on a connection that never freed. Fixed by reusing the held client.
  • An OIDC STS DNS storm (getaddrinfo EBUSY) when a burst opened many connections that each resolved credentials at once. Fixed by memoizing the OIDC credentials process-wide.
  • Async indexes are a correctness gate. The UNIQUE idempotency indexes are what make dupes provably zero; DSQL builds indexes asynchronously, so they must be ACTIVE before traffic.
  • No FOREIGN KEYs. Referential integrity and tenant isolation (403 cross-tenant) live in the service layer plus periodic audit queries.
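The credential-memoization fix from the list above can be illustrated with a small sketch. It assumes a hypothetical `load` function that performs the OIDC-to-STS exchange; the names, the `Credentials` shape, and the refresh margin are illustrative and not taken from the project. The key idea is that concurrent callers share one in-flight promise instead of each starting their own lookup.

```typescript
// Illustrative shape; real credentials come from the OIDC/STS exchange.
type Credentials = {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken: string;
  expiresAt: number; // epoch milliseconds
};

type Loader = () => Promise<Credentials>;

// Wraps `load` so that a burst of callers triggers at most one lookup.
// `refreshSkewMs` refreshes slightly before expiry (assumed value).
function memoizeCredentials(
  load: Loader,
  refreshSkewMs: number = 60000,
  now: () => number = Date.now,
): Loader {
  let cached: Credentials | undefined;
  let inFlight: Promise<Credentials> | undefined;

  return async () => {
    // Serve from cache while the credentials are comfortably valid.
    if (cached !== undefined && cached.expiresAt - refreshSkewMs > now()) {
      return cached;
    }
    // Otherwise join the lookup already in progress, or start one.
    if (inFlight === undefined) {
      inFlight = load().then(
        (creds) => {
          cached = creds;
          inFlight = undefined;
          return creds;
        },
        (err) => {
          inFlight = undefined; // allow a later call to try again
          throw err;
        },
      );
    }
    return inFlight;
  };
}
```

Creating the memoized getter once at module scope and passing it to every connection factory gives the process-wide behavior: a burst of new connections resolves credentials once rather than once per connection.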

Accomplishments that we're proud of

  • pnpm prove — every invariant reduces to one read-only query over the live cluster, returning oversells / duplicate_settlements / ledger_drift / mandate_breaches = 0. Correctness proven, not claimed — a judge can clone and run it.
  • Exactly 100 sold under 1,000,000 agents (0 errors, 585 bounded OCC retries), and a scaling sweep that gets faster under higher concurrency as the sold-out attempts drain in parallel.
  • Real multi-region failover mid-swarm: sales split 10/90 between Tokyo and Seoul, total exactly 100, 0 oversold, both regions reconciled into one consistent ledger.
  • A real Claude agent transacting safely and being declined atomically at its budget edge.
  • We turned the kernel into a real, self-serve, multi-tenant product — signup, API keys, console, playground — entirely within the submission window.
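As an illustration of what "one read-only query per invariant" could look like, here is a hedged sketch. The schema (`purchases`, `inventory_buckets` with an `initial_qty` column, `ledger_entries`, `mandate_buckets`) is hypothetical, the oversell query assumes a single product with one unit per purchase, and the actual `pnpm prove` queries may differ. Counts are parsed with `Number` because node-postgres returns 64-bit counts as strings by default.

```typescript
// Minimal client shape (assumption); a node-postgres client should fit it.
interface Queryable {
  query(text: string): Promise<{ rows: unknown[] }>;
}

// Each query returns a single row with a `violations` column. Hypothetical schema.
const INVARIANTS: Record<string, string> = {
  oversells:
    "SELECT GREATEST((SELECT COUNT(*) FROM purchases) - " +
    "(SELECT COALESCE(SUM(initial_qty), 0) FROM inventory_buckets), 0) AS violations",
  duplicate_settlements:
    "SELECT COUNT(*) AS violations FROM (" +
    "SELECT merchant_id, idempotency_key FROM purchases " +
    "GROUP BY merchant_id, idempotency_key HAVING COUNT(*) > 1) AS dupes",
  ledger_drift:
    "SELECT COALESCE(SUM(signed_amount_cents), 0) AS violations FROM ledger_entries",
  mandate_breaches:
    "SELECT COUNT(*) AS violations FROM mandate_buckets WHERE remaining_cents < 0",
};

// Runs every invariant query and returns the numeric result per invariant.
async function prove(client: Queryable): Promise<Record<string, number>> {
  const report: Record<string, number> = {};
  for (const name of Object.keys(INVARIANTS)) {
    const result = await client.query(INVARIANTS[name]);
    const row = result.rows[0] as { violations?: unknown } | undefined;
    // A missing row becomes NaN, so it can never pass as zero.
    report[name] = Number(row === undefined ? NaN : row.violations);
  }
  return report;
}

// True only when every invariant reports exactly zero.
function allZero(report: Record<string, number>): boolean {
  return Object.keys(report).every((name) => report[name] === 0);
}
```

A proof script along these lines would exit non-zero whenever `allZero` is false, which keeps the check a matter of query results rather than dashboard counters.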

What we learned

DSQL doesn't let you paper over contention with locks; it makes you design the contention out. "Don't concentrate writes on a single key" turned out to be the entire architecture — sharded inventory and sharded budget, idempotent retry, double-entry ledger, async-built UNIQUE indexes. We learned to treat correctness as a SELECT, and that strong consistency across active-active regions is a capability you cannot fake with read replicas — the honest answer to "why not single-writer Postgres?"

What's next for ZeroRace

DynamoDB Streams to Lambda for true push real-time; AWS FIS region-impairment chaos in the failover demo; adapters for real payment rails (Stripe / Visa / Mastercard agent APIs); a published SDK; and production multi-tenant hardening — API-key rotation, per-tenant rate limits, and metered billing.
