Inspiration

I've built "let users buy credits and spend them on AI calls" into a few products now, and every time the first version had the same quiet bug. You read the balance, check it, then charge. It passes review, it works in the demo, and then in production two requests land in the same millisecond, both read the same balance, both pass the check, and both go through. A user with 5 credits gets 20 answers and you eat the provider bill for the other 15.

Nobody notices until the invoice shows up. I got tired of rebuilding the same metering-and-wallets plumbing for every project, and more than that, tired of the fact that it was subtly wrong every time. So I built the thing I kept wishing existed: a gateway that makes overspend impossible instead of just unlikely, and moves everything else (pricing, routing, kill switches) into a dashboard so I never have to redeploy to change how a product is monetized.

What it does

Switchboard sits between your app and the model providers. You change two lines: point your OpenAI SDK at the gateway, swap the model name for a flag you control, and pass the user id you already have. After that:

  • Every request is metered, and each user's credit wallet and plan quota are enforced at request time.
  • A flag like "chat" maps to whatever model you want per tier. Free can ride a cheap model, pro gets the frontier one, and you can add as many tiers as you sell. Routing is changed from the dashboard, not in code.
  • You see gross margin on every model, can A/B test a new model on live traffic, and can disable a model or kill a leaked key instantly.
  • Users top up through Stripe or RevenueCat. The webhook credits their wallet and writes the matching double-entry ledger rows automatically.

It speaks the OpenAI API the whole way through, including the error shapes: 402 when credits run out, 429 for quota, 403 for a killed key, 409 for a duplicate request.
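
As a rough sketch of what the integration looks like from the app side, the snippet below builds an OpenAI-style chat request addressed to the gateway and maps the four statuses above to something the app can act on. The gateway URL and the helper names (`buildChatRequest`, `classifyGatewayStatus`) are made up for illustration. With the OpenAI SDK, the URL would go in the client's `baseURL` option and the user id in the request's `user` field.

```typescript
// Hypothetical gateway URL; not a real endpoint.
const GATEWAY_BASE_URL = "https://switchboard.example.com/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds an OpenAI-style chat completion request.
// "chat" is a routing flag resolved by the gateway, not a real model name.
function buildChatRequest(userId: string, messages: ChatMessage[]) {
  return {
    url: `${GATEWAY_BASE_URL}/chat/completions`,
    body: { model: "chat", user: userId, messages },
  };
}

type GatewayRejection =
  | "out_of_credits"
  | "quota_exceeded"
  | "key_killed"
  | "duplicate_request";

// Maps the gateway's rejection statuses to a reason; null means "not one of these".
function classifyGatewayStatus(status: number): GatewayRejection | null {
  switch (status) {
    case 402:
      return "out_of_credits";
    case 429:
      return "quota_exceeded";
    case 403:
      return "key_killed";
    case 409:
      return "duplicate_request";
    default:
      return null;
  }
}
```

An app would typically show a top-up prompt on `out_of_credits` and treat `duplicate_request` as "this call was already handled".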

How I built it

The whole thing rests on one decision: enforcement is a database transaction, not middleware.

The reserve is a single DynamoDB TransactWriteItems. In one atomic step it conditionally decrements the wallet and bumps the quota on the user item, and writes an idempotency record. There is no read-then-write gap for concurrency to slip through. DynamoDB serializes the conditional updates on that one item, so the requests that fit succeed and the rest fail the condition. I read the cancellation reasons to map each failure to the right HTTP status.
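
Here is a minimal sketch of what a reserve like that could look like. It builds the input object for TransactWriteItems and maps cancellation reasons to statuses. The table names, key format, attribute names, and the precedence between failures are all assumptions for illustration, not Switchboard's actual schema, and the 403 path for killed keys is left out.

```typescript
// All table, key, and attribute names below are hypothetical.
interface ReserveArgs {
  userId: string;
  requestId: string;
  costMilliCredits: number;
  quotaLimit: number;
}

// Builds a TransactWriteItems input in DynamoDB's low-level attribute-value format.
function buildReserveTransaction(a: ReserveArgs) {
  return {
    TransactItems: [
      {
        // One conditional update on the user item: wallet and quota move together.
        Update: {
          TableName: "users",
          Key: { pk: { S: `USER#${a.userId}` } },
          UpdateExpression: "SET #bal = #bal - :cost, #used = #used + :one",
          ConditionExpression: "#bal >= :cost AND #used < :limit",
          ExpressionAttributeNames: { "#bal": "balance", "#used": "quotaUsed" },
          ExpressionAttributeValues: {
            ":cost": { N: String(a.costMilliCredits) },
            ":one": { N: "1" },
            ":limit": { N: String(a.quotaLimit) },
          },
          // Return the old item on failure so we can tell which condition failed.
          ReturnValuesOnConditionCheckFailure: "ALL_OLD",
        },
      },
      {
        // Idempotency record: fails if this request id was already seen.
        Put: {
          TableName: "idempotency",
          Item: { pk: { S: `REQ#${a.requestId}` }, userId: { S: a.userId } },
          ConditionExpression: "attribute_not_exists(pk)",
        },
      },
    ],
  };
}

type Reason = { Code?: string; Item?: Record<string, { N?: string; S?: string }> };

// Reasons arrive in the same order as TransactItems: [user update, idempotency put].
function statusFromReasons(reasons: Reason[], a: ReserveArgs): number | "retry" {
  const [user, idem] = reasons;
  if (reasons.some((r) => r.Code === "TransactionConflict")) return "retry";
  if (idem?.Code === "ConditionalCheckFailed") return 409;
  if (user?.Code === "ConditionalCheckFailed") {
    const balance = Number(user?.Item?.balance?.N ?? "0");
    return balance < a.costMilliCredits ? 402 : 429;
  }
  return 500;
}
```

Because a transaction cannot touch the same item twice, the wallet decrement and quota bump share one Update, and the old item returned on failure is what lets the sketch tell a 402 from a 429.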

I used two AWS databases on purpose, because one store can't do both jobs well. The hot wallet lives in DynamoDB: strongly consistent, single-digit-millisecond conditional writes, and it's the contended item that everything fights over. The double-entry ledger, audit log, and config history live in Aurora DSQL, written off the request path with Vercel's after(). An append-only ledger never conflicts under DSQL's optimistic concurrency, which is exactly the workload that would abort-storm a hot wallet row. Each database does what it's good at.
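
To make "double-entry" concrete, here is a small sketch of the rows a top-up might append. The account names and the signed-amount convention are assumptions for illustration. The only property it demonstrates is that every transaction's rows sum to zero, and that rows are only ever added, never updated.

```typescript
interface LedgerEntry {
  txId: string;
  account: string;
  amountMilliCredits: number; // signed integer; the rows of one txId sum to zero
}

// Builds the two balancing rows for a top-up. Account names are hypothetical.
function topUpEntries(txId: string, userId: string, amountMilliCredits: number): LedgerEntry[] {
  if (!Number.isInteger(amountMilliCredits) || amountMilliCredits <= 0) {
    throw new Error("amount must be a positive integer number of milli-credits");
  }
  return [
    { txId, account: `wallet:${userId}`, amountMilliCredits },
    { txId, account: "topups:stripe", amountMilliCredits: -amountMilliCredits },
  ];
}

// A transaction is balanced when its signed amounts cancel out.
function isBalanced(entries: LedgerEntry[]): boolean {
  return entries.reduce((sum, e) => sum + e.amountMilliCredits, 0) === 0;
}
```

Since each write inserts fresh rows instead of updating a shared one, concurrent writers have nothing to collide on.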

It runs on Vercel (Next.js, Node runtime, pinned to iad1, responses streamed straight back to the caller). The gateway talks to AWS over OIDC into an IAM role, so there are no static AWS keys sitting anywhere. Provider keys stay the tenant's own, encrypted with AWS KMS and decrypted only in memory when a call actually goes out. I scaffolded the dashboard with v0.

Challenges I ran into

The hard part was making overspend impossible rather than improbable, and then proving it. I almost shipped the naive read-check-write version because it honestly looks fine. Getting the reserve down to one conditional transaction, and then writing a test that fires 50 parallel requests at a 5-credit wallet and asserts that exactly 5 succeed and the balance lands on 0, took longer than the feature itself.
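
The shape of that test can be sketched without any database. Below, an in-memory object stands in for the wallet, and a zero-delay timer stands in for the network round trip between read and write. This is a simulation of the race under those assumptions, not the real test against DynamoDB.

```typescript
// In-memory stand-in for the wallet item; not DynamoDB.
interface Wallet {
  balance: number;
}

// Yields to the event loop, the way a network round trip would.
const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

// Naive version: read, wait, check the stale value, write.
async function naiveCharge(w: Wallet, cost: number): Promise<boolean> {
  const seen = w.balance; // read
  await tick(); // every other request gets to read here too
  if (seen < cost) return false; // check against the stale read
  w.balance = seen - cost; // write, clobbering concurrent writes
  return true;
}

// Atomic version: check and decrement in one uninterruptible step,
// emulating a conditional write.
async function atomicCharge(w: Wallet, cost: number): Promise<boolean> {
  await tick();
  if (w.balance < cost) return false;
  w.balance -= cost;
  return true;
}

// Fires 50 parallel one-credit charges at a 5-credit wallet.
async function race(charge: (w: Wallet, cost: number) => Promise<boolean>) {
  const wallet: Wallet = { balance: 5 };
  const results = await Promise.all(
    Array.from({ length: 50 }, () => charge(wallet, 1)),
  );
  return { succeeded: results.filter(Boolean).length, balance: wallet.balance };
}
```

In this simulation `race(naiveCharge)` lets all 50 through, while `race(atomicCharge)` lets exactly 5 through and leaves the balance at 0.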

A few specific things bit me. Under heavy contention on the same item, DynamoDB returns TransactionConflict, which means "I didn't get to evaluate the conditions," not "the condition failed," so those need a retry with backoff, while only ConditionalCheckFailed counts as a real rejection. Reasoning models bill thinking tokens that don't show up in completion_tokens, so I charge max(completion_tokens, total_tokens - prompt_tokens) to avoid undercharging. And keeping every amount as an integer (milli-credits and micro-USD) so rounding never leaks value meant formatting only at the very edge of the system.
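
Those three rules are small enough to sketch as pure functions. The function names, the backoff constants, and the three-decimal display format are assumptions for illustration. The usage field names follow the OpenAI response shape.

```typescript
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Charge for whichever is larger: reported completion tokens, or everything
// that was not prompt (which also covers unreported thinking tokens).
function billableOutputTokens(u: Usage): number {
  return Math.max(u.completion_tokens, u.total_tokens - u.prompt_tokens);
}

// Returns a delay in ms if the failure should be retried, or null if it is final.
// Only TransactionConflict is retried; ConditionalCheckFailed is a real rejection.
function retryDelayMs(code: string, attempt: number, baseMs = 10, capMs = 500): number | null {
  if (code !== "TransactionConflict") return null;
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Formats integer milli-credits for display. Amounts stay integers
// everywhere else; this runs only at the edge.
function formatCredits(milliCredits: number): string {
  const sign = milliCredits < 0 ? "-" : "";
  const abs = Math.abs(milliCredits);
  const whole = Math.floor(abs / 1000);
  const frac = String(abs % 1000).padStart(3, "0");
  return `${sign}${whole}.${frac}`;
}
```

For example, a response reporting 100 prompt, 20 completion, and 400 total tokens would be billed for 300 output tokens, and 1500 milli-credits would display as "1.500".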

Accomplishments I'm proud of

The race test passes the same way every run: 50 in, 5 out, balance 0. It's a genuinely two-line integration that still gives you wallets, quotas, routing, failover, A/B tests, margin tracking, kill switches, and hosted refills. And the two-database split holds up. The ledger is never on the latency path, and neither database is being asked to do something it's bad at.

What I learned

The real lesson was that the right answer was two databases, not one. I went in assuming Postgres could handle everything. Functionally it can, but a hot, contended wallet row abort-storms under DSQL's optimistic concurrency, while the same engine is perfect for an append-only ledger. DynamoDB conditional transactions are the mirror image. Designing to each database's grain is what made the system both correct and fast, and it's not a tradeoff I would have appreciated without actually hitting the contention.

What's next for Switchboard

Usage-based invoice exports, more provider adapters, per-tenant anomaly alerts, and a self-serve onboarding flow so a team can wire it up without me in the loop.

Built With

Next.js, Vercel, v0, Amazon DynamoDB, Amazon Aurora DSQL, AWS KMS, AWS IAM, TypeScript, Stripe, RevenueCat
