Inspiration

The old web ad model, selling user attention, doesn't translate to AI chat: there's no banner to look at, and interrupting the answer erodes trust. We asked what an advertiser could pay for that actually helps the user, and landed on selling test-time compute instead of attention.

What it does

GUPTA is a test-time-compute ad exchange. When a user asks a question, each advertiser runs a cheap model over its own product data and bids according to how relevant its product is. The top bid buys the main model extra reasoning budget to vet that product against the user's private context. The platform spends compute only where a relevant advertiser is willing to pay for scrutiny, so the user gets a better-researched recommendation rather than an interruption.
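
A minimal sketch of the auction step described above, under stated assumptions. The names `Bid`, `run_auction`, `base_turns`, `extra_turns`, and `min_bid` are hypothetical, the bid is treated as a single relevance score in [0, 1], and "reasoning budget" is modeled as a number of tool-use turns. It is not the project's actual code, and it does not attempt truthful pricing.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str
    amount: float  # relevance-based bid from the cheap model (assumed scale: 0 to 1)


def run_auction(bids, base_turns=2, extra_turns=6, min_bid=0.5):
    """Pick the top bid and convert it into extra reasoning budget.

    Returns (winner, turn_budget). If no bid clears min_bid, nobody wins
    and the main model keeps its base budget.
    """
    eligible = [b for b in bids if b.amount >= min_bid]
    if not eligible:
        return None, base_turns
    winner = max(eligible, key=lambda b: b.amount)
    return winner, base_turns + extra_turns


bids = [Bid("hotel_a", 0.9), Bid("hotel_b", 0.4), Bid("hotel_c", 0.7)]
winner, turns = run_auction(bids)
print(winner.advertiser, turns)
```

The `min_bid` floor is one way to express "spend compute only where a relevant advertiser is willing to pay": a query with no relevant bidder costs nothing extra.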

How we built it

We built a reproducible synthetic benchmark of 20 SF hotels, 20 user profiles, and noisy "advertiser dossiers," plus a deterministic preference key, all generated with the Anthropic API. The live system is a FastAPI app where Haiku runs the parallel auction and Opus runs a tool-using search agent over a document corpus, with everything streamed to a no-build web UI via Server-Sent Events.
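
To illustrate the streaming side, here is a small sketch of how auction and agent events could be framed as Server-Sent Events. The event names (`bid`, `token`, `done`) and the payload shapes are assumptions made for illustration; only the wire format (an `event:` line, a `data:` line, then a blank line) comes from the SSE standard.

```python
import json


def sse_event(event, payload):
    """Format one Server-Sent Events frame: event name, JSON data, blank line."""
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"


def stream_run(bids, answer_chunks):
    """Yield a frame per settled bid, then a frame per chunk of the agent's answer."""
    for bid in bids:
        yield sse_event("bid", bid)
    for chunk in answer_chunks:
        yield sse_event("token", {"text": chunk})
    yield sse_event("done", {})


frames = list(stream_run([{"advertiser": "hotel_a", "amount": 0.9}], ["Try ", "hotel_a."]))
print("".join(frames))
```

In a FastAPI app, a generator like `stream_run` could be returned through `StreamingResponse(..., media_type="text/event-stream")`, and a plain browser `EventSource` can consume it without a build step.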

Challenges we ran into

Modern Claude models don't expose a manual "thinking budget," so we had to proxy test-time compute through reasoning depth and tool-use turn budgets instead. Getting reliable streaming for 20 parallel bids plus a multi-rollout ensemble, without one failed call killing the run, took careful async orchestration and per-call error handling.
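
One common way to keep a single failed call from killing a batch is `asyncio.gather` with `return_exceptions=True`. The sketch below shows that pattern only; `fake_bid` is a hypothetical stand-in for a real model call, and the project's own orchestration may differ.

```python
import asyncio


async def fake_bid(advertiser, fail=False):
    """Stand-in for one model call; a real version would query the bidding model."""
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError(f"{advertiser}: upstream error")
    return {"advertiser": advertiser, "amount": 0.5}


async def collect_bids(calls):
    """Run all bid calls concurrently; keep successes and record failures separately."""
    results = await asyncio.gather(*calls, return_exceptions=True)
    bids = [r for r in results if not isinstance(r, BaseException)]
    errors = [r for r in results if isinstance(r, BaseException)]
    return bids, errors


# 20 parallel bids, one of which fails on purpose.
calls = [fake_bid(f"hotel_{i}", fail=(i == 3)) for i in range(20)]
bids, errors = asyncio.run(collect_bids(calls))
print(len(bids), "bids,", len(errors), "failed")
```

The failed advertiser simply drops out of the auction, and the error is still available for logging.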

Accomplishments that we're proud of

We connected a clean theoretical idea, the MLE-vs-Bayes gap (predicting from a single most-likely hypothesis versus averaging over all plausible ones), to a working product: a prompt ensemble acts as a Monte Carlo Bayes predictor that empirically closes that gap. The whole pipeline runs live end-to-end: pick a user, watch the auction settle, and see the agent think and answer in real time.
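
A minimal sketch of the ensemble idea, assuming each rollout (one prompt variant) returns a probability distribution over candidate hotels. The function name `ensemble_predict` and the numbers are hypothetical; if rollouts return only a single pick, the same averaging reduces to vote shares.

```python
from collections import defaultdict


def ensemble_predict(rollouts):
    """Average per-rollout probability distributions over candidate answers.

    Averaging over several sampled "hypotheses" is a Monte Carlo estimate of
    the Bayes predictive distribution, as opposed to trusting one rollout.
    """
    totals = defaultdict(float)
    for dist in rollouts:
        for answer, p in dist.items():
            totals[answer] += p / len(rollouts)
    return dict(totals)


# Illustrative values only: the first rollout alone would pick hotel_a.
rollouts = [
    {"hotel_a": 0.6, "hotel_b": 0.4},
    {"hotel_a": 0.2, "hotel_b": 0.8},
    {"hotel_a": 0.3, "hotel_b": 0.7},
]
averaged = ensemble_predict(rollouts)
best = max(averaged, key=averaged.get)
print(best)
```

Here a single rollout can disagree with the ensemble, which is the kind of gap that averaging is meant to reduce.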

What we learned

More compute isn't uniformly better: accuracy rises steeply with a little search and then plateaus. The real value lies in the first few cheap reasoning steps, where marginal value is highest, and pricing compute at the margin is exactly what an auction does. We also saw that a small model with a little search can rival a much larger one, suggesting efficiency, not size, should win the slot.
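
The plateau suggests a simple stopping rule: keep adding search turns only while the accuracy gained per turn exceeds what a turn costs. The sketch below uses made-up numbers purely to show the shape of that calculation; they are not measurements from the benchmark, and `best_budget` and `price_per_turn` are hypothetical names.

```python
def best_budget(accuracy_by_turns, price_per_turn):
    """Return the largest budget whose last increment still gains more than it costs."""
    budgets = sorted(accuracy_by_turns)
    chosen = budgets[0]
    for prev, cur in zip(budgets, budgets[1:]):
        gain_per_turn = (accuracy_by_turns[cur] - accuracy_by_turns[prev]) / (cur - prev)
        if gain_per_turn < price_per_turn:
            break  # marginal value has dropped below the price of compute
        chosen = cur
    return chosen


# Illustrative curve only (turns -> accuracy): steep at first, then flat.
curve = {0: 0.50, 1: 0.70, 2: 0.78, 4: 0.80, 8: 0.81}
print(best_budget(curve, price_per_turn=0.05))
```

With a steep-then-flat curve, the chosen budget stays small unless compute is very cheap, which matches the intuition that the first few turns carry most of the value.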

What's next for Advertising Payment Via TTC to Improve Surplus

We want to move from synthetic hotels to real product corpora and design a proper auction mechanism (truthful pricing, budgets, and fraud resistance) on top of the bidding layer. Longer term, we'd formalize the surplus guarantees so both users and advertisers provably benefit from each unit of compute spent.
