Inspiration

I kept noticing that every AI idea validator says yes to everything, which makes them useless. Meanwhile, the real work of starting something (figuring out whether the idea is worth building and whether you can actually get the name) still happens across ten browser tabs of Google searches, competitor pricing pages, and domain lookups. Lovable, Bolt, and v0 will all happily build the app for you; none of them check first. When I drew the challenge domain startup.delivery, the idea clicked: build a tool that doesn't just suggest a startup but delivers one, onto its own TLD.

What it does

You describe a business idea in one sentence. A single streamed agent run then does four things:

  1. See (Nimble). Live SERP and Extract recon: real competitors with the junk filtered out (social sites, directories, listicles), incumbent complaints mined and rated by severity, real pricing pulled from competitor pages, and a demand signal built from the related-search queries people actually type.
  2. Think (OpenRouter). Finds the positioning gap and invents brandable names, informed by past deliveries in the same niche so it avoids names it already handed out.
  3. Verdict. A 0-100 opportunity score where every point comes from a cited factor (Demand +9, Pain +8, Competition -10, and so on). The score is deterministic. The model can adjust it by at most 15 points within an evidence-backed band, and it never writes the number itself. The verdict is build, pivot, or pass, and it genuinely says pass.
  4. Check (name.com). Every candidate name is checked across six TLDs in one batched call to the production Core API, .delivery first. You get live availability, real first-year and renewal pricing, and premium-trap warnings (a .delivery name costs $8.99 for the first year and renews at $77.99 a year). Taken names get rejected on screen. The winner gets a one-click "Claim it" link straight into the name.com cart.
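The scoring step (3 in the list above) is the part that keeps the validator honest, so here is a minimal Python sketch of the idea: a deterministic anchor built only from cited factor points, with the model allowed to nudge it by at most 15. The baseline of 50, the build/pivot/pass thresholds, and the `Factor` shape are hypothetical, and the simple ±15 clamp stands in for the evidence-backed band, whose exact rules aren't described here.

```python
from dataclasses import dataclass

# Hypothetical constants; the real weights and thresholds are not published here.
BASELINE = 50
MAX_ADJUST = 15  # the model may move the score by at most this many points


@dataclass(frozen=True)
class Factor:
    name: str      # e.g. "Demand"
    points: int    # signed contribution, e.g. +9 or -10
    citation: str  # recon URL or snippet that justifies the points


def anchor_score(factors: list[Factor]) -> int:
    """Deterministic score: baseline plus cited factor points, clamped to 0-100."""
    for f in factors:
        if not f.citation:
            raise ValueError(f"factor {f.name!r} has no citation")
    raw = BASELINE + sum(f.points for f in factors)
    return max(0, min(100, raw))


def final_score(anchor: int, model_delta: int) -> int:
    """Apply the model's suggested nudge, clamped to +/-MAX_ADJUST, then to 0-100.

    The model only proposes a delta; it never writes the final number.
    """
    delta = max(-MAX_ADJUST, min(MAX_ADJUST, model_delta))
    return max(0, min(100, anchor + delta))


def verdict(score: int) -> str:
    # Hypothetical thresholds for the three outcomes.
    if score >= 65:
        return "build"
    if score >= 40:
        return "pivot"
    return "pass"
```

With Demand +9, Pain +8, and Competition -10 the anchor is 57; even if the model asks for +30, the final score can only reach 72.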

At the end you have a shareable delivery: domain, brand, cited brief, an optional landing page, a permalink, and a tracking number. The whole point is a validator built to say no, and to show its work when it does.

How we built it

  • SvelteKit on Vercel for the UI and SSE streaming, talking to a Python FastAPI bridge on Fly.io that runs the four-step agent pipeline with durable, resumable jobs.
  • Nimble SERP and Extract for the recon. An offline grounding verifier re-checks the LLM's prose against what the recon actually found. If it cites a company that never appeared in the live results, the call retries once with a grounding steer, and anything that still slips through gets logged instead of silently shipped.
  • The name.com production Core API for availability and pricing. On boot, the bridge probes a known-taken domain and a known-open one to confirm it's talking to production rather than the sandbox, and the UI shows a provenance badge so judges can verify the prices on screen are real.
  • OpenRouter as one swappable key for the extraction and classification work. The model handles language; it does not author the verdict.
  • Tower for the data layer. The agent ships as a Tower app with an Iceberg deliveries table. Every delivery persists its full cited signal vector along with captured founder outcomes (built, passed, got traction), so the score can become outcome-calibrated later as labels accumulate.
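The boot-time provenance probe from the name.com bullet can be sketched as a pure function. This is a minimal sketch that assumes a hypothetical `check` callable wrapping whatever batched availability call the registrar client exposes (it does not reproduce name.com's actual API) and assumes `google.com` is registered in the live registry. The badge only claims "production" when a known-taken domain comes back taken and a random long name comes back open.

```python
import uuid
from typing import Callable

# Hypothetical wrapper signature: list of domains -> {domain: is_available}.
AvailabilityFn = Callable[[list[str]], dict[str, bool]]

KNOWN_TAKEN = "google.com"  # assumption: registered in the live registry


def make_known_open() -> str:
    """A random 38-character label that is almost certainly unregistered."""
    return f"probe-{uuid.uuid4().hex}.com"


def probe_environment(check: AvailabilityFn, known_open: str) -> str:
    """Return "production" only if both answers match the real registry."""
    try:
        results = check([KNOWN_TAKEN, known_open])
    except Exception:
        return "unverified"
    taken_ok = results.get(KNOWN_TAKEN) is False
    open_ok = results.get(known_open) is True
    # A sandbox that reports everything as available fails the first check.
    return "production" if taken_ok and open_ok else "unverified"
```

Failing closed (any error or odd answer yields "unverified") means the UI never shows a production badge it can't back up.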

Challenges we ran into

  • Keeping the score honest. An LLM-authored score flatters you. I moved the verdict into a deterministic, cited anchor and limited the model to a bounded 15-point adjustment, then built an eval to prove the behavior: 10/10 on grounding, 43/43 on anchoring, 6/6 on ownability.
  • Making live registry data the hero of the demo without it breaking on camera. The answer was batched availability calls, cached recon, and a network-free ?demo=loop replay mode for recording.
  • Hallucinated competitors. The junk filter plus the grounding verifier handle this; I stopped trusting raw synthesis early on.
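The two defenses against hallucinated competitors (the junk filter and the grounding verifier with a single retry) fit together roughly as below. This is a minimal sketch: the blocklist and listicle heuristics are hypothetical, and `generate` is an assumed LLM wrapper that returns both the prose and the company names it cites, which sidesteps the harder problem of extracting names from free text.

```python
import logging
from typing import Callable
from urllib.parse import urlparse

log = logging.getLogger("grounding")

# Hypothetical rules; the project's actual junk filter is not published.
JUNK_HOSTS = {"facebook.com", "linkedin.com", "reddit.com", "yelp.com", "g2.com", "capterra.com"}


def is_junk(url: str, title: str) -> bool:
    """Drop social sites, directories, and listicles from SERP results."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    if host in JUNK_HOSTS:
        return True
    t = title.lower()
    return t.startswith(("best ", "top ")) or " alternatives" in t


# Hypothetical LLM wrapper: prompt -> (prose, company names the prose cites).
Generate = Callable[[str], tuple[str, list[str]]]


def ungrounded(cited: list[str], recon_names: set[str]) -> list[str]:
    """Cited companies that never appeared in the live recon (case-insensitive)."""
    known = {n.casefold() for n in recon_names}
    return [c for c in cited if c.casefold() not in known]


def grounded_call(generate: Generate, prompt: str, recon_names: set[str]) -> str:
    text, cited = generate(prompt)
    if not ungrounded(cited, recon_names):
        return text
    # Retry once with a grounding steer listing what recon actually found.
    steer = prompt + "\nOnly mention these companies: " + ", ".join(sorted(recon_names))
    text, cited = generate(steer)
    leftovers = ungrounded(cited, recon_names)
    if leftovers:
        # Log instead of silently shipping.
        log.warning("ungrounded companies after retry: %s", leftovers)
    return text
```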

Accomplishments that we're proud of

  • The "3 rejected, 1 secured" moment, where live availability changes on screen against the real registry, with renewal-cliff pricing that no free name generator shows you.
  • A validator that actually returns pass verdicts and treats them as honest outcomes rather than failure states.
  • A genuinely deployed, hardened product: durable resumable jobs, SSE streaming, accessibility throughout, Open Graph cards, permalinks, and a public Loading Dock gallery of past deliveries.
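The renewal-cliff warning is simple arithmetic, but it is the number free name generators skip. A minimal sketch, assuming a hypothetical `Quote` record and a hypothetical 3x renewal-to-first-year threshold; money is kept in `Decimal` to avoid float rounding. Using the .delivery prices quoted earlier ($8.99 first year, $77.99 renewal), three years of ownership costs $164.97, not $26.97.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

CLIFF_RATIO = Decimal(3)  # hypothetical: warn when renewal >= 3x first-year price


@dataclass(frozen=True)
class Quote:
    domain: str
    available: bool
    first_year: Decimal
    renewal: Decimal


def cost_over(q: Quote, years: int) -> Decimal:
    """Total cost of holding the domain: first year plus (years - 1) renewals."""
    if years < 1:
        raise ValueError("years must be >= 1")
    return q.first_year + q.renewal * (years - 1)


def cliff_warning(q: Quote) -> Optional[str]:
    """Human-readable warning when the renewal price jumps far above year one."""
    if q.renewal < q.first_year * CLIFF_RATIO:
        return None
    return (f"{q.domain}: ${q.first_year} first year, then ${q.renewal}/yr "
            f"(${cost_over(q, 3)} over 3 years)")
```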

What we learned

Competitor count by itself is a vanity metric. The only real wedge in this category is credibility: a cited, falsifiable verdict that can say no. And real registry data (live availability, real renewal pricing) differentiates harder than any amount of generated copy.

What's next for startup-delivery

Outcome-calibrated score weights once enough founder labels accumulate. Founder accounts with a personal validation portfolio. An embeddable validation API. The flywheel already turns: every delivery's signal vector sharpens the naming of the one after it.
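One concrete turn of that flywheel is the naming step avoiding names it has already handed out in the same niche. A minimal sketch, assuming `past_names` has already been loaded from earlier deliveries in that niche (how the deliveries table is queried is not shown); the normalization rule and the function names are hypothetical.

```python
import re


def normalize(name: str) -> str:
    """Case- and punctuation-insensitive key, so "Ship-Wise" matches "shipwise"."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def fresh_candidates(candidates: list[str], past_names: list[str]) -> list[str]:
    """Keep candidates not already delivered in this niche, dropping near-duplicates."""
    used = {normalize(n) for n in past_names}
    seen: set[str] = set()
    fresh = []
    for name in candidates:
        key = normalize(name)
        if key and key not in used and key not in seen:
            seen.add(key)
            fresh.append(name)
    return fresh
```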
