Inspiration

World Product Day's theme is "Everyone Ships Now." But when you ship every week with an AI builder and have no data team, you ship blind: did the change you just merged help the flow that matters, or did it quietly break it? Today the honest answer is a 40-widget analytics dashboard you'll never open, or a gut feeling. Every PM judge, and every maker here, knows that pain firsthand: "I shipped three things this week. Which one is silently costing me users?"

What it does

A release is the changepoint on your event timeline — the cut we split before/after on. After each release, LaunchVerdict hands you one card:

  • the flow that moved the most (e.g. Onboarding completion fell 71% → 53%),
  • the likely cause — the diff you shipped, plus any UX regression Novus flagged on that PR,
  • the one thing to do now: keep shipping / hold and watch / roll it back,
  • a confidence label and a 95% CI, because a short-window comparison is a correlation under a known cause, not a proof — and the card says so.
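
The before/after split at the release cut can be sketched as follows. This is a minimal illustration, not LaunchVerdict's actual code: the FlowEvent shape, the field names, and the splitAtCut helper are all hypothetical, since the write-up does not show the telemetry schema.

```typescript
// Hypothetical event shape; the real telemetry schema is not shown in this write-up.
interface FlowEvent {
  flow: string;       // e.g. "onboarding"
  completed: boolean; // did the user finish the flow?
  at: number;         // event time, epoch milliseconds
}

interface WindowCounts {
  successes: number;
  total: number;
}

interface CutSplit {
  before: WindowCounts;
  after: WindowCounts;
}

// Split one flow's events at the release timestamp and count completions on each side.
// Events at exactly releasedAt are counted as "after" (an arbitrary choice for this sketch).
function splitAtCut(events: FlowEvent[], flow: string, releasedAt: number): CutSplit {
  const split: CutSplit = {
    before: { successes: 0, total: 0 },
    after: { successes: 0, total: 0 },
  };
  for (const e of events) {
    if (e.flow !== flow) continue;
    const side = e.at < releasedAt ? split.before : split.after;
    side.total += 1;
    if (e.completed) side.successes += 1;
  }
  return split;
}
```

The two count pairs are what a before/after comparison such as "71% → 53%" would be computed from: successes divided by total on each side of the cut.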

Four verdict states, decided deterministically by a rule over the stats — never by the LLM:

  • KEEP SHIPPING — significant improvement.
  • ROLL IT BACK — significant, meaningful drop, with enough sample to trust.
  • HOLD — a real drop but on too thin a sample to revert on yet (hold and watch).
  • INSUFFICIENT — not enough data to call.
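
A rule of this kind might look like the sketch below. Every threshold here (ALPHA, MIN_SAMPLE_TO_CALL, MIN_SAMPLE_TO_REVERT, MEANINGFUL_DROP) and the FlowStats shape are assumptions made for illustration; the write-up does not state the real cutoffs. How a significant but small drop is handled is also a guess.

```typescript
type Verdict = "KEEP SHIPPING" | "ROLL IT BACK" | "HOLD" | "INSUFFICIENT";

// Hypothetical summary of one flow's before/after comparison.
interface FlowStats {
  delta: number;   // after rate minus before rate, e.g. -0.18
  pValue: number;  // two-sided p-value of the comparison
  nBefore: number; // sample size before the release
  nAfter: number;  // sample size after the release
}

// Illustrative thresholds only; none of these values come from the source.
const ALPHA = 0.05;
const MIN_SAMPLE_TO_CALL = 30;
const MIN_SAMPLE_TO_REVERT = 200;
const MEANINGFUL_DROP = 0.05;

// Pure function of the stats: same input, same verdict, no LLM involved.
function decideVerdict(s: FlowStats): Verdict {
  const n = Math.min(s.nBefore, s.nAfter);
  if (n < MIN_SAMPLE_TO_CALL) return "INSUFFICIENT";
  if (s.pValue >= ALPHA) return "INSUFFICIENT";
  if (s.delta > 0) return "KEEP SHIPPING";
  // Significant drop: revert only if it is both meaningful and well sampled.
  // Assumption: a significant but small drop is treated as HOLD.
  const meaningful = -s.delta >= MEANINGFUL_DROP;
  return meaningful && n >= MIN_SAMPLE_TO_REVERT ? "ROLL IT BACK" : "HOLD";
}
```

Because the function is pure and takes only numbers, it can be unit-tested exhaustively at the boundaries, which is what makes "decided deterministically" checkable rather than aspirational.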

See all three live in 60 seconds, no login: https://launchverdict.vercel.app/demo

How Novus closes the loop (not a checkbox)

LaunchVerdict and Novus form a closed release-confidence loop — Novus is the cause half, not a logo:

  • Novus by Pendo reviews the diff before you merge — it scans the codebase and flags UX regressions on the PR (the cause).
  • LaunchVerdict renders the verdict after it ships — it measures the flow that moved (the effect).

When connected to a repo, LaunchVerdict reads Novus's flags from the merged PR's review comments and surfaces the matching flag as the cause line on the verdict card — so the pre-merge warning and the post-ship outcome sit on one artifact. In the public /demo that flag is a seeded fixture string standing in for the comment. We dogfood the loop by installing Novus by Pendo on LaunchVerdict's own GitHub repo — and you can verify it live: Novus is connected as the GitHub App app/novus-by-pendo and has opened its instrumentation PRs on our repo, #1 Install Novus and #2 Instrument Pendo Track Events. The attached Novus dashboard screenshot is that same install actively mapping the product.
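
Picking the cause line out of a merged PR's review comments could be sketched like this. It is a guess at the shape of the logic, not the project's code: the bot login "novus-by-pendo[bot]" is inferred from the app slug and GitHub's usual naming for app accounts, and matching a flag to the release by changed file path is an assumption about what "matching" means. The ReviewComment interface keeps only three fields that GitHub's pull request review comment objects carry (user.login, body, path).

```typescript
// Minimal slice of a GitHub pull request review comment; only the fields used here.
interface ReviewComment {
  user: { login: string };
  body: string;
  path: string; // file the comment is attached to
}

// Assumption: the GitHub App app/novus-by-pendo comments as "novus-by-pendo[bot]".
const NOVUS_LOGIN = "novus-by-pendo[bot]";

// Return the body of the first Novus comment on a file the release changed, or null.
// Matching by file path is an illustrative choice, not a documented behavior.
function pickCauseFlag(comments: ReviewComment[], changedPaths: string[]): string | null {
  const changed = new Set(changedPaths);
  const flag = comments.find(
    (c) => c.user.login === NOVUS_LOGIN && changed.has(c.path),
  );
  return flag ? flag.body : null;
}
```

Returning null when nothing matches lets the card omit the Novus part of the cause line instead of inventing one.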

How we built it

  • Next.js 15 (App Router) on Vercel, React 19, TypeScript.
  • A deterministic statistics engine: two-proportion z-test (pooled SE), Wilson score interval, an inlined normal CDF (Abramowitz–Stegun erf) — no stats dependency, fully unit-tested. It reproduces a fixed reference case (z = −5.46, p ≈ 4.6e-8 → ROLLBACK) on every run.
  • GitHub for the cut (commit/PR/diff/CI + Novus's PR comments) via webhook; Postgres for telemetry; a thin telemetry snippet (lv.js) to self-measure before/after on the flows you care about.
  • An LLM prose layer (schema-forced via tool_choice) that only polishes the headline/cause/action wording — it can never change the call, the numbers, or the confidence. The honesty guard is structural, not a prompt instruction.
  • Built with Claude (Claude Code) as the AI builder.
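
The statistics named in the list above can be sketched in dependency-free TypeScript as follows. This is an independent illustration rather than the project's engine: the function names are hypothetical, and the Abramowitz–Stegun formula used is 7.1.26, which may or may not be the variant LaunchVerdict inlines. The p-value is computed from the complementary error function directly, so small tail probabilities are not lost to subtraction from 1.

```typescript
// Abramowitz-Stegun 7.1.26 approximation of erfc(x), valid for x >= 0.
function erfcApprox(x: number): number {
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 +
      t * (-0.284496736 +
        t * (1.421413741 +
          t * (-1.453152027 + t * 1.061405429))));
  return poly * Math.exp(-x * x);
}

// Standard normal CDF built on erfcApprox.
function normalCdf(z: number): number {
  const tail = 0.5 * erfcApprox(Math.abs(z) / Math.SQRT2);
  return z < 0 ? tail : 1 - tail;
}

interface ZTestResult {
  delta: number;  // after rate minus before rate
  z: number;      // test statistic
  pValue: number; // two-sided
}

// Two-proportion z-test with a pooled standard error.
// x1/n1 are successes/total before the release, x2/n2 after it.
function twoProportionZTest(x1: number, n1: number, x2: number, n2: number): ZTestResult {
  const p1 = x1 / n1;
  const p2 = x2 / n2;
  const pooled = (x1 + x2) / (n1 + n2);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
  if (se === 0) return { delta: p2 - p1, z: 0, pValue: 1 }; // all successes or all failures
  const z = (p2 - p1) / se;
  return { delta: p2 - p1, z, pValue: erfcApprox(Math.abs(z) / Math.SQRT2) };
}

// Wilson score interval for a single proportion; zCrit = 1.959964 gives 95%.
function wilsonInterval(successes: number, n: number, zCrit = 1.959964): [number, number] {
  const p = successes / n;
  const z2 = zCrit * zCrit;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (zCrit * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return [center - half, center + half];
}
```

As a made-up usage example, twoProportionZTest(284, 400, 212, 400) compares 71% with 53% on 400 users each and gives a z of roughly −5.2. That is not the reference case quoted above, whose sample sizes are not stated here.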

What we learned

  • The hard part of "release confidence" isn't the stats — it's refusing to over-claim. The most design effort went into making the product honest: confidence labels, "correlation under a known cause, not causation," and a HOLD state so a thin-sample drop is never mis-sold as a confident rollback.
  • The strongest use of Novus wasn't pulling its raw analytics (the free beta didn't expose a read-back API in time for this submission). It was promoting Novus's pre-merge signal into a decision UI: the cause line on the card. The loop is the product.

Target user

The solo maker / small team who ships every week with an AI builder and has no data team — exactly the "Everyone Ships Now" audience. One avoided bad rollback pays for it.

What's verified vs. wireable (honest scope)

The deterministic engine and the seeded /demo are test-covered and run with no accounts (the live demo is real output from the same engine). The live wiring (GitHub webhook → cut assembly → PR comment + commit status, Postgres telemetry, optional LLM polish) is implemented and typechecks/builds — not stubbed — but /demo data is seeded and the live external paths come online only once a repo + Postgres + token + Novus are connected. The verdict call and numbers never depend on any of that: they come from the tested engine.
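
One way to make that independence structural is to merge the LLM's output through a whitelist, as in the sketch below. The EngineCard shape and the applyPolish helper are hypothetical names for illustration; the source only states that the prose layer may change the headline, cause, and action wording.

```typescript
// Hypothetical card shape produced by the deterministic engine.
interface EngineCard {
  verdict: string;
  beforeRate: number;
  afterRate: number;
  confidence: string;
  headline: string;
  cause: string;
  action: string;
}

// The only fields the prose layer is allowed to touch.
type Wording = Pick<EngineCard, "headline" | "cause" | "action">;

// Copy wording fields from the LLM output when they are non-empty strings;
// every other field always comes from the engine's card, whatever the LLM returns.
function applyPolish(card: EngineCard, llmOutput: unknown): EngineCard {
  const out = (typeof llmOutput === "object" && llmOutput !== null
    ? llmOutput
    : {}) as Record<string, unknown>;
  const pick = (key: keyof Wording): string => {
    const v = out[key];
    return typeof v === "string" && v.trim().length > 0 ? v : card[key];
  };
  return {
    ...card,
    headline: pick("headline"),
    cause: pick("cause"),
    action: pick("action"),
  };
}
```

If the LLM call fails or is never configured, passing null or undefined as llmOutput returns the engine's own wording unchanged, so the card still renders.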
