Inspiration

Before you ship, you'd love a product manager, a designer, a QA engineer, a market researcher, a security engineer, and a DevOps lead to go through your product — and then an investor to grill you. Almost nobody can assemble that room, so people ship blind:

  • 42% of startups fail because there was no market need (CB Insights),
  • a bug caught after launch costs up to 100× more than catching it before (IBM Systems Sciences Institute),
  • and the AI you'd ask for a gut-check just flatters you — recent research found models agree with a user's wrong belief ~64% of the time.

We wanted the opposite of a yes-man: a whole product team that assembles in seconds, looks at your real product, and hands you one prioritized verdict — before your users (and investors) do.

What it does

Paste a live URL, a public GitHub repo, a description, or even a whole Devpost submission, and Meridian:

  • Smart Intake auto-extracts the live URL, the repo, and the description — even pulling them out of an aggregator page like a Devpost submission — shows you what it found, and lets you pick which specialists to run.
  • Six specialists analyze at once — UX, QA, Market, Security, DevOps, and the Product Lead — each running their own skills over your actual code and live pages: reading the repo on GitHub, screenshotting the site for the UX agent to see, checking internal links, searching the market.
  • One verdict — a transparent Meridian Score /100 (a documented weighted average that renormalizes when a specialist can't run, so a missing input never counts as zero), a prioritized fix list, and per-specialist report cards where every finding cites real evidence.
  • The team can disagree, and the PM resolves it — when one specialist says "block" and another says "ship," Meridian surfaces the conflict and the product lead makes the call.
  • The Investor grills you — a final boss-battle that challenges you on what the team actually found and adapts to how well you defend it.

How we built it

  • Frontend: Next.js 16 (App Router) + React 19 + Tailwind v4 + Framer Motion — a glassy "mission-control" UI with a boot intro, a war-room report, and a light/dark theme.
  • AI: Google Gemini 2.5 Flash (with vision for the screenshot-based UX review), with an automatic fallback to xAI Grok on rate-limits.
  • Multi-agent skills engine: a code orchestrator decides which skills are eligible from the inputs + mode, fans them out per-skill with bounded concurrency, then synthesizes the verdict and runs the investor debate. Each skill is an authored SKILL.md with its own rubric.
  • Real evidence tools: GitHub API, a page fetcher + head-signal extractor, Browserless for live screenshots → vision, an internal link checker, and Tavily web search for the market skills.
  • Persistence: Supabase (runs, per-skill results, events) over PostgREST, with an in-memory write-through cache.
  • Analytics: Novus.ai (by Pendo), installed on Meridian itself — we instrument the whole funnel (product submitted → analysis started → each specialist's score → verdict → investor debate) so we can see exactly how it's used.
  • Deploy: Vercel.

Challenges we ran into

  • Stopping false positives was the whole game. Early runs invented vulnerabilities (a "SQL injection" on a static query, "SSRF" on a logging call), flagged marketing demo content as a "PII leak," called current dependencies "outdated," and read a still-loading image as a "placeholder." We built a verification loop — cloning the real repo and probing the live site to fact-check the AI's own findings — and re-grounded every skill: never assert something you can't see in the evidence, reserve critical/high only for directly-verified issues, and say "could not verify" instead of guessing.
  • Serverless state. A run is driven across many short-lived serverless functions, so a per-instance cache went stale and a run could freeze mid-synthesis. We made Supabase the source of truth on every read, so runs stay consistent across instances and survive reloads.
  • Aggregator inputs. Making "paste a Devpost link" actually work meant crawling the submission page for the real repo + deployed URL hiding inside it, and cleaning the README's HTML into readable text.
  • Fair scoring. A dead deploy, a login wall, or a timed-out specialist shouldn't tank the whole score — so unanalyzable inputs drop their specialist and the weights renormalize, with a visible note.

What we learned

The interesting engineering in an LLM product is the guardrails, not the prompt. Most of our work went into making the agents trustworthy: never assert what you can't verify, calibrate severity, and show your evidence. A confident wrong answer is worse than an honest "I couldn't check that."

What's next for Meridian

  • Judge Mode — paste a hackathon submission and grade it against that hackathon's own rubric.
  • Founder Readiness Score by theme, derived from the investor debate.
  • Take-it-with-you — PDF export, a shareable report link, an embeddable score badge, and one-click GitHub-issue text for each fix.
  • Authenticated + mobile screenshots so the UX and journey reviews go even deeper.

Built With

  • browserless
  • framer-motion
  • gemini-2.5-flash
  • github-api
  • google-gemini
  • next.js
  • novus.ai
  • pendo
  • postgresql
  • react
  • supabase
  • tailwindcss
  • tavily
  • typescript
  • vercel
  • xai-grok
Share this project:

Updates