Inspiration (Problem)

AI can write code, but teams can't trace why it was written. Specs drift, tests fail, and there's no audit trail from real customer evidence to shipped code.

What it does (Solution)

Growpad runs a stateful, multi-step workflow: Evidence → Insights → Decision Memo → PRD → Tickets → Code Diff → Tests → Drift Check → Export (zip). If verification fails at any stage, it performs bounded self-healing (≤2 retries) and re-verifies, as sketched below.
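
A minimal sketch of that self-healing contract, assuming a Python orchestrator; `run_stage`, `StageResult`, and the callables are illustrative names, not Growpad's actual API:

```python
# Illustrative bounded self-healing loop: each stage runs, is verified,
# and on failure is retried at most MAX_RETRIES times before aborting.
from dataclasses import dataclass, field

MAX_RETRIES = 2  # hard cap, mirrored in the guardrail panel

@dataclass
class StageResult:
    name: str
    passed: bool = False
    retries: int = 0
    log: list = field(default_factory=list)

def run_stage(name, run, verify):
    """Run one pipeline stage, re-verifying after each bounded retry."""
    result = StageResult(name=name)
    for attempt in range(MAX_RETRIES + 1):
        artifact = run()
        if verify(artifact):
            result.passed = True
            result.retries = attempt
            result.log.append(f"{name}: pass (attempt {attempt + 1})")
            return artifact, result
        result.log.append(f"{name}: fail (attempt {attempt + 1}), self-healing")
    raise RuntimeError(f"{name} failed after {MAX_RETRIES} retries")
```

Keeping the cap in a single constant means the guardrail panel and the pipeline log can never disagree about the retry budget.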

How it works (What judges should understand in 20 seconds)

  • Panel 1: Evidence + guardrails (OKR, forbidden paths, max retries = 2, failure injection toggle); a config sketch follows this list
  • Panel 2: Pipeline log (each stage + retry counter + pass/fail)
  • Panel 3: Artifact tabs (Evidence Map, Decision Memo, PRD, Tickets, Diff, Tests, Drift Report, Scorecard) + Download ZIP
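
A hypothetical shape for the Panel 1 guardrails, assuming a Python backend; every field name and default here is illustrative:

```python
# Illustrative guardrail config backing Panel 1 (not the actual schema).
from dataclasses import dataclass

@dataclass(frozen=True)
class Guardrails:
    okr: str = "Reduce onboarding friction"          # placeholder OKR text
    forbidden_paths: tuple = ("infra/", "secrets/")  # paths the diff may not touch
    max_retries: int = 2                             # self-healing cap
    inject_failure: bool = False                     # demo failure toggle
```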

How we built it with Gemini 3

  • Thought signatures to keep state across stages
  • Function calling to orchestrate tools (repo scan, diff, tests, export)
  • Structured outputs to generate deterministic JSON artifacts (sketched below)
  • Long context to ingest large evidence packs
  • Code execution to compute verification metrics and validate outputs
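
A minimal sketch of the structured-outputs stage, assuming the `google-genai` Python SDK; the `Ticket` schema, prompt, and model id are placeholders (swap in the Gemini 3 model id):

```python
# Illustrative structured-output call: the response is forced into a
# Pydantic schema, so downstream stages get validated JSON, not prose.
from google import genai
from pydantic import BaseModel

class Ticket(BaseModel):
    title: str
    acceptance_criteria: list[str]
    evidence_ids: list[str]

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.0-flash",  # placeholder model id
    contents="Generate tickets for the top insights in the evidence pack.",
    config={
        "response_mime_type": "application/json",
        "response_schema": list[Ticket],
    },
)
tickets = response.parsed  # schema-validated list[Ticket]
```

Pinning every artifact to a schema is what lets later stages (diff, drift check, scorecard) consume it without fragile string parsing.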

Challenges we ran into

Making failures deterministic, keeping retries bounded, and keeping outputs judge-readable, all without turning the project into a thin prompt wrapper.
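
For the determinism piece, the failure injection can be as simple as a one-shot toggle; this is an illustrative sketch under that assumption, not the actual implementation:

```python
# Illustrative deterministic failure injection: the toggle breaks one known
# stage exactly once, so the demo always shows the same failure and the
# same single-retry recovery.
INJECT_FAILURE = True  # wired to the Panel 1 toggle
_injected = {"done": False}

def verify_tests(artifact: dict) -> bool:
    if INJECT_FAILURE and not _injected["done"]:
        _injected["done"] = True  # fail once, then behave normally
        return False
    return artifact.get("tests_passed", False)
```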

Accomplishments we're proud of

  • A full "Evidence → Verified PR" loop with visible logs and a downloadable proof pack.
  • An intentional failure that gets corrected in ≤1 retry (cap is 2).
  • A clean demo UX that clearly communicates tool use + verification.

What's next

Connectors (Intercom/Gong/Mixpanel), human approval gates, continuous nightly synthesis.

Built With

  • code-execution
  • docker
  • fastapi
  • function-calling
  • gemini-api
  • github-api
  • jest
  • long-context
  • nextjs
  • node.js
  • pytest
  • thought-signatures