Inspiration
“Can you repro this?” is where bug fixes go to die. A teammate drops a screen recording, but nobody has the exact state, timing, or steps. You end up guessing, writing flaky tests, or shipping fixes without proof. Replay2PR was built to turn a bug replay into a deterministic, reviewable bundle: repro steps + an automated test + a verified patch + a single share link.
What it does
Replay2PR converts a short bug screen recording into:
- Reproduction steps (structured + readable)
- A Playwright test that reproduces the issue
- A patch diff that attempts a minimal fix
- Verification output from re-running the test after the patch
- A shareable Evidence Pack containing steps, test code, diff, logs, and artifacts
It also includes a built-in demo target with a deterministic UI bug so judges can run the full pipeline end-to-end without external dependencies.
How we built it
- Next.js 14 (App Router) for the web app + API routes
- Upload API for MP4 intake, stored locally as artifacts
- Job runner that executes a 5-step pipeline: Extract → Reproduce → Patch → Verify → Ship
- Gemini 3 for structured reasoning + code generation:
  - Extract repro steps as JSON
  - Generate Playwright test code
  - Propose a unified diff patch
- Playwright to validate the repro and confirm the fix
- Evidence Pack UI to present everything clearly for reviewers/judges
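The 5-step job runner described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `Stage`, `StageResult`, and `runPipeline` are hypothetical, and each stage is assumed to signal failure by throwing (for example, Reproduce throws if the generated test unexpectedly passes).

```typescript
// Hypothetical sketch of a sequential job runner; all names are illustrative.
type StageName = "extract" | "reproduce" | "patch" | "verify" | "ship";

interface StageResult {
  stage: StageName;
  ok: boolean;
  log: string;
}

interface Stage {
  name: StageName;
  // Each stage reads and writes shared artifacts and returns a log line.
  // Assumption: a stage throws when its outcome is not the expected one.
  run: (artifacts: Map<string, string>) => Promise<string>;
}

async function runPipeline(
  stages: Stage[],
): Promise<{ results: StageResult[]; artifacts: Map<string, string> }> {
  const artifacts = new Map<string, string>();
  const results: StageResult[] = [];
  for (const stage of stages) {
    try {
      const log = await stage.run(artifacts);
      results.push({ stage: stage.name, ok: true, log });
    } catch (err) {
      const log = err instanceof Error ? err.message : String(err);
      results.push({ stage: stage.name, ok: false, log });
      break; // Stop at the first failure so later stages never see bad input.
    }
  }
  return { results, artifacts };
}
```

Stopping at the first failed stage keeps the Evidence Pack honest: a patch is never recorded as verified unless every earlier step completed.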
Gemini 3 Integration (core to the product)
Replay2PR uses the Gemini 3 API as the “reasoning spine” of the pipeline. First, Gemini is prompted to output strict JSON describing the bug: a summary, step-by-step reproduction instructions, expected vs actual behavior, and a confidence level. Those steps drive the second Gemini call, which generates a deterministic Playwright test in TypeScript using stable selectors (data-testid) so the repro is repeatable. After Playwright reproduces the failure, Replay2PR sends the failing output plus the current source of the target component to Gemini again to propose a minimal unified diff patch. The patch is applied automatically, and Playwright is run again to verify the fix. Every output (JSON repro, generated test, diff, and Playwright logs) is persisted as artifacts and rendered in the Evidence Pack. Gemini is used with structured outputs (JSON) and low temperature settings to keep results consistent and automatable, and the “Pro” tier model is reserved for patch generation where correctness matters most.
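To illustrate the strict-JSON contract for the first Gemini call, here is a dependency-free sketch of parsing and validating the repro report. The field names (`summary`, `steps`, `expected`, `actual`, `confidence`) and the three confidence levels are assumptions inferred from the description above, not the project's actual schema; the project lists zod, which would be a natural fit for the same check.

```typescript
// Assumed shape of the repro report; field names are illustrative.
const LEVELS = ["low", "medium", "high"] as const;
type Confidence = (typeof LEVELS)[number];

interface ReproReport {
  summary: string;
  steps: string[];
  expected: string;
  actual: string;
  confidence: Confidence;
}

const FENCE = "`".repeat(3); // Markdown code-fence marker

// Models sometimes wrap JSON in a Markdown fence; remove it if present.
function stripFence(raw: string): string {
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    const firstNewline = text.indexOf("\n");
    text = firstNewline === -1 ? "" : text.slice(firstNewline + 1);
    if (text.trimEnd().endsWith(FENCE)) {
      text = text.trimEnd().slice(0, -FENCE.length);
    }
  }
  return text.trim();
}

const isNonEmptyString = (v: unknown): v is string =>
  typeof v === "string" && v.length > 0;
const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.length > 0 && v.every(isNonEmptyString);
const isConfidence = (v: unknown): v is Confidence =>
  LEVELS.includes(v as Confidence);

// Throws if the model output is not valid JSON or misses a required field.
function parseReproReport(raw: string): ReproReport {
  const data: unknown = JSON.parse(stripFence(raw));
  if (typeof data !== "object" || data === null) {
    throw new Error("Repro report must be a JSON object");
  }
  const { summary, steps, expected, actual, confidence } =
    data as Record<string, unknown>;
  if (!isNonEmptyString(summary) || !isNonEmptyString(expected) || !isNonEmptyString(actual)) {
    throw new Error("summary, expected, and actual must be non-empty strings");
  }
  if (!isStringArray(steps)) {
    throw new Error("steps must be a non-empty array of strings");
  }
  if (!isConfidence(confidence)) {
    throw new Error("confidence must be low, medium, or high");
  }
  return { summary, steps, expected, actual, confidence };
}
```

Rejecting malformed output at this boundary means the later test-generation call only ever sees well-formed steps, and a failed parse can trigger a retry instead of a broken test.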
Challenges
- Forcing strict structured outputs reliably (JSON-only)
- Keeping generated tests stable (selectors, timing, determinism)
- Making patching safe and minimal (unified diffs, smallest change surface)
- Avoiding flaky demos by shipping a deterministic target bug for judging
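One way to enforce "safe and minimal" patching is to inspect the proposed unified diff before applying it. The sketch below is an assumption about how such a guard could look, not the project's implementation: `diffStats` and `isPatchAcceptable` are hypothetical helpers, and the file allow-list and line budget are illustrative policy knobs.

```typescript
interface DiffStats {
  files: string[];
  added: number;
  removed: number;
}

// Counts touched files and changed lines in a unified diff.
// Hunk headers ("@@ -a,b +c,d @@") give the line counts, so body lines that
// happen to start with "---" or "+++" are not mistaken for file headers.
function diffStats(diff: string): DiffStats {
  const files = new Set<string>();
  let added = 0;
  let removed = 0;
  let oldLeft = 0; // old-side lines still expected in the current hunk
  let newLeft = 0; // new-side lines still expected in the current hunk
  for (const line of diff.split("\n")) {
    if (oldLeft > 0 || newLeft > 0) {
      if (line.startsWith("+")) { added++; newLeft--; }
      else if (line.startsWith("-")) { removed++; oldLeft--; }
      else if (line.startsWith("\\")) { /* "\ No newline at end of file" */ }
      else { oldLeft--; newLeft--; } // context line
      continue;
    }
    const hunk = /^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@/.exec(line);
    if (hunk) {
      // An omitted count means exactly one line.
      oldLeft = hunk[1] !== undefined ? Number(hunk[1]) : 1;
      newLeft = hunk[2] !== undefined ? Number(hunk[2]) : 1;
    } else if (line.startsWith("+++ ")) {
      const path = line.slice(4).trim().replace(/^b\//, "");
      if (path !== "/dev/null") files.add(path);
    }
  }
  return { files: Array.from(files), added, removed };
}

// Accept a patch only if it stays inside the allow-list and the line budget.
function isPatchAcceptable(
  diff: string,
  allowedFiles: string[],
  maxChangedLines: number,
): boolean {
  const stats = diffStats(diff);
  const onlyAllowed =
    stats.files.length > 0 && stats.files.every((f) => allowedFiles.includes(f));
  return onlyAllowed && stats.added + stats.removed <= maxChangedLines;
}
```

A guard like this would run between the Patch and Verify steps, so an oversized or out-of-scope diff is rejected before it touches the working tree.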
What we learned
- “Pretty outputs” don’t matter without verification
- Tests are the contract: the patch is only real if the test passes
- UI matters for trust: an Evidence Pack should read like a clean incident report
What’s next
- True multimodal extraction from video frames (UI state + text + clicks)
- GitHub integration (open PR with the diff + attach Evidence Pack)
- Real before/after screenshots captured by Playwright for the artifacts section
Built With
- framer-motion
- google-gemini-3-api
- next.js-14
- node.js
- playwright
- radix-ui
- react
- tailwind-css
- typescript
- zod