Replay2PR


Inspiration

“Can you repro this?” is where bug fixes go to die. A teammate drops a screen recording, but nobody has the exact state, timing, or steps. You end up guessing, writing flaky tests, or shipping fixes without proof. Replay2PR was built to turn a bug replay into a deterministic, reviewable bundle: repro steps + an automated test + a verified patch + a single share link.

What it does

Replay2PR converts a short bug screen recording into:

Reproduction steps (structured + readable)
A Playwright test that reproduces the issue
A patch diff that attempts a minimal fix
Verification output from re-running the test after the patch
A shareable Evidence Pack containing steps, test code, diff, logs, and artifacts

It also includes a built-in demo target with a deterministic UI bug so judges can run the full pipeline end-to-end without external dependencies.

How we built it

Next.js 14 (App Router) for the web app + API routes
Upload API for MP4 intake, stored locally as artifacts
Job runner that executes a 5-step pipeline: Extract → Reproduce → Patch → Verify → Ship
Gemini 3 for structured reasoning + code generation:
- Extract repro steps as JSON
- Generate Playwright test code
- Propose a unified diff patch
Playwright to validate the repro and confirm the fix
Evidence Pack UI to present everything clearly for reviewers/judges
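
The five-step job runner listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual runner: the names `StageName`, `JobContext`, `Stage`, and `runPipeline` are hypothetical, and each stage is assumed to be an async function that takes the accumulated job context and returns an updated one.

```typescript
// Hypothetical stage names, mirroring Extract → Reproduce → Patch → Verify → Ship.
type StageName = "extract" | "reproduce" | "patch" | "verify" | "ship";

interface StageResult {
  stage: StageName;
  ok: boolean;
  detail: string;
}

// Assumption: each stage reads the accumulated context and returns an updated copy.
type JobContext = Record<string, unknown>;
type Stage = (ctx: JobContext) => Promise<JobContext>;

const STAGE_ORDER: StageName[] = ["extract", "reproduce", "patch", "verify", "ship"];

async function runPipeline(
  stages: Record<StageName, Stage>,
  initial: JobContext,
): Promise<{ ctx: JobContext; results: StageResult[] }> {
  let ctx = initial;
  const results: StageResult[] = [];
  for (const name of STAGE_ORDER) {
    try {
      ctx = await stages[name](ctx);
      results.push({ stage: name, ok: true, detail: "completed" });
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      results.push({ stage: name, ok: false, detail });
      // Stop at the first failure: later stages depend on earlier output.
      break;
    }
  }
  return { ctx, results };
}
```

Recording one result per stage keeps the run inspectable: a UI can show which step failed and why without re-running the job.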

Gemini 3 Integration (core to the product)

Replay2PR uses the Gemini 3 API as the “reasoning spine” of the pipeline. First, Gemini is prompted to output strict JSON describing the bug: a summary, step-by-step reproduction instructions, expected vs actual behavior, and a confidence level. Those steps drive the second Gemini call, which generates a deterministic Playwright test in TypeScript using stable selectors (data-testid) so the repro is repeatable.

After Playwright reproduces the failure, Replay2PR sends the failing output plus the current source of the target component to Gemini again to propose a minimal unified diff patch. The patch is applied automatically, and Playwright is run again to verify the fix.

Every output (JSON repro, generated test, diff, and Playwright logs) is persisted as artifacts and rendered in the Evidence Pack. Gemini is used with structured outputs (JSON) and low temperature settings to keep results consistent and automatable, and the “Pro” tier model is reserved for patch generation where correctness matters most.
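
To illustrate the strict-JSON repro report from the first Gemini call, here is a small validation sketch. The field names (`summary`, `steps`, `expected`, `actual`, `confidence`) and the three confidence levels are assumptions chosen to match the description above, not the project's actual schema. The project lists zod in its stack; this sketch uses plain TypeScript checks only so that it stays dependency-free.

```typescript
// Assumed shape of the repro report; the real schema may differ.
interface ReproReport {
  summary: string;
  steps: string[];
  expected: string;
  actual: string;
  confidence: "low" | "medium" | "high";
}

const CONFIDENCE_LEVELS = ["low", "medium", "high"] as const;
const FENCE = "`".repeat(3); // Markdown code-fence marker

// Models sometimes wrap JSON in a Markdown fence; strip one if present.
function stripFence(raw: string): string {
  let text = raw.trim();
  if (text.startsWith(FENCE)) {
    const firstNewline = text.indexOf("\n");
    text = firstNewline === -1 ? "" : text.slice(firstNewline + 1).trim();
    if (text.endsWith(FENCE)) {
      text = text.slice(0, -FENCE.length);
    }
  }
  return text.trim();
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Parse and validate a model response, throwing on anything malformed.
function parseReproReport(raw: string): ReproReport {
  let data: unknown;
  try {
    data = JSON.parse(stripFence(raw));
  } catch {
    throw new Error("response is not valid JSON");
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("response must be a JSON object");
  }
  const obj = data as Record<string, unknown>;
  for (const key of ["summary", "expected", "actual"]) {
    if (!isNonEmptyString(obj[key])) {
      throw new Error(`missing or empty field: ${key}`);
    }
  }
  const steps = obj.steps;
  if (!Array.isArray(steps) || steps.length === 0 || !steps.every(isNonEmptyString)) {
    throw new Error("steps must be a non-empty array of strings");
  }
  const confidence = obj.confidence;
  if (!CONFIDENCE_LEVELS.some((level) => level === confidence)) {
    throw new Error("confidence must be low, medium, or high");
  }
  return {
    summary: obj.summary as string,
    steps: steps as string[],
    expected: obj.expected as string,
    actual: obj.actual as string,
    confidence: confidence as ReproReport["confidence"],
  };
}
```

Rejecting a malformed response at this boundary means the test-generation call only ever sees a complete, typed report.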

Challenges

Forcing strict structured outputs reliably (JSON-only)
Keeping generated tests stable (selectors, timing, determinism)
Making patching safe and minimal (unified diffs, smallest change surface)
Avoiding flaky demos by shipping a deterministic target bug for judging
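
One way to enforce the "smallest change surface" goal is to inspect a proposed unified diff before applying it. The sketch below is a heuristic under stated assumptions, not the project's actual guard: `diffStats` and `isMinimalPatch` are hypothetical helpers, file headers are assumed to be a `--- ` line immediately followed by a `+++ ` line, and the allowed-file list and line budget are illustrative parameters.

```typescript
interface DiffStats {
  files: string[];
  added: number;
  removed: number;
}

// Count touched files and changed lines in a unified diff (heuristic parser).
function diffStats(diff: string): DiffStats {
  const lines = diff.split("\n");
  const files: string[] = [];
  let added = 0;
  let removed = 0;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    const next = lines[i + 1] ?? "";
    // A "--- " line directly followed by "+++ " is treated as a file header pair.
    if (line.startsWith("--- ") && next.startsWith("+++ ")) {
      const path = next.slice(4).replace(/^b\//, "").trim();
      if (files.indexOf(path) === -1) files.push(path);
      i++; // Skip the "+++ " line as well.
      continue;
    }
    if (line.startsWith("+")) added++;
    else if (line.startsWith("-")) removed++;
  }
  return { files, added, removed };
}

// Accept a patch only if it touches allowed files and stays within a line budget.
function isMinimalPatch(diff: string, allowedFiles: string[], maxChangedLines: number): boolean {
  const stats = diffStats(diff);
  const onlyAllowed = stats.files.every((file) => allowedFiles.indexOf(file) !== -1);
  const withinBudget = stats.added + stats.removed <= maxChangedLines;
  return stats.files.length > 0 && onlyAllowed && withinBudget;
}
```

A check like this runs before the patch is applied, so an oversized or off-target diff can be rejected or regenerated instead of reaching the verify step.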

What we learned

“Pretty outputs” don’t matter without verification
Tests are the contract: the patch is only real if the test passes
UI matters for trust: an Evidence Pack should read like a clean incident report
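
The "tests are the contract" rule can be written down as a small decision function: a patch counts only if the generated test fails before it and passes after it. The names `TestOutcome`, `Verdict`, and `judgePatch` are hypothetical and only illustrate the rule; they are not taken from the project.

```typescript
type TestOutcome = "passed" | "failed";
type Verdict = "verified" | "not-reproduced" | "fix-failed";

// Decide whether a patch is proven, given test outcomes before and after applying it.
function judgePatch(beforePatch: TestOutcome, afterPatch: TestOutcome): Verdict {
  // If the test already passed, the bug was never reproduced, so nothing is proven.
  if (beforePatch === "passed") return "not-reproduced";
  return afterPatch === "passed" ? "verified" : "fix-failed";
}
```

Treating "passed before the patch" as its own outcome guards against a test that never captured the bug in the first place.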

What’s next

True multimodal extraction from video frames (UI state + text + clicks)
GitHub integration (open PR with the diff + attach Evidence Pack)
Real before/after screenshots captured by Playwright for the artifacts section

Built With

framer-motion
google-gemini-3-api
next.js-14
node.js
playwright
radix-ui
react
tailwind-css
typescript
zod

Updates

Awaiz Ahmed started this project — Feb 09, 2026 07:11 AM EST
