Inspiration

AI coding agents can generate pull requests quickly, but raw capability isn't the biggest bottleneck; transparency and trust are. Reviewers often hesitate because they can't tell what changed, why it matters, or what might break. AgentReview makes agent output explainable and reviewable, so risk becomes visible and confidence is earned.

What it does

AgentReview turns an agent-generated PR into a structured, explainable review experience.

Receipt View

  • Printed-style PR summary
  • Risk score
  • Files touched / lines added & removed
  • Categorized change chunks
  • Review checklist
  • One-click Sign & Merge

Director View

  • Cinematic player that steps through logical changes
  • Synchronized narration + captions
  • Code panels + dependency blast radius visualization

Ask AgentReview

  • Ask questions about the PR
  • Trigger visual actions (replay a scene, highlight affected files)
  • Grounded in the actual change set

Concrete case study

We demo a real PR on a cal.diy-style booking system:

  • Changes to card hold / payment logic
  • Updates to no-show fee handling
  • Modified webhook payload contracts
  • Deleted regression test for payments
  • 10 files touched, +103 / −415 lines, 5 downstream affected files

AgentReview surfaces the key risk: payment behavior changed while test coverage was removed.
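
The risk called out above can be thought of as a rule over categorized change chunks: an area's behavior changed, and a test covering that area was deleted in the same PR. The following is a minimal sketch of such a rule, not AgentReview's actual scoring logic. The `ChangeChunk` shape, the category names, the file names, and `findCoverageRisks` are all hypothetical.

```typescript
// Hypothetical shape of a categorized change chunk; not AgentReview's real schema.
type ChunkCategory = "payments" | "webhooks" | "tests" | "other";

interface ChangeChunk {
  file: string;
  category: ChunkCategory;
  // Product area the chunk touches, or (for tests) the area the test covers.
  area: string;
  status: "added" | "modified" | "deleted";
}

interface RiskFinding {
  area: string;
  reason: string;
}

// Flag every area where non-test code changed and a test covering it was deleted.
function findCoverageRisks(chunks: ChangeChunk[]): RiskFinding[] {
  const changedAreas = new Set<string>();
  const areasWithDeletedTests = new Set<string>();

  for (const chunk of chunks) {
    if (chunk.category === "tests") {
      if (chunk.status === "deleted") areasWithDeletedTests.add(chunk.area);
    } else {
      changedAreas.add(chunk.area);
    }
  }

  return Array.from(changedAreas)
    .filter((area) => areasWithDeletedTests.has(area))
    .map((area) => ({
      area,
      reason: `${area} behavior changed while a test covering it was deleted`,
    }));
}

// Illustrative input loosely modeled on the case study; file names are made up.
const findings = findCoverageRisks([
  { file: "cardHold.ts", category: "payments", area: "payments", status: "modified" },
  { file: "payments.regression.test.ts", category: "tests", area: "payments", status: "deleted" },
  { file: "webhookPayload.ts", category: "webhooks", area: "webhooks", status: "modified" },
]);
console.log(findings);
```

With this input only the payments area is flagged, because the webhook change has no deleted test associated with it.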

How we built it

Frontend: React components (Receipt, Director, ChatPanel) with a synced playback timeline

Backend: Node.js + Express

  • webhook → generate receipt
  • merge → Sign & Merge with SHA protection
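
One plausible reading of "SHA protection" is that the merge only proceeds if the PR head is still the commit the reviewer actually saw. The sketch below shows that check as a pure function, under that assumption; `SignedReview`, `checkSignAndMerge`, and the SHA values are hypothetical and not taken from the project.

```typescript
// Hypothetical record of what the reviewer signed; field names are assumptions.
interface SignedReview {
  prNumber: number;
  reviewedHeadSha: string; // head commit shown in the receipt at sign time
  checklistComplete: boolean;
}

type MergeDecision =
  | { ok: true; sha: string }
  | { ok: false; reason: string };

// Allow the merge only if the checklist is done and the PR head is unchanged.
function checkSignAndMerge(review: SignedReview, currentHeadSha: string): MergeDecision {
  if (!review.checklistComplete) {
    return { ok: false, reason: "review checklist is incomplete" };
  }
  if (review.reviewedHeadSha !== currentHeadSha) {
    return { ok: false, reason: "PR head changed after the review was generated" };
  }
  return { ok: true, sha: currentHeadSha };
}

// Made-up values for illustration.
const signed: SignedReview = { prNumber: 1, reviewedHeadSha: "abc123", checklistComplete: true };
const stale = checkSignAndMerge(signed, "def456"); // new commits were pushed after review
const fresh = checkSignAndMerge(signed, "abc123"); // head is unchanged
console.log(stale, fresh);
```

GitHub's REST endpoint for merging a pull request also accepts an optional `sha` field that the PR head must match, so passing the reviewed SHA there would enforce the same guarantee on the server side. Whether AgentReview relies on that field is not stated here.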

Data layer: MongoDB for session persistence

Narration system:

  • ElevenLabs for voice generation
  • Custom beat manifest/timestamps to sync audio, captions, UI transitions, and graph animations
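
To illustrate the idea of a beat manifest, here is a minimal sketch in which each beat carries a time range into the narration audio plus the caption and files to highlight, and the player looks up the active beat from the current playback time. The real manifest format is not described in this writeup, so the `Beat` fields, `activeBeat`, and the sample data are assumptions.

```typescript
// Hypothetical beat manifest entry; field names are assumptions.
interface Beat {
  id: string;
  startMs: number; // offset into the narration audio, inclusive
  endMs: number;   // exclusive
  caption: string;
  highlightFiles: string[];
}

// Return the beat active at the given playback time, or undefined during gaps.
// Assumes beats are sorted by startMs and do not overlap, so binary search applies.
function activeBeat(beats: Beat[], timeMs: number): Beat | undefined {
  let lo = 0;
  let hi = beats.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const beat = beats[mid];
    if (timeMs < beat.startMs) hi = mid - 1;
    else if (timeMs >= beat.endMs) lo = mid + 1;
    else return beat;
  }
  return undefined;
}

// Made-up manifest for illustration.
const manifest: Beat[] = [
  { id: "intro", startMs: 0, endMs: 4000, caption: "This PR touches 10 files.", highlightFiles: [] },
  { id: "payments", startMs: 4000, endMs: 9000, caption: "Card hold logic changed.", highlightFiles: ["cardHold.ts"] },
  { id: "tests", startMs: 9500, endMs: 12000, caption: "A regression test was deleted.", highlightFiles: ["payments.regression.test.ts"] },
];

console.log(activeBeat(manifest, 4500)?.id);
```

In a browser player, `timeMs` could come from the audio element's `currentTime` (which is in seconds) multiplied by 1000. Deriving captions and highlights from the audio clock, rather than from a separate timer, is one way to keep them from drifting apart.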

Review generation:

  • GPT models via OpenAI API

Infra:

  • Cloudflare Tunnel for stable webhook routing

Challenges we ran into

  • Turning diffs into narratives (an intermediate representation for change chunks + dependencies)
  • Audio/UI sync using custom timing
  • Blast radius modeling without full static analysis
  • General reliability issues
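
As one illustration of blast radius modeling without full static analysis, a coarse approximation is to walk a file-level import graph backwards from the changed files and collect everything that depends on them, up to a depth limit. This is a sketch of that general technique, not the project's implementation; `ImportGraph`, `blastRadius`, the depth limit, and the file names are hypothetical.

```typescript
// Map from a file to the files it imports. How the edges are collected (for
// example by scanning import statements) is outside the scope of this sketch.
type ImportGraph = Record<string, string[]>;

// Breadth-first walk over reverse import edges, starting from the changed files.
// Returns downstream files only; the changed files themselves are excluded.
function blastRadius(graph: ImportGraph, changed: string[], maxDepth = 2): string[] {
  // Invert the graph: for each file, record which files import it.
  const importers = new Map<string, string[]>();
  for (const [file, deps] of Object.entries(graph)) {
    for (const dep of deps) {
      const list = importers.get(dep) ?? [];
      list.push(file);
      importers.set(dep, list);
    }
  }

  const seen = new Set<string>(changed);
  const affected: string[] = [];
  let frontier = changed;
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const importer of importers.get(file) ?? []) {
        if (!seen.has(importer)) {
          seen.add(importer);
          affected.push(importer);
          next.push(importer);
        }
      }
    }
    frontier = next;
  }
  return affected;
}

// Made-up graph for illustration.
const graph: ImportGraph = {
  "bookingHandler.ts": ["cardHold.ts"],
  "noShowFee.ts": ["cardHold.ts"],
  "api/bookings.ts": ["bookingHandler.ts"],
  "cardHold.ts": [],
};
const downstream = blastRadius(graph, ["cardHold.ts"]);
console.log(downstream);
```

The depth limit trades completeness for noise: a file-level graph over-approximates real impact, so unbounded traversal tends to mark most of the repository as affected.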

Accomplishments we're proud of

  • Turning a 500+ line PR into a coherent, navigable story
  • Making risk legible (not just summarized)
  • Building a closed-loop review system (explain → ask → approve)
  • Achieving tight audio + UI synchronization in the Director view
  • Shipping a functional Sign & Merge flow with safety checks

What we learned

  • Raw diffs are not the right abstraction for reviewing AI-generated code
  • Developers care more about risk + impact than line-by-line changes
  • Narration + visualization together are far more effective than either alone
  • Designing for failure modes first improves demo reliability

What's next for AgentReview

  • Deeper static/semantic analysis for more accurate blast radius
  • Multi-PR memory to learn patterns across agent behavior
  • Integration with real GitHub org workflows
  • Better verification layers (test coverage + invariant checks)

Built with

  • React
  • Node.js
  • Express
  • MongoDB
  • ElevenLabs
  • GitHub App APIs
  • Cloudflare Tunnel
  • OpenAI API