Inspiration

One of us was working at a startup when AI-generated code caused an outage, and we couldn't figure out the root cause because nobody had tracked what the coding agent actually did or assumed.

What it does

OpenBox captures what AI coding agents decide, assume, and run during development, attaches that evidence to pull requests, and when something breaks, lets you replay the full decision trail and auto-draft a fix in under a minute.

How we built it

We built three pieces: a Claude Code capture skill that logs agent sessions, a backend bridge that uses the Gemini API to filter those logs against GitHub diffs, and a Next.js TypeScript frontend for PR provenance and incident replay.
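Filtering a session log against a diff comes down to asking which captured events actually relate to the code that changed. The sketch below shows one deterministic pre-filter that could run before any model call. It keeps only events whose touched files appear in a unified diff. It is an illustration under assumptions, not OpenBox's implementation: the `SessionEvent` shape and the `changedFiles` and `relevantEvents` helpers are hypothetical. Per the description above, the real bridge uses the Gemini API for this step, which can presumably catch relevance that plain file matching misses.

```typescript
// Hypothetical shape of one captured agent event; field names are
// illustrative, not OpenBox's actual log schema.
interface SessionEvent {
  timestamp: string;
  kind: "command" | "file_edit" | "note";
  text: string;
  files: string[]; // paths the event touched, relative to the repo root
}

// Collect file paths changed in a git-style unified diff by reading the
// "--- a/<path>" and "+++ b/<path>" header lines. Reading both sides covers
// deleted files ("+++ /dev/null") and new files ("--- /dev/null").
// Simple heuristic: a hunk line whose content happens to start with
// "-- a/" or "++ b/" would also match.
function changedFiles(unifiedDiff: string): Set<string> {
  const paths = new Set<string>();
  for (const line of unifiedDiff.split("\n")) {
    const m = /^(?:\+\+\+ b\/|--- a\/)(.+)$/.exec(line);
    if (m) paths.add(m[1].trim());
  }
  return paths;
}

// Keep only the events that touched at least one file changed in the diff.
function relevantEvents(events: SessionEvent[], unifiedDiff: string): SessionEvent[] {
  const changed = changedFiles(unifiedDiff);
  return events.filter((e) => e.files.some((f) => changed.has(f)));
}

// Usage example with a one-file diff and two captured events.
const diff = [
  "diff --git a/src/db.ts b/src/db.ts",
  "--- a/src/db.ts",
  "+++ b/src/db.ts",
  "@@ -1 +1 @@",
  "-const POOL = 10;",
  "+const POOL = 2;",
].join("\n");

const events: SessionEvent[] = [
  { timestamp: "t1", kind: "file_edit", text: "lowered pool size", files: ["src/db.ts"] },
  { timestamp: "t2", kind: "note", text: "explored README", files: ["README.md"] },
];

console.log(relevantEvents(events, diff).map((e) => e.text)); // [ 'lowered pool size' ]
```

Running a cheap, deterministic filter first would shrink what a model has to read. It also means every retained event has a concrete link to a line in the PR.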

Challenges we ran into

Making capture invisible during coding but genuinely useful during review, and cleanly separating hard evidence like commands and diffs from the model's own interpretation of what it was doing.
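One way to keep hard evidence apart from the model's own interpretation is to make the separation part of the type system. In the sketch below, observed facts (commands, diffs) and model-stated assumptions or rationale live in different variants of a discriminated union, so a reviewer-facing renderer always labels which is which. All type and function names here (`TrailEntry`, `describe`, `isEvidence`) are hypothetical. This is not OpenBox's actual data model.

```typescript
// Hard evidence: things the capture layer observed directly.
type Evidence =
  | { source: "observed"; type: "command"; command: string; exitCode: number }
  | { source: "observed"; type: "diff"; file: string; patch: string };

// Interpretation: what the model said about its own reasoning. Kept as a
// separate variant so it can never be mistaken for an observed fact.
type Interpretation = {
  source: "model";
  type: "assumption" | "rationale";
  text: string;
};

type TrailEntry = Evidence | Interpretation;

// Type guard for review views that should show only observed facts.
function isEvidence(e: TrailEntry): e is Evidence {
  return e.source === "observed";
}

// Render one entry, always tagging whether it is evidence or interpretation.
function describe(entry: TrailEntry): string {
  switch (entry.source) {
    case "observed":
      return entry.type === "command"
        ? `[evidence] ran \`${entry.command}\` (exit ${entry.exitCode})`
        : `[evidence] changed ${entry.file}`;
    case "model":
      return `[model ${entry.type}] ${entry.text}`;
    default: {
      // Compile-time exhaustiveness check: adding a new source without
      // handling it here becomes a type error.
      const unreachable: never = entry;
      return unreachable;
    }
  }
}

// Usage example: a short decision trail mixing both kinds of entry.
const trail: TrailEntry[] = [
  { source: "model", type: "assumption", text: "retries are idempotent" },
  { source: "observed", type: "command", command: "npm test", exitCode: 0 },
];
for (const e of trail) console.log(describe(e));
// [model assumption] retries are idempotent
// [evidence] ran `npm test` (exit 0)
```

Because the `source` tag is required on every entry, code that builds a replay view cannot silently merge a model's claim into the evidence list. `trail.filter(isEvidence)` yields only observed facts, typed as `Evidence[]`.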

Accomplishments that we're proud of

We validated OpenBox with 14 engineers from companies including Google, Microsoft, Apple, and Atlassian, and our incident replay demo goes from "prod is down" to a drafted fix PR in about 60 seconds.

What we learned

We started out building a provenance viewer, but mentors and interviewees kept pushing us toward auto-fix and toward helping teams learn from how their best engineers use agents, which completely reframed the product.

What's next for OpenBox

Expanding agent coverage to Cursor and Codex, open-sourcing the capture skill, and building the learning layer that surfaces team-wide patterns so the system gets smarter with every incident.

Slide deck: https://www.youtube.com/watch?v=-nGD-p475vc
