AutoPilot

Sentry metric: https://youtu.be/ctUCtMgi9pA

We use AI coding tools like Codex all the time. They are great at writing snippets, but the process around the code is still manual: I still have to run it, debug it, open PRs, and wait for reviews.

Aadit heard a Principal Engineer describe how he uses agents to create PRs overnight and reviews them when he wakes up in the morning. That workflow makes him very efficient.

We thought: we should build this!

What if you could drop in a repo link and a task, and an AI system would not just write code, but also run it, trace it, and ship it as a PR?

That idea became Daytona PR Copilot.

How we built it

The project connects a few tools into one automated pipeline:

A Daytona sandbox spins up and clones the repo

An LLM generates the code changes for the task

The updated project runs inside the sandbox

Sentry captures runtime errors and traces

If things look good, a branch is pushed and a PR is opened automatically

CodeRabbit is triggered to do an instant AI code review

So instead of “AI wrote some code,” you get “AI shipped a tested, traced pull request.”
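The steps above can be sketched as an ordered list of stages that share one context object, with a gate that skips the PR steps when the sandbox run captured errors. This is a minimal illustration under stated assumptions, not the project's code: every stage function is a hypothetical placeholder for the real Daytona, LLM, Sentry, GitHub, and CodeRabbit calls, and treating "no captured errors" as "things look good" is our assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RunContext:
    """State shared by every stage of one pipeline run (hypothetical shape)."""
    repo_url: str
    task: str
    patch: Optional[str] = None
    errors: List[str] = field(default_factory=list)
    pr_url: Optional[str] = None
    log: List[str] = field(default_factory=list)


Stage = Callable[[RunContext], None]


def run_pipeline(ctx: RunContext, build: List[Stage], ship: List[Stage]) -> RunContext:
    """Run the build stages in order; run the ship stages only if no errors were captured."""
    for stage in build:
        stage(ctx)
        ctx.log.append(stage.__name__)
    if ctx.errors:
        ctx.log.append("stopped: runtime errors captured")
        return ctx
    for stage in ship:
        stage(ctx)
        ctx.log.append(stage.__name__)
    return ctx


# Hypothetical placeholder stages; each comment names the integration it stands in for.
def clone_in_sandbox(ctx: RunContext) -> None:
    pass  # would create a Daytona sandbox and clone ctx.repo_url into it


def generate_patch(ctx: RunContext) -> None:
    ctx.patch = f"# change for: {ctx.task}"  # would ask the LLM for code changes


def run_project(ctx: RunContext) -> None:
    pass  # would execute the project inside the sandbox


def collect_traces(ctx: RunContext) -> None:
    pass  # would append any errors Sentry captured during the run to ctx.errors


def open_pr(ctx: RunContext) -> None:
    ctx.pr_url = "https://example.com/pr/1"  # would push a branch and open a PR


def request_review(ctx: RunContext) -> None:
    pass  # would trigger the CodeRabbit review on ctx.pr_url


BUILD_STAGES: List[Stage] = [clone_in_sandbox, generate_patch, run_project, collect_traces]
SHIP_STAGES: List[Stage] = [open_pr, request_review]
```

Calling `run_pipeline(RunContext("https://github.com/example/repo", "add input validation"), BUILD_STAGES, SHIP_STAGES)` logs all six stage names and sets `pr_url`. If any build stage appends to `ctx.errors`, the log ends with "stopped: runtime errors captured" and no PR is opened. Splitting the stages into "build" and "ship" lists keeps that gate in one obvious place.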

Win: we cut our PR cycle time from 2 hours to 3 minutes! We are definitely going to keep using it.

What we learned

The biggest thing I learned is that AI coding gets much more powerful when it can see runtime behavior, not just source code.

I also learned how much of software development is actually workflow. Environments, testing, and reviewing are just as important as writing code, and those steps can be automated too.

Productionizing the product

Integrate with enterprise version control systems
Simulate customer traffic in sandboxes
Run Sentry within the sandbox

Challenges

One challenge was connecting code generation with real execution. LLMs do not naturally understand runtime failures, so I had to rely on sandbox runs and Sentry traces as feedback signals.
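One way to read "feedback signals" is as a bounded retry loop: generate a patch, run it, and if the run produced errors, hand those errors to the next generation. The sketch below is an assumption about the control flow, not code from the project; `generate`, `run`, and the `max_attempts` cap are hypothetical, with the real versions backed by the LLM, the Daytona sandbox, and Sentry.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: generate(task, feedback) returns a patch,
# run(patch) returns the error messages captured during a sandbox run.
Generate = Callable[[str, List[str]], str]
Run = Callable[[str], List[str]]


def generate_until_clean(
    task: str, generate: Generate, run: Run, max_attempts: int = 3
) -> Tuple[Optional[str], int]:
    """Regenerate a patch, feeding captured errors back in, until one run is clean.

    Returns (patch, attempts_used); patch is None if every attempt failed.
    """
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        patch = generate(task, feedback)
        errors = run(patch)
        if not errors:
            return patch, attempt
        feedback = errors  # the next generation sees what failed in this run
    return None, max_attempts
```

Capping the attempts bounds both cost and time: a task the model cannot fix ends with no patch instead of looping forever.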

Another challenge was automating all the small developer habits, like structuring commits and PRs, in a way that still feels human.
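As one example of such a habit, git's own documentation suggests starting a commit message with a short summary line (under 50 characters), then a blank line, then a fuller description. The sketch below applies that convention plus the common 72-column body wrap. It is an illustration only; we are not claiming the project formats commits exactly this way, and `format_commit_message` is a hypothetical helper.

```python
import textwrap


def format_commit_message(summary: str, details: str = "", width: int = 72) -> str:
    """Build a commit message: short subject line, blank line, wrapped body."""
    subject = summary.strip().rstrip(".")
    if len(subject) > 50:
        # Keep the subject at 50 characters, marking the cut with "...".
        subject = subject[:47].rstrip() + "..."
    # Collapse whitespace in the details, then wrap the body at `width` columns.
    body = textwrap.fill(" ".join(details.split()), width=width)
    return f"{subject}\n\n{body}" if body else subject
```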

LLMs tend to hallucinate fixes when they can't see runtime state. Our solution: we pipe Sentry stack traces directly into the prompt context, so the model sees exactly what failed and why.
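Piping a trace into the prompt can be sketched as two small functions: one renders the exception type, message, and innermost frames as text, and one places that text next to the task and the failing patch. The event shape here is a simplified assumption loosely modeled on the exception data in a Sentry event (`exception.values[*].stacktrace.frames`, ordered oldest to newest); the helper names and prompt wording are hypothetical, not the project's actual prompt.

```python
from typing import Any, Dict, List


def format_trace(event: Dict[str, Any], max_frames: int = 5) -> str:
    """Render the exceptions in a simplified Sentry-style event as plain text."""
    lines: List[str] = []
    for exc in event.get("exception", {}).get("values", []):
        lines.append(f"{exc.get('type', 'Error')}: {exc.get('value', '')}")
        frames = exc.get("stacktrace", {}).get("frames", [])
        # Frames are ordered oldest to newest, so keep the last (innermost) ones.
        for frame in frames[-max_frames:]:
            lines.append(
                f"  at {frame.get('function', '?')} "
                f"({frame.get('filename', '?')}:{frame.get('lineno', '?')})"
            )
    return "\n".join(lines)


def build_fix_prompt(task: str, patch: str, event: Dict[str, Any]) -> str:
    """Combine the task, the failing patch, and the runtime trace into one prompt."""
    return (
        f"Task:\n{task}\n\n"
        f"Your previous patch:\n{patch}\n\n"
        "It failed at runtime with this error (captured by Sentry):\n"
        f"{format_trace(event)}\n\n"
        "Fix the root cause shown in the trace. Return only the updated patch."
    )
```

Keeping only the last few frames is a choice to keep the prompt short; the innermost frames usually point at the code that actually failed.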

This project explores a simple idea:

Task→Code→Run→Trace→PR

Not just AI that writes code, but AI that finishes the job.