Inspiration
Every developer knows the pain of "Pager Fatigue": waking up to Sentry alerts for trivial bugs like type errors, null pointer exceptions, or dependency conflicts. We realized that 80% of these bugs follow a predictable pattern: Detect → Reproduce → Patch → Test → PR. Why are humans still doing this manually?
We wanted to build an autonomous first responder—an AI agent that doesn't just chat, but acts. We were inspired by the vision of "Self-Healing Software," where a repository can identify its own wounds and stitch them up before a human engineer even finishes their morning coffee.
What it does
Reflexiv-PR is an end-to-end autonomous coding agent that monitors your application for crashes and fixes them in real time.
Detects: Listens for Sentry alerts via webhooks.
Isolates: Instantly spins up a secure, ephemeral Daytona Sandbox, so the bug reproduction happens in a clean environment and never touches your local machine.
Reproduces: Clones the repo and runs the test suite to confirm the bug exists.
Fixes: Uses Google Gemini to analyze the stack trace and generate a code patch.
Verifies: Applies the patch and re-runs the tests. If they fail, it feeds the error back to Gemini and tries again (effectively "debugging itself").
Delivers: Once tests pass, it automatically opens a GitHub Pull Request with the fix, ready for review.

How we built it
We architected a micro-agent system using Python (FastAPI) as the brain and Daytona as the muscle.
Orchestration Engine: A Python worker queue that manages the lifecycle of "Missions" (bug fix attempts).
The Sandbox (Daytona): We used the Daytona SDK to programmatically spawn isolated, Dockerized development environments. This was crucial for running untrusted code and installing dependencies (npm install, pip install) safely.
The Brain (Gemini): We engineered a prompt pipeline that feeds Sentry stack traces + file context to Google Gemini. We implemented a custom Patcher that robustly parses LLM responses into valid Git diffs.
Mission Control: A Next.js dashboard that streams real-time logs from the agent, giving us "God Mode" visibility into every step (Provisioning, Patching, Validating).
Connectivity: We bridged the gap between Sentry Cloud and our local agent using ngrok tunnels.

Challenges we ran into
The "It Works on My Machine" Problem: Initially, the agent failed because it lacked the right dependencies (like Node.js) in the sandbox. We solved this by implementing custom Docker image builds via the Daytona SDK (Image.from_dockerfile), ensuring the agent always has the environment it needs.
Handling Bad AI Output: Gemini would sometimes return invalid diffs or hallucinate file paths. We built a Retry Guardrail system that catches these errors, gives specific feedback to the model (e.g., "You missed a closing brace"), and forces it to regenerate the solution.
Timeout & Latency: Building environments from scratch took time. We optimized this by leveraging Snapshot Caching in Daytona, reducing provisioning time from minutes to seconds.
Silent Failures: Connecting "real" Sentry webhooks to localhost was tricky. We debugged this by building a custom simulation script (trigger_sentry_alert.py) to mimic webhooks, allowing us to test the pipeline rapidly without waiting on cloud latency.
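
The Retry Guardrail idea (try a patch, run the tests, feed the failure back, regenerate) can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the project's actual code: fix_with_retries, RunResult, MAX_ATTEMPTS, and the two stand-in functions are hypothetical names, and the Gemini call and Daytona sandbox run are replaced by stubs so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # hypothetical retry budget; the write-up does not state one


@dataclass
class RunResult:
    passed: bool
    error: str = ""  # failure output that gets fed back to the model


def fix_with_retries(
    stack_trace: str,
    generate_patch: Callable[[str, Optional[str]], str],
    apply_and_test: Callable[[str], RunResult],
    max_attempts: int = MAX_ATTEMPTS,
) -> Optional[str]:
    """Return the first patch that passes the tests, or None if the budget runs out."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        # Ask the model for a patch, including the previous failure if there was one.
        patch = generate_patch(stack_trace, feedback)
        result = apply_and_test(patch)
        if result.passed:
            return patch
        feedback = result.error  # circular step: Fail -> Learn -> Retry
    return None


# Stand-ins for the real model call and the sandboxed test run (both hypothetical).
def fake_model(stack_trace: str, feedback: Optional[str]) -> str:
    return "good-patch" if feedback else "bad-patch"


def fake_sandbox(patch: str) -> RunResult:
    if patch == "good-patch":
        return RunResult(passed=True)
    return RunResult(passed=False, error="You missed a closing brace")


if __name__ == "__main__":
    print(fix_with_retries("TypeError: x is undefined", fake_model, fake_sandbox))
```

Passing the model call and the test runner in as plain callables keeps the loop itself testable without a sandbox or an API key; a real agent would presumably also cap total time, not just attempt count.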

Accomplishments that we're proud of
True Autonomy: We watched the agent receive a real TypeError from a React app, spin up a sandbox, write a fix, pass the tests, and open a PR, all in under 64 seconds.
Self-Healing Capabilities: The agent successfully detects when a patch fails tests and iterates on its own solution without human intervention.
Seamless Developer Experience: The dashboard makes the AI's "thought process" visible. You don't just see a PR; you see exactly how it got there.

What we learned
Context is King: An LLM is only as good as the data you feed it. Providing the exact file contents + the Sentry stack trace drastically improved fix rates compared to just giving the error message.
Sandboxes are Essential: You cannot build a safe coding agent without isolation. Daytona made it trivial to create disposable environments that we could trash after every run.
Agents need Feedback Loops: Linear pipelines fail. Circular pipelines (Try → Fail → Learn → Retry) succeed.

What's next for Reflexiv-PR
Proactive Security Patching: Monitoring CVE feeds and auto-patching package.json vulnerabilities.
Multi-File Refactoring: Moving beyond single-file bug fixes to handle architectural changes.
Auto-Deploy: Integration with Vercel/Netlify to automatically deploy fixes that pass high-confidence thresholds.
CodeRabbit Integration: We started adding this! We want deeper integration where the agent creates a PR, CodeRabbit reviews it, and the agent fixes the review comments automatically.