Repo Doctor: Self-Healing CI Agent for GitHub Repos

Inspiration

Much day-to-day debugging follows one loop: reproduce → isolate → hypothesize → patch → re-run. Even experienced engineers lose hours to flaky environments, noisy logs, and “one more try” debugging. We noticed that most AI assistants still optimize for advice, not outcomes: they explain what might be wrong, but they don’t reliably prove the fix works.

Repo Doctor was inspired by the idea that the most useful AI developer tool is a closed-loop agent: it should run the code, observe failures, propose a minimal change, and verify success by re-running tests. That “verified fix” experience is what we wanted to make fast, repeatable, and demo-friendly.

What it does

Repo Doctor is an agentic “self-healing CI” app:

Takes a public GitHub repository URL

Clones the repo into a sandbox

Detects or accepts a test command

Runs tests and extracts a clean failure digest

Uses Gemini 3 to generate a minimal patch as a structured unified diff + root-cause explanation

Applies the patch and re-runs tests

Iterates for a bounded number of attempts until one of two outcomes:

✅ tests pass (verified fix), or

❌ stops with an explainable reason (unsupported build, timeouts, repeated failure, etc.)

The final output is PR-style: diff, rationale, and the exact verification command + results.

How we built it

Stack

Frontend: React (Vite)

Backend: FastAPI

Execution: Docker sandbox for cloning + running tests with strict timeouts and resource limits

Deployment: containerized app on Google Cloud Run with a public, no-login link

Core pipeline

Repo ingest

Clone repo

Detect project type (Node/Python/Go/Java) and pick a reasonable default test command
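
As a rough illustration of the detection step, the sketch below maps marker files to default test commands. The marker table, the command strings, and the name `detect_test_command` are assumptions made for this example, not the project's actual rules.

```python
from pathlib import Path
from typing import Optional

# Hypothetical marker-file table (assumption): first match wins.
DEFAULT_TEST_COMMANDS = [
    ("package.json", "npm test"),
    ("go.mod", "go test ./..."),
    ("pom.xml", "mvn test"),
    ("build.gradle", "gradle test"),
    ("pyproject.toml", "pytest"),
    ("setup.py", "pytest"),
    ("requirements.txt", "pytest"),
]


def detect_test_command(repo_root: str, override: Optional[str] = None) -> Optional[str]:
    """Return the user-supplied command if given, else a default from marker files."""
    if override:
        return override
    root = Path(repo_root)
    for marker, command in DEFAULT_TEST_COMMANDS:
        if (root / marker).is_file():
            return command
    # No known marker: the caller can stop with an "unsupported build" reason.
    return None
```

Returning `None` rather than guessing keeps the "unsupported build" stop condition explicit.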

Runner

Execute tests inside a sandbox

Capture stdout/stderr/exit code + duration
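
A minimal sketch of the capture step, assuming the test command is launched as a subprocess with a wall-clock timeout. `RunResult` and `run_tests` are hypothetical names; in the real system the command would run inside the Docker sandbox rather than directly on the host.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunResult:
    exit_code: int      # -1 is used here (by assumption) to mean "timed out"
    stdout: str
    stderr: str
    duration_s: float
    timed_out: bool


def _to_text(value) -> str:
    """Normalize captured output, which may be None or bytes after a timeout."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode(errors="replace")
    return value


def run_tests(argv: List[str], cwd: Optional[str] = None, timeout_s: float = 300.0) -> RunResult:
    """Run a command, capturing stdout, stderr, exit code, and duration."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv, cwd=cwd, capture_output=True, text=True, timeout=timeout_s
        )
        return RunResult(proc.returncode, proc.stdout, proc.stderr,
                         time.monotonic() - start, False)
    except subprocess.TimeoutExpired as exc:
        return RunResult(-1, _to_text(exc.stdout), _to_text(exc.stderr),
                         time.monotonic() - start, True)
```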

Failure digest

Compress logs into the minimum useful signal (assertions, stack trace, failing file/test)
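
One simple way to approximate this compression is a keyword filter that keeps only lines resembling failures plus a little context. The pattern list and the function `failure_digest` below are illustrative assumptions; the actual digest logic is not described in detail here.

```python
import re

# Hypothetical signal patterns (assumption): lines that usually carry the failure.
SIGNAL = re.compile(r"Traceback|Error|Exception|FAILED|FAIL:|assert|^\s*File \"")


def failure_digest(log: str, context: int = 1, max_lines: int = 40) -> str:
    """Keep signal lines plus `context` neighbours; return at most the last `max_lines`."""
    lines = log.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if SIGNAL.search(line):
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            keep.update(range(lo, hi))
    picked = [lines[i] for i in sorted(keep)]
    # The final error usually sits near the end of the log, so keep the tail.
    return "\n".join(picked[-max_lines:])
```

A filter like this is lossy by design: it trades completeness for a prompt that stays small and focused on the failing test.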

Gemini 3 repair

Provide repo structure + failure digest + constraints (“minimal patch”, “no refactor”, “output unified diff”)

Request structured output so patch application is deterministic
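
To show why structured output makes patch application deterministic, here is a sketch that builds the prompt and validates a JSON reply before anything touches the repo. The field names `root_cause` and `unified_diff`, and both function names, are assumptions for illustration; the call to the Gemini API itself is deliberately left out.

```python
import json
from dataclasses import dataclass


@dataclass
class RepairProposal:
    root_cause: str
    unified_diff: str


def build_prompt(repo_tree: str, digest: str) -> str:
    """Assemble the repair request with the constraints named above."""
    return (
        "Constraints: minimal patch, no refactor, output unified diff.\n"
        'Reply as JSON with string fields "root_cause" and "unified_diff".\n\n'
        f"Repo structure:\n{repo_tree}\n\nFailure digest:\n{digest}\n"
    )


def parse_proposal(raw: str) -> RepairProposal:
    """Validate the model reply; raise ValueError if it is not usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for key in ("root_cause", "unified_diff"):
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
    diff = data["unified_diff"]
    # Cheap shape check only; a real patch tool still has the final say.
    if "--- " not in diff or "+++ " not in diff or "@@" not in diff:
        raise ValueError("unified_diff does not look like a unified diff")
    return RepairProposal(data["root_cause"], diff)
```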

Patch apply + verify

Apply diff

Re-run tests

Repeat (bounded attempts) until green or stop condition
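
The bounded loop can be sketched as below. The three callables stand in for the runner, the model call, and the patch step; their signatures, the status strings, and the "repeated failure" rule (stop when a digest reappears) are assumptions chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LoopOutcome:
    status: str   # "already_green", "verified_fix", or "stopped"
    reason: str
    attempts: int
    diffs: List[str] = field(default_factory=list)


def repair_loop(
    run_tests: Callable[[], Tuple[bool, str]],   # returns (passed, failure digest)
    propose_patch: Callable[[str], str],         # digest -> unified diff
    apply_patch: Callable[[str], bool],          # diff -> applied cleanly?
    max_attempts: int = 3,
) -> LoopOutcome:
    passed, digest = run_tests()
    if passed:
        return LoopOutcome("already_green", "tests pass before any patch", 0)
    seen = {digest}
    diffs: List[str] = []
    for attempt in range(1, max_attempts + 1):
        diff = propose_patch(digest)
        if not apply_patch(diff):
            return LoopOutcome("stopped", "patch did not apply cleanly", attempt, diffs)
        diffs.append(diff)
        passed, digest = run_tests()
        if passed:
            return LoopOutcome("verified_fix", "tests pass after patch", attempt, diffs)
        if digest in seen:
            return LoopOutcome("stopped", "repeated failure", attempt, diffs)
        seen.add(digest)
    return LoopOutcome("stopped", "attempt budget exhausted", max_attempts, diffs)
```

Passing the steps in as callables keeps the loop testable without Docker or a model call.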

User interface

A live timeline: Clone → Run Tests → Diagnose → Patch → Re-test → Result

A diff viewer for review and copy

Shareable run links so judges can revisit results

Challenges we ran into

Reproducibility across random repos

Different build tools, missing dependencies, flaky tests, and huge install times. Fix: curated demo repos, safe defaults, an optional “custom test command” field, and strict timeouts.

Safe execution of untrusted code

Running arbitrary code is risky. Fix: Docker sandboxing, resource caps, restricted commands, file-size limits, no privileged operations.
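
As an illustration of these guardrails, the helper below assembles a `docker run` command line with resource caps and dropped privileges. The image name, limits, and mount path are placeholder assumptions, and `sandbox_argv` is a hypothetical helper; the flags themselves are standard Docker CLI options.

```python
from typing import List


def sandbox_argv(
    repo_dir: str,
    test_command: str,
    image: str = "python:3.12-slim",   # placeholder image (assumption)
    memory: str = "1g",
    cpus: str = "1.0",
    pids_limit: int = 256,
) -> List[str]:
    """Build a `docker run` argv that runs the test command under tight limits."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                      # no network during the test run
        "--memory", memory,                       # memory cap
        "--cpus", cpus,                           # CPU cap
        "--pids-limit", str(pids_limit),          # guards against fork bombs
        "--cap-drop", "ALL",                      # drop Linux capabilities
        "--security-opt", "no-new-privileges",    # block privilege escalation
        "--volume", f"{repo_dir}:/workspace",
        "--workdir", "/workspace",
        image,
        "sh", "-c", test_command,
    ]
```

Disabling the network also blocks dependency installs, so a setup like this assumes dependencies are fetched in an earlier, separately limited step. The wall-clock timeout is left to whatever launches this command.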

Getting minimal patches instead of rewrites

Models often propose refactors that are hard to review and risky. Fix: hard constraints + “minimal diff” instruction + iterative loop: if tests still fail, refine with new evidence.
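
One way to back the “minimal diff” instruction with a hard check is to measure the proposed diff and reject it when it exceeds a budget. The budget values and the names `diff_stats` and `is_minimal` are assumptions for this sketch, and the parsing is a heuristic rather than a full unified-diff parser.

```python
from typing import Tuple


def diff_stats(unified_diff: str) -> Tuple[int, int]:
    """Return (files touched, added + removed lines) for a unified diff."""
    lines = unified_diff.splitlines()
    files = set()
    changed = 0
    i = 0
    while i < len(lines):
        line = lines[i]
        # Heuristic: a "--- " line directly followed by "+++ " is a file header.
        if line.startswith("--- ") and i + 1 < len(lines) and lines[i + 1].startswith("+++ "):
            files.add(lines[i + 1][4:].split("\t")[0])
            i += 2
            continue
        if line.startswith(("+", "-")):
            changed += 1
        i += 1
    return len(files), changed


def is_minimal(unified_diff: str, max_files: int = 2, max_changed_lines: int = 20) -> bool:
    """Accept only diffs within the (assumed) file and line budgets."""
    n_files, n_changed = diff_stats(unified_diff)
    return 0 < n_files <= max_files and 0 < n_changed <= max_changed_lines
```

A rejected diff can be fed back into the next attempt as extra evidence, alongside the failure digest.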

Latency vs reasoning depth

Deep diagnosis helps, but long waits kill demos. Fix: bounded loop, visible progress steps, and mode-switching between deeper reasoning for diagnosis and fast updates for UI.

Accomplishments that we're proud of

Built a true closed-loop system: not “AI suggestions,” but fixes verified by re-running tests.

Produced PR-style outputs that feel reviewable and professional: diff + explanation + verification command.

Made the demo judge-proof: no login, public deployment, predictable curated repos, and clear failure modes.

Implemented guardrails so the app fails safely and transparently instead of crashing or hanging.

What we learned

Verification beats persuasion. A fix isn’t real until tests pass; showing “before/after” test results builds instant trust.

Structure is everything. For automation, structured diffs and machine-readable outputs matter more than verbose text.

Constraints make agentic systems usable. Bounded iterations, timeouts, and stop conditions turn a cool idea into a stable product.

Great demos are curated. Reliability in front of judges matters more than claiming universal coverage.

What's next for Repo Doctor: Self-Healing CI Agent for GitHub Repos

Expand language/build support (monorepos, multi-service repos, more test runners).

Add “patch risk scoring” (e.g., touches only tests vs production code) and a rollback/compare view.
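
A first cut at such scoring could simply classify which kind of files a patch touches. The path hints, the labels, and the function `patch_scope` are assumptions for illustration; nothing like this exists in the project yet.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Hypothetical conventions (assumption) for recognizing test files.
TEST_DIRS = ("test", "tests", "spec", "__tests__")
TEST_SUFFIXES = ("_test.py", "_test.go", ".test.js", ".spec.js", "Test.java")


def is_test_path(path: str) -> bool:
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    name = parts[-1]
    in_test_dir = any(part in TEST_DIRS for part in parts[:-1])
    return in_test_dir or name.startswith("test_") or name.endswith(TEST_SUFFIXES)


def patch_scope(paths: Iterable[str]) -> str:
    """Label a patch as "none", "tests-only", "mixed", or "production"."""
    flags = [is_test_path(p) for p in paths]
    if not flags:
        return "none"
    if all(flags):
        return "tests-only"
    if any(flags):
        return "mixed"
    return "production"
```

A "tests-only" label deserves a reviewer's attention too, since editing tests is one way a patch can turn a run green without fixing the underlying bug.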

Generate a lightweight postmortem after a fix: root cause, why the patch works, and prevention tips.

Add optional integrations: GitHub PR creation, CI re-run hooks, and team workflows (while keeping a no-login demo mode).

Built With

Languages: Python, JavaScript, HTML/CSS

Frontend: React, Vite

Backend: FastAPI (Python), Uvicorn

AI / API: Google Gemini 3 API

Sandbox & Execution: Docker (isolated test runner), Linux shell utilities for build/test, Git (repo cloning)

Cloud / Hosting: Google Cloud Run

Dev Tooling: GitHub (repo + version control)

Updates

Souvik Ghosh started this project — Feb 08, 2026 11:07 PM EST
