Inspiration

CI failures waste hours: a flaky test, a subtle regression, a dependency mismatch, a brittle mock. The painful part isn’t “knowing it failed” — it’s the slow loop of reading logs, guessing fixes, re-running tests, and repeating.

We wanted an agent that works like a senior engineer on-call: reproduce → diagnose → patch → verify → iterate, until the build is green — and do it autonomously.

What it does

RepoDoctor is an autonomous build-fixing agent powered by Gemini 3. You submit a GitHub repo URL or upload a ZIP, and it:

  • Runs a baseline test attempt inside a sandboxed container
  • Diagnoses failures by analyzing the logs
  • Generates a targeted patch (as a unified diff)
  • Applies the patch + re-runs tests
  • Repeats until tests pass (or max attempts)
  • If there are no Unit Tests, then it switches to improvement mode
  • Provides an option to create a GitHub Pull Request with an auto-generated explanation and risk assessment of the changes

This full “diagnose → patch → run → repeat” loop is the core workflow. There’s also a live site you can try at https://repodoctor.onrender.com/

How we built it

How we used Gemini 3

RepoDoctor is designed around three Gemini 3 capabilities that make autonomous repair reliable.

1) Thinking control (thinkingLevel).

We dial reasoning depth based on the job step to optimize speed and iteration:

  • Log analysis: thinkingLevel="MINIMAL" for fast structured extraction
  • Patch generation: thinkingLevel="LOW" for balanced reasoning that improves across attempts

This split is explicitly implemented as part of the workflow.

2) Structured outputs (responseSchema + JSON)

Every model response is forced into a strict JSON schema (including unified_diff, touched_files, risk_level, etc.). If the JSON is invalid, RepoDoctor refuses to apply anything (fails safely).

This is exactly the kind of “agentic workflow reliability” structured outputs are meant to enable.

3) Multi-turn history (closed-loop memory)

Each attempt preserves per-job history so Gemini can reference previous failures, avoid repeating bad patches, and build incremental context across iterations.

The design uses a thoughtSignature field to carry forward state across turns.

Implementation details (Gemini API)

RepoDoctor calls the Gemini API via the v1beta endpoint, using gemini-3-flash-preview, JSON response mime type, schema enforcement, and retry/backoff on rate limits.

RepoDoctor is a 3-part system

RepoDoctor is a 3-part system:

  • Web app UI to submit a repo/ZIP, set max attempts + network toggle, and watch progress in real time (SSE)
  • Backend orchestrator that manages jobs/attempts, calls Gemini, validates diffs, and drives the loop
  • Runner sandbox that executes builds/tests in a constrained container environment

Security model

Untrusted code is executed in a sandbox with defense-in-depth: non-root containers, CPU/memory/time limits, optional --network none, file/path validation, and input limits (ZIP size + file count).

Challenges we ran into

  • Rate limits: handled with retries + exponential backoff on 429s
  • Malformed patches: strict schema + diff validation + safe apply checks before execution
  • Infinite loops: multi-turn context and explicit attempt history reduce repeated failure modes
  • Sandbox safety: hardened runner + strict resource and file limits

Accomplishments that we're proud of

  • True autonomy: it doesn’t just suggest fixes — it applies + verifies and iterates until done
  • Reliable agent behavior: structured outputs + validation prevent unsafe or unusable actions
  • Real-time UX: live progress + clear attempt history, with optional PR creation workflow
  • Practical impact: targets a daily developer pain point (broken builds/tests)

What we learned

  • Thinking levels are game-changing: Being able to dial reasoning depth based on task complexity dramatically improves both speed and cost efficiency
  • Structured outputs enable autonomy: Without guaranteed JSON schemas, building reliable agentic workflows would require fragile parsing and extensive error handling.
  • Multi-turn context is essential for iterative tasks: Single-shot prompting fails for problems requiring trial-and-error approaches

What's next for Repo Doctor

  • Smarter file targeting (embeddings / relevance selection)
  • Parallel patch exploration (multiple candidate diffs evaluated concurrently)
  • More ecosystems (Go, Rust, C/C++, mono-repos)
  • IDE integration (fix failing tests without leaving the editor)
  • Self-hosted mode for private repos / enterprise runners

Built With

Share this project:

Updates