Inspiration
CI failures waste hours: a flaky test, a subtle regression, a dependency mismatch, a brittle mock. The painful part isn’t “knowing it failed” — it’s the slow loop of reading logs, guessing fixes, re-running tests, and repeating.
We wanted an agent that works like a senior engineer on-call: reproduce → diagnose → patch → verify → iterate, until the build is green — and do it autonomously.
What it does
RepoDoctor is an autonomous build-fixing agent powered by Gemini 3. You submit a GitHub repo URL or upload a ZIP, and it:
- Runs a baseline test attempt inside a sandboxed container
- Diagnoses failures by analyzing the logs
- Generates a targeted patch (as a unified diff)
- Applies the patch + re-runs tests
- Repeats until tests pass (or max attempts)
- If there are no Unit Tests, then it switches to improvement mode
- Provides an option to create a GitHub Pull Request with an auto-generated explanation and risk assessment of the changes
This full “diagnose → patch → run → repeat” loop is the core workflow. There’s also a live site you can try at https://repodoctor.onrender.com/
How we built it
How we used Gemini 3
RepoDoctor is designed around three Gemini 3 capabilities that make autonomous repair reliable.
1) Thinking control (thinkingLevel).
We dial reasoning depth based on the job step to optimize speed and iteration:
- Log analysis: thinkingLevel="MINIMAL" for fast structured extraction
- Patch generation: thinkingLevel="LOW" for balanced reasoning that improves across attempts
This split is explicitly implemented as part of the workflow.
2) Structured outputs (responseSchema + JSON)
Every model response is forced into a strict JSON schema (including unified_diff, touched_files, risk_level, etc.). If the JSON is invalid, RepoDoctor refuses to apply anything (fails safely).
This is exactly the kind of “agentic workflow reliability” structured outputs are meant to enable.
3) Multi-turn history (closed-loop memory)
Each attempt preserves per-job history so Gemini can reference previous failures, avoid repeating bad patches, and build incremental context across iterations.
The design uses a thoughtSignature field to carry forward state across turns.
Implementation details (Gemini API)
RepoDoctor calls the Gemini API via the v1beta endpoint, using gemini-3-flash-preview, JSON response mime type, schema enforcement, and retry/backoff on rate limits.
RepoDoctor is a 3-part system
RepoDoctor is a 3-part system:
- Web app UI to submit a repo/ZIP, set max attempts + network toggle, and watch progress in real time (SSE)
- Backend orchestrator that manages jobs/attempts, calls Gemini, validates diffs, and drives the loop
- Runner sandbox that executes builds/tests in a constrained container environment
Security model
Untrusted code is executed in a sandbox with defense-in-depth: non-root containers, CPU/memory/time limits, optional --network none, file/path validation, and input limits (ZIP size + file count).
Challenges we ran into
- Rate limits: handled with retries + exponential backoff on 429s
- Malformed patches: strict schema + diff validation + safe apply checks before execution
- Infinite loops: multi-turn context and explicit attempt history reduce repeated failure modes
- Sandbox safety: hardened runner + strict resource and file limits
Accomplishments that we're proud of
- True autonomy: it doesn’t just suggest fixes — it applies + verifies and iterates until done
- Reliable agent behavior: structured outputs + validation prevent unsafe or unusable actions
- Real-time UX: live progress + clear attempt history, with optional PR creation workflow
- Practical impact: targets a daily developer pain point (broken builds/tests)
What we learned
- Thinking levels are game-changing: Being able to dial reasoning depth based on task complexity dramatically improves both speed and cost efficiency
- Structured outputs enable autonomy: Without guaranteed JSON schemas, building reliable agentic workflows would require fragile parsing and extensive error handling.
- Multi-turn context is essential for iterative tasks: Single-shot prompting fails for problems requiring trial-and-error approaches
What's next for Repo Doctor
- Smarter file targeting (embeddings / relevance selection)
- Parallel patch exploration (multiple candidate diffs evaluated concurrently)
- More ecosystems (Go, Rust, C/C++, mono-repos)
- IDE integration (fix failing tests without leaving the editor)
- Self-hosted mode for private repos / enterprise runners

Log in or sign up for Devpost to join the conversation.