Inspiration
CI failures still wake engineers up at night. We wanted to see if an AI system could take over that 2 AM debugging loop: read logs, understand the codebase, propose a fix, test it safely, and decide whether to resolve or escalate. Instead of another chatbot, we aimed to build something that actually executes.
What it does
Nightingale is an autonomous CI repair agent powered by Gemini 3. When a pipeline fails, it analyzes the failure context, generates a structured fix plan, applies it in an isolated sandbox, reruns the tests, and computes a weighted confidence score. If confidence is high, it resolves the failure. If not, it escalates to a human with a detailed report.
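The writeup does not publish the scoring formula. The following is only a minimal Python sketch of how a weighted confidence score could drive the resolve-or-escalate decision. The signal names (`tests_passed`, `no_regressions`, `model_self_rating`), their weights, and the 0.8 threshold are all hypothetical.

```python
# Hypothetical signal weights; the real Nightingale weights are not published.
WEIGHTS = {
    "tests_passed": 0.5,       # fraction of previously failing tests that now pass
    "no_regressions": 0.3,     # 1.0 if no previously passing test broke, else 0.0
    "model_self_rating": 0.2,  # the model's own 0..1 rating of its fix plan
}
RESOLVE_THRESHOLD = 0.8  # hypothetical cutoff between "resolve" and "escalate"


def confidence(signals: dict[str, float]) -> float:
    """Weighted average of the signals, each clamped to [0, 1]."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        # Refuse to score with incomplete evidence instead of assuming zero.
        raise ValueError(f"missing signals: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    score = sum(w * min(max(signals[name], 0.0), 1.0) for name, w in WEIGHTS.items())
    return score / total_weight


def decide(signals: dict[str, float]) -> str:
    """Return 'resolve' when confidence clears the threshold, else 'escalate'."""
    return "resolve" if confidence(signals) >= RESOLVE_THRESHOLD else "escalate"
```

For example, a fix that makes every failing test pass with no regressions and a model self-rating of 0.9 scores 0.98 and resolves, while the same fix passing only half the failing tests scores 0.73 and escalates. Raising on a missing signal, rather than treating it as zero, keeps the scorer in line with a fail-loudly design.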
How we built it
We designed Nightingale as a multi-stage system: a listener for CI events, a reasoning agent using Gemini 3, a sandbox execution layer, a verification agent, and a confidence engine. All fixes run in isolation, and every decision is backed by structured JSON outputs and validation. We focused heavily on making the reasoning loop reflective and measurable rather than just generative.
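To illustrate the structured-output validation step, here is a minimal stdlib-only sketch. It parses a model's JSON fix plan and rejects anything malformed before the plan reaches the sandbox. The plan shape (`diagnosis`, `self_rating`, and `edits` entries with `path` and `new_content`) and the names `FixPlan`, `FileEdit`, and `parse_fix_plan` are hypothetical. A library such as `jsonschema` could replace the hand-written checks.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEdit:
    path: str         # repository-relative path (hypothetical plan format)
    new_content: str  # full replacement content for that file


@dataclass(frozen=True)
class FixPlan:
    diagnosis: str
    edits: list[FileEdit]
    self_rating: float


class PlanValidationError(ValueError):
    """Raised when model output does not match the expected plan shape."""


def parse_fix_plan(raw: str) -> FixPlan:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise PlanValidationError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise PlanValidationError("top-level value must be an object")

    diagnosis = data.get("diagnosis")
    if not isinstance(diagnosis, str) or not diagnosis.strip():
        raise PlanValidationError("'diagnosis' must be a non-empty string")

    rating = data.get("self_rating")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rating, bool) or not isinstance(rating, (int, float)) or not 0 <= rating <= 1:
        raise PlanValidationError("'self_rating' must be a number in [0, 1]")

    raw_edits = data.get("edits")
    if not isinstance(raw_edits, list) or not raw_edits:
        raise PlanValidationError("'edits' must be a non-empty list")
    edits = []
    for i, e in enumerate(raw_edits):
        if not (isinstance(e, dict) and isinstance(e.get("path"), str)
                and isinstance(e.get("new_content"), str)):
            raise PlanValidationError(f"edit {i} needs string 'path' and 'new_content'")
        # Reject absolute paths and parent-directory hops out of the repo.
        if e["path"].startswith("/") or ".." in e["path"].split("/"):
            raise PlanValidationError(f"edit {i} path escapes the repository")
        edits.append(FileEdit(e["path"], e["new_content"]))
    return FixPlan(diagnosis, edits, float(rating))
```

Every rejection raises a typed error instead of returning a partial plan, so a malformed response can only lead to a retry or an escalation, never to a half-applied fix.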
Challenges we ran into
The biggest challenge was reliability. Model access, quota behavior, and endpoint differences forced us to learn the Gemini API stack in depth. We also had to prevent silent failures and make sure no simulated logic slipped into the pipeline. Making the system fail loudly and safely was harder than making it work.
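One way to make model failures loud rather than silent is to retry only errors known to be transient, such as quota limits, and otherwise raise. The pipeline then escalates instead of substituting a fake fix. This minimal sketch assumes a hypothetical `TransientModelError` raised by the caller's own API client wrapper. It does not use any real Gemini SDK exception type.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientModelError(Exception):
    """Hypothetical: raised by the caller's client wrapper for quota/rate-limit errors."""


class ModelUnavailable(RuntimeError):
    """Raised when retries run out; the pipeline should escalate, not fake a fix."""


def call_with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call`, retrying transient errors with exponential backoff.

    Any other exception propagates immediately, so unexpected failures stay loud.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TransientModelError as exc:
            if attempt == attempts - 1:
                raise ModelUnavailable(f"model call failed after {attempts} attempts") from exc
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... with the defaults
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so that tests can record the backoff delays instead of actually waiting.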
Accomplishments that we're proud of
We built a full end-to-end autonomous repair loop with sandbox isolation, schema validation, and transparent confidence scoring. There are no hardcoded patches or fake reasoning paths. Every fix is generated, tested, and scored before any decision is made.
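Here is a minimal sketch of the "apply, then test" step. It assumes a fix is a mapping of repository-relative file paths to new file contents, which is a hypothetical format. The code copies the repository into a temporary directory, applies the edits there, and runs the test command with a timeout. The name `run_fix_in_sandbox` is illustrative. A temporary copy only protects the original checkout. It is not a security sandbox, and real isolation would need a container or VM.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_fix_in_sandbox(repo: Path, edits: dict[str, str], test_cmd: list[str],
                       timeout: float = 300.0) -> tuple[bool, str]:
    """Apply edits to a throwaway copy of `repo` and run `test_cmd` inside it.

    Returns (tests_passed, combined_output). The original checkout is never
    modified. This gives filesystem separation only, not process isolation.
    """
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "repo"
        shutil.copytree(repo, work)
        root = work.resolve()
        for rel_path, content in edits.items():
            target = (work / rel_path).resolve()
            # Refuse edits that would write outside the copied repository.
            if not target.is_relative_to(root):
                raise ValueError(f"edit escapes sandbox: {rel_path}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)
        try:
            proc = subprocess.run(test_cmd, cwd=work, capture_output=True,
                                  text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False, f"tests timed out after {timeout}s"
        return proc.returncode == 0, proc.stdout + proc.stderr
```

Treating a timeout as a failed run, rather than as an error, lets a hanging fix flow into scoring and escalation like any other failing fix.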
What we learned
Reasoning alone is not enough. Execution, validation, and safety boundaries matter just as much. We learned how critical it is to control context size, enforce structured outputs, and design escalation paths when confidence drops.
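Controlling context size can be as simple as trimming each CI log before it goes into the prompt. The sketch below keeps a small head, where setup and environment lines live, and a larger tail, where the failing test or stack trace usually appears. It uses a character budget as a rough stand-in for tokens. The function name `fit_log_to_budget` and the default 20/80 head-to-tail split are assumptions.

```python
def fit_log_to_budget(log: str, max_chars: int, head_ratio: float = 0.2) -> str:
    """Trim a CI log to at most `max_chars`, keeping its start and its end.

    `head_ratio` is the share of the remaining budget given to the head;
    the rest goes to the tail, where failures are usually reported.
    """
    if len(log) <= max_chars:
        return log
    if not 0.0 <= head_ratio <= 1.0:
        raise ValueError("head_ratio must be between 0 and 1")
    marker = "\n[... truncated ...]\n"
    if max_chars <= len(marker):
        raise ValueError("budget too small to hold the truncation marker")
    room = max_chars - len(marker)
    head_len = int(room * head_ratio)
    tail_len = room - head_len
    # Slice from len(log) - tail_len so that tail_len == 0 yields an empty tail.
    return log[:head_len] + marker + log[len(log) - tail_len:]
```

The visible marker tells the model that content was removed, so it does not mistake the middle of the log for a complete record.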
What's next for Nightingale
We plan to extend Nightingale to handle multi-file failures, dependency updates, and longer-running repair sessions. The long-term goal is a Marathon Agent that can manage complex CI recovery across entire repositories without human supervision.