Inspiration

Modern AI coding tools can modify code, but they do not own outcomes. Fixing a bug or flaky test still requires humans to verify results, assess risk, and decide whether a change is safe to ship. We wanted to explore what happens when an AI system is responsible not just for writing code, but for proving that a fix actually works.


What it does

Shipless is an autonomous release-engineering agent. Given a concrete engineering goal (such as stabilizing a flaky test), it plans a fix, applies the smallest possible change, verifies the result through repeated test execution, and produces a structured proof pack with diffs, logs, and a final PASS/FAIL decision.
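As a rough illustration of verification through repeated test execution, here is a minimal Python sketch. The names `ProofPack` and `verify_by_repetition`, the field layout, and the default of 20 runs are assumptions made for this example, not Shipless's actual schema or settings.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProofPack:
    """Hypothetical proof-pack layout: goal, diff, per-run logs, decision."""
    goal: str
    diff: str
    runs: int
    failures: int = 0
    logs: List[str] = field(default_factory=list)
    decision: str = "FAIL"  # stays FAIL until the evidence says otherwise


def verify_by_repetition(
    goal: str,
    diff: str,
    run_test: Callable[[], bool],
    runs: int = 20,
) -> ProofPack:
    """Run the test `runs` times and record every outcome in the pack."""
    if runs < 1:
        # Zero runs would mean a PASS with no evidence at all.
        raise ValueError("runs must be at least 1")
    pack = ProofPack(goal=goal, diff=diff, runs=runs)
    for i in range(runs):
        ok = run_test()
        pack.logs.append(f"run {i + 1}: {'pass' if ok else 'fail'}")
        if not ok:
            pack.failures += 1
    # PASS only if every single run passed.
    pack.decision = "PASS" if pack.failures == 0 else "FAIL"
    return pack
```

In this sketch, `run_test` stands in for whatever command actually executes the target test. A single failed run is enough to produce a FAIL decision, which matches the idea that a flaky test is not fixed until it passes consistently.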


How we built it

We built Shipless as a goal-driven agent in Google AI Studio using the Gemini 3 API. Gemini handles repository-wide reasoning, hypothesis generation, and decision-making across a multi-step agent loop. The system orchestrates planning, execution, verification, and reflection, rather than relying on a single prompt or manual intervention.


Challenges we ran into

The biggest challenge was avoiding a prompt-only design. We had to explicitly structure the agent as a deterministic state machine with enforced verification and failure handling, ensuring the system could not claim success without evidence.
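One way to picture a state machine with enforced verification is the Python sketch below. The state names, the `run_agent` function, and the `max_attempts` limit are illustrative assumptions, not the real Shipless implementation. The point it shows is structural: the only transition into `DONE` comes from `VERIFY` with passing evidence.

```python
from enum import Enum, auto
from typing import Any, Callable, List, Tuple


class State(Enum):
    PLAN = auto()
    EXECUTE = auto()
    VERIFY = auto()
    REFLECT = auto()
    DONE = auto()
    FAILED = auto()


def run_agent(
    plan: Callable[[int], Any],      # hypothetical: returns a step for attempt n
    execute: Callable[[Any], None],  # hypothetical: applies the planned step
    verify: Callable[[], bool],      # hypothetical: True only if evidence passes
    max_attempts: int = 3,
) -> Tuple[State, List[State]]:
    """Drive the loop until a terminal state; return it with the full trace."""
    state = State.PLAN
    trace = [state]
    attempts = 0
    step = None
    while state not in (State.DONE, State.FAILED):
        if state is State.PLAN:
            step = plan(attempts)
            state = State.EXECUTE
        elif state is State.EXECUTE:
            execute(step)
            attempts += 1
            state = State.VERIFY
        elif state is State.VERIFY:
            # The only path to DONE runs through a passing verification.
            state = State.DONE if verify() else State.REFLECT
        elif state is State.REFLECT:
            # Retry with a new plan, or stop with an explicit failure state.
            state = State.PLAN if attempts < max_attempts else State.FAILED
        trace.append(state)
    return state, trace
```

Because the transitions are fixed in code rather than left to the model, a model response that merely asserts success cannot move the system to `DONE`.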


Accomplishments that we're proud of

We built a working autonomous loop that reliably fixes a flaky test and proves stability through repeated verification. Shipless produces reproducible artifacts that clearly show before-and-after behavior, demonstrating a true outcome-owning system rather than a coding assistant.


What we learned

AI systems become far more trustworthy when they are designed to own results rather than merely offer suggestions. Verification, rollback rules, and explicit failure states are essential for turning generative models into reliable engineering systems.
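A rollback rule can be as simple as the following Python sketch, which assumes a change that touches a single file. The function name `apply_with_rollback` and the `KEPT` / `ROLLED_BACK` labels are hypothetical and chosen only for this example.

```python
from pathlib import Path
from typing import Callable


def apply_with_rollback(
    path: Path,
    new_text: str,
    verify: Callable[[], bool],
) -> str:
    """Write a change, verify it, and restore the original on failure."""
    original = path.read_text()
    path.write_text(new_text)
    try:
        ok = verify()
    except Exception:
        # A crashing verifier counts as a failed verification.
        ok = False
    if ok:
        return "KEPT"
    # Explicit failure state: undo the change instead of leaving it in place.
    path.write_text(original)
    return "ROLLED_BACK"
```

The design choice here is that an unverified change never survives: the file ends up either in its verified new state or exactly as it was before.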


What's next for Shipless

Next, we plan to extend Shipless to handle broader release tasks such as regression monitoring, performance verification, and multi-goal execution, moving toward a fully autonomous release-engineering workflow.
