Inspiration
Modern AI coding tools can modify code, but they do not own outcomes. Fixing a bug or flaky test still requires humans to verify results, assess risk, and decide whether a change is safe to ship. We wanted to explore what happens when an AI system is responsible not just for writing code, but for proving that a fix actually works.
What it does
Shipless is an autonomous release-engineering agent. Given a concrete engineering goal (such as stabilizing a flaky test), it plans a fix, applies the smallest possible change, verifies the result through repeated test execution, and produces a structured proof pack with diffs, logs, and a final PASS/FAIL decision.
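As a rough illustration of the verify-and-report step described above, here is a minimal sketch in Python. It is not the actual Shipless implementation: the `ProofPack` fields, the `verify_fix` function, and the default of 20 runs are all assumptions made for this example. The only rule it encodes is that a PASS decision requires every repeated run to succeed.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Callable, List


@dataclass
class ProofPack:
    """Hypothetical evidence bundle; the field names are illustrative only."""
    goal: str
    diff: str
    run_logs: List[str] = field(default_factory=list)
    passed_runs: int = 0
    total_runs: int = 0
    decision: str = "FAIL"  # default to FAIL until evidence says otherwise

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


def verify_fix(goal: str, diff: str, run_test: Callable[[], bool],
               runs: int = 20) -> ProofPack:
    """Run the test `runs` times and record each outcome in the proof pack."""
    pack = ProofPack(goal=goal, diff=diff, total_runs=runs)
    for i in range(runs):
        ok = run_test()
        pack.run_logs.append(f"run {i + 1}: {'pass' if ok else 'fail'}")
        if ok:
            pack.passed_runs += 1
    # A single failing run is enough to reject the fix.
    pack.decision = "PASS" if runs > 0 and pack.passed_runs == runs else "FAIL"
    return pack
```

In a real setup, `run_test` would wrap an actual test runner invocation; here it is any callable that returns True on a passing run.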
How we built it
We built Shipless as a goal-driven agent in Google AI Studio using the Gemini 3 API. Gemini handles repository-wide reasoning, hypothesis generation, and decision-making across a multi-step agent loop. The system orchestrates planning, execution, verification, and reflection, rather than relying on a single prompt or manual intervention.
Challenges we ran into
The biggest challenge was avoiding a prompt-only design. We had to explicitly structure the agent as a deterministic state machine with enforced verification and failure handling, ensuring the system could not claim success without evidence.
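One way to picture such a state machine is the sketch below. The state names, the transition table, and the `Agent` class are assumptions chosen for illustration, not the project's actual design. The point it demonstrates is structural: PASS is reachable only from the verification state, and only when evidence has been recorded.

```python
from enum import Enum, auto


class State(Enum):
    PLAN = auto()
    EXECUTE = auto()
    VERIFY = auto()
    REFLECT = auto()
    PASS = auto()
    FAIL = auto()


# Allowed transitions (hypothetical). PASS can only follow VERIFY.
TRANSITIONS = {
    State.PLAN: {State.EXECUTE, State.FAIL},
    State.EXECUTE: {State.VERIFY, State.FAIL},
    State.VERIFY: {State.PASS, State.REFLECT},
    State.REFLECT: {State.PLAN, State.FAIL},
    State.PASS: set(),
    State.FAIL: set(),
}


class Agent:
    def __init__(self) -> None:
        self.state = State.PLAN
        self.evidence: list = []  # e.g. logs from repeated test runs

    def record_evidence(self, item: str) -> None:
        self.evidence.append(item)

    def transition(self, target: State) -> None:
        """Move to `target`, rejecting illegal or unsupported transitions."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        if target is State.PASS and not self.evidence:
            raise ValueError("cannot claim PASS without recorded evidence")
        self.state = target
```

Because the model's output only proposes the next step and the transition table decides whether it is allowed, a confident but unverified claim of success cannot move the agent into PASS.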
Accomplishments that we're proud of
We built a working autonomous loop that reliably fixes a flaky test and proves stability through repeated verification. Shipless produces reproducible artifacts that clearly show before-and-after behavior, demonstrating a true outcome-owning system rather than a coding assistant.
What we learned
AI systems become far more trustworthy when they are designed to own results rather than merely offer suggestions. Verification, rollback rules, and explicit failure states are essential for turning generative models into reliable engineering systems.
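A rollback rule can be sketched in a few lines. The example below is an assumption-laden illustration, not Shipless code: it models the workspace as an in-memory dictionary of file contents, and `apply_with_rollback` is a hypothetical helper. A real agent would more likely rely on version control to restore state.

```python
import copy
from typing import Callable, Dict


def apply_with_rollback(files: Dict[str, str],
                        patch: Dict[str, str],
                        verify: Callable[[Dict[str, str]], bool]) -> bool:
    """Apply `patch` to `files` in place; restore the snapshot if verification fails."""
    snapshot = copy.deepcopy(files)
    files.update(patch)
    if verify(files):
        return True
    # Verification failed: discard the change, including any newly added files.
    files.clear()
    files.update(snapshot)
    return False
```

The function returns an explicit boolean, so the caller always knows whether the change was kept or reverted.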
What's next for Shipless
Next, we plan to extend Shipless to handle broader release tasks such as regression monitoring, performance verification, and multi-goal execution, moving toward a fully autonomous release-engineering workflow.
