Inspiration
Every day, merge requests land in production carrying hidden risk. Security vulnerabilities slip through because manual review is slow, inconsistent, and context-switching-heavy. A single missed SQL injection or IDOR flaw becomes an incident report. Compliance gaps — violations of SOC2, HIPAA, PCI-DSS, or GDPR — surface in audits rather than in code review, where they are cheapest to fix. CI pipelines run redundant stages, skip dependency caching, and burn cloud budget and carbon while nobody watches. Large features ship as monolithic merge requests because nobody broke them down into reviewable units before development started.
The root cause is not laziness or incompetence. It is that the knowledge required to catch these problems — attacker intuition for security, framework expertise for compliance, CI proficiency for sustainability — is rare, expensive, and unavailable at the speed modern teams ship. Manual review is inherently a bottleneck. Automated scanners produce noise without judgment. The gap between what teams know they should do and what they have capacity to do is where risk lives.
What it does
ShipSafe is an AI-native DevSecOps platform: five custom AI agents and three orchestration flows that form an autonomous DevSecOps team operating inside GitLab. It takes over the repetitive review work that kills developer motivation while maintaining code quality standards.
The five agents each own a distinct domain:
- @shipsafe-security reviews every diff as an adversarial attacker would. It covers the full OWASP Top 10, scores exploitability realistically, and proposes correct fix code for each finding.
- @shipsafe-compliance checks code changes against your configured regulatory frameworks. It reads .shipsafe/compliance-rules.yml and produces a scorecard with specific control citations (e.g., SOC2-CC6.1, HIPAA-164.312) and remediation paths.
- @shipsafe-green makes CI/CD environmental and financial cost visible. It detects missing caches, redundant stages, and duplicate test execution. It estimates CO2 output per pipeline run based on cloud region energy mix.
- @shipsafe-planner transforms large, under-specified issues into atomic, independently-mergeable sub-issues with estimates, labels, and explicit dependency ordering.
- @shipsafe-verdict reads the Security, Compliance, and Green reports from the MR comment thread and computes a weighted ShipSafe Score (Security 50%, Compliance 30%, Green 20%). It then issues a clear merge decision (SHIP IT, REVIEW REQUIRED, or DO NOT SHIP) and applies the corresponding label.
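To make the weighting concrete, here is a minimal sketch of how a score like this could be computed. The 50/30/20 weights come from the description above; the 0-100 scale for each domain, the function names, and especially the decision thresholds are assumptions for illustration, not ShipSafe's actual values.

```python
# Weights (in percent) as stated in the agent description.
WEIGHTS = {"security": 50, "compliance": 30, "green": 20}

# Hypothetical cutoffs: the write-up does not say where the decision boundaries sit.
SHIP_THRESHOLD = 80.0
REVIEW_THRESHOLD = 50.0


def shipsafe_score(scores: dict) -> float:
    """Weighted average of per-domain scores, each assumed to be on a 0-100 scale."""
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def merge_decision(score: float) -> str:
    """Map a weighted score to one of the three merge decisions."""
    if score >= SHIP_THRESHOLD:
        return "SHIP IT"
    if score >= REVIEW_THRESHOLD:
        return "REVIEW REQUIRED"
    return "DO NOT SHIP"
```

For example, domain scores of 90 (security), 60 (compliance), and 40 (green) give 45 + 18 + 8 = 71, which falls in the REVIEW REQUIRED band under these assumed thresholds.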
The three flows orchestrate these agents:
- Flow 1: Secure Release Review. Triggered by assign_reviewer. Runs Security → Compliance → Green → Verdict Synthesizer sequentially. Produces a weighted ShipSafe Score with a merge decision and label.
- Flow 2: Vulnerability Auto-Fix. Triggered by the shipsafe-autofix label. Identifies vulnerabilities, designs minimal fixes, creates a new branch with patches, and opens a fix MR, closing the remediation loop.
- Flow 3: Issue Planning. Triggered by an @-mention in any issue. Decomposes large features into trackable sub-issues with estimates and dependency ordering.
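The sequential hand-off in Flow 1 can be pictured with a small sketch. This is plain Python rather than GitLab Duo Workflow YAML, and the stub agents and the run_sequential helper are hypothetical stand-ins that only illustrate the ordering and the way earlier reports reach later steps.

```python
from typing import Callable

# An agent takes the MR diff plus the reports written so far and returns its own report.
Agent = Callable[[str, dict], str]


def run_sequential(diff: str, steps: list) -> dict:
    """Run each (name, agent) step in order, handing every agent the earlier reports."""
    reports: dict = {}
    for name, agent in steps:
        # Pass a copy so an agent cannot mutate the reports of earlier steps.
        reports[name] = agent(diff, dict(reports))
    return reports


# Stub agents standing in for the real ones (hypothetical behavior).
def security(diff: str, prior: dict) -> str:
    return "security: reviewed"


def compliance(diff: str, prior: dict) -> str:
    return "compliance: reviewed"


def green(diff: str, prior: dict) -> str:
    return "green: reviewed"


def verdict(diff: str, prior: dict) -> str:
    # The final step sees all three earlier reports.
    return "verdict: synthesized from " + ", ".join(prior)


reports = run_sequential(
    "example diff",
    [("security", security), ("compliance", compliance),
     ("green", green), ("verdict", verdict)],
)
```

The point of the design is that the last step only needs the accumulated reports, not direct access to the other agents.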
How we built it
All three flows are defined in GitLab Duo Workflow YAML using the ambient environment and v1 schema. Each step is an AgentComponent with a scoped toolset drawn from GitLab's 80+ agent tools. All five agents use Claude Sonnet 4 via GitLab-managed credentials. Compliance rules are fully configurable per project via .shipsafe/compliance-rules.yml. We created an intentionally vulnerable demo application to validate each agent's detection capabilities across OWASP Top 10 categories.
Challenges we ran into
Designing system prompts that produce consistent, structured output across different codebases and languages required extensive iteration. The Verdict Synthesizer also had to reliably parse three different agent report formats from MR notes; we solved this by using standardized markdown headers as parsing anchors. Finally, the token scopes of the flow service account on the hackathon platform forced us to adapt our testing approach and validate each agent individually via Duo Chat.
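As an illustration of the header-anchor idea, the sketch below picks agent reports out of a list of MR note bodies by their leading markdown header. The header strings and the "**Score:** N/100" line format are assumptions made for this example; the actual report layout is not given here.

```python
import re
from typing import Optional

# Hypothetical header text for each agent report.
REPORT_HEADERS = {
    "security": "## ShipSafe Security Report",
    "compliance": "## ShipSafe Compliance Report",
    "green": "## ShipSafe Green Report",
}

# Hypothetical score line, e.g. "**Score:** 70/100" on a line of its own.
SCORE_LINE = re.compile(r"^\*\*Score:\*\*\s*(\d{1,3})\s*/\s*100\s*$", re.MULTILINE)


def split_reports(notes: list) -> dict:
    """Return the most recent note body for each report, keyed by domain.

    Notes that do not start with a known header (ordinary comments) are ignored.
    """
    reports = {}
    for note in notes:
        for domain, header in REPORT_HEADERS.items():
            if note.lstrip().startswith(header):
                reports[domain] = note
    return reports


def extract_score(report: str) -> Optional[int]:
    """Pull the numeric score out of a report, or None if no score line is present."""
    match = SCORE_LINE.search(report)
    return int(match.group(1)) if match else None
```

Anchoring on a fixed header means the free-form body of each report can vary between agents without breaking the parser, as long as the header and score line stay stable.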
Accomplishments that we're proud of
All five agents work end-to-end via Duo Chat on real merge requests and issues. The Security Analyst found 10+ vulnerabilities in our test MR including SQL injection, pickle deserialization, command injection, and hardcoded secrets. The Compliance Auditor mapped every finding to specific SOC2 and GDPR control IDs. The Verdict Synthesizer computed a weighted score of 0/100 and issued DO NOT SHIP. The Planner decomposed a feature into 8 sub-issues with story points and dependencies.
What we learned
The GitLab Duo Agent Platform is remarkably capable for building multi-agent workflows. The sequential component routing with cross-agent context passing enables sophisticated orchestration. Per-project compliance configuration via repository files makes the system immediately useful across different regulatory environments without code changes. Claude Sonnet 4's long-context reasoning is essential for tracing vulnerabilities across module boundaries.
What's next for ShipSafe
- Parallel execution of Security, Compliance, and Green agents for faster reviews
- Integration with GitLab SIEM for vulnerability correlation
- Custom rule marketplace for sharing compliance configurations across organizations
- Historical trend analysis: track ShipSafe scores across MRs to measure security posture improvement