Inspiration
AI coding assistants are writing more code than ever. The problem? They're also introducing bugs faster than developers can catch them. A team might fix one null-check vulnerability in a PR review, only to discover the same pattern lurking in twelve other files, then watch it reappear a month later when someone copies an old snippet. We kept asking ourselves: what if fixing one bug could immunize the entire codebase against that class of vulnerability, just like a vaccine?
So we built PHOENIX. Like a phoenix rising from its ashes and emerging tougher, the codebase evolves and becomes immune to entire classes of bugs that once lived in it.
What it does
PHOENIX is a 7-agent pipeline that transforms a single failing test into permanent protection. When a pipeline breaks, Triage investigates and classifies the failure. Surgeon patches the immediate problem. Pathologist abstracts the bug into a searchable anti-pattern. Hunter scours the codebase for every sibling vulnerability. Immunizer fixes all of them and generates a Semgrep rule. Arbiter validates everything and opens a merge request. Guardian then stands watch on future PRs, blocking any attempt to reintroduce the same vulnerability class.
The core agentic flow is: Triage -> Surgeon -> Pathologist -> Hunter -> Immunizer -> Arbiter -> Infra Reporter
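The handoff chain and its conditional branch can be sketched in a few lines of TypeScript. This is an illustrative model only: the types, the regex heuristic standing in for Triage, and the function names are our own stand-ins, not the actual Duo agent implementations.

```typescript
// Sketch of PHOENIX's routing: Triage classifies a failure, then either
// halts with an infra report or sends it through the immunization chain.
type Failure = { log: string };
type Verdict = "code-bug" | "infra-failure";

// A trivial heuristic stands in for the real Triage agent here.
function triage(f: Failure): Verdict {
  return /ECONNREFUSED|timeout/i.test(f.log) ? "infra-failure" : "code-bug";
}

// Downstream agents modeled as named steps over a shared context.
const immunizationChain = [
  "Surgeon", "Pathologist", "Hunter", "Immunizer", "Arbiter",
];

function route(f: Failure): string[] {
  // Infrastructure failures are reported and halted; code bugs
  // flow through the full chain.
  return triage(f) === "infra-failure"
    ? ["Infra Reporter"]
    : immunizationChain;
}
```

In the real pipeline each step is a Duo agent with its own tools and handoff schema; the sketch only captures the branching shape.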
Phoenix Guardian then watches for recurring bug patterns in MRs, automatically detecting them and warning the developer.
How we built it
We built PHOENIX on the GitLab Duo Agent Platform, and used Bun and TypeScript to build a simple API for testing the agent flow.
Each agent has a distinct responsibility and a curated set of GitLab-native tools: 30 unique tools across the system. The pipeline uses conditional routing: infrastructure failures get reported and halted, while code bugs flow through the full immunization chain. Semgrep handles static analysis and rule generation. The whole thing runs against a Fastify API we intentionally seeded with a vulnerable pattern for demonstration.
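To make the Semgrep step concrete: Immunizer turns each abstracted anti-pattern into a rule. The rule below is a hand-written sketch of the kind of output we mean, guarding the null-check class from our demo; the rule id, message, and `findUser` pattern are hypothetical, not the actual generated rule.

```yaml
rules:
  - id: unchecked-nullable-lookup   # hypothetical id, for illustration
    languages: [typescript]
    severity: ERROR
    message: findUser() can return null; guard the result before dereferencing it.
    # Flag any dereference of the lookup result; "..." matches
    # intervening statements.
    pattern: |
      const $U = findUser($ID);
      ...
      $U.$PROP
```

A rule like this is what lets Guardian block the pattern on future PRs instead of relying on reviewers to remember it.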
Challenges we ran into
Agent orchestration is harder than it looks. Getting seven specialized agents to pass context cleanly without bloating prompts or losing critical details required careful schema design. We also hit friction with cross-project search availability depending on GitLab tier, which forced us to make Hunter gracefully degrade when Advanced Search isn't present. Validating generated Semgrep rules before committing them without breaking CI took more iteration than expected.
We also wanted to integrate Slack, but that proved difficult as well.
Accomplishments that we're proud of
The variant analysis pipeline actually works. Fix one bug, and PHOENIX genuinely finds and patches siblings across the codebase. The Semgrep rule generation means we're not just cleaning up today's mess; we're preventing tomorrow's. The conditional routing between code bugs and infrastructure failures keeps the system honest instead of hallucinating fixes for problems that don't exist in source code.
What we learned
Multi-agent systems need sharp boundaries. Letting each agent do one thing well and explicitly defining what it hands off to the next prevented the kind of scope creep that turns orchestration into chaos. We also learned that the "fix" is only half the value; the "prevent" is what actually changes the security posture long-term.
We also learned how to set up custom agents and flows using GitLab Duo's native platform.
What's next for PHOENIX
Cross-project immunization at scale. When Hunter finds a vulnerability pattern in one repository, there's no reason the same rule can't propagate to every project in a group. We're also looking at feedback loops and tracking which Semgrep rules block the most violations over time and surfacing that data to teams so they understand where their codebase keeps trying to regress.
Eventually we plan to integrate third-party apps with GitLab as well.
Built With
- bun
- gitlab
- gitlab-duo
- typescript
- yml

