Inspiration

Business logic vulnerabilities are the security industry's open secret. They cause real financial damage (the Latitude Financial breach, countless e-commerce over-refund exploits, privilege escalation attacks that slip past automated scanners), and they rarely appear in a CVE database.

The reason is structural: static analysis tools like Semgrep and SonarQube work largely by matching code against rules for known vulnerability patterns. They are exceptionally good at what they do. But detecting a flaw where code is syntactically perfect yet logically broken (a refund endpoint that accepts amounts larger than the original transaction, or a completed payment that can be captured twice) requires understanding what the software is supposed to do. A generic rule set cannot encode that.

Every team running a modern DevSecOps pipeline has this gap. The pipeline gives a green checkmark. The business logic flaw ships to production. DevSecOps Autopilot exists to close that gap at the point where it's cheapest to fix: the merge request.

What it does

DevSecOps Autopilot is an AI security agent that automatically reviews GitLab merge requests for business logic vulnerabilities — the class of flaws that standard SAST tools cannot detect.

When a developer opens a merge request, the agent triggers automatically via webhook. It reads the full diff and pipeline context through the GitLab MCP server, then uses Gemini to reason about business intent rather than code syntax. It hunts for six categories of logic flaws:

  • Financial logic flaws: refund overflow, negative value exploits, double-credit scenarios
  • State machine violations: acting on resources in invalid states — capturing a completed transaction, cancelling a shipped order
  • Authorization gaps: user-controlled role parameters, missing ownership checks, horizontal privilege escalation
  • Limit bypasses: business tier limits enforced only on the frontend, quota circumvention
  • Idempotency failures: financial operations that execute multiple times when they should execute once
  • SSRF via business features: webhook registration and import endpoints accepting internal network addresses
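To make the first category concrete, here is a minimal sketch of a refund overflow and its fix. It is written in Python for brevity rather than taken from the Node.js NexaPay code, and the `Transaction` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Transaction:
    amount: Decimal                    # original captured amount
    refunded: Decimal = Decimal("0")   # total refunded so far


def refund_unchecked(txn: Transaction, amount: Decimal) -> None:
    # Flawed: valid code, but nothing ties the refund to the original charge.
    txn.refunded += amount


def refund_checked(txn: Transaction, amount: Decimal) -> None:
    # Business rule: refunds are positive, and cumulative refunds
    # never exceed the amount originally captured.
    if amount <= 0:
        raise ValueError("refund amount must be positive")
    if txn.refunded + amount > txn.amount:
        raise ValueError("refund exceeds remaining refundable balance")
    txn.refunded += amount
```

Both functions lint cleanly and contain no known-bad pattern. The only difference is whether the code respects a rule that lives in the business domain, which is the gap the agent is meant to spot.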

For each finding, the agent posts a structured comment directly on the MR — severity level, affected file and line range, attack scenario, business impact, and a specific code-level fix. It applies a security label to the MR automatically (security::critical through security::clean) and creates a tracked GitLab issue for any critical or high findings.

On a clean MR with no logic flaws, it posts a clean bill of health and applies the security::clean label — demonstrating judgment, not just pattern matching.
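The reporting rules above can be sketched as a small data model. This is an illustration under assumptions, not the agent's actual code: the `Finding` fields mirror the comment contents listed above, the helper names are hypothetical, and only `security::critical` and `security::clean` are named in this writeup, so the intermediate labels are assumed.

```python
from dataclasses import dataclass

# Ordered from most to least severe. The intermediate label names
# (high, medium, low) are an assumption about the label scale.
SEVERITIES = ["critical", "high", "medium", "low"]


@dataclass
class Finding:
    severity: str
    file: str
    lines: str
    attack_scenario: str
    business_impact: str
    fix: str


def pick_label(findings: list[Finding]) -> str:
    """Return the label for the most severe finding, or security::clean."""
    present = {f.severity for f in findings}
    for severity in SEVERITIES:
        if severity in present:
            return f"security::{severity}"
    return "security::clean"


def needs_issue(finding: Finding) -> bool:
    """Critical and high findings get a tracked GitLab issue."""
    return finding.severity in ("critical", "high")


def render_comment(finding: Finding) -> str:
    """Format one finding as the body of an MR comment."""
    return "\n".join([
        f"Severity: {finding.severity}",
        f"Location: {finding.file}:{finding.lines}",
        f"Attack scenario: {finding.attack_scenario}",
        f"Business impact: {finding.business_impact}",
        f"Suggested fix: {finding.fix}",
    ])
```

An empty findings list maps to `security::clean`, so the clean path and the findings path share one labelling rule.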

The entire review completes in under 30 seconds.

How we built it

The agent is built on Google ADK (Python) and deployed on Vertex AI Agent Platform, using Gemini 3.1 Flash Lite Preview as the reasoning model.

The GitLab integration is handled entirely through the GitLab MCP server — the agent has no hardcoded GitLab API calls. All reads (diff, pipeline status, linked issues) and writes (MR comments, labels, issues) go through MCP tools. This makes the partner integration meaningful rather than cosmetic: the MCP server is the agent's only interface to GitLab.

The trigger layer is a FastAPI application deployed on Google Cloud Run that receives GitLab merge_request webhooks, validates them, and invokes the ADK agent session on Vertex AI. GitLab credentials are stored in GCP Secret Manager and retrieved at runtime.
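The validation step might look like the sketch below. GitLab sends the webhook's configured secret in the `X-Gitlab-Token` header; the function name and the plain-dict header interface are assumptions chosen to keep the example free of framework code, and the real FastAPI handler may be structured differently.

```python
import hmac


def is_valid_webhook(headers: dict[str, str], expected_token: str) -> bool:
    """Check GitLab's X-Gitlab-Token header against the configured secret."""
    if not expected_token:
        # Refuse everything if no secret is configured, instead of
        # treating a missing header as a match for an empty secret.
        return False
    # Header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    received = normalised.get("x-gitlab-token", "")
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(received.encode(), expected_token.encode())
```

In a deployment like the one described, `expected_token` would be read from Secret Manager at runtime, and a request failing this check would be rejected before any agent session is created.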

The demo target is NexaPay — a fictional B2B payment processing API built in Node.js/Express with a realistic module structure, seeded with six intentional business logic flaws written as a rushed developer would write them: no comments, no signals, just pragmatic code that violates business rules.

The CI/CD pipeline on the NexaPay repo runs install, lint, unit tests, npm audit, and secret detection — all passing — to demonstrate that the existing automation has no visibility into what the agent finds.

Challenges we ran into

The hardest problem was prompt engineering for precision. Gemini needed to distinguish between a genuine business logic flaw and a standard CRUD operation with proper validation — a false positive on every update endpoint would make the tool useless. The system prompt went through significant iteration to achieve specificity: explicit flaw taxonomy, worked examples of what does and does not qualify, and a mandatory "why SAST missed this" section that forces the agent to articulate its reasoning for every finding.
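The three ingredients named above (taxonomy, qualifying and non-qualifying examples, and the mandatory SAST section) could be assembled along these lines. The wording of the instructions and the `build_system_prompt` helper are hypothetical; the actual system prompt is not reproduced in this writeup.

```python
FLAW_CATEGORIES = [
    "Financial logic flaws",
    "State machine violations",
    "Authorization gaps",
    "Limit bypasses",
    "Idempotency failures",
    "SSRF via business features",
]


def bullet_list(items: list[str]) -> str:
    """Render items as one dash-prefixed line each."""
    return "\n".join(f"- {item}" for item in items)


def build_system_prompt(qualifies: list[str], does_not_qualify: list[str]) -> str:
    """Compose a system prompt from the taxonomy and worked examples."""
    sections = [
        "Review the merge request diff for business logic flaws only.",
        "Flaw taxonomy:\n" + bullet_list(FLAW_CATEGORIES),
        "Examples that qualify:\n" + bullet_list(qualifies),
        "Examples that do not qualify:\n" + bullet_list(does_not_qualify),
        'Every finding must include a "Why SAST missed this" section. '
        "If you cannot write one, do not report the finding.",
    ]
    return "\n\n".join(sections)
```

Keeping the examples as parameters makes it easy to add a new counterexample whenever a false positive turns up, without touching the rest of the prompt.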

The second challenge was webhook loop prevention. GitLab fires an MR update event when any comment is posted — including the agent's own findings comment. Without filtering, every review triggered another review in a feedback loop. The fix was constraining the webhook trigger to the open action only, ensuring the agent fires once per MR creation rather than once per activity event.
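A filter of that kind can be sketched as follows. GitLab merge request webhook payloads carry `object_kind` and `object_attributes.action`; the function name `should_review` is hypothetical.

```python
def should_review(payload: dict) -> bool:
    """Return True only for the event that opens a merge request."""
    if payload.get("object_kind") != "merge_request":
        return False
    attributes = payload.get("object_attributes") or {}
    # Events with action "update" are ignored, which breaks the loop
    # caused by the agent's own activity on the MR.
    return attributes.get("action") == "open"
```

The trade-off of this design is that later pushes to an already open MR are not reviewed again.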

The third challenge was making the GitLab MCP integration substantive rather than decorative. It would have been simpler to call the GitLab REST API directly. Using MCP for every GitLab interaction — reads and writes — required understanding the available tool surface and designing the agent workflow around it. The result is an agent that cannot function without the MCP server, which is the right architectural relationship between agent and partner technology.

Accomplishments that we're proud of

The agent correctly identifies business logic vulnerabilities that the NexaPay pipeline's automated security tooling, including GitLab's built-in secret detection and npm audit, does not flag. The contrast shows in a single demo: the pipeline passes, and the agent reports five critical- and high-severity findings.

The clean MR behavior is equally important. When a developer pushes a standard analytics endpoint with no logic changes, the agent posts a clean bill of health and applies security::clean without generating noise. A security tool that cries wolf on every MR is useless — judgment in both directions matters.

The end-to-end automation is complete. From the MR being opened, through the findings comment and the label, to the tracked issue, no human is in the loop. The entire workflow runs in under 30 seconds.

What we learned

Business logic security is genuinely hard to automate — not because the AI can't reason about it, but because the prompt engineering required to make it precise is non-trivial. The difference between a useful security agent and a noisy one is specificity: knowing exactly what categories of flaws to hunt, what good code looks like, and how to articulate the gap between what a static analyzer sees and what business intent requires.

The GitLab MCP server as an agent interface is more powerful than direct API calls for agent use cases. The tool abstraction lets the agent plan its actions declaratively rather than imperatively — it decides what information it needs and requests it, rather than being wired to a fixed sequence of API calls. This makes the agent more robust to variation in MR structure.

Google ADK's runner model with InMemorySessionService is well suited to stateless webhook-triggered agents. Each MR review is an independent session — no state leaks between reviews, and the agent starts fresh with full context for every invocation.

What's next for DevSecOps Autopilot

The NexaPay demo targets a Node.js fintech API, but the agent's business logic taxonomy is designed to be language- and framework-agnostic. The next step is validating it against Python/Django, ASP.NET Core, and Java Spring codebases to confirm the prompt generalizes beyond JavaScript.

The six flaw categories in the current taxonomy are a starting point. A production version would expand to cover: multi-step workflow bypass (skipping steps in a checkout or approval chain), time-of-check/time-of-use races in inventory or booking systems, and data visibility flaws where users can access records they don't own via predictable identifiers.

The logical production path is as a GitLab CI/CD component — a reusable pipeline include that any team can drop into their .gitlab-ci.yml with a GitLab PAT and an ADK endpoint, with no custom infrastructure required.

Built With

  • fastapi
  • gemini-3.1-flash-lite-preview
  • gitlab-mcp
  • google-adk
  • google-cloud-run
  • google-cloud-secret-manager
  • node.js
  • python
  • vertex-ai