Inspiration

In 2026, vibe coding changed everything — developers describe features in casual language and AI generates entire applications in minutes. The speed is thrilling. But it creates a new kind of chaos that nobody talks about:

AI writes code. Nobody checks it.

The code ships with hardcoded API keys. With zero tests. With O(n²) loops burning through cloud compute and driving up carbon emissions.

I looked at who this was affecting the most:

  1. New Developers: Almost everyone "vibe codes" now, but beginners don't know the downstream implications of what they are about to push to production. They lack the architectural experience to spot security holes or performance drains.
  2. Experienced Engineers: Senior developers building massive, powerful projects are getting bogged down. They want to focus on high-level architecture, but they are spending hours catching the "small errors" generated by AI that turn into massive technical debt or compliance disasters later.

I watched this pattern repeat: developers now spend more time cleaning up AI-generated code than they spent writing code manually. The velocity promise of vibe coding was evaporating into security reviews, compliance audits, and emergency patches.

That's when it hit me — the GitLab Duo Agent Platform was built for this problem. What if instead of one developer manually checking everything, you had an autonomous orchestra of specialist agents acting as an automated senior mentor for juniors, and an automated cleanup crew for seniors?

VibeGuard was born from a simple conviction: developers of all skill levels should be free to vibe-code fearlessly. The agents should handle the rest.


What it does

VibeGuard Flow is an autonomous multi-agent orchestra built on the GitLab Duo Agent Platform. It's not a chatbot. It's a digital teammate that takes real actions on your code.

Here's what happens when you mention @ai-vibeguard-orchestrator-gitlab-ai-hackathon on a merge request:

🎯 Phase 1 — Vibe Interpreter reads your vague MR description ("just make login work and look premium") and transforms it into structured requirements, edge cases, acceptance criteria, and auto-creates subtask issues. No more ambiguity.

🛡️ Phase 2 — Security Sentinel deep-scans every changed file for OWASP Top 10 vulnerabilities — hardcoded secrets, SQL injection, PII exposure, missing authentication. It auto-fixes LOW/MEDIUM issues and commits the patches directly to your branch. CRITICAL issues get flagged for human review. In my live test, it found 15 real vulnerabilities in a single MR.

🧪 Phase 3 — TestForge identifies untested code paths and generates comprehensive test suites — unit tests, integration tests, security validation tests, and edge case tests. It commits the test files and updates your test script.

🌱 Phase 4 — Green Optimizer refactors O(n²) algorithms to O(n), removes dead code, and calculates carbon savings using the Green Software Foundation SCI methodology. It recommends the greenest Google Cloud deployment region (us-central1, 93% Carbon-Free Energy) and posts a unified pipeline summary with a readiness score (0–100%).
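The O(n²) → O(n) refactor is the textbook case. A simplified before/after, of the sort Green Optimizer proposes (example code, not taken from the project):

```javascript
// Before: O(n²) duplicate detection -- every element compared to every other.
function hasDuplicatesQuadratic(items) {
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (items[i] === items[j]) return true;
    }
  }
  return false;
}

// After: O(n) with a Set -- same result, a fraction of the CPU time,
// and therefore a fraction of the energy per request.
function hasDuplicatesLinear(items) {
  const seen = new Set();
  for (const item of items) {
    if (seen.has(item)) return true;
    seen.add(item);
  }
  return false;
}
```

On a hot path handling thousands of requests, that difference compounds directly into compute hours — which is what the SCI calculation then translates into grams of CO2.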

If readiness exceeds 95% with no critical flags → it recommends auto-deploy to Google Cloud Run in the greenest region.

Plus, a standalone Compliance Auditor agent lives in Duo Chat for on-demand GDPR, WCAG 2.1 AA, and licensing audits — anytime, anywhere.

One mention. Four agents. Zero manual intervention. Production-ready code in minutes.


How I built it

I built VibeGuard entirely on the GitLab Duo Agent Platform, following the v1 Flow Registry specification:

Technical approach:

  1. Deep research into the Duo Agent Platform — studied the custom flow schema, v1 spec, and verified all 89 available tools against tool_mapping.json
  2. Flow design — components chain sequentially (vibe_interpreter → security_sentinel → test_forge → green_optimizer → end), each passing its final_answer as context to the next agent via inputs with from/as syntax
  3. Prompt engineering — each agent has a detailed system prompt with specific scan categories, output formats, severity-based action rules, and citation requirements (OWASP, CWE, GDPR articles)
  4. Sustainability integration — Green Optimizer applies SCI methodology, uses real Google Cloud CFE% data for region recommendations, and a dedicated google-cloud-info CI job surfaces carbon data in every pipeline run
  5. Sample application — built a deliberately vulnerable Node.js + Express API with 10+ planted issues (hardcoded API keys, SQL concatenation, PII logging, O(n²) loops, dead code) to showcase every agent's capabilities
  6. CI/CD pipeline — 6-stage pipeline with SAST scanning, Google Cloud sustainability job, and Cloud Run deployment targets

Everything is published via git tags to the AI Catalog — no manual UI setup required.


Challenges I ran into

1. The 404 "Project Not Found" Wall

The biggest challenge was discovering that agents couldn't automatically resolve the correct project path. During my first live test, the Vibe Interpreter tried to access the MR using its own service account username (ai-vibeguard-orchestrator-gitlab-ai-hackathon) as the project path — causing a 404 error. I solved this by adding explicit "Project Access" instructions in every agent's system prompt, guiding them to extract the project path from the MR URL in the goal context. This was a lesson in how prompt engineering for agents is fundamentally different from prompting for chat.
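The extraction the agents are instructed to do is simple in principle. A hypothetical helper that mirrors the "Project Access" instruction (the function is mine, for illustration — the agents do this via their prompts, not this code):

```javascript
// Hypothetical helper mirroring the "Project Access" prompt instruction:
// derive the project path from the MR URL instead of guessing it from
// the service-account username.
function projectPathFromMrUrl(mrUrl) {
  // GitLab MR URLs look like https://gitlab.com/<group>/<project>/-/merge_requests/<iid>
  const match = new URL(mrUrl).pathname.match(/^\/(.+?)\/-\/merge_requests\/\d+/);
  if (!match) throw new Error(`Not a merge request URL: ${mrUrl}`);
  return match[1]; // e.g. "my-group/my-project"
}

projectPathFromMrUrl('https://gitlab.com/acme/webapp/-/merge_requests/42');
// → 'acme/webapp'
```

Encoding this rule as an explicit prompt instruction, rather than hoping the model infers it, is what turned the 404s into reliable API calls.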

2. Tool Verification

With 89 available tools, it was critical to verify every tool name against tool_mapping.json before including it. A single typo would cause the entire flow to fail. I cross-referenced each tool for each agent to ensure only valid, relevant tools were assigned.

3. Balancing Agent Autonomy with Safety

The auto-fix policy was a design challenge: agents should fix what they can, but never break production code. I implemented a severity-tiered system (LOW/MEDIUM = auto-fix, HIGH = flag, CRITICAL = halt) that gives agents autonomy while keeping humans in the loop for decisions that matter.
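The tiered policy reduces to a small lookup. A sketch of the rule (the names here are illustrative — the real policy lives in the agents' system prompts and AGENTS.md):

```javascript
// Sketch of the severity-tiered policy described above; names are illustrative.
const SEVERITY_ACTIONS = {
  LOW:      'auto-fix',   // agent commits the patch to the branch
  MEDIUM:   'auto-fix',
  HIGH:     'flag',       // comment on the MR, human decides
  CRITICAL: 'halt',       // stop and require human review
};

function actionFor(severity) {
  return SEVERITY_ACTIONS[severity] ?? 'flag'; // fail safe on unknown severities
}
```

Defaulting unknown severities to "flag" rather than "auto-fix" is the key safety property: when the agent is unsure, a human looks.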


Accomplishments that I'm proud of

🏆 It actually works end-to-end. In my live test, Security Sentinel found 15 real vulnerabilities (5 CRITICAL, 6 HIGH, 3 MEDIUM, 1 LOW) with OWASP classifications and CWE references — not generic warnings, but specific, actionable findings with line numbers and code fix recommendations.

🏆 Four agents, one flow, zero human intervention. Mention the bot on an MR, walk away, come back to a fully analyzed merge request with security patches, test generation, carbon savings calculations, and a deploy recommendation. That's the promise of the Duo Agent Platform realized.

🏆 Real sustainability impact. The Green Optimizer doesn't just mention carbon — it uses actual CFE% data from Google Cloud regions, applies Green Software Foundation SCI methodology, and recommends the optimal deployment target. In my test, it identified ~30% estimated carbon savings through code optimizations and 71% carbon reduction through region selection. The google-cloud-info CI job makes this visible in every pipeline run.
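For context, the Green Software Foundation's SCI formula is SCI = ((E × I) + M) per R: energy consumed, grid carbon intensity, embodied emissions, divided by a functional unit such as "per API request". A worked example — the numbers below are invented for illustration, not measurements from VibeGuard:

```javascript
// SCI = ((E * I) + M) per R, per the Green Software Foundation spec.
// E = energy (kWh), I = grid intensity (gCO2eq/kWh),
// M = embodied emissions (gCO2eq), R = functional units (e.g. requests).
// All numbers below are made up for illustration.
function sciScore({ energyKwh, gridIntensity, embodied, functionalUnits }) {
  return ((energyKwh * gridIntensity) + embodied) / functionalUnits;
}

// Same workload, two regions: a cleaner grid directly lowers the score.
const dirtyRegion = sciScore({ energyKwh: 2, gridIntensity: 500, embodied: 100, functionalUnits: 1000 });
const cleanRegion = sciScore({ energyKwh: 2, gridIntensity: 140, embodied: 100, functionalUnits: 1000 });
// dirtyRegion = 1.1 gCO2eq/request, cleanRegion = 0.38 gCO2eq/request
```

This is why region choice can outweigh code optimization: the intensity term multiplies all the energy you burn.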

🏆 Ridiculously easy to adopt. No SDK. No custom server. No configuration files. Just mention @ai-vibeguard-orchestrator-gitlab-ai-hackathon on any MR and the entire pipeline runs automatically. The standalone Compliance Auditor is one click away in Duo Chat. Zero learning curve — if you can write an MR, you can use VibeGuard.

🏆 The readiness score concept. Instead of overwhelming developers with separate reports, VibeGuard distills everything into one number: a readiness score that immediately answers "can I ship this?" — with a transparent breakdown of security, testing, compliance, and efficiency factors.
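A hedged sketch of how such a score could be composed — the actual weighting lives in the agents' prompts, and these weights are purely illustrative:

```javascript
// Hypothetical readiness-score composition; weights are illustrative,
// not the ones VibeGuard actually uses. Each factor is 0-100.
function readinessScore({ security, testing, compliance, efficiency }) {
  const weights = { security: 0.4, testing: 0.3, compliance: 0.2, efficiency: 0.1 };
  return Math.round(
    security   * weights.security +
    testing    * weights.testing +
    compliance * weights.compliance +
    efficiency * weights.efficiency
  );
}

const score = readinessScore({ security: 100, testing: 95, compliance: 95, efficiency: 85 });
// → 96, which (with no critical flags) would cross the 95% auto-deploy threshold
```

Weighting security highest reflects the halt-on-CRITICAL policy: a single unresolved critical finding should keep the score below the deploy threshold regardless of test coverage.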

🏆 Interactive "God Mode" Auto-Fixes. The agents don't just complain about problems — they formulate the code fixes and present them with an interactive "Apply Safe Fixes" call to action. If the developer replies @ai-vibeguard-orchestrator-gitlab-ai-hackathon apply fixes, the agents autonomously commit all LOW/MEDIUM security patches and energy-efficiency refactors directly to the branch. This is true agentic action beyond chat, preserving developer trust while maximizing automation.

🏆 Self-Healing CI/CD Pipelines. I gave the TestForge agent access to get_job_logs and get_pipeline_errors. If an MR's pipeline is failing, TestForge autonomously reads the logs, identifies the root cause (syntax error, missing import), and commits a fix to heal the pipeline before generating its tests.

🏆 Intelligent Project Triage. Vibe Interpreter goes beyond translating descriptions — it actively manages the project. It applies architectural labels to the MR, identifies unstated edge cases, and aggressively uses the create_issue tool to generate and link new subtask tickets for missing features (like adding a 'password reset' ticket when a developer only asks for 'social login').


What I learned

Agents are fundamentally different from chatbots. In chat, you optimize for helpful answers. In agents, you optimize for correct actions in ambiguous contexts. The prompt engineering discipline is entirely different — you need explicit action rules, severity policies, output contracts, and error recovery strategies.

The Duo Agent Platform is incredibly powerful — and still early. Building on cutting-edge infrastructure means discovering limitations in real-time. The sequential routing model, the ambient-only environment, the tool verification requirements — these constraints shaped the architecture in ways I didn't anticipate, and forced creative solutions.

Green software is an underserved opportunity. Most developers don't think about the carbon cost of an O(n²) loop or the CFE% of their deploy region. Surfacing this data directly in MR comments — where developers already look — makes sustainability visible and actionable. The Green Software Foundation's SCI framework is elegant and deserves wider adoption.

Shared rules (AGENTS.md) are the secret weapon. Having a single source of truth for all agents — coding standards, severity policies, commit message formats, green principles — creates consistency across the entire pipeline. Every agent follows the same playbook.


What's next for VibeGuard Flow

🔮 Immediate roadmap:

  • Parallel agent execution — When the Duo Agent Platform supports parallel routing, run Security Sentinel, TestForge, and Compliance Auditor simultaneously for 3x faster pipeline completion
  • Live Google Cloud deployment — Replace echo-simulated deploys with real Cloud Run deployments using Workload Identity Federation
  • Pipeline trigger integration — Use gitlab_graphql to auto-trigger staging pipelines when readiness score exceeds threshold
  • Customizable rules — Let teams define their own AGENTS.md with project-specific policies, severity overrides, and compliance frameworks

🚀 Bigger vision:

  • VibeGuard Marketplace — Additional specialist agents (API Design Reviewer, Database Migration Guardian, Performance Profiler) that plug into the orchestrator as new components
  • CI carbon budgets — Set maximum SCI score per pipeline; block deploys that exceed the carbon budget
  • Learning from feedback — Track which auto-fixes get accepted vs. reverted, and feed that data back to improve agent accuracy
  • Cross-project intelligence — Share security patterns and compliance findings across an organization's entire portfolio
  • Carbon dashboard — Aggregate SCI data across all MRs to show organization-level sustainability trends over time

The future of development isn't AI writing code. It's AI writing code and an orchestra of agents making sure that code is secure, tested, compliant, and sustainable — before any human touches it.

That's VibeGuard Flow. You vibe. We guard.

Built With

  • gitlab
  • gitlab-duo
  • green-software-foundation-sci
  • javascript
  • prompt-engineering
  • yaml