🚀 About KaiSec

💡 What Inspired Us

Let's be real: we absolutely love "Vibe Coding." With GenAI, our teams are shipping features at breakneck speeds. But we quickly realized that speeding up feature velocity just creates a massive, ugly backlog of security debt.

You can think of the problem like this: $$ \text{Security Debt} = \frac{\text{Vibe Coding Velocity} \times \text{AI Hallucination Rate}}{\text{Security Team Bandwidth}} $$

Here is the daily friction: Security engineers are brilliant at spotting risks, but they don't have the time (or product context) to jump into the IDE and write the patch. On the flip side, developers are moving so fast that they lack the deep security expertise to make their code 100% compliant on the first pass.

We were tired of the endless Jira ping-pong between security and engineering. We realized we needed to shift from a "Human-IN-the-loop" model (where developers must stop everything to manually fix things) to a "Human-ON-the-loop" model.

We built KaiSec to be our ultimate 24/7 autonomous security engineer—an ambient pipeline that detects, plans, patches, and tests vulnerabilities while we sleep.

🛠️ How We Built It

Instead of trying to build one massive "Mega-Prompt" that gets confused trying to do everything at once, we built KaiSec on top of the bleeding-edge GitLab Duo Flow Registry (v1) using an ecosystem of highly specialized "Micro-Agents."

The total system resolution power scales with the number of specialized agents: $$ \text{Remediation Power} = \sum_{i=1}^{n} \text{Agent}_i ( \text{Context} \times \text{Specialized Toolset}) $$

We divided the ecosystem into three core flows:

The Compliance Factory: A swarm of agents (SAST, Dependency, and DAST/Container) that independently hunt vulnerabilities across different vectors. They generate specific code patches and then funnel them all to a single, unified Merge Request Agent.
The Test Generator Flow: Because what good is a security patch if it breaks the app? One agent generates the boilerplate pytest structure, and a strict QA Edge Case Agent acts as an overseer to inject boundary conditions ($x \to \infty$, None types, etc.).
The Local Execution Engine: A sandboxed runner that clones the code into an ephemeral Python Docker container, runs pip install and pytest, and parses the raw XML to give us definitive proof that the patch worked.
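To make the last step concrete, here is a minimal sketch of how the raw XML from `pytest --junitxml=...` could be reduced to a pass/fail verdict. The function name `summarize_junit` and the sample report are illustrative assumptions, not KaiSec's actual parser.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict:
    """Collapse a JUnit-style XML report into counters plus a verdict."""
    root = ET.fromstring(xml_text)
    # Reports may use a <testsuites> wrapper or a bare <testsuite> root.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")

    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))

    # A patch only counts as verified if something ran and nothing broke.
    totals["passed"] = (
        totals["tests"] > 0
        and totals["failures"] == 0
        and totals["errors"] == 0
    )
    return totals


# Hypothetical report, shaped like the JUnit XML pytest writes.
SAMPLE_REPORT = (
    '<testsuites>'
    '<testsuite name="pytest" tests="3" failures="1" errors="0" skipped="0">'
    '</testsuite>'
    '</testsuites>'
)

if __name__ == "__main__":
    print(summarize_junit(SAMPLE_REPORT))
```

Treating an empty run (`tests == 0`) as a failure is a deliberate choice in this sketch: a report with no tests proves nothing about the patch.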

🚧 Challenges We Faced

Building true autonomous pipelines means you hit the limits of AI orchestration pretty fast.

First, JSON Schema hallucinations were our biggest nightmare. We needed the LLMs to strictly adhere to the massive GitLab create_commit API. But the base models kept stripping out the required actions[] JSON array. We ended up having to reverse-engineer the Flow Registry v1 schema, explicitly block conflicting platform primitives, and tightly couple our prompts to enforce deterministic API calls.
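One way to catch a stripped actions[] array before it reaches the API is a small pre-flight check on the model's output. The sketch below is an assumption about how such a check could look, not the validation KaiSec ships. The field names (`branch`, `commit_message`, `actions`, `action`, `file_path`) follow the GitLab Commits API as we understand it, and `validate_commit_payload` is a hypothetical helper.

```python
# Action types accepted by the commit endpoint, as we understand the API.
ALLOWED_ACTIONS = {"create", "delete", "move", "update", "chmod"}


def validate_commit_payload(payload: dict) -> list:
    """Return a list of human-readable problems; empty means it looks sane."""
    problems = []

    for field in ("branch", "commit_message"):
        value = payload.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty '{field}'")

    actions = payload.get("actions")
    if not isinstance(actions, list) or not actions:
        # This is the failure mode described above: the array was dropped.
        problems.append("'actions' must be a non-empty list")
        return problems

    for i, action in enumerate(actions):
        if not isinstance(action, dict):
            problems.append(f"actions[{i}] is not an object")
            continue
        if action.get("action") not in ALLOWED_ACTIONS:
            problems.append(f"actions[{i}].action is invalid")
        if not action.get("file_path"):
            problems.append(f"actions[{i}].file_path is missing")

    return problems


if __name__ == "__main__":
    hallucinated = {"branch": "kaisec/fix", "commit_message": "Patch SQLi"}
    print(validate_commit_payload(hallucinated))
```

Returning a list of problems rather than raising on the first one lets the error text be fed back to the agent in a single retry prompt.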

Second, Variable Injection Blindness. We learned that agents running in different environments (CLI vs. Kubernetes CI Runner) don't always get the same environment variables (like merge_request_iid). If an agent expected an ID and didn't get it, it failed silently. We had to rewrite our prompts so the agents could dynamically extract their own constraints straight from the raw user prompt.
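The same fallback idea can be sketched outside the prompt: look for the ID in the environment first, then in the raw user prompt, and fail loudly if neither has it. Everything here is an illustrative assumption: `resolve_mr_iid` is a hypothetical helper, `CI_MERGE_REQUEST_IID` is the variable we assume a CI runner would expose, and the two regex patterns only cover MR URLs and `!123`-style references.

```python
import os
import re

# Patterns for an MR reference in free text: a URL path or a "!42" shorthand.
_MR_PATTERNS = (
    re.compile(r"merge_requests/(\d+)"),
    re.compile(r"(?<!\w)!(\d+)\b"),
)


def resolve_mr_iid(prompt: str, env=None, env_key: str = "CI_MERGE_REQUEST_IID") -> int:
    """Find the merge request IID from the environment or the raw prompt."""
    env = os.environ if env is None else env

    value = env.get(env_key, "")
    if value.isdigit():
        return int(value)

    for pattern in _MR_PATTERNS:
        match = pattern.search(prompt)
        if match:
            return int(match.group(1))

    # Raising here replaces the silent failure described above.
    raise ValueError("merge request IID not found in environment or prompt")


if __name__ == "__main__":
    print(resolve_mr_iid("Please patch the findings in !42", env={}))
```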

Finally, The Infinite Loop of Doom. When you give AI the power to write code and trigger CI pipelines, it can accidentally react to its own commits in an endless loop. We engineered strict "Phase 0" guardrails, forcing agents to analyze MR comments and confirm ($N_{\text{KaiSec Notes}} = 0$) that they hadn't already run before proceeding.
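A deterministic version of that Phase 0 check might look like the sketch below. It assumes each MR comment is a dict with a `body` field and that KaiSec stamps its own comments with a marker string. Both the marker and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical marker that KaiSec would embed in every comment it posts.
KAISEC_MARKER = "<!-- kaisec-run -->"


def count_kaisec_notes(notes: list) -> int:
    """Count MR comments that carry the KaiSec marker."""
    return sum(1 for note in notes if KAISEC_MARKER in note.get("body", ""))


def should_run(notes: list) -> bool:
    """Phase 0 gate: proceed only if KaiSec has not commented yet."""
    return count_kaisec_notes(notes) == 0


if __name__ == "__main__":
    fresh_mr = [{"body": "LGTM once the scan is green."}]
    patched_mr = fresh_mr + [{"body": f"{KAISEC_MARKER}\nPatched 2 findings."}]
    print(should_run(fresh_mr), should_run(patched_mr))
```

Keeping the gate in plain code instead of in the prompt means the loop check does not depend on the model counting correctly.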

🧠 What We Learned

Prompt engineering for an agent that actually manipulates infrastructure is a whole different beast than talking to a chatbot.

We discovered that separating AI roles into discrete, strict steps (Generator $\to$ QA $\to$ Reporter) made the agents' output noticeably more accurate than a single do-everything prompt. Above all, we proved that true 24/7 automation is possible. We can literally vibe code all afternoon, let the pipeline fail, and wake up to a beautifully patched and fully tested Merge Request by morning.