- Warloop - all agents work in parallel
- Research agent - scrapes the web to find new vulnerabilities and methods to exploit and remediate them
- Red agent - 180+ pentesting tools to detect, an LLM to reason and exploit
- Blue agent - generates production-level fixes and validates them against multiple safety gates
- Governance agent - enforces policies and controls the red and blue agents
- Audit agent - records everything in an immutable ledger and generates legal compliance reports
Inspiration
Security teams at fast-moving companies face the same broken loop every day: scanners find hundreds of issues, engineers triage for hours, fixes get shipped, and nobody ever goes back to verify the door is actually closed. The average breach goes undetected for 207 days - not because tools don't exist, but because no tool closes the loop.
We built Ouroboros because we believe the question is never just "what's vulnerable?" It's "is the fix verified?" No existing tool answers that. We decided to answer it.
What it does
Ouroboros is an autonomous security agent built on GitLab CI/CD that runs a continuous detect → exploit → fix → re-attack loop on your repository and production endpoints.
The loop:
- DETECT - Semgrep + CodeQL perform SAST on every commit delta. Trivy + Checkov scan IaC and container configurations. Nuclei runs active DAST against live endpoints, alongside 180+ other professional pentesting tools that the agent orchestrates.
- PRIORITIZE - An LLM (CodeLlama via Ollama, running locally) reasons over findings, cross-references CVE severity with EPSS exploit probability scores, and ranks by actual risk — not just CVSS.
- EXPLOIT - Nmap + Nuclei attempt active verification of findings. Only confirmed exploitable issues proceed. This is the false-positive filter no scanner has.
- FIX - The LLM synthesizes targeted code patches scoped to the vulnerable lines. Not whole-file rewrites. Precise, minimal diffs.
- RE-ATTACK - The original exploit is re-run against the patched code. If it is blocked, a signed closure certificate is generated and the finding is resolved.
- PR - A GitLab merge request is opened with the finding, exploit proof, fix diff, and verification result attached. Reviewers see the full evidence chain.
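As a rough illustration of the PRIORITIZE step above, the sketch below ranks findings by a combined score instead of CVSS alone. The `Finding` fields and the `risk_score` formula (CVSS base score scaled by EPSS probability) are assumptions made for this example, not the scoring Ouroboros actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; field names are illustrative."""
    identifier: str
    cvss: float  # CVSS base score, 0.0 to 10.0
    epss: float  # EPSS exploit probability, 0.0 to 1.0


def risk_score(finding: Finding) -> float:
    # Assumed formula: severity weighted by likelihood of exploitation.
    return finding.cvss * finding.epss


def prioritize(findings: list[Finding]) -> list[Finding]:
    # Highest combined risk first; ties keep their input order.
    return sorted(findings, key=risk_score, reverse=True)


findings = [
    Finding("FINDING-A", cvss=9.8, epss=0.02),
    Finding("FINDING-B", cvss=7.5, epss=0.90),
    Finding("FINDING-C", cvss=5.3, epss=0.40),
]
ranked = prioritize(findings)
print([f.identifier for f in ranked])
```

Under this assumed formula, the critical-severity but rarely exploited FINDING-A drops below both lower-severity findings, which is the kind of reordering that ranking "by actual risk" implies.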
The GitLab integration is native. Ouroboros reads repository structure via the GitLab API, hooks into CI/CD pipelines, creates branches, opens merge requests, and posts pipeline status updates, all through GitLab's native workflows.
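To make the merge request step concrete, here is a minimal sketch that builds (but does not send) a request to GitLab's `POST /projects/:id/merge_requests` REST endpoint using only the standard library. The host, project ID, token, branch names, and description text are placeholders, and the helper name `build_merge_request` is hypothetical.

```python
import json
import urllib.request


def build_merge_request(base_url, project_id, token,
                        source_branch, target_branch, title, description):
    """Build a POST request for GitLab's merge request endpoint (not sent here)."""
    url = f"{base_url}/api/v4/projects/{project_id}/merge_requests"
    payload = {
        "source_branch": source_branch,
        "target_branch": target_branch,
        "title": title,
        "description": description,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )


# All values below are placeholders for illustration.
request = build_merge_request(
    base_url="https://gitlab.example.com",
    project_id=42,
    token="<access-token>",
    source_branch="ouroboros/fix-finding-a",
    target_branch="main",
    title="Fix verified finding FINDING-A",
    description="Finding, exploit proof, fix diff, and verification result.",
)
print(request.get_method(), request.full_url)
```

Passing the built request to `urllib.request.urlopen` would submit it; that call is left out so the sketch runs without network access.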
How we built it
- LLM layer: Open source models via Ollama for local inference so that no data leaves the repository environment
- SAST: Semgrep (custom security ruleset) + CodeQL CLI (semantic code analysis)
- IaC/container scanning: Checkov + Trivy
- Active DAST: Nuclei (template-based CVE probing) + Nmap (service fingerprinting)
- Orchestration: Python agent with LangChain for LLM tool calling and structured output enforcement
- GitLab integration: GitLab REST API for repo access, branch creation, merge request generation, CI/CD status
- Evidence signing: HMAC-SHA256 signed exploit proof objects with timestamps
- Pipeline: GitLab CI YAML template that drops the agent into any existing pipeline in under 5 minutes
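The evidence signing item in the list above can be sketched with Python's standard `hmac` and `hashlib` modules. The proof fields, the key, and the helper names `sign_proof` and `verify_proof` are assumptions for illustration; the write-up does not describe the real proof format or key handling.

```python
import hashlib
import hmac
import json
import time


def _canonical(proof: dict) -> bytes:
    # Sorted keys and fixed separators so equal proofs serialize to equal bytes.
    return json.dumps(proof, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_proof(proof: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature to a proof object."""
    signature = hmac.new(key, _canonical(proof), hashlib.sha256).hexdigest()
    return {"proof": proof, "signature": signature}


def verify_proof(signed: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical(signed["proof"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])


key = b"demo-key-not-for-production"  # placeholder key
proof = {
    "finding_id": "FINDING-A",        # hypothetical field names
    "exploit_blocked": True,
    "timestamp": int(time.time()),
}
signed = sign_proof(proof, key)
tampered = {"proof": {**proof, "exploit_blocked": False},
            "signature": signed["signature"]}

print(verify_proof(signed, key))
print(verify_proof(tampered, key))
```

Because the timestamp sits inside the signed body, changing it after the fact invalidates the signature just like any other edit.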
Challenges we ran into
The false positive problem. Early versions surfaced too many noisy findings. The fix was architectural: a finding must pass exploit confirmation before it reaches the output queue, and if the LLM can't synthesize a working exploit attempt, the finding is dropped. This brought false positives to near zero but required significant prompt engineering and tool integration work.
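The gating idea can be sketched as a filter that only lets confirmed findings through. This is an illustration under assumptions: `attempt_exploit` stands in for whatever tool or LLM call performs the real attempt, and returning `None` for "no working exploit" is a convention chosen for this example.

```python
from typing import Callable, Optional


def confirmed_only(findings: list[str],
                   attempt_exploit: Callable[[str], Optional[dict]]) -> list[tuple[str, dict]]:
    """Keep only findings whose exploit attempt produced evidence."""
    confirmed = []
    for finding in findings:
        evidence = attempt_exploit(finding)
        if evidence is not None:  # None means no working exploit: drop the finding
            confirmed.append((finding, evidence))
    return confirmed


# Stub standing in for a real exploit attempt; the outcomes are made up.
def fake_attempt(finding: str) -> Optional[dict]:
    exploitable = {"sql-injection-login": {"status": 200, "leaked_rows": 3}}
    return exploitable.get(finding)


queue = confirmed_only(
    ["sql-injection-login", "weak-tls-cipher", "verbose-error-page"],
    fake_attempt,
)
print([name for name, _ in queue])
```

Carrying the evidence alongside each confirmed finding means later stages have the proof available without re-running the attempt.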
Local LLM reliability. Getting CodeLlama to produce consistently structured JSON output for exploit synthesis was harder than expected. We solved this with strict output schema enforcement via Pydantic models and a retry loop that reprompts with the schema violation highlighted.
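A minimal sketch of that validate-and-reprompt loop follows. To stay dependency-free it uses a hand-written check as a stand-in for a Pydantic model, and the schema fields, prompt wording, and `call_model` callable are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"target": str, "payload": str, "expected_status": int}


def validate(raw: str) -> dict:
    """Parse raw model output and check it against REQUIRED_FIELDS."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field '{name}'")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field '{name}' must be {expected_type.__name__}")
    return data


def generate_structured(prompt: str, call_model: Callable[[str], str],
                        max_attempts: int = 3) -> dict:
    """Call the model until its output validates, reprompting with the violation."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        try:
            return validate(raw)
        except ValueError as violation:
            current_prompt = (
                f"{prompt}\n\nYour previous output was rejected: {violation}. "
                "Return only JSON matching the schema."
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


# Stub model that fails twice, then complies.
replies = iter([
    "not json",
    '{"target": "/login", "payload": "x"}',
    '{"target": "/login", "payload": "x", "expected_status": 403}',
])
prompts_seen = []


def fake_model(prompt: str) -> str:
    prompts_seen.append(prompt)
    return next(replies)


result = generate_structured("Describe the exploit as JSON.", fake_model)
print(result["expected_status"], len(prompts_seen))
```

With Pydantic, `validate` would typically be replaced by a model class and its validation error, whose message can be fed back into the prompt in the same way.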
Re-attack precision. The re-attack step requires the exact original exploit parameters to be stored and replayed, not a similar test. Building the exploit parameter serialization and deterministic replay was the most technically complex piece.
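One way to picture exact-parameter replay is a canonical serialization plus a fingerprint that is checked before the replay runs. The `ExploitParams` fields and the example request below are assumptions for illustration; the write-up does not specify what the real parameter set contains.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExploitParams:
    """Hypothetical exploit parameters; field names are illustrative."""
    method: str
    url: str
    headers: tuple  # tuple of (name, value) pairs
    body: str


def serialize(params: ExploitParams) -> str:
    # Sorted keys and fixed separators give identical text for equal params.
    return json.dumps(asdict(params), sort_keys=True, separators=(",", ":"))


def fingerprint(serialized: str) -> str:
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def load_for_replay(serialized: str, expected_fingerprint: str) -> ExploitParams:
    # Refuse to replay anything that differs from what was originally stored.
    if fingerprint(serialized) != expected_fingerprint:
        raise ValueError("stored exploit parameters were modified")
    data = json.loads(serialized)
    # JSON turns tuples into lists; restore the original shape.
    data["headers"] = tuple(tuple(pair) for pair in data["headers"])
    return ExploitParams(**data)


original = ExploitParams(
    method="POST",
    url="https://staging.example.com/login",  # placeholder target
    headers=(("Content-Type", "application/x-www-form-urlencoded"),),
    body="user=admin'--&pass=x",
)
stored = serialize(original)
digest = fingerprint(stored)
replayed = load_for_replay(stored, digest)
print(replayed == original)
```

Checking the fingerprint before replay means a re-attack either uses the exact stored parameters or fails loudly, and never silently runs a similar test in their place.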
GitLab merge request quality. We wanted the generated merge requests to look like something a senior engineer wrote, not AI output. Significant iteration went into the template, description format, and how the evidence bundle is presented to make it immediately useful to a human reviewer.
Accomplishments that we're proud of
Won 2 national-level hackathons, improving Ouroboros along the way to match industry needs.
What we learned
The most important insight: the loop matters more than any individual component. A better SAST tool or a smarter LLM is a marginal improvement. Closing the loop with verified re-attack is a categorical improvement. Every tool in the pipeline exists to support the re-attack step — that single mechanism is what makes this different from everything else.
We also learned that local-first LLM inference is production-viable for security tooling. Regulated industries cannot send vulnerability data to external APIs. Running CodeLlama locally means Ouroboros can be deployed in air-gapped environments - a requirement most security tools fail at procurement.
What's next for Ouroboros - Autonomous Security
- Controlled Autonomy: Risk-based auto-merging (low-risk fixes auto-approved, high-risk require review)
- On-Prem Deployment: Run inside enterprise environments with private LLMs (no code leaves infra)
- Smarter Agents: Better reasoning, cross-file analysis, and learning from past fixes
- Advanced Verification: Re-attack + sandbox testing + fuzzing
- Compliance Layer: SOC2, ISO27001 mapping with full audit trails
- Expanded Coverage: Cloud, APIs, and supply chain vulnerabilities
Vision - Security that doesn’t just detect problems - it fixes and verifies itself
Built With
- docker
- fastapi
- gitlab
- langchain
- langgraph
- next.js
- ollama
- postgresql
- python
- redis