The moment that made this real

Before we explain what Kodify does, here is what happened three days before this submission:

Kodify posted an inline diff comment on a real open merge request in gitlab-org/gitlab-runner - GitLab's own production repository, 2,550 stars. Nobody asked it to. It found the MR via the MCP server, scored it, detected a [no-secrets] (CRITICAL) violation on a test fixture (SecretKey: "test-secret-key"), and posted the comment anchored to the exact line in the diff.

Within minutes, Axel von Bertoldi - a Maintainer on the repository - replied directly in the thread:

"@Natnael123 can we change the Kodify rules to not look at test files?"

That is a Maintainer on a GitLab production repo asking how to configure a tool he had never seen before, within minutes of seeing its first comment. He didn't ask what it was. He didn't dismiss it. He asked to tune it.

Screenshot proof: https://i.imgur.com/vCTaGMx.jpeg

That exchange is the entire pitch in four lines. A tool that a senior maintainer wants to configure after seeing it once, unprompted, on real production code - that's not a demo. That's a product.


Inspiration

In January 2026, a senior open source maintainer closed his project's contribution portal permanently. Not because the project failed. Because he spent more time reviewing AI-generated pull requests than writing code.

He's not alone.

cURL's Daniel Stenberg killed his bug bounty program after valid vulnerability reports dropped from 15% to 5% - the rest were AI hallucinations. Mitchell Hashimoto banned all AI contributions from Ghostty. Steve Ruiz started auto-closing every external PR on tldraw. GitHub had to build an emergency kill switch for pull requests.

A January 2026 paper ("Vibe Coding Kills Open Source," arXiv:2601.15494) found that at 70% AI coding adoption, per-user monetization for OSS projects falls 70%, while productivity gains offset only 12% of costs. Black Duck's 2026 OSSRA report: 93% of applications contain open source components with zero development activity. 65% of organizations experienced a supply chain attack in the past year.

Here is the math that kills maintainers: AI made contribution free. Review is still $75/hour.

Every AI agent that ships code creates a human bottleneck downstream. The faster AI codes, the more buried the reviewers become. The asymmetry compounds until the people who maintain the internet burn out and quit.

Kodify exists because that asymmetry has to be closed.


What it does

Kodify is not a code reviewer. It is not a security scanner. It is not a bot that leaves comments and waits.

In February 2026, @steipete - creator of OpenClaw, the fastest-growing repository in GitHub history - posted a desperate plea after AI agents flooded his repo with thousands of PRs overnight. He asked for exactly three things:

  1. AI that scans every PR and de-dupes
  2. Deep review to detect which PR is best
  3. A vision document to reject PRs that stray too far

He ended with: "How is no startup working on this?"

Kodify is the answer.

It is an autonomous multi-agent immune system. It de-duplicates, scores, enforces, auto-fixes, and decides the fate of every MR - before any human opens the notification. The maintainer does not review Kodify's work. The maintainer reviews what Kodify lets through.

The core innovation is vision.yml - a machine-readable governance constitution. Not a linter config. Not a CI gate. A single YAML file that defines your project's architectural laws: how long a file may be, which patterns are banned, and which severity triggers which consequence. Agents read it, enforce it, and report against it. Any maintainer can change the law with one line. Axel von Bertoldi understood this within minutes of seeing his first Kodify comment.

A developer opens a merge request. Before any human sees it:

Scout checks 90 days of MR history for duplicates. Found one? Auto-closes with a link to the original. The maintainer never opens the notification.

Architect scores the MR 0-100 against 10 configurable rules - files over 400 lines, hardcoded secrets, eval() calls, external telemetry, duplicate logic. Every violation cited with exact file and line number.

Refactor reads the violating files, generates the fix, and pushes a commit. Live proof: replaced a production Stripe API key with process.env.STRIPE_API_KEY. Replaced eval() with JSON.parse(). Score went from 50 to 100. Autonomous.

Security deep-scans for injection vectors, backdoor patterns, credential exposure. Posts the exact attack vector, not just a flag.

Oracle checks every other open MR for conflicts - same files, same dependencies, architectural collisions. Recommends merge order before a conflict ever happens.

Pipeline Doctor triggers on CI failures. Diagnoses root cause. Posts a remediation plan.

The full flow, from trigger to verdict:

MR opened / reviewer assigned
    |
    +-- Scout        checks 90 days of MR history for duplicates
    |                DUPLICATE -> AUTO-CLOSE, link to original
    |
    +-- Architect    scores MR 0-100 against vision.yml rules
    |                every violation cited with exact file + line
    |
    +-- Refactor     generates fix, runs in E2B sandbox, pushes commit
    |                only fires if score 30-79 and auto_fix_enabled: true
    |
    +-- Security     deep-scans for injection vectors, backdoor patterns
    |
    +-- Oracle       cross-checks all open MRs for file/dep conflicts
    |
    +-- Governance   posts final report, applies labels, takes action
         |
         +-- score >= 80  -> AUTO-MERGE
         +-- score 30-79  -> AUTO-FIX -> re-score -> AUTO-MERGE if >= 80
         +-- score < 30   -> AUTO-CLOSE with full explanation
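
The verdict step at the bottom of the flow can be sketched as a small pure function. This is a minimal illustration, not Kodify's actual implementation: the `Enforcement` type, the `decide` function, and the `HUMAN-REVIEW` fallback (for a mid-range score when auto-fix is disabled) are assumptions. Only the thresholds and the three outcomes come from the flow above.

```typescript
// Hypothetical shape; field names mirror the enforcement block of vision.yml.
interface Enforcement {
  merge_threshold: number;
  close_threshold: number;
  auto_fix_enabled: boolean;
}

// HUMAN-REVIEW is an assumed fallback, not something the flow above defines.
type Verdict = "AUTO-MERGE" | "AUTO-FIX" | "AUTO-CLOSE" | "HUMAN-REVIEW";

function decide(score: number, cfg: Enforcement): Verdict {
  if (score >= cfg.merge_threshold) return "AUTO-MERGE";
  if (score < cfg.close_threshold) return "AUTO-CLOSE";
  // Mid-range scores are auto-fixed, then re-scored by calling decide() again.
  return cfg.auto_fix_enabled ? "AUTO-FIX" : "HUMAN-REVIEW";
}

const enforcement: Enforcement = {
  merge_threshold: 80,
  close_threshold: 30,
  auto_fix_enabled: true,
};

console.log(decide(50, enforcement));
```

Keeping the decision free of side effects means the same function can run once before the fix and once after the re-score.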

The maintainer reviews 1 MR instead of 15. Kodify handled the rest.

It also works on any public GitLab project without installation via the MCP server and CLI:

npx kodify init                                               # scaffold governance in any repo
npx kodify audit --fail-on critical                          # scan local files offline
npx kodify score-mr --project-id 250833 --mr-iid 6549        # score any GitLab MR
npx kodify find-duplicates --project-id 250833 --mr-iid 6549 # scout for duplicates

Live on npm: https://www.npmjs.com/package/kodify

Live evidence - every link is clickable:

What happened | Link
Kodify scored gitlab-runner !6549 (0/100, 9 violations, 8 inline comments - Maintainer Axel von Bertoldi asked to configure it; screenshot: https://i.imgur.com/vCTaGMx.jpeg) | MR !6549
Kodify scored gitlab-runner !6528 (25/100, 3 violations, DUPLICATE of !6549) | MR !6528
Kodify scored gitlab-runner !5821 (100/100, clean, approve) | MR !5821
Kodify scored gitlab-runner !6570 (0/100, 7 critical no-secrets violations, live MR opened today) | MR !6570
Kodify scored gitlab-runner !6511 (0/100, no-god-objects HIGH on 577-line file + 5 critical violations) | MR !6511
Scout scanned 83 open MRs, flagged !6557 as near-duplicate - 4 overlapping MRs found (file overlap 0.33-0.67 across kubernetes executor) | MR !6557
Scout scanned 81 open MRs, found !6528 as DUPLICATE (file overlap: 1.0) | CLI demo below
Hardcoded Stripe key auto-fixed, eval replaced, score 50->100, commit pushed | MR !4
Secret auto-fixed, score 70->100, AUTO-MERGE labeled (blocked by service account permission level - Developer role, Maintainer required) | MR !5
5-agent review, Security blocked release gate, score 30/100 | MR !3
Duplicate issue auto-closed, linked to original | Issue #3

CLI demos (asciinema - play in browser):

Live dashboard: https://kodify.arcumet.com


How we built it

GitLab Duo Agent Platform - Tools, Triggers, Context - 6 flows registered in the AI Catalog. Triggers: MR reviewer assigned, pipeline failure, issue assigned, @mention. Tools: create_note, create_commit, update_mr, add_label, close_mr, post_diff_comment. Context: each agent reads the full MR diff hunk-by-hunk, the commit history, the related issue thread, and the project's vision.yml before making any decision. The agents do not operate on file names alone - they read the actual code changes with full surrounding context before scoring or fixing.

Anthropic Claude via GitLab Duo - every agent decision: scoring, code analysis, fix generation, merge/close verdict. Claude is not a feature. It is the governance engine. Without Claude, Kodify is a YAML file.

Google Cloud Vertex AI - text-embedding-005 for semantic MR deduplication. Vector Search for ANN matching at scale. Workload Identity Federation for keyless auth. Per-MR carbon footprint from GCP's real grid data across 16 regions.

Vision DSL - .kodify/vision.yml defines 10 governance rules with severity, regex patterns, scoring deductions, and auto-fix strategies. One YAML file. Drop it in any repo:

enforcement:
  mode: enforce
  merge_threshold: 80
  close_threshold: 30
  auto_merge: true
  auto_close: true
  auto_fix_enabled: true
  require_human_approval: false
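
To make the rule model concrete, here is a rough sketch of how rules with a severity, a regex pattern, and a scoring deduction could be applied line by line. Everything in it is illustrative: the `Rule` shape, the `scoreFiles` helper, the `no-eval` rule id, and the patterns and deduction values are assumptions, not the contents of Kodify's real vision.yml.

```typescript
interface Rule {
  id: string;
  severity: "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
  pattern: RegExp;   // tested against one line at a time (no "g" flag, so no state)
  deduction: number; // points subtracted per match (assumed scheme)
}

interface Violation {
  rule: string;
  severity: string;
  file: string;
  line: number; // 1-based line number within the file
}

// Illustrative rules only; patterns and deductions are hypothetical.
const rules: Rule[] = [
  { id: "no-secrets", severity: "CRITICAL", pattern: /sk_live_[A-Za-z0-9]+/, deduction: 25 },
  { id: "no-eval", severity: "CRITICAL", pattern: /\beval\s*\(/, deduction: 25 },
];

function scoreFiles(
  files: Record<string, string>,
  ruleSet: Rule[],
): { score: number; violations: Violation[] } {
  const violations: Violation[] = [];
  let score = 100;
  for (const file of Object.keys(files)) {
    files[file].split("\n").forEach((text, index) => {
      for (const rule of ruleSet) {
        if (rule.pattern.test(text)) {
          violations.push({ rule: rule.id, severity: rule.severity, file, line: index + 1 });
          score -= rule.deduction;
        }
      }
    });
  }
  // Clamp so many violations cannot push the score below zero.
  return { score: Math.max(0, score), violations };
}
```

Each violation carries the file and line it came from, which is what an inline comment needs in order to be anchored to the right place.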

E2B Cloud Sandbox - auto-fixes run in real cloud VMs. Sparse checkout for large repos. Lint/test/build validation before committing. No untested code ever lands.

MCP Server - 8 tools. Any coding agent (Claude Desktop, VS Code Copilot, Cursor) can query governance state, score MRs with per-hunk diff analysis, post inline diff comments anchored to exact violating lines, and run duplicate detection - on any public GitLab project, zero installation required.

Skill Files - each of the 10 rules has a .kodify/skills/<rule>/SKILL.md following the agentskills.io spec. Tells agents exactly what to look for, what to ignore, how to fix it.

npx CLI - published on npm. npx kodify init scaffolds any project. npx kodify audit scans local files offline. npx kodify score-mr scores any GitLab MR. npx kodify find-duplicates runs Scout. Node 18+.

React 19 Dashboard - live GitLab API, interactive chat with reasoning chain visualization, syntax-highlighted full-width diff view (powered by @pierre/diffs), inline violation annotations anchored to exact lines, scheduled audit, skills browser, ROI calculator. Deployed at https://kodify-1def3b.gitlab.io

Carbon Calculator - SCI framework, 16 GCP regions, per-MR sustainability reports.
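
For readers unfamiliar with SCI, the Software Carbon Intensity score is defined as ((E x I) + M) per R: energy consumed, times grid carbon intensity, plus embodied emissions, per functional unit. A minimal sketch of that formula follows. The `SciInputs` field names and the choice of "one MR" as the functional unit are assumptions for illustration, and the sample values are made up, not Kodify measurements.

```typescript
interface SciInputs {
  energyKwh: number;            // E: energy consumed by the workload
  gridIntensityGPerKwh: number; // I: grid carbon intensity in gCO2e per kWh
  embodiedG: number;            // M: embodied hardware emissions in gCO2e
  functionalUnits: number;      // R: e.g. number of MRs processed (assumed unit)
}

// Returns gCO2e per functional unit: ((E * I) + M) / R.
function sci(inputs: SciInputs): number {
  if (inputs.functionalUnits <= 0) {
    throw new Error("functionalUnits must be positive");
  }
  const operational = inputs.energyKwh * inputs.gridIntensityGPerKwh;
  return (operational + inputs.embodiedG) / inputs.functionalUnits;
}

// Made-up sample values, purely to show the call shape.
console.log(sci({ energyKwh: 2, gridIntensityGPerKwh: 100, embodiedG: 50, functionalUnits: 5 }));
```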

67 automated tests across 5 test suites.


Challenges we ran into

Inline diff comments on external repos - GitLab's Discussions position API requires precise base_sha, head_sha, start_sha and exact line mapping from the unified diff. Getting this to work reliably across different MR states and file types took significant iteration.
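
As an illustration of what that payload involves, the sketch below assembles a text position for GitLab's merge request discussions endpoint from the `diff_refs` of an MR. The helper names (`buildPosition`, `buildDiscussionRequest`) are hypothetical, the code only builds the request and does not send it, and it assumes the file was not renamed (so `old_path` equals `new_path`). It is not Kodify's actual code.

```typescript
// The three SHAs GitLab reports as diff_refs on a merge request.
interface DiffRefs {
  base_sha: string;
  head_sha: string;
  start_sha: string;
}

interface TextPosition extends DiffRefs {
  position_type: "text";
  old_path: string;
  new_path: string;
  new_line?: number; // set for added lines
  old_line?: number; // set for removed lines; unchanged lines set both
}

function buildPosition(
  refs: DiffRefs,
  path: string, // assumes no rename, so old_path === new_path
  line: { new?: number; old?: number },
): TextPosition {
  if (line.new === undefined && line.old === undefined) {
    throw new Error("a position needs new_line, old_line, or both");
  }
  const position: TextPosition = {
    position_type: "text",
    base_sha: refs.base_sha,
    head_sha: refs.head_sha,
    start_sha: refs.start_sha,
    old_path: path,
    new_path: path,
  };
  if (line.new !== undefined) position.new_line = line.new;
  if (line.old !== undefined) position.old_line = line.old;
  return position;
}

// Builds (but does not send) the POST request for an inline diff comment.
function buildDiscussionRequest(projectId: number, mrIid: number, body: string, position: TextPosition) {
  return {
    method: "POST" as const,
    url: `https://gitlab.com/api/v4/projects/${projectId}/merge_requests/${mrIid}/discussions`,
    json: { body, position },
  };
}
```

Choosing between `new_line`, `old_line`, or both is the "exact line mapping" part: it has to be derived from the hunk headers of the unified diff, and a mismatch there is a likely source of rejected comments.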

Duplicate detection at scale - semantic deduplication using Vertex AI embeddings works well for title similarity but fails on structurally different MRs touching the same files. The final approach combines Jaccard similarity on changed file paths with title token overlap, weighted 60/40. Tested against 81 live open MRs on gitlab-runner - correctly identified !6528 as a duplicate of !6549 with file overlap 1.0.
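
A simplified sketch of that combined score is shown below. It assumes the 60/40 weighting means 0.6 for file-path overlap and 0.4 for title overlap, and it treats "title token overlap" as Jaccard similarity over lowercase alphanumeric tokens. Both are assumptions, as are the `MrSummary` shape and the function names.

```typescript
// Jaccard similarity: |A intersect B| / |A union B|. Defined here as 0 for two empty sets.
function jaccard<T>(a: Set<T>, b: Set<T>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((item) => {
    if (b.has(item)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Lowercase alphanumeric tokens of an MR title (assumed tokenization).
function tokens(title: string): Set<string> {
  return new Set(title.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

interface MrSummary {
  iid: number;
  title: string;
  files: string[]; // paths changed by the MR
}

// Weighted blend: 0.6 * file-path overlap + 0.4 * title-token overlap (assumed order).
function duplicateScore(a: MrSummary, b: MrSummary): number {
  const fileOverlap = jaccard(new Set(a.files), new Set(b.files));
  const titleOverlap = jaccard(tokens(a.title), tokens(b.title));
  return 0.6 * fileOverlap + 0.4 * titleOverlap;
}
```

Two MRs that touch exactly the same files get a file overlap of 1.0 regardless of their titles, which is why the file-path term catches duplicates that title similarity alone would miss.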

E2B sandbox validation - running lint/test/build in a cloud VM before committing auto-fixes sounds straightforward but every repo has different build tooling. Sparse checkout, dependency caching, and graceful degradation when tests aren't configured were all non-trivial.

The AI Paradox framing - Kodify started as a code quality tool. Realizing it was actually solving the AI Paradox - AI writes code faster than humans can review it - reframed the entire product. That narrative shift happened midway through the hackathon and changed every decision afterwards.


Accomplishments that we're proud of

Real comments on a real maintainer's real MR. Kodify posted 8 inline comments on gitlab-org/gitlab-runner MR !6549 - a production MR - and a Maintainer on the repository called it "very clever" after asking to configure it. No synthetic data. No controlled environment. Real production code, real response.

Two autonomous auto-fix commits with git history. MR !4 (Stripe key + eval, score 50->100) and MR !5 (secret replaced, score 70->100, AUTO-MERGE labeled). The git author is "Kodify Governance." These are permanent, verifiable, public.

Scout correctly identified a duplicate across 81 open MRs. !6528 and !6549 are both open, both by the same author, both touching 8 identical files - file overlap score 1.0. A human maintainer would need to manually cross-reference 81 MRs. Kodify does it in seconds.

The full autonomous loop works end-to-end: MR created -> 6 agents review -> violations found -> auto-fix pushed -> re-scored -> AUTO-MERGE labeled. This is the demo, not a mockup.


What we learned

The hardest part of building an autonomous governance system is not the AI. It's knowing when not to act. A dumb system auto-closes everything with a low score. A smart system understands that "test-secret-key" in a test file is different from sk_live_xxx in a config file. That distinction - knowing context - is what separates Kodify from a YAML linter.

Real-world proof is everything. Demo data is invisible. The moment Kodify posted comments on a real GitLab maintainer's real MR and got a real response, the submission went from "impressive project" to "this is a real tool."


What's next for Kodify

Configurable rule exceptions - the first request from a real user was "can we exclude test files?" That's next. Per-directory and per-file-pattern exclusions in vision.yml.

GitLab-native installation - one-click install from the GitLab AI Catalog. No npx kodify init required. The agent registers itself on first MR.

Rule marketplace - community-contributed skill files. Share governance rules across organizations. Import an "OWASP Top 10" skill pack or a "GDPR compliance" skill pack in one line.

Cross-repo governance - organizations have multiple repos. A secret leaked in repo-a should trigger a scan of repo-b. The MCP server already has the primitives for this.

Port to GitHub - the governance engine is platform-agnostic. GitHub Actions integration is 2-3 weeks of work.

Kodify was built solo, from Addis Ababa, Ethiopia. No team. No funding. The same flood of AI-generated PRs that overwhelmed the maintainer of a project with 350,000 stars will overwhelm every serious open source project within 18 months. Kodify exists because the immune system for that flood should be available to anyone with a laptop and a vision.yml.

