Inspiration

Vibe coding is eating software development. Developers are shipping entire products by describing what they want to Claude or Cursor — and it works, until it doesn't. The problem isn't that AI writes bad code. The problem is that AI writes plausible code: it removes the token validation guard because the new flow "doesn't need it," it changes a function signature because the new version "looks cleaner," it hallucinates an npm package that either doesn't exist or is a typosquatted backdoor. No existing tool catches this because no existing tool sits between the AI and the file system.

We wanted to build the immune system that vibe-coded software never had.

What it does

Corpus intercepts AI-generated code at the moment of writing — before it ships — and verifies it against a structural graph of your codebase.

When an AI tool like Claude Code or Cursor writes a file, Corpus diffs it against the saved graph and checks for three things that cause real production incidents:

  • Removed guard clauses — security checks that the AI silently deleted
  • Changed function signatures — breaking changes that ripple to every caller
  • Removed exported functions — contracts the rest of the codebase depends on

If a violation is found, Corpus returns structured fix instructions to the AI, which regenerates the file. This loop continues until the file is VERIFIED. The developer never sees the bad code — it never lands.
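
The verify-regenerate loop above can be sketched roughly as follows. The names (`verifyLoop`, `Verifier`, `regenerate`) are illustrative stand-ins, not Corpus's actual internals:

```typescript
// Hypothetical sketch of the verify-regenerate loop; names are illustrative.
type Violation = { kind: "removed-guard" | "changed-signature" | "removed-export"; detail: string };

interface Verifier {
  check(content: string): Violation[];              // diff the file against the saved graph
  fixInstructions(violations: Violation[]): string; // plain-English fix prompt for the AI
}

// Regenerate until no structural violations remain, bounded so a stubborn
// model cannot loop forever.
function verifyLoop(
  verifier: Verifier,
  regenerate: (content: string, instructions: string) => string,
  initial: string,
  maxAttempts = 3,
): { status: "VERIFIED" | "REJECTED"; content: string } {
  let content = initial;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const violations = verifier.check(content);
    if (violations.length === 0) return { status: "VERIFIED", content };
    content = regenerate(content, verifier.fixInstructions(violations));
  }
  return { status: "REJECTED", content };
}
```

The bound matters: without `maxAttempts`, a model that keeps reintroducing the same violation would spin indefinitely.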

Beyond structural verification, Corpus runs 12 specialized security scanners (secrets, SQL injection, prompt injection, data exfiltration, CORS misconfig, and more), computes a per-file trust score from 0–100, and exposes everything through 7 MCP tools that Claude Code and Cursor connect to natively.
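
A 0–100 trust score can be computed many ways; here is a minimal severity-weighted sketch. The penalty weights are assumptions for illustration, not Corpus's real values:

```typescript
// Illustrative trust-score sketch; the actual Corpus weighting is not shown here.
type Severity = "CRITICAL" | "WARNING" | "INFO";

// Assumed penalty per finding, by severity.
const PENALTY: Record<Severity, number> = { CRITICAL: 40, WARNING: 10, INFO: 2 };

// Start from a perfect score, subtract a penalty per finding, clamp to 0–100.
function trustScore(findings: Severity[]): number {
  const raw = findings.reduce((score, s) => score - PENALTY[s], 100);
  return Math.max(0, Math.min(100, raw));
}
```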

The policy layer is built on 10 deterministic Jac walkers — not LLM opinions. Same input, same verdict, every time.

After scanning 280 open-source repos (216K files, 723K graph nodes), the pattern learner has identified which security findings are real production risks versus high-frequency noise in test files, achieving 45% noise reduction with zero configuration.

How we built it

The stack is deliberately boring at the infrastructure layer so the interesting parts could be interesting.

Graph Engine — a regex-based parser (no full AST, no ts-morph dependency) that scans TypeScript/JavaScript projects in under a second, extracting function signatures, guard clauses, exports, and call edges into a .corpus/graph.json structural map.
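
A drastically simplified version of that extraction step — the real parser handles far more syntax than this single pattern:

```typescript
// Minimal sketch of regex-based signature extraction. Note the known weakness:
// parameter lists containing nested parentheses (e.g. default values that call
// functions) break the naive ([^)]*) capture — exactly the fragility discussed
// in the Challenges section.
interface FnNode { name: string; params: string; exported: boolean }

const FN_RE = /(export\s+)?(?:async\s+)?function\s+(\w+)\s*\(([^)]*)\)/g;

function extractFunctions(source: string): FnNode[] {
  const nodes: FnNode[] = [];
  for (const m of source.matchAll(FN_RE)) {
    nodes.push({ name: m[2], params: m[3].trim(), exported: m[1] !== undefined });
  }
  return nodes;
}
```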

Auto-Fix Engine — diffs incoming file content against the saved graph, classifies violations by severity (CRITICAL / WARNING), and emits structured fix instructions in plain English that the AI can act on immediately.
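
The diff-and-classify step might look like this sketch; the `Snapshot` shape and the message wording are illustrative, not the engine's actual output format:

```typescript
// Hypothetical violation classifier: compares the saved graph snapshot against
// the incoming file's extracted structure. Severity levels mirror the text.
type Severity = "CRITICAL" | "WARNING";
interface Snapshot { exports: Set<string>; signatures: Map<string, string>; guards: Set<string> }
interface Violation { severity: Severity; message: string }

function classify(before: Snapshot, after: Snapshot): Violation[] {
  const out: Violation[] = [];
  // A guard clause present before but missing now is the most dangerous case.
  for (const g of before.guards)
    if (!after.guards.has(g))
      out.push({ severity: "CRITICAL", message: `Removed guard clause: ${g}. Restore it.` });
  // Exported functions are contracts the rest of the codebase depends on.
  for (const e of before.exports)
    if (!after.exports.has(e))
      out.push({ severity: "CRITICAL", message: `Removed exported function: ${e}. Re-export it.` });
  // A changed signature silently breaks callers, but the function still exists.
  for (const [name, sig] of before.signatures)
    if (after.signatures.has(name) && after.signatures.get(name) !== sig)
      out.push({ severity: "WARNING", message: `Changed signature of ${name}: expected ${sig}.` });
  return out;
}
```

Keeping each message imperative ("Restore it", "Re-export it") is what makes the output directly actionable by the regenerating model.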

Jac Policy Walkers — 10 walkers written in Jac (jaseci.org) for deterministic graph traversal. Jac was the right choice here: no hallucination risk, no probabilistic variance, pure rule evaluation.

Pattern Learner — scans findings across all repos, classifies by file type (production / test / build tool), computes false positive rates per pattern, and auto-suppresses patterns that are noise in context (eval() at 89% frequency in test files = suppressed).
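
The suppression rule (e.g. eval() at 89% frequency in test files) reduces to a per-pattern frequency check; the threshold and type names below are assumptions:

```typescript
// Sketch of context-aware suppression: suppress a pattern within a file class
// when its observed frequency there exceeds a noise threshold.
type FileClass = "production" | "test" | "build";
interface Finding { pattern: string; fileClass: FileClass }

function noisyPatterns(findings: Finding[], fileClass: FileClass, threshold = 0.8): Set<string> {
  const total = new Map<string, number>();
  const inClass = new Map<string, number>();
  for (const f of findings) {
    total.set(f.pattern, (total.get(f.pattern) ?? 0) + 1);
    if (f.fileClass === fileClass) inClass.set(f.pattern, (inClass.get(f.pattern) ?? 0) + 1);
  }
  const suppressed = new Set<string>();
  for (const [pattern, t] of total)
    if ((inClass.get(pattern) ?? 0) / t >= threshold) suppressed.add(pattern);
  return suppressed;
}
```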

MCP Server — 7 tools exposed to Claude Code and Cursor. corpus_check is the core interception tool. corpus_health gives the immune system status. The rest handle secrets, trust scoring, injection detection, and safety checks.
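
Stripped of the MCP SDK plumbing, the tool layer reduces to a name-to-handler registry. The handler bodies below are stubs for illustration, not the real implementations:

```typescript
// Schematic tool registry. In the real server these handlers are registered
// through the MCP SDK; tool names corpus_check and corpus_health come from
// the text, the handler shapes are assumptions.
type ToolHandler = (args: Record<string, unknown>) => { status: string; detail?: string };

const tools = new Map<string, ToolHandler>();

tools.set("corpus_check", ({ file }) => {
  // Would diff `file` against .corpus/graph.json; stubbed here.
  return typeof file === "string"
    ? { status: "VERIFIED" }
    : { status: "ERROR", detail: "file argument required" };
});

tools.set("corpus_health", () => ({ status: "OK", detail: "graph loaded" }));

function dispatch(name: string, args: Record<string, unknown>) {
  const handler = tools.get(name);
  if (!handler) return { status: "ERROR", detail: `unknown tool: ${name}` };
  return handler(args);
}
```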

Immune Memory — local .corpus/memory.json per project, synced to Backboard.io for cross-session and cross-machine persistence. Tracks violation history, fix counts, and baseline snapshots.

Web Dashboard — live scan page, graph explorer, pattern evolution visualizer, auto-fix demo, and real-time monitoring. Built to show the system working, not just explain it.

The OSS scanner ran as a cron job, cloning repos in subprocess isolation to avoid OOM, with auto-commit every 10 repos so no scan run was ever lost.
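
The auto-commit cadence is just a batching rule; `planCommits` below is a hypothetical helper showing which repo indices would trigger a commit (the actual cloning runs in isolated child processes, which is omitted here):

```typescript
// Sketch of the scan driver's checkpoint rule: commit results every BATCH
// repos, and always after the final repo, so no scan run is lost.
const BATCH = 10;

function planCommits(repoCount: number, batch = BATCH): number[] {
  const points: number[] = [];
  for (let i = batch; i <= repoCount; i += batch) points.push(i);
  // Flush the tail even if the last batch is partial.
  if (repoCount > 0 && points[points.length - 1] !== repoCount) points.push(repoCount);
  return points;
}
```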

Challenges we ran into

Regex parsing is fragile at scale. Parsing TypeScript without a real AST means guard clauses written in syntax the patterns don't anticipate can slip through undetected.
