CHP / Code Highway Patrol

Inspiration

AI coding agents forget your project's rules. You tell them no console.log, no DB imports from the frontend, no silent try/catch. They say "got it." Hours later the diff lands with half the rules broken. As codebases grow, rule files such as AGENTS.MD and SKILLS.MD are rarely obeyed. We found that even the most powerful frontier models drift this way as they debug the same issue over several passes. So we built CHP: a way to enforce your codebase's rules heuristically and semantically, at the lowest level possible, for higher-quality, production-ready code and fast execution.

What it does

CHP enforces project-specific rules (laws) on every change an AI agent or human makes. A law is anything auditable: "no console.log," "no API keys in source," "frontend can't import the database client," a naming convention, a max function length. If a script can check it, CHP enforces it.

The unit. A law is three files: law.json (metadata + hook list), verify.sh (the check), GUIDANCE.md (the explanation injected into the agent's context). core/law-mutate.sh keeps them in lockstep. Edit one, the other two update.
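
To make the three-file unit concrete, here is a minimal sketch of what one law could look like on disk. The JSON field names (id, hooks) and the helper name make_example_law are assumptions made for illustration; only the file names and the autoFix modes come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: writes an example law into a directory.
# The law.json field names are assumed, not CHP's confirmed schema.

make_example_law() {
  local dir="$1"
  mkdir -p "$dir"

  # law.json: metadata plus the hooks this law is registered to.
  cat > "$dir/law.json" <<'EOF'
{
  "id": "no-console-log",
  "hooks": ["pre-commit", "pre-tool"],
  "autoFix": "ask"
}
EOF

  # verify.sh: exits 0 when every file passed as an argument is clean,
  # and 1 when any of them contains console.log.
  cat > "$dir/verify.sh" <<'EOF'
#!/usr/bin/env bash
status=0
for file in "$@"; do
  if grep -q 'console\.log' "$file" 2>/dev/null; then
    echo "no-console-log: violation in $file" >&2
    status=1
  fi
done
exit "$status"
EOF
  chmod +x "$dir/verify.sh"

  # GUIDANCE.md: the explanation injected into the agent's context.
  printf '%s\n' "Do not leave console.log calls in committed code." > "$dir/GUIDANCE.md"
}
```

Calling make_example_law laws/no-console-log and then bash laws/no-console-log/verify.sh src/app.js would exit non-zero whenever src/app.js still contains a console.log call.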

The dispatcher. core/dispatcher.sh receives events from any hook and runs the laws registered to it. 25 hook points total: 15 git, 6 agent lifecycle (pre/post-prompt, pre/post-tool, pre/post-response), 4 CI/CD (pre/post-build, pre/post-deploy).
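
The routing idea can be sketched in a few lines of Bash. This is not the actual core/dispatcher.sh: the plain-text registry format (one "<hook> <law-id>" pair per line), the laws/<law-id>/verify.sh layout, and the function name dispatch_event are assumptions chosen to keep the example self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical dispatcher sketch.
# Usage: dispatch_event HOOK REGISTRY_FILE LAWS_DIR [FILES...]

dispatch_event() {
  local hook="$1" registry="$2" laws_dir="$3"
  shift 3
  local failed=0 reg_hook law_id

  while read -r reg_hook law_id; do
    # Skip laws registered to other hooks.
    [ "$reg_hook" = "$hook" ] || continue

    # Run the law's check on the changed files; non-zero means a violation.
    # stdin is redirected so the check cannot consume the registry stream.
    if ! bash "$laws_dir/$law_id/verify.sh" "$@" </dev/null; then
      echo "violation: $law_id on $hook" >&2
      failed=1
    fi
  done < "$registry"

  # 0 if every registered law passed (or none were registered), 1 otherwise.
  return "$failed"
}
```

Because the hook name is just an argument, a git hook, an agent lifecycle hook, and a CI step could all call the same function with different values for HOOK.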

The hosts. Five AI coding tools (Claude Code, Codex, Windsurf, OpenCode, Gemini) call the same Bash core through thin plugin folders. An MCP server (lib/mcp-server.js) adds chp_analyze, chp_check, chp_create_law, chp_validate for any other client.

The fail path. When verify.sh returns non-zero, the law's autoFix mode decides what happens next. "never" blocks the change. "ask" proposes a fix and waits. "auto" patches in place, re-stages, and re-verifies.
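
The three modes map naturally onto a case statement. The sketch below is an assumption about how such a branch could look, not the project's fix-trigger.sh: the function name handle_violation, the fix-script argument, and the printed status words are all hypothetical, and the re-staging step is left out so the example runs outside a git repository.

```shell
#!/usr/bin/env bash
# Hypothetical fail-path sketch.
# Usage: handle_violation MODE VERIFY_SCRIPT FIX_SCRIPT [FILES...]

handle_violation() {
  local mode="$1" verify="$2" fix="$3"
  shift 3

  case "$mode" in
    never)
      # No fix is attempted; the change stays blocked.
      echo "blocked"
      return 1
      ;;
    ask)
      # A fix would be proposed here; the change stays blocked until approved.
      echo "fix-proposed"
      return 1
      ;;
    auto)
      # Patch in place, then re-run the check (re-staging omitted in this sketch).
      bash "$fix" "$@" || { echo "blocked"; return 1; }
      if bash "$verify" "$@"; then
        echo "fixed"
        return 0
      fi
      echo "blocked"
      return 1
      ;;
    *)
      echo "unknown autoFix mode: $mode" >&2
      return 2
      ;;
  esac
}
```

The important property is that "auto" only succeeds when the re-run of the verifier passes, so a bad patch cannot slip through.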

The agents. Four roles, one job each:

  • Chief (agents/chief.md): authors laws, owns the registry.
  • Officer (agents/officer.md): runs verify.sh and blocks on violations. The line the agent can't argue past.
  • Detective (agents/detective.md): writes the GUIDANCE.md injected into the agent's context, and tightens it on repeat failures.
  • Technician (agents/technician.md): patches violations, re-stages, re-verifies.

The probe. core/probe.sh tests the law itself. Plants known violations (console.log, API key prefixes like sk-…, AIza…, AKIA…, ghp_…, xoxb-…), runs verify.sh, reports whether it caught them. Laws get tested the way code does.
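
A probe of this kind can be approximated as follows. This is a sketch under stated assumptions rather than the real core/probe.sh: probe_law, its arguments, and the report strings are hypothetical, and the clean control file is an extra check added here that the description above does not mention.

```shell
#!/usr/bin/env bash
# Hypothetical probe sketch.
# Usage: probe_law VERIFY_SCRIPT "known-bad content"

probe_law() {
  local verify="$1" bad_content="$2"
  local workdir planted clean result=0

  workdir="$(mktemp -d)"
  planted="$workdir/planted.js"
  clean="$workdir/clean.js"
  printf '%s\n' "$bad_content" > "$planted"
  printf '%s\n' 'const answer = 42;' > "$clean"

  # The verifier must reject the planted violation.
  if bash "$verify" "$planted" >/dev/null 2>&1; then
    echo "MISSED: verifier accepted the planted violation"
    result=1
  else
    echo "CAUGHT: verifier rejected the planted violation"
  fi

  # Added assumption: the verifier should also accept a clean control file.
  if ! bash "$verify" "$clean" >/dev/null 2>&1; then
    echo "FALSE POSITIVE: verifier rejected the clean control file"
    result=1
  fi

  rm -rf "$workdir"
  return "$result"
}
```

A verify.sh that always exits 0 would report MISSED here, which is exactly the silent-approval failure the probe exists to expose.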

The skills. Seven slash commands are the day-to-day surface: write-laws, audit, investigate, review-laws, status, decompose-law, marketplace.

The CLI. 14 chp-* commands wrap the runtime (chp-law, chp-audit, chp-fix, chp-scan, chp-doctor, chp-hooks, chp-install, chp-list, chp-show, chp-status, chp-search, chp-publish, chp-marketplace, chp-login). core/detector.sh finds installed hosts and wires their plugin folders.

The marketplace. Pink Donut is a real backend. Publish rule packs with chp-publish, install with chp-install, search with chp-search. Versioned, like npm.

How we built it

Runtime: Bash + jq. CLI: Node.js wrapper around 14 subcommands. Zero npm dependencies. Clone and run on any machine with git, Bash, and jq (plus Node.js for the CLI wrapper and MCP server).

  • Core engine (core/): dispatcher.sh (event router), hook-registry.sh (hook → law map), verifier.sh (fileScope glob matching), law-mutate.sh (atomic edits across the three files), fix-trigger.sh (routes failures to Technician), probe.sh (law self-test), tightener.sh (records repeat failures), detector.sh (host autodetection), checkers/ (pattern, threshold, structural, agent).
  • Plugin adapters: .claude-plugin/, .windsurf-plugin/, .codex-plugin/, .opencode/, gemini-extension.json. Each is a thin folder; the Bash core underneath is identical.
  • MCP server (lib/mcp-server.js): chp_analyze, chp_check, chp_create_law, chp_validate.
  • CLI (bin/chp): routes to the 14 chp-* subcommands.
  • Skills (skills/): seven Markdown-defined slash commands the host loads on init.
  • Test suite (tests/): 14 shell test scripts including end-to-end auto-fix and hook flows. Run with scripts/run-tests.sh.
  • Marketplace backend: Pink Donut, REST API, versioned packs.
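
The fileScope glob matching attributed to verifier.sh above can be illustrated with Bash's own pattern matching. The function name matches_scope and its argument order are assumptions, and the real verifier may treat path separators differently.

```shell
#!/usr/bin/env bash
# Hypothetical fileScope sketch.
# Usage: matches_scope FILE GLOB [GLOB...]
# Succeeds if FILE matches at least one glob.

matches_scope() {
  local file="$1" pattern
  shift

  for pattern in "$@"; do
    # An unquoted variable in a case arm is treated as a glob pattern.
    # Note: in case patterns, * also matches "/" characters, so "src/*.js"
    # matches nested paths such as src/app/main.js.
    case "$file" in
      $pattern) return 0 ;;
    esac
  done

  return 1
}
```

For example, matches_scope "src/app/main.js" "*.ts" "src/*.js" succeeds, while matches_scope "README.md" "*.ts" "src/*.js" fails, so a law scoped to source files would skip the README.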

Challenges we ran into

Deterministic vs. agentic enforcement. Asking an agent "does this look good?" is the loop we were trying to break. So every law compiles to a verify.sh with a real exit code. The agent helps write the script; the script stands at the boundary.

Cross-tool compatibility. Five hosts, five plugin/hook/MCP conventions. Windsurf's marketplace forced trust and packaging decisions on day one. The fix: keep the core host-agnostic. Each tool gets a thin plugin folder; the Bash underneath never knows which one called it.

One dispatcher, three event domains. Git, agent lifecycle, and CI hooks have wildly different triggers, file contexts, and exit-code semantics. We built one event format so a single law fires across all three with no per-host duplication.

Testing the rules, not the code. Most enforcement tools assume the rule itself is correct. A broken verify.sh silently approves what it was meant to block. Probe plants known-bad content for each check and asserts the verifier catches it before the law ever runs on real code.

Accomplishments that we're proud of

  • Verification actually runs. Every law compiles to a real script with a real exit code, and that script runs on every commit, every tool call, every deploy.
  • Five AI hosts, one Bash core. Adding the fifth took hours, not weeks.
  • Auto-fix closes the loop. Violations don't just block. The Technician patches, re-stages, re-verifies.
  • Laws test themselves. Probe asserts each law catches the violations it claims to.
  • Pink Donut is real. Not a demo. Sign in, chp-publish, chp-install, chp-search. Versioned like npm.
  • Zero npm dependencies. Pure Bash + jq. Clone, run.
  • No new DSL. Your rules stay in plain language. CHP layers underneath.

What we learned

Agent judgment is too loose to ship, especially to production: the same code gets different verdicts. Every law has to compile to a real exit code. Single-stage enforcement (CI only or pre-commit only) misses most agent mistakes; firing the same law at pre-prompt, pre-tool, pre-commit, and pre-deploy puts the feedback inside the turn the agent is working in. Host-agnostic beats host-specific: pure Bash + jq and a thin plugin folder per tool meant the fifth integration was an afternoon, not a sprint. And rules need their own test layer. Without Probe, a broken verify.sh silently green-lights what it was meant to catch.

What's next for CHP

  • Starter pack. Laws for no-console-log, no-secrets, scoped imports, and function complexity ship with chp init.
  • Probe coverage for threshold and structural checkers. Probe currently covers pattern checkers only.
  • Signed, versioned law packs. Cryptographic provenance for shared rule sets.
  • Framework packs on Pink Donut. React, Next.js, FastAPI, Rails.
  • Editor diagnostics. Render GUIDANCE.md inline in VS Code and JetBrains, so violations explain themselves before the commit hook fires.

Built with

  • Bash: core engine, dispatcher, hook templates, every verify.sh
  • jq: JSON processing inside shell
  • Node.js: CLI wrapper, MCP server, OpenCode plugin
  • Markdown: agent prompts, law guidance, skills
  • JSON: laws, hook registry, plugin manifests, MCP schemas
  • Git: primary enforcement surface
  • MCP: interface for Claude Code, Windsurf, and other MCP clients
  • Claude Code, Windsurf, Codex, OpenCode, Gemini: five AI hosts, dedicated plugin adapters
  • Pink Donut: marketplace backend at pinkdonut.work
