Inspiration
Every engineering team hits the same wall: code reviews don't scale. Senior engineers spend 30%+ of their time reviewing merge requests, creating bottlenecks that delay deploys by hours or days. Worse, human reviewers are inconsistent—a tired reviewer on Friday afternoon might miss the exact auth bug they'd catch on Monday morning.
I wanted to build a system that treats code review as a signal processing problem: extract structured semantics from raw diffs, aggregate risk signals with weighted scoring, and enforce policy decisions—all autonomously, in under 2 seconds.
What I Learned
The hardest part wasn't building the pipeline—it was building a pipeline that can't be fooled. Early versions had a critical flaw: the CI ran gate scripts directly from the merge request branch, meaning a malicious MR could modify the very code that evaluates it. Solving this required a trust model where gate logic is sourced from a protected ref (POLICY_REF), creating a clear separation between "code being reviewed" and "code doing the reviewing."
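A minimal sketch of that separation, assuming the gate files are the paths named elsewhere in this write-up and that POLICY_REF names a protected branch on a remote called origin. The function names and the path list are hypothetical, not DiffGuard's actual code. The idea is to overwrite whatever the MR branch shipped with the protected copy before any gate script runs.

```python
import subprocess

# Assumed gate files; the real project layout may differ.
GATE_PATHS = ["scripts/enforce_policy.py", "src/risk_engine/engine.py", "contracts"]


def gate_checkout_commands(policy_ref, paths=GATE_PATHS, remote="origin"):
    """Return the git commands that replace gate files with the protected version."""
    if not policy_ref:
        # Fail closed: without a protected ref there is nothing trustworthy to run.
        raise RuntimeError("POLICY_REF is not set; refusing to run gate scripts from the MR branch")
    return [
        ["git", "fetch", remote, policy_ref],
        # FETCH_HEAD now points at the protected ref; take only the gate files from it.
        ["git", "checkout", "FETCH_HEAD", "--", *paths],
    ]


def restore_gate_scripts(policy_ref, run=subprocess.run):
    """Run each command and raise if any git call fails."""
    for cmd in gate_checkout_commands(policy_ref):
        run(cmd, check=True)
```

In a CI job this would be called once, for example as restore_gate_scripts(os.environ.get("POLICY_REF", "")), before the analyze stage starts. The run parameter exists only so the commands can be inspected without touching a real repository.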
I also learned that fail-open validation is worse than no validation—it gives false confidence. Every boundary in DiffGuard is fail-closed: if the validator dependency is missing, the pipeline crashes rather than silently skipping checks.
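The fail-closed behavior could look roughly like this. It is a sketch under the assumption that the validator dependency is the jsonschema package; the write-up does not name the library, and the helper names are hypothetical. The point is that a missing import becomes a hard error instead of a silent skip.

```python
import importlib


def require_validator(module_name="jsonschema"):
    """Import the validator library or abort; never fall back to 'no validation'."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"validator dependency '{module_name}' is missing; aborting instead of skipping checks"
        ) from exc


def validate_or_crash(payload, schema, module_name="jsonschema"):
    """Validate payload against schema, letting every failure propagate."""
    validator = require_validator(module_name)
    # jsonschema.validate raises ValidationError on a contract violation,
    # so an invalid payload stops the pipeline as well.
    validator.validate(instance=payload, schema=schema)
    return payload
```

The fail-open alternative would wrap the import in a try/except that logs a warning and returns the payload unchanged, which is exactly the false confidence described above.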
The risk scoring model uses a weighted signal aggregation approach:
$$S = \min\left(\sum_{i=1}^{n} w_i \cdot \mathbb{1}[s_i],\ 100\right)$$
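where $w_i$ is the predefined weight for signal $s_i$ (e.g., AUTH_CODE_MODIFIED = 30, VALIDATION_REMOVED = 25), and the indicator function $\mathbb{1}[s_i]$ equals 1 when the signal is detected in the diff and 0 otherwise. The $\min$ caps the total at 100, so many simultaneous signals cannot push the score past the top of the scale.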
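As a worked instance of the formula, here is a short sketch. The two weights are the ones quoted above; the third signal and its weight are a made-up placeholder added only to show the cap, and the function name is hypothetical.

```python
SIGNAL_WEIGHTS = {
    "AUTH_CODE_MODIFIED": 30,       # from the write-up
    "VALIDATION_REMOVED": 25,       # from the write-up
    "HYPOTHETICAL_LARGE_DIFF": 60,  # placeholder, not a real DiffGuard signal
}


def risk_score(detected, weights=SIGNAL_WEIGHTS, cap=100):
    """Sum the weights of detected signals and cap the total, as in the formula for S."""
    total = sum(weight for name, weight in weights.items() if name in detected)
    return min(total, cap)


print(risk_score({"AUTH_CODE_MODIFIED", "VALIDATION_REMOVED"}))  # 30 + 25 = 55
print(risk_score(set(SIGNAL_WEIGHTS)))                           # 115 capped to 100
```

Signals that are not detected contribute nothing, which is the role the indicator function plays in the formula.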
How I Built It
The system is a 3-stage contract-driven pipeline:
Analyze: A diff parser (semantic_from_diff.py) reads unified diffs and extracts structured semantics: file-level change types, risk areas, and behavioral signals (auth touched, validation removed, etc.).
Score: The risk engine (src/risk_engine/engine.py) aggregates signals into a weighted score, classifies the risk level (low/medium/high), and recommends actions.
Enforce: The policy enforcer (scripts/enforce_policy.py) converts recommendations into executable actions (block merge, request reviewers, enforce canary deployment).
Each boundary is validated against strict JSON schemas in contracts/, ensuring data integrity throughout. The entire pipeline runs in GitLab CI on every merge request event.
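To make the contract-at-every-boundary idea concrete, here is a toy sketch of the score and enforce stages. Everything in it is an assumption for illustration: the field names, the flat 30 points per signal, the 30/60 thresholds, the action names, and the hand-rolled check_contract helper, which stands in for real JSON Schema validation against contracts/.

```python
def check_contract(payload, required, boundary):
    """Stand-in for schema validation: every required field must exist with the right type."""
    if not isinstance(payload, dict):
        raise ValueError(f"{boundary}: payload is not an object")
    for key, expected_type in required.items():
        if not isinstance(payload.get(key), expected_type):
            raise ValueError(f"{boundary}: field '{key}' missing or not {expected_type.__name__}")
    return payload


SEMANTIC_CONTRACT = {"signals": list}
RISK_CONTRACT = {"score": int, "level": str}
ACTIONS_CONTRACT = {"actions": list}

# Assumed mapping from risk level to the actions listed in the write-up.
ACTIONS_BY_LEVEL = {
    "low": [],
    "medium": ["request_reviewers"],
    "high": ["block_merge", "request_reviewers", "enforce_canary"],
}


def score(semantic):
    check_contract(semantic, SEMANTIC_CONTRACT, "analyze->score")
    total = min(30 * len(semantic["signals"]), 100)  # placeholder weighting
    level = "high" if total >= 60 else "medium" if total >= 30 else "low"
    return check_contract({"score": total, "level": level}, RISK_CONTRACT, "score output")


def enforce(report):
    check_contract(report, RISK_CONTRACT, "score->enforce")
    actions = {"actions": ACTIONS_BY_LEVEL[report["level"]]}
    return check_contract(actions, ACTIONS_CONTRACT, "enforce output")


print(enforce(score({"signals": ["AUTH_CODE_MODIFIED", "VALIDATION_REMOVED"]})))
```

Each stage checks both what it receives and what it emits, so a malformed payload is rejected at the boundary where it first appears rather than several stages later.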
Challenges
Signal accuracy: The initial validation_removed detector checked all diff text, causing false positives when someone added validation code. Fixed by restricting detection to removed lines only.
CI self-tampering: An MR could modify gate scripts to approve itself. Solved with POLICY_REF: gate scripts are checked out from a protected branch.
Rename semantics: Git rename diffs carry special metadata (rename from/rename to) that the parser initially ignored, misclassifying renames as modifications.
Category inference: A crude heuristic (<20 lines = refactor) misclassified security hotfixes. Added context-aware detection that considers risk areas alongside delta size.