
Inspiration

Every engineering team hits the same wall: code reviews don't scale. Senior engineers spend 30%+ of their time reviewing merge requests, creating bottlenecks that delay deploys by hours or days. Worse, human reviewers are inconsistent—a tired reviewer on Friday afternoon might miss the exact auth bug they'd catch on Monday morning.

I wanted to build a system that treats code review as a signal processing problem: extract structured semantics from raw diffs, aggregate risk signals with weighted scoring, and enforce policy decisions—all autonomously, in under 2 seconds.

What I Learned

The hardest part wasn't building the pipeline—it was building a pipeline that can't be fooled. Early versions had a critical flaw: the CI ran gate scripts directly from the merge request branch, meaning a malicious MR could modify the very code that evaluates it. Solving this required a trust model where gate logic is sourced from a protected ref (POLICY_REF), creating a clear separation between "code being reviewed" and "code doing the reviewing."
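As a rough illustration of that separation, the sketch below reads a gate script from the protected ref instead of the working tree. It assumes POLICY_REF is exposed as an environment variable and that `git show <ref>:<path>` is an acceptable way to fetch the file; the function names are hypothetical and not taken from DiffGuard.

```python
import os
import subprocess


def policy_show_command(script_path: str, policy_ref: str) -> list:
    """Build the git command that reads a gate script from the protected ref."""
    if not policy_ref:
        # Fail closed: without a trusted ref there is nothing safe to run.
        raise RuntimeError("POLICY_REF is not set; refusing to run gates")
    return ["git", "show", f"{policy_ref}:{script_path}"]


def load_gate_script(script_path: str) -> str:
    """Return the gate script source as stored at POLICY_REF (assumed env var)."""
    policy_ref = os.environ.get("POLICY_REF", "")
    cmd = policy_show_command(script_path, policy_ref)
    # check=True makes a missing ref or path raise instead of returning "".
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Because the file content comes from the protected ref, edits to the same path inside the merge request branch never reach the evaluator.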

I also learned that fail-open validation is worse than no validation—it gives false confidence. Every boundary in DiffGuard is fail-closed: if the validator dependency is missing, the pipeline crashes rather than silently skipping checks.
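The difference between fail-open and fail-closed can be sketched in a few lines. This example assumes the validator dependency is the `jsonschema` package, which the write-up does not name; `require_validator` and `ValidatorUnavailable` are hypothetical names.

```python
import importlib


class ValidatorUnavailable(RuntimeError):
    """Raised when the validation dependency cannot be imported."""


def require_validator(module_name: str = "jsonschema"):
    """Import the validator or stop the pipeline.

    A fail-open version would catch ImportError and return None, letting
    every later check be skipped silently. This version raises instead.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ValidatorUnavailable(
            f"validator dependency '{module_name}' is missing; aborting"
        ) from exc
```

Calling `require_validator()` once at startup means a missing dependency surfaces as a failed job rather than as a green pipeline that validated nothing.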

The risk scoring model uses a weighted signal aggregation approach:

$$S = \min\left(\sum_{i=1}^{n} w_i \cdot \mathbb{1}[s_i],\ 100\right)$$

where $w_i$ is the predefined weight for signal $s_i$ (e.g., AUTH_CODE_MODIFIED = 30, VALIDATION_REMOVED = 25), and the indicator function $\mathbb{1}[s_i]$ equals 1 when the signal is detected in the diff and 0 otherwise. The $\min$ caps the total at 100.
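A small worked example of the formula follows. The two weights named above are taken from the text; the other two entries are placeholders added only to show the cap at 100, and the function name is hypothetical.

```python
WEIGHTS = {
    "AUTH_CODE_MODIFIED": 30,     # from the write-up
    "VALIDATION_REMOVED": 25,     # from the write-up
    "HYPOTHETICAL_SIGNAL_A": 40,  # illustrative placeholder
    "HYPOTHETICAL_SIGNAL_B": 35,  # illustrative placeholder
}


def risk_score(detected: set) -> int:
    """Sum the weights of detected signals and cap the result at 100."""
    unknown = detected - set(WEIGHTS)
    if unknown:
        # Fail closed on signals that have no defined weight.
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    total = sum(weight for name, weight in WEIGHTS.items() if name in detected)
    return min(total, 100)


print(risk_score({"AUTH_CODE_MODIFIED", "VALIDATION_REMOVED"}))  # 30 + 25 = 55
print(risk_score(set(WEIGHTS)))  # 130 before the cap, so 100
```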

How I Built It

The system is a 3-stage contract-driven pipeline:

Analyze: A diff parser (semantic_from_diff.py) reads unified diffs and extracts structured semantics: file-level change types, risk areas, and behavioral signals (auth touched, validation removed, etc.).
Score: The risk engine (src/risk_engine/engine.py) aggregates signals into a weighted score, classifies the risk level (low / medium / high), and recommends actions.
Enforce: The policy enforcer (scripts/enforce_policy.py) converts recommendations into executable actions (block merge, request reviewers, enforce canary deployment).

Each boundary is validated against strict JSON schemas in contracts/, ensuring data integrity throughout. The entire pipeline runs in GitLab CI on every merge request event.
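The shape of a contract-checked pipeline can be sketched as below. This is not the DiffGuard code: the required-keys check is a standard-library stand-in for JSON Schema validation, and the stage bodies, field names, threshold, and action name are all placeholder assumptions.

```python
def check_contract(payload: dict, required: dict) -> dict:
    """Stand-in for schema validation: required keys with expected types."""
    for key, expected_type in required.items():
        if key not in payload:
            raise ValueError(f"missing field: {key}")
        if not isinstance(payload[key], expected_type):
            raise TypeError(f"field {key!r} must be {expected_type.__name__}")
    return payload


# Hypothetical contracts, one per stage boundary.
SEMANTICS_CONTRACT = {"signals": list}
RISK_CONTRACT = {"score": int, "level": str}
ACTIONS_CONTRACT = {"actions": list}


def analyze(diff_text: str) -> dict:
    signals = ["AUTH_CODE_MODIFIED"] if "auth" in diff_text.lower() else []
    return {"signals": signals}


def score(semantics: dict) -> dict:
    value = 30 if "AUTH_CODE_MODIFIED" in semantics["signals"] else 0
    # Placeholder threshold; the real level boundaries are not given.
    return {"score": value, "level": "medium" if value >= 30 else "low"}


def enforce(risk: dict) -> dict:
    actions = ["request_reviewers"] if risk["level"] != "low" else []
    return {"actions": actions}


def run_pipeline(diff_text: str) -> dict:
    """Run Analyze, Score, Enforce, validating the output of every stage."""
    semantics = check_contract(analyze(diff_text), SEMANTICS_CONTRACT)
    risk = check_contract(score(semantics), RISK_CONTRACT)
    return check_contract(enforce(risk), ACTIONS_CONTRACT)
```

Validating each stage's output before the next stage consumes it means a malformed payload stops the run at the boundary where it was produced.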

Challenges

Signal accuracy: The initial validation_removed detector scanned every line of the diff, so adding validation code triggered false positives. Fixed by restricting detection to removed lines only.
CI self-tampering: An MR could modify gate scripts to approve itself. Solved with POLICY_REF: gate scripts are checked out from a protected branch.
Rename semantics: Git rename diffs carry special metadata (rename from / rename to) that the parser initially ignored, misclassifying renames as modifications.
Category inference: A crude heuristic (fewer than 20 changed lines means refactor) misclassified security hotfixes. Added context-aware detection that considers risk areas alongside delta size.
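The first and third fixes above can be sketched as follows. The keyword pattern and function names are illustrative assumptions, not the contents of semantic_from_diff.py, and the removed-line filter is a simplification (a removed line whose own text begins with "--" would be mistaken for a file header).

```python
import re

# Hypothetical keyword pattern for "validation-like" code.
VALIDATION_PATTERN = re.compile(r"\b(validate|sanitize)\w*", re.IGNORECASE)


def removed_lines(diff_text: str) -> list:
    """Return the content of '-' lines, skipping '--- a/file' headers."""
    return [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith("-") and not line.startswith("---")
    ]


def validation_removed(diff_text: str) -> bool:
    """Fire only when validation-like code appears on a removed line."""
    return any(VALIDATION_PATTERN.search(line) for line in removed_lines(diff_text))


def detect_rename(diff_text: str):
    """Return (old_path, new_path) from git rename metadata, else None."""
    old = new = None
    for line in diff_text.splitlines():
        if line.startswith("rename from "):
            old = line[len("rename from "):]
        elif line.startswith("rename to "):
            new = line[len("rename to "):]
    return (old, new) if old and new else None
```

With this filter, a diff that only adds a `validate_input(x)` call no longer raises the signal, while a diff that deletes the same call does.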

Built With

ci/cd
cli
edge-cases
git
gitlab
json
pytest
python

Updates

Mostapha EL ANSARI started this project — Mar 04, 2026 07:59 PM EST
