Downwind — Pre-Merge Impact Checks

Walks Orbit's graph forward from your MR to find what's downstream — then flags the stale, single-owner code on that path that nobody's watching. It reports reachability honestly, including where it can't see. Scout, not a gate.


Inspiration

Here's a story every developer on a monorepo knows.

You open a merge request. One file, one fix, every check is green. You merge. Four hours later a pipeline you've never heard of goes red — in a package three layers away from the file you touched. You didn't break your code. You broke code that depends on your code, through a chain of imports you couldn't see and never opened.

The worst part isn't the break. It's the discovery: the file that broke was last touched fourteen months ago by a single engineer who hasn't committed since. Nobody on your team knew it existed. You merged into a blind spot — and you had no way to know.

Every monorepo has these invisible downstream chains. Every team finds them at 2 AM. We built Downwind to surface them at merge-request time, before they become incidents — and to be honest about the chains it can't see, because a tool that hides its blind spots is just a different blind spot.


What It Does

The instant you open a merge request, Downwind walks GitLab Orbit's code knowledge graph forward from your changed files and computes the reachable downstream set: every file that imports yours, transitively, across package boundaries, up to three hops out. Then it joins git history to ask the question most impact tools skip: is the critical code on that path still owned by anyone?

It posts a Mermaid dependency graph as an MR comment — your change at the root, edges fanning to the reachable set, stale single-owner nodes highlighted. You see what's downstream, who last touched it, and where the blind spots are, before you click merge.

Two things we are deliberate about, because they're where tools like this usually fail:

  • Reachable is not the same as affected, and we say so. Downwind reports what your change can reach through the import graph — not a proof that every reachable file's behavior changes. Type-only imports and re-exports inflate reachability; we surface the set and let you read it, rather than dressing a graph traversal up as a guarantee. The honest signal is more useful than the confident one.
  • When there's nothing to flag, it says nothing. It's a scout, not a gate. It never blocks your merge and never comments for the sake of commenting. Silence is the default.

How It Works

The engine does three things in sequence.

1. The forward walk. From the files changed in the MR, the engine queries Orbit via /api/v4/orbit/query, walking IMPORTS edges forward — not "what does this file import" but "what imports this, and what imports that." That produces the transitive downstream closure across package boundaries, up to three hops. The insight isn't the edge; it's the closure — "A imports B, B imports C, and C is a year-old single-owner file" is something no single-hop tool tells you.

2. The ownership join. For each downstream node, the engine runs git log to find who last touched the file, when, and how many distinct authors have contributed. Orbit finds the at-risk nodes; git history tells you who's behind on them. Neither leg works without the other — without the graph, you don't know which files to check ownership on; without the history, the graph is a structure diagram with no human signal. Ownership-by-last-touch is a proxy, not ground truth — a reformatter commit or a license-header bump reads as a "touch" — so we treat the output as a prompt to look, not a verdict.

3. The decision. Each node is classified: stale + single-owner is a blind spot; fresh + multi-owner is healthy. If nothing in the reachable set is concerning, Downwind stays quiet. If it finds a blind spot, it posts the full graph with the flagged node highlighted and the ownership data visible. GitLab renders the Mermaid natively — no external tools.
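The three steps can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the engine's code. The reverse-import adjacency is an in-memory dict here, whereas the real engine queries Orbit via /api/v4/orbit/query. The 365-day staleness threshold is a hypothetical value, since the text doesn't give one. The mixed cases (stale but multi-owner, fresh but single-owner) are treated as not flagged, because the text only names the two extremes.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=365)  # assumption: threshold not stated in the source


@dataclass(frozen=True)
class Ownership:
    last_author: str
    last_touched: datetime
    distinct_authors: int


def forward_walk(imported_by: dict[str, list[str]], changed: set[str],
                 max_hops: int = 3) -> dict[str, int]:
    """Breadth-first walk over 'what imports this' edges, bounded by max_hops.

    Returns {downstream_file: hop_distance}; changed files are excluded.
    """
    dist: dict[str, int] = {f: 0 for f in changed}
    queue = deque(sorted(changed))
    while queue:
        current = queue.popleft()
        if dist[current] == max_hops:
            continue  # reached the hop limit; don't expand further
        # Sorted neighbours keep the visit order (and thus output) deterministic.
        for importer in sorted(imported_by.get(current, [])):
            if importer not in dist:  # first visit in BFS = shortest distance
                dist[importer] = dist[current] + 1
                queue.append(importer)
    return {f: d for f, d in dist.items() if f not in changed}


def is_blind_spot(own: Ownership, now: datetime) -> bool:
    """Stale AND single-owner counts as a blind spot; everything else does not."""
    return own.distinct_authors == 1 and now - own.last_touched > STALE_AFTER


def decide(reachable: dict[str, int], ownership: dict[str, Ownership],
           now: datetime) -> tuple[str, list[str]]:
    """Return ("FLAG", blind_spots) or ("QUIET", []); empty input stays QUIET."""
    flagged = sorted(f for f in reachable
                     if f in ownership and is_blind_spot(ownership[f], now))
    return ("FLAG", flagged) if flagged else ("QUIET", [])


if __name__ == "__main__":
    graph = {"a.py": ["b.py"], "b.py": ["c.py"], "c.py": ["d.py"], "d.py": ["e.py"]}
    print(forward_walk(graph, {"a.py"}))  # → {'b.py': 1, 'c.py': 2, 'd.py': 3}
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(decide({}, {}, now))  # → ('QUIET', [])
```

BFS is a natural fit because the first time a file is discovered is also its shortest hop distance. The `dist` check makes import cycles harmless.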


How Downwind Uses Orbit

Orbit provides the resolved, indexed, cross-package import graph that makes the forward walk a query rather than a repo crawl. When the engine asks "what imports werkzeug.routing.rules?", Orbit returns resolved symbol-level edges across package boundaries — the kind of cross-package chain a regex import-scanner can approximate on a small repo but cannot follow reliably through re-exports, relative imports, and package indirection at scale.

The ownership join uses git log, not Orbit — Orbit nodes carry a commit_sha but no author or timestamp. The graph finds the nodes worth asking about; git answers the human question. That separation is deliberate: it keeps attribution clean (we never imply Orbit knows authorship) and it means the ownership signal works on any repo with git history, whether or not Orbit has indexed it.
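A minimal sketch of the git side of the join, assuming one `git log` call per downstream file. The format string uses `%aN` (author name, honouring `.mailmap`) and `%aI` (strict ISO 8601 author date). The function names `parse_log` and `ownership` are hypothetical, not Downwind's adapter. Parsing is kept separate from the subprocess call so it can be tested without a repository.

```python
from __future__ import annotations

import subprocess
from datetime import datetime

LOG_FORMAT = "%aN%x09%aI"  # author name (mailmap-aware), TAB, ISO 8601 author date


def parse_log(output: str) -> tuple[str, datetime, int] | None:
    """Parse `git log --format=%aN%x09%aI` output, newest commit first.

    Returns (last_author, last_touched, distinct_author_count), or None when
    the file has no history.
    """
    rows = [line.split("\t", 1) for line in output.splitlines() if line.strip()]
    if not rows:
        return None
    last_author, last_date = rows[0]  # git log lists the most recent commit first
    authors = {name for name, _ in rows}
    return last_author, datetime.fromisoformat(last_date), len(authors)


def ownership(repo: str, path: str) -> tuple[str, datetime, int] | None:
    """Run git log for a single file and summarise who owns it (sketch only)."""
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--format={LOG_FORMAT}", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_log(out)
```

As the text notes, every commit counts as a "touch" here, including reformatter or license-header commits, so the result is a prompt to look rather than a verdict.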

The engine is hexagonal — graph sources are swappable. Orbit Remote is the primary source and the one that scales (the graph is already indexed, the walk is a query). A local DuckDB snapshot serves as a fallback and powers both the eval harness and the cold-run test below. This swappability is a design choice, not an Orbit weakness — it's what lets us test the engine against repos Orbit hasn't indexed, and it's what makes a future Orbit Local integration a one-adapter change.
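The hexagonal split can be sketched as a port (a `typing.Protocol`) plus adapters. The names `GraphSource`, `importers_of` and `InMemoryGraph` are hypothetical. The in-memory adapter stands in for the DuckDB snapshot to keep the example dependency-free. An Orbit Remote adapter would implement the same method against /api/v4/orbit/query, but its request shape is not shown here because the source doesn't document it.

```python
from __future__ import annotations

from typing import Protocol


class GraphSource(Protocol):
    """Port: anything that can answer 'which files import this file?'"""

    def importers_of(self, path: str) -> list[str]: ...


class InMemoryGraph:
    """Fallback-style adapter built from a static edge list (stand-in for DuckDB)."""

    def __init__(self, edges: list[tuple[str, str]]) -> None:
        # Each edge is (importer, imported); index it by the imported file.
        self._importers: dict[str, list[str]] = {}
        for importer, imported in edges:
            self._importers.setdefault(imported, []).append(importer)

    def importers_of(self, path: str) -> list[str]:
        return sorted(self._importers.get(path, []))


def one_hop(source: GraphSource, changed: list[str]) -> set[str]:
    """Engine code depends only on the port, so any adapter can be plugged in."""
    return {imp for path in changed for imp in source.importers_of(path)}
```

Because the walk only ever calls `importers_of`, testing against an unindexed repo or adding an Orbit Local source means writing one new class, not touching the engine.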


What It Sees, and What It Doesn't

Most impact tools quietly inherit their graph engine's blind spots and never tell you. We'd rather list ours, because for a tool that sells confidence, silent under-reporting is the dangerous failure mode — the missed edge is the one that pages you.

Orbit's static import resolution does not currently follow dynamic imports, reflection, __init__.py re-exports, or namespace-package indirection. Downwind inherits those gaps. That means the reachable set is a lower bound on true reachability, not an upper one: when Downwind is wrong, it under-reports rather than over-warns. We'd rather be quiet-and-occasionally-incomplete than loud-and-wrong — but a reviewer deserves to know which way the error points. It points toward silence, and we say so on the comment itself.
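A toy static scanner (Python's `ast` module, not Orbit's resolver) makes the direction of the error concrete. Import statements are visible to static analysis, but a module named in a runtime string passed to `importlib.import_module` is not. The module name `myapp.plugins.billing` is invented for illustration.

```python
from __future__ import annotations

import ast


def static_imports(source: str) -> set[str]:
    """Collect module names from import statements, as a static scanner would."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


SAMPLE = '''
import os
from json import loads
import importlib
plugin = importlib.import_module("myapp.plugins.billing")  # name known only at runtime
'''

print(sorted(static_imports(SAMPLE)))  # → ['importlib', 'json', 'os']
```

The runtime dependency on `myapp.plugins.billing` never becomes an edge, so anything downstream of it is missing from the reachable set. The error can only shrink the set, which is why it points toward silence.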


Architecture

Downwind runs as two components, both triggered by the same MR-opened event.

The flow — a GitLab Duo custom flow published to the AI Catalog. It fires on MR-opened, reads the diff, checks Orbit health, and logs its reasoning in AI → Sessions. This is the platform-native trigger and the catalog artifact.

The CI job — installs the engine (pip install -e .), runs downwind check --mr-iid <iid>, queries Orbit live, joins git history, and posts the Mermaid comment via the GitLab API.

The split is not a design preference — it's a documented platform constraint. Custom flows can't post MR comments natively today. We could have faked a single-component story; instead we documented the seam. The flow triggers and reasons; the CI job analyzes and posts. The developer sees one seamless result: open MR → graph appears. When the platform exposes comment-posting to flows, this collapses to one component with no engine changes — the hexagonal design already isolates the reporter.

Implementation details:

  • Orbit queries are live — hitting the hosted graph on every run, not cached.
  • The walk is deterministic — same file set at same distances every run. Row ordering varies cosmetically; the closure is stable.
  • 207 tests, ruff-clean, pytest + lint on every push.

The Eval Harness

The part we're most proud of — and the part we're most careful not to oversell.

We ship a published harness with 70 scenarios across four legs:

| Leg | What it tests | Result |
|---|---|---|
| Walk (24) | Does the BFS produce the correct downstream set? | 24/24 ✓ |
| Ownership (16) | Does the join return the right author, date, staleness? | 16/16 ✓ |
| Decision (13) | Does FLAG/QUIET come out right given walk + ownership? | 13/13 ✓ |
| Ranking (17) | Does the right suspect rank #1? | P@1: 88%, P@3: 100% |

One command reproduces every number: downwind eval.
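The ranking row reports precision at k. This is the usual definition, assumed here rather than taken from the harness: the fraction of scenarios whose true suspect appears in the top k. With 17 ranking scenarios, 88% corresponds to 15 of 17 ranked first, since 15/17 ≈ 0.882. The function name is hypothetical.

```python
from __future__ import annotations


def precision_at_k(rankings: list[list[str]], truths: list[str], k: int) -> float:
    """Fraction of scenarios whose true suspect appears in the top-k ranking."""
    if not truths or len(rankings) != len(truths):
        raise ValueError("need exactly one non-empty ranking per scenario")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(rankings, truths))
    return hits / len(truths)
```

P@3 of 100% with P@1 below 100% means every miss at rank 1 was still caught within the top three.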

The harness caught a real bug during development — false flags on empty input, the exact failure mode that gets these tools muted in production. We killed it before it ever reached the demo. It's a regression test now.

Here is where this would be theatre if we stopped: those 70 scenarios run against committed fixtures on the repo the engine was built on. That's a correctness check, not a generalization check. It proves the engine does what we think on inputs we authored.

So we ran it cold. We vendored 19 source files from psf/requests, one of the most-downloaded Python libraries, into our monorepo, pushed to main, and let Orbit re-index (about a minute). Then we pointed the engine at requests/sessions.py, a file the engine had never seen.

Cold-run result on psf/requests: 34 reachable nodes from sessions.py, depth ≤ 4.

  • The graph walk generalized. A cross-package edge emerged at depth 2 (sessions.py → requests/adapters.py → werkzeug/test.py): the engine followed the chain from Requests into Werkzeug internals without configuration, and by depth 4 it had reached 7 werkzeug sansio and security modules.
  • Ownership join, an honest limitation. Because Requests was vendored in a single commit, all 10 requests/*.py files flag as single-owner. That is a known artifact of the test method, not an engine bug. The werkzeug files the walk reached carry our demo's seeded history, not Requests' real contributor data. The ownership join is tested on fixtures; the cold run tests the walk, not the join.
  • 3 missed edge classes: dynamic imports (importlib.import_module), TYPE_CHECKING guards, and optional external deps. These are consistent with the static-resolution scope disclosed above.

(GitLab grants hackathon developers a single provisioned repo, so we vendored the target into our project and let Orbit index it on main. The walk generalization is real — 34 nodes, cross-package, against the live graph. The ownership generalization requires a repo with real multi-author history, which a single provisioned project can't provide.)

We publish where we're imperfect: ranking always puts the right suspect in the top three (P@3 100%), but not always at #1 (P@1 88%). Measurable, not magical.


Challenges We Ran Into

  • Flows can't post MR comments. The flow agent can read the API and create issues but has no native MR-note tool. We split the architecture and documented it rather than faking a single component.
  • path_finding requires undocumented rel_types. Orbit's path_finding rejects calls without a rel_types parameter to bound fan-out — undocumented in the v0.73.0 recipe. We found it empirically and worked around it with BFS traversal. (Filed as a Contribute-track fix.)
  • Orbit nodes don't carry authorship. Files and Definitions have a commit_sha but no author or timestamp; MergeRequest nodes carry merged_at/author_id but no username. The ownership join required git log — honest and functional.
  • project_id filtering is inconsistent across entity types. Works for MergeRequest, silently returns 0 rows for File/Definition. Discovered by trial and error.
  • Orbit indexes only the default branch. We needed multi-author history for the demo on a single-author repo, so we seeded backdated commits on a feature branch. Orbit ignores non-default branches entirely — the demo's ownership story is a constructed fixture, and we flag it as such. The cold run above is the un-staged counterpart.

What We Learned

  • The power is in the closure, not the edge. Any tool says "A imports B." The insight that matters is "A imports B, B imports C, and C was last touched by one person a year ago." Transitive, cross-package, temporally aware.
  • Silence is a feature, and it's the hard part. A tool that comments on every MR gets muted in a week. Restraint is what earns a permanent place in the workflow.
  • Honesty about blind spots is a feature too. Listing what the graph can't see — and naming which direction the error points — separates a tool you trust from a tool you mute the first time it's confidently wrong.
  • Eval-first pays for itself, but only if you name its limits. The harness caught the empty-input bug off-camera. Running the walk cold on psf/requests — 34 nodes, cross-package, against the live graph — is what turns "we trust our engine" into "here's what it did on code it had never seen." Reporting that the ownership join couldn't be tested cold (vendored history) is what turns "we're honest" from a claim into a practice.

What's Next

  • CVE reachability (the real V2). Point the same walk backward — from a vulnerable symbol to your entrypoints — and answer the question every security team actually asks: is this CVE reachable in our code, or just sitting in our lockfile? Reachability tooling that distinguishes "present" from "reachable" is genuinely scarce, and the reverse-walk is the only new piece — the engine, eval harness, and flow architecture carry over unchanged.
  • Optional merge-check for teams that want a gate. Today Downwind is non-blocking by design, and a comment can't block a merge anyway; we won't pretend a comment is a gate. For teams that want to gate on a confirmed blind spot, a merge-check wrapper over the same engine output is the next build. Default stays scout; the gate is opt-in.
  • Multi-MR collision detection. Flag when another open MR touches a node in your reachable set — "you're not the only one editing into this blind spot."
  • Contribute-track fix. Document and fix the path_finding rel_types requirement upstream.
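Multi-MR collision detection reduces to a set intersection once the reachable set exists. A minimal sketch follows, assuming the other open MRs' changed files are already fetched and keyed by MR iid; the fetching step and the `collisions` name are hypothetical.

```python
from __future__ import annotations


def collisions(reachable: set[str],
               other_mrs: dict[int, set[str]]) -> dict[int, list[str]]:
    """For each other open MR (by iid), list changed files inside our reachable set."""
    overlaps = {iid: sorted(changed & reachable) for iid, changed in other_mrs.items()}
    # Keep only MRs that actually overlap, ordered by iid for stable output.
    return {iid: files for iid, files in sorted(overlaps.items()) if files}
```

Intersecting with the flagged blind spots instead of the full reachable set would narrow this to the "editing into this blind spot" case.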

Built With

  • gitlab-ci-cd
  • gitlab-duo-flow
  • gitlab-orbit-knowledge-graph
  • mermaid-js
  • mit-license
  • pytest
  • python
  • ruff