Inspiration

Look at what people point a code graph at and it is almost always the same thing: review the diff. Blast-radius, dead-code, lint-the-change. That is the obvious use. It also leaves the harder half of the job untouched: actually understanding the code. A new contributor clones a repo with hundreds of definitions and has no idea where to start. A reviewer opens an unfamiliar MR and has to reverse-engineer, from the diff alone, the existing code the change depends on. The README says what the project does, never what to read first or in what order. That information already lives in the call graph. It just never gets extracted and handed to the human who needs it.

What it does

Cartographer turns GitLab Orbit's code graph into a reading order. It runs in two modes.

MR-scoped onboarding (the new capability): on every merge request it maps the changed files to their definitions, walks upstream along real CALLS edges from those symbols to the existing definitions they depend on, and posts a comment titled "To review this change, read these in order." Each entry is a file:line plus a one-line reason, depth-ordered so the closest dependencies come first. Instead of guessing at the change from the diff alone, the reviewer reads the handful of existing definitions it actually rests on. This is comprehension, not review, and it is the thing no diff-reviewer produces.
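As a rough illustration of the depth-ordered walk described above, here is a minimal sketch in plain Python. It assumes the CALLS edges have already been loaded into an in-memory dict. The function name reading_order, the max_depth cutoff, the seed name charge, and the specific edges between the example definitions are all hypothetical and only serve to show the ordering. The real tool runs this as a query against the Orbit graph.

```python
from collections import deque


def reading_order(calls, seeds, max_depth=3):
    """Breadth-first walk from the changed symbols along CALLS edges.

    calls: dict mapping a definition name to the names it calls
           (a hypothetical in-memory stand-in for the graph's CALLS edges).
    seeds: definitions touched by the diff.
    Returns (name, depth) pairs, closest dependencies first.
    """
    seen = set(seeds)                      # cycle guard: enqueue each node once
    queue = deque((s, 0) for s in seeds)
    order = []
    while queue:
        name, depth = queue.popleft()
        if depth > 0:                      # seeds are already visible in the diff
            order.append((name, depth))
        if depth == max_depth:
            continue
        for callee in calls.get(name, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return order


# Illustrative edges only; the real edges come from the indexed graph.
calls = {
    "charge": ["verify_user", "write_row"],
    "verify_user": ["hash_password"],
    "write_row": ["fetch_row"],
    "fetch_row": ["write_row"],            # a cycle, to exercise the guard
}
guide = reading_order(calls, ["charge"])
for name, depth in guide:
    print(f"depth {depth}: {name}")
```

Because the queue is first-in, first-out, every depth-1 dependency is emitted before any depth-2 one, which is what gives the comment its "closest first" order.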

Repo-wide onboarding: point it at any repo and it writes a polished ONBOARDING.md: the entry points, the five most-depended-on definitions (ranked by fan-in, with file:line, a reason, and a snippet), where the tests live and what they exercise, and a dependency-ordered reading path from the entry points outward. It is the orientation a good maintainer would write by hand for every newcomer if they had the time.
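The fan-in ranking can be sketched in a few lines of Python. This version assumes the CALLS edges are available as (caller, callee) pairs, counts distinct callers, and ignores self-recursion. Those counting rules, the function name top_anchors, and the sample edges are assumptions made for the example, not a description of the exact query Cartographer runs.

```python
from collections import Counter


def top_anchors(call_edges, n=5):
    """Rank definitions by fan-in: the number of distinct callers."""
    distinct = set(call_edges)             # collapse repeated caller->callee pairs
    fan_in = Counter(
        callee for caller, callee in distinct if caller != callee
    )
    # Highest fan-in first; break ties by name so the output is stable.
    return sorted(fan_in.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Illustrative edges only.
edges = [
    ("a", "log"), ("b", "log"), ("c", "log"),
    ("a", "parse"), ("b", "parse"),
    ("parse", "parse"),                    # self-recursion, not counted
    ("a", "log"),                          # duplicate edge, counted once
]
anchors = top_anchors(edges, n=2)
print(anchors)
```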

How we built it

Orbit is the engine. orbit index parses the repo into a local DuckDB code graph (definitions, files, typed edges: CALLS, DEFINES, IMPORTS, CONTAINS, EXTENDS). The interesting queries use relationships, not rows: anchors aggregate inbound CALLS to rank by fan-in; the reading path is a recursive BFS over CALLS with a cycle guard; the MR mode seeds that same BFS from the diff's symbols and walks upstream. Every gl_edge join scopes BOTH endpoints to one commit and branch, so multi-repo graphs never leak cross-repo edges. The MR integration is a GitLab CI job on merge_request_event pipelines that indexes the branch and posts the guide via the REST API, editing the same note in place on re-runs. It runs on Free-tier CI with one project token, no Duo seat. It also ships as an Orbit Skill, so any MCP agent (Claude Code, Cursor, Codex) can run the same queries with no local script. Pure Python standard library, plus orbit v0.78.
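The "edit the same note in place" behaviour might look like the sketch below. It only plans the request (method, URL, payload) and leaves the HTTP call out, so it runs without a network. The hidden MARKER comment used to recognise the tool's own note and the function name plan_note_request are assumptions for illustration. The two endpoints are GitLab's standard merge request notes routes: POST to create a note, PUT on the note ID to modify one.

```python
from urllib.parse import quote

# Hypothetical hidden marker used to recognise the guide's own comment.
MARKER = "<!-- cartographer-guide -->"


def plan_note_request(api_base, project, mr_iid, existing_notes, body):
    """Decide whether to create the guide note or update the existing one.

    existing_notes: list of dicts with "id" and "body", as returned by
                    listing the merge request's notes.
    Returns (http_method, url, payload).
    """
    project_ref = quote(str(project), safe="")   # "group/repo" -> "group%2Frepo"
    base = f"{api_base}/projects/{project_ref}/merge_requests/{mr_iid}/notes"
    payload = {"body": f"{MARKER}\n{body}"}
    for note in existing_notes:
        if MARKER in note.get("body", ""):
            return "PUT", f"{base}/{note['id']}", payload   # re-run: edit in place
    return "POST", base, payload                             # first run: create


api = "https://gitlab.example.com/api/v4"
first = plan_note_request(api, "group/repo", 12, [], "Read these in order")
rerun = plan_note_request(
    api, "group/repo", 12,
    [{"id": 7, "body": f"{MARKER}\nold guide"}],
    "Read these in order",
)
print(first[0], first[1])
print(rerun[0], rerun[1])
```

Keeping the decision separate from the HTTP call means the same function can be exercised in a unit test and then fed to urllib.request in the CI job.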

Challenges we ran into

The graph stores no source text in v0.78, so snippets are read from disk by file and line. Multi-repo graphs share one database, so every edge query has to scope both sides or cross-repo CALLS leak in. Library repos with no main need the reading path seeded from the top anchors. The reading-path query had to filter generic names (a definition literally called string) so the headline output stays useful. MR changes that are self-contained, in docs, or in an unsupported language are detected and answered with a clear "review the diff directly" rather than an empty guide.
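Two of these workarounds are small enough to sketch. The snippet reader below takes a file and a 1-based line number, as the graph would supply them, and pulls the text from disk. The generic-name filter drops definitions whose names carry no information. The function names, the max_lines default, and every entry in GENERIC_NAMES other than string are hypothetical and chosen only for the example.

```python
import tempfile
from pathlib import Path

# Only "string" comes from the write-up; the other entries are illustrative.
GENERIC_NAMES = {"string", "get", "set", "init"}


def read_snippet(path, start_line, max_lines=8):
    """Return up to max_lines of source starting at a 1-based line number."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    lines = text.splitlines()
    first = max(start_line - 1, 0)         # convert to a 0-based index
    return "\n".join(lines[first:first + max_lines])


def is_generic(name):
    """True for names too generic to be useful in a reading path."""
    return name.lower() in GENERIC_NAMES


# Demo against a throwaway file standing in for a checked-out repo.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "billing.py"
    src.write_text(
        "import os\n\ndef charge(user):\n    return user\n", encoding="utf-8"
    )
    snippet = read_snippet(src, 3, max_lines=2)

kept = [n for n in ["string", "verify_user", "write_row"] if not is_generic(n)]
print(snippet)
print(kept)
```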

Accomplishments that we're proud of

A live, working MR integration that is genuinely novel: it orients the reviewer to a change instead of reviewing it. It is proven end to end on a real merge request: a change to billing.py automatically produced the reading path verify_user and write_row, then hash_password and fetch_row, which is exactly the existing code needed to review it. The repo-wide guide is language-agnostic with zero per-language config, demonstrated on a Go CLI library (cobra, 343 definitions) and a TypeScript HTTP client (ky). The output is plain Markdown plus an MCP skill, so it composes into CI, editor agents, and onboarding docs.

What we learned

Ranking code by importance is common; ordering the actual reading sequence along the call graph is not, and it is exactly what a one-shot LLM cannot fake without the graph. Most of what a human needs to understand a codebase or a change is already latent in the edges. The value is in extraction and honest presentation, not new analysis.

What's next for Onboarding Cartographer

A CI gate that keeps ONBOARDING.md fresh on every release. "Onboard me to this feature," seeding the path from a symbol the contributor names. Per-language entry-point detection (HTTP routes, CLI command registration). Surfacing both guides on the GitLab project page and the MR for first-time contributors.
