Marshal

Declare the change. Every repo reaches done — even the ones nobody owns.

Marshal is a completion platform for org-wide code change on the GitLab Duo Agent Platform. You bring the change — a Java upgrade, a CVE patch, ripping out a deprecated internal SDK. Marshal discovers the affected fleet from GitLab Orbit, sequences it so nothing breaks downstream, lands a real merge request per repo, and drives every single one to a terminal state — tracked on a live dashboard that drains to zero, with a human approval gate at every step.

The change is pluggable. The finishing is the product.

3 agents. Wave-ordered rollout. A completion ledger that never loses a repo.


Inspiration

Here's a story every platform team knows.

Java 17 hits end-of-life. Leadership declares it: "We're moving the whole org to 21 this quarter." Sprint one, the energy is real — 80% of the repos move in two weeks. Everyone celebrates.

A year later, the org is still on 17.

Not because the work was hard. Because the last 20% had no forcing function. A shared auth library that three services depend on never got re-tested, so nobody dared touch it. A payments service whose only owner left in March. A reporting job with a build so flaky no one wanted to be the person who broke it. Each one waited on a human who was never assigned. And somewhere in month three, a consumer service got upgraded before the library it depends on — and broke production for an afternoon.

Now swap "Java 17 EOL" for any fleet-wide change — a CVE you must patch everywhere, a deprecated internal SDK you're retiring, a CI standard you're enforcing. The shape is identical, and it stalls in the same two places:

  1. Ordering. You change a consumer before its shared library, and the build breaks downstream.
  2. The long tail. Nobody chases the stragglers, so the effort stalls at "80% done" forever.

Plenty of tools generate the diff. None of them finish the job — across a whole fleet, in the right order, chasing every straggler to a human who can close it. So we built Marshal.


What It Does

Marshal takes an org-wide code change from a sentence to a drained roster. It runs one lifecycle, DISCOVER -> ANALYZE -> LAND -> FINISH, regardless of what the change actually is, and stops for a human decision at every approval gate.

| The moment | What Marshal does |
| --- | --- |
| You mention @marshal migrate ... on an issue, in plain English | Picks the engine for the change, checks Orbit health, discovers the exact affected fleet, and reads build files for context |
| Fleet discovered | Scores per-repo risk, computes dependency-aware rollout waves, and posts an analysis comment + live dashboard link |
| You reply /approve | Generates a work-items preview: per repo, the issue title, MR branch, risk score, and pre-flight signals |
| You reply "start phase 1" | Lands Wave 1: one Issue + one real MR per repo, each with an actual diff from the engine |
| The engine returns low confidence | AI gap-analysis reads the diff and posts an advisory review comment on what still needs a human |
| An MR merges | The ledger row flips to merged automatically via Closes #N, with no manual bookkeeping |
| A repo gets stuck | Auto-escalates: resolves the right human, then assigns + @mentions them with the precise failure reason |
| You reply /status | A live MR status table across all waves, with pipeline signals and blockers |
| The campaign ends | Every repo sits at merged or waived. The roster is at zero. |

The lifecycle never changes. Only the engine that produces the diff does.
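
A minimal sketch of that fixed lifecycle as a gated state progression, assuming one human reply unlocks each gated transition. The `Phase` enum, the `GATES` table, and `next_phase` are hypothetical names for illustration, not Marshal's code, and the per-wave "start phase N" gate is folded into the single /approve gate for brevity.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """Hypothetical model of the four lifecycle phases, in order."""
    DISCOVER = 1
    ANALYZE = 2
    LAND = 3
    FINISH = 4


# Hypothetical gate table: the human reply required to leave each phase.
# None means the next phase starts as soon as the current one completes.
GATES = {
    Phase.DISCOVER: None,       # discovery flows straight into analysis
    Phase.ANALYZE: "/approve",  # analysis waits for an explicit approval
    Phase.LAND: None,           # simplified: landing flows into finishing
}


def next_phase(current: Phase, reply: Optional[str] = None) -> Phase:
    """Return the phase after `current`, or `current` itself if its gate is closed."""
    if current is Phase.FINISH:
        return current  # FINISH is terminal; nothing follows it
    required = GATES[current]
    if required is not None and (reply or "").strip() != required:
        return current  # gate still closed: wait for the human
    return Phase(current.value + 1)
```

Keeping the gates in data rather than inside any engine is one way to make the lifecycle identical for every kind of change.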


Some Changes Marshal Can Drive

| Change | Engine | Status |
| --- | --- | --- |
| Java 8/17 -> 21 + Spring Boot 3 | OpenRewrite (UpgradeToJava21) | Built: the reference engine, demoed end-to-end |
| Patch a CVE across the fleet (e.g. Log4Shell) | cve_bump | Reference stub that shows the contract |
| Enforce or standardize a CI template | ci_template | Reference stub |
| Bump a shared dependency org-wide | dep_bump | Reference stub |
| Retire a deprecated internal SDK | your custom engine | Bring your own (interface below) |
| Anything with no deterministic tool | ai_authored fallback | Reference stub; routes to review |

A Worked Example: Java 17 -> 21

A six-repo Java fleet: platform-commons, auth-core, api-gateway, user-service, payments-api, reporting-service. One sentence in.

Marshal discovers all six from Orbit, reads their build files, and orders them so the shared libraries change first:

Wave 1: platform-commons, auth-core        <- most depended-on; change first
Wave 2: api-gateway, user-service          <- depend on Wave 1
Wave 3: payments-api, reporting-service    <- depend on Wave 2
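
One deterministic way to compute waves like these is a layered topological sort (Kahn's algorithm, emitting each layer of repos with no unlanded fleet dependencies as one wave). This is a sketch, not Marshal's method: the text describes ordering as agent inference over declared build-manifest dependencies, and the dependency edges below are hypothetical, chosen only to reproduce the three waves shown above.

```python
from typing import Dict, List, Set


def rollout_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Layered topological sort: each wave holds repos whose fleet dependencies
    all landed in earlier waves. Dependencies outside the fleet are ignored."""
    members = set(deps)
    # Copy each dependency set, keeping only edges to repos inside the fleet.
    remaining = {repo: set(d) & members for repo, d in deps.items()}
    waves: List[List[str]] = []
    while remaining:
        ready = sorted(repo for repo, d in remaining.items() if not d)
        if not ready:
            # Every remaining repo waits on another one: no safe order exists.
            raise ValueError(f"dependency cycle among {sorted(remaining)}")
        waves.append(ready)
        for repo in ready:
            del remaining[repo]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


# Hypothetical edges, chosen only to reproduce the waves in the example.
fleet = {
    "platform-commons": set(),
    "auth-core": set(),
    "api-gateway": {"auth-core"},
    "user-service": {"platform-commons", "auth-core"},
    "payments-api": {"user-service"},
    "reporting-service": {"api-gateway"},
}

for number, wave in enumerate(rollout_waves(fleet), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Sorting inside each wave keeps the output stable between runs, and a cycle fails loudly instead of producing an order that would break a build downstream.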

Then it lands one real MR per repo. On screen, you watch two outcomes that, together, are the entire thesis:

A repo finishes clean. OpenRewrite's UpgradeToJava21 produces a deterministic, compiling diff. The MR merges. The ledger row flips to merged. The dashboard bar advances.

A repo hits the long tail — and doesn't vanish. The change leaves residue the engine can't fully resolve. Marshal catches it and routes it to a person:

Marshal on payments-api/!6: The engine migrated 41 files to Java 21 — build is green on the JDK bump. But the Spring Boot 3 step is incomplete: SecurityConfig.java:23 still imports javax.servlet, which Spring Boot 3 replaced with jakarta.servlet. 2 tests fail as a result. This repo has no CODEOWNERS. From Orbit's authorship graph, the last three committers to pom.xml are @priya, @marco, @devansh. Assigning @priya — you committed the Maven config 6 weeks ago. Marking blocked, not silent.

The clean ones finish themselves. The hard ones land in a named person's queue today, with a reason attached — instead of lingering for a quarter. Either way, the roster drains to zero.
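
The escalation in the payments-api example follows a fallback chain: CODEOWNERS first, then the most recent committers from Orbit's authorship data. A minimal sketch, assuming the caller has already fetched both lists (committers ordered newest first); `resolve_owner` and `escalation_note` are hypothetical helpers, not Marshal's API.

```python
from typing import Optional, Sequence, Tuple


# Hypothetical helpers; inputs are assumed to come from CODEOWNERS and Orbit AUTHORED data.
def resolve_owner(codeowners: Sequence[str],
                  recent_committers: Sequence[str]) -> Tuple[Optional[str], str]:
    """Pick one person to own a stuck repo, plus the reason for picking them."""
    if codeowners:
        return codeowners[0], "listed in CODEOWNERS"
    if recent_committers:
        # Assumes the list is ordered newest commit first.
        return recent_committers[0], "most recent committer to the affected build file"
    return None, "no CODEOWNERS entry and no recent committers"


def escalation_note(repo: str, mr_iid: int, failure: str,
                    codeowners: Sequence[str],
                    recent_committers: Sequence[str]) -> str:
    """Compose a comment that names a person and the precise failure reason."""
    owner, why = resolve_owner(codeowners, recent_committers)
    who = f"Assigning {owner} ({why})." if owner else f"No owner found ({why})."
    # The repo is marked blocked either way, so it never drops out silently.
    return f"Marshal on {repo}/!{mr_iid}: {failure} {who} Marking blocked, not silent."


print(escalation_note(
    "payments-api", 6,
    "SecurityConfig.java:23 still imports javax.servlet; 2 tests fail.",
    codeowners=[],
    recent_committers=["@priya", "@marco", "@devansh"],
))
```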

That whole flow — discovery, ordering, landing, the clean-vs-escalated split, the drain to zero — is engine-agnostic. Java is the example. It is not the limit.


Bring Your Own Engine

Marshal's transform engine is a pluggable interface. An engine answers one question — what's the diff for this repo? — and returns a branch, a diff, and a confidence level. Everything else is Marshal's job.
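
Seen from the engine's side, that contract is small. Below is a self-contained sketch of what the base classes might look like, using the field names from the engine example in this section (`branch`, `diff`, `confidence`) and the two confidence values the text mentions ("high" and "review"). The exact shape of Marshal's real base module is an assumption here, as is the toy `PinCiImage` engine.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any

# Hypothetical stand-in for Marshal's engine base classes.
CONFIDENCE_LEVELS = ("high", "review")


@dataclass(frozen=True)
class Transform:
    """What every engine returns: the branch, the change, and how sure it is."""
    branch: str
    diff: str
    confidence: str  # "high" or "review"

    def __post_init__(self) -> None:
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"confidence must be one of {CONFIDENCE_LEVELS}")


class TransformEngine(ABC):
    """An engine decides whether a repo needs the change and produces its diff."""
    name: str = "unnamed-engine"

    @abstractmethod
    def applies_to(self, repo: Any) -> bool:
        """Return True if the change touches this repo."""

    @abstractmethod
    def plan(self, repo: Any) -> Transform:
        """Produce the branch, diff, and confidence for this repo."""


class PinCiImage(TransformEngine):
    """Toy engine (hypothetical) that always proposes the same one-line diff."""
    name = "pin-ci-image"

    def applies_to(self, repo: Any) -> bool:
        return True

    def plan(self, repo: Any) -> Transform:
        return Transform(branch=f"marshal/{self.name}",
                         diff="-image: maven:3\n+image: maven:3.9\n",
                         confidence="high")
```

Validating `confidence` when the `Transform` is built means an engine cannot invent a third routing level that the ledger and escalation paths do not understand.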

Here's a custom engine for an org-specific change — retiring a deprecated internal SDK fleet-wide:

```python
from marshal.engines.base import TransformEngine, Transform

class RetireAcmeLoggingSDK(TransformEngine):
    """Org-specific: replace the deprecated acme-logging SDK across the fleet."""
    name = "retire-acme-logging"

    def applies_to(self, repo) -> bool:
        # Only repos that declare the deprecated SDK are part of this campaign.
        return repo.declares("com.acme:acme-logging")

    def plan(self, repo) -> Transform:
        # Rewrite references from the old package to its replacement.
        diff = repo.codemod(replace="com.acme.logging",
                            with_="com.acme.observability")
        return Transform(
            branch=f"marshal/{self.name}",
            diff=diff,
            # Non-compiling diffs are flagged for human review instead of being dropped.
            confidence="high" if diff.compiles() else "review",
        )
```

Register it, point Marshal at a one-line goal, and you get the entire completion machinery for free:

  • Orbit fleet discovery — which repos the change touches
  • Risk scoring + dependency-aware wave ordering
  • One Issue + one MR per repo, landed in waves
  • The completion ledger — every repo drives to merged or waived
  • The live dashboard and the auto-escalation for anything that stalls

You write the method that produces the diff, plus a one-line applies_to filter. Engines that can't guarantee a clean result return confidence="review", which routes them through the same human gate and escalation path as everything else, so even a best-effort or AI-authored change can run fleet-wide with a human looking at anything uncertain.
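
A sketch of how that routing could look, assuming both confidence levels land the same Issue + MR and "review" only adds the advisory gap-analysis comment described in the lifecycle table. `landing_steps` and the step strings are hypothetical, and `Transform` here is a local stand-in for the engine contract.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Transform:
    """Stand-in for the engine contract: a branch, a diff, and a confidence level."""
    branch: str
    diff: str
    confidence: str  # "high" or "review"


def landing_steps(repo: str, t: Transform) -> List[str]:
    """Hypothetical list of what happens to one repo's transform.

    Both confidence levels land the same way; "review" adds an advisory
    gap-analysis comment so a human sees what the engine could not resolve.
    """
    if t.confidence not in ("high", "review"):
        raise ValueError(f"unknown confidence {t.confidence!r}")
    steps = [
        f"open issue for {repo}",
        f"open MR from {t.branch}",
        "wait at human approval gate",
    ]
    if t.confidence == "review":
        steps.append("post gap-analysis review comment on the MR")
    steps.append("track in ledger until merged or waived; escalate if stalled")
    return steps
```

The design choice this illustrates: uncertain changes are not a separate pipeline, just one extra step on the shared one.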


How We Built It

Three agents over deterministic steps, on GitLab Duo. A coordinator drives the lifecycle; specialists reason about the fleet:

| Agent / step | Role |
| --- | --- |
| Coordinator | Parses the goal, selects the transform engine, runs the phase, and gates on human approval |
| Analyzer (reasoning) | Risk scoring, dependency-aware wave ordering, impact summaries |
| Migrator (per repo) | Lands the Issue + MR and runs gap-analysis on the resulting diff |
| Build-file reader (deterministic) | Versions + declared deps from pom.xml / build.gradle, read directly from the files rather than from Orbit |
| Ledger (deterministic) | Epic + one work item per repo; state machine; auto-close via Closes #N |
| Escalation (deterministic) | CODEOWNERS -> recent committers (Orbit AUTHORED) -> assign + @mention |
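
The build-file reader is the one step that needs no model at all. A minimal sketch for Maven only, using Python's standard `xml.etree.ElementTree`; `read_pom`, the returned keys, and the sample coordinates are hypothetical, and build.gradle would need its own parser. It checks `maven.compiler.release`, then `maven.compiler.source`, then Spring Boot's `java.version` property.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


# Hypothetical build-file reader (Maven only).
def read_pom(xml_text: str) -> Dict[str, object]:
    """Extract coordinates, Java version, and declared dependencies from a pom.xml."""
    root = ET.fromstring(xml_text)
    # Drop the Maven namespace ({http://maven.apache.org/POM/4.0.0}tag -> tag)
    # so the same paths work whether or not the POM declares xmlns.
    for el in root.iter():
        if isinstance(el.tag, str) and "}" in el.tag:
            el.tag = el.tag.split("}", 1)[1]

    def text(path: str) -> Optional[str]:
        value = root.findtext(path)
        return value.strip() if value else None

    # groupId may be inherited from the <parent> block.
    group = text("groupId") or text("parent/groupId")
    java = (text("properties/maven.compiler.release")
            or text("properties/maven.compiler.source")
            or text("properties/java.version"))
    deps: List[str] = [
        f"{d.findtext('groupId', default='').strip()}:"
        f"{d.findtext('artifactId', default='').strip()}"
        for d in root.findall("dependencies/dependency")
    ]
    return {"artifact": f"{group}:{text('artifactId')}", "java": java, "deps": deps}


SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>com.acme</groupId>
    <artifactId>acme-parent</artifactId>
    <version>1.0</version>
  </parent>
  <artifactId>user-service</artifactId>
  <properties><java.version>17</java.version></properties>
  <dependencies>
    <dependency><groupId>com.acme</groupId><artifactId>auth-core</artifactId></dependency>
    <dependency><groupId>com.acme</groupId><artifactId>platform-commons</artifactId></dependency>
  </dependencies>
</project>"""

print(read_pom(SAMPLE_POM))
```

Matching each repo's `deps` against the fleet's own `artifact` coordinates yields the repo-to-repo dependency map that wave ordering needs.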

Orbit is the context layer; the agent reasons. Marshal uses Orbit live for fleet discovery (File.language -> Project), ownership and authorship (MEMBER_OF, AUTHORED), in-repo blast radius (ImportedSymbol), and MR/pipeline state.

The transform engine is the platform's seam. Every engine returns the same contract — a branch with a real diff and a confidence level — so the ledger, dashboard, and escalation never change no matter what the change is. OpenRewrite for Java/JVM is the fully-built reference engine; the CVE-bump, CI-template, dep-bump, and AI-authored-fallback engines are reference implementations of the same contract; and anyone can drop in a custom engine for an org-specific change.
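
Because the ledger only sees repos and MR events, never the diff, it can be sketched without any reference to an engine. A minimal version, assuming hypothetical state names (only merged, waived, and blocked come from the text) and a simplified "Closes #N" matcher; GitLab's real default closing pattern also accepts other verbs such as Fixes and Resolves.

```python
import re
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical ledger states; merged and waived are the only terminal ones.
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "pending": frozenset({"mr_open", "waived"}),
    "mr_open": frozenset({"merged", "blocked", "waived"}),
    "blocked": frozenset({"mr_open", "merged", "waived"}),
    "merged": frozenset(),
    "waived": frozenset(),
}
TERMINAL = frozenset(s for s, nxt in TRANSITIONS.items() if not nxt)

# Simplified subset of GitLab's default issue-closing pattern.
CLOSES = re.compile(r"\bcloses\s+#(\d+)", re.IGNORECASE)


class Ledger:
    """One row per repo; rows only move along allowed transitions."""

    def __init__(self, repos: Iterable[str]) -> None:
        self.state: Dict[str, str] = {repo: "pending" for repo in repos}

    def move(self, repo: str, new_state: str) -> None:
        current = self.state[repo]
        if new_state not in TRANSITIONS[current]:
            raise ValueError(f"{repo}: {current} -> {new_state} is not allowed")
        self.state[repo] = new_state

    def on_mr_merged(self, description: str, issue_to_repo: Dict[int, str]) -> None:
        """Flip the row for every tracked issue the merged MR closes."""
        for match in CLOSES.finditer(description):
            repo = issue_to_repo.get(int(match.group(1)))
            if repo is not None:
                self.move(repo, "merged")

    def remaining(self) -> List[str]:
        """Repos not yet at a terminal state; the campaign ends when this is empty."""
        return sorted(r for r, s in self.state.items() if s not in TERMINAL)
```

Rejecting illegal transitions (for example, reopening a merged row) is what lets "remaining is empty" serve as a trustworthy definition of done.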

Productionized CI/CD. The pipeline runs validate (YAML + agent-config schema + LICENSE presence) -> test (pytest with coverage) -> catalog-size enforcement on the published agent definition -> AI Catalog publish on tag -> GitLab Pages deploy -> campaign-state sync. [N] passing tests cover the build-file parser, the wave-ordering sort, the ledger state machine, the escalation owner-resolution, and the dashboard renderer.

Claude powers the reasoning steps — order inference, residue detection, fix-vs-escalate, and the human-readable rationale on every issue and MR.


Challenges We Ran Into

  • Orbit has no cross-repo code edges. Definition IDs are content-hashed and scoped per project + branch, so the same symbol in two repos has different IDs — Orbit genuinely can't topologically sort a fleet by code dependencies. We moved ordering into agent inference over declared build-manifest dependencies, which is the more authoritative signal anyway.
  • Keeping the platform engine-agnostic. The temptation was to special-case Java everywhere. We held the line: the ledger, dashboard, and escalation know nothing about OpenRewrite — they only see a branch, a diff, and a confidence level. That discipline is what makes a custom engine a one-file change.
  • Custom CI/CD variables aren't available inside Duo Workflow flows. Only runtime-injected variables are present, so all configuration derives from those — no PAT, no manual variable setup.
  • Non-deterministic changes. Some changes (a framework major-version jump, an AI-authored diff) can't promise a clean result. Rather than exclude them, we made confidence="review" a first-class outcome that routes through the same gate and escalation as everything else.
  • Maven isn't always in the runner. The OpenRewrite engine attempts a direct clone -> run -> push in the flow sandbox, with a fallback to committing the recipe as a CI job when Maven is unavailable — so the diff still lands.
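
A minimal sketch of that fallback decision, assuming the flow can shell out. The Maven invocation follows the OpenRewrite plugin's command-line form for running a recipe without editing the POM; the job name `openrewrite-upgrade`, the `maven:3-eclipse-temurin-21` image, and the helper names are assumptions, and a real CI job would also need steps to commit and push the resulting changes.

```python
import shlex
import shutil
import subprocess
from typing import Callable, List, Optional, Tuple

# Hypothetical fallback helper around the OpenRewrite Maven plugin.
RECIPE = "org.openrewrite.java.migrate.UpgradeToJava21"
RECIPE_ARTIFACT = "org.openrewrite.recipe:rewrite-migrate-java:RELEASE"


def rewrite_command(recipe: str = RECIPE, artifact: str = RECIPE_ARTIFACT) -> List[str]:
    """Maven command that runs one OpenRewrite recipe without editing the POM."""
    return [
        "mvn", "-U", "org.openrewrite.maven:rewrite-maven-plugin:run",
        f"-Drewrite.recipeArtifactCoordinates={artifact}",
        f"-Drewrite.activeRecipes={recipe}",
    ]


def ci_job(cmd: List[str], image: str = "maven:3-eclipse-temurin-21") -> str:
    """Render a minimal .gitlab-ci.yml job that runs the same command on a runner."""
    return "\n".join([
        "openrewrite-upgrade:",
        f"  image: {image}",
        "  script:",
        f"    - {shlex.join(cmd)}",
    ])


def run_or_defer(repo_dir: str,
                 which: Callable[[str], Optional[str]] = shutil.which) -> Tuple[str, str]:
    """Run the recipe in place if Maven is on PATH; otherwise return a CI job to commit."""
    cmd = rewrite_command()
    if which("mvn") is None:
        return "ci_job", ci_job(cmd)
    subprocess.run(cmd, cwd=repo_dir, check=True)
    return "ran", ""
```

Injecting `which` keeps the decision testable without a Maven install, and either branch ends with the same recipe producing the diff.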

Accomplishments We're Proud Of

  • A change that finishes — a complete roster draining to a terminal state, not a status report that assesses and walks away.
  • A real platform seam, not a script. One-method engines; the hard machinery built once and reused for any change.
  • Real diffs, not tickets. Every MR carries an actual change from the engine.
  • No repo left behind. Blocked rows are never silent; they're always in a named person's queue with a reason.
  • Honest scoping that survives a technical judge. Every claim about Orbit is one Orbit can actually back.

What We Learned

  • Fleet-wide changes don't fail on the diff. They fail on orchestration with context — which repos, in what order, owned by whom, and chasing the ones that stall. That's exactly the gap a context-aware agent platform closes.
  • The long tail is a routing problem, not a technical one. The fix isn't a smarter recipe; it's resolving the right human and handing them the repo with the reason attached.
  • The right abstraction is the change, not the language. Once the completion machinery stopped caring what the diff was, "what else can this drive?" answered itself.

What's Next for Marshal

  • More built engines behind the existing interface — JS/TS via jscodeshift, Python via libCST, Go via gopls.
  • A small library of org-ready playbooks: Log4Shell CVE patching, CI-template enforcement, dependency upgrades, internal-SDK retirements.
  • Richer pre-flight risk from Orbit's security domain (Finding, Vulnerability).
  • Cross-campaign memory: a durable record of which repos waived, why, and who owns them.

Built With

GitLab Duo Agent Platform · GitLab Orbit · OpenRewrite · GitLab Pages · GitLab Files API · native GitLab Epics / Work Items / Merge Requests · Anthropic Claude · Python · Maven


MIT licensed. Published to the GitLab AI Catalog.
