A plain-English goal flows through three Duo agents, with Orbit as the live context layer and a pluggable engine producing the diff.

How a campaign actually feels: mention Marshal, approve, start each wave. Marshal proposes, a human approves, Marshal acts.

Every repo is a state machine that must reach merged or waived; a blocked repo escalates instead of going silent.

Drop in a custom engine for an org-specific change and inherit discovery, ordering, tracking, and escalation for free.

A blocked repo resolves to a real owner and lands in their queue with a reason.

Marshal's four-beat spine. Every campaign runs DISCOVER → ANALYZE → LAND → FINISH, with a human approval gate between each beat.
Marshal
Declare the change. Every repo reaches done — even the ones nobody owns.
Marshal is a completion platform for org-wide code change on the GitLab Duo Agent Platform. You bring the change — a Java upgrade, a CVE patch, ripping out a deprecated internal SDK. Marshal discovers the affected fleet from GitLab Orbit, sequences it so nothing breaks downstream, lands a real merge request per repo, and drives every single one to a terminal state — tracked on a live dashboard that drains to zero, with a human approval gate at every step.
The change is pluggable. The finishing is the product.
3 agents. Wave-ordered rollout. A completion ledger that never loses a repo.
Inspiration
Here's a story every platform team knows.
Java 17 hits end-of-life. Leadership declares it: "We're moving the whole org to 21 this quarter." Sprint one, the energy is real — 80% of the repos move in two weeks. Everyone celebrates.
A year later, the org is still on 17.
Not because the work was hard. Because the last 20% had no forcing function. A shared auth library that three services depend on never got re-tested, so nobody dared touch it. A payments service whose only owner left in March. A reporting job with a build so flaky no one wanted to be the person who broke it. Each one waited on a human who was never assigned. And somewhere in month three, a consumer service got upgraded before the library it depends on — and broke production for an afternoon.
Now swap "Java 17 EOL" for any fleet-wide change — a CVE you must patch everywhere, a deprecated internal SDK you're retiring, a CI standard you're enforcing. The shape is identical, and it stalls in the same two places:
- Ordering. You change a consumer before its shared library, and the build breaks downstream.
- The long tail. Nobody chases the stragglers, so the effort stalls at "80% done" forever.
Plenty of tools generate the diff. None of them finish the job — across a whole fleet, in the right order, chasing every straggler to a human who can close it. So we built Marshal.
What It Does
Marshal takes an org-wide code change from a sentence to a drained roster. It runs one lifecycle (DISCOVER -> ANALYZE -> LAND -> FINISH) regardless of what the change actually is, and hands control to a human at every approval gate.
| The moment | What Marshal does |
|---|---|
| You mention `@marshal migrate ...` on an issue, in plain English | Picks the engine for the change, checks Orbit health, discovers the exact affected fleet, reads build files for context |
| Fleet discovered | Scores per-repo risk, computes dependency-aware rollout waves, posts an analysis comment + live dashboard link |
| You reply `/approve` | Generates a work-items preview: per repo, the issue title, MR branch, risk score, and pre-flight signals |
| You reply `start phase 1` | Lands Wave 1: one Issue + one real MR per repo, each with an actual diff from the engine |
| The engine returns low confidence | AI gap-analysis reads the diff and posts an advisory review comment on what still needs a human |
| An MR merges | The ledger row flips to merged automatically via `Closes #N`, with no bookkeeping |
| A repo gets stuck | Auto-escalates: resolves the right human, assigns + @mentions them with the precise failure reason |
| You reply `/status` | A live MR status table across all waves, with pipeline signals and blockers |
| The campaign ends | Every repo sits at merged or waived. The roster is at zero. |
The lifecycle never changes. Only the engine that produces the diff does.
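Because the lifecycle is fixed, it can be modeled as an ordered list of phases where every step forward needs a human yes. A minimal sketch, assuming hypothetical names (`Phase`, `Campaign`, `advance`) that illustrate the gating and are not taken from Marshal's code:

```python
from enum import Enum


class Phase(Enum):
    DISCOVER = 1
    ANALYZE = 2
    LAND = 3
    FINISH = 4


class Campaign:
    """Hypothetical campaign that walks a fixed phase order behind approval gates."""

    ORDER = [Phase.DISCOVER, Phase.ANALYZE, Phase.LAND, Phase.FINISH]

    def __init__(self, goal: str) -> None:
        self.goal = goal
        self.phase = Phase.DISCOVER

    def advance(self, approved: bool) -> Phase:
        """Move to the next phase only if a human approved the current one."""
        if not approved:
            # No approval: the campaign stays exactly where it is.
            return self.phase
        idx = self.ORDER.index(self.phase)
        if idx == len(self.ORDER) - 1:
            raise RuntimeError("campaign is already in FINISH")
        self.phase = self.ORDER[idx + 1]
        return self.phase


campaign = Campaign("migrate the fleet to Java 21")
campaign.advance(approved=True)  # DISCOVER -> ANALYZE
```

The phase order lives in one list and knows nothing about the change being made, which mirrors the point that only the engine varies.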
Some Changes Marshal Can Drive
| Change | Engine | Status |
|---|---|---|
| Java 8/17 -> 21 + Spring Boot 3 | OpenRewrite (`UpgradeToJava21`) | Built: the reference engine, demoed end-to-end |
| Patch a CVE across the fleet (e.g. Log4Shell) | `cve_bump` | Reference stub: shows the contract |
| Enforce or standardize a CI template | `ci_template` | Reference stub |
| Bump a shared dependency org-wide | `dep_bump` | Reference stub |
| Retire a deprecated internal SDK | Your custom engine | Bring your own (interface below) |
| Anything with no deterministic tool | `ai_authored` fallback | Reference stub: routes to review |
A Worked Example: Java 17 -> 21
A six-repo Java fleet: platform-commons, auth-core, api-gateway, user-service, payments-api, reporting-service. One sentence in.
Marshal discovers all six from Orbit, reads their build files, and orders them so the shared libraries change first:
```
Wave 1: platform-commons, auth-core       <- most depended-on; change first
Wave 2: api-gateway, user-service         <- depend on Wave 1
Wave 3: payments-api, reporting-service   <- depend on Wave 2
```
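The wave assignment above behaves like a layered topological sort: a repo lands in the first wave that comes after every fleet repo it depends on. A minimal sketch in that style (Kahn's algorithm, one layer per wave); `compute_waves` is a hypothetical name, and the individual dependency edges in `fleet` are assumed for illustration, since the text only says each wave depends on the one before it:

```python
def compute_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group repos into rollout waves.

    deps maps each repo to the set of repos it depends on (its upstreams).
    A repo is placed in the first wave after all of its upstreams.
    """
    # Ignore dependencies outside the fleet (e.g. third-party libraries).
    remaining = {repo: set(ups).intersection(deps) for repo, ups in deps.items()}
    waves: list[list[str]] = []
    while remaining:
        # Repos whose in-fleet upstreams were all scheduled in earlier waves.
        ready = sorted(repo for repo, ups in remaining.items() if not ups)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        for repo in ready:
            del remaining[repo]
        for ups in remaining.values():
            ups.difference_update(ready)
    return waves


# Assumed edges for the six-repo example fleet (illustrative only).
fleet = {
    "platform-commons": set(),
    "auth-core": set(),
    "api-gateway": {"platform-commons", "auth-core"},
    "user-service": {"auth-core"},
    "payments-api": {"user-service", "api-gateway"},
    "reporting-service": {"api-gateway", "platform-commons"},
}

for number, wave in enumerate(compute_waves(fleet), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Dependencies on repos outside the fleet are dropped so a third-party library never blocks a wave, and a cycle has no valid order, so it is reported rather than guessed.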
Then it lands one real MR per repo. On screen, you watch two outcomes that, together, are the entire thesis:
A repo finishes clean. OpenRewrite's UpgradeToJava21 produces a deterministic, compiling diff. The MR merges. The ledger row flips to merged. The dashboard bar advances.
A repo hits the long tail — and doesn't vanish. The change leaves residue the engine can't fully resolve. Marshal catches it and routes it to a person:
> **Marshal on payments-api/!6:** The engine migrated 41 files to Java 21, and the build is green on the JDK bump. But the Spring Boot 3 step is incomplete: `SecurityConfig.java:23` still imports `javax.servlet`, which Spring Boot 3 replaced with `jakarta.servlet`. 2 tests fail as a result. This repo has no CODEOWNERS. From Orbit's authorship graph, the last three committers to `pom.xml` are @priya, @marco, @devansh. Assigning @priya: you committed the Maven config 6 weeks ago. Marking `blocked`, not silent.
The clean ones finish themselves. The hard ones land in a named person's queue today, with a reason attached — instead of lingering for a quarter. Either way, the roster drains to zero.
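The owner lookup in that comment is a fallback chain: a CODEOWNERS match first, then the most recent committer to the file at issue. A minimal sketch of the chain, with plain Python data standing in for the CODEOWNERS file and Orbit's authorship data; `resolve_owner` and `Escalation` are hypothetical, and shell-style `fnmatch` with last-match-wins is a simplification of GitLab's real CODEOWNERS rules:

```python
from fnmatch import fnmatch
from typing import NamedTuple, Optional


class Escalation(NamedTuple):
    owner: str
    reason: str


def resolve_owner(
    path: str,
    codeowners: list[tuple[str, list[str]]],
    recent_committers: list[str],
) -> Optional[Escalation]:
    """Pick one human to own a blocked repo.

    codeowners: (pattern, owners) rules in file order (simplified matching).
    recent_committers: authors of `path`, most recent first.
    """
    owners: list[str] = []
    for pattern, rule_owners in codeowners:
        if fnmatch(path, pattern):
            owners = rule_owners  # a later matching rule overrides earlier ones
    if owners:
        return Escalation(owners[0], f"CODEOWNERS rule matches {path}")
    if recent_committers:
        return Escalation(
            recent_committers[0],
            f"no CODEOWNERS; most recent committer to {path}",
        )
    return None  # no candidate found in either source


# The payments-api case: no CODEOWNERS, three recent committers to pom.xml.
escalation = resolve_owner("pom.xml", [], ["@priya", "@marco", "@devansh"])
```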
That whole flow — discovery, ordering, landing, the clean-vs-escalated split, the drain to zero — is engine-agnostic. Java is the example. It is not the limit.
Bring Your Own Engine
Marshal's transform engine is a pluggable interface. An engine answers one question — what's the diff for this repo? — and returns a branch, a diff, and a confidence level. Everything else is Marshal's job.
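Written down as types, that contract is small. A minimal sketch of what a base class like `TransformEngine` could look like, assuming a `Transform` record with exactly the three fields named above and a toy `Repo` stand-in; these definitions (and `PlaceholderEngine`) are hypothetical, not the contents of Marshal's `marshal.engines.base`:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform:
    """What every engine hands back to the platform."""

    branch: str
    diff: str
    confidence: str  # "high" or "review"


@dataclass
class Repo:
    """Toy stand-in for a repository: a name plus declared dependency coordinates."""

    name: str
    dependencies: set[str]


class TransformEngine(ABC):
    name: str = "base"

    @abstractmethod
    def applies_to(self, repo: Repo) -> bool:
        """Does this change touch the repo at all?"""

    @abstractmethod
    def plan(self, repo: Repo) -> Transform:
        """Produce the branch, diff, and confidence for one repo."""


class PlaceholderEngine(TransformEngine):
    """Toy engine: targets repos declaring one coordinate, emits a placeholder diff."""

    name = "placeholder"

    def __init__(self, coordinate: str) -> None:
        self.coordinate = coordinate

    def applies_to(self, repo: Repo) -> bool:
        return self.coordinate in repo.dependencies

    def plan(self, repo: Repo) -> Transform:
        # A real engine would run a codemod here; this one only describes it.
        diff = f"# would rewrite usages of {self.coordinate} in {repo.name}"
        return Transform(branch=f"marshal/{self.name}", diff=diff, confidence="review")
```

The abstract base refuses to instantiate until both methods exist, which is one way to make "you write the method that produces the diff" enforceable.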
Here's a custom engine for an org-specific change — retiring a deprecated internal SDK fleet-wide:
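```python
# Import path as given by the project. Note: a top-level package named `marshal`
# is shadowed by Python's built-in `marshal` module.
from marshal.engines.base import Transform, TransformEngine


class RetireAcmeLoggingSDK(TransformEngine):
    """Org-specific: replace the deprecated acme-logging SDK across the fleet."""

    name = "retire-acme-logging"

    def applies_to(self, repo) -> bool:
        # Only repos that declare the deprecated SDK join the campaign.
        return repo.declares("com.acme:acme-logging")

    def plan(self, repo) -> Transform:
        # Rewrite the old package to its replacement across the repo.
        diff = repo.codemod(replace="com.acme.logging",
                            with_="com.acme.observability")
        return Transform(
            branch=f"marshal/{self.name}",
            diff=diff,
            # A diff that doesn't compile is routed to human review.
            confidence="high" if diff.compiles() else "review",
        )
```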
Register it, point Marshal at a one-line goal, and you get the entire completion machinery for free:
- Orbit fleet discovery — which repos the change touches
- Risk scoring + dependency-aware wave ordering
- One Issue + one MR per repo, landed in waves
- The completion ledger: every repo drives to `merged` or `waived`
- The live dashboard and the auto-escalation for anything that stalls
You write the one method that produces the diff. Engines that can't guarantee a clean result return confidence="review", which routes them through the same human gate and escalation path as everything else — so even a best-effort or AI-authored change is safe to run fleet-wide.
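The confidence rule can be sketched as a function from an engine's result to the steps that follow it. A minimal sketch, assuming hypothetical step names; following the lifecycle table, a low-confidence result still gets its Issue and MR and additionally gets the advisory gap-analysis comment. Treating any unrecognized confidence value as needing review is an assumed fail-safe choice, not documented Marshal behavior:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform:
    branch: str
    diff: str
    confidence: str  # expected: "high" or "review"


def next_steps(t: Transform) -> list[str]:
    """Hypothetical per-repo steps after an engine returns its Transform."""
    steps = ["open_issue", "open_mr"]
    if t.confidence == "high":
        return steps
    # Anything other than "high" is treated as needing review (fail safe):
    # same Issue + MR, plus an advisory gap-analysis comment on the diff.
    return steps + ["post_gap_analysis"]
```

Both paths share the same Issue, MR, and ledger row, so review-grade changes never need a separate pipeline.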
How We Built It
Three agents over deterministic steps, on GitLab Duo. A coordinator drives the lifecycle; specialists reason about the fleet:
| Agent / step | Role |
|---|---|
| Coordinator | Parses the goal, selects the transform engine, runs the phase, gates on human approval |
| Analyzer (reasoning) | Risk scoring, dependency-aware wave ordering, impact summaries |
| Migrator (per repo) | Lands the Issue + MR, runs gap-analysis on the resulting diff |
| Build-file reader (deterministic) | Versions + declared deps from pom.xml / build.gradle, read directly from the files rather than from Orbit (and stated openly) |
| Ledger (deterministic) | Epic + one work item per repo; state machine; Closes #N auto-close |
| Escalation (deterministic) | CODEOWNERS -> recent committers (Orbit AUTHORED) -> assign + @mention |
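The ledger's state machine can be sketched as a transition table. Only `blocked`, `merged`, and `waived` come from the text; the `pending` and `mr_open` states, the allowed transitions, and the class names are assumptions for illustration. The roster is drained when no row is left outside a terminal state:

```python
from enum import Enum


class RowState(Enum):
    PENDING = "pending"
    MR_OPEN = "mr_open"
    BLOCKED = "blocked"
    MERGED = "merged"
    WAIVED = "waived"


TERMINAL = {RowState.MERGED, RowState.WAIVED}

# Assumed transition table: terminal states have no way out.
ALLOWED = {
    RowState.PENDING: {RowState.MR_OPEN, RowState.WAIVED},
    RowState.MR_OPEN: {RowState.MERGED, RowState.BLOCKED, RowState.WAIVED},
    RowState.BLOCKED: {RowState.MR_OPEN, RowState.MERGED, RowState.WAIVED},
    RowState.MERGED: set(),
    RowState.WAIVED: set(),
}


class Ledger:
    """Hypothetical one-row-per-repo ledger."""

    def __init__(self, repos: list[str]) -> None:
        self.rows = {repo: RowState.PENDING for repo in repos}

    def move(self, repo: str, new: RowState) -> None:
        current = self.rows[repo]
        if new not in ALLOWED[current]:
            raise ValueError(f"{repo}: {current.value} -> {new.value} not allowed")
        self.rows[repo] = new

    def remaining(self) -> list[str]:
        """Repos not yet merged or waived; this is what drains to zero."""
        return sorted(r for r, s in self.rows.items() if s not in TERMINAL)

    def drained(self) -> bool:
        return not self.remaining()
```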
Orbit is the context layer; the agent reasons. Marshal uses Orbit live for fleet discovery (File.language -> Project), ownership and authorship (MEMBER_OF, AUTHORED), in-repo blast radius (ImportedSymbol), and MR/pipeline state.
The transform engine is the platform's seam. Every engine returns the same contract — a branch with a real diff and a confidence level — so the ledger, dashboard, and escalation never change no matter what the change is. OpenRewrite for Java/JVM is the fully-built reference engine; the CVE-bump, CI-template, dep-bump, and AI-authored-fallback engines are reference implementations of the same contract; and anyone can drop in a custom engine for an org-specific change.
Productionized CI/CD. The pipeline runs validate (YAML + agent-config schema + LICENSE presence) -> test (pytest with coverage) -> catalog-size enforcement on the published agent definition -> AI Catalog publish on tag -> GitLab Pages deploy -> campaign-state sync. [N] passing tests cover the build-file parser, the wave-ordering sort, the ledger state machine, the escalation owner-resolution, and the dashboard renderer.
Claude powers the reasoning steps — order inference, residue detection, fix-vs-escalate, and the human-readable rationale on every issue and MR.
Challenges We Ran Into
- Orbit has no cross-repo code edges. Definition IDs are content-hashed and scoped per project + branch, so the same symbol in two repos has different IDs — Orbit genuinely can't topologically sort a fleet by code dependencies. We moved ordering into agent inference over declared build-manifest dependencies, which is the more authoritative signal anyway.
- Keeping the platform engine-agnostic. The temptation was to special-case Java everywhere. We held the line: the ledger, dashboard, and escalation know nothing about OpenRewrite — they only see a branch, a diff, and a confidence level. That discipline is what makes a custom engine a one-file change.
- Custom CI/CD variables aren't available inside Duo Workflow flows. Only runtime-injected variables are present, so all configuration derives from those — no PAT, no manual variable setup.
- Non-deterministic changes. Some changes (a framework major-version jump, an AI-authored diff) can't promise a clean result. Rather than exclude them, we made `confidence="review"` a first-class outcome that routes through the same gate and escalation as everything else.
- Maven isn't always in the runner. The OpenRewrite engine attempts a direct clone -> run -> push in the flow sandbox, with a fallback to committing the recipe as a CI job when Maven is unavailable, so the diff still lands.
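The first challenge above moved ordering onto declared build-manifest dependencies, which a deterministic reader can pull out of a `pom.xml` with the standard library alone. A minimal sketch, assuming a POM that uses the standard Maven 4.0.0 namespace; `read_pom` is hypothetical, and it ignores `build.gradle`, parent POMs, `dependencyManagement`, and property interpolation:

```python
import xml.etree.ElementTree as ET
from typing import Optional

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def read_pom(pom_xml: str) -> tuple[Optional[str], list[str]]:
    """Return (java.version property, declared "groupId:artifactId" deps)."""
    root = ET.fromstring(pom_xml)
    java_version = root.findtext("m:properties/m:java.version", namespaces=POM_NS)
    deps = []
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        group = dep.findtext("m:groupId", namespaces=POM_NS)
        artifact = dep.findtext("m:artifactId", namespaces=POM_NS)
        if group and artifact:
            deps.append(f"{group}:{artifact}")
    return java_version, deps


# Illustrative POM, not taken from the example fleet.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties><java.version>17</java.version></properties>
  <dependencies>
    <dependency><groupId>com.acme</groupId><artifactId>platform-commons</artifactId></dependency>
    <dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-web</artifactId></dependency>
  </dependencies>
</project>"""

version, declared = read_pom(SAMPLE_POM)
```

Edges between fleet repos then fall out of matching each declared coordinate against the artifacts the fleet itself publishes.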
Accomplishments We're Proud Of
- A change that finishes — a complete roster draining to a terminal state, not a status report that assesses and walks away.
- A real platform seam, not a script. One-method engines; the hard machinery built once and reused for any change.
- Real diffs, not tickets. Every MR carries an actual change from the engine.
- No repo left behind. Blocked rows are never silent; they're always in a named person's queue with a reason.
- Honest scoping that survives a technical judge. Every claim about Orbit is one Orbit can actually back.
What We Learned
- Fleet-wide changes don't fail on the diff. They fail on orchestration with context — which repos, in what order, owned by whom, and chasing the ones that stall. That's exactly the gap a context-aware agent platform closes.
- The long tail is a routing problem, not a technical one. The fix isn't a smarter recipe; it's resolving the right human and handing them the repo with the reason attached.
- The right abstraction is the change, not the language. Once the completion machinery stopped caring what the diff was, "what else can this drive?" answered itself.
What's Next for Marshal
- More built engines behind the existing interface — JS/TS via jscodeshift, Python via libCST, Go via gopls.
- A small library of org-ready playbooks: Log4Shell CVE patching, CI-template enforcement, dependency upgrades, internal-SDK retirements.
- Richer pre-flight risk from Orbit's security domain (`Finding`, `Vulnerability`).
- Cross-campaign memory: a durable record of which repos were waived, why, and who owns them.
Built With
GitLab Duo Agent Platform · GitLab Orbit · OpenRewrite · GitLab Pages · GitLab Files API · native GitLab Epics / Work Items / Merge Requests · Anthropic Claude · Python · Maven
MIT licensed. Published to the GitLab AI Catalog.