💡 Inspiration

Microservice teams move fast — but nobody sees the full picture when a change lands. A dev renames a field, merges it, and three hours later Slack is on fire because two downstream services nobody remembered are broken in production.

We've all been there. We wanted to build something that makes the invisible visible — before the merge, not after.


🚀 What it does

Cross-Service Impact Analyzer is a GitLab Duo ambient flow that automatically detects the blast radius of any MR across your entire microservice ecosystem.

The moment a developer opens an MR, the system:

  • 📄 Reads the diff and extracts changed API endpoints and schema fields
  • 🔍 Traverses a live knowledge graph of all service dependencies
  • 🏷️ Classifies every downstream service as BREAKING, DEGRADED, or INFORMATIONAL
  • 💬 Posts a structured impact report directly as an MR comment — with per-team migration checklists, owner contacts, and Slack handles
  • 🔗 Links to an interactive D3.js blast radius visualizer showing exactly how the change propagates hop-by-hop across the system

Zero manual work. Zero Slack pings. Everything a reviewer needs, before the first human looks at the code.


🛠️ How we built it

The core insight: if you model your entire system as a graph, blast radius becomes a graph traversal problem — not a documentation problem.

Knowledge Graph

The core of the system is a typed, directed knowledge graph that models every architectural dependency across the microservice ecosystem — not just service-to-service HTTP calls, but every real-world coupling that can carry a blast radius.

Typed Nodes — everything that can be a source or target of impact:

Node Type   What it represents
service     A microservice or deployable unit
endpoint    A single REST, gRPC, or GraphQL operation
schema      A shared data contract or response model
event       An async domain event on a message bus
database    A data store (Postgres, MySQL, MongoDB, etc.)
job         A background worker, cron job, or batch process
queue       A message queue, Kafka topic, or SQS stream
cache       A cache layer (Redis, Memcached, etc.)
config      A config service, env vars, or feature flag namespace
gateway     An API gateway, ingress controller, or load balancer
storage     Object storage (S3, GCS, Azure Blob, etc.)
external    A third-party or vendor API dependency
auth        An auth or identity provider (OAuth, OIDC, JWT issuer)

Typed Edges — every dependency vector that carries blast radius:

Edge Type                     What it models
http_dependency               Synchronous call between services
shared_schema                 Two or more services sharing the same data contract
event_producer                Service emits a domain event
event_consumer                Service consumes a domain event
db_write / db_read            Service reads from or writes to a data store
job_triggers                  A service or event triggers a background job
cache_write / cache_read      Service interacts with a cache layer
publishes_to / subscribes_to  Service produces or consumes from a queue or stream
auth_dependency               Service delegates authentication or authorization
config_reads                  Service reads runtime config or feature flags
gateway_routes                API gateway routes external traffic to a service
calls_external                Service calls a third-party API
stores_to                     Service reads from or writes to object storage

Each node carries owner, SLA, Slack handle, criticality, and migration notes. Each edge carries criticality, fields used, and the reason the dependency exists.
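To make the shape of that metadata concrete, here is a minimal sketch of how typed nodes and edges could be modeled in Python. The class names, field names, and the edge direction (source depends on target) are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A typed graph node; field names are illustrative assumptions."""
    id: str
    type: str                 # e.g. "service", "endpoint", "schema"
    owner: str = ""
    sla: str = ""
    slack: str = ""
    criticality: str = "medium"
    migration_notes: str = ""


@dataclass
class Edge:
    """A typed dependency; assumed direction: source depends on target."""
    source: str
    target: str
    type: str                 # e.g. "http_dependency", "shared_schema"
    criticality: str = "medium"
    fields_used: list[str] = field(default_factory=list)
    reason: str = ""


@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        # Reject edges that point at nodes the graph does not know about.
        for node_id in (edge.source, edge.target):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id}")
        self.edges.append(edge)

    def dependents_of(self, node_id: str) -> list[Edge]:
        """Edges whose source depends on the given node."""
        return [e for e in self.edges if e.target == node_id]
```

Keeping `fields_used` on the edge is what lets a field rename be matched against only the consumers that actually read that field.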

The graph is bootstrapped automatically from OpenAPI specs via bootstrap_graph.py, so it needs no manual maintenance. GitLab CI rebuilds it on every merge, so the graph always reflects the live system.
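The bootstrapping idea can be sketched as follows. This is not the contents of bootstrap_graph.py; it is a simplified illustration that assumes each spec is already parsed into a dict, and the node id formats (`service:METHOD path`, `schema:Name`) are hypothetical. It only derives service, endpoint, and schema nodes plus shared_schema edges.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def bootstrap_graph(specs):
    """Build a graph dict from {service name: parsed OpenAPI document}."""
    nodes = {}
    edges = []
    schema_users = {}  # schema name -> services that declare it

    for service, spec in sorted(specs.items()):
        nodes[service] = {"type": "service"}

        # One endpoint node per (method, path) pair under "paths".
        for path, item in spec.get("paths", {}).items():
            for method in item:
                if method.lower() in HTTP_METHODS:  # skip keys like "parameters"
                    endpoint_id = f"{service}:{method.upper()} {path}"
                    nodes[endpoint_id] = {"type": "endpoint", "service": service}

        # One schema node per entry under components.schemas.
        for name in spec.get("components", {}).get("schemas", {}):
            nodes.setdefault(f"schema:{name}", {"type": "schema"})
            schema_users.setdefault(name, []).append(service)

    # A schema declared by more than one service is a shared contract.
    for name, users in sorted(schema_users.items()):
        if len(users) > 1:
            for service in users:
                edges.append(
                    {"source": service, "target": f"schema:{name}", "type": "shared_schema"}
                )

    return {"nodes": nodes, "edges": edges}
```

Edge types that cannot be read off an OpenAPI document, such as event or database dependencies, would need another source and are left out of this sketch.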

GitLab Duo Ambient Flow — 2-Agent Chain

  • Triggers on every MR automatically — zero setup, no webhooks
  • Agent 1 diff_analyzer — calls GitLab MCP tools to extract changed endpoints and schema fields, outputs structured IMPACT_DIFF_RESULT with severity
  • Agent 2 impact_reporter — reads graph from main, runs BFS traversal, classifies every downstream node as BREAKING / DEGRADED / INFORMATIONAL
  • Posts result via create_merge_request_note — blast radius table, per-team migration checklists, live graph link
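The traversal step of Agent 2 can be illustrated with a small hop-aware BFS. The mapping from hop distance to label used here (1 hop is BREAKING, 2 hops is DEGRADED, anything further is INFORMATIONAL) is an assumed rule for the sake of the example; the real classification may also weigh edge criticality and the fields each consumer uses. The `dependents` adjacency map and the service names are hypothetical.

```python
from collections import deque


def classify_blast_radius(dependents, changed):
    """Label every node reachable from the changed nodes by hop distance.

    dependents: node id -> list of node ids that depend on it.
    changed:    node ids touched by the MR (hop 0, not reported).
    """
    hops = {node: 0 for node in changed}
    queue = deque(changed)

    # Standard BFS: the first time a node is reached is its shortest hop count.
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in hops:
                hops[nxt] = hops[current] + 1
                queue.append(nxt)

    labels = {}
    for node, hop in hops.items():
        if hop == 0:
            continue  # the changed nodes themselves
        if hop == 1:
            labels[node] = "BREAKING"       # assumed: direct consumers break
        elif hop == 2:
            labels[node] = "DEGRADED"       # assumed: one hop removed
        else:
            labels[node] = "INFORMATIONAL"  # assumed: everything further out
    return labels


dependents = {
    "user-api": ["orders", "billing"],
    "orders": ["notifications"],
    "notifications": ["analytics"],
}
print(classify_blast_radius(dependents, ["user-api"]))
```

Tracking the hop count rather than plain reachability is what separates a directly broken consumer from one that is only affected transitively.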

D3.js Graph Viewer

  • Force-directed graph deployed on Netlify — no backend, no server
  • Impact rendered entirely via URL params (?changed=, &breaking=, &degraded=)
  • Hop-by-hop blast radius animation, per-node info panels with migration notes
  • Filterable sidebar — type checkboxes, quick-view presets (Infrastructure, APIs, Event flows, Data layer)
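Because the viewer is driven purely by URL params, the link in the MR comment can be assembled with nothing more than string encoding. The sketch below assumes each param carries a comma-separated list of node ids, and the base URL is a placeholder; neither detail is confirmed by the project.

```python
from urllib.parse import urlencode


def build_viewer_url(base_url, changed, breaking, degraded):
    """Encode the impact sets into the viewer's query string."""
    params = {
        "changed": ",".join(changed),    # assumed: comma-separated node ids
        "breaking": ",".join(breaking),
        "degraded": ",".join(degraded),
    }
    return f"{base_url}?{urlencode(params)}"


url = build_viewer_url(
    "https://example.invalid/viewer",    # placeholder, not the real deployment
    changed=["user-api"],
    breaking=["orders", "billing"],
    degraded=["notifications"],
)
print(url)
```

Since all state lives in the URL, the same static page can render any MR's blast radius without a backend.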

Self-Healing Graph — CI + 3-Agent Updater

  • GitLab CI rebuilds the full graph on every merge by re-parsing OpenAPI specs
  • Post-merge 3-agent pipeline for surgical updates:
    • Agent 1 — semantically analyzes what structurally changed
    • Agent 2 — computes minimal node and edge patches needed
    • Agent 3 — applies patches directly via create_commit, no full rebuild required
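The "minimal patch" idea from Agents 2 and 3 can be sketched as a small set of operations applied to the graph dict. The patch format (`op`, `id`, `attrs`, `edge`) is hypothetical and chosen only for illustration; committing the result via create_commit is outside this sketch.

```python
import copy


def apply_patches(graph, patches):
    """Return a new graph with the patches applied; the input is not mutated."""
    updated = copy.deepcopy(graph)
    for patch in patches:
        op = patch["op"]
        if op == "upsert_node":
            # Create the node if missing, then merge in the new attributes.
            updated["nodes"].setdefault(patch["id"], {}).update(patch["attrs"])
        elif op == "remove_node":
            updated["nodes"].pop(patch["id"], None)
            # Drop edges that would otherwise dangle.
            updated["edges"] = [
                e for e in updated["edges"]
                if patch["id"] not in (e["source"], e["target"])
            ]
        elif op == "add_edge":
            if patch["edge"] not in updated["edges"]:
                updated["edges"].append(patch["edge"])
        elif op == "remove_edge":
            updated["edges"] = [e for e in updated["edges"] if e != patch["edge"]]
        else:
            raise ValueError(f"unknown patch op: {op}")
    return updated
```

Applying patches to a copy keeps the previous graph available for comparison if a patch turns out to be wrong.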

🧱 Challenges we ran into

  • Agent reliability — getting the Duo agent to always post the MR comment required rewriting the system prompt with explicit MANDATORY tool call instructions and a dedicated fallback section. Prompt engineering for agentic flows is closer to system design than software development
  • Cross-branch file reading — the ambient flow was defaulting to the MR source branch when reading graph.json, causing false "graph not bootstrapped" errors. Fixed by explicitly forcing ref=main in the agent prompt
  • Blast radius accuracy — distinguishing directly broken services from transitively degraded ones required careful hop-level BFS classification, not just reachability
  • Graph freshness at scale — a full CI rebuild works now, but patching only affected nodes on each merge without a full rebuild is the real challenge we're still solving
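The cross-branch fix above was made in the agent prompt, but the same idea is easy to see against GitLab's repository files REST endpoint, where the branch is an explicit `ref` query parameter. This is an illustration only: the flow itself reads the file through agent tools, and the API base URL and project id below are placeholders.

```python
from urllib.parse import quote


def graph_file_url(api_base, project_id, file_path="graph.json", ref="main"):
    """Build the raw-file URL for graph.json, pinned to an explicit ref."""
    # The files API expects the path URL-encoded, including any slashes.
    encoded_path = quote(file_path, safe="")
    encoded_ref = quote(ref, safe="")
    return (
        f"{api_base}/projects/{project_id}"
        f"/repository/files/{encoded_path}/raw?ref={encoded_ref}"
    )


# Placeholder host and project id, for illustration only.
print(graph_file_url("https://gitlab.example.com/api/v4", 42))
```

Pinning the ref means the graph is always read from main, regardless of which MR source branch triggered the flow.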

🏆 Accomplishments that we're proud of

  • ✅ A fully working end-to-end ambient flow that triggers on real MRs and posts real, actionable output — no manual steps, no human in the loop
  • ✅ Hop-by-hop blast radius animation that makes the propagation of impact genuinely intuitive: you watch the blast travel the graph in real time
  • ✅ The knowledge graph stays current automatically — rebuilt from live OpenAPI specs on every merge
  • ✅ The interactive viewer works from any URL with zero backend — just URL params, embedded graph data, and D3.js

📚 What we learned

  • GitLab Duo ambient flows are powerful but require very precise prompt engineering — the agent needs explicit tool call instructions, not just descriptions of what you want
  • Knowledge graphs are a natural fit for dependency problems — once the graph exists, the blast radius logic is straightforward; the hard part is keeping the graph accurate
  • The gap between "it works" and "it works reliably in a demo" is large — we spent as much time on prompt robustness as on the core features

🔮 What's next for Cross-Service Impact Analyzer

Feature                       Description
🤖 Semantic graph updater     AI agent reads the diff, understands what changed semantically, and patches only affected nodes via create_commit, with no full rebuild
🌐 Multi-repo support         Extend the graph across multiple GitLab repositories, not just services in one repo
👥 Auto-suggested reviewers   Automatically add owners of breaking services as required approvers on the MR
📊 SLA-aware impact scoring   Weight blast radius by service SLA and criticality, so breaking a 99.99% service scores higher than breaking an internal tool
💬 Slack integration          Post impact summaries directly to the owning team's channel the moment a breaking change is detected upstream
