Inspiration
It's 2:57 AM. The pager goes off, checkout is throwing errors, the deploy was six hours ago, and the person on call didn't write that code. The hard part was never typing the fix — it was finding it, proving it, and daring to ship it at 3 AM. Every "AI fixes your code" demo I'd seen skipped straight to the diff and asked you to trust it. In production, trust is the one thing a robot hasn't earned. What would it take to trust a robot's fix at 3 AM — without taking its word for anything?
What it does
Darn is an agent that mends production — and proves every step with an artifact you can independently re-run.
- Davis detects, before the agent moves. Darn polls query-problems on the Dynatrace MCP gateway. The problem is raised by Dynatrace's AI, not self-declared by the agent. Receipt: the problem ID and deep link.
- Grail forensics. Budgeted execute-dql queries pull failure rate by endpoint, the exception and stack frames, and the onset timestamp. Onset is intersected with CI deployment markers and the commit diff (GitHub compare API) to blame the exact hunk. Receipt: every DQL block, copy-pasteable: paste it into the tenant and get the same numbers.
- Gemini writes the fix. Gemini on Vertex AI is briefed with the receipts and writes a minimal diff. The PR body is an evidence dossier: problem ID, DQL receipts, trace excerpt, the blamed hunk, the fix.
- A human approves. Always. Darn never merges by itself. The "Darn can merge" toggle in settings is locked off — it exists to show you it's off.
- Dynatrace is the referee. "Fixed" means the previously failing request now succeeds, error rate recovered via DQL, and the Davis problem closed. Darn posts that closure evidence back onto the PR.
- The medic wears a heart monitor. The agent's own OpenTelemetry traces — every tool call, every token — ship to the same Dynatrace tenant that watches the app it heals. Audit the agent like you audit the app.
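The referee rule above can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not Darn's actual code: the names Evidence, Verdict, and referee, and the recovered_below threshold, are hypothetical. The point it shows is that the verdict takes only observed signals as input, with no field for the agent's own opinion.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    FIXED = "fixed"
    PENDING = "pending"
    NOT_FIXED = "not_fixed"


@dataclass(frozen=True)
class Evidence:
    """Observed signals only; all field names are hypothetical."""
    replay_succeeded: bool   # the previously failing request, replayed after the merge
    error_rate: float        # current failure rate from a recovery DQL query (0.0 to 1.0)
    recovered_below: float   # assumed threshold under which the rate counts as recovered
    problem_closed: bool     # whether the Davis problem has closed


def referee(evidence: Evidence) -> Verdict:
    """Return FIXED only when all three independent signals agree."""
    if not evidence.replay_succeeded:
        # The original failure still reproduces, so the fix did not work.
        return Verdict.NOT_FIXED
    if evidence.error_rate > evidence.recovered_below or not evidence.problem_closed:
        # The replay passed, but the outside world has not confirmed recovery yet.
        return Verdict.PENDING
    return Verdict.FIXED
```

For example, referee(Evidence(True, 0.002, 0.01, False)) stays PENDING until the Davis problem closes, however confident the agent is in its diff.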
The demo path is a live sock shop ("Loose Threads") with synthetic traffic and a big amber button labeled "Tear a hole in it", backed by four genuinely different pre-authored sabotages that ship as real commits to the public repo. The real path is "Use it on yours": connect your own tenant and repo, and Darn watches your Davis problems and opens PRs with the same receipts.
How we built it
One repo, three Cloud Run services:
- server/: Python/FastAPI. The agent pipeline (detect → diagnose → fix → PR → verify) built on Google ADK, talking to the Dynatrace-hosted MCP gateway over streamable HTTP with a platform token, Gemini on Vertex AI for the diff, the GitHub API for commits/compares/PRs, Firestore for incident records, Secret Manager for BYO tokens. Live updates stream over SSE.
- web/: Vite + React. Five pages; the soul is the incident view, a receipt ledger where each pipeline stage hangs on a stitched thread and expands to the artifact that proves it.
- shop/ + trafficbot/: the patient, a real FastAPI sock shop, OpenTelemetry-instrumented into Dynatrace, plus steady synthetic shoppers so Davis has signal. shop/defects/ holds the four sabotage patches that the tear button ships through the real CI.
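As a rough idea of how a pipeline stage might reach the receipt ledger over SSE, here is a minimal sketch. The stage names come from the pipeline above; the function name sse_frame, the event name "stage", and the payload shape are assumptions made for illustration. Only the wire format (an event line, a data line, and a blank line) is standard server-sent events.

```python
import json
from typing import Any, Mapping

# Stage names taken from the pipeline description; everything else is hypothetical.
STAGES = ("detect", "diagnose", "fix", "pr", "verify")


def sse_frame(stage: str, receipt: Mapping[str, Any]) -> str:
    """Format one stage update as a single server-sent-events frame."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    payload = json.dumps({"stage": stage, "receipt": dict(receipt)}, separators=(",", ":"))
    # A blank line terminates the frame.
    return f"event: stage\ndata: {payload}\n\n"
```

A FastAPI endpoint could yield such frames from a streaming response with the text/event-stream media type, and the incident view would append each receipt to its stage as it arrives.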
Challenges
- Honesty is an engineering problem. The product rule was: anything that can't be real is removed, never faked. That meant typed scope errors from the MCP client, health cards that say exactly what's missing, a detection stage that only completes when the real Davis problem arrives — no fake spinners that resolve on a timer.
- Trial-tenant token scopes. Platform-token scopes (Grail bucket reads) and ingest-token provisioning on a fresh trial tenant are fiddly; the build treats every missing scope as a first-class, visible state instead of a crash — the live deployment tells you precisely which scope it's waiting on.
- Davis needs signal, not vibes. OTel-only ingest with low traffic can be too quiet for anomaly detection — hence the traffic bot's steady request rate and defects designed to move exactly the metrics Davis watches (error rate on checkout/pay/inventory, response time on catalog).
- A public demo that strangers can't wreck. One incident at a time, a "needle" that passes to spectators if the tearer walks away, cooldowns, and auto-revert keeping the public repo tidy between incidents.
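One way to treat a missing token scope as a visible state rather than a crash, as described in the challenges above, is a typed error that a health card can render. This is a sketch under assumptions: MissingScopeError, REQUIRED_SCOPES, and health_card are hypothetical names, and the scope strings are illustrative placeholders, not a verified list of what each MCP tool needs.

```python
from typing import Dict, FrozenSet, Iterable, List, Union


class MissingScopeError(Exception):
    """Raised when a token lacks scopes a tool needs; carries the exact gap."""

    def __init__(self, tool: str, missing: Iterable[str]) -> None:
        self.tool = tool
        self.missing: FrozenSet[str] = frozenset(missing)
        super().__init__(f"{tool} is waiting on: {', '.join(sorted(self.missing))}")


# Hypothetical mapping from tool name to required scopes (placeholder values).
REQUIRED_SCOPES: Dict[str, FrozenSet[str]] = {
    "execute-dql": frozenset({"storage:buckets:read", "storage:logs:read"}),
}


def require_scopes(tool: str, granted: Iterable[str]) -> None:
    """Raise MissingScopeError if any required scope for the tool is absent."""
    missing = REQUIRED_SCOPES[tool] - frozenset(granted)
    if missing:
        raise MissingScopeError(tool, missing)


def health_card(tool: str, granted: Iterable[str]) -> Dict[str, Union[str, List[str]]]:
    """Turn the scope check into a state the UI can show verbatim."""
    try:
        require_scopes(tool, granted)
    except MissingScopeError as err:
        return {"tool": tool, "state": "waiting_on_scope", "missing": sorted(err.missing)}
    return {"tool": tool, "state": "ready", "missing": []}
```

Because the error names the exact scopes, the card can say precisely what the deployment is waiting on instead of showing a generic failure.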
Accomplishments
The receipt ledger. Every claim in a Darn PR — "failure rate spiked here", "first failure was 38 seconds after this deploy", "this hunk did it" — is an artifact a human can independently verify, most of them DQL blocks you can paste into Dynatrace and watch return the same numbers. And the strip-test holds: remove Dynatrace and Darn loses its detector, its evidence language, its definition of done, and its own audit trail. It isn't plumbing — it's the referee.
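A claim like "first failure was 38 seconds after this deploy" comes from intersecting the onset timestamp with deployment markers. The following is a minimal sketch of that intersection, assuming the markers are already available as (sha, timestamp) pairs; Deployment, Suspect, and suspect_deployment are hypothetical names, and the real pipeline may weigh more evidence than the nearest preceding deploy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Deployment:
    sha: str
    deployed_at: datetime


@dataclass(frozen=True)
class Suspect:
    base_sha: Optional[str]  # commit running before the suspect deploy, if known
    head_sha: str            # commit shipped by the suspect deploy
    lag: timedelta           # how long after the deploy the first failure appeared


def suspect_deployment(onset: datetime, deployments: Sequence[Deployment]) -> Optional[Suspect]:
    """Pick the latest deployment at or before the failure onset."""
    ordered = sorted(deployments, key=lambda d: d.deployed_at)
    index = None
    for i, deploy in enumerate(ordered):
        if deploy.deployed_at <= onset:
            index = i
    if index is None:
        # Failures began before any known deploy, so no deploy can be blamed.
        return None
    head = ordered[index]
    base = ordered[index - 1].sha if index > 0 else None
    return Suspect(base_sha=base, head_sha=head.sha, lag=onset - head.deployed_at)
```

The resulting base and head commits are the pair one would hand to the GitHub compare API to get the diff whose hunks are then matched against the stack frames.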
What we learned
"Fixed" is a claim about the world, not about the code. Letting Dynatrace — not the agent — decide when the problem is closed changed the architecture: verification became replay + recovery DQL + waiting for the Davis problem to transition, and the agent got humbler and more trustworthy at the same time.
What's next
Multi-service blame (cascading failures), the GitHub App flow for zero-PAT installs, and incident notebooks — a shareable Dynatrace notebook generated per mend.
Built With
- cloud-build
- davis-ai
- dql
- dynatrace
- dynatrace-mcp
- fastapi
- firestore
- gemini
- github-actions
- github-api
- google-adk
- google-cloud-run
- model-context-protocol
- opentelemetry
- python
- react
- secret-manager
- typescript
- vertex-ai
- vite