Inspiration
Production incidents can cost companies thousands of dollars per minute. Most response time goes to manual, repetitive investigation: the same steps every time. CloudGuardian automates that process.
What it does
CloudGuardian is a 7-agent autonomous reliability engineer. When an incident is reported, it automatically investigates via Dynatrace, triages root cause, matches historical patterns, proposes remediation options with risk scores, waits for human approval, executes the fix, and generates a full postmortem — all in under 2 minutes.
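The flow described above (investigate, triage, match history, propose, wait for approval, execute, report) can be sketched in plain Python. This is a minimal illustration, not CloudGuardian's code: `RemediationOption`, `IncidentReport`, `run_pipeline`, and the stage names are hypothetical, and the stages are placeholders where the real agents would call Dynatrace and a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RemediationOption:
    """One candidate fix with an estimated risk score in [0, 1]."""
    action: str
    risk: float


@dataclass
class IncidentReport:
    """Accumulates what each stage contributes to the postmortem."""
    incident: str
    steps: List[str] = field(default_factory=list)
    executed: Optional[str] = None


def run_pipeline(
    incident: str,
    options: List[RemediationOption],
    approve: Callable[[RemediationOption], bool],
) -> IncidentReport:
    """Run the stages in order; execute a fix only if a human approves it."""
    report = IncidentReport(incident=incident)

    # Placeholder stages (hypothetical names): real agents would do work here.
    for stage in ("investigate", "triage", "match_history"):
        report.steps.append(stage)

    # Propose the lowest-risk option first.
    proposal = min(options, key=lambda option: option.risk)
    report.steps.append(f"propose:{proposal.action}")

    # Human approval gate: nothing is executed without an explicit yes.
    if approve(proposal):
        report.executed = proposal.action
        report.steps.append("execute")
    else:
        report.steps.append("rejected")

    report.steps.append("postmortem")
    return report
```

Passing the approval step in as a callable keeps the gate explicit: a web UI, a chat prompt, or a test stub can all supply the decision without changing the pipeline itself.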
How we built it
Built with the Google Cloud Agent Builder framework (ADK 2.1.0), the official SDK for building and deploying agents on Vertex AI Agent Platform. The cloudguardian-supervisor agent is registered on Agent Platform (resource ID: 797357036370132992), and the full 7-agent system runs via ADK's runtime on Cloud Run.
- Dynatrace MCP Server (20+ tools, including list_problems, execute_dql, list_vulnerabilities, find_entity_by_name)
- 7 specialized agents: Supervisor, Watcher, Triage, Learning, Remediation, Executor, Reporter
- OpenTelemetry + GoogleADKInstrumentor shipping traces to Dynatrace
- Cloud Run for hosting, nginx proxy for the web UI
CloudGuardian implements Dynatrace's recommended observability pattern for Google ADK agents: OpenTelemetry with GoogleADKInstrumentor captures full end-to-end traces across the agentic AI stack, as documented on the Dynatrace Hub Vertex AI page. The codebase is modular Python, with one file per agent.
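Shipping traces to Dynatrace over OpenTelemetry needs an OTLP endpoint and an API token header. The sketch below only builds those two values with the standard library; it is not taken from the project. The environment variable names `DT_ENV_URL` and `DT_API_TOKEN` and the function name are assumptions for illustration.

```python
import os
from typing import Dict, Mapping, Tuple


def dynatrace_otlp_config(
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, Dict[str, str]]:
    """Return (traces_endpoint, headers) for an OTLP/HTTP span exporter."""
    # DT_ENV_URL and DT_API_TOKEN are hypothetical variable names.
    base_url = env["DT_ENV_URL"].rstrip("/")
    api_token = env["DT_API_TOKEN"]

    # Dynatrace accepts OTLP traces under /api/v2/otlp/v1/traces and
    # authenticates with an "Api-Token" Authorization header.
    endpoint = f"{base_url}/api/v2/otlp/v1/traces"
    headers = {"Authorization": f"Api-Token {api_token}"}
    return endpoint, headers
```

In a setup like the one described, these values would typically be handed to the OpenTelemetry OTLP/HTTP span exporter as its `endpoint` and `headers` arguments before the ADK instrumentor is enabled.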
Closed loop integration
Dynatrace monitors infrastructure → CloudGuardian reads from Dynatrace via MCP → CloudGuardian fixes infrastructure → Dynatrace observes CloudGuardian via OTel. Both directions proven with live data.
Challenges
- mcp module incompatibility with Python 3.13 in Vertex AI Agent Engine: solved by switching to Cloud Run with a custom Dockerfile
- Windows CMD vs Linux binary path differences for MCP stdio transport: solved with platform detection in mcp_wrapper.py
- BatchSpanProcessor flushing before Cloud Run container shutdown: solved with GoogleADKInstrumentor, which hooks natively into ADK
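The platform detection mentioned above might look roughly like the following. The contents of mcp_wrapper.py are not shown in this writeup, so this is a guess at the shape of the fix: it assumes the MCP server is launched through npx, and `mcp_stdio_command` and the package name in the usage line are hypothetical.

```python
import platform
from typing import List, Optional


def mcp_stdio_command(package: str, system: Optional[str] = None) -> List[str]:
    """Build the argv used to launch an npm-distributed MCP server over stdio."""
    # Allow the caller to override the detected OS, which also helps testing.
    system = system or platform.system()

    # On Windows the npm launcher is the npx.cmd shim; on Linux and macOS
    # it is a plain executable named npx.
    launcher = "npx.cmd" if system == "Windows" else "npx"

    # -y answers npx's install prompt so the process never blocks on stdin.
    return [launcher, "-y", package]


# Example with a placeholder package name:
command = mcp_stdio_command("example-mcp-server", system="Linux")
```

Resolving the launcher name up front avoids running the command through a shell, so the same argument list can be passed to the stdio transport on a Windows development machine and inside the Linux container.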
Accomplishments
- Real Dynatrace MCP tool calls confirmed in the ADK Events tab
- cloudguardian-supervisor registered on Vertex AI Agent Platform (Agent Builder deployment confirmed)
- Full 7-agent chain working end to end with a human approval gate
- Distributed traces visible in Dynatrace showing every agent_run, call_llm, and execute_tool span
- Live web UI deployed on Cloud Run
What we learned
GoogleADKInstrumentor is the correct way to instrument ADK agents for Dynatrace — it automatically captures every tool call and model completion without manual span creation.
What's next
- Real Dynatrace entity resolution for production services
- Slack/PagerDuty integration for approval notifications
- Multi-environment support (staging vs production)
- DukanPage integration for MSME incident monitoring
