WordPress Health Guardian
What it does
WordPress Health Guardian is an AI-powered agent that monitors and analyzes WordPress site health. Given any WordPress URL, it runs comprehensive checks — uptime, SSL certificates, DNS resolution, and WordPress-specific endpoints — while simultaneously querying Dynatrace for active problems and monitored entities. A Gemini 2.5 Flash agent (via Vertex AI) analyzes all the collected data and generates a structured health report with a score, issue summary, and actionable recommendations. Results are persisted to Firestore for trend analysis, and the entire application is instrumented with OpenTelemetry shipping traces to Dynatrace.
How we built it
The application is built on Google's Agent Development Kit (ADK) with a FastAPI web server deployed on Cloud Run. The agent uses a custom VertexAIGemini model class that forces the Vertex AI backend on GCP, running gemini-2.5-flash in us-central1. The agent has ten function tools covering health checks, Dynatrace queries, report generation, and GCP config detection.
For the Dynatrace integration, we implemented a dual approach:
- The official @dynatrace-oss/dynatrace-mcp-server, connected via ADK's McpToolset for deep observability queries
- Direct Dynatrace API v2 calls using a classic token with problems.read and entities.read scopes for reliable data access
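As a rough illustration of the direct API path, the sketch below builds (without sending) a request to the Problems API v2. The function name, the `window` default, and the example environment URL are hypothetical and not taken from the project; only the `/api/v2/problems` path and the `Api-Token` authorization scheme come from the Dynatrace API itself.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_problems_request(base_url: str, api_token: str, window: str = "now-2h") -> Request:
    """Build (but do not send) a GET request for the Dynatrace Problems API v2.

    base_url is the environment URL, e.g. https://<env-id>.live.dynatrace.com
    (the value used below is a placeholder, not a real environment).
    """
    query = urlencode({"from": window, "pageSize": 50})
    url = f"{base_url.rstrip('/')}/api/v2/problems?{query}"
    # Classic API tokens are sent with the "Api-Token" authorization scheme.
    return Request(url, headers={"Authorization": f"Api-Token {api_token}"}, method="GET")


req = build_problems_request("https://abc12345.live.dynatrace.com/", "dt0c01.EXAMPLE")
```

Sending it would be a matter of `urllib.request.urlopen(req, timeout=10)` and decoding the JSON body; the token needs the problems.read scope for this call to succeed.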
For observability of the agent itself, we added OpenTelemetry instrumentation with an OTLP HTTP exporter that ships traces to Dynatrace's /api/v2/otlp/v1/traces endpoint, using a dedicated token with the openTelemetryTrace.ingest scope.
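A minimal sketch of the exporter settings this implies is shown below. The helper name `otlp_exporter_kwargs` is hypothetical; it only assembles the endpoint and the authorization header, on the assumption that the resulting dictionary is passed to `OTLPSpanExporter(endpoint=..., headers=...)` from the `opentelemetry-exporter-otlp-proto-http` package.

```python
def otlp_exporter_kwargs(base_url: str, ingest_token: str) -> dict:
    """Return keyword arguments for an OTLP/HTTP span exporter.

    base_url is the Dynatrace environment URL; ingest_token is assumed to be
    a token carrying the openTelemetryTrace.ingest scope.
    """
    return {
        # Dynatrace accepts OTLP traces over HTTP at this path.
        "endpoint": f"{base_url.rstrip('/')}/api/v2/otlp/v1/traces",
        # The ingest token is sent with the "Api-Token" authorization scheme.
        "headers": {"Authorization": f"Api-Token {ingest_token}"},
    }


kwargs = otlp_exporter_kwargs("https://abc12345.live.dynatrace.com", "dt0c01.EXAMPLE")
```

Keeping this in a small pure function makes the endpoint and header easy to check before any exporter is constructed.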
Google Cloud services used:
- Vertex AI — Gemini 2.5 Flash model inference
- Firestore — Health check history persistence
- Secret Manager — Four secrets (Dynatrace MCP token, classic API token, OTel token, scheduler auth)
- Cloud Scheduler — Weekly automated health checks (Mondays 8AM UTC)
- Cloud Run — Serverless container hosting
- Artifact Registry — Container image storage
- Cloud Build — CI/CD pipeline
Challenges we ran into
Vertex AI rate limits: The free tier for gemini-2.5-flash has a 2-requests-per-minute limit on certain quotas, causing frequent 429 errors. We solved this by adding a try/except wrapper around the agent call that falls back to direct health checks when the model is unavailable, ensuring users always receive a report.
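The fallback pattern can be sketched roughly as follows. All names here (`report_with_fallback`, the two stub functions, the `source` and `fallback_reason` keys) are hypothetical stand-ins for the project's own agent call and direct health checks.

```python
from typing import Callable


def report_with_fallback(
    url: str,
    run_agent: Callable[[str], dict],
    run_direct_checks: Callable[[str], dict],
) -> dict:
    """Try the model-backed agent first; fall back to direct checks on any failure."""
    try:
        report = run_agent(url)
        report["source"] = "agent"
    except Exception as exc:  # broad on purpose in this sketch; e.g. a 429 from the model
        report = run_direct_checks(url)
        report["source"] = "fallback"
        report["fallback_reason"] = str(exc)
    return report


# Stubs that simulate a rate-limited model and a working direct check.
def _rate_limited_agent(url: str) -> dict:
    raise RuntimeError("429 RESOURCE_EXHAUSTED")


def _direct_checks(url: str) -> dict:
    return {"url": url, "score": 80}


report = report_with_fallback("https://example.com", _rate_limited_agent, _direct_checks)
```

Recording why the fallback happened keeps the degraded report distinguishable from a full agent analysis when results are reviewed later.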
ADK data serialization: The ADK framework's tool call mechanism was passing Python repr strings (single quotes, True/False booleans) instead of JSON to the report generation function. We fixed this by changing parameter types from str to dict and adding a _safe_parse() fallback that tries json.loads first, then ast.literal_eval.
Dynatrace MCP compatibility: The MCP server failed on Cloud Run's Linux environment because the initial npx.cmd command is Windows-specific. We fixed it by switching to npx (without .cmd), which works on both platforms.
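One alternative way to avoid hard-coding a platform-specific binary name is to resolve it from PATH at startup. This is not the fix described above, just a sketch; `resolve_npx` is a hypothetical helper, and the `which` parameter exists only so the lookup can be swapped out in tests.

```python
import shutil
from typing import Callable, Optional


def resolve_npx(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the full path of the npx executable for the current platform.

    shutil.which honours PATHEXT on Windows, so it finds npx.cmd there
    and plain npx on Linux.
    """
    path = which("npx")
    if path is None:
        raise FileNotFoundError("npx not found on PATH; is Node.js installed?")
    return path
```

The resolved path could then head the MCP server command, for example `[resolve_npx(), "-y", "@dynatrace-oss/dynatrace-mcp-server"]`.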
OpenTelemetry configuration: The initial OTel setup used the platform token (lacking OTLP ingest scope) and had an incorrect endpoint URL format. We created a dedicated classic token with openTelemetryTrace.ingest scope and normalized the endpoint URL to use the live.dynatrace.com domain.
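The URL normalization might look something like the sketch below. The function name and the specific rewrite rule are assumptions: it supposes the incorrect value was either a platform-style `apps.dynatrace.com` host or a URL with a partial path, and maps both to the classic `live.dynatrace.com` host with the full OTLP traces path.

```python
from urllib.parse import urlsplit

OTLP_TRACES_PATH = "/api/v2/otlp/v1/traces"


def normalize_otlp_endpoint(raw: str) -> str:
    """Normalize a Dynatrace environment URL into an OTLP traces endpoint."""
    raw = raw.strip()
    if "://" not in raw:
        raw = "https://" + raw  # tolerate a bare hostname
    host = urlsplit(raw).hostname or ""
    if not host:
        raise ValueError(f"no hostname in endpoint: {raw!r}")
    # Assumption: platform-style hosts are rewritten to the classic domain.
    suffix = ".apps.dynatrace.com"
    if host.endswith(suffix):
        host = host[: -len(suffix)] + ".live.dynatrace.com"
    # Any existing path is discarded and replaced with the full traces path.
    return f"https://{host}{OTLP_TRACES_PATH}"
```

For example, `normalize_otlp_endpoint("https://abc12345.apps.dynatrace.com/")` yields `https://abc12345.live.dynatrace.com/api/v2/otlp/v1/traces`, where `abc12345` is a placeholder environment ID.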
Accomplishments we're proud of
- A fully functional multi-service architecture spanning 7 GCP services + Dynatrace
- Graceful fallback behavior when any component (Vertex AI, Dynatrace) is unavailable
- Real Dynatrace API integration returning live problem and entity data (not just "MCP available" placeholders)
- OpenTelemetry traces shipping to Dynatrace for agent-level observability
- Clean, responsive web UI with markdown-rendered health reports and four dedicated tabs
What we learned
- Building agents with Google ADK requires careful attention to tool function signatures — the framework serializes arguments in specific formats and mismatches cause silent failures
- Dynatrace platform tokens and classic API tokens serve different purposes and carry different scopes; you need at least three distinct tokens for full integration (MCP, API v2 data, and OTel ingest)
- Vertex AI free tier rate limits (2 RPM for gemini-2.5-flash) are significantly more restrictive than the equivalent Gemini API limits, making fallback logic essential for any production-like deployment
- Cloud Run's Linux environment differs from local Windows development in subtle ways (binary names, path resolution, npm modules with native dependencies)
Built With
- artifact-registry
- cloud-build
- cloud-run
- cloud-scheduler
- css
- dynatrace-api-v2
- dynatrace-mcp-server
- fastapi
- firestore
- gemini-2.5-flash
- google-adk
- html5
- javascript
- opentelemetry
- secret-manager