Inspiration
Everyone is "vibe coding" right now, spinning up complex apps in minutes. But when those apps hit production and scale, the AI isn't there to wake up at 3 AM to fix memory leaks or database bottlenecks. We realized we needed an AI that doesn't wait for a prompt. We needed a proactive agent that defends infrastructure automatically.
What it does
SRE24 is an autonomous Site Reliability Engineer. When Dynatrace detects a production anomaly, it fires a webhook to SRE24. Our agent then connects to Dynatrace to investigate the distributed trace, scales up Google Cloud Run infrastructure to keep the app alive (its "Ops Hand"), and generates an optimized GitHub Pull Request to permanently patch the root cause (its "Dev Hand").
How we built it
- Agent Core: Built using the Google Agent Development Kit (ADK) and powered by Gemini 2.5 Flash for rapid reasoning and code generation.
- Telemetry: Integrated the Dynatrace MCP (Model Context Protocol) server to allow the agent to execute raw DQL queries and fetch live traces.
- Backend: Built a multi-tenant FastAPI server that intercepts webhooks and delegates them to isolated background worker threads.
- Frontend: Designed a highly responsive, glassmorphism React + Vite dashboard to visualize the agent's real-time thought process.
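The webhook hand-off described in the Backend bullet can be sketched without any web framework. This is a minimal illustration, not the project's actual FastAPI code: `handle_webhook`, `investigate`, and the `title` payload field are hypothetical names, and a standard-library thread pool stands in for the worker threads.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Shared pool of background workers (the size is an arbitrary assumption).
_executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="sre-worker")


def investigate(tenant_id: str, alert: dict) -> str:
    """Placeholder for the agent run; a real worker would query traces here."""
    return f"{tenant_id}: handled {alert.get('title', 'unknown alert')}"


def handle_webhook(tenant_id: str, alert: dict) -> Future:
    """Queue the alert for a worker and return immediately.

    The payload is copied so later mutation by the caller cannot affect
    the running worker.
    """
    return _executor.submit(investigate, tenant_id, dict(alert))
```

In a web handler, the returned future would typically be ignored and the request acknowledged right away, so the sender of the webhook is never blocked by the investigation.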
Challenges we ran into
Handling concurrent, multi-tenant agent executions was incredibly complex. The Google ADK runner executes agents in isolated background threads to prevent event-loop blocking. Standard Python ContextVars (which we used to securely pass GitHub and Dynatrace tokens) do not natively propagate into new threads. We had to engineer a custom threading.Thread monkeypatch to securely inject context across thread boundaries without introducing race conditions.
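The general idea of carrying ContextVars into a new thread can be sketched with the standard library alone. This is an illustration of the technique, not our actual monkeypatch: it uses a `Thread` subclass rather than patching `threading.Thread`, and `ContextThread`, `github_token`, and `read_token_in_thread` are hypothetical names.

```python
import contextvars
import threading

# Hypothetical context variable standing in for a per-tenant credential.
github_token = contextvars.ContextVar("github_token", default="<unset>")


class ContextThread(threading.Thread):
    """Thread that runs its target inside a copy of the creator's context."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Snapshot the context of the thread that constructs this object.
        self._ctx = contextvars.copy_context()

    def run(self):
        # Execute the normal Thread.run() inside the captured snapshot.
        self._ctx.run(super().run)


def read_token_in_thread(thread_cls):
    """Start a thread of the given class and return the token it observes."""
    seen = []
    worker = thread_cls(target=lambda: seen.append(github_token.get()))
    worker.start()
    worker.join()
    return seen[0]
```

Because each thread receives its own copy of the context, a value set inside the worker does not leak back into the caller or into other tenants' threads.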
Accomplishments that we're proud of
We achieved true autonomy. SRE24 doesn't just summarize a log file; it dynamically queries a live production environment, analyzes the trace, locates the exact failing file in a GitHub repository, applies a fuzzy-matched code patch, and opens a PR. It does all of this in under 30 seconds, completely unprompted by a human.
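To illustrate what a fuzzy-matched patch can look like, here is a minimal sketch built on Python's `difflib`. It is an assumption about the general approach, not SRE24's actual patching logic: `apply_fuzzy_patch` and the 0.8 similarity threshold are hypothetical.

```python
import difflib


def apply_fuzzy_patch(source: str, target_snippet: str, replacement: str,
                      min_ratio: float = 0.8) -> str:
    """Replace the block of `source` most similar to `target_snippet`.

    Lines are compared with surrounding whitespace stripped, so small
    formatting differences in the model's snippet do not block the match.
    """
    src_lines = source.splitlines()
    tgt_lines = target_snippet.splitlines()
    window = len(tgt_lines)
    if window == 0 or window > len(src_lines):
        raise ValueError("target snippet is empty or longer than the source")

    wanted = "\n".join(line.strip() for line in tgt_lines)
    best_ratio, best_start = 0.0, -1
    # Slide a window the size of the snippet over the source lines.
    for start in range(len(src_lines) - window + 1):
        candidate = "\n".join(
            line.strip() for line in src_lines[start:start + window]
        )
        ratio = difflib.SequenceMatcher(None, candidate, wanted).ratio()
        if ratio > best_ratio:
            best_ratio, best_start = ratio, start

    if best_ratio < min_ratio:
        raise LookupError("no block is similar enough to the target snippet")

    patched = (src_lines[:best_start]
               + replacement.splitlines()
               + src_lines[best_start + window:])
    # Note: the result is joined without a trailing newline.
    return "\n".join(patched)
```

Refusing to patch when nothing clears the threshold matters here: a wrong match would silently rewrite unrelated code, so raising an error is the safer default.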
What we learned
We learned the power of combining the Model Context Protocol (MCP) with autonomous agentic loops. By giving Gemini direct access to execute Dynatrace DQL queries, it wasn't just guessing based on generic alerts; it was reasoning over exact, live production state.
What's next for SRE24
We plan to expand the capabilities of the "Dev Hand" and to integrate more deeply with Dynatrace's Davis AI.
Built With
- adk
- cloudrun
- dynatrace
- fastapi
- gemini
- mcp
- react
- sqlalchemy
- supabase
- vite