Inspiration
Modern Site Reliability Engineering (SRE) teams suffer from severe alert fatigue. When a microservice degrades or a queue backs up, a human engineer needs 15+ minutes to log in, read Datadog charts, isolate the root cause, and run mitigation scripts. Standard generative AI bots work purely at the semantic level and often "hallucinate" dangerous infrastructure changes, which makes them unsafe for automated scaling. We were inspired to build a system that merges mathematical precision with GenAI explainability.
What it does
The Autonomous SRE Agent reduces Mean Time to Mitigate (MTTM) from 15 minutes to 200 milliseconds. When an incident is triggered, the system uses UiPath Maestro BPMN to orchestrate a hybrid multi-agent pipeline:
- RCA Agent (DeepSeek-V3 LLM): Asynchronously translates topological bottlenecks into human-readable root cause analyses.
- Predict & Decide Agents (Math): Running in parallel with the RCA Agent, these agents ingest raw telemetry and use Extended Kalman Filters to deterministically compute the number of replicas to scale to. If the mathematical confidence score is $>80\%$, the system executes zero-touch auto-mitigation. Otherwise, Maestro's exclusive gateway routes the decision to a human engineer via the UiPath Action Center for final approval.
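The confidence gate described above can be sketched in a few lines of Go. This is a minimal illustration, not the project's actual code: the `Decision` struct, the `Route` values, and the function name `RouteDecision` are assumptions made for the example, and in the real system the branching is done by Maestro's exclusive gateway rather than by the backend.

```go
package main

import "fmt"

// Route is a hypothetical label for the two branches of the gateway.
type Route string

const (
	AutoMitigate Route = "auto-mitigate"
	HumanReview  Route = "human-review"
)

// confidenceThreshold mirrors the 80% cut-off from the write-up.
const confidenceThreshold = 0.80

// Decision is an assumed shape for what the Decide agent returns.
type Decision struct {
	Replicas   int     // target replica count
	Confidence float64 // expected to lie in [0, 1]
}

// RouteDecision auto-mitigates only when confidence is strictly above the
// threshold. Everything else, including NaN, falls through to a human.
func RouteDecision(d Decision) Route {
	if d.Confidence > confidenceThreshold {
		return AutoMitigate
	}
	return HumanReview
}

func main() {
	samples := []Decision{
		{Replicas: 6, Confidence: 0.93},
		{Replicas: 6, Confidence: 0.80},
	}
	for _, d := range samples {
		fmt.Printf("replicas=%d confidence=%.2f -> %s\n", d.Replicas, d.Confidence, RouteDecision(d))
	}
}
```

Making human review the default branch means a malformed or missing score can never trigger an automatic scaling action.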
How we built it
We built the core orchestration using UiPath Maestro BPMN 2.0. Because we needed hard control-theory mathematics, we decoupled our business logic into a high-performance Go API backend hosted on Render. The Go backend implements a state-space model for the Kalman filter:
$$ x_k = A x_{k-1} + B u_k + w_k $$
Here $x_k$ is the state at step $k$, $A$ is the state-transition matrix, $B$ maps the control input $u_k$ into the state, and $w_k$ is process noise. The state tracks physical quantities such as arrival rate and queue depth. We then used the UiPath API Workflow (HTTP Request) activity to bridge the Orchestrator with our Go backend. Finally, we built a premium HTML/CSS glassmorphism dashboard to visualize the real-time AI decision-making process.
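To make the state-space equation concrete, here is a minimal sketch of one predict/update cycle in Go. It is deliberately simplified: it is a scalar, linear Kalman filter that tracks a single quantity (think queue depth), not the Extended Kalman Filter the project describes, and the type name `Kalman1D`, the noise values, and the measurements are all made up for illustration.

```go
package main

import "fmt"

// Kalman1D is a scalar linear Kalman filter for x_k = A*x_{k-1} + B*u_k + w_k
// with a direct measurement z_k = x_k + v_k.
type Kalman1D struct {
	A, B float64 // state-transition and control coefficients
	Q, R float64 // process and measurement noise variances
	X, P float64 // current state estimate and its variance
}

// Predict propagates the estimate one step using control input u.
func (k *Kalman1D) Predict(u float64) {
	k.X = k.A*k.X + k.B*u
	k.P = k.A*k.P*k.A + k.Q
}

// Update blends the prediction with a new measurement z.
func (k *Kalman1D) Update(z float64) {
	gain := k.P / (k.P + k.R) // Kalman gain: how much to trust the measurement
	k.X += gain * (z - k.X)
	k.P *= 1 - gain
}

func main() {
	// Hypothetical setup: queue depth grows by the net inflow u each tick.
	kf := &Kalman1D{A: 1, B: 1, Q: 1, R: 4, X: 0, P: 100}
	measurements := []float64{12, 19, 33, 41, 48} // made-up noisy readings
	for i, z := range measurements {
		kf.Predict(10) // assumed net inflow of 10 items per tick
		kf.Update(z)
		fmt.Printf("tick %d: measured=%.0f estimate=%.2f variance=%.2f\n", i+1, z, kf.X, kf.P)
	}
}
```

The variance `P` shrinks as measurements arrive, which is one plausible ingredient for a confidence score; how the project actually derives its score is not stated here.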
Challenges we ran into
The biggest challenge was hitting the free-tier orchestration limit (one published process) when trying to link our HTTP Request directly to the Service Task nodes in UiPath Maestro. We worked around it by using Sub-process call activities and manually triggering the API workflow to prove the end-to-end integration during testing. This sidestepped the paywall while keeping a valid enterprise architecture.
Accomplishments that we're proud of
We are incredibly proud of achieving true parallel execution within our BPMN diagram. By splitting the slow LLM generation from the high-speed Mathematical Decision agents, we ensured that critical auto-scaling actions are never bottlenecked by GenAI latency.
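The same idea, a fast decision that never waits on a slow explanation, can be illustrated in plain Go. This is only an analogy for the parallel branches in the BPMN diagram, not the Maestro workflow itself; `RunIncident` and the callbacks passed to it are hypothetical names invented for this sketch.

```go
package main

import "fmt"

// RunIncident starts the slow explanation step in the background and returns
// the fast decision immediately, plus a channel that later yields the RCA text.
func RunIncident(decide func() int, explain func() string) (int, <-chan string) {
	rca := make(chan string, 1) // buffered so the goroutine can finish even if nobody reads
	go func() { rca <- explain() }()
	return decide(), rca
}

func main() {
	release := make(chan struct{}) // stands in for LLM latency

	replicas, rca := RunIncident(
		func() int { return 6 }, // placeholder for the math-based decision
		func() string {
			<-release // the explanation is still "generating"
			return "queue backlog on a hypothetical orders service"
		},
	)

	// The scaling decision is available while the explanation is still pending.
	fmt.Println("scale to", replicas, "replicas")

	close(release)
	fmt.Println("RCA:", <-rca)
}
```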
What we learned
We learned that UiPath Maestro is an incredibly powerful tool for orchestrating multi-agent systems. We also learned how to bridge visual drag-and-drop RPA tools with backend code written in languages like Go using REST APIs. Additionally, we used the Antigravity Coding Agent as a pair-programmer to rapidly prototype the Go backend and dashboard UI.
What's next for Autonomous SRE Agent
Our next step is to integrate UiPath Autopilot directly into the Maestro workflow, allowing SREs to naturally chat with the agent via Slack or MS Teams to query historical incident data and past root cause analyses!