Inspiration
We wanted to cut Mean Time To Recovery (MTTR) from minutes or hours to under 60 seconds by automating incident detection, decision-making, and remediation.
What it does
AgentOps is an intelligent control plane that uses Gemini AI to monitor Google Cloud Run services, detect anomalies, and automatically execute remediation actions.
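The monitor, detect, and remediate flow can be pictured as a small decision loop. The sketch below is only an illustration under stated assumptions: the `ServiceMetrics` fields, the thresholds, and the function names are all hypothetical, and the rule-based `choose_action` stands in for the step where AgentOps would ask Gemini for a recommendation. It makes no calls to Cloud Monitoring or Cloud Run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceMetrics:
    """Hypothetical snapshot of one Cloud Run service's health."""
    error_rate: float        # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float    # 95th percentile request latency
    recently_deployed: bool  # True if a new revision went out recently


# Illustrative thresholds, not values taken from the project.
MAX_ERROR_RATE = 0.05
MAX_P95_LATENCY_MS = 1000.0


def detect_anomaly(m: ServiceMetrics) -> Optional[str]:
    """Return a short anomaly label, or None if the service looks healthy."""
    if m.error_rate > MAX_ERROR_RATE:
        return "high_error_rate"
    if m.p95_latency_ms > MAX_P95_LATENCY_MS:
        return "high_latency"
    return None


def choose_action(anomaly: str, m: ServiceMetrics) -> str:
    """Rule-based stand-in (an assumption) for the model's recommendation."""
    if anomaly == "high_error_rate" and m.recently_deployed:
        return "rollback"
    if anomaly == "high_latency":
        return "scale_up"
    return "rebuild"


def control_step(m: ServiceMetrics) -> str:
    """One pass of the loop: detect an anomaly, then pick an action."""
    anomaly = detect_anomaly(m)
    if anomaly is None:
        return "no_action"
    return choose_action(anomaly, m)


if __name__ == "__main__":
    # A service with a 20% error rate right after a deploy.
    print(control_step(ServiceMetrics(0.20, 300.0, True)))
```

Keeping detection separate from the decision step means the rule-based function could be swapped for a model call without touching the rest of the loop.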
How we built it
- Real-time Monitoring: Tracks Cloud Run service health using Cloud Monitoring
- AI-Powered Analysis: Gemini 1.5 Flash analyzes anomalies and recommends actions
- Automated Remediation: Executes rollbacks, scaling, and rebuilds automatically
- Live Dashboard: Real-time visualization of service health and incidents
- Fault Injection: Built-in fault injection for testing and demos
Challenges we ran into
- Deployment and build issues when moving from local development to Google Cloud. Simple gcloud commands helped a lot in triaging and resolving them.
Accomplishments that we're proud of
- A practical test-case implementation. This is a stepping stone toward a bigger project that can leverage much more powerful systems and provide more flexible platform stability.
What we learned
- How easy it is to code and develop these days using AI.
- It is much simpler to deploy to Cloud Run using gcloud.
What's next for AgentOps
- Enhance this project to make it more autonomous with multiple agents, using Google ADK and the Google MCP toolkit.