Intelligent Cloud Resource Optimizer

Inspiration

Cloud computing, while offering immense flexibility and scalability, often leads to spiraling costs due to over-provisioning, idle resources, and a lack of real-time visibility into spending. Many organizations struggle to keep their cloud bills in check while ensuring optimal application performance. We were inspired to create an intelligent, autonomous solution that could actively monitor, analyze, and recommend actions to optimize Google Cloud resources, taking the guesswork out of cost management and performance tuning. The Google ADK, with its powerful multi-agent capabilities, seemed like the perfect framework to build such a sophisticated system.

What it does

The Intelligent Cloud Resource Optimizer acts as a smart cloud financial operations (FinOps) assistant. It's designed to:

Monitor Cloud Usage (Monitor Agent): Continuously gathers vital metrics like CPU utilization, memory consumption, network traffic, and billing data for key Google Cloud services (e.g., Compute Engine VMs, Cloud Storage buckets).
Analyze for Inefficiencies (Analysis Agent): Processes the collected data to pinpoint underutilized VMs, identify idle resources (e.g., unattached disks, unused static IPs), and detect patterns indicating potential cost savings or performance bottlenecks. It can flag resources that are consistently below a defined usage threshold.
Generate Smart Recommendations (Recommendation Agent): Based on the analysis, it formulates concrete, actionable recommendations. Examples include:
  Right-sizing VMs: Suggesting smaller machine types for underutilized instances.
  Storage Tiering: Recommending transitioning data to more cost-effective storage classes (e.g., Nearline, Coldline, Archive) based on access patterns.
  Idle Resource Deletion: Identifying and recommending the deletion of unattached disks or unused static IPs.
  Scheduling Non-Production Resources: Suggesting schedules for shutting down development/test environments during off-hours.
Policy Compliance Check (Policy Agent): Before presenting recommendations, this agent ensures they align with predefined organizational policies (e.g., minimum VM size for production, specific compliance standards for data).
Interactive Interface (ADK Web UI): Presents the optimized recommendations in a clear, digestible format through the ADK's built-in web interface, allowing cloud administrators to review and approve actions.

How we built it

We built the Intelligent Cloud Resource Optimizer as a multi-agent system using the Google Agent Development Kit (ADK) and integrated it deeply with Google Cloud Platform (GCP) services.

Core Agent Design with ADK:

We defined distinct, specialized agents: MonitorAgent, AnalysisAgent, RecommendationAgent, and PolicyAgent. This modularity was crucial for managing complexity and leveraging ADK's multi-agent capabilities.
The MainOrchestratorAgent (an LlmAgent for dynamic routing) was built to intelligently delegate tasks between these specialized agents based on the overall goal of optimization.
We utilized ADK's Tool system extensively. Each agent was equipped with custom Python tools (functions) to interact with specific Google Cloud APIs.

Google Cloud Integration:
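
As a rough illustration of how a custom tool can wrap a Google Cloud call, a tool can be an ordinary typed Python function whose docstring tells the model what it does and what it returns. The sketch below is not the project's code: `get_cpu_utilization`, its return shape, and the canned data in `_fetch_points` are hypothetical, and the in-memory fetcher stands in for a real Cloud Monitoring query so the example runs without credentials.

```python
from statistics import mean


def _fetch_points(instance_id: str, hours: int) -> list[float]:
    """Hypothetical stand-in for a Cloud Monitoring query; returns canned CPU samples."""
    canned = {
        "vm-dev-1": [0.04, 0.06, 0.05],
        "vm-prod-1": [0.71, 0.64, 0.80],
    }
    return canned.get(instance_id, [])


def get_cpu_utilization(instance_id: str, hours: int = 24) -> dict:
    """Return average and peak CPU utilization for one Compute Engine VM.

    Args:
        instance_id: Name of the VM to inspect.
        hours: Size of the look-back window in hours.

    Returns:
        A dict with "status" ("success" or "error") and, on success,
        "avg_cpu" and "peak_cpu" as fractions between 0 and 1.
    """
    points = _fetch_points(instance_id, hours)
    if not points:
        # Report the failure as data so the calling agent can reason about it.
        return {"status": "error", "message": f"no data for {instance_id}"}
    return {
        "status": "success",
        "instance_id": instance_id,
        "avg_cpu": round(mean(points), 3),
        "peak_cpu": max(points),
    }
```

A function like this would typically be handed to an agent through its tool list; that wiring is left out here because it depends on the ADK version in use.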

Google Cloud Monitoring API: Our MonitorAgent used the Python client library for the Cloud Monitoring API to fetch metrics for Compute Engine instances (CPU, Memory), and Cloud Storage buckets (storage size, access times).
Google Cloud Billing API (Simulated/Partial): For hackathon purposes, we either used the Cloud Monitoring API for cost data (which can provide some cost-related metrics) or simulated fetching billing data, focusing on deriving cost implications from resource usage.
Vertex AI (Gemini Models): We leveraged the Gemini models available through Vertex AI as the underlying LLM for our LlmAgents. This enabled natural language understanding for user queries and powerful reasoning for generating intelligent recommendations and explanations.
Google Cloud SDK & ADC: We ensured our local development environment was authenticated using Application Default Credentials, allowing seamless interaction with GCP services.

Data Flow and Orchestration:
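
The hand-off described in this section can be pictured as a fixed pipeline. The sketch below is a deliberately simplified stand-in: in the project an LlmAgent routes between agents dynamically, whereas here plain functions are chained in a fixed order. Every name, the canned usage data, the 10% CPU threshold, the machine-type mapping, and the production policy rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VmUsage:
    name: str
    machine_type: str
    env: str        # "prod" or "dev"
    avg_cpu: float  # fraction between 0 and 1


@dataclass
class Recommendation:
    vm: str
    env: str
    target_type: str


# Hypothetical "one step down" mapping between machine types.
SMALLER = {"e2-standard-4": "e2-standard-2", "e2-standard-2": "e2-medium"}
# Hypothetical policy: machine types permitted for production VMs.
PROD_ALLOWED = {"e2-standard-4", "e2-standard-2"}


def monitor() -> list[VmUsage]:
    """Monitor step: canned data in place of real metric collection."""
    return [
        VmUsage("web-prod-1", "e2-standard-2", "prod", 0.06),
        VmUsage("build-dev-1", "e2-standard-4", "dev", 0.04),
        VmUsage("api-prod-2", "e2-standard-4", "prod", 0.55),
        VmUsage("db-prod-3", "e2-standard-4", "prod", 0.08),
    ]


def analyze(usages: list[VmUsage], threshold: float = 0.10) -> list[VmUsage]:
    """Analysis step: keep VMs whose average CPU is below the threshold."""
    return [u for u in usages if u.avg_cpu < threshold]


def recommend(flagged: list[VmUsage]) -> list[Recommendation]:
    """Recommendation step: suggest the next smaller type when one is known."""
    return [
        Recommendation(u.name, u.env, SMALLER[u.machine_type])
        for u in flagged
        if u.machine_type in SMALLER
    ]


def apply_policy(recs: list[Recommendation]) -> list[Recommendation]:
    """Policy step: drop suggestions that would undersize a production VM."""
    return [r for r in recs if r.env != "prod" or r.target_type in PROD_ALLOWED]


def run_pipeline() -> list[Recommendation]:
    """Chain the four steps in the order the section describes."""
    return apply_policy(recommend(analyze(monitor())))
```

With this canned data, `web-prod-1` is flagged as underutilized but its suggested `e2-medium` target is rejected by the policy step, which is the guardrail behavior the PolicyAgent is meant to provide.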

The MainOrchestratorAgent initiates the process, perhaps in response to a user prompt like "Analyze my cloud costs."
It delegates to the MonitorAgent to collect relevant data.
The raw data is then passed to the AnalysisAgent, which processes it and identifies potential optimizations.
The AnalysisAgent's findings are sent to the RecommendationAgent, which crafts specific, actionable suggestions.
Before output, the PolicyAgent acts as a guardrail, ensuring recommendations adhere to pre-configured organizational policies.
Finally, the approved recommendations are displayed in the ADK Web UI.

Local Development and Debugging:

We heavily utilized the ADK's built-in CLI and Web UI for local development, testing, and debugging. The visual trace of agent execution in the Web UI was invaluable for understanding agent thought processes and debugging tool calls.

Challenges we ran into

API Rate Limits and Quotas: When fetching extensive monitoring data, we occasionally hit Google Cloud API rate limits. We implemented basic retry mechanisms and focused on fetching targeted data for the demo.
Complex Data Aggregation and Analysis: While Google Cloud Monitoring provides metrics, combining and analyzing this data effectively to identify specific optimization opportunities (e.g., consistently underutilized VMs over a week) required careful aggregation logic within the AnalysisAgent.
Prompt Engineering for Actionable Recommendations: Crafting prompts for the RecommendationAgent to consistently generate precise, actionable, and contextually relevant suggestions (e.g., specifying the exact new machine type for a VM) was an iterative process.
Multi-Agent Coordination: Designing the MainOrchestratorAgent to seamlessly delegate tasks and pass information between different agents, ensuring each agent understood its input and expected output, required careful planning and debugging of the workflow.
Policy Definition and Enforcement: Translating abstract "organizational policies" into concrete, programmable rules for the PolicyAgent (e.g., minimum CPU thresholds, specific storage encryption requirements) was a challenge that required clear rule sets.
Time Constraints: The hackathon's tight deadline meant prioritizing core functionality and demonstrability over comprehensive feature sets or robust error handling for all edge cases.
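
The aggregation challenge noted in this section, finding VMs that are consistently underutilized over a week and not just momentarily idle, can be sketched as follows. This is a minimal illustration, not the AnalysisAgent's actual logic: the 10% threshold, the rule that every daily average must fall below it, the seven-day minimum, and the sample format are all assumptions.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import mean

# Hypothetical sample format: (vm name, timestamp, CPU fraction between 0 and 1).
Sample = tuple[str, datetime, float]


def daily_averages(samples: list[Sample]) -> dict[str, dict[date, float]]:
    """Group samples by VM and calendar day, then average each day."""
    buckets: dict[str, dict[date, list[float]]] = defaultdict(lambda: defaultdict(list))
    for vm, ts, cpu in samples:
        buckets[vm][ts.date()].append(cpu)
    return {
        vm: {day: mean(values) for day, values in days.items()}
        for vm, days in buckets.items()
    }


def consistently_underutilized(
    samples: list[Sample], threshold: float = 0.10, min_days: int = 7
) -> list[str]:
    """Return VMs with at least min_days of data whose every daily average is below threshold."""
    flagged = []
    for vm, days in daily_averages(samples).items():
        if len(days) >= min_days and all(avg < threshold for avg in days.values()):
            flagged.append(vm)
    return sorted(flagged)


# Small worked example with made-up data: two samples per day for a week.
start = datetime(2025, 6, 1)
samples: list[Sample] = []
for day in range(7):
    for hour in (6, 18):
        ts = start + timedelta(days=day, hours=hour)
        samples.append(("idle-vm", ts, 0.04))
        # One busy day is enough to keep this VM off the list.
        samples.append(("bursty-vm", ts, 0.90 if day == 3 else 0.05))
        if day < 3:
            # Too little history to judge.
            samples.append(("new-vm", ts, 0.02))

flagged = consistently_underutilized(samples)
print(flagged)  # ['idle-vm']
```

Requiring every day to stay below the threshold is a conservative choice: it avoids recommending a downsize for a VM that is quiet most of the week but busy on one day, at the cost of missing some savings.
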
Accomplishments that we're proud of

Successfully Implemented a Multi-Agent System: We are particularly proud of building a functional multi-agent system where different specialized agents collaborate to achieve a complex goal, cloud optimization, demonstrating the true power of Google ADK.
Real-world Problem Solving: Our solution directly addresses a significant pain point for many cloud users: managing and optimizing cloud spend. The ability to identify real cost savings is a tangible benefit.
Seamless Google Cloud Integration: We successfully integrated with critical Google Cloud APIs (Monitoring, Vertex AI), showcasing ADK's capability to connect with external services effectively.
Intuitive User Experience: Despite the underlying complexity, the ADK Web UI provides a clear and interactive way for users to understand the optimization recommendations, making the insights actionable.
Clear Modular Design: The distinct roles of each agent (Monitor, Analyze, Recommend, Policy) make the system scalable, maintainable, and easy to understand, aligning with best practices for agentic application development.

What we learned

Power of ADK for Multi-Agent Systems: We gained a deep appreciation for how ADK simplifies the development of complex multi-agent architectures, particularly its mechanisms for agent delegation (LlmAgent routing) and tool integration.
Importance of Clear Tool Definitions: Well-defined Tool descriptions and input schemas are critical for LLMs to effectively utilize them within an agent's reasoning process.
Iterative Prompt Engineering: Developing effective agent behavior requires continuous iteration on prompts and instructions to guide the LLM's reasoning and output.
Cloud API Capabilities: We deepened our understanding of various Google Cloud APIs and their potential for automation and data extraction, especially in the context of FinOps.
Debugging Agentic Flows: The ADK's built-in debugging tools (like the visual trace in the Web UI) are indispensable for understanding the agent's thought process and identifying where issues occur in a multi-step workflow.

What's next for Intelligent Cloud Resource Optimizer

Automated Execution: Implement an ExecutionAgent (with proper guardrails and approval workflows) to automatically apply approved recommendations (e.g., using Google Cloud Deployment Manager, Cloud Functions for resizing or deleting resources). This would transform recommendations into direct actions.
Predictive Optimization: Integrate more advanced Vertex AI capabilities for predictive analytics to anticipate future resource needs or cost spikes, enabling proactive optimization.
Broader Service Coverage: Extend support to optimize more Google Cloud services, such as Cloud SQL, BigQuery, GKE clusters, and networking resources.
Customizable Policies: Allow users to easily define and manage their own custom optimization and compliance policies through a user-friendly interface.
Integration with Cloud Cost Management Tools: Integrate with Google Cloud's native Cost Management tools and the Recommender API for even richer insights and a unified view.
Reporting and Dashboards: Develop comprehensive reporting and dashboard features (perhaps using Looker Studio or BigQuery) to visualize cost savings and performance improvements over time.
Alerting and Notifications: Implement proactive alerts for cost anomalies or non-compliant resource usage, integrating with services like Cloud Monitoring Alerts or Pub/Sub.

Built With

adk-web-ui
cli
cloud
cloud-monitoring-api
cloud-run
docker
git/github
google
google-agent-development-kit-(adk)-including-llmagent
google-cloud-aiplatform
google-cloud-billing-(conceptual/api-interaction)
google-cloud-client-libraries-for-python-(google-cloud-monitoring)
google-cloud-iam
google-cloud-monitoring
google-cloud-platform-(gcp)
tool-system
vertex-ai-(gemini-models)
vertex-ai-api
workflowagent

Updates

Hemanth Kumar S started this project — Jun 23, 2025 02:48 PM EDT
