Inspiration

During our conversations with cloud managers and DevOps teams, we uncovered a critical challenge that resonates across industries: a significant portion of DevOps engineers' time (up to 30-40%) is spent manually managing and silencing noisy CloudWatch alarms, many of which are false positives or low priority. This repetitive work not only wastes valuable time but also incurs unnecessary costs, creating a cycle of inefficiency and security risk. We recognized this widespread pain point as a fundamental bottleneck in cloud operations and saw an opportunity to use AI to automate it, making cloud management more intelligent and self-sufficient. We were driven by the realization that a self-healing, AI-powered system could significantly reduce operational overhead, improve accuracy, and increase cost savings. This motivated us to build a comprehensive autonomous agent system that manages alarms intelligently, performs remediation in real time, and strengthens security and compliance, all on the cloud platform that companies already rely on every day.

What it does

Our project, CloudWise, makes cloud management smarter and more efficient. It continuously monitors AWS environments through a combination of cost tracking, compliance checks, and operational alerts. It detects idle EC2 instances and automatically shuts them down to save costs, and it monitors S3 buckets for unauthorized public access, enforcing security policies by automatically blocking exposure. A Streamlit dashboard visualizes real-time savings, compliance status, and system health, offering both high-level insights and detailed logs.

What sets CloudWise apart is its AI-driven decision-making, powered by Amazon Bedrock's Titan model. The AI agent evaluates incoming signals, reasons about appropriate actions, and provides recommendations, making the system proactive rather than purely reactive. Everything runs on a modular, serverless architecture: Lambda functions handle the automation logic, DynamoDB logs every operation, and Powerpipe benchmarks track ongoing compliance. Users can toggle automation features from the dashboard, staying in control while the system handles routine operations.
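To make the idle-instance check concrete, here is a minimal boto3 sketch of how such a check could work. It is not CloudWise's actual code: the 5% CPU threshold, the 24-hour lookback, the function names, and the `automation_enabled` flag (standing in for the dashboard toggle) are illustrative assumptions. The AWS clients are passed in as arguments, so the decision logic can be exercised without credentials.

```python
from datetime import datetime, timedelta, timezone

CPU_IDLE_THRESHOLD = 5.0  # hypothetical: hourly average CPU % below which an instance counts as idle
LOOKBACK_HOURS = 24       # hypothetical: how far back to look at CPU metrics


def is_idle(hourly_averages, threshold=CPU_IDLE_THRESHOLD):
    """Return True only if there is data and every hourly CPU average is below the threshold.

    An instance with no datapoints is NOT treated as idle, so missing
    metrics never trigger a shutdown.
    """
    return bool(hourly_averages) and max(hourly_averages) < threshold


def find_idle_instances(ec2, cloudwatch, now=None):
    """Return the IDs of running instances whose CPU stayed below the threshold."""
    now = now or datetime.now(timezone.utc)
    idle = []
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                instance_id = instance["InstanceId"]
                # One datapoint per hour over the lookback window.
                stats = cloudwatch.get_metric_statistics(
                    Namespace="AWS/EC2",
                    MetricName="CPUUtilization",
                    Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
                    StartTime=now - timedelta(hours=LOOKBACK_HOURS),
                    EndTime=now,
                    Period=3600,
                    Statistics=["Average"],
                )
                averages = [dp["Average"] for dp in stats["Datapoints"]]
                if is_idle(averages):
                    idle.append(instance_id)
    return idle


def stop_idle_instances(ec2, cloudwatch, automation_enabled=True):
    """Stop idle instances when automation is enabled; otherwise only report them."""
    idle = find_idle_instances(ec2, cloudwatch)
    if idle and automation_enabled:
        ec2.stop_instances(InstanceIds=idle)
    return idle
```

With boto3 installed and credentials configured, a report-only run might look like `stop_idle_instances(boto3.client("ec2"), boto3.client("cloudwatch"), automation_enabled=False)`. Stopping (rather than terminating) keeps the instance and its EBS volumes, so the action is reversible.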

How we built it

Our journey began with understanding critical pain points through industry conversations. We then designed a modular architecture on a suite of AWS services: Lambda for serverless execution, Bedrock's Titan model for reasoning, DynamoDB for logging and state management, and CloudWatch alarms to trigger automated actions. We integrated Powerpipe's AWS compliance mods for real-time benchmarking, feeding the results into the dashboard for compliance visibility. The UI was built with Streamlit, providing an intuitive interface for visualizing data and controlling automation settings. Building this system was challenging, especially integrating the various AWS services seamlessly and making sure our AI agents made trustworthy decisions. We iteratively refined prompts for the Titan model, optimized Lambda workflows for responsiveness, and implemented tightly scoped IAM roles to secure the automation pipeline. Our deployment scripts automate the packaging and provisioning of the Lambda functions, making the system easy to scale, extend, and update.
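The alarm-to-decision flow described above could be wired together roughly as follows. This is a hypothetical sketch rather than the project's code: it assumes alarms reach Lambda through an EventBridge "CloudWatch Alarm State Change" rule, uses the `amazon.titan-text-express-v1` model ID, and writes to a DynamoDB table named `CloudWiseActions` with a string `id` partition key. The table name, key schema, prompt wording, and action labels are all assumptions.

```python
import json
import os
import time
import uuid

# Hypothetical names, overridable through Lambda environment variables.
TABLE_NAME = os.environ.get("CLOUDWISE_TABLE", "CloudWiseActions")
MODEL_ID = os.environ.get("TITAN_MODEL_ID", "amazon.titan-text-express-v1")


def alarm_from_event(event):
    """Extract the fields we need from an EventBridge 'CloudWatch Alarm State Change' event."""
    detail = event["detail"]
    return {
        "alarm_name": detail["alarmName"],
        "state": detail["state"]["value"],
        "reason": detail["state"].get("reason", ""),
    }


def build_prompt(alarm):
    """Assemble a constrained prompt asking the model for one recommended action."""
    return (
        "You are a cloud operations assistant.\n"
        f"Alarm: {alarm['alarm_name']}\n"
        f"State: {alarm['state']}\n"
        f"Reason: {alarm['reason']}\n"
        "Reply with one of STOP_INSTANCE, BLOCK_PUBLIC_ACCESS, SILENCE, ESCALATE, "
        "followed by a one-sentence justification."
    )


def ask_titan(bedrock_runtime, prompt):
    """Invoke a Titan Text model through the Bedrock runtime and return its text output."""
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "inputText": prompt,
            # Temperature 0 keeps answers as repeatable as possible.
            "textGenerationConfig": {"maxTokenCount": 200, "temperature": 0.0, "topP": 1.0},
        }),
    )
    payload = json.loads(response["body"].read())
    return payload["results"][0]["outputText"].strip()


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be tested without the AWS SDK

    alarm = alarm_from_event(event)
    recommendation = ask_titan(boto3.client("bedrock-runtime"), build_prompt(alarm))

    # Log every decision so the dashboard can show what the agent saw and suggested.
    boto3.resource("dynamodb").Table(TABLE_NAME).put_item(Item={
        "id": str(uuid.uuid4()),
        "timestamp": int(time.time()),
        "alarm_name": alarm["alarm_name"],
        "state": alarm["state"],
        "recommendation": recommendation,
    })
    return {"alarm": alarm["alarm_name"], "recommendation": recommendation}
```

In this sketch the model's answer is only logged as a recommendation; acting on it automatically would call for an extra validation step between `ask_titan` and any remediation.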

Challenges we ran into

One of the major hurdles was managing permissions: AWS IAM policies needed fine-tuning so that Lambda could perform its actions safely without being over-permissioned. Synchronizing real-time alarm triggers with AI decision-making required careful event orchestration. Prompting Amazon Bedrock's Titan model to reason reliably about cloud state and actions was also complex, taking multiple iterations to achieve clear, precise output. Finally, displaying compliance and cost data securely and interactively in the dashboard raised integration challenges, especially when rendering large datasets efficiently.
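One way to keep the Lambda role from being over-permissioned is to grant only the specific API calls the automation makes. The sketch below builds such a policy document as a Python dict. The actions mirror the operations described on this page, but the tag key `cloudwise-managed`, the table name, and the model ID are hypothetical, and the real CloudWise roles may differ.

```python
def build_cloudwise_policy(region, account_id, table_name,
                           model_id="amazon.titan-text-express-v1"):
    """Return a least-privilege IAM policy document for a remediation Lambda.

    The tag key, table name, and model ID are illustrative assumptions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read-only discovery calls; these actions do not support
                # resource-level restrictions, so Resource must be "*".
                "Sid": "ReadOnlyDiscovery",
                "Effect": "Allow",
                "Action": ["ec2:DescribeInstances", "cloudwatch:GetMetricStatistics"],
                "Resource": "*",
            },
            {
                # Only instances explicitly tagged for automation may be stopped.
                "Sid": "StopTaggedInstances",
                "Effect": "Allow",
                "Action": "ec2:StopInstances",
                "Resource": f"arn:aws:ec2:{region}:{account_id}:instance/*",
                "Condition": {"StringEquals": {"aws:ResourceTag/cloudwise-managed": "true"}},
            },
            {
                "Sid": "BlockPublicBucketAccess",
                "Effect": "Allow",
                "Action": "s3:PutBucketPublicAccessBlock",
                "Resource": "arn:aws:s3:::*",
            },
            {
                # Invoke exactly one foundation model, nothing else in Bedrock.
                "Sid": "InvokeTitan",
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                "Resource": f"arn:aws:bedrock:{region}::foundation-model/{model_id}",
            },
            {
                # Append-only access to the action log table.
                "Sid": "WriteActionLog",
                "Effect": "Allow",
                "Action": "dynamodb:PutItem",
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}",
            },
        ],
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_cloudwise_policy("us-east-1", "123456789012", "CloudWiseActions"), indent=2))
```

The `aws:ResourceTag` condition on `ec2:StopInstances` means the function can only stop instances that were explicitly opted in, even if its decision logic misfires.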

Accomplishments that we're proud of

We're particularly proud of building a system that behaves autonomously: it doesn't just raise alerts, it actively remediates issues without manual input. Integrating a foundation model from Amazon Bedrock for reasoning, and dynamically automating both cost and security policies, was a significant milestone. Our real-time, filterable dashboard combines logs, compliance benchmarks, and AI reasoning outputs, giving a transparent and actionable view into the system's operations. This project shows the potential of self-healing cloud environments driven by AI.

What we learned

Throughout this project, we learned how powerful the combination of AI reasoning and cloud automation can be. Carefully refining prompts for large language models such as Bedrock's Titan is crucial for producing reliable decisions under operational constraints. We also came to appreciate the importance of secure, least-privilege IAM roles and the value of a modular, extensible architecture for future growth. Integrating diverse AWS services highlighted the importance of event-driven architecture and robust monitoring for responsiveness and reliability.
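A common way to keep model-driven decisions reliable under operational constraints is to never act on free-form text directly: map the output onto a fixed allow-list and fall back to a human whenever it does not fit. The sketch below assumes the model is prompted to begin its reply with a single action keyword; the keywords, `ESCALATE` as the fallback, and the `automation_enabled` flag are hypothetical.

```python
# Hypothetical action vocabulary the model is asked to choose from.
ALLOWED_ACTIONS = {"STOP_INSTANCE", "BLOCK_PUBLIC_ACCESS", "SILENCE", "ESCALATE"}
REMEDIATING_ACTIONS = {"STOP_INSTANCE", "BLOCK_PUBLIC_ACCESS"}
FALLBACK_ACTION = "ESCALATE"


def parse_decision(model_output):
    """Map free-form model text to (action, justification).

    Only the first word is treated as the action; anything outside the
    allow-list falls back to ESCALATE so malformed output never triggers
    remediation.
    """
    words = model_output.strip().split(maxsplit=1)
    if not words:
        return FALLBACK_ACTION, ""
    action = words[0].strip(".:,").upper()
    justification = words[1].strip() if len(words) > 1 else ""
    if action not in ALLOWED_ACTIONS:
        return FALLBACK_ACTION, model_output.strip()
    return action, justification


def decide(model_output, automation_enabled):
    """Apply operational constraints on top of the parsed model decision."""
    action, justification = parse_decision(model_output)
    # Remediating actions need the automation toggle; otherwise a human reviews them.
    if action in REMEDIATING_ACTIONS and not automation_enabled:
        return FALLBACK_ACTION, justification
    return action, justification
```

For example, `decide("STOP_INSTANCE: CPU idle for 24h", automation_enabled=False)` escalates instead of stopping anything, while an answer such as `"delete everything"` is rejected outright because `DELETE` is not on the allow-list.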

What's next for CloudWise: AI Agents for AWS Cost & Compliance Optimization

Looking ahead, we aim to extend CloudWise to additional AWS resources such as RDS and EKS, and to integrate security tools such as AWS Security Hub for more comprehensive compliance monitoring. Enhancing the AI reasoning to include multimodal inputs and explanations will make the system more transparent and trustworthy. We also plan to build alerting workflows that notify teams via Slack or email, allowing human oversight when needed. Ultimately, our goal is to evolve CloudWise into a fully autonomous, self-healing cloud platform that scales seamlessly across enterprise environments, continually learning and optimizing itself.

Built With

  • Cloud platform: AWS (amazon-web-services)
  • Compute: AWS Lambda
  • AI model: Amazon Bedrock (Titan model)
  • Database: Amazon DynamoDB
  • Monitoring & compliance: AWS CloudWatch
  • Amazon EC2
  • AWS Config
  • Powerpipe (steampipe-mod-aws-compliance, steampipe-mod-aws-thrifty)
  • APIs & SDKs: boto3 (AWS SDK for Python)
  • Frontend framework: Streamlit
  • Python
  • Bash
  • CLI
  • CSV
  • Dashboard
  • Visualization
  • Deployment
  • Packaging