The Kernel: Autonomous FinOps SRE Agent

Inspiration

Cloud waste is the silent killer of enterprise profitability. As an SRE, I watched teams spend hours manually identifying idle Kubernetes nodes and unattached persistent disks. Inspired by the need for "Action-First" AI, I built The Kernel—an autonomous agent designed to move from detection to remediation without human latency.

What it does

The Kernel serves as a self-healing layer for GCP infrastructure. It monitors cloud telemetry, uses Gemini 1.5 Pro to reason about cost-optimization opportunities, and autonomously generates Terraform infrastructure remediation. It then opens a GitLab Merge Request, effectively making SRE workflows self-correcting.

How we built it

We utilized a serverless architecture on GCP Cloud Run to handle incoming Cloud Monitoring webhooks.

  • Reasoning: Gemini 1.5 Pro analyzes the resource metadata to determine the optimal downsizing configuration.
  • IaC Generation: The agent dynamically writes Terraform HCL, ensuring compliance with enterprise infrastructure standards.
  • Orchestration: Integrated directly with the GitLab REST API to automate the deployment lifecycle.

Challenges faced

The primary challenge was ensuring idempotency—preventing the AI from generating conflicting infrastructure changes. We solved this by implementing a state-verification layer that checks for existing pending Merge Requests before triggering new remediation.

What we learned

We validated that Agentic AI is most powerful when it acts as an extension of the existing CI/CD pipeline rather than a separate chat interface.

Built With

Share this project:

Updates