Agentkube: AI Agent for Kubernetes Cluster 🚀

We are live on Product Hunt

  • Click to Watch the Promo Video

Agentkube is an AI-powered Kubernetes management platform designed to simplify Kubernetes adoption. By automating cluster management, investigating issues, and providing intelligent solutions through an intuitive interface, Agentkube bridges the gap between developers and complex cluster operations. It reduces incident response time while maintaining enterprise-grade reliability, catering to developers scaling applications and DevOps engineers streamlining operations.

Inspiration 🌟

In today's cloud-native landscape, managing Kubernetes clusters has become increasingly complex, demanding specialized knowledge and constant attention. Many organizations struggle with:

  • Complex cluster configurations requiring deep expertise
  • Frequent operational issues leading to system downtime
  • Time-consuming troubleshooting processes
  • Steep learning curves for new team members
  • Limited visibility into cluster health and performance

While more and more organizations are adopting Kubernetes, 100% of respondents to D2iQ’s survey said they had experienced challenges along the way. The top three were:

  • Lack of IT resources (36%)
  • Effective scaling (34%)
  • Keeping up with the rapid advancement of underlying technologies (33%)

When moving Kubernetes workloads to production environments, the three major challenges become:

  • Security
  • Production environment reliability
  • Troubleshooting difficulties

A report by Red Hat further emphasizes security as a top challenge, with 94% of survey respondents having experienced a security incident in their Kubernetes and container environments. Specific security concerns include:

  • Exposures due to misconfigurations in Kubernetes environments (47%)
  • Vulnerabilities (31%)
  • Attacks (13%)

Agentkube was born from a vision to democratize Kubernetes management. We believe that the power of Kubernetes should be accessible to all teams, regardless of their expertise level. Our platform bridges the gap between complex cluster operations and development teams, making Kubernetes management more intuitive, efficient, and reliable.

What it Does 💡

Problem Statement

Recent surveys reveal that over 78% of organizations adopting and managing Kubernetes face significant challenges in effectively handling their clusters. The primary issues include:

  • Average incident resolution time exceeding 4 hours
  • 40% of teams lacking proper monitoring solutions
  • Struggles with configuration management, reported by 65% of organizations.
  • A significant skills gap in Kubernetes expertise, noted by 83% of respondents.
  • Once Kubernetes is adopted, cluster management becomes overwhelming, diverting engineers’ focus from innovation to infrastructure management.
  • Deploying Kubernetes workloads in production is fraught with challenges such as security vulnerabilities, troubleshooting complexities, reliability concerns, and escalating costs.
  • Initial Kubernetes setups often lack proper cost optimization, leading to inefficient operations and inflated expenses.
  • Poor practices arise from ineffective implementation of cost monitoring, autoscaling, and cluster sharing mechanisms.

Agentkube's Solution 💪

Our platform introduces an AI-powered approach to Kubernetes management, fundamentally transforming how teams interact with their clusters. By combining advanced AI capabilities with robust automation, we're making Kubernetes operations more accessible and efficient.

Key Features 🌟

1. Smart Investigation

The Smart Investigation follows a multi-agent approach, where each investigation agent focuses on a specific aspect of cluster analysis. Agents collaborate by referencing outputs from previous steps, applying a Chain of Thought process to make informed decisions.
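The chaining idea can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not Agentkube's actual implementation: the `Finding`, `InvestigationContext`, and `Agent` types and the three stub agents are hypothetical, and each stub stands in for what would presumably be an LLM-backed step.

```typescript
// Hypothetical types for illustration; not Agentkube's real API.
interface Finding {
  agent: string;
  summary: string;
}

interface InvestigationContext {
  issue: string;
  findings: Finding[]; // outputs of earlier agents, in execution order
}

type Agent = (ctx: InvestigationContext) => Promise<Finding>;

// Each stub focuses on one aspect of the cluster.
const podAgent: Agent = async (ctx) => ({
  agent: "pods",
  summary: `Inspected pods related to "${ctx.issue}": 1 pod in CrashLoopBackOff`,
});

const eventAgent: Agent = async (ctx) => {
  // References the previous step's output instead of starting from scratch.
  const previous = ctx.findings[ctx.findings.length - 1];
  return {
    agent: "events",
    summary: `Correlated cluster events with: ${previous?.summary ?? "no prior finding"}`,
  };
};

const rootCauseAgent: Agent = async (ctx) => ({
  agent: "root-cause",
  summary: `Synthesized ${ctx.findings.length} findings into a root-cause hypothesis`,
});

// Runs agents in sequence, appending each finding to the shared context.
async function runInvestigation(issue: string, agents: Agent[]): Promise<Finding[]> {
  const ctx: InvestigationContext = { issue, findings: [] };
  for (const agent of agents) {
    ctx.findings.push(await agent(ctx));
  }
  return ctx.findings;
}
```

Calling `runInvestigation("api 5xx errors", [podAgent, eventAgent, rootCauseAgent])` yields three findings, with each later agent able to read what earlier agents concluded.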

Our advanced investigation system automates the analysis of cluster issues:

Automated Analysis

  • Real-time cluster state monitoring
  • Proactive issue detection
  • Systematic problem diagnosis
  • Root cause analysis

AI-Powered Insights

  • Predictive analytics for potential problems
  • Contextual recommendations

The system provides actionable insights for effective problem resolution and future prevention. This analysis helps teams maintain peak cluster performance while reducing operational complexity.

2. Response Protocols

The Response Protocol in Agentkube is an automated workflow system designed to handle Kubernetes incidents by executing predefined actions, diagnosing issues, and ensuring quick resolutions to maintain cluster stability.

Custom-defined sequences for handling various cluster scenarios:

Protocol Management

  • YAML-based configuration
  • Version control integration
  • Template library
  • Custom protocol creation

Automation Features

  • Event-triggered execution
  • Conditional branching
  • Parallel execution support
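To make the automation features concrete, here is a rough TypeScript sketch of how a parsed protocol could be modelled and executed. The schema is an assumption made for illustration (the real Agentkube YAML format may look quite different): `Protocol`, `Step`, `ClusterEvent`, and the `oom-recovery` example are all hypothetical.

```typescript
// Hypothetical shape of a protocol after parsing its YAML definition.
interface ClusterEvent {
  reason: string; // e.g. "OOMKilled"
  namespace: string;
  restartCount: number;
}

interface Step {
  name: string;
  when?: (event: ClusterEvent) => boolean; // conditional branching
  run: (event: ClusterEvent) => Promise<string>;
}

interface Protocol {
  name: string;
  trigger: string; // event reason that starts the protocol
  stages: Step[][]; // steps inside one stage run in parallel
}

const oomProtocol: Protocol = {
  name: "oom-recovery",
  trigger: "OOMKilled",
  stages: [
    [
      { name: "collect-logs", run: async (e) => `logs collected from ${e.namespace}` },
      { name: "collect-metrics", run: async (e) => `memory metrics collected from ${e.namespace}` },
    ],
    [
      {
        name: "escalate",
        when: (e) => e.restartCount > 3,
        run: async () => "escalated to on-call",
      },
    ],
  ],
};

async function execute(protocol: Protocol, event: ClusterEvent): Promise<string[]> {
  // Event-triggered execution: ignore events the protocol does not handle.
  if (event.reason !== protocol.trigger) return [];
  const results: string[] = [];
  for (const stage of protocol.stages) {
    // Skip steps whose condition is not met, then run the rest concurrently.
    const runnable = stage.filter((step) => step.when === undefined || step.when(event));
    results.push(...(await Promise.all(runnable.map((step) => step.run(event)))));
  }
  return results;
}
```

Stages run one after another while the steps inside a stage run together, which is one simple way to combine ordering guarantees with parallel execution.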

3. AI-Powered Editor

The AI Editor in Agentkube provides intelligent suggestions for optimizing Kubernetes configurations, automating YAML creation, and simplifying complex edits, ensuring error-free and efficient cluster management.

👉 Watch AI Editor Demo

Intelligent assistance for Kubernetes manifest management:

  • Syntax validation (Coming Soon)
  • Best practice suggestions
  • Resource optimization recommendations
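As an illustration of what a best-practice suggestion might look like, the sketch below checks a parsed Deployment manifest for two common issues. The rules and the `suggest` function are assumptions chosen for the example; the AI Editor presumably relies on a model rather than hard-coded rules like these.

```typescript
// Minimal subset of a Deployment manifest, already parsed from YAML.
interface Container {
  name: string;
  image: string;
  resources?: { limits?: Record<string, string>; requests?: Record<string, string> };
}

interface Deployment {
  kind: "Deployment";
  metadata: { name: string };
  spec: { template: { spec: { containers: Container[] } } };
}

// Hypothetical rule set: missing limits/requests and unpinned image tags.
function suggest(manifest: Deployment): string[] {
  const suggestions: string[] = [];
  for (const c of manifest.spec.template.spec.containers) {
    if (!c.resources?.limits) {
      suggestions.push(`${c.name}: set resources.limits to cap CPU and memory usage`);
    }
    if (!c.resources?.requests) {
      suggestions.push(`${c.name}: set resources.requests so the scheduler can place the pod`);
    }
    // Look only at the last path segment so a registry port is not mistaken for a tag.
    const lastSegment = c.image.split("/").pop() ?? c.image;
    if (!lastSegment.includes(":") || lastSegment.endsWith(":latest")) {
      suggestions.push(`${c.name}: pin the image to a specific tag instead of "latest"`);
    }
  }
  return suggestions;
}
```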

4. Monitoring

The Monitoring feature in Agentkube offers real-time insights into cluster health, resource usage, and performance metrics, enabling proactive issue detection and streamlined operations.

Comprehensive cluster visibility powered by Prometheus and Metrics Server:

Core Metrics

  • CPU utilization
  • Memory usage
  • Network performance
  • Storage metrics

Visualization Tools

  • Real-time graphs
  • Historical data views (currently limited to the past 3 days)
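For readers curious how a dashboard might pull such metrics, here is a small sketch against Prometheus's instant-query HTTP endpoint (`/api/v1/query`). It is not taken from the Agentkube codebase: the base URL, the chosen PromQL expression, and the `toSeries` helper are assumptions for illustration.

```typescript
// Response shape of an instant query that returns a vector.
interface PromVectorResponse {
  status: "success" | "error";
  data: {
    resultType: "vector";
    result: { metric: Record<string, string>; value: [number, string] }[];
  };
}

// Builds the query URL. `baseUrl` is wherever Prometheus is reachable (assumed).
function buildQueryUrl(baseUrl: string, promql: string): string {
  const url = new URL("/api/v1/query", baseUrl);
  url.searchParams.set("query", promql);
  return url.toString();
}

// CPU usage in cores per namespace, averaged over 5 minutes (cAdvisor metric).
const cpuByNamespace = "sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))";

// Flattens the response into { namespace: value } for charting.
function toSeries(response: PromVectorResponse): Record<string, number> {
  const series: Record<string, number> = {};
  for (const sample of response.data.result) {
    // Prometheus encodes sample values as strings.
    series[sample.metric.namespace ?? "unknown"] = Number(sample.value[1]);
  }
  return series;
}
```

A caller would fetch `buildQueryUrl("http://prometheus:9090", cpuByNamespace)` and pass the parsed JSON to `toSeries`.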

Technical Architecture

How it Works

Agentkube simplifies Kubernetes cluster management by integrating powerful AI and metric visualization features.

To get started, your environment needs a Kubernetes cluster (version 1.16 or higher), kubectl configured for cluster access, Helm 3.x, a valid Agentkube API key (generated in the Agentkube Dashboard under Settings > API Keys), and dashboard access. Installation deploys the Agentkube-Operator with Helm; the operator then scrapes metrics from Prometheus and the Metrics Server for visualization in the dashboard.

The dashboard enables seamless metric analysis, response protocol management for diagnosing and resolving cluster issues, and an AI-powered editor for working with Kubernetes manifest files. Additionally, the "Talk to Cluster" feature lets developers communicate naturally with their cluster for diagnostics and operations.

Response protocols streamline issue diagnosis and resolution, while the Agentkube AI Agent uses models from providers such as OpenAI and Anthropic (Claude) to assist in investigations and file modifications. Investigation workflows include report generation, junior developer access, and senior engineer approvals, with actions finalized using DocuSign eSignature. For more information and detailed guides, visit the Agentkube Documentation.

Role of DocuSign in Agentkube Architecture

👉 Watch the DocuSign integration with Agentkube demo (3:47)

DocuSign plays a crucial role in Agentkube's architecture by providing authority and facilitating streamlined workflows. It enables senior developers or team leads to gain insights into ongoing investigations easily, helping them resolve issues when junior developers encounter challenges. Additionally, DocuSign allows junior developers to request permissions from senior authorities for performing critical operations that require higher-level authorization. This ensures accountability, secure approvals, and efficient resolution of complex issues within the team, enhancing collaboration and maintaining operational integrity.
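One way to picture the approval gate is as a mapping from the envelope's signing status to a decision about the requested operation. The sketch below is hypothetical: `ApprovalRequest`, `Decision`, and `decide` are illustrative names, and only a subset of DocuSign envelope statuses is modelled.

```typescript
// Subset of DocuSign envelope statuses relevant to an approval flow.
type EnvelopeStatus = "sent" | "delivered" | "completed" | "declined" | "voided";

// Hypothetical record linking a critical operation to its approval envelope.
interface ApprovalRequest {
  investigationId: string;
  requestedBy: string; // junior developer asking for permission
  approver: string; // senior engineer or team lead who signs
  operation: string; // the critical action awaiting authorization
  envelopeStatus: EnvelopeStatus;
}

type Decision = "execute" | "wait" | "reject";

// The operation proceeds only once the approver has signed.
function decide(request: ApprovalRequest): Decision {
  switch (request.envelopeStatus) {
    case "completed":
      return "execute";
    case "declined":
    case "voided":
      return "reject";
    default:
      return "wait"; // still out for signature
  }
}
```

Keeping the decision a pure function of the envelope status makes the accountability trail easy to audit: the signed envelope is the single source of truth for whether an action was authorized.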

I hereby confirm that the project was solely created by me and completed entirely within the hackathon timeline, from 20th November 2024 to 27th January 2025.

Implementation of DocuSign - Code

1. getConsentUrl

export const getConsentUrl = async (req: Request, res: Response) => {
  // Implementation...
}

This function generates a DocuSign consent URL for JWT authentication. It takes a redirect URI from the request body and creates a URL with the necessary OAuth parameters (response type, scope, client ID). When users open this URL, they can grant consent for the application to access DocuSign services.
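Since the handler body is elided above, the following is only a sketch of the URL-building step, based on DocuSign's documented consent URL format for JWT Grant. The `buildConsentUrl` helper and its parameter names are hypothetical; the default host is DocuSign's developer (demo) environment.

```typescript
// Builds the individual-consent URL for a JWT Grant integration.
// `authHost` defaults to the DocuSign demo environment.
function buildConsentUrl(
  clientId: string,
  redirectUri: string,
  authHost = "account-d.docusign.com",
): string {
  const url = new URL(`https://${authHost}/oauth/auth`);
  url.searchParams.set("response_type", "code");
  // JWT Grant needs the "impersonation" scope in addition to "signature".
  url.searchParams.set("scope", "signature impersonation");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  // URLSearchParams encodes spaces as "+"; DocuSign's examples use "%20".
  return url.toString().replace(/\+/g, "%20");
}
```

The redirect URI has to match one registered for the integration key, otherwise DocuSign rejects the request.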

2. getAccessToken

export const getAccessToken = async (_: Request, res: Response) => {
  // Implementation...
}

This function handles the JWT authentication flow with DocuSign. It:

  • Reads a private key from a file
  • Creates a JWT with necessary claims (issuer, subject, audience, etc.)
  • Exchanges the JWT for an access token by making a request to DocuSign's OAuth endpoint
  • Returns the access token in the response
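The steps above can be sketched with Node's built-in `crypto` module. This is an assumption-laden illustration rather than the project's code: `createAssertion`, `tokenRequestBody`, and `JwtConfig` are hypothetical names, and the demo signs with a throwaway key pair instead of the RSA key registered with DocuSign.

```typescript
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

const b64url = (text: string): string => Buffer.from(text, "utf8").toString("base64url");

interface JwtConfig {
  integrationKey: string; // becomes the "iss" claim
  userId: string; // becomes the "sub" claim (user being impersonated)
  authHost: string; // becomes the "aud" claim, e.g. "account-d.docusign.com"
}

// Creates an RS256-signed JWT assertion valid for one hour.
function createAssertion(cfg: JwtConfig, privateKeyPem: string, nowSeconds: number): string {
  const header = b64url(JSON.stringify({ alg: "RS256", typ: "JWT" }));
  const claims = b64url(
    JSON.stringify({
      iss: cfg.integrationKey,
      sub: cfg.userId,
      aud: cfg.authHost,
      iat: nowSeconds,
      exp: nowSeconds + 3600,
      scope: "signature impersonation",
    }),
  );
  const signingInput = `${header}.${claims}`;
  const signature = createSign("RSA-SHA256")
    .update(signingInput)
    .sign(privateKeyPem)
    .toString("base64url");
  return `${signingInput}.${signature}`;
}

// Form body for POST https://{authHost}/oauth/token
function tokenRequestBody(assertion: string): string {
  return new URLSearchParams({
    grant_type: "urn:ietf:params:oauth:grant-type:jwt-bearer",
    assertion,
  }).toString();
}

// Demo with a throwaway key pair so the sketch runs without DocuSign credentials.
const { privateKey, publicKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
const demoPem = privateKey.export({ type: "pkcs8", format: "pem" }).toString();
const demoJwt = createAssertion(
  { integrationKey: "demo-key", userId: "demo-user", authHost: "account-d.docusign.com" },
  demoPem,
  1700000000,
);
const lastDot = demoJwt.lastIndexOf(".");
const signatureValid = createVerify("RSA-SHA256")
  .update(demoJwt.slice(0, lastDot))
  .verify(publicKey, demoJwt.slice(lastDot + 1), "base64url");
```

In a real flow the body from `tokenRequestBody` is sent with `Content-Type: application/x-www-form-urlencoded`, and the JSON response carries the `access_token`.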

3. getUserInfo

export const getUserInfo = async (req: Request, res: Response) => {
  // Implementation...
}

This function retrieves user information from DocuSign using an access token. It makes a request to DocuSign's userinfo endpoint and returns the user's details. The access token must be provided in the request body.
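A rough sketch of that call follows. The interfaces list only the fields used here from DocuSign's userinfo response; `fetchUserInfo` and `defaultAccount` are hypothetical helpers, and the global `fetch` assumes Node 18 or later.

```typescript
interface DocuSignAccount {
  account_id: string;
  is_default: boolean;
  account_name: string;
  base_uri: string; // host to use for subsequent eSignature REST calls
}

interface DocuSignUserInfo {
  sub: string;
  name: string;
  email: string;
  accounts: DocuSignAccount[];
}

// Calls the userinfo endpoint with a bearer token (demo host by default).
async function fetchUserInfo(
  accessToken: string,
  authHost = "account-d.docusign.com",
): Promise<DocuSignUserInfo> {
  const response = await fetch(`https://${authHost}/oauth/userinfo`, {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  if (!response.ok) {
    throw new Error(`userinfo request failed with status ${response.status}`);
  }
  return (await response.json()) as DocuSignUserInfo;
}

// Picks the account whose base_uri and account_id later API calls should use.
function defaultAccount(info: DocuSignUserInfo): DocuSignAccount {
  const account = info.accounts.find((a) => a.is_default) ?? info.accounts[0];
  if (!account) throw new Error("user has no DocuSign accounts");
  return account;
}
```

The `base_uri` and `account_id` of the selected account are what an envelope request would be addressed to.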

4. sendEnvelopeREST

export const sendEnvelopeREST = async (
  req: Request<{}, {}, SignatureRequestBody>,
  res: Response
) => {
  // Implementation...
}

This is the most complex function. It:

  • Takes investigation details, email addresses, and DocuSign credentials
  • Retrieves investigation information from a database
  • Creates an HTML document with the investigation report
  • Sends the document for signature through DocuSign's REST API
  • Sets up signature fields and carbon copy recipients
  • Returns the envelope status and related information

The function creates a professionally formatted HTML document with investigation details and sends it through DocuSign's API for signature. It handles both primary signers and carbon copy recipients.
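To show what such a request could contain, here is a sketch of an envelope definition for the eSignature REST API's envelope-creation endpoint. The field names follow DocuSign's documented envelope schema, but `buildEnvelope`, the email subject, and the `/sign-here/` anchor string are assumptions; the report HTML would need to contain that anchor text for the signature tab to be placed.

```typescript
interface Recipient {
  name: string;
  email: string;
}

// Builds the JSON body for
// POST {base_uri}/restapi/v2.1/accounts/{accountId}/envelopes
function buildEnvelope(reportHtml: string, signer: Recipient, carbonCopies: Recipient[]) {
  return {
    emailSubject: "Investigation report: approval requested",
    status: "sent", // "sent" dispatches immediately; "created" would save a draft
    documents: [
      {
        documentId: "1",
        name: "Investigation Report",
        fileExtension: "html",
        documentBase64: Buffer.from(reportHtml, "utf8").toString("base64"),
      },
    ],
    recipients: {
      signers: [
        {
          ...signer,
          recipientId: "1",
          routingOrder: "1",
          // Place the signature field wherever the anchor text appears.
          tabs: {
            signHereTabs: [
              { anchorString: "/sign-here/", anchorUnits: "pixels", anchorXOffset: "0", anchorYOffset: "0" },
            ],
          },
        },
      ],
      // Carbon-copy recipients receive the document after the signer.
      carbonCopies: carbonCopies.map((cc, i) => ({
        ...cc,
        recipientId: String(i + 2),
        routingOrder: "2",
      })),
    },
  };
}
```

The body is sent as JSON with an `Authorization: Bearer` header, and the response reports the new envelope's ID and status.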

Each of these functions is part of a complete DocuSign integration flow:

  1. Get consent from user
  2. Get access token using JWT
  3. Get user information
  4. Send documents for signature

Investigation Runtime

The investigation engine leverages WebAssembly's capabilities directly in the dashboard to enhance how insights are gathered. By running investigation modules in a WebAssembly runtime within the browser, we achieve near-native performance for complex cluster analysis. This enables real-time cluster insights without server-side processing delays.

The engine follows a multi-agent approach, where each investigation agent focuses on a specific aspect of cluster analysis. Agents collaborate by referencing outputs from previous steps, applying a Chain of Thought process to make informed decisions. The WebAssembly integration allows investigations to run in parallel, processing multiple data streams simultaneously while maintaining responsive dashboard performance. This architecture significantly reduces the mean time to resolution (MTTR) by providing instant feedback and analysis results directly in the user interface, ensuring faster and smarter problem-solving.
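The parallel fan-out can be illustrated without any WebAssembly at all. The sketch below uses plain TypeScript promises as a stand-in for the Wasm modules, so it shows only the concurrency pattern, not the actual runtime; `Probe` and `runParallel` are hypothetical names.

```typescript
// A probe inspects one data stream (pods, events, metrics, ...).
interface Probe {
  name: string;
  run: () => Promise<string>;
}

// Runs all probes concurrently; one failing probe does not discard the others.
async function runParallel(probes: Probe[]): Promise<Record<string, string>> {
  const settled = await Promise.allSettled(probes.map((probe) => probe.run()));
  const report: Record<string, string> = {};
  settled.forEach((outcome, i) => {
    const name = probes[i]?.name ?? `probe-${i}`;
    report[name] =
      outcome.status === "fulfilled" ? outcome.value : `failed: ${String(outcome.reason)}`;
  });
  return report;
}
```

Using `Promise.allSettled` rather than `Promise.all` keeps partial results available to the dashboard even when a single data source times out.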

Database Architecture

  • PostgreSQL for structured data
  • Qdrant for vector storage
  • Supabase for authentication
  • Redis for message queue (BullMQ)

How We Built It 🔧

Backend Development

  • TypeScript and Go
  • Kubernetes Operators with Kubebuilder
  • WebAssembly modules written in Go
  • PostgreSQL & Qdrant integration

Frontend Stack

  • React with Vite
  • TypeScript
  • Tailwind CSS
  • Custom Components

AI Implementation

  • OpenAI API integration
  • Qdrant - Vector search implementation

DevOps Pipeline

  • CI/CD with GitHub Actions
  • Docker containerization
  • Helm chart deployment
  • GCP infrastructure

Challenges We Ran Into

Technical Complexity

  • Real-time cluster analysis optimization
  • Migrating from BullMQ to Wasm
  • Multi-cluster support implementation
  • Performance optimization
  • Secure communication
  • Data protection

Scale Management

  • Multi-tenant support
  • Resource optimization

Accomplishments I'm Proud Of

Technical Achievements

  • Successfully implemented real-time cluster analysis
  • Developed efficient WebAssembly modules
  • Created scalable architecture
  • Optimized AI response time
  • Multi-Agent System

User Impact

  • Reduced average MTTR (mean time to recovery) by 60%
  • Improved developer efficiency by 40%
  • Simplified management processes
  • Enhanced user experience

My Journey with Agentkube

After failing with two startups, I needed a fresh start. As Simon Sinek says,

“People don’t buy what you do; they buy why you do it.”

My "why" was clear: solve real problems and create meaningful solutions. That’s how Agentkube was born.

In April 2024, during my internship at AI Planet, I got a crash course in Kubernetes while managing the entire DevOps side and handling cloud migration. It was intense but gave me firsthand exposure to the complexities of Kubernetes. I knew then that simplifying this beast was something worth building.

“It’s easier to do a hard startup than an easy startup,” ~ Sam Altman

It resonated deeply with me. Building Agentkube has been the hardest thing I’ve ever done. There were countless days I had no idea what I was doing—no roadmap, no team, just me, struggling to figure out how systems would communicate and how the design would work.

After getting laid off without explanation last October, I had two choices: feel sorry for myself or fight back. I chose to fight. Over the last few months, I built three startups, failing twice before realizing I needed to focus on solving real problems. Perfect3sixty struggled with sales, and Autohr didn’t take off due to rushed execution. But I learned, pivoted, and poured everything into Agentkube.

Now, after endless sleepless nights, Agentkube is live on Product Hunt. It doesn’t matter if it wins a hackathon or not—I’m proud of what I’ve built. I took the hardest challenge I could think of and turned it into a reality. Agentkube is my proof that even after setbacks, you can bounce back and create something meaningful.

Potential Impact

Agentkube streamlines Kubernetes management, allowing organisations to overcome adoption barriers and operational difficulties. By automating cluster operations and providing intuitive interfaces, engineers can concentrate on innovation rather than infrastructure. The platform improves security, simplifies troubleshooting, and reduces costs, resulting in more efficient and dependable production settings. Agentkube enables enterprises/startups to scale easily while maintaining stable, secure, and cost-effective Kubernetes clusters.

What's Next for Agentkube

  1. Feature Expansion

    • Alert integration system
    • Advanced AI capabilities

  2. Technical Improvements

    • Performance optimization
    • Enhanced security features
    • Scalability improvements

Try it out 🚀

Built with passion by a developer, for developers 💪
