AI-Native Cloud Orchestrator

Inspiration

GKE's 10th anniversary reminded us how much Kubernetes has changed the way cloud-native applications are built. Even so, managing and optimizing large-scale microservices remains a significant challenge. We envision an intelligent orchestration layer on top of GKE that uses Google's AI services to automate complex operational tasks, moving cloud-native infrastructure toward autonomous, efficient operation. We want to simplify the developer experience and reduce operational overhead so that teams can focus on innovation rather than infrastructure management.

What it does

AI-Native Cloud Orchestrator is an intelligent control plane for GKE clusters. It acts as an AI-powered co-pilot for DevOps teams. Key features include:

Predictive Autoscaling: Utilizes Google's machine learning models to analyze historical traffic patterns and predict future load, proactively scaling pods and nodes before demand spikes occur.

Intelligent Traffic Routing: Employs reinforcement learning to dynamically route traffic based on real-time service health, latency, and cost, ensuring optimal performance and resilience.

Automated Anomaly Detection: Continuously monitors application logs and metrics to identify and flag unusual patterns, helping to detect potential issues like security threats or performance bottlenecks before they impact users.
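To make the predictive autoscaling idea concrete, here is a minimal sketch in Python. It is not our trained model: the forecast is a simple last-value-plus-trend extrapolation standing in for a Vertex AI prediction, and the names forecast_next and replicas_for, the per-pod capacity, and the 20% headroom are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def forecast_next(load_history: Sequence[float], window: int = 3) -> float:
    """Extrapolate the next interval's load from the recent trend.

    Placeholder for a real model prediction: takes the last sample and
    adds the average step observed across the most recent `window` samples.
    """
    if not load_history:
        raise ValueError("load_history must not be empty")
    recent = list(load_history[-window:])
    if len(recent) > 1:
        trend = (recent[-1] - recent[0]) / (len(recent) - 1)
    else:
        trend = 0.0
    # Load cannot be negative, so clamp the extrapolation at zero.
    return max(0.0, recent[-1] + trend)


def replicas_for(
    predicted_rps: float,
    rps_per_pod: float,
    min_replicas: int = 1,
    max_replicas: int = 20,
    headroom: float = 1.2,
) -> int:
    """Convert a predicted request rate into a clamped replica count."""
    if rps_per_pod <= 0:
        raise ValueError("rps_per_pod must be positive")
    # Add headroom so the forecast does not have to be exact.
    needed = math.ceil(predicted_rps * headroom / rps_per_pod)
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    history = [100.0, 120.0, 140.0]  # hypothetical requests per second
    predicted = forecast_next(history)
    print(predicted, replicas_for(predicted, rps_per_pod=50.0))
```

Clamping between a minimum and maximum replica count keeps a bad forecast from scaling the service to zero or exhausting the node pool.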

How we built it

Our architecture is designed to run entirely on Google Cloud. The core of the project will be a set of microservices deployed on Google Kubernetes Engine (GKE). We plan to use Google's Vertex AI to train and deploy our predictive models, with training data drawn from logs and metrics collected by Google Cloud Operations Suite (formerly Stackdriver). The AI agents will be written in Python, using libraries such as TensorFlow and scikit-learn, and will interact with the Kubernetes API to manage cluster resources. The entire infrastructure will be defined as code using Terraform.
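As an illustration of how an agent might interact with the Kubernetes API, the sketch below scales a Deployment through the official Kubernetes Python client. The function names build_scale_patch and apply_scale are hypothetical, and the sketch assumes the agent runs inside the cluster with RBAC permission to patch the deployments/scale subresource. The client is imported lazily so the patch-building logic can be exercised without a cluster.

```python
from typing import Any, Dict


def build_scale_patch(replicas: int) -> Dict[str, Any]:
    """Build the body for a patch against a Deployment's scale subresource."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return {"spec": {"replicas": replicas}}


def apply_scale(name: str, namespace: str, replicas: int) -> None:
    """Patch the Deployment's replica count (requires cluster access)."""
    # Imported here so the module loads even without the client installed.
    from kubernetes import client, config

    # Assumption: the agent runs as a pod and uses its service account.
    config.load_incluster_config()
    client.AppsV1Api().patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body=build_scale_patch(replicas),
    )


if __name__ == "__main__":
    # Only prints the patch body; apply_scale needs a live cluster.
    print(build_scale_patch(4))
```

Separating the patch body from the API call keeps the decision logic testable offline, which matters while the prototype has no cluster to run against.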

Challenges we ran into

Accomplishments that we're proud of

We are proud of designing a comprehensive architecture that tightly integrates GKE with Vertex AI. The design moves beyond simple reactive automation toward a predictive, AI-first approach to cloud infrastructure management. So far we have completed the initial system design and a proof-of-concept for data ingestion from the Cloud Operations API.

What we learned

Through the initial design phase, we have already learned a great deal about the complexities of real-time data processing and the nuances of the Kubernetes API. We've gained a deeper appreciation for the power of integrating managed services like GKE and Vertex AI, which allows us to focus on the core AI logic rather than managing the underlying infrastructure.

What's next for AI-Native Cloud Orchestrator

Our next step is to use the Google Cloud credits to build out our first functional prototype. We will focus on implementing the predictive autoscaling feature. We aim to train our first model on Vertex AI and deploy the corresponding AI agent to a GKE cluster to test its effectiveness in a controlled environment. We are excited to bring this vision to life and contribute to the GKE ecosystem.
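One way we might measure effectiveness in that controlled environment is to compare predicted load against what was actually observed. The sketch below computes mean absolute percentage error (MAPE); the choice of metric and the sample numbers are assumptions for illustration, not results.

```python
from typing import Sequence


def mean_absolute_percentage_error(
    actual: Sequence[float], predicted: Sequence[float]
) -> float:
    """Return MAPE as a percentage; lower means a more accurate forecast."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    if any(a == 0 for a in actual):
        # MAPE is undefined when an observed value is zero.
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    observed = [100.0, 200.0]   # hypothetical observed requests per second
    forecast = [110.0, 180.0]   # hypothetical model predictions
    print(mean_absolute_percentage_error(observed, forecast))
```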

Built With

docker
google-cloud-operations-suite
google-kubernetes-engine-(gke)
python
scikit-learn
tensorflow
terraform
vertex-ai

Updates

肥肉一斤 started this project — Aug 19, 2025 02:30 PM EDT
