InfraMinds: Autonomous Multi-Agent Cloud Architecture

Inspiration

The current state of AI in DevOps is fundamentally broken. Tools like ChatGPT or Copilot are essentially "single-pass text generators": you ask for a database, and they blindly spit out Terraform HCL. They don't understand network topologies, blast radius, or security constraints, which leads to hallucinated, broken infrastructure.

We realized that infrastructure shouldn’t start with configuration; it should start with understanding. We wanted to build an autonomous multi-agent system that acts like a Senior Cloud Architect: one that can look at a whiteboard sketch, debate security policies, and actually test its own code before deploying it.

What it does

InfraMinds is an autonomous, multi-agent infrastructure generator that bridges the gap between human intent and production-grade cloud deployments.

Instead of generating raw text, our agents build a Living Infrastructure Graph:

  • Multimodal Intent Agent: You upload a whiteboard sketch. The agent extracts the spatial topology and converts it into an abstract graph represented in NetworkX.
  • Policy & Security Agents: A "Self-Healing Loop" kicks in. One agent designs the architecture, while a "Critic Agent" actively debates it. If a database is placed in a public subnet, the Critic Agent forces a graph mutation to fix it before any code is written.
  • Execution Agent: The verified graph is compiled into Terraform code and run through a rigorous 5-stage testing pipeline against a local mock AWS environment. If the deployment fails, the agent uses "X-Ray Vision" to diagnose the stderr logs, patches the code, and retries autonomously.
  • Blast Radius Simulator: Users can visually delete a node in the UI and watch the AI instantly calculate the cascading "Kill Chain" of downstream failures.
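The Execution Agent's apply, diagnose, patch, retry cycle can be sketched as a bounded loop. This is a minimal illustration under stated assumptions, not the project's code: it assumes `tflocal init` has already been run in the working directory, and `patch_fn` is a hypothetical stand-in for the LLM step ("X-Ray Vision") that reads stderr and rewrites the failing HCL block. The `run` parameter is injectable so the loop can be exercised without Terraform or LocalStack installed.

```python
import subprocess
from typing import Callable


def apply_with_self_healing(
    workdir: str,
    patch_fn: Callable[[str, str], None],  # hypothetical: (workdir, stderr) -> rewrites HCL on disk
    max_attempts: int = 3,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Run `tflocal apply` until it succeeds; return the attempt number that succeeded."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        # tflocal is a wrapper that points the AWS provider at LocalStack,
        # then runs terraform with the same arguments.
        result = run(
            ["tflocal", "apply", "-auto-approve"],
            cwd=workdir,
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return attempt
        if attempt == max_attempts:
            raise RuntimeError(f"apply failed after {max_attempts} attempts:\n{result.stderr}")
        # Hand the captured stderr to the repair step, then retry with the patched code.
        patch_fn(workdir, result.stderr)
    raise AssertionError("not reached: the last iteration either returns or raises")
```

Capping the number of attempts keeps a model that repeatedly produces broken patches from looping forever; the final stderr is surfaced to the user instead.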

How we built it

We engineered a strict unidirectional data flow: Intent → Reasoning → Implementation → Execution.

  • Frontend: Next.js (React 19), Tailwind CSS, and React Flow with dagre for auto-layout graph visualizations.
  • Backend & Graph Engine: FastAPI (Python) serving real-time ndjson streams, heavily utilizing NetworkX for directed acyclic graph (DAG) dependency mapping.
  • AI Core: Google Gemini 2.0-Flash via the Google GenAI SDK, utilizing multimodal vision capabilities for sketch parsing.
  • Simulation & IaC Pipeline: We integrated LocalStack for local AWS simulation and tflocal to ensure our agents were verifying code dynamically, not just statically.
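Modeling dependencies as a DAG lets the backend derive a safe creation order and reject cyclic designs before any HCL is generated. NetworkX offers this through `nx.topological_sort` and `nx.find_cycle`; the sketch below uses the standard library's `graphlib` instead so it runs without extra packages. The input convention (each resource mapped to the resources it needs) and the resource names are illustrative assumptions, not the project's schema.

```python
from graphlib import CycleError, TopologicalSorter


def deployment_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return resources in an order where every dependency precedes its dependents.

    depends_on maps each resource to the resources it needs, e.g. a subnet needs its VPC.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # graphlib stores the nodes forming the cycle in err.args[1].
        raise ValueError(f"dependency cycle detected: {err.args[1]}") from err


order = deployment_order({"vpc": set(), "subnet": {"vpc"}, "igw": {"vpc"}, "rds": {"subnet"}})
print(order)  # "vpc" always comes first; "rds" always comes after "subnet"
```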

Challenges we ran into

Our biggest risk was LLM hallucination. When mapping abstract intents to explicit AWS primitives (like VPCs, Subnets, and IGWs), the AI would sometimes drop required dependencies.

To solve this, we couldn't rely on prompt engineering alone, so we built a Multi-Pass Cognitive Architecture. Custom Python scripts verify that the graph grows monotonically across passes (a pass may add nodes but never drop one), ensuring no nodes are lost in translation. Integrating the LocalStack container pipeline also meant our agents had to learn to capture stderr from a failed terraform apply, analyze the failure, rewrite the specific HCL block, and restart the deployment loop entirely on their own.
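One way to express the "no nodes lost in translation" check is to require that the set of node IDs never shrinks from one pass to the next. The sketch below assumes each pass keeps stable node IDs and is only allowed to add or refine nodes; it reports every node that disappears so the pipeline can re-prompt or fail the pass. It illustrates the idea and is not the project's actual scripts.

```python
def find_dropped_nodes(passes: list[set[str]]) -> list[tuple[int, list[str]]]:
    """Check that node sets grow monotonically: passes[i - 1] must be a subset of passes[i].

    Returns (pass_index, dropped_node_ids) for every pass that lost nodes; empty if monotonic.
    """
    violations = []
    for i in range(1, len(passes)):
        dropped = passes[i - 1] - passes[i]
        if dropped:
            violations.append((i, sorted(dropped)))
    return violations


passes = [
    {"web", "db"},                    # pass 0: abstract intent from the sketch
    {"web", "db", "vpc", "subnet"},   # pass 1: AWS primitives added
    {"web", "vpc", "subnet", "igw"},  # pass 2: "db" silently dropped
]
print(find_dropped_nodes(passes))  # [(2, ['db'])]
```

The same subset test can be applied to edge sets if dropped dependencies, rather than dropped nodes, are the concern.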

Accomplishments that we're proud of

  • The Closed-Loop Self-Healing: Moving from a standard "text-to-code" app to an autonomous agentic loop that actively fixes its own Terraform syntax errors and policy violations is a massive technical leap.
  • Blast Radius Graph Traversal: Successfully mapping cloud topologies into NetworkX so that the UI can accurately highlight a visual "Kill Chain" in milliseconds.
  • Zero-to-Production via Sketch: Taking a literal hand-drawn whiteboard diagram and having an AI output a mathematically verified, LocalStack-tested infrastructure deployment in under 2 minutes.
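The kill chain is the set of resources reachable from the deleted node along "is depended on by" edges. In NetworkX this is `nx.descendants(G, node)`; the dependency-free breadth-first search below computes the same set while also ordering it from nearest to farthest failure, which suits a step-by-step UI highlight. The adjacency format and resource names are assumptions for illustration.

```python
from collections import deque


def kill_chain(dependents: dict[str, list[str]], failed: str) -> list[str]:
    """Return every resource that transitively depends on `failed`, nearest first (BFS order)."""
    seen = {failed}
    order: list[str] = []
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:  # a resource can be reached by several paths; report it once
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Edges point from a resource to the resources that depend on it.
dependents = {
    "vpc": ["subnet_public", "subnet_private"],
    "subnet_public": ["alb"],
    "subnet_private": ["rds", "ec2"],
    "alb": ["ec2"],
}
print(kill_chain(dependents, "subnet_private"))  # ['rds', 'ec2']
```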

What we learned

We learned that graph representation is vastly superior to raw text generation for complex agentic workflows. By forcing the AI to manipulate a structured graph instead of strings of code, we eradicated syntax errors during the design phase and could enforce deterministic security policies programmatically.
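Because the design lives in a graph, a rule such as the Critic Agent's "no database in a public subnet" can be checked and repaired with ordinary code rather than another prompt. The sketch assumes a simple attribute-dict encoding (subnet nodes carry a `public` flag, other nodes reference their subnet by ID); that schema and the fallback subnet name `private_subnet_auto` are hypothetical.

```python
def enforce_private_databases(nodes: dict[str, dict]) -> list[tuple[str, str, str]]:
    """Mutate the graph so no database sits in a public subnet.

    Returns (database_id, old_subnet, new_subnet) for every move made.
    """
    private = sorted(n for n, a in nodes.items() if a["type"] == "subnet" and not a["public"])
    moves = []
    for node_id, attrs in sorted(nodes.items()):  # sorted() copies, so adding nodes below is safe
        if attrs["type"] != "database":
            continue
        current = attrs["subnet"]
        if nodes[current]["public"]:
            if not private:
                # Hypothetical fallback: create a private subnet if the design has none.
                nodes["private_subnet_auto"] = {"type": "subnet", "public": False}
                private.append("private_subnet_auto")
            attrs["subnet"] = private[0]
            moves.append((node_id, current, private[0]))
    return moves


graph = {
    "subnet_pub": {"type": "subnet", "public": True},
    "subnet_priv": {"type": "subnet", "public": False},
    "db": {"type": "database", "subnet": "subnet_pub"},
    "web": {"type": "compute", "subnet": "subnet_pub"},
}
print(enforce_private_databases(graph))  # [('db', 'subnet_pub', 'subnet_priv')]
```

Running the check again on the repaired graph returns no moves, which gives the Critic Agent a deterministic stopping condition.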

What's next for InfraMinds

InfraMinds is built to scale into an enterprise B2B SaaS platform. Our immediate roadmap includes:

  1. Multi-Cloud Agent Swarms: Expanding the semantic graph to map to Azure and GCP primitives.
  2. FinOps Integration: Enhancing the cost-prediction agent to actively suggest cheaper AWS instance alternatives during the design phase.
  3. State Drift Detection: Connecting the agent directly to production AWS accounts to visually highlight differences between the intent graph and actual cloud reality.

Built With

  • fastapi
  • google-gemini
  • localstack
  • networkx
  • next.js
  • python
  • react
  • react-flow
  • tailwind
  • terraform