Inspiration
I work with cloud infrastructure every day, and the pattern is always the same. Something breaks at 2am. Someone gets paged. They SSH in, find the misconfigured security group, fix it manually, push the Terraform change, and go back to sleep. The whole cycle takes 30–60 minutes for something a machine could catch and fix in seconds.
I wanted to build a system that closes that loop automatically. Real telemetry comes in, an AI evaluates it against security best practices, writes the Terraform patch, and puts it in front of an operator for one-click approval. No more guessing. No more 2am scrambles.
What It Does
ZenithOS monitors cloud infrastructure (AWS and GCP) for security misconfigurations in real time.
When it detects a violation — like an open SSH port or a public S3 bucket — it runs the telemetry through an AI agent powered by GPT-4o. The agent evaluates the issue against CIS Benchmark rules, then generates a Terraform unified diff to fix it.
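To make "open SSH port" concrete, here is a minimal deterministic sketch of that kind of check. In ZenithOS the evaluation is done by the GPT-4o agent, so this is an illustration of the rule, not the project's code. The payload shape (`id`, `ingress`, `from_port`, `to_port`, `cidr_blocks`) and the name `find_open_ssh` are assumptions made for the example.

```python
from typing import Any

# CIDR blocks that mean "reachable from anywhere".
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_ssh(security_group: dict[str, Any]) -> list[str]:
    """Return one finding per ingress rule that exposes port 22 to the internet.

    The dict layout is hypothetical; it is not the real ZenithOS telemetry schema.
    """
    findings = []
    for rule in security_group.get("ingress", []):
        covers_ssh = rule["from_port"] <= 22 <= rule["to_port"]
        exposed = OPEN_CIDRS.intersection(rule.get("cidr_blocks", []))
        if covers_ssh and exposed:
            findings.append(
                f"{security_group['id']}: port 22 open to {', '.join(sorted(exposed))}"
            )
    return findings


sample = {
    "id": "sg-example",
    "ingress": [
        {"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]},
        {"from_port": 443, "to_port": 443, "cidr_blocks": ["0.0.0.0/0"]},
    ],
}
print(find_open_ssh(sample))
```

Only the port 22 rule is reported; the HTTPS rule is public too, but this particular check does not cover it.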
Everything shows up live in a Command Center dashboard:
- The agent’s reasoning process
- The generated patch
- A reliability score
- The infrastructure topology
An operator can approve or reject the fix with one click, and all decisions are tracked in an audit log.
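One way to picture the audit log is as an append-only list of decision records. The sketch below assumes that shape; the field names (`patch_id`, `operator`, `approved`, `decided_at`) and the classes `Decision` and `AuditLog` are hypothetical and not taken from the project.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Decision:
    """One approve/reject decision (hypothetical fields)."""
    patch_id: str
    operator: str
    approved: bool
    decided_at: datetime


@dataclass
class AuditLog:
    """Append-only in-memory log; a real system would persist this."""
    entries: list[Decision] = field(default_factory=list)

    def record(self, patch_id: str, operator: str, approved: bool) -> Decision:
        entry = Decision(patch_id, operator, approved, datetime.now(timezone.utc))
        self.entries.append(entry)
        return entry

    def history(self) -> tuple[Decision, ...]:
        # Return a tuple so callers cannot mutate the log through it.
        return tuple(self.entries)


log = AuditLog()
first = log.record("patch-1", "alice", approved=True)
second = log.record("patch-2", "alice", approved=False)
```

Freezing `Decision` keeps individual records immutable once written, which is the property an audit trail usually wants.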
How We Built It
Three services connected by NATS:
Rust Telemetry Engine
Built with axum and tokio.
- Handles the REST ingest API (POST /metrics)
- Runs a WebSocket gateway that bridges NATS events to the browser
Chosen for throughput. It can handle thousands of telemetry payloads per second without breaking a sweat.
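For a sense of what a client call to the ingest API could look like, here is a small sketch that builds a POST /metrics request with the Python standard library. Only the path comes from the writeup; the base URL `http://localhost:8080` and every payload field are assumptions for illustration.

```python
import json
import urllib.request


def build_metrics_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the ingest endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/metrics",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload; the real telemetry schema is not described in the writeup.
request = build_metrics_request(
    "http://localhost:8080",
    {"provider": "aws", "resource_id": "sg-example", "ssh_cidr": "0.0.0.0/0"},
)
# urllib.request.urlopen(request) would send it; that call is left out so the
# sketch runs without a server listening.
```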
Python AI Agent
Uses LangGraph to orchestrate a multi-step pipeline.
Incoming telemetry goes through:
analyze node
- GPT-4o with structured JSON output to detect violations
remediate node
- GPT-4o again, this time generating a Terraform unified diff
Each step publishes its progress to NATS so the dashboard can stream the agent’s thought process in real time.
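The analyze, remediate, and publish flow above can be sketched in plain Python. This is not the LangGraph code: both GPT-4o calls are replaced by stubs, a `publish` callback stands in for NATS, and the event fields (`step`, `status`) and the telemetry key `ssh_cidr` are assumptions. The one real piece is `difflib.unified_diff`, which shows what a unified diff of a Terraform change looks like.

```python
import difflib
from typing import Callable, Optional


def analyze(telemetry: dict) -> dict:
    # Stub standing in for the GPT-4o structured-output call.
    violation = telemetry.get("ssh_cidr") == "0.0.0.0/0"
    return {"violation": violation, "rule": "ssh-open-to-world" if violation else ""}


def remediate(current_tf: str, fixed_tf: str) -> str:
    # Stub: the real step asks GPT-4o for the patch. Here the fixed text is
    # supplied by the caller and difflib renders it as a unified diff.
    diff = difflib.unified_diff(
        current_tf.splitlines(keepends=True),
        fixed_tf.splitlines(keepends=True),
        fromfile="a/main.tf",
        tofile="b/main.tf",
    )
    return "".join(diff)


def run_pipeline(
    telemetry: dict,
    current_tf: str,
    fixed_tf: str,
    publish: Callable[[dict], None],
) -> Optional[str]:
    """Run both steps, publishing a progress event before and after each."""
    publish({"step": "analyze", "status": "started"})
    finding = analyze(telemetry)
    publish({"step": "analyze", "status": "done"})
    if not finding["violation"]:
        return None  # Nothing to fix, so the remediate step is skipped.
    publish({"step": "remediate", "status": "started"})
    patch = remediate(current_tf, fixed_tf)
    publish({"step": "remediate", "status": "done"})
    return patch


events: list[dict] = []
current = 'cidr_blocks = ["0.0.0.0/0"]\n'
fixed = 'cidr_blocks = ["10.0.0.0/8"]\n'  # Hypothetical replacement range.
patch = run_pipeline({"ssh_cidr": "0.0.0.0/0"}, current, fixed, events.append)
print(patch)
```

Passing `events.append` as the publisher keeps the sketch testable; in the real system that callback would publish to a NATS subject the WebSocket gateway listens on.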
Next.js Dashboard
- React Flow for the topology graph
- shadcn/ui for the component library
- A single WebSocket connection to the Rust engine
The dashboard auto-reconnects if the backend restarts. All approve/reject decisions are tracked in an audit history panel.
Messaging Layer
NATS ties everything together.
- The Rust engine publishes telemetry
- The Python agent subscribes and publishes events back
- The Rust WebSocket gateway forwards everything to the browser
The NATS hops themselves add sub-millisecond latency; end-to-end time is dominated by the GPT-4o calls.
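The publish/subscribe shape of the messaging layer can be illustrated without a NATS server. The sketch below is an in-memory stand-in built on `asyncio.Queue`, not the NATS client API; the `Broker` class and the message text are made up for the example.

```python
import asyncio


class Broker:
    """In-memory stand-in for one NATS subject: every subscriber gets every message."""

    def __init__(self) -> None:
        self._subscribers: list[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(queue)
        return queue

    def publish(self, message: str) -> None:
        # Fan out: each subscriber receives its own copy of the message.
        for queue in self._subscribers:
            queue.put_nowait(message)


async def demo() -> list[str]:
    broker = Broker()
    agent_inbox = broker.subscribe()    # plays the Python agent
    gateway_inbox = broker.subscribe()  # plays the Rust WebSocket gateway
    broker.publish("telemetry: sg-example")
    return [await agent_inbox.get(), await gateway_inbox.get()]


received = asyncio.run(demo())
print(received)
```

Both subscribers see the same message, which is the property that lets the agent and the gateway consume one telemetry stream independently.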
Challenges We Ran Into
LangGraph + Python 3.14
LangChain’s Pydantic v1 compatibility layer throws warnings on Python 3.14. It still works, but the console noise was confusing at first. We had to dig through the source to confirm it was harmless.
Axum WebSocket Types
Axum requires the ws feature flag to be explicitly enabled. The type inference errors when it’s missing are not helpful at all. Took a while to realize it was just a Cargo feature issue.
React Hydration Mismatches
The WRI gauge component used Math.random() for sparkline data, which produced different values on server vs client. Fixed it with suppressHydrationWarning and moved the randomization to a client-side effect.
Getting the Layout Right
The remediation diff panel kept cutting off content when both it and the agent feed shared the same column. Spent time tuning the flex layout to give the diff viewer enough space to actually show the patch.
What We Learned
- NATS is extremely fast and simple for service-to-service messaging. No schema registry. No config files. Just connect and publish.
- LangChain’s structured output mode with Pydantic models, called from our LangGraph nodes, is a clean way to get reliable JSON from an LLM without parsing hacks.
- Rust + axum is a strong fit for a WebSocket gateway. The async model maps naturally to fan-out broadcasting.
- Streaming the AI agent’s intermediate steps to the UI makes the system feel alive and trustworthy. Users can see why the AI made a decision before they approve it.
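To illustrate the structured-output point above, here is a standard-library sketch of what schema validation gives you: the model's raw JSON either becomes a typed object or fails loudly. Pydantic does this (and much more) in the real stack; the `Finding` fields and the allowed severities below are assumptions for the example.

```python
import json
from dataclasses import dataclass, fields

# Hypothetical severity scale; not taken from the project.
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}


@dataclass(frozen=True)
class Finding:
    """Hypothetical schema for one detected violation."""
    resource_id: str
    rule: str
    severity: str


def parse_finding(raw: str) -> Finding:
    """Turn raw LLM output into a Finding, rejecting anything off-schema."""
    data = json.loads(raw)       # ValueError if the text is not valid JSON
    finding = Finding(**data)    # TypeError on missing or unexpected keys
    for f in fields(finding):
        if not isinstance(getattr(finding, f.name), str):
            raise TypeError(f"{f.name} must be a string")
    if finding.severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {finding.severity}")
    return finding


good = parse_finding(
    '{"resource_id": "sg-example", "rule": "ssh-open-to-world", "severity": "high"}'
)
print(good)
```

The downstream remediate step can then rely on `good.rule` and `good.severity` existing, instead of probing a loosely shaped dict.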
Built With
Rust
Python
TypeScript
Next.js
React Flow
LangGraph
LangChain
OpenAI GPT-4o
NATS
axum
tokio
shadcn/ui
Tailwind CSS
Pydantic
Docker