ZenithOS dashboard before running test
ZenithOS dashboard after running test

Inspiration

I work with cloud infrastructure every day, and the pattern is always the same. Something breaks at 2am. Someone gets paged. They SSH in, find the misconfigured security group, fix it manually, push the Terraform change, and go back to sleep. The whole cycle takes 30–60 minutes for something a machine could catch and fix in seconds.

I wanted to build a system that closes that loop automatically. Real telemetry comes in, an AI evaluates it against security best practices, writes the Terraform patch, and puts it in front of an operator for one-click approval. No more guessing. No more 2am scrambles.

What It Does

ZenithOS monitors cloud infrastructure (AWS and GCP) for security misconfigurations in real time.

When it detects a violation, such as an open SSH port or a public S3 bucket, it runs the telemetry through an AI agent powered by GPT-4o. The agent evaluates the issue against CIS Benchmark rules, then generates a Terraform unified diff to fix it.
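To make "Terraform unified diff" concrete, here is a minimal sketch of what such a patch could look like for an open SSH port. It is not the ZenithOS implementation: in the real system GPT-4o writes the fix, whereas here a plain string replacement stands in for the model. The Terraform snippet (abbreviated), the file name `network.tf`, and the helper names are all assumptions made for illustration.

```python
import difflib

# Hypothetical, abbreviated Terraform rule with SSH open to the world.
ORIGINAL_TF = '''resource "aws_security_group_rule" "ssh" {
  type        = "ingress"
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"]
}
'''


def restrict_ssh(tf_source: str, allowed_cidr: str) -> str:
    """Stand-in for the model's fix: swap the world-open CIDR for a narrower one."""
    return tf_source.replace('"0.0.0.0/0"', f'"{allowed_cidr}"')


def make_patch(before: str, after: str, path: str) -> str:
    """Render a unified diff between two versions of a Terraform file."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


patch = make_patch(ORIGINAL_TF, restrict_ssh(ORIGINAL_TF, "10.0.0.0/8"), "network.tf")
print(patch)
```

The printed patch is the kind of artifact an operator would review before approving.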

Everything shows up live in a Command Center dashboard:

The agent’s reasoning process
The generated patch
A reliability score
The infrastructure topology

An operator can approve or reject the fix with one click, and all decisions are tracked in an audit log.
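As a rough illustration of the approve/reject audit trail, the sketch below records each decision as an immutable entry with a UTC timestamp. The field names (`patch_id`, `operator`, and so on) and the JSON Lines export are assumptions, not the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    patch_id: str      # hypothetical identifier for a generated patch
    decision: str      # "approved" or "rejected"
    operator: str      # who clicked the button
    decided_at: str    # ISO 8601 timestamp in UTC


class AuditLog:
    """Append-only, in-memory log of operator decisions."""

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def record(self, patch_id: str, decision: str, operator: str) -> AuditEntry:
        if decision not in ("approved", "rejected"):
            raise ValueError(f"unknown decision: {decision!r}")
        entry = AuditEntry(
            patch_id, decision, operator, datetime.now(timezone.utc).isoformat()
        )
        self._entries.append(entry)
        return entry

    def to_jsonl(self) -> str:
        """Serialize the log as one JSON object per line."""
        return "\n".join(json.dumps(asdict(e)) for e in self._entries)


log = AuditLog()
log.record("patch-001", "approved", "alice")
log.record("patch-002", "rejected", "alice")
print(log.to_jsonl())
```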

How We Built It

Three services connected by NATS:

Rust Telemetry Engine

Built with axum and tokio.

Handles the REST ingest API (POST /metrics)
Runs a WebSocket gateway that bridges NATS events to the browser

Chosen for throughput. It can handle thousands of telemetry payloads per second without breaking a sweat.

Python AI Agent

Uses LangGraph to orchestrate a multi-step pipeline.

Incoming telemetry goes through:

analyze node: GPT-4o with structured JSON output to detect violations
remediate node: GPT-4o again, this time generating a Terraform unified diff

Each step publishes its progress to NATS so the dashboard can stream the agent’s thought process in real time.
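The shape of that two-step flow can be sketched without LangGraph or an LLM. In the sketch below, each node takes the current state, does its work, publishes a progress event, and returns the updated state. The GPT-4o calls are replaced by a trivial rule check, and the subject name `agent.events`, the rule label, and the payload fields are hypothetical.

```python
from typing import Callable

Publish = Callable[[str, dict], None]
SUBJECT = "agent.events"  # hypothetical NATS subject name


def analyze(state: dict, publish: Publish) -> dict:
    """Stand-in for the GPT-4o analysis call: flag SSH open to the world."""
    violations = [
        {"resource": state["telemetry"]["resource"], "rule": "ssh-open-to-world"}
        for rule in state["telemetry"].get("ingress", [])
        if rule["port"] == 22 and rule["cidr"] == "0.0.0.0/0"
    ]
    publish(SUBJECT, {"step": "analyze", "violations": len(violations)})
    return {**state, "violations": violations}


def remediate(state: dict, publish: Publish) -> dict:
    """Stand-in for the GPT-4o remediation call: list the resources to patch."""
    targets = [v["resource"] for v in state["violations"]]
    publish(SUBJECT, {"step": "remediate", "targets": targets})
    return {**state, "patch_targets": targets}


def run_pipeline(telemetry: dict, publish: Publish) -> dict:
    """Run the nodes in order, threading the state through each one."""
    state = {"telemetry": telemetry}
    for node in (analyze, remediate):
        state = node(state, publish)
    return state


events = []  # collects (subject, payload) pairs instead of sending to NATS
result = run_pipeline(
    {"resource": "sg-123", "ingress": [{"port": 22, "cidr": "0.0.0.0/0"}]},
    lambda subject, payload: events.append((subject, payload)),
)
print(events)
```

Passing `publish` in as a callback keeps the nodes independent of the transport, so the same functions can be exercised in tests with a list and in production with a real NATS client.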

Next.js Dashboard

React Flow for the topology graph
shadcn/ui for the component library
A single WebSocket connection to the Rust engine

The dashboard auto-reconnects if the backend restarts. All approve/reject decisions are tracked in an audit history panel.
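The write-up does not say how reconnect attempts are spaced. One common policy is capped exponential backoff, sketched here in Python for consistency with the other examples even though the dashboard itself is TypeScript. The function name and default values are assumptions.

```python
from typing import Iterator


def backoff_delays(
    base: float = 0.5, cap: float = 10.0, attempts: int = 6
) -> Iterator[float]:
    """Yield wait times in seconds: base, 2*base, 4*base, ... never above cap."""
    for attempt in range(attempts):
        yield min(cap, base * 2 ** attempt)


delays = list(backoff_delays(base=0.5, cap=4.0, attempts=5))
print(delays)
```

In a browser client, each delay would be the wait before opening a new WebSocket, with the sequence reset once a connection succeeds.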

Messaging Layer

NATS ties everything together.

The Rust engine publishes telemetry
The Python agent subscribes and publishes events back
The Rust WebSocket gateway forwards everything to the browser

Sub-millisecond messaging latency across the NATS hops (excluding the GPT-4o calls themselves).
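The flow above can be mimicked with a tiny in-memory publish/subscribe bus. This is only a model of the message routing, not NATS itself: the `InMemoryBus` class and the subject names `telemetry.raw` and `agent.events` are made up for the example.

```python
import asyncio
from collections import defaultdict


class InMemoryBus:
    """Toy stand-in for NATS: every subscriber to a subject gets every message."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, subject: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[subject].append(queue)
        return queue

    async def publish(self, subject: str, message: dict) -> None:
        for queue in self._subscribers[subject]:
            await queue.put(message)


async def demo() -> list[dict]:
    bus = InMemoryBus()
    agent_inbox = bus.subscribe("telemetry.raw")   # the Python agent
    browser_a = bus.subscribe("agent.events")      # two dashboard clients,
    browser_b = bus.subscribe("agent.events")      # fed by the gateway

    # 1. The engine publishes telemetry.
    await bus.publish("telemetry.raw", {"resource": "sg-123", "port": 22})
    # 2. The agent consumes it and publishes an event back.
    telemetry = await agent_inbox.get()
    await bus.publish(
        "agent.events", {"step": "analyze", "resource": telemetry["resource"]}
    )
    # 3. The gateway fans the event out to every connected browser.
    return [await browser_a.get(), await browser_b.get()]


received = asyncio.run(demo())
print(received)
```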

Challenges We Ran Into

LangGraph + Python 3.14

LangChain’s Pydantic v1 compatibility layer throws warnings on Python 3.14. It still works, but the console noise was confusing at first. We had to dig through the source to confirm it was harmless.

Axum WebSocket Types

Axum requires the ws feature flag to be explicitly enabled. The type inference errors when it’s missing are not helpful at all. Took a while to realize it was just a Cargo feature issue.

React Hydration Mismatches

The WRI gauge component used Math.random() for sparkline data, which produced different values on the server and the client. We fixed it by moving the randomization into a client-side effect and adding suppressHydrationWarning to the affected element.

Getting the Layout Right

The remediation diff panel kept cutting off content when both it and the agent feed shared the same column. Spent time tuning the flex layout to give the diff viewer enough space to actually show the patch.

What We Learned

NATS is extremely fast and simple for service-to-service messaging. No schema registry. No config files. Just connect and publish.
Structured output with Pydantic models (LangChain's structured output support, used inside the LangGraph pipeline) is a clean way to get reliable JSON from an LLM without parsing hacks.
Rust + axum is a strong fit for a WebSocket gateway. The async model maps naturally to fan-out broadcasting.
Streaming the AI agent’s intermediate steps to the UI makes the system feel alive and trustworthy. Users can see why the AI made a decision before they approve it.

Built With

Rust
Python
TypeScript
Next.js
React Flow
LangGraph
LangChain
OpenAI GPT-4o
NATS
axum
tokio
shadcn/ui
Tailwind CSS
Pydantic
Docker
AWS SDK
Terraform
Redis (messaging)

Updates

Alkamal01 Aliyu started this project — Feb 28, 2026 03:45 PM EST
