Kairos AI is a secure infrastructure automation platform that uses AI agents to manage cloud and DevOps operations, reducing manual toil while keeping teams in control.
Most infrastructure failures are not hard to fix — they’re hard to detect quickly, triage correctly, and execute safely.
I was tired of the DevOps reality:
DevOps today is mostly reactive: we wait for things to break, then scramble to fix them.
So I built Kairos (INFRAai) — a DevOps agent that monitors cloud infrastructure 24/7, detects incidents, recommends safe fixes, and can execute them autonomously with human approval.
This project taught me that agentic AI is not just prompting — real autonomy requires structure, state, safety, and feedback loops.
Key things I learned:
Kairos is designed as a pipeline of connected layers:
After every action, Kairos checks metrics again to confirm the problem is actually resolved.
Early versions sometimes produced JSON that looked correct but had missing or hallucinated fields.
I solved this using strict Pydantic schema validation and enforcing structured outputs.
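A minimal sketch of what this kind of validation can look like. The `RemediationPlan` model, its field names, and the allowed actions are hypothetical and only illustrate the idea: required fields catch missing values, and `extra="forbid"` rejects fields the model invented.

```python
import json
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class RemediationPlan(BaseModel, extra="forbid"):
    """Hypothetical schema for an agent-proposed fix (not the project's actual model)."""

    incident_id: str
    action: Literal["scale_up", "restart", "no_op"]  # anything else is rejected
    target_resource: str
    replicas: int = Field(..., ge=1, le=10)  # required and bounded
    requires_approval: bool = True


def parse_plan(raw: str) -> Optional[RemediationPlan]:
    """Return a validated plan, or None if the model output is unusable."""
    try:
        return RemediationPlan(**json.loads(raw))
    except (ValidationError, json.JSONDecodeError, TypeError):
        # Missing fields, unknown fields, wrong types, or invalid JSON all land here.
        return None
```

Returning `None` instead of raising lets the caller decide whether to re-prompt the model or escalate to a human.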
Scaling or changing infra safely required preserving Terraform state.
Instead of regenerating the configuration, I made the agent perform targeted modifications only.
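The idea of a targeted modification can be sketched as follows. This assumes the configuration is available as a parsed dictionary in Terraform's JSON syntax; the function name `patch_resource` and the resource names in the usage note are hypothetical, not taken from the project.

```python
import copy
from typing import Any, Dict


def patch_resource(
    config: Dict[str, Any],
    resource_type: str,
    name: str,
    changes: Dict[str, Any],
) -> Dict[str, Any]:
    """Return a copy of `config` with only the named resource's attributes changed."""
    resources = config.get("resource", {}).get(resource_type, {})
    if name not in resources:
        raise KeyError(f"{resource_type}.{name} is not defined in this configuration")

    unknown = [key for key in changes if key not in resources[name]]
    if unknown:
        # Only existing attributes may be edited, so the agent cannot invent new ones.
        raise KeyError(f"attributes not present on {resource_type}.{name}: {unknown}")

    patched = copy.deepcopy(config)  # leave the caller's config untouched
    patched["resource"][resource_type][name].update(changes)
    return patched
```

For example, `patch_resource(config, "google_compute_instance", "web", {"machine_type": "e2-standard-4"})` changes one attribute and leaves every other resource exactly as it was, so Terraform sees a small in-place diff rather than a new set of resources.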
GCP monitoring alerts come as deeply nested JSON and vary across incident types.
I built a normalization layer that consistently extracts:
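A normalization layer of this kind might look like the sketch below. The payload shape is modeled on the `incident` object of a Cloud Monitoring webhook notification, but the exact keys read here and the set of output fields are assumptions for illustration, not the project's actual list.

```python
from typing import Any, Dict


def dig(data: Any, *path: str, default: Any = None) -> Any:
    """Walk nested dicts safely, returning `default` if any key is missing."""
    current = data
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def normalize_alert(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a nested alert payload into one consistent, flat record."""
    return {
        "incident_id": dig(payload, "incident", "incident_id", default="unknown"),
        "state": dig(payload, "incident", "state", default="unknown"),
        "policy": dig(payload, "incident", "policy_name", default="unknown"),
        "resource_type": dig(payload, "incident", "resource", "type", default="unknown"),
        "instance_id": dig(
            payload, "incident", "resource", "labels", "instance_id", default="unknown"
        ),
        "metric_type": dig(payload, "incident", "metric", "type", default="unknown"),
        "summary": dig(payload, "incident", "summary", default=""),
    }
```

Because every field falls back to a default, downstream layers always receive the same keys regardless of which incident type produced the alert.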
The biggest challenge was making an autonomous infra system that users can trust.
I solved this using:
Kairos triggers a CPU incident using a simple threshold rule:
$$ \text{CPU usage} = \frac{\text{busy time}}{\text{total time}} \times 100 $$
An alert triggers when:
$$ \text{CPU usage} > 90\% $$
After applying an action (like scaling), Kairos re-checks the metric and confirms recovery.
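The threshold rule and the post-action re-check can be sketched together. The function names, the `max_checks` parameter, and the two callables are hypothetical; in a real deployment `read_usage` would query the monitoring API and the loop would wait between checks.

```python
from typing import Callable

CPU_ALERT_THRESHOLD = 90.0  # percent, from the rule above


def cpu_usage(busy_time: float, total_time: float) -> float:
    """CPU usage in percent: busy_time / total_time * 100."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    return 100.0 * busy_time / total_time


def is_cpu_incident(usage: float, threshold: float = CPU_ALERT_THRESHOLD) -> bool:
    """An alert triggers only when usage is strictly above the threshold."""
    return usage > threshold


def remediate_and_verify(
    read_usage: Callable[[], float],
    apply_action: Callable[[], None],
    max_checks: int = 3,
) -> bool:
    """Apply one action, then re-read the metric until it recovers or checks run out."""
    apply_action()
    for _ in range(max_checks):
        if not is_cpu_incident(read_usage()):
            return True  # metric is back under the threshold
    return False  # still unhealthy: escalate instead of assuming success
```

Returning `False` rather than retrying the action keeps the agent from looping on a fix that is not working.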