SRE Sentinel

Inspiration

Every sysadmin, SRE, or DBA has faced the same problem: logs full of red at 2 AM and no safe way to get help without leaking sensitive data. Many “AI Ops” tools live in the cloud, which doesn’t work in regulated or air‑gapped environments. SRE Sentinel flips that model: an offline‑first, private SRE assistant that plans diagnostics and produces auditable reports locally.

What It Does

Host Diagnose (read‑only): Plans and runs safe SSH checks for CPU/Disk/Memory, security posture, system logs, and patch/update status. Produces a clean Markdown report with a trace_id.
Log Analyzer: Summarizes log files, highlights patterns and anomalies, and proposes next steps.
Audit Trail: Structured JSON logs (with hashed inputs) plus SQLite history of runs for trust and compliance.
Reports & History: Saves reports under reports//; keeps plan/results/report in sentinel.db.
Interfaces: CLI‑only REPL with natural language intents and slash commands (e.g., /host:diagnose, /report:last).
Offline First: Runs fully local with llama.cpp or OpenAI‑compatible local adapters (e.g., Ollama/LM Studio). Cloud providers (OpenAI/OpenRouter) are supported for development. Setting SRE_OFFLINE_ONLY=true enforces local‑only operation.
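
As an illustration of how an offline‑only switch like SRE_OFFLINE_ONLY could gate providers, here is a minimal sketch in Rust using only the standard library. The function names (is_local_endpoint, provider_allowed, offline_only_from_env) and the rule "local means localhost, 127.0.0.1, or ::1" are assumptions made for this example, not the project's actual code.

```rust
/// Hypothetical helper: true when `endpoint` points at the local machine.
/// Assumption: "local" means localhost, 127.0.0.1, or ::1.
fn is_local_endpoint(endpoint: &str) -> bool {
    // Drop the scheme ("http://", "https://") if present.
    let rest = endpoint
        .split_once("://")
        .map(|(_, r)| r)
        .unwrap_or(endpoint);
    // Keep only the authority part (before any path, query, or fragment).
    let authority = rest
        .split(|c| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Drop any userinfo ("user@host").
    let hostport = authority.rsplit('@').next().unwrap_or(authority);
    // Separate host from port, handling bracketed IPv6 literals.
    let host = if let Some(stripped) = hostport.strip_prefix('[') {
        stripped.split(']').next().unwrap_or("")
    } else {
        hostport.split(':').next().unwrap_or("")
    };
    matches!(
        host.to_ascii_lowercase().as_str(),
        "localhost" | "127.0.0.1" | "::1"
    )
}

/// Hypothetical gate: with offline-only mode on, only local endpoints pass.
fn provider_allowed(offline_only: bool, endpoint: &str) -> bool {
    !offline_only || is_local_endpoint(endpoint)
}

/// Reads the SRE_OFFLINE_ONLY variable named in the writeup.
/// Assumption: only the value "true" (case-insensitive) enables the mode.
fn offline_only_from_env() -> bool {
    std::env::var("SRE_OFFLINE_ONLY")
        .map(|v| v.eq_ignore_ascii_case("true"))
        .unwrap_or(false)
}
```

With this shape, a call such as provider_allowed(offline_only_from_env(), "https://api.openai.com/v1") returns false whenever the variable is set to true, so a cloud provider is refused before any request is made.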

How We Built It

Rust Core: CLI REPL (Rustyline + Clap), plugin registry for tools, JSON tracing logs, and SQLite persistence.
Prompt‑First Planner: The LLM plans read‑only diagnostics and later synthesizes a Markdown report from the results.
Providers: Async OpenAI‑compatible client for OpenAI/OpenRouter/local adapters; llama.cpp for in‑process local inference.
Auditing & Persistence: Daily‑rolling JSON logs, trace_id in report title and DB, reports saved by host.
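
To make the "hashed inputs plus trace_id" idea concrete, here is a small sketch of an audit log line. The writeup does not say which hash the project uses; FNV‑1a (64‑bit) is swapped in here only because it fits in a few lines of standard‑library Rust, and it is not cryptographic. A real audit trail would more likely use something like SHA‑256. The function names and field names (fnv1a64, audit_line, tool, input_hash) are hypothetical.

```rust
/// FNV-1a 64-bit hash. A non-cryptographic stand-in for whatever
/// hash the real tool uses (assumption: the project's choice is unknown).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Builds one JSON log line that records the hash of the input
/// instead of the input itself, tagged with the run's trace_id.
/// Assumption: `trace_id` and `tool` contain no characters that
/// need JSON escaping (a real logger would escape them).
fn audit_line(trace_id: &str, tool: &str, raw_input: &str) -> String {
    format!(
        "{{\"trace_id\":\"{}\",\"tool\":\"{}\",\"input_hash\":\"{:016x}\"}}",
        trace_id,
        tool,
        fnv1a64(raw_input.as_bytes())
    )
}
```

The point of the design is that the log line proves which input a run saw (the same input always yields the same hash) without storing hostnames, paths, or log contents in the audit trail.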

Challenges We Ran Into

Enforcing read‑only safety over SSH (strict allowlist, shell‑meta rejection, per‑command timeouts).
Making JSON‑only planning reliable (schema guidance + repair pass).
Balancing offline‑only guarantees with convenient dev workflows (provider/endpoint gating).
Streaming UX parity across local and OpenAI‑compatible backends.
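
The allowlist and shell‑meta rejection mentioned above can be sketched roughly as follows. The allowlisted programs, the set of rejected characters, and the names (ALLOWED_PROGRAMS, SHELL_META, Rejection, validate_command) are illustrative assumptions, not the project's real rules, and per‑command timeouts are not covered here.

```rust
/// Hypothetical allowlist of read-only programs.
const ALLOWED_PROGRAMS: &[&str] = &["uptime", "df", "free", "uname"];

/// Characters that would let a command chain, redirect, or expand.
/// Assumption: any occurrence is rejected outright, even inside quotes.
const SHELL_META: &[char] = &[
    ';', '|', '&', '$', '`', '>', '<', '(', ')', '{', '}', '\\', '*', '?',
    '"', '\'', '\n', '\r',
];

#[derive(Debug, PartialEq)]
enum Rejection {
    Empty,
    ShellMeta(char),
    NotAllowlisted(String),
}

/// Validates a planned command before it is sent over SSH.
fn validate_command(cmd: &str) -> Result<(), Rejection> {
    // Reject shell metacharacters first, wherever they appear.
    if let Some(c) = cmd.chars().find(|c| SHELL_META.contains(c)) {
        return Err(Rejection::ShellMeta(c));
    }
    // The first whitespace-separated token is the program name.
    let program = cmd.split_whitespace().next().ok_or(Rejection::Empty)?;
    if ALLOWED_PROGRAMS.contains(&program) {
        Ok(())
    } else {
        Err(Rejection::NotAllowlisted(program.to_string()))
    }
}
```

Under these assumptions, "df -h" passes, "df -h; rm -rf /" is rejected for the semicolon, and "rm -rf /" is rejected because rm is not on the list. Checking only the program name is a simplification: some otherwise read‑only tools have flags that modify state, so a stricter version would also pin the permitted arguments.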

Accomplishments We’re Proud Of

Deterministic, auditable runs with trace_id woven through logs, DB, and report.
Clean CLI UX: natural‑language intent routing and friendly slash‑command aliases.
End‑to‑end report generation, from planned checks to actionable Markdown output.
Offline‑first posture that still supports quick development on cloud providers when not restricted.
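
For readers curious how slash commands such as /host:diagnose and /report:last might be parsed, here is a minimal sketch. The SlashCommand type, the parse_slash function, and the idea that trailing words are passed as arguments are assumptions for illustration. Lines that do not parse would fall through to natural‑language intent routing.

```rust
/// Hypothetical parsed form of a slash command like "/host:diagnose web01".
#[derive(Debug, PartialEq)]
struct SlashCommand<'a> {
    namespace: &'a str,
    action: &'a str,
    args: Vec<&'a str>,
}

/// Parses "/namespace:action [args...]". Returns None for anything else,
/// so the caller can treat the line as natural language instead.
fn parse_slash(line: &str) -> Option<SlashCommand<'_>> {
    let body = line.trim().strip_prefix('/')?;
    let mut parts = body.split_whitespace();
    let head = parts.next()?;
    let (namespace, action) = head.split_once(':')?;
    if namespace.is_empty() || action.is_empty() {
        return None;
    }
    Some(SlashCommand {
        namespace,
        action,
        args: parts.collect(),
    })
}
```

Here parse_slash("/report:last") yields namespace "report" and action "last" with no arguments, while a plain sentence such as "why is the disk full?" yields None.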

What We Learned

Prompt‑first planning reduces hardcoded flows while staying safe with strict command rules.
OpenAI‑compatible adapters simplify local model swaps, but JSON‑only prompting needs careful guidance.
Auditable‑by‑default design builds trust, especially for SRE change management.
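
One common piece of a JSON "repair pass" is pulling the JSON object out of a model reply that wraps it in prose or code fences. The sketch below shows that single step; it is an assumption about what such a pass could contain, not a description of the project's implementation, and the name extract_json_object is hypothetical. The extracted text would still need to go through a real JSON parser for validation.

```rust
/// Returns the first balanced top-level `{...}` span in `raw`, if any.
/// Braces inside JSON string literals are ignored.
fn extract_json_object(raw: &str) -> Option<&str> {
    let start = raw.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, c) in raw[start..].char_indices() {
        if in_string {
            // Inside a string: track escapes and the closing quote only.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // `i` is relative to `start`; include the closing brace.
                    return Some(&raw[start..start + i + c.len_utf8()]);
                }
            }
            _ => {}
        }
    }
    // Unbalanced (for example, a truncated reply): nothing to extract.
    None
}
```

A truncated reply returns None, which gives the caller a clear signal to re‑prompt the model rather than guess at the missing plan.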

What’s Next for SRE Sentinel

Refine the diagnostics profile flag (--profile quick|full) and broaden coverage.
Clipboard copy and richer /report:open UX; propagate trace_id in more outputs.
Expand safe allowlist and add targeted config inspectors (Nginx/Postgres) under read‑only constraints.
Harden llama.cpp path and model loading ergonomics; add more tests and packaging.

Built With

actix
gpt-oss
llama.cpp
rust
sqlite

Updates

Anthony Cervantes started this project — Sep 11, 2025 07:21 PM EDT
