Inspiration

Every sysadmin, SRE, or DBA has faced the same problem: logs full of red at 2 AM and no safe way to get help without leaking sensitive data. Many “AI Ops” tools live in the cloud, which doesn’t work in regulated or air‑gapped environments. SRE Sentinel flips that model: an offline‑first, private SRE assistant that plans diagnostics and produces auditable reports locally.

What It Does

  • Host Diagnose (read‑only): Plans and runs safe SSH checks covering CPU/disk/memory, security posture, system logs, and patch/update status. Produces a clean Markdown report tagged with a trace_id.
  • Log Analyzer: Summarizes log files, highlights patterns and anomalies, and proposes next steps.
  • Audit Trail: Structured JSON logs (with hashed inputs) plus a SQLite history of runs, for trust and compliance.
  • Reports & History: Saves reports per host under reports//; keeps each run's plan, results, and report in sentinel.db.
  • Interfaces: A CLI‑only REPL with natural‑language intents and slash commands (e.g., /host:diagnose, /report:last).
  • Offline First: Runs fully locally with llama.cpp or with OpenAI‑compatible local adapters (e.g., Ollama, LM Studio). Cloud providers (OpenAI, OpenRouter) are supported for development; setting SRE_OFFLINE_ONLY=true restricts it to local providers only.
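
The SRE_OFFLINE_ONLY switch boils down to a provider gate that is checked before any request is sent. The Rust below is a hypothetical sketch, not SRE Sentinel's code: the `Provider` enum, `check_provider`, the accepted values "true"/"1", and the rule that a local adapter must point at a loopback address are all assumptions.

```rust
use std::env;
use std::net::IpAddr;

/// Hypothetical provider kinds, mirroring the backends listed above.
#[derive(Debug, Clone, PartialEq)]
pub enum Provider {
    /// In-process inference through llama.cpp.
    LlamaCpp,
    /// OpenAI-compatible server such as Ollama or LM Studio.
    LocalAdapter { endpoint: String },
    /// Cloud providers, meant for development only.
    OpenAi,
    OpenRouter,
}

impl Provider {
    /// True if requests never leave the host.
    pub fn is_local(&self) -> bool {
        match self {
            Provider::LlamaCpp => true,
            // Assumption: a "local" adapter must resolve to a loopback address.
            Provider::LocalAdapter { endpoint } => is_loopback_endpoint(endpoint),
            Provider::OpenAi | Provider::OpenRouter => false,
        }
    }
}

/// Extracts the host from a URL like "http://127.0.0.1:11434/v1" and checks it is loopback.
fn is_loopback_endpoint(endpoint: &str) -> bool {
    // Drop the scheme, then everything after the first '/'.
    let rest = endpoint.split_once("://").map_or(endpoint, |(_, r)| r);
    let authority = rest.split('/').next().unwrap_or("");
    let host = match authority.strip_prefix('[') {
        // Bracketed IPv6 literal, e.g. "[::1]:8080".
        Some(v6) => v6.split(']').next().unwrap_or(""),
        // Hostname or IPv4 address, with an optional ":port".
        None => authority.split(':').next().unwrap_or(""),
    };
    match host.parse::<IpAddr>() {
        Ok(ip) => ip.is_loopback(),
        Err(_) => host.eq_ignore_ascii_case("localhost"),
    }
}

/// Reads SRE_OFFLINE_ONLY; "true" or "1" (any case) enables offline-only mode.
pub fn offline_only_from_env() -> bool {
    env::var("SRE_OFFLINE_ONLY")
        .map(|v| matches!(v.trim().to_ascii_lowercase().as_str(), "true" | "1"))
        .unwrap_or(false)
}

/// Rejects any non-local provider when offline-only mode is on.
pub fn check_provider(provider: &Provider, offline_only: bool) -> Result<(), String> {
    if offline_only && !provider.is_local() {
        return Err(format!("{provider:?} is not allowed when SRE_OFFLINE_ONLY=true"));
    }
    Ok(())
}
```

Checking the resolved endpoint, not just the provider name, keeps an "OpenAI‑compatible" adapter from quietly pointing at a remote host while offline‑only mode is on.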

How We Built It

  • Rust Core: CLI REPL (Rustyline + Clap), plugin registry for tools, JSON tracing logs, and SQLite persistence.
  • Prompt‑First Planner: The LLM plans read‑only diagnostics and, once they have run, synthesizes a Markdown report from the results.
  • Providers: Async OpenAI‑compatible client for OpenAI/OpenRouter/local adapters; llama.cpp for in‑process local inference.
  • Auditing & Persistence: Daily‑rolling JSON logs, trace_id in report title and DB, reports saved by host.
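
One way to structure a tool plugin registry with slash‑command aliases, sketched in std‑only Rust. `Tool`, `ToolRegistry`, and the alias mechanism are hypothetical names chosen for illustration; the project's actual registry API is not described here, and natural‑language intent routing is left out.

```rust
use std::collections::HashMap;

/// Hypothetical plugin interface: each tool answers to one slash-command name.
pub trait Tool {
    fn name(&self) -> &'static str;
    fn run(&self, args: &str) -> Result<String, String>;
}

/// Maps command names (and aliases) to registered tools.
#[derive(Default)]
pub struct ToolRegistry {
    tools: HashMap<String, Box<dyn Tool>>,
    aliases: HashMap<String, String>,
}

impl ToolRegistry {
    pub fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name().to_string(), tool);
    }

    /// Makes `/alias` behave like `/target`.
    pub fn alias(&mut self, alias: &str, target: &str) {
        self.aliases.insert(alias.to_string(), target.to_string());
    }

    /// Parses "/name args..." and runs the matching tool.
    pub fn dispatch(&self, line: &str) -> Result<String, String> {
        let line = line.trim();
        let cmd = line
            .strip_prefix('/')
            .ok_or_else(|| format!("not a slash command: {line}"))?;
        // Split the command name from its arguments at the first whitespace.
        let (name, args) = cmd.split_once(char::is_whitespace).unwrap_or((cmd, ""));
        // Resolve aliases before looking up the tool.
        let name = self.aliases.get(name).map(String::as_str).unwrap_or(name);
        let tool = self
            .tools
            .get(name)
            .ok_or_else(|| format!("unknown command: /{name}"))?;
        tool.run(args.trim())
    }
}
```

For example, registering a tool named host:diagnose and calling `alias("diag", "host:diagnose")` (a made‑up alias) would make `/diag web-01` and `/host:diagnose web-01` run the same tool with the argument `web-01`.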

Challenges We Ran Into

  • Enforcing read‑only safety over SSH (strict allowlist, shell‑metacharacter rejection, per‑command timeouts).
  • Making JSON‑only planning reliable (schema guidance + repair pass).
  • Balancing offline‑only guarantees with convenient dev workflows (provider/endpoint gating).
  • Achieving streaming UX parity across local and OpenAI‑compatible backends.
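
A minimal sketch of the allowlist plus shell‑metacharacter check, assuming each planned command is validated as a plain string before it is sent over SSH. The allowlist contents and the metacharacter set are assumptions, and per‑command timeouts are not shown. Allowlisting a program name is not sufficient by itself: a production check would also constrain arguments, since some tools accept flags that change state.

```rust
/// Characters that let a POSIX shell chain, redirect, glob, or substitute commands.
const SHELL_META: &[char] = &[
    ';', '&', '|', '`', '$', '>', '<', '(', ')', '{', '}', '[', ']',
    '*', '?', '~', '!', '#', '\\', '"', '\'', '\n', '\r',
];

/// Hypothetical read-only allowlist; a real one would also restrict arguments.
const ALLOWED_PROGRAMS: &[&str] = &["uptime", "df", "free", "uname", "nproc", "lsblk"];

/// Validates a command string before it is sent over SSH.
/// Returns the argv on success, or a reason for rejection.
pub fn validate_command(cmd: &str) -> Result<Vec<&str>, String> {
    // Reject anything the remote shell could interpret beyond plain words.
    if let Some(c) = cmd.chars().find(|c| SHELL_META.contains(c)) {
        return Err(format!("rejected shell metacharacter {c:?}"));
    }
    let argv: Vec<&str> = cmd.split_whitespace().collect();
    let program = argv.first().ok_or("empty command")?;
    if !ALLOWED_PROGRAMS.contains(program) {
        return Err(format!("'{program}' is not on the read-only allowlist"));
    }
    Ok(argv)
}
```

Rejecting metacharacters outright, rather than trying to quote them, keeps the rule simple to audit: anything that could chain or redirect a command never reaches the remote shell.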

Accomplishments We’re Proud Of

  • Deterministic, auditable runs with trace_id woven through logs, DB, and report.
  • Clean CLI UX: natural‑language intent routing and friendly slash‑command aliases.
  • End‑to‑end report generation, from planned checks to actionable Markdown output.
  • An offline‑first posture that still allows quick development against cloud providers when not restricted.
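
Threading one trace_id through logs, the database, and the report mostly means minting it once per run and passing it everywhere. This hypothetical sketch (the ID format, the `audit_record` fields, and the report title wording are assumptions) also shows hashing inputs for the audit log. It uses std's `DefaultHasher` only to stay dependency‑free; that hasher is not cryptographic and its output is not guaranteed stable across Rust releases, so a real audit trail would use a digest such as SHA‑256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Per-process sequence so two IDs minted in the same nanosecond still differ.
static SEQ: AtomicU64 = AtomicU64::new(0);

/// Hypothetical trace_id format: "<unix-nanos>-<pid>-<seq>" in hex.
pub fn new_trace_id() -> String {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let seq = SEQ.fetch_add(1, Ordering::Relaxed);
    format!("{:x}-{:x}-{:x}", nanos, process::id(), seq)
}

/// Fingerprints an input without storing it verbatim.
/// DefaultHasher is NOT cryptographic and may change between Rust versions;
/// swap in SHA-256 (or similar) for a real audit trail.
pub fn hash_input(input: &str) -> String {
    let mut hasher = DefaultHasher::new();
    input.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// One JSON audit line. `trace_id` and `event` are not escaped,
/// so pass generated IDs and fixed event identifiers only.
pub fn audit_record(trace_id: &str, event: &str, input: &str) -> String {
    format!(
        r#"{{"trace_id":"{trace_id}","event":"{event}","input_hash":"{}"}}"#,
        hash_input(input)
    )
}

/// Report heading that carries the same trace_id as the logs and DB row.
pub fn report_title(host: &str, trace_id: &str) -> String {
    format!("# Host diagnosis for {host} (trace_id: {trace_id})")
}
```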

What We Learned

  • Prompt‑first planning reduces hardcoded flows, while strict command rules keep it safe.
  • OpenAI‑compatible adapters simplify swapping local models, but JSON‑only prompting needs careful guidance.
  • An auditable‑by‑default design builds trust, especially for SRE change management.
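
Local models often wrap JSON in prose or Markdown fences. A cheap first repair step, sketched here as an assumption rather than SRE Sentinel's actual repair pass, is to extract the first balanced top‑level object while ignoring braces inside string literals. The result still needs real parsing and schema validation before a plan is trusted.

```rust
/// Returns the first balanced top-level JSON object in `raw`, skipping any
/// prose or fences around it. Braces inside string literals are ignored.
/// Returns None if no complete object is found (e.g. truncated output).
pub fn extract_json_object(raw: &str) -> Option<&str> {
    let start = raw.find('{')?;
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, c) in raw[start..].char_indices() {
        if in_string {
            // Inside a string: only track escapes and the closing quote.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    // '}' is one byte, so the slice ends right after it.
                    return Some(&raw[start..start + i + 1]);
                }
            }
            _ => {}
        }
    }
    None
}
```

Returning `None` for truncated output, instead of guessing at missing braces, leaves the decision to re‑prompt the model with the caller.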

What’s Next for SRE Sentinel

  • Refine the diagnostics profile flag (--profile quick|full) and broaden check coverage.
  • Add clipboard copy and a richer /report:open UX; propagate trace_id into more outputs.
  • Expand the safe allowlist and add targeted config inspectors (Nginx, Postgres) under read‑only constraints.
  • Harden the llama.cpp path and model‑loading ergonomics; add more tests and packaging.
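
The --profile quick|full flag could map to check categories roughly as follows. This is a hypothetical, std‑only sketch: the `Profile` type and the category names (taken from the Host Diagnose checks) are assumptions, and a Clap‑based CLI would more likely derive the parsing than hand‑write `FromStr`.

```rust
use std::str::FromStr;

/// Hypothetical diagnostics profiles selectable with --profile.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    Quick,
    Full,
}

impl FromStr for Profile {
    type Err = String;

    /// Case-insensitive parse of "quick" or "full".
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "quick" => Ok(Profile::Quick),
            "full" => Ok(Profile::Full),
            other => Err(format!("unknown profile '{other}' (expected quick|full)")),
        }
    }
}

impl Profile {
    /// Check categories each profile plans; Full is a superset of Quick.
    pub fn categories(self) -> &'static [&'static str] {
        match self {
            Profile::Quick => &["cpu", "disk", "memory"],
            Profile::Full => &["cpu", "disk", "memory", "security", "system_logs", "patch_status"],
        }
    }
}
```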
