## Inspiration

Three real problems converge for anyone running their own Linux server:

  1. The expertise gap. Linux server hardening, package patching, firewall management, and cron orchestration: every team needs them, but usually only senior engineers handle them confidently. Misconfiguration is among the most common root causes of breaches at this scale.
  2. Tool fragmentation. Existing admin tools each cover one slice (state inspection, metrics, container logs, firewall management), and they don't talk to each other. Sysadmins context-switch across half a dozen tools to do a single job.
  3. AI agents that talk vs. agents that act. Most "AI assistants" describe what an admin should do. Almost none can do it safely on a real host, with auditability and consent.

I wanted one console that addresses all three.

## What it does

MonitShark is a self-hosted web app you run with one Docker command. It gives you:

  • Live dashboard: CPU, memory, disk, and network metrics streamed over WebSockets, plus top processes and open alerts
  • System page: per-core CPU, per-disk I/O, per-NIC throughput, sensors (temps, fans, battery), kernel modules, and listening ports with their PIDs
  • Services: every systemd unit, with start / stop / restart behind a confirmation
  • Docker: containers grouped by Compose project, live log streaming over WebSocket, lifecycle actions
  • Cron: per-user tabs plus the system crontab, with full CRUD and run-now
  • Scripts: a bash editor for scripts under /opt/cockpit/scripts/; run with a timeout, install as a systemd service, or schedule via cron
  • Audit: 4 security audits (SSH, users, permissions, packages) with ~20 checks ranked by severity and one-click fixes
  • Firewall: add / delete / enable / disable host firewall rules, with action, port, protocol, source filter, and comment fields
  • Updates: distro-aware (apt/dnf), with security-only updates available separately
  • Permissions: file browser scoped to /etc, /opt, /var/log, /home, and /root, with chmod and chown
  • Logs: tail any file under /var/log, regex search, and an "Ask the agent to analyze" handoff to chat
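The Permissions and Logs pages are described as scoped to a fixed set of roots. A minimal sketch of such a scope check, assuming the roots listed above; the names ALLOWED_ROOTS, check_allowed, and PathNotAllowed are hypothetical, not MonitShark's code. It compares path components rather than string prefixes, so /etcetera is not mistaken for something under /etc.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical allowlist mirroring the roots the Permissions page is scoped to.
ALLOWED_ROOTS = tuple(
    PurePosixPath(p) for p in ("/etc", "/opt", "/var/log", "/home", "/root")
)


class PathNotAllowed(ValueError):
    """Raised when a requested path falls outside every allowed root."""


def check_allowed(raw: str) -> PurePosixPath:
    """Lexically normalize `raw` and return it if it sits under an allowed root.

    This check is purely lexical: it collapses "." and ".." segments but does
    not follow symlinks, so a real implementation should also resolve the path
    on the host and re-check the resolved result.
    """
    if not raw.startswith("/"):
        raise PathNotAllowed(f"absolute path required: {raw!r}")
    # normpath turns "/etc/../root/x" into "/root/x" before the comparison.
    normalized = PurePosixPath(posixpath.normpath(raw))
    for root in ALLOWED_ROOTS:
        # is_relative_to compares whole components, unlike str.startswith.
        if normalized.is_relative_to(root):
            return normalized
    raise PathNotAllowed(f"outside allowed roots: {raw!r}")
```

Normalizing before comparing is what stops a request like /var/log/../../usr/bin from slipping through a naive prefix check.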

The differentiator is a chat agent (Groq + LangGraph) with 51 tools, 21 of them gated by an explicit confirmation card. The agent calls the same Python modules the REST API uses. When it wants to do something destructive, the LangGraph state machine pauses on langgraph.types.interrupt(), surfaces a confirmation card in the React drawer, and resumes only after the user clicks Allow. Confirmation lives in the graph topology, not in a prompt the LLM could ignore.
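To illustrate the pause/resume idea without depending on LangGraph, here is a stdlib-only sketch that uses a Python generator as a stand-in for interrupt() and resume. It does not reproduce LangGraph's API; gated_tool_call, drive, ConfirmationRequest, and ToolResult are hypothetical names. The destructive branch suspends until the caller sends a decision, and the model never gets a code path around it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Generator


@dataclass
class ConfirmationRequest:
    """Payload a UI could render as a confirmation card (hypothetical shape)."""
    tool: str
    args: dict[str, Any]


@dataclass
class ToolResult:
    tool: str
    executed: bool
    output: Any = None


def gated_tool_call(
    tool: str,
    args: dict[str, Any],
    run: Callable[..., Any],
    destructive: bool,
) -> Generator[ConfirmationRequest, bool, ToolResult]:
    """Run `run(**args)`, pausing for consent first if the tool is destructive.

    The generator yields a ConfirmationRequest and suspends; the caller resumes
    it with send(True) for Allow or send(False) for Deny.
    """
    if destructive:
        allowed = yield ConfirmationRequest(tool, args)
        if not allowed:
            return ToolResult(tool, executed=False)
    return ToolResult(tool, executed=True, output=run(**args))


def drive(
    gen: Generator[ConfirmationRequest, bool, ToolResult],
    decide: Callable[[ConfirmationRequest], bool],
) -> ToolResult:
    """Drive a gated call to completion, asking `decide` at the pause point."""
    try:
        request = next(gen)
    except StopIteration as stop:  # non-destructive: finished without pausing
        return stop.value
    try:
        gen.send(decide(request))
    except StopIteration as stop:  # resumed with the user's decision and finished
        return stop.value
    raise RuntimeError("tool paused more than once")
```

In a real deployment, decide would be replaced by a round trip over the WebSocket to the confirmation card; the sketch only shows that the side effect cannot run before the decision arrives.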

## How we built it

  • Backend: Python 3.11, FastAPI, uvicorn, LangGraph 0.2, langchain-groq, psutil, pystemd, python-crontab, aiosqlite, PyJWT, passlib[bcrypt], docker SDK, distro
  • Frontend: React 18, Vite, TypeScript (strict), Tailwind CSS, shadcn-style primitives, TanStack Query, Recharts, axios, react-markdown, sonner
  • Reverse proxy / TLS: Caddy 2.8 with tls internal (local CA, self-signed)
  • Distribution: docker-compose, single host

The backend container runs with --privileged and --pid=host, with the host's root filesystem bind-mounted read-write at /host (/:/host:rw). Mutating commands use nsenter --target 1 to execute in the host's namespaces; reads go through the bind mount.
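A minimal sketch of the two access paths, assuming the /host mount point above. Which namespaces to enter is an assumption (the source only says --target 1); the sketch enters the mount, UTS, IPC, network, and PID namespaces via nsenter's standard flags. host_command and host_read_path are hypothetical helpers that only build arguments and paths; nothing is executed here.

```python
from pathlib import PurePosixPath

HOST_ROOT = PurePosixPath("/host")  # where the container bind-mounts the host's /

# Assumed namespace set: mount, UTS, IPC, network, PID of host PID 1.
NSENTER_PREFIX = [
    "nsenter", "--target", "1",
    "--mount", "--uts", "--ipc", "--net", "--pid",
    "--",  # everything after this is the command to run on the host
]


def host_command(argv: list[str]) -> list[str]:
    """Wrap argv so it runs inside PID 1's namespaces (needs --pid=host + privileges)."""
    if not argv:
        raise ValueError("empty command")
    return [*NSENTER_PREFIX, *argv]


def host_read_path(host_path: str) -> PurePosixPath:
    """Translate an absolute host path to its location under the bind mount.

    This does no ".." or symlink checking; pair it with a path allowlist.
    """
    p = PurePosixPath(host_path)
    if not p.is_absolute():
        raise ValueError(f"absolute path required: {host_path!r}")
    return HOST_ROOT / p.relative_to("/")
```

For example, host_command(["systemctl", "restart", "nginx"]) yields an argv list for the subprocess gate to execute, while host_read_path("/etc/os-release") points reads at /host/etc/os-release without entering any namespace.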

Safety invariants:

  • One subprocess gate (app/util/sh.run): a CI test fails if any other module imports subprocess
  • Path allowlists for log files, scripts, the file browser, and cron paths
  • Pydantic + regex validation on every tool input before it reaches the host
  • All 21 destructive tools run through the confirmation gate
  • JWT auth (HS256, users.yml + bcrypt) on every REST and WebSocket endpoint
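The single-subprocess-gate invariant can be enforced with a static check. Here is a minimal sketch using the stdlib ast module, assuming the gate lives at app/util/sh.py (inferred from app/util/sh.run, not confirmed). It flags import subprocess and from subprocess import ... anywhere in a module, including inside functions; dynamic tricks such as importlib.import_module("subprocess") are out of scope for this check.

```python
import ast
from pathlib import Path

GATE_MODULE = Path("app/util/sh.py")  # assumption: file that owns sh.run


def imports_subprocess(source: str) -> bool:
    """Return True if the module source imports subprocess in any static form."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "subprocess" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # level == 0 means an absolute import, not a relative one.
            if node.level == 0 and node.module and node.module.split(".")[0] == "subprocess":
                return True
    return False


def find_violations(package_root: Path, gate: Path) -> list[Path]:
    """List .py files under package_root, other than the gate, that import subprocess."""
    violations = []
    for path in sorted(package_root.rglob("*.py")):
        if path.resolve() == gate.resolve():
            continue  # the gate itself is the one module allowed to import it
        if imports_subprocess(path.read_text(encoding="utf-8")):
            violations.append(path)
    return violations


def test_single_subprocess_gate() -> None:
    """pytest-style check a CI job could run from the repository root."""
    assert find_violations(Path("app"), GATE_MODULE) == []
```

Because the check walks the AST rather than grepping, comments and strings that merely mention subprocess do not trigger false positives.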

## Challenges we ran into

  • LLM tool-call format quirks. Some completions emit malformed function-call syntax. We worked around this with retries and a graceful fallback to a friendly user-facing error.
  • Free-tier rate limits. Binding all 51 tools costs roughly 6-8k tokens per request. We switched the default model to llama-3.3-70b-versatile (higher tokens-per-minute ceiling) and added a 2.5 s throttle between outgoing requests.
  • WebSocket interrupt resumption. Surfacing the LangGraph interrupt() payload as a React confirmation card, and feeding the user's response back to resume the graph, required handling astream(stream_mode="updates") and its special __interrupt__ chunk correctly.
  • Self-signed cert + Docker networking. Caddy on the bridge network couldn't reach a backend on host networking, so we moved the backend onto the bridge network with expose: 8000.
  • Bcrypt $ escaping. Compose interprets $ as variable expansion, so every $ in an admin password hash must be doubled to $$. This is documented prominently in .env.example.
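The 2.5 s throttle between outgoing requests could look roughly like the following asyncio sketch. RequestThrottle is a hypothetical name, not the project's code. An asyncio.Lock serializes callers so that concurrent chat sessions queue up instead of bursting past the rate limit, and time.monotonic() keeps wall-clock adjustments from shortening the gap.

```python
import asyncio
import time


class RequestThrottle:
    """Enforce a minimum gap between outgoing LLM requests (hypothetical helper).

    Await wait() immediately before each request to the model provider.
    """

    def __init__(self, min_interval: float = 2.5) -> None:
        self.min_interval = min_interval
        self._lock = asyncio.Lock()  # no loop binding at creation on Python 3.10+
        self._last = float("-inf")   # so the very first request is never delayed

    async def wait(self) -> None:
        async with self._lock:
            delay = self._last + self.min_interval - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            # Stamp after sleeping so the gap is measured from the actual send time.
            self._last = time.monotonic()
```

Holding the lock across the sleep is deliberate: it makes the throttle global to the process rather than per session, which matches a single shared free-tier quota.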

## Accomplishments that we're proud of

  • The confirmation gate works end to end: agent proposes → user clicks Allow → host changes. Verified live on Ubuntu 24.04.
  • 51 tools, 11 management surfaces, 200 source files, all in one cohesive build with consistent design tokens (HSL Tailwind variables, light/dark themes, amber accent).
  • Genuine cross-distro support: apt/dnf detection, plus nsenter for host-namespace execution.
  • Real safety architecture, not security theatre: the single subprocess gate, enforced by a unit test, is the kind of invariant production codebases should have.
  • One-command bootstrap: ./start.sh generates a JWT secret, creates config/ from config.example/, and brings up the entire stack.
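The project lists the distro package for apt/dnf detection. As a stdlib-only illustration of the same idea, here is a sketch that reads os-release text (on the host, presumably via the /host bind mount) and maps ID / ID_LIKE to a package manager. The mapping table is an assumption, and the quote handling is simplified compared with the full os-release format.

```python
def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release KEY=value lines, stripping simple surrounding quotes."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def package_manager(os_release_text: str) -> str:
    """Pick 'apt' or 'dnf' from ID and ID_LIKE; raise for anything else."""
    info = parse_os_release(os_release_text)
    ids = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    if ids & {"debian", "ubuntu"}:
        return "apt"
    if ids & {"fedora", "rhel", "centos"}:
        return "dnf"
    raise ValueError(f"unsupported distro: {sorted(ids - {''})}")
```

Checking ID_LIKE as well as ID is what lets derivatives (for example a distro whose ID_LIKE includes rhel) fall into the right family without listing every distribution by name.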

## What we learned

  • LangGraph's interrupt() is a remarkably elegant primitive for human-in-the-loop AI. Putting consent in the graph topology (not in a prompt or a tool wrapper) means the model cannot bypass it.
  • Aggressive path allowlists plus a single subprocess gate (with a CI test enforcing it) catch more issues than dozens of ad-hoc validations scattered across modules.
  • For a chat agent on a rate-limited LLM, every tool passed to bind_tools() adds a fixed token overhead to each request. Trimming tool docstrings is the cheapest win.
  • Self-hosted privileged tools shouldn't be exposed publicly, even for a demo; a recorded video is the right "live link" for this category.
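In LangChain, a tool's description is typically taken from its docstring, which is why trimming docstrings shrinks the per-request overhead. This rough sketch estimates that overhead. The schema shape is simplified (not what langchain actually serializes), and the ~4 characters per token figure is only a rule of thumb; real counts depend on the model's tokenizer. All function names are hypothetical.

```python
import inspect
import json
from typing import Callable


def trimmed_description(fn: Callable) -> str:
    """Keep only the first paragraph of a tool's docstring."""
    doc = inspect.getdoc(fn) or ""
    return doc.split("\n\n", 1)[0].replace("\n", " ").strip()


def rough_tokens(text: str) -> int:
    """Very rough token estimate using a ~4 characters per token heuristic."""
    return max(1, len(text) // 4) if text else 0


def schema_overhead(tools: list[Callable], trim: bool) -> int:
    """Estimate tokens spent describing tools on every request (simplified schema)."""
    total = 0
    for fn in tools:
        description = trimmed_description(fn) if trim else (inspect.getdoc(fn) or "")
        params = {name: "string" for name in inspect.signature(fn).parameters}
        schema = {"name": fn.__name__, "description": description, "parameters": params}
        total += rough_tokens(json.dumps(schema))
    return total
```

Multiplying a per-tool saving by 51 bound tools is what makes this the cheapest lever: the cost is paid on every request, whether or not the model calls the tool.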

## What's next for MonitShark

  • Comprehensive audit expansion: kernel sysctls, mount options, firewall posture, failed-login bursts, CIS benchmark mapping
  • Multi-host fleet management: a MonitShark hub that coordinates agents across N machines
  • Open-ended audit mode: let the LLM plan checks dynamically instead of calling a fixed audit set
  • Frontend code-splitting: route-level splitting could roughly halve the current 1.2 MB JS bundle
  • Custom dashboard layouts: drag-resize panels, saved views per user
