Inspiration

We all use AI agents daily — Cursor, Claude, Copilot, OpenClaw. They write code, run shell commands, read files, and make HTTP requests on our behalf. But here's the uncomfortable truth: these agents have real-world power without a permission system.

The moment that crystallized this for us: recent viral reports of OpenClaw (the "big lobster") autonomously performing dangerous actions — deleting files, accessing credentials, running unchecked shell commands — with no human in the loop. The AI community was alarmed, but the response was mostly "just don't give it access." That's not a solution — that's giving up on the productivity gains.

Around the same time, a teammate watched Cursor autonomously run rm -rf on a build directory. It happened to be the right directory — but it could have been the wrong one. There was no confirmation, no audit trail, no way to scope what it was allowed to touch.

These weren't hypothetical risks. They were happening in real workflows, to real developers:

  • Agents reading .env files and leaking API keys
  • Agents executing dangerous CLI commands (rm -rf, nc -e, curl | bash)
  • Agents looping and causing runaway actions with no kill switch
  • And zero standard permission systems for any of this

The OpenClaw incidents proved that as agents get more capable, the risk doesn't scale linearly — it scales explosively. We asked ourselves: "Networks have firewalls. Operating systems have permission models. Why don't AI agents?"

That's why we built AgentWatchDog.

What It Does

AgentWatchDog is a real-time permission system for AI agents. It sits between the agent and the operating system, intercepting every tool call before execution.

Three core capabilities:

  1. Block by default — Dangerous actions (shell commands, secret file access, data exfiltration) are blocked before they execute. Not logged after the fact — prevented.

  2. Scoped Access — When an agent needs to perform a risky action, it triggers an approval request. A human reviews it, grants access for a specific scope and time window, and the agent proceeds autonomously within that scope — no repeated prompts.

  3. Full Visibility — Every action attempt, every block, every scoped access grant is logged with structured context: who (agent_id, user_id), what (tool, args), when (timestamp), result (allow/block), why (risk score, matched rule).
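On the agent side, this interception can be wired in with a small guard decorator that asks the firewall for a verdict before the tool runs. A minimal stdlib-only sketch — the request/response field names (`tool`, `args`, `decision`, `reason`) and the injectable `intercept` parameter are illustrative, not the shipped SDK API:

```python
import json
import urllib.request
from functools import wraps

FIREWALL_URL = "http://127.0.0.1:3001/v1/intercept"

def http_intercept(payload):
    """POST the tool call to the firewall and return its verdict as a dict."""
    req = urllib.request.Request(
        FIREWALL_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def guard(tool_name, intercept=http_intercept):
    """Decorator: block the wrapped tool call unless the firewall allows it."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            verdict = intercept({"tool": tool_name, "args": [str(a) for a in args]})
            if verdict.get("decision") != "allow":
                # Blocked before execution, not logged after the fact.
                raise PermissionError(f"blocked: {verdict.get('reason', 'policy')}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```

A tool wrapped with `@guard("shell")` then raises `PermissionError` instead of executing when the firewall says block.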

How We Built It

Architecture: Two Layers of Defense

Agent (OpenClaw / Cursor / CLI)
        │
        ▼
   Python SDK (@firewall.guard)
        │
        ▼
┌───────────────────────────┐
│  HTTP Firewall (:3001)    │  ← Layer 1: Tool-call interception
│  Policy Engine (7 rules)  │
│  Risk Engine (0-100 score)│
│  Anti-Hijack Gateway      │
│  Audit Store              │
└───────────────────────────┘
        │
        ▼
┌───────────────────────────┐
│  eBPF Kernel Layer        │  ← Layer 2: Unbypassable safety net
│  sys_enter_openat hook    │
│  Sensitive file detection │
└───────────────────────────┘
        │
        ▼
   React Dashboard (:3000)

Layer 1 — HTTP Firewall (Rust + Tokio + Axum): Every tool call hits POST /v1/intercept before execution. A three-dimensional risk scoring engine evaluates: tool weight ($0$–$40$) + argument danger ($0$–$40$) + call frequency ($0$–$20$) = total risk ($0$–$100$). The policy engine matches against 7 configurable rules covering dangerous shells, data exfiltration, SQL injection, and credential access. P99 latency: < 2ms.

$$\text{RiskScore} = W_{\text{tool}} + D_{\text{args}} + F_{\text{burst}}$$

Where $W_{\text{tool}} \in [0, 40]$ is the tool category weight, $D_{\text{args}} \in [0, 40]$ is computed from 25+ dangerous argument patterns, and $F_{\text{burst}} \in [0, 20]$ penalizes abnormal call frequency within a 60-second sliding window.
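A toy version of that scoring, with made-up weights and only a handful of the dangerous-argument patterns (the real engine's weights and 25+ patterns are not reproduced here):

```python
import re
import time
from collections import deque

TOOL_WEIGHTS = {"shell": 40, "http": 25, "file_read": 15}  # W_tool, illustrative values
DANGEROUS_PATTERNS = [                                      # small subset, illustrative
    r"rm\s+-rf", r"nc\s+-e", r"curl[^|]*\|\s*(ba)?sh", r"\.env\b",
]

class RiskEngine:
    def __init__(self, window_s=60, burst_threshold=10):
        self.window_s = window_s
        self.burst_threshold = burst_threshold
        self.calls = deque()  # call timestamps inside the sliding window

    def score(self, tool, args, now=None):
        now = time.monotonic() if now is None else now
        # F_burst in [0, 20]: penalize abnormal frequency in a 60 s sliding window
        self.calls.append(now)
        while self.calls and now - self.calls[0] > self.window_s:
            self.calls.popleft()
        f_burst = min(20, max(0, len(self.calls) - self.burst_threshold) * 2)
        # D_args in [0, 40]: 10 points per matched dangerous pattern, capped
        joined = " ".join(args)
        d_args = min(40, 10 * sum(bool(re.search(p, joined)) for p in DANGEROUS_PATTERNS))
        # W_tool in [0, 40]: category weight, modest default for unknown tools
        w_tool = TOOL_WEIGHTS.get(tool, 10)
        return w_tool + d_args + f_burst
```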

Layer 2 — eBPF Kernel Monitor (Rust + Aya framework): Hooks into sys_enter_openat at the kernel level. Even if an agent bypasses the SDK entirely and directly calls open(), the kernel still catches it. This layer is unbypassable — no user-space code can skip it.

Anti-Hijack Gateway: Replay protection (nonce + timestamp window), emergency Kill-Switch (freeze all dangerous operations instantly), and Step-Up authentication for high-risk tools.
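The replay half of that gateway boils down to a nonce cache with a freshness window. A sketch — the 30-second window and field shapes are assumptions, not the shipped protocol:

```python
import time

class ReplayGuard:
    """Reject requests whose nonce was seen before or whose timestamp is stale."""

    def __init__(self, window_s=30):
        self.window_s = window_s
        self.seen = {}  # nonce -> timestamp

    def check(self, nonce, ts, now=None):
        now = time.time() if now is None else now
        # Evict expired nonces so the cache stays bounded by the window.
        self.seen = {n: t for n, t in self.seen.items() if now - t <= self.window_s}
        if abs(now - ts) > self.window_s:
            return False  # stale or future-dated timestamp
        if nonce in self.seen:
            return False  # replayed nonce
        self.seen[nonce] = ts
        return True
```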

Tech Stack

| Component | Technology | Why |
| --- | --- | --- |
| Core Engine | Rust + Tokio + Axum | Memory-safe, zero-GC, P99 < 2ms |
| Kernel Layer | eBPF via Aya framework | Unbypassable, zero-intrusion |
| Shared Types | `#[repr(C)]` no_std structs | Zero-copy kernel↔userspace |
| Dashboard | React 18 + Vite 6 + Tailwind 4 | Real-time WebSocket updates |
| Agent SDK | Python (stdlib only, zero deps) | 3-line integration |
| Config | TOML | Human-readable, hot-reloadable |
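As a sketch of what a hot-reloadable rule can look like — the field names below are illustrative, not the shipped schema; only the rule id block-dangerous-shell comes from our actual config:

```toml
# Hypothetical rule shape — field names are illustrative.
[[rules]]
id       = "block-dangerous-shell"
tool     = "shell"
patterns = ["rm -rf", "nc -e", "curl | bash"]
action   = "block"

[[rules]]
id       = "stepup-credential-access"
tool     = "file_read"
patterns = ["**/.env", "**/id_rsa"]
action   = "step_up"
```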

Challenges We Faced

  1. eBPF stack limit (512 bytes): The Linux kernel limits eBPF program stack to 512 bytes. Our FileOpenEvent struct with a 256-byte filename field nearly blew this limit. We had to use PerCpuArray scratch buffers — a non-obvious pattern that took significant debugging.

  2. Pipeline ordering: Our Anti-Hijack gateway initially ran before the policy engine. This meant explicit block rules (e.g., block-dangerous-shell) never fired — the gateway returned "Step-Up Required" first. We had to restructure the entire pipeline: replay check → policy engine → risk gate → audit.

  3. Cross-compilation: eBPF programs compile to bpfel-unknown-none target with #![no_std]. The user-space daemon compiles to x86_64-unknown-linux-gnu. Shared types must work in both worlds — no heap, no String, no Vec, fixed-size [u8; N] arrays only. This constraint shaped our entire data model.

  4. Balancing security with usability: Blocking everything is easy but unusable. The "Scoped Access" concept — where users pre-approve specific risky actions for a defined scope — was our answer to making security practical rather than annoying.

  5. Learning from OpenClaw incidents: The widely reported cases of OpenClaw performing dangerous autonomous actions validated our threat model but also raised the bar — we needed to handle not just obvious attacks like rm -rf, but subtle ones like data exfiltration via curl to legitimate-looking URLs and SQL injection through agent tool calls.
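The restructured ordering from challenge 2 reduces to a small dispatch function. A sketch with illustrative callbacks and threshold, not the Rust implementation:

```python
def intercept(request, replay_guard, policy_rules, risk_engine, audit_log, threshold=70):
    """Restructured pipeline: replay check -> policy engine -> risk gate -> audit."""
    # 1. Replay check: reject stale or duplicate requests outright.
    if not replay_guard(request):
        decision = ("block", "replay")
    # 2. Policy engine: explicit rules fire before any step-up logic.
    elif (rule := policy_rules(request)) is not None:
        decision = (rule["action"], rule["id"])
    # 3. Risk gate: score the call; high-risk calls require step-up auth.
    elif risk_engine(request) >= threshold:
        decision = ("step_up", "high-risk")
    else:
        decision = ("allow", "default")
    # 4. Audit: every decision is recorded, allowed or not.
    audit_log.append({"request": request, "decision": decision})
    return decision
```

With this ordering, an explicit rule like block-dangerous-shell always wins over the step-up path, which was exactly the bug in our first version.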

What We Learned

  • Pre-execution beats post-mortem. Logging that an agent deleted your SSH keys is useless. Blocking the deletion before it happens is the only thing that matters.
  • Two layers > one layer. The HTTP firewall is the primary defense, but agents can bypass SDKs. eBPF at the kernel level is the unbypassable safety net. Neither is sufficient alone.
  • Deterministic rules > LLM-based detection. Our risk engine uses pattern matching and weighted scoring — no LLM in the loop. This gives us predictable latency (< 2ms vs 100–500ms for LLM guardrails) and auditable decisions.
  • Agents need permission systems, not just guardrails. LLM guardrails filter what agents say. We intercept what agents do. These are fundamentally different problems.
  • The OpenClaw wake-up call is real. When a capable autonomous agent goes rogue in production, the blast radius is enormous. The community needs tooling now, not after the next incident.

What's Next

  • Scoped Access UI — Full approval workflow in the dashboard
  • Network-layer enforcement — iptables integration so all agent traffic must pass through the firewall
  • Persistent audit logs — SQLite backend so nothing is lost on restart
  • Kubernetes sidecar mode — Zero-trust agent isolation in cloud-native environments

Built With

  • amazon-web-services
  • axum
  • claudcode
  • ebpf
  • neo4j
  • pulse
  • python
  • react
  • rust
  • tavily
  • tokio