## Inspiration

AI agent frameworks like LangChain, CrewAI, and OpenAI function calling are exploding in adoption — but nobody is securing the boundary between agents and their tools. Real attack vectors exist today:

  • A web scrape returns hidden text that hijacks the agent's behavior
  • An agent gets tricked into accessing 169.254.169.254 — the AWS metadata endpoint that leaks IAM credentials
  • A research-only agent gets manipulated into executing shell commands

We kept seeing these vulnerabilities discussed in security research but no practical, framework-agnostic tool existed to defend against them. So we built one.

## What it does

ClawGuard is a real-time security gateway that sits between AI agents and their tools. Instead of calling a tool directly, agents route requests through ClawGuard's proxy, which runs three security layers before the response reaches the agent:

  1. Policy Enforcement — YAML-defined per-agent tool permissions with glob patterns and CIDR-based network deny lists that block SSRF before the request ever leaves
  2. Prompt Injection Detection — 10 weighted regex patterns with cumulative scoring, plus an optional Groq LLM classifier for ambiguous cases. Includes Unicode zero-width character evasion detection and NFKC normalization
  3. Real-Time Audit Trail — Every interaction is logged to SQLite, risk-scored, and streamed to a live dashboard via WebSocket
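The policy layer's two checks (glob-matched tool permissions and CIDR-based network deny lists) can be sketched with the standard library. Everything below — the schema, agent/tool names, and function names — is an illustrative assumption, not ClawGuard's actual API:

```python
import fnmatch
import ipaddress

# Hypothetical policy data; the real YAML schema is not shown here.
DENY_CIDRS = [ipaddress.ip_network(c) for c in ("169.254.0.0/16", "10.0.0.0/8", "127.0.0.0/8")]
AGENT_TOOL_GLOBS = {"research-agent": ["search.*", "fetch.url"]}

def tool_allowed(agent: str, tool: str) -> bool:
    """Allow the call only if the tool matches one of the agent's glob patterns."""
    return any(fnmatch.fnmatch(tool, pat) for pat in AGENT_TOOL_GLOBS.get(agent, []))

def ip_blocked(ip: str) -> bool:
    """Block SSRF targets: any resolved IP that falls inside a denied CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DENY_CIDRS)

print(tool_allowed("research-agent", "fetch.url"))  # True: matches an allowed glob
print(ip_blocked("169.254.169.254"))                # True: inside 169.254.0.0/16
```

Checking the resolved IP rather than the hostname is what lets a CIDR deny list stop SSRF before the request leaves the proxy.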

It's completely framework-agnostic. Any AI agent that makes HTTP calls can route through it with a single line change.

## How we built it

Backend: Python with FastAPI, fully async. The proxy pipeline flows through policy check → httpx forwarding → response scanning → return/block. Detection uses a weighted scoring system where each regex pattern has a severity weight (0.3–0.95), and a cumulative score determines the risk level. For medium-risk ambiguous cases, an optional Groq LLM classifier (llama-3.3-70b) provides semantic analysis with a 3-second timeout.
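The weighted-scoring idea can be sketched in a few lines. The patterns, weights, and thresholds below are illustrative stand-ins, not the project's real 10-pattern rule set:

```python
import re

# Illustrative subset of weighted patterns (real weights span 0.3-0.95 per the write-up).
PATTERNS = [
    (re.compile(r"ignore (all )?previous instructions", re.I), 0.9),
    (re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"), 0.3),              # long base64-like blob
    (re.compile(r"(run|execute)\b.*\bshell", re.I), 0.5),
]

def risk_score(text: str) -> float:
    """Cumulative weighted score with per-pattern deduplication:
    each pattern contributes its weight at most once, however often it matches."""
    return sum(weight for pat, weight in PATTERNS if pat.search(text))

def risk_level(score: float) -> str:
    if score >= 0.8:
        return "high"
    if score >= 0.4:
        return "medium"  # ambiguous band: where an LLM classifier could be consulted
    return "low"
```

Because each pattern is counted once via a boolean match, a payload that repeats the same trick fifty times scores the same as one occurrence, while combining distinct tricks pushes the score up.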

Frontend: Next.js with TypeScript and Tailwind CSS. The dashboard connects via WebSocket for real-time event streaming — you can watch attacks get blocked as they happen. Recharts for threat timeline visualization.

SDK: A pip-installable Python package (clawguard) with an async client, a @protect decorator that scans function return values, and ready-made wrappers for LangChain and CrewAI.
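A decorator that scans return values might look roughly like the sketch below. The decorator name matches the text, but its signature, the threshold default, and the `ThreatBlocked` exception are assumptions, and the real SDK client is async where this sketch is synchronous for brevity:

```python
import functools

class ThreatBlocked(Exception):
    """Raised when a scanned return value exceeds the risk threshold."""

def scan(text: str) -> float:
    # Stand-in for the real detector; returns a risk score in [0, 1].
    return 0.9 if "ignore previous instructions" in text.lower() else 0.0

def protect(threshold: float = 0.8):
    """Sketch of a @protect-style decorator: scan the wrapped function's
    return value and raise instead of handing a hostile payload to the agent."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if isinstance(result, str) and scan(result) >= threshold:
                raise ThreatBlocked(f"blocked output of {fn.__name__}")
            return result
        return wrapper
    return decorator

@protect()
def scrape(url: str) -> str:
    return "normal page content"
```

Raising an exception (rather than returning a sanitized string) forces the calling code to make an explicit decision about what to do with a blocked tool result.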

Testing: 99 tests total (87 backend + 12 SDK) covering pattern detection, weighted scoring thresholds, policy evaluation, end-to-end proxy integration, and all three demo attack scenarios.

## Challenges we faced

Weighted scoring design — Binary match/no-match wasn't good enough. A single base64 string shouldn't block a response, but base64 combined with "ignore previous instructions" should. We landed on cumulative weighted scoring with per-pattern deduplication — each pattern contributes its weight once, avoiding inflation from repeated matches.

Unicode evasion — Attackers can insert zero-width characters between words to bypass regex. A word like "ignore" with a zero-width space in the middle still reads as "ignore" to humans but no longer matches the pattern. We solved this with NFKC normalization and dual-pass scanning — patterns run against both the original text (to detect the evasion itself) and the normalized text (to catch the hidden payload).

Async DNS — The initial policy engine used blocking socket.gethostbyname() for SSRF checks, which stalled the entire event loop under load. Replaced with the event loop's loop.getaddrinfo() coroutine to keep the proxy non-blocking.
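The swap can be illustrated with a small resolver. The function name is an assumption; the point is that resolution is awaited on the event loop instead of blocking it:

```python
import asyncio
import socket

async def resolve(host: str) -> list:
    """Non-blocking DNS resolution. Unlike socket.gethostbyname(), which
    blocks the whole event loop, loop.getaddrinfo() is awaitable and runs
    the lookup off the loop's thread of control."""
    loop = asyncio.get_running_loop()
    infos = await loop.getaddrinfo(host, None, family=socket.AF_UNSPEC)
    # Each entry is (family, type, proto, canonname, sockaddr); the address
    # is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})

addrs = asyncio.run(resolve("localhost"))  # e.g. loopback addresses
```

Every resolved address can then be run through the CIDR deny list, so a hostname that resolves to a link-local or private address is blocked before any request is forwarded.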

LLM response parsing — Groq's LLM sometimes wraps JSON in markdown fences, adds preamble text, or returns out-of-range confidence values. We built a robust extraction pipeline with regex-based JSON finding, required key validation, and confidence clamping.
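The extraction pipeline's three steps (regex-based JSON finding, required-key validation, confidence clamping) can be sketched as below. The response schema (`verdict`, `confidence`) is an assumption for illustration:

```python
import json
import re

REQUIRED_KEYS = {"verdict", "confidence"}  # assumed schema, for illustration

def extract_classification(raw: str) -> dict:
    """Tolerate markdown fences, preamble text, and out-of-range confidence:
    locate the JSON object, validate its keys, and clamp confidence to [0, 1]."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # grab the {...} span, ignoring fences
    if not match:
        raise ValueError("no JSON object found in LLM response")
    data = json.loads(match.group(0))
    if not REQUIRED_KEYS <= data.keys():
        raise ValueError(f"missing keys: {REQUIRED_KEYS - data.keys()}")
    data["confidence"] = min(1.0, max(0.0, float(data["confidence"])))
    return data

raw = 'Sure! Here is my analysis:\n```json\n{"verdict": "malicious", "confidence": 1.7}\n```'
out = extract_classification(raw)  # confidence clamped to 1.0
```

Any failure in this pipeline (no JSON found, missing keys, unparseable body) raises, which feeds directly into the fail-closed handling described below in "What we learned".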

## What we learned

  • The agent-to-tool boundary is a genuinely underserved attack surface with practical exploits
  • Weighted cumulative scoring is significantly more useful than binary detection for security classification
  • Fail-closed security requires thinking about every edge case — LLM timeout, DNS failure, malformed policy files — each needs an explicit decision about whether to block or pass
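One way to make that explicit decision visible in code is to route every failure path of a check through a single default. This is a minimal sketch of a fail-closed wrapper, assuming a hypothetical async `classifier` and the 3-second timeout mentioned earlier:

```python
import asyncio

async def classify_or_block(text: str, classifier, timeout: float = 3.0) -> bool:
    """Return True to allow, False to block. Every failure path — timeout,
    network error, malformed output — deliberately resolves to block."""
    try:
        verdict = await asyncio.wait_for(classifier(text), timeout=timeout)
        return verdict == "benign"
    except Exception:
        return False  # fail closed: uncertainty is treated as a threat
```

Whether a given edge case should really block (as here) or fall back to the regex score alone is exactly the kind of per-case decision the lesson above refers to; this sketch simply picks the strictest default.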

## What's next

  • Multi-agent dashboard with filtering and per-agent analytics
  • Webhook/Slack alerting on critical threats
  • Policy editor UI with hot-reload
  • PyPI publication for the SDK
