Epistemic Sec: DAST for AI Agents

Inspiration

We kept seeing AI agents ship with hardcoded credentials in tool descriptions and no way to test whether Token Vault actually stops an attacker. Static analysis tells you something looks risky but never proves it. We wanted a tool that attacks your agent with real payloads and shows you exactly what breaks and exactly what Token Vault contains. The question was simple: if someone prompt-injects your customer success agent right now, what actually happens?

What it does

Epistemic Sec is a DAST engine for AI agents. It:

  • Imports your LangChain or LangGraph agent
  • Extracts every tool schema
  • Builds a data flow graph
  • Runs 7 vulnerability detectors
  • Fires deterministic attack payloads through a sandbox
  • Scores the result on 7 security dimensions
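The sandbox step above gates tools by state impact: readonly tools execute for real, mutating tools are mocked. A minimal sketch of that gate (the `StateImpact` enum, tool names, and return shapes are illustrative assumptions, not the scanner's actual API):

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any, Callable

class StateImpact(Enum):
    READONLY = auto()   # e.g. search, fetch, list
    MUTATING = auto()   # e.g. write, delete, send

@dataclass
class Tool:
    name: str
    impact: StateImpact
    fn: Callable[..., Any]

def sandboxed_call(tool: Tool, *args, **kwargs) -> Any:
    """Gate execution by state impact: readonly tools run for real,
    mutating tools are mocked so attack payloads cannot cause damage."""
    if tool.impact is StateImpact.READONLY:
        return tool.fn(*args, **kwargs)
    # Record the attempted call and return a canned response instead.
    return {"mocked": True, "tool": tool.name, "args": args, "kwargs": kwargs}

lookup = Tool("lookup_order", StateImpact.READONLY, lambda oid: {"order": oid})
refund = Tool("issue_refund", StateImpact.MUTATING, lambda oid: None)

print(sandboxed_call(lookup, "A123"))   # runs the real function
print(sandboxed_call(refund, "A123"))   # mocked, nothing mutates
```

Because the mock records the attempted call, the scanner can still observe which mutating tools an attack payload tried to trigger without ever letting them run.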

The before/after comparison shows Token Vault transforming a failing agent (a 0.13 score, 11 critical vulnerabilities, 3 hardcoded credentials) into a passing one (0.66, zero criticals, 14 mitigated findings). The dashboard lets you connect real OAuth accounts via Token Vault, run agent actions with scoped tokens, test policy enforcement, and upload scan results to visualize your security posture with a 7D radar chart.

How we built it

The DAST engine is pure Python with zero LLM dependency. An in-process bridge imports agents directly and extracts tool manifests from live Pydantic schemas. Seven detectors use word-boundary regex matching to find SQL injection, credential exposure, SSRF, path traversal, command injection, unauthorized actions, and missing audit trails. A sandbox gates execution by state impact: readonly tools run for real while mutating tools get mocked. The 7D scorer uses a multiplicative formula where a zero in any critical dimension means a zero total.

The dashboard is Next.js 16 with Auth0 Token Vault handling all OAuth token storage and exchange via RFC 8693. FGA provides relationship-based authorization checks before Token Vault issues scoped credentials, and CIBA gates sensitive actions behind human approval via push notifications.
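The multiplicative scoring idea can be sketched as follows. The dimension names, weights, and the geometric-mean aggregation here are illustrative assumptions, not the scanner's actual formula; the key property it demonstrates is that a zero in a critical dimension zeroes the total:

```python
import math

# Hypothetical dimension scores in [0, 1]; these names are illustrative,
# not the scanner's actual seven dimensions.
CRITICAL_DIMENSIONS = {"exploit_resistance", "credential_hygiene"}

def overall_score(dimensions: dict[str, float]) -> float:
    """Multiplicative 7D score: a zero in any critical dimension
    zeroes the total, so one catastrophic failure cannot be
    averaged away by strong scores elsewhere."""
    for name in CRITICAL_DIMENSIONS:
        if dimensions.get(name, 0.0) == 0.0:
            return 0.0
    # Geometric mean keeps the result in [0, 1] and still rewards
    # balanced improvement across all dimensions.
    product = math.prod(dimensions.values())
    return product ** (1 / len(dimensions))

scores = {
    "exploit_resistance": 0.5,
    "credential_hygiene": 0.8,
    "blast_radius": 0.7,
    "audit_trail": 0.9,
    "input_validation": 0.6,
    "authz_coverage": 0.75,
    "data_flow_hygiene": 0.65,
}
print(round(overall_score(scores), 2))
```

With an additive average, a 0.00 on exploit resistance could hide behind six strong dimensions; the multiplicative form makes it unhideable, which is why the vulnerable agent bottoms out instead of limping to a mediocre score.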

Challenges we ran into

  • False Positives: Our command injection detector kept flagging the secure agent's database tool as critical. It turned out that substring matching was finding "script" inside "description" in the JSON schema dump. We rewrote all detectors to use word-boundary regex and split keywords into strong signals that trigger alone versus weak signals that need contextual evidence.
  • Scoring Accuracy: Another challenge was making the D1 scorer reflect Token Vault mitigation properly. Both the vulnerable and secure agents were scoring 0.00 on exploit resistance because the scorer penalized all findings equally regardless of mitigation status. We separated unmitigated findings from mitigated ones with different penalty weights so the score actually reflects blast radius containment.
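The word-boundary fix can be illustrated with a minimal detector sketch. The keyword lists and the shell-metacharacter "contextual evidence" check here are made up for the example; the real detectors use larger, tuned sets:

```python
import re

# Illustrative keyword lists; the real detectors use larger sets.
STRONG = ["rm -rf", "curl | sh"]      # strong signals: flag on their own
WEAK = ["script", "exec", "eval"]     # weak signals: need contextual evidence

CONTEXT = re.compile(r"[;&|`$]")      # shell metacharacters as evidence

def flags_command_injection(text: str) -> bool:
    if any(s in text for s in STRONG):
        return True
    for kw in WEAK:
        # \b word boundaries stop "script" from matching inside
        # "description", which caused the original false positive.
        if re.search(rf"\b{re.escape(kw)}\b", text) and CONTEXT.search(text):
            return True
    return False

# A benign JSON schema fragment no longer trips the detector...
print(flags_command_injection('{"description": "query the database"}'))
# ...while a payload with a standalone keyword plus shell evidence does.
print(flags_command_injection("run script; cat /etc/passwd"))
```

The naive version, `"script" in text`, returns `True` for both inputs, which is exactly the bug that flagged the database tool.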

Accomplishments that we're proud of

  • Every single number in our blog post is verified against the actual scanner output. 23 claims, 23 confirmed by running the code.
  • The engine runs with zero LLM calls, so every scan is deterministic and reproducible. The same command gives the same score every time, which makes it usable as a CI/CD merge gate.
  • We also proved that Token Vault works at the infrastructure level by catching actual enforcement exceptions in the sandbox rather than just checking configuration.

What we learned

Token Vault changes the security model fundamentally. If the agent never holds a credential, then prompt injection cannot extract one. That sounds obvious, but most agent frameworks still pass raw API keys through tool descriptions. We also learned that security testing tools need to test themselves. Our false positive from "description contains script" would have silently corrupted every scan if we had not investigated why a database tool was triggering command injection. The debugging story became one of the most authentic parts of our blog post.

What's next for Epistemic Sec

  • Dynamic execution coverage for all 7 detectors, not just the top attack paths.
  • PyPI publishing so teams can pip install and run in CI with one command.
  • A VS Code extension that scans on save.
  • Expanding beyond LangChain to support CrewAI, AutoGen, and raw OpenAI function calling agents.

The full technical deep dive is on Medium here, and the live dashboard is available at https://dashboard-nine-rho-imflqgwqjy.vercel.app.

Built With

  • auth0-ciba
  • auth0-fga (openfga)
  • auth0-token-vault
  • langchain
  • next.js-16
  • pydantic
  • python
  • react-19
  • recharts
  • tailwind-css-4
  • typescript
  • vercel