From Unrestricted AI Agent Access (before) to Policy Driven Tool Enforcement at Runtime (After)
MCPWarden Works in three steps: Intercept, Inspect, Guard. For Every. Single. AI Action.
How it works? Step 1: The agent's malicious prompt injection
How it works? Step 2: The popup shows up to get approval from the human to avoid taking any restricted actions.
How it works? Step 3: This is the Realtime Monitoring Dashboard we built for analyzing traffic
How it works? Step 4: This is the Landing Info Corner Page for MCPWarden. This gives the download link for the interceptor to try out.
Fully driven by a single policy.yaml file. No code changes required, this is pure policy and config driven.
Packaged and readily available to use as a VSCode Extension

MCPWarden: Building Trust into the AI Stack

Inspiration

The rapid adoption of AI agents across public sector infrastructure has outpaced the security frameworks designed to protect it. Our work was driven by a clear and urgent signal from the data: 1 in 8 breaches in public sector agentic deployments are already attributable to AI agent vulnerabilities, with agent-targeted attacks rising 340% in 2025 alone. Compounding this, 78% of deployed agents operate with excess permissions, creating an attack surface that makes agentic incidents 6.2 times more costly than traditional breaches for cities and public institutions. The root cause is well understood. Prompt injection: the manipulation of an AI agent through malicious instructions embedded in its inputs has emerged as the primary threat vector against civic AI infrastructure. We built this project to close this gap. We present to you MCPWarden that intercepts, inspects, and enforces policy on every AI initiated MCP tool call before it executes, ensuring that it acts as a secure firewall for the users of AI Agents.

What it does

MCPWarden serves as a secure firewall that sits between Agentic Applications/LLMs and MCP Servers to ensure every interaction is protected.

Intercepts and Inspects: Every AI prompt and tool selection is monitored in real-time.
Automated Risk Verdict: It provides a "Guard" layer that evaluates if a tool call is safe.
Real-time Monitoring: Features a central dashboard for visibility into agent activity.
Immutable Audit Log: Keeps a permanent record of all actions to ensure attribution and accountability.

How we built it

We focused on creating a non-intrusive but powerful security layer:

In-process Hook Interception: Built a mechanism to intercept standard REACT flows (Reasoning and Acting).
Config-based Policy Framework: Developed a flexible system for defining security rules.
VSCode Extension: Packaged the tool as a familiar extension to integrate directly into the agentic workflow.

Challenges we ran into

One of the primary challenges was securing the Repeated REACT flow, the cycle where an LLM chooses a tool, calls the MCP server, and processes the response until a task is achieved. We had to ensure our "Intercept, Inspect, Guard" loop occurred fast enough to prevent latency while remaining robust enough to catch sophisticated prompt injections that might try to exfiltrate system data.

Accomplishments that we're proud of:

Civic Impact: Addressed and provided a solution to one of the top emerging threat landscapes in AI Cybersecurity.
Successful Integration: Created a secure layer that seamlessly "guards" the connection between the LLM and the MCP server.
Developer Accessibility: Packaged and readily available to use as a VSCode Extension.
Actionable Insights: Developed a dashboard that provides risk-mitigation overview and clear success/failure metrics.

What we learned

We learned that the security of agentic systems is not just a technical hurdle but a societal one; without trust, AI cannot be fully integrated into public institutions. Our research highlighted that AI Agents without guardrails are the leading cause of high breach costs, reinforcing the need for a secure approach implemented through a dedicated firewall like MCPWarden (Credits: https://www.digitalapplied.com/blog/ai-agent-security-2026-1-in-8-breaches-agentic-systems).

What's next for MCPWarden

Attribution Improvement: Label and classify malicious actions by AI Agents for identifying top buckets and improve observability.
Real-time Mobile Alerting: Implementing instant push notifications whenever a threat is blocked.
Predictive Threat Modeling at Scale: Leveraging AI/LLM's to simulate attack scenarios and shift-left in identitying vulnerabilities to enable auto-patching

Built With

ai
gemma
github
html
llm
mcp
promptengineering
python
vscode

Updates

Shilpa Jha started this project — Apr 26, 2026 02:57 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.