Inspiration

AgentShield was born out of frustration.

All four of us, Sam, Shirin, Khoa, and Shoji, had experienced firsthand how AI agents, both coding assistants and general-purpose ones, were doing whatever they wanted. Deleting things we didn't ask them to delete. Sending things we didn't ask them to send. Making decisions we never approved. It felt like handing someone the keys to your house and watching them rearrange your furniture, except sometimes they also left the front door wide open.

When we started talking about it with friends in Indie Hacker groups, and then with other teams right here at this hackathon, we realized this problem was so much bigger than we initially thought. It is estimated that 1.5 million API keys have been leaked since the public release of OpenClaw. But we don't even have to go that far. Normal, everyday coding agents have hallucinated commands, bypassed security instructions, and handled sensitive information poorly.

Here are real cases we experienced ourselves:

  • Install OpenClaw, and it grabs API keys and dumps them into Telegram
  • Install Hermes, and it bypasses the security prompt entirely
  • Ask an agent to set up a Discord server, and it takes so long it ends up leaking the public keys
  • Give an agent a task, and it makes destructive decisions against your explicit instructions

Every single one of these happened to us or someone sitting next to us. And every time, the feeling was the same: helplessness. Your agent already acted. The damage is already done. There's no undo button.

Because of all this, we decided that there should be a fast, easy, and lightweight way to secure agents and stop them before it is too late. That's how AgentShield was born.

What It Does

AgentShield is a self-evolving security gateway that sits between your AI agent and the tools it calls. Every time an agent tries to execute an action - reading a file, calling an API, sending a message, running a command - AgentShield intercepts the request, scores its risk, and decides what happens next.

The Gateway (MCP Proxy)

The core of AgentShield. It works like this:

  1. Intercept - Every tool call from the agent passes through our proxy before reaching its destination.
  2. Score - Each request gets a risk score from 0 (safe) to 100 (dangerous), evaluated by a two-layer AI system:
    • Layer 1 (Local): Qwen 0.8B runs on-device via Ollama for sub-second first-pass evaluation. No data leaves your machine.
    • Layer 2 (Cloud): For ambiguous or complex requests, Qwen 3.2B via cloud provides deeper analysis with full context.
  3. Act - Based on the score:
    • Safe (0–30): Executes automatically. The agent never slows down.
    • Risky (31–70): Routed to a human for review via the dashboard. The agent waits.
    • Dangerous (71–100): Blocked instantly. The agent is stopped.
  4. Learn - TinyFish continuously scans for new vulnerability patterns and updates AgentShield's threat database. Codex auto-patches the system when new attack vectors are discovered. The firewall literally evolves while you sleep.

All incidents, whether approved, denied, or blocked, are logged with full context, timestamps, and risk breakdowns.
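
As a rough illustration of the Score and Act steps, here is a minimal Python sketch of how a 0–100 risk score could be mapped onto the three bands above and recorded. It is not the gateway's actual code: the names `Action`, `Incident`, `decide`, and `intercept` are hypothetical, the score is passed in rather than produced by a model, and the in-memory list stands in for the real audit log.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Action(Enum):
    EXECUTE = "execute"  # safe: forward to the tool automatically
    REVIEW = "review"    # risky: hold for human approval on the dashboard
    BLOCK = "block"      # dangerous: reject immediately


@dataclass
class Incident:
    """One audit entry (hypothetical shape, for illustration only)."""
    tool: str
    score: int
    action: Action
    timestamp: str


def decide(score: int) -> Action:
    """Map a 0-100 risk score onto the three bands described above."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score out of range: {score}")
    if score <= 30:
        return Action.EXECUTE
    if score <= 70:
        return Action.REVIEW
    return Action.BLOCK


def intercept(tool: str, score: int, audit_log: list) -> Action:
    """Decide what happens to a tool call and log it, whatever the outcome."""
    action = decide(score)
    now = datetime.now(timezone.utc).isoformat()
    audit_log.append(Incident(tool, score, action, now))
    return action
```

Every call is appended to the log before the action is returned, so even requests that pass straight through leave a trace.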

The Validation Dashboard

A real-time control center where humans stay in the loop:

  • Live feed of all intercepted tool calls with color-coded risk indicators (green / amber / red)
  • One-click approve or deny for flagged requests - with plain-English explanations of what the agent is trying to do and why it was flagged
  • Full audit log with searchable history, timestamps, and risk score breakdowns
  • Code review panel - inspect the exact payload the agent is sending before you approve it
  • Anomaly notifications - the dashboard stays quiet when things are normal. You only hear from it when something matters.

How We Built It

AgentShield runs on a lean, hybrid architecture designed for speed and portability:

  • Gateway: Express.js API deployed on AWS Lambda via Docker, with API Gateway handling routing to the /intercept endpoint
  • Risk Engine: Two-layer evaluation - Ollama running Qwen 0.8B locally for fast first-pass scoring, Qwen 3.2B via cloud for deeper analysis on flagged requests
  • Self-Evolution: TinyFish runs cyclic vulnerability scans and feeds new patterns into the threat database; Codex monitors for weaknesses and auto-pushes patches
  • Dashboard: React + Tailwind CSS, connected to the gateway via WebSocket for real-time updates
  • Infrastructure: AWS Lambda + EC2, SQLite for audit logging, designed for minimal cold-start latency
  • Built with: Claude Code for accelerated development across the full stack
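
To make the audit-logging piece concrete, here is a small sketch of what a SQLite-backed audit log could look like, using Python's standard `sqlite3` module. The table name, columns, and helper functions are assumptions made for this example, not AgentShield's actual schema; the sketch uses an in-memory database so it can run anywhere.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per intercepted tool call.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    ts         TEXT    NOT NULL,
    tool       TEXT    NOT NULL,
    payload    TEXT    NOT NULL,
    risk_score INTEGER NOT NULL CHECK (risk_score BETWEEN 0 AND 100),
    outcome    TEXT    NOT NULL CHECK (outcome IN ('approved', 'denied', 'blocked'))
)
"""


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the audit database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn, tool: str, payload: dict, risk_score: int, outcome: str) -> int:
    """Insert one audit row and return its id."""
    cur = conn.execute(
        "INSERT INTO audit_log (ts, tool, payload, risk_score, outcome) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), tool,
         json.dumps(payload), risk_score, outcome),
    )
    conn.commit()
    return cur.lastrowid


def search(conn, outcome: str) -> list:
    """Return (tool, risk_score) pairs for one outcome, oldest first."""
    rows = conn.execute(
        "SELECT tool, risk_score FROM audit_log WHERE outcome = ? ORDER BY id",
        (outcome,),
    )
    return rows.fetchall()
```

The `CHECK` constraints keep out-of-range scores and unknown outcomes from ever reaching the log, which matters when the log is the record of last resort.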

Challenges We Ran Into

Latency was the enemy. A security layer that slows your agent down defeats the purpose. We spent hours optimizing the two-layer scoring pipeline so that safe requests pass through in under 200ms - fast enough that the agent barely notices.

The risk scoring algorithm was harder than expected. Distinguishing "agent reads a config file" from "agent reads and exfiltrates a config file" requires real context awareness. We iterated through multiple prompt versions and tested against dozens of edge cases - nested tool calls, bulk operations, chained API requests - before landing on a scoring system we trusted.
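
To show why context matters, here is a deliberately simple rule-based sketch. It swaps the model-based scoring described above for a hand-written heuristic, so it illustrates the problem rather than our solution. The tool names, keyword hints, and score values are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical hints and tool names, chosen only for this example.
SENSITIVE_HINTS = (".env", "config", "secret", "credentials")
OUTBOUND_TOOLS = {"http.post", "message.send"}


@dataclass
class SessionContext:
    """Remembers what the agent has already touched in this session."""
    sensitive_reads: list = field(default_factory=list)


def score_with_context(ctx: SessionContext, tool: str, target: str) -> int:
    """Score a tool call, taking earlier calls in the session into account."""
    if tool == "fs.read":
        if any(hint in target.lower() for hint in SENSITIVE_HINTS):
            ctx.sensitive_reads.append(target)
            return 25  # reading a config file alone is usually harmless
        return 5
    if tool in OUTBOUND_TOOLS:
        # The same outbound call is far riskier after a sensitive read.
        return 85 if ctx.sensitive_reads else 20
    return 40  # unknown tools default to the human-review band
```

The same `http.post` call scores 20 in a fresh session and 85 once a config file has been read, which is the distinction a stateless, per-call scorer cannot make.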

Keeping the architecture light. We wanted AgentShield to be something you can drop into any agent framework in minutes, not a heavy enterprise platform. Every design decision was filtered through the question: "Does this make it harder to install?"

Making local and cloud models cooperate. Running parallel evaluation across Qwen 0.8B (local) and Qwen 3.2B (cloud) and merging their outputs into a single risk decision required careful orchestration - especially handling timeouts and fallbacks gracefully.
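
The orchestration pattern can be sketched with Python's `asyncio`. This is a simplified illustration under stated assumptions, not the production pipeline: the scorers are passed in as plain async functions rather than real Ollama or cloud clients, the timeout values are placeholders, and the merge rule (take the higher score, and fail closed when neither model answers) is one conservative choice among several.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# A scorer takes the serialized request and returns a 0-100 risk score.
Scorer = Callable[[str], Awaitable[int]]


async def _score_or_none(scorer: Scorer, request: str,
                         timeout: float) -> Optional[int]:
    """Run one scorer; return None if it times out or raises."""
    try:
        return await asyncio.wait_for(scorer(request), timeout)
    except Exception:
        return None


async def combined_score(request: str, local: Scorer, cloud: Scorer,
                         local_timeout: float = 0.5,
                         cloud_timeout: float = 2.0,
                         fail_closed_score: int = 100) -> int:
    """Query both models in parallel and merge their answers."""
    local_result, cloud_result = await asyncio.gather(
        _score_or_none(local, request, local_timeout),
        _score_or_none(cloud, request, cloud_timeout),
    )
    available = [s for s in (local_result, cloud_result) if s is not None]
    if not available:
        # Neither model answered: treat the request as dangerous.
        return fail_closed_score
    # Conservative merge: trust whichever model is more worried.
    return max(available)
```

If the cloud model is slow or unreachable, the local score alone decides; if both fail, the request is blocked rather than waved through.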

Accomplishments We're Proud Of

  • 41 out of 50 malicious actions blocked in testing - and all 50 were logged, giving full visibility even on the edge cases that slipped through
  • A functioning, self-evolving firewall built in 36 hours - not a mockup, not a demo script, a working security gateway with real interception and scoring
  • The self-evolution module works automatically - TinyFish finds new vulnerability patterns, Codex patches the system, no human intervention required
  • Hybrid local + cloud inference running in parallel - proving you can have both privacy (local scoring) and depth (cloud analysis) in the same pipeline
  • Sub-second response times for safe requests - security that doesn't kill your agent's speed

What We Learned

  • Agent security is not optional - it's infrastructure. Every team we talked to at LotusHacks had a story about an agent gone rogue. This isn't a niche problem; it's a gap in how agentic AI is being deployed.
  • TinyFish for cyclic processes - we learned how to use TinyFish to run continuous vulnerability detection loops that feed back into the system without manual triggers.
  • Codex for auto-patching - wiring Codex to monitor the codebase and push updates automatically was a breakthrough in keeping the system current without human bottlenecks.
  • Hybrid model orchestration - running local and cloud models in parallel, handling fallbacks, and merging their outputs taught us how to build resilient AI pipelines that degrade gracefully.
  • AWS Lambda for security-critical workloads - packaging a Docker-based proxy for Lambda and optimizing cold-start latency showed us how to run production-grade security infrastructure on a hackathon budget.

What's Next for AgentShield

  • Framework-agnostic packaging - make installation a one-liner for LangChain, CrewAI, AutoGen, and any MCP-compatible agent
  • Tunable risk thresholds - let teams configure their own tolerance levels per tool, per agent, per environment
  • Mobile notifications for approval flows - approve or deny flagged actions from your phone, not just the dashboard
  • Improved UX and navigation - polish the dashboard for non-technical security reviewers
  • Enterprise partnerships - integrate with existing security stacks (SIEM, SOAR) and partner with companies building agent infrastructure
  • Community-driven threat database - open-source the vulnerability patterns so the entire ecosystem benefits from collective defense

Updates

Dream team assembled!! So excited to finally share a little behind-the-scenes of the team building AgentShield. We’ve got product, AI, backend, and dashboard work all moving in parallel, and the energy has been amazing from the start. Can’t wait to show you what we’ve been working on.