Inspiration
AI agents are being handed the keys to production: file access, databases, code execution. Companies are deploying them faster than the tools to protect them can keep up. One prompt injection or one poisoned document, and a helpful agent becomes a destructive one. Existing tools tend to flag threats too late or simply log them after the damage is done. We built Sentinel to stop dangerous actions before they happen.
What It Does
Sentinel sits in front of every tool call an AI agent makes, scores its risk, and either blocks the action or escalates it to a human. The escalation channel matches the urgency:
Low: logged silently.
Medium: posted to a Stream Chat incident channel with approve / deny / escalate buttons, so engineers can triage asynchronously.
High: surfaced on the dashboard with a voice copilot. Through VoiceOS, the on-call engineer can ask "what's this?" and say "block it".
Critical: auto-spawns a Tencent TRTC video war room so the team can respond together, with Sentinel reading the incident brief aloud.
We treat agent failures as infrastructure incidents. It is the same framing Cisco uses for AI Factory operations, applied to a workload AI factories increasingly run: the agents themselves.
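The tier routing above can be sketched as a lookup from a risk score to an escalation channel. This is a minimal sketch, not Sentinel's code. The 0 to 1 score scale, the threshold values, and the channel identifiers are all hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical channel identifiers, one per tier described above.
ESCALATION_CHANNEL = {
    Severity.LOW: "silent_log",
    Severity.MEDIUM: "stream_chat_incident_channel",
    Severity.HIGH: "dashboard_voice_copilot",
    Severity.CRITICAL: "trtc_war_room",
}

# Assumed cutoffs on a risk score in [0, 1], checked from most to least severe.
THRESHOLDS = [
    (0.9, Severity.CRITICAL),
    (0.7, Severity.HIGH),
    (0.4, Severity.MEDIUM),
]


def severity_for(score: float) -> Severity:
    """Map a risk score in [0, 1] to a severity tier."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"risk score must be in [0, 1], got {score}")
    for cutoff, tier in THRESHOLDS:
        if score >= cutoff:
            return tier
    return Severity.LOW


def route(score: float) -> str:
    """Return the escalation channel for a scored tool call."""
    return ESCALATION_CHANNEL[severity_for(score)]
```

Each tier differs only in its channel. Under this design, tuning the cutoffs never touches the escalation code.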
How We Built It
Layered detection. Regex heuristics short-circuit obvious threats in milliseconds. Claude Haiku handles the ambiguous band where rules alone are not enough. Results are cached by a fingerprint of each tool call, so the hot path stays fast.
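Here is a minimal sketch of the layered idea, assuming a numeric risk score. Cheap rules decide the obvious cases, a model call covers everything else, and results are memoized by a hash of the tool call. The regex patterns, the `SAFE_TOOLS` allowlist, and the `model_score` callable (a stand-in for the Claude Haiku call) are hypothetical.

```python
import hashlib
import json
import re
from typing import Callable, Dict

# Illustrative rules only; these are not Sentinel's actual heuristics.
BLOCK_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]
# Hypothetical tool names treated as clearly benign by rule.
SAFE_TOOLS = {"get_time", "search_docs"}


def fingerprint(tool: str, args: dict) -> str:
    """Stable key for a tool call: SHA-256 of the tool name plus sorted-key JSON args."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LayeredClassifier:
    def __init__(self, model_score: Callable[[str, dict], float]) -> None:
        # model_score stands in for the LLM call; its signature is an assumption.
        self.model_score = model_score
        self.cache: Dict[str, float] = {}

    def score(self, tool: str, args: dict) -> float:
        key = fingerprint(tool, args)
        if key in self.cache:  # Hot path: an identical call was scored before.
            return self.cache[key]
        text = json.dumps(args)
        if any(p.search(text) for p in BLOCK_PATTERNS):
            risk = 1.0  # Layer 1: obvious threat, no model call needed.
        elif tool in SAFE_TOOLS:
            risk = 0.0  # Layer 1: obviously benign by rule.
        else:
            risk = self.model_score(tool, args)  # Layer 2: the ambiguous band.
        self.cache[key] = risk
        return risk
```

The block rules run before the allowlist, so a "safe" tool carrying a dangerous argument is still caught. Sorting keys in `fingerprint` makes argument order irrelevant to the cache.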
Honest evals. A 36-example labeled set runs on every server boot, and the results are shown live in the dashboard footer. Current numbers: 100% precision, 95% recall, F1 0.97.
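F1 is the harmonic mean of precision and recall, so the reported numbers are mutually consistent: 2(1.00)(0.95) / (1.00 + 0.95) ≈ 0.974, which rounds to 0.97. The helper below is a generic sketch, not Sentinel's eval harness. It computes all three metrics from boolean labels, where `True` means the call is malicious.

```python
from typing import Sequence, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for a binary 'is this call malicious?' label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists must be the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)
```

`round(f1_score(1.0, 0.95), 2)` evaluates to 0.97. A precision of 100% means no benign call was flagged. Recall below 100% means some malicious calls slipped through.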
MCP-first. Sentinel exposes its operational surface as a Python MCP server. VoiceOS picks it up over stdio and routes voice commands ("what's the latest critical?", "release the first one") directly to our tools. Sentinel speaks the same protocol it defends.
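The sketch below shows what "speaking MCP" means on the wire. It is a standard-library stdio server that answers `tools/call` requests, which the MCP specification defines as JSON-RPC 2.0 messages exchanged one per line over stdin and stdout. It is not Sentinel's implementation. The tool names `latest_critical` and `release_event` and the in-memory `INCIDENTS` list are hypothetical. A real server must also implement `initialize` and `tools/list`.

```python
import json
import sys
from typing import Any, Callable, Dict, Optional, TextIO

# Hypothetical incident store standing in for Sentinel's event bus.
INCIDENTS = [
    {"id": "evt-1", "severity": "critical", "status": "held",
     "summary": "shell tool asked to run rm -rf /"},
]


def latest_critical() -> str:
    """Hypothetical tool: describe the newest critical incident."""
    critical = [e for e in INCIDENTS if e["severity"] == "critical"]
    if not critical:
        return "No critical incidents."
    e = critical[-1]
    return f"{e['id']} ({e['status']}): {e['summary']}"


def release_event(event_id: str) -> str:
    """Hypothetical tool: release a held incident by id."""
    for e in INCIDENTS:
        if e["id"] == event_id:
            e["status"] = "released"
            return f"Released {event_id}."
    raise KeyError(event_id)


TOOLS: Dict[str, Callable[..., str]] = {
    "latest_critical": latest_critical,
    "release_event": release_event,
}


def handle_message(line: str) -> Optional[str]:
    """Answer one JSON-RPC message; only tools/call is supported in this sketch."""
    req: Dict[str, Any] = json.loads(line)
    if "id" not in req:
        return None  # Notifications get no reply under JSON-RPC 2.0.
    reply: Dict[str, Any] = {"jsonrpc": "2.0", "id": req["id"]}
    if req.get("method") != "tools/call":
        reply["error"] = {"code": -32601, "message": "Method not found"}
        return json.dumps(reply)
    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        reply["error"] = {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}
        return json.dumps(reply)
    try:
        text, is_error = tool(**params.get("arguments", {})), False
    except (KeyError, TypeError) as exc:
        # Tool execution failures are reported inside the result, not as protocol errors.
        text, is_error = f"Tool error: {exc!r}", True
    reply["result"] = {"content": [{"type": "text", "text": text}], "isError": is_error}
    return json.dumps(reply)


def serve(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """stdio transport: one JSON message per line in, one per line out."""
    for line in stdin:
        if not line.strip():
            continue
        reply = handle_message(line)
        if reply is not None:
            stdout.write(reply + "\n")
            stdout.flush()
```

Calling `serve()` runs the loop against the process's own stdin and stdout, which is how a stdio client launches a server and talks to it. An unknown tool name yields a JSON-RPC error, while a tool that fails yields a normal result with `isError: true`.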
Stack: FastAPI backend with an in-memory event bus and SSE streaming. Next.js 16 + Tailwind 4 frontend with a live event table, color-coded severity tiles, and an embedded Stream Chat sidebar. The TRTC Web SDK powers the war room, with browser text-to-speech reading the brief aloud and the Web Speech API capturing voice decisions.
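The in-memory event bus and SSE stream can be sketched with `asyncio` alone. Each connected dashboard gets its own queue. Every published event is fanned out to all queues and framed in the Server-Sent Events wire format (`data: ...` followed by a blank line). The class and function names are hypothetical. Unbounded queues are a simplifying assumption, since a slow client would accumulate events.

```python
import asyncio
import json
from typing import AsyncIterator, Set


class EventBus:
    """In-memory pub/sub: every subscriber gets its own asyncio.Queue."""

    def __init__(self) -> None:
        self._subscribers: Set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, event: dict) -> None:
        # Fan out without blocking the publisher; the queues are unbounded.
        for queue in self._subscribers:
            queue.put_nowait(event)

    def subscriber_count(self) -> int:
        return len(self._subscribers)


def format_sse(event: dict) -> str:
    """Encode one event as a Server-Sent Events 'data:' frame."""
    return f"data: {json.dumps(event)}\n\n"


async def sse_stream(bus: EventBus) -> AsyncIterator[str]:
    """Yield SSE frames for one client; unsubscribe when the stream is closed."""
    queue = bus.subscribe()
    try:
        while True:
            yield format_sse(await queue.get())
    finally:
        bus.unsubscribe(queue)
```

In FastAPI, an endpoint could return `StreamingResponse(sse_stream(bus), media_type="text/event-stream")` from `fastapi.responses`. The browser side could then consume the stream with `EventSource`.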
Challenges We Ran Into
Sponsor tools shifted underneath us as we discovered what they actually shipped.
VoiceOS turned out to be a desktop action agent, not a telephony service. We redesigned the high-tier escalation around voice triage of our MCP tools instead of phone calls, and the integration ended up cleaner.
Tencent did not offer general compute, so we dropped the cloud deployment and focused their integration on TRTC video for the critical-tier war room. That also made the demo more cinematic.
Stream Chat React v14 changed where the custom message prop lives. Tencent's UserSig algorithm uses a non-standard base64 alphabet that their docs glance over. Each fix was small, but missing any one of them would have killed the demo.
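The base64 pitfall is easy to show. Based on Tencent's published TLSSigAPIv2 sample code, the UserSig encoding appears to be standard base64 with `+`, `/` and `=` replaced by `*`, `-` and `_`. Treat that mapping as an assumption and check it against Tencent's current documentation. It is not RFC 4648's URL-safe alphabet, so reaching for `base64.urlsafe_b64encode` produces a different string. The helper names below are hypothetical.

```python
import base64

# Assumed mapping (from Tencent's TLSSigAPIv2 sample code; verify before relying on it):
# '+' -> '*', '/' -> '-', '=' -> '_'.
_TO_TENCENT = str.maketrans({"+": "*", "/": "-", "=": "_"})
_FROM_TENCENT = str.maketrans({"*": "+", "-": "/", "_": "="})


def tencent_b64encode(data: bytes) -> str:
    """Standard base64, then swap in the replacement characters."""
    return base64.b64encode(data).decode("ascii").translate(_TO_TENCENT)


def tencent_b64decode(text: str) -> bytes:
    """Undo the character swap, then decode as standard base64."""
    return base64.b64decode(text.translate(_FROM_TENCENT))
```

For example, `tencent_b64encode(b"\xfb\xff")` returns `"*-8_"`, while `base64.urlsafe_b64encode(b"\xfb\xff")` returns `b"-_8="`. The two differ even in which character replaces `/`.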
Accomplishments We're Proud Of
Four sponsor integrations working end-to-end, each doing a distinct job. A layered classifier that beats single-prompt approaches on both speed and accuracy. A demo flow where one click shows an agent attack getting auto-blocked, a video war room opening, a brief being read aloud, and both the dashboard and the Stream channel updating in real time.
What We Learned
The right pitch for agent security is not "another guardrails library." It is "PagerDuty for agents." Engineers know how to think about incident response. They do not know how to think about prompt injection. Meet them where they are.
What's Next
Real MCP proxy deployment in front of production agent fleets, not just schema-compatible interception.
Per-customer fine-tuned classifiers trained on each customer's own agent traces.
Fleet-wide firewall rule generation from observed attacks.
Role-based decision authorization and audit logs.
Every company deploying agents in 2026 will need this layer. We are building it.