Inspiration
The first thing we did at the start of the hackathon was sketch out what's different about AI security in 2026. What we noticed is that almost every tool in the space today is shifted right: it sits in front of a deployed model at runtime, watching prompts go in and responses come out. Useful, but late. By the time a request hits a runtime guardrail, the dangerous function is already written, merged, and running in production.
Very few tools are shifted left, running during code review and flagging the risky pattern before it ever ships. That's the gap that stood out. LLMs don't just generate text anymore; they call functions that run shell commands, edit files, and query databases. A model getting tricked used to mean a bad chatbot reply. Now it can mean a wiped production database, and the place to catch that is in the pull request, not in a runtime filter after the fact.
What it does
PromptShield is a security scanner that reviews pull requests for AI agent vulnerabilities. You connect it as a GitHub App or paste code directly, and it runs three layers in parallel: regex-based static rules, AST dataflow taint tracking, and a Gemini 2.5 semantic auditor. Every finding comes back with a CWE number, an OWASP LLM Top 10 (2025) category, an evidence snippet, and a one-sentence fix.
The three things it's actually looking for are prompt injection (direct, indirect, and paraphrased-via-vector-search), dangerous agent tool surfaces (@tool functions that can run shell or SQL with unvalidated LLM-controlled arguments), and unsafe LLM output handling (a model's reply piped into exec, a shell, a raw SQL query, or innerHTML). On top of that, it posts inline PR comments, creates a Check Run that fails above a risk threshold, and writes everything into MongoDB Atlas so you can search and trend agent-tool exposure across repos. The dashboard on top of Atlas shows a live feed of scans, a hybrid search bar across findings, and a 7-day risk timeline so you can see whether a repo is getting better or worse.
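To make the first layer concrete, here is a minimal sketch of what one regex-based static rule could look like. The rule id, pattern, and fix text are illustrative assumptions, not PromptShield's actual rule set; the point is the shape of a finding: rule id, CWE, OWASP LLM category, evidence snippet, and a one-sentence fix.

```python
# Hypothetical sketch of one regex-layer rule: model output piped into exec().
# Rule names and the fix string are illustrative, not the real rule set.
import re

RULES = [
    {
        "id": "LLM_OUTPUT_TO_EXEC",
        "pattern": re.compile(r"exec\(\s*\w*(response|completion|output)\w*"),
        "cwe": "CWE-94",
        "owasp": "LLM05",  # Improper Output Handling (OWASP LLM Top 10, 2025)
        "fix": "Never pass model output to exec(); parse it into a typed structure instead.",
    },
]

def scan(source: str) -> list[dict]:
    """Run every rule over the source and collect structured findings."""
    findings = []
    for rule in RULES:
        for match in rule["pattern"].finditer(source):
            findings.append({
                "rule": rule["id"],
                "cwe": rule["cwe"],
                "owasp": rule["owasp"],
                "evidence": match.group(0),
                "fix": rule["fix"],
            })
    return findings
```

A line like `result = exec(model_response)` would produce one `LLM_OUTPUT_TO_EXEC` finding, while benign code produces none.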
How we built it
FastAPI + Python on the backend, React + Vite + Tailwind on the frontend. AI is Google Gemini 2.5 Flash via Vertex AI. MongoDB Atlas is the database, with mongomock as a fallback for local dev.
Atlas actually does real work for us here, not just storage: $vectorSearch on a prompt corpus and a curated exploit corpus, $rankFusion for hybrid vector + lexical search on the tool registry, a time-series collection with $setWindowFields for rolling averages of attack-surface growth, a Database Trigger that fans out critical findings into an alerts collection via change streams, and a second trigger that redacts secrets server-side on insert. Voyage AI (voyage-3-large) powers the embeddings.
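For the time-series piece, the kind of $setWindowFields stage described above can be sketched as a pymongo-style pipeline. Field names like repo, scanned_at, and finding_count are illustrative assumptions about the schema, not the project's actual collection layout.

```python
# Hedged sketch of a $setWindowFields stage: a 7-day rolling average of
# finding counts per repo. Field names are assumptions, not the real schema.
rolling_avg_pipeline = [
    {"$setWindowFields": {
        "partitionBy": "$repo",              # one window per repository
        "sortBy": {"scanned_at": 1},         # ordered by scan timestamp
        "output": {
            "rolling_avg_findings": {
                "$avg": "$finding_count",
                # Date-range window: the trailing 7 days up to each document.
                "window": {"range": [-7, 0], "unit": "day"},
            }
        },
    }}
]
```

A stage like this is what lets the dashboard draw the 7-day risk timeline without any Python-side aggregation.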
Challenges we ran into
Atlas Triggers were the hardest thing to debug. The docs tell you to call context.services.get("mongodb-atlas"), which only works if your data source is literally named that. Ours was named Cluster0, so everything silently failed with Cannot access member 'db' of undefined, an error that doesn't obviously point at a config problem. We also hit a smaller bug: $rankFusion returns a nested score object on real Atlas but a plain number on mongomock, which meant NaN in the UI until we normalized both shapes.
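The $rankFusion fix boils down to normalizing both shapes before they reach the UI. A minimal sketch, assuming the nested object carries its score under a value key (an assumption about the Atlas document shape, not confirmed from the project's code):

```python
# Sketch of the score normalization for the Atlas/mongomock mismatch:
# real Atlas returns a nested object, mongomock returns a plain number.
def normalize_score(raw) -> float:
    if isinstance(raw, dict):
        # "value" is an assumed key name for the nested score object.
        return float(raw.get("value", 0.0))
    if isinstance(raw, (int, float)):
        return float(raw)
    return 0.0  # anything else (None, missing) renders as zero, never NaN
```

Funneling every score through one function like this keeps the frontend from ever seeing NaN, whichever backend produced the document.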
The GitHub App side was its own mess: webhooks from GitHub don't reach localhost, so we wrote helper scripts around ngrok and smee-client (github_webhook_url.sh, ngrok_http_8000.sh) and spent real time chasing why deliveries were 2xx but nothing was posting, which turned out to be the private-key path.
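One check worth getting right early in that webhook chain is signature verification, since a silent mismatch looks exactly like the "2xx but nothing happens" failure mode. GitHub sends an X-Hub-Signature-256 header of the form sha256=&lt;hex digest&gt;, an HMAC-SHA256 of the raw request body keyed by the webhook secret. A minimal sketch (the secret and body here are placeholders):

```python
# Minimal GitHub webhook signature check. GitHub's X-Hub-Signature-256
# header is "sha256=" + HMAC-SHA256(secret, raw_body) in hex.
import hashlib
import hmac

def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature_header)
```

Rejecting unverifiable deliveries with a 401 instead of a 200 makes misconfiguration visible in GitHub's delivery log rather than failing silently.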
Accomplishments that we're proud of
Mostly that the full chain works end-to-end. A pull request that adds something like @tool def run_shell(cmd): subprocess.run(cmd, shell=True) comes back as a critical TOOL_PARAM_TO_SHELL finding — CWE-78, OWASP LLM06, a dataflow trace, and a Gemini-generated fix, all on the PR itself. Behind it, a single scan lights up five Atlas surfaces at once: scan in, change-stream trigger writes an alert, tool registry updates, time-series point lands, and the vector embedding refreshes for hybrid search.
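The core of that TOOL_PARAM_TO_SHELL detection can be sketched with the standard ast module: flag any @tool-decorated function whose own parameter flows into subprocess.run with shell=True. This is a simplified sketch of the idea, not the project's actual taint tracker (which also builds dataflow traces):

```python
# Simplified sketch of the AST check behind TOOL_PARAM_TO_SHELL: a @tool
# function passing one of its parameters to subprocess.run(..., shell=True).
import ast

def find_tool_param_to_shell(source: str) -> list[str]:
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        # Only @tool-decorated functions are agent-exposed surfaces.
        if not any(isinstance(d, ast.Name) and d.id == "tool"
                   for d in node.decorator_list):
            continue
        params = {a.arg for a in node.args.args}
        for call in ast.walk(node):
            if not (isinstance(call, ast.Call)
                    and isinstance(call.func, ast.Attribute)
                    and call.func.attr == "run"):
                continue
            shell_true = any(kw.arg == "shell"
                             and isinstance(kw.value, ast.Constant)
                             and kw.value.value is True
                             for kw in call.keywords)
            tainted = any(isinstance(a, ast.Name) and a.id in params
                          for a in call.args)
            if shell_true and tainted:
                findings.append(f"TOOL_PARAM_TO_SHELL in {node.name} (CWE-78)")
    return findings
```

Run against the snippet from the paragraph above, this flags run_shell; a function that shells out with a hardcoded argument list passes clean.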
The numbers held up too. Our three-layer benchmark runs 151 labeled prompts through static rules, a TF-IDF classifier, and the live API. The ML layer hit 0.972 F1 under cross-validation, and on held-out edge cases the full API got 100% recall where a Gemini-only baseline was at 50%. Redaction is also defense-in-depth: Python strips secrets before persistence, an Atlas trigger strips them again on insert, and both write to audit_logs with a _written_by field so we can prove which layer caught what.
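The Python side of that redaction layer can be sketched as a pass over string fields before persistence. The secret patterns and the _written_by value here are illustrative assumptions; the real scanner's pattern list and field names may differ:

```python
# Sketch of the pre-persistence redaction pass. Patterns and the
# "_written_by" value are illustrative, not the project's exact set.
import re

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),           # AWS access key id
    re.compile(r"ghp_[A-Za-z0-9]{36}"),        # GitHub personal access token
    re.compile(r"(?i)api[_-]?key\s*=\s*\S+"),  # generic api_key assignments
]

def redact(doc: dict) -> dict:
    """Return a copy with secrets masked and the redacting layer recorded."""
    out = dict(doc)
    for key, value in out.items():
        if not isinstance(value, str):
            continue
        for pattern in SECRET_PATTERNS:
            value = pattern.sub("[REDACTED]", value)
        out[key] = value
    # The Atlas trigger would stamp its own name on its second pass,
    # which is what lets audit_logs prove which layer caught what.
    out["_written_by"] = "python-redactor"
    return out
```

Because the trigger runs the same masking again on insert, a secret that slips past the Python layer still never lands in Atlas unmasked.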
What we learned
We got hands-on with a lot of tooling we hadn't touched before. Google Gemini via Vertex AI was new; getting strict JSON output across eight specific finding types took a few iterations to feel solid. Also MongoDB Atlas turned out to be a lot more than a document store — triggers, change streams, time-series collections, $vectorSearch, and $rankFusion all live behind the same connection string, and once we committed to the platform a lot of logic we'd normally write in Python moved to the DB layer.
On the process side, two things helped the most. Mongomock fallbacks let the whole pipeline run in-process with no network, so iteration stayed fast and the test suite stayed under 30 seconds — critical when you're making a change every few minutes. And leaning on a labeled benchmark from day one (151 prompts, held-out edge cases, F1 instead of accuracy) kept us from shipping changes that felt like improvements but weren't.
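The reason F1 beats accuracy on a benchmark like this is worth one concrete line of arithmetic: with imbalanced labels, a detector that flags nothing can still score high accuracy. A tiny self-contained sketch (the counts are made up for illustration):

```python
# F1 from raw confusion counts, to show why it punishes a do-nothing
# detector where accuracy would not. Counts below are illustrative only.
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A detector with 90 true positives, 10 false positives, and 10 misses scores 0.9; one that flags nothing scores 0.0 regardless of how many benign prompts it "got right".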
What's next for PromptShield
- Inter-procedural dataflow so taint follows across function boundaries and class fields instead of giving up at the function edge, plus a real JavaScript/TypeScript tracker (JS is regex-only today).
- Treat .cursor/rules/*, CLAUDE.md, and AGENTS.md as first-class indirect-injection surfaces.
- Expand the agent-exploit corpus past the current seed set by ingesting real CVEs and LangChain/LlamaIndex issue threads.
- Generate the actual code fix from the same Gemini call that found the issue, not just a one-line remediation.