Inspiration
As AI agents become increasingly autonomous, they can browse the web, access databases, retrieve documents, and execute tools without human supervision. While powerful, these capabilities expose agents to new security threats such as prompt injection, jailbreaks, tool abuse, data exfiltration, and RAG poisoning. Existing solutions often focus only on input filtering and fail to secure the complete agent lifecycle. We built Prompt Shield to provide a dedicated security layer for the agentic future.
What it does
Prompt Shield is an open-source AI Agent Security Firewall that detects, blocks, and mitigates prompt injections, jailbreak attacks, system prompt extraction, data exfiltration, tool abuse, PII exposure, and other AI security threats. It combines 29 security detectors, 6 output security scanners, a self-learning threat vault, and a 3-Gate AgentGuard architecture to secure AI systems from user input to final output.
How we built it
We developed Prompt Shield using Python and FastAPI, with Swagger/OpenAPI for developer-friendly integrations. Semantic threat detection is powered by DeBERTa-v3, while ChromaDB stores attack embeddings for future similarity-based detection. We implemented a novel Smith-Waterman sequence alignment engine to identify paraphrased prompt injection attacks and designed a modular architecture that integrates with OpenAI, Anthropic, LangChain, CrewAI, MCP, and enterprise AI workflows.
Challenges we ran into
The biggest challenge was achieving high detection accuracy without generating false positives that could block legitimate users. Detecting indirect prompt injections hidden inside retrieved documents and tool outputs was another difficult problem. We also had to design a scalable architecture capable of protecting multiple AI frameworks while maintaining low latency and high throughput.
Accomplishments that we're proud of
- Built a working open-source AI security framework.
- Developed a 3-Gate AgentGuard architecture for end-to-end protection.
- Implemented Smith-Waterman sequence alignment for advanced attack detection.
- Created a self-learning threat vault that improves detection over time.
- Achieved 92.3% detection rate with a 96.0% F1 score.
- Integrated support for OpenAI, Anthropic, LangChain, CrewAI, MCP, and FastAPI.
- Added compliance reporting aligned with OWASP LLM Top 10, OWASP Agentic Top 10, and EU AI Act requirements.
What we learned
This project deepened our understanding of AI security, agentic systems, prompt injection defense, semantic threat detection, vector databases, and enterprise AI compliance. We learned that securing AI agents requires continuous monitoring and protection across the entire workflow rather than relying solely on input filtering.
What's next for Prompt Shield
Our roadmap includes building an enterprise-grade AI Agent Firewall, expanding the shared threat intelligence network, adding multimodal security scanning for text, images, and audio, strengthening runtime agent monitoring, and introducing advanced threat analytics. Our long-term vision is to establish Prompt Shield as the standard security layer for trustworthy and secure autonomous AI systems.
Built With
- ai-security
- anthropic-api
- chromadb
- crewai
- css
- deberta-v3
- docker
- docker-compose
- fastapi
- github-actions
- helm
- html
- injection
- kubernetes
- langchain
- mcp
- openai-api
- owasp-llm-security
- pii-detection
- prompt
- python
- rest-api
- semantic-embeddings
- smith-waterman-algorithm
- swagger/openapi
- vector-embeddings
Log in or sign up for Devpost to join the conversation.