Prompt Shield

Prompt Injection Attack → BLOCKED
Safe User Query → ALLOWED
Email / PII Detection → REDACTED
Swagger API Demonstration

Inspiration

As AI agents become increasingly autonomous, they can browse the web, access databases, retrieve documents, and execute tools without human supervision. While powerful, these capabilities expose agents to new security threats such as prompt injection, jailbreaks, tool abuse, data exfiltration, and RAG poisoning. Existing solutions often focus only on input filtering and fail to secure the complete agent lifecycle. We built Prompt Shield to provide a dedicated security layer for the agentic future.

What it does

Prompt Shield is an open-source AI Agent Security Firewall that detects, blocks, and mitigates prompt injections, jailbreak attacks, system prompt extraction, data exfiltration, tool abuse, PII exposure, and other AI security threats. It combines 29 security detectors, 6 output security scanners, a self-learning threat vault, and a 3-Gate AgentGuard architecture to secure AI systems from user input to final output.

How we built it

We developed Prompt Shield using Python and FastAPI, with Swagger/OpenAPI for developer-friendly integrations. Semantic threat detection is powered by DeBERTa-v3, while ChromaDB stores attack embeddings for future similarity-based detection. We implemented a novel Smith-Waterman sequence alignment engine to identify paraphrased prompt injection attacks and designed a modular architecture that integrates with OpenAI, Anthropic, LangChain, CrewAI, MCP, and enterprise AI workflows.

Challenges we ran into

The biggest challenge was achieving high detection accuracy without generating false positives that could block legitimate users. Detecting indirect prompt injections hidden inside retrieved documents and tool outputs was another difficult problem. We also had to design a scalable architecture capable of protecting multiple AI frameworks while maintaining low latency and high throughput.

Accomplishments that we're proud of

Built a working open-source AI security framework.
Developed a 3-Gate AgentGuard architecture for end-to-end protection.
Implemented Smith-Waterman sequence alignment for advanced attack detection.
Created a self-learning threat vault that improves detection over time.
Achieved 92.3% detection rate with a 96.0% F1 score.
Integrated support for OpenAI, Anthropic, LangChain, CrewAI, MCP, and FastAPI.
Added compliance reporting aligned with OWASP LLM Top 10, OWASP Agentic Top 10, and EU AI Act requirements.

What we learned

This project deepened our understanding of AI security, agentic systems, prompt injection defense, semantic threat detection, vector databases, and enterprise AI compliance. We learned that securing AI agents requires continuous monitoring and protection across the entire workflow rather than relying solely on input filtering.

What's next for Prompt Shield

Our roadmap includes building an enterprise-grade AI Agent Firewall, expanding the shared threat intelligence network, adding multimodal security scanning for text, images, and audio, strengthening runtime agent monitoring, and introducing advanced threat analytics. Our long-term vision is to establish Prompt Shield as the standard security layer for trustworthy and secure autonomous AI systems.

Built With

ai-security
anthropic-api
chromadb
crewai
css
deberta-v3
docker
docker-compose
fastapi
github-actions
helm
html
injection
kubernetes
langchain
mcp
openai-api
owasp-llm-security
pii-detection
prompt
python
rest-api
semantic-embeddings
smith-waterman-algorithm
swagger/openapi
vector-embeddings

Updates

Kishan Nishad started this project — Jun 17, 2026 01:42 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.