PromptShield

About the project

Inspiration

As large language model applications become more common, so do attacks against them. We were inspired by the growing problem of prompt injection, jailbreaks, and system prompt leakage, especially because many demos focus only on the model response and not on the safety layer that should protect the model before it answers. We wanted to build something that felt less like “just another chatbot” and more like real infrastructure for AI systems.

What it does

PromptShield is a real-time AI firewall for LLM applications. It sits between a user and a vulnerable model, inspects each prompt, detects suspicious behavior such as instruction override, role hijacking, policy bypass, and prompt extraction attempts, and then assigns a risk score. Based on that score, it makes a firewall-style decision: ALLOW, WARN, or BLOCK.
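The last step, turning a risk score into a decision, can be sketched in a few lines. This is a minimal illustration and not the project's actual policy code: the 0-100 scale, the two threshold values, and the names `Decision` and `decide` are all assumptions made for the example.

```python
from enum import Enum


class Decision(str, Enum):
    ALLOW = "ALLOW"
    WARN = "WARN"
    BLOCK = "BLOCK"


# Hypothetical thresholds on an assumed 0-100 risk scale.
WARN_THRESHOLD = 40.0
BLOCK_THRESHOLD = 70.0


def decide(risk_score: float) -> Decision:
    """Map a risk score in [0, 100] to a firewall-style decision."""
    if not 0 <= risk_score <= 100:
        raise ValueError("risk_score must be between 0 and 100")
    if risk_score >= BLOCK_THRESHOLD:
        return Decision.BLOCK
    if risk_score >= WARN_THRESHOLD:
        return Decision.WARN
    return Decision.ALLOW
```

Keeping the thresholds as named constants would make it easy to tune how strict the firewall is without touching the detection logic.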

The interface also explains why a prompt was flagged by showing matched attack categories, reason codes, highlighted attack phrases, a confidence score, and a vulnerable-vs-protected comparison. That makes the project not only functional, but also easy to demonstrate and understand in a live setting.
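One way such an explanation could be produced is sketched below. The phrase table, the reason-code names, and the `[[...]]` marker format are hypothetical and chosen only for illustration; the real interface may represent highlights differently.

```python
import re

# Hypothetical phrase-to-reason-code table, for illustration only.
REASON_CODES = {
    "ignore previous instructions": "INSTRUCTION_OVERRIDE",
    "system prompt": "PROMPT_EXTRACTION",
}


def explain(prompt: str) -> dict:
    """Return matched reason codes and a copy of the prompt with matches marked."""
    codes = []
    highlighted = prompt
    for phrase, code in REASON_CODES.items():
        pattern = re.compile(re.escape(phrase), re.IGNORECASE)
        if pattern.search(prompt):
            codes.append(code)
            # Wrap each match in [[...]] so a frontend can style it.
            highlighted = pattern.sub(lambda m: f"[[{m.group(0)}]]", highlighted)
    return {"reason_codes": codes, "highlighted": highlighted}
```

A frontend could then replace the `[[...]]` markers with styled spans when rendering the flagged prompt.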

How we built it

We built PromptShield with a FastAPI backend and a Streamlit frontend.

On the backend, we created a lightweight prompt analysis pipeline that combines:

rule-based pattern detection for known attack phrases
an LLM-based security classifier with structured JSON output
a scoring system that combines pattern severity, classifier risk score, and confidence
a policy layer that issues an ALLOW / WARN / BLOCK decision
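The first and third stages of this pipeline could look roughly like the sketch below. The regular expressions, severity values, and the 50/50 blend in `combined_score` are assumptions for illustration; the write-up does not specify the actual rules or scoring formula.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    category: str
    pattern: re.Pattern
    severity: int  # assumed 0-100 scale


# Hypothetical rules; a real rule set would be larger and more carefully tuned.
RULES = [
    Rule("instruction_override",
         re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I), 80),
    Rule("role_hijacking", re.compile(r"you are now\b", re.I), 60),
    Rule("prompt_extraction",
         re.compile(r"(reveal|print|show).{0,40}system prompt", re.I), 90),
]


def match_rules(prompt: str) -> list[Rule]:
    """Return every rule whose pattern occurs in the prompt."""
    return [rule for rule in RULES if rule.pattern.search(prompt)]


def combined_score(matched: list[Rule], classifier_risk: float, confidence: float) -> float:
    """Blend the worst pattern severity with the confidence-weighted classifier risk.

    classifier_risk is assumed to be in [0, 100] and confidence in [0, 1].
    """
    pattern_severity = max((rule.severity for rule in matched), default=0)
    # Discount the classifier's risk estimate by how confident it claims to be.
    weighted_classifier = classifier_risk * confidence
    # Assumed equal weighting of the two signals.
    return 0.5 * pattern_severity + 0.5 * weighted_classifier
```

With these assumed numbers, a prompt that matches the 90-severity rule and receives a classifier risk of 80 at confidence 0.5 scores 65, which a policy layer could then map to a decision.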

On the frontend, we designed a cybersecurity-themed dashboard that presents the analysis in a clear and dramatic way for demos. Users can launch preset attack scenarios, inspect prompts, and see the full security decision pipeline in real time.

Challenges we ran into

One of our biggest challenges was making the system feel credible while still keeping it lightweight and hackathon-friendly. We had to balance rule-based detection, which is fast and explainable, with LLM-based analysis, which is more flexible but also less predictable. Another challenge was getting structured outputs from the model reliably enough for a clean demo.
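A common way to cope with unreliable structured output is to parse defensively and fall back to a conservative verdict when parsing fails. The sketch below assumes the classifier is asked for a JSON object with `risk_score`, `confidence`, and `categories` fields; those field names, the `Verdict` type, and the fallback values are hypothetical and not taken from the project.

```python
import json
from dataclasses import dataclass


@dataclass
class Verdict:
    risk_score: float  # assumed 0-100
    confidence: float  # assumed 0-1
    categories: list


def _fallback() -> Verdict:
    # Hypothetical conservative default: medium risk, zero confidence.
    return Verdict(50.0, 0.0, ["unparseable_output"])


def parse_verdict(raw: str) -> Verdict:
    """Extract a Verdict from raw model text, tolerating surrounding prose."""
    # Models sometimes wrap JSON in extra text; keep only the outermost braces.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return _fallback()
    try:
        data = json.loads(raw[start:end + 1])
        risk = float(data["risk_score"])
        confidence = float(data["confidence"])
        categories = data.get("categories", [])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return _fallback()
    if not isinstance(categories, list):
        return _fallback()
    # Clamp out-of-range numbers instead of trusting the model.
    risk = min(max(risk, 0.0), 100.0)
    confidence = min(max(confidence, 0.0), 1.0)
    return Verdict(risk, confidence, [str(c) for c in categories])
```

Because the fallback carries zero confidence, a downstream scoring step can lean on the rule-based signal whenever the classifier output is unusable.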

We also spent a lot of time improving the UI and the demo flow so the value of the project would be obvious in under a minute. We wanted the project to feel like a real enterprise AI security product, not just a prototype.

What we learned

We learned a lot about prompt injection as a real security problem, about designing AI systems with a safety layer in front of the model, and about how much product presentation matters in a hackathon. We also learned how to build a simple but effective end-to-end system that combines backend logic, frontend storytelling, and model-driven analysis into one polished demo.

What’s next for PromptShield

In the future, we’d like to improve classifier reliability, expand attack coverage, add audit logs and policy modes, and build a larger evaluation dataset to measure detection quality more systematically. We also see potential for PromptShield to evolve into a deployable protection layer for copilots, internal assistants, and other enterprise AI systems.
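Measuring detection quality on a labeled dataset could start with something as simple as precision and recall. The sketch below assumes each example is labeled as attack or benign and that any WARN or BLOCK decision counts as "flagged"; both the function name and that convention are assumptions.

```python
def detection_metrics(labels: list, predictions: list) -> dict:
    """Compute precision and recall for parallel lists of booleans.

    labels[i] is True if example i is a real attack;
    predictions[i] is True if the firewall flagged example i.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y and p)
    false_pos = sum(1 for y, p in pairs if not y and p)
    false_neg = sum(1 for y, p in pairs if y and not p)
    flagged = true_pos + false_pos
    attacks = true_pos + false_neg
    return {
        # Share of flagged prompts that were real attacks.
        "precision": true_pos / flagged if flagged else 0.0,
        # Share of real attacks that were flagged.
        "recall": true_pos / attacks if attacks else 0.0,
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }
```

For a firewall, the false-positive count matters as much as recall, since every benign prompt that gets blocked degrades the product being protected.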

Built with

FastAPI, Streamlit, Python, OpenAI API, Uvicorn, Pydantic, Requests, Python-Dotenv

Submitted to

HooHacks 2026

Created by

I improved the design of the UI webpage, suggesting changes and implementing them. I also deployed the project website and uploaded our API to PyPI for every developer to use.

Carlos Fernandez
Barbaros Zorlu
Virginia Tech CS
ANH-anhAnh Truong
Mohammad Rahim