Inspiration
Every LLM app that takes free-form user input is one paste away from a prompt-injection attack. I kept seeing teams reach for a heavy model-based guard, a paid API, or a regex they maintained themselves. Each of those is too much friction for a problem that wants a small, boring, dependency-free middle ground.
What it does
prompt-shield runs five focused pattern checks against any input string before it reaches your LLM. Each check is a pure function, returns structured findings with character spans, and feeds a single Shield facade that can either redact high-risk spans or hard-block the request.
The five rules:
- role_override: instruction-override and persona-flip language
- tool_call_inject: forged tool-call JSON or XML pasted into user input
- secret_extract: system-prompt and tool-list enumeration probes
- format_break: chat-template control tokens and closing role tags
- delimiter_smuggle: unicode bidi overrides and zero-width characters
Output is deterministic. You can snapshot-test it and run it in front of every request without changing latency or cost.
How I built it
Pure Python 3.10+. Zero runtime dependencies. The Shield facade composes rule modules from src/prompt_shield/rules/. Each rule scans the input, returns Finding objects, and the facade merges overlapping high-risk spans before redaction. RiskLevel is an IntEnum so callers can compare risk numerically. 79 tests pass including a corpus of 20 known injection strings drawn from public lists plus 7 benign controls plus per-rule edge cases.
Challenges I ran into
The hardest case was the disregard family of overrides. The pattern catches "ignore previous instructions" cleanly, but "disregard everything above" has no trailing anchor word like instructions or rules, so the first pass missed it. Fixed with a second pattern in role_override that matches the disregard-or-ignore-or-forget plus everything-or-all plus above-or-prior shape.
The bidi smuggling rule also had a unicode-versioning trap. Different Python builds normalize bidi controls inconsistently, so the rule operates on raw code points rather than normalized strings.
Accomplishments I'm proud of
Five rules, zero deps, 79 tests, deterministic output. The whole library is small enough to read in one sitting and adopt the same afternoon. That was the design constraint.
What I learned
Most prompt-injection defense literature is about model-based guards. The pattern-based corner is undersold because it covers only the obvious injections, but those obvious injections are exactly what shows up in production traffic. Boring catches a lot.
What's next for prompt-shield
A second rules pack for tool-poisoning attacks (where the agent's own tool output carries injected instructions). A small CLI that scans a transcript file and prints findings. A reference integration with the sibling library agentleash so the same risk signal can hard-stop a budget-constrained run.
Built With
- ai-safety
- anthropic
- bedrock
- guardrails
- llm
- openai
- prompt-injection
- pytest
- python
- security

Log in or sign up for Devpost to join the conversation.