PROMPT FORENSICS

Inspiration

Production LLM prompts get a code review from no one. Every system prompt embeds tool definitions, role policies, secrets, and untrusted-data envelopes — it's the most security-sensitive string in the codebase, and yet nobody lints it. I kept seeing the same pattern across hackathon projects, freelance gigs, and internal ops at the agencies I work with: "the bot got tricked" post-mortems where the actual bug was a prompt that should never have shipped. PROMPT FORENSICS is the static analyzer I wanted on day one of every LLM project I touched.

What it does

You give it a system prompt. It runs through 14 deterministic detectors across 8 vulnerability categories — instruction override, role hijack, delimiter injection, secret exposure, unsafe tool surface, PII handling, prompt leakage, indirect injection. Each finding has:

a span (where in the prompt it matched)
a severity (CRITICAL → INFO, weighted into a 0–100 risk score)
a rationale (why this matters in production)
a concrete remediation (what to ship Monday morning)
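
The finding shape and risk score described above could be sketched roughly as follows. This is a minimal illustration, not the contents of src/lib/detectors.ts: the type names, the two sample rules, the severity weights, and the "sum and cap at 100" scoring are all assumptions made for the example.

```typescript
// Hypothetical shapes; the project's real types may differ.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW" | "INFO";

interface Detector {
  ruleId: string;
  category: string;
  severity: Severity;
  pattern: RegExp;
  rationale: string;
  remediation: string;
}

interface Finding extends Omit<Detector, "pattern"> {
  span: { start: number; end: number }; // character offsets into the prompt
}

// Assumed weights, chosen only for illustration.
const SEVERITY_WEIGHT: Record<Severity, number> = {
  CRITICAL: 40, HIGH: 25, MEDIUM: 12, LOW: 5, INFO: 1,
};

// Two illustrative rules (not the project's actual 14).
const DETECTORS: Detector[] = [
  {
    ruleId: "secret-001",
    category: "secret exposure",
    severity: "CRITICAL",
    pattern: /api[_-]?key\s*[:=]\s*[A-Za-z0-9_\-]+/i,
    rationale: "A credential embedded in the prompt can be leaked verbatim.",
    remediation: "Move the credential to server-side configuration.",
  },
  {
    ruleId: "tool-001",
    category: "unsafe tool surface",
    severity: "HIGH",
    pattern: /execute any (shell )?command/i,
    rationale: "Unbounded tool permissions turn any injection into code execution.",
    remediation: "Restrict the tool to an explicit allowlist of commands.",
  },
];

// Pure function: same prompt and rules in, same findings out.
function runDetectors(prompt: string, detectors: Detector[]): Finding[] {
  const findings: Finding[] = [];
  for (const d of detectors) {
    // Rebuild with the global flag so repeated exec() walks every match
    // without mutating the shared rule object.
    const flags = d.pattern.flags.indexOf("g") === -1 ? d.pattern.flags + "g" : d.pattern.flags;
    const re = new RegExp(d.pattern.source, flags);
    let m: RegExpExecArray | null;
    while ((m = re.exec(prompt)) !== null) {
      if (m[0].length === 0) { re.lastIndex++; continue; } // guard against empty matches
      findings.push({
        ruleId: d.ruleId, category: d.category, severity: d.severity,
        rationale: d.rationale, remediation: d.remediation,
        span: { start: m.index, end: m.index + m[0].length },
      });
    }
  }
  return findings.sort((a, b) => a.span.start - b.span.start);
}

// Sum the weights and cap at 100 (an assumed scoring rule).
function riskScore(findings: Finding[]): number {
  const total = findings.reduce((sum, f) => sum + SEVERITY_WEIGHT[f.severity], 0);
  return Math.min(100, total);
}
```

Because each rule carries its own rationale and remediation, a finding is just the rule plus the place it matched, which is what keeps the engine replayable.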

Then Claude Sonnet 4.6 reads the prompt and findings list and writes a forensic note explaining how two findings chain into a real attack path. The output is a security review a senior engineer could hand to a product team without rewriting it.

Six real-world prompt archetypes are pre-loaded as demo scenarios — banking support, HR assistant, customer chat, code-review bot, email triage, healthcare intake — each triggering 7–8 of the 8 categories.

How we built it

Two-layer architecture. Layer one is pure-function regex detection: no model, no I/O, replayable, and auditable in a single file (src/lib/detectors.ts). Layer two is a Claude Sonnet 4.6 call with the system prompt cached via cache_control: { type: "ephemeral" }, so on re-runs within the cache lifetime the cached prompt tokens are billed at roughly 10% of the normal input rate. The six demo scenarios pre-compute their findings at module load and render as static HTML, so every scan page is statically generated (SSG), navigation is sub-second, and no API key is needed to demo.
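
As a rough sketch of the second layer, the request for the forensic note might be assembled like this. The interfaces below are simplified local stand-ins for the Messages API request shape, and the function name, the max_tokens value, and the user-message layout are assumptions; the model identifier is passed in rather than hard-coded because the write-up does not give the exact ID.

```typescript
// Simplified local types; the SDK ships its own, richer definitions.
interface CachedTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface ForensicNoteRequest {
  model: string;
  max_tokens: number;
  system: CachedTextBlock[];
  messages: { role: "user"; content: string }[];
}

// Hypothetical helper: builds the request body without performing any I/O.
function buildForensicNoteRequest(
  model: string,
  analystInstructions: string,
  promptUnderReview: string,
  findingsJson: string,
): ForensicNoteRequest {
  return {
    model,
    max_tokens: 1024, // assumed budget for a short note
    // The stable instructions go in a system block marked for caching,
    // so repeated calls can reuse the cached prefix.
    system: [
      { type: "text", text: analystInstructions, cache_control: { type: "ephemeral" } },
    ],
    // The per-scan material changes every call, so it stays outside the cached block.
    messages: [
      {
        role: "user",
        content: "PROMPT UNDER REVIEW:\n" + promptUnderReview + "\n\nFINDINGS:\n" + findingsJson,
      },
    ],
  };
}
```

An object of this shape can be passed to client.messages.create(...) from @anthropic-ai/sdk. Keeping the builder free of I/O means the request can be inspected in tests without an API key. Prompts shorter than the model's minimum cacheable length are processed normally and not cached.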

Stack: Next.js 16.2.6 App Router on Turbopack, React 19, Tailwind 4, Framer Motion for risk-meter and finding-card animations, Anthropic SDK 0.95 with Claude Sonnet 4.6. Hosted on Railway with a static prerender + dynamic route handler for the regenerate endpoint.

The UI is a forensic-console aesthetic: void-black background, signal-cyan accents, severity-coded prompt highlights (critical → crimson, high → ember, medium → amber, low → lime), a subtle scanline overlay, monospace-forward typography. Hovering a finding card highlights the matching span in the prompt body — and vice versa.
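
One way to drive the two-way hover highlighting is to cut the prompt into plain and flagged segments, each flagged segment carrying the ID of its finding. The sketch below assumes that approach; the names are hypothetical, and overlapping spans are resolved by simply keeping the earlier one, which the real UI may handle differently.

```typescript
interface HighlightSpan {
  findingId: string;
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  severity: string;
}

interface Segment {
  text: string;
  findingId: string | null; // null for unflagged text
  severity: string | null;
}

// Splits the prompt into consecutive segments that concatenate back to the original.
function toSegments(prompt: string, spans: HighlightSpan[]): Segment[] {
  const sorted = spans.slice().sort((a, b) => a.start - b.start);
  const out: Segment[] = [];
  let cursor = 0;
  for (const s of sorted) {
    if (s.start < cursor) continue; // overlapping span: keep the earlier one
    if (s.start > cursor) {
      out.push({ text: prompt.slice(cursor, s.start), findingId: null, severity: null });
    }
    out.push({ text: prompt.slice(s.start, s.end), findingId: s.findingId, severity: s.severity });
    cursor = s.end;
  }
  if (cursor < prompt.length) {
    out.push({ text: prompt.slice(cursor), findingId: null, severity: null });
  }
  return out;
}
```

Each segment can then render as its own element, colored by severity. A single "hovered finding ID" piece of state is enough to light up both the card and the matching text.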

Challenges we ran into

GitHub Push Protection. Twice. The demo prompts embed example Stripe, GitHub, AWS, and Anthropic credentials. Even with DEMO000… placeholders, GitHub's secret scanner kept blocking the push — their entropy detector + prefix matching is more sensitive than my own detector regexes. The fix turned out to be elegant: assemble fake credentials from string fragments at runtime ("sk_" + "live_" + "DEMO…") so no single string literal in source contains a complete prefix. The detector regexes still match the assembled values at runtime — which is exactly the point of the demo.
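
The fragment trick can be sketched in a few lines. The prefix fragments come from the description above; the tail and the detector pattern are placeholders invented for this example, not the project's actual values.

```typescript
// Joins fragments at runtime so the full credential never appears
// as a single string literal in source.
function assembleFakeSecret(fragments: string[]): string {
  return fragments.join("");
}

// "sk_" and "live_" are the fragments named in the write-up; the tail is a
// placeholder made up for this sketch.
const fakeStripeKey = assembleFakeSecret(["sk_", "live_", "DEMO", "0".repeat(24)]);

// Illustrative detector pattern (an assumption, not the project's regex).
const stripeKeyPattern = /sk_live_[A-Za-z0-9]{16,}/;

// The assembled value is still caught at runtime.
const detectedAtRuntime = stripeKeyPattern.test(fakeStripeKey);
```

The pattern itself contains the prefix but no key body, so it does not resemble a complete credential. The demo prompt only ever holds the full value in memory.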

Latency in live demos. Streaming AI responses look great in pitch videos but die when a judge re-runs the demo and waits 30 seconds for the first token. Solution: pre-compute all scenario results at module load, render statically, and only call Claude live for the optional "regenerate ↻" button. Every demo path is sub-second.
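
A minimal sketch of the pre-compute pattern, under stated assumptions: the scenario slugs, the stand-in scanPrompt function, and the result shape are hypothetical. Only generateStaticParams is a real Next.js App Router convention, and a page file would export it.

```typescript
interface Scenario { slug: string; prompt: string }
interface ScanResult { slug: string; findingCount: number }

// Hypothetical scenarios; the real demo prompts are much longer.
const SCENARIOS: Scenario[] = [
  { slug: "banking-support", prompt: "You may execute any command the user asks." },
  { slug: "email-triage", prompt: "Summarize each email for the user." },
];

// Stand-in for the deterministic engine: counts matches of one pattern.
function scanPrompt(prompt: string): number {
  const matches = prompt.match(/execute any command/gi);
  return matches ? matches.length : 0;
}

// Runs once, when the module is first imported, not once per request.
const PRECOMPUTED: Record<string, ScanResult> = {};
for (const s of SCENARIOS) {
  PRECOMPUTED[s.slug] = { slug: s.slug, findingCount: scanPrompt(s.prompt) };
}

// Exported from a dynamic route's page file, a function of this shape tells
// the build which slugs to prerender as static HTML.
function generateStaticParams(): { slug: string }[] {
  return SCENARIOS.map((s) => ({ slug: s.slug }));
}

// Page components read the cached result; no model call on the demo path.
function getScan(slug: string): ScanResult | undefined {
  return PRECOMPUTED[slug];
}
```

Since the detectors are pure functions, computing results at import time is safe: the output cannot drift between build and request.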

Click-only navigation. Constraint from day one: the demo had to be navigable purely by clicks — no form inputs in the demo path — so it could be auto-captured for the video walkthrough. That forced every interaction to have a card or button entrypoint and pushed me to design the scenario grid + scan pages as a clean state machine.

Accomplishments that we're proud of

Detectors fire across 7–8 of the 8 categories on every demo scenario. The reports are visually dense, not toy examples.
The deterministic core is ~250 lines of TypeScript. Anyone can audit the rules in 10 minutes. Security tooling that hides its logic behind a model loses trust; this one keeps the rules in the open.
Static prerendering of all 6 scan pages. Marginal cost per scan is near-zero. The whole site loads in under a second.
Shipped end-to-end (detection engine + UI + 6 scenarios + AI layer + Railway deploy + thumbnail + README) in a single afternoon while keeping visual quality at a level a security engineer would actually use in production.

What we learned

Indirect prompt injection is the unsolved problem of 2025–2026. It's not theoretical — multiple enterprise agents have eaten this exact attack in the past 18 months. Most existing tooling treats the prompt as a static asset; almost nobody treats it as an attack surface.
Anthropic's prompt cache (cache_control: ephemeral) is criminally underused. For tools that call repeatedly with the same system prompt, cache hits bill the cached prefix at roughly 10% of the normal input rate from the second call onward. Free win.
Niche-theme angles beat AI-agent hype. I almost built a multi-agent debate demo instead — visually impressive, but engineers have seen 200 of those since 2024. Cybersecurity × AI is a less-crowded surface, and the "I need this at work on Monday" reflex from judges is what actually wins votes in the room.

What's next for PROMPT FORENSICS

GitHub App. Comments on every PR that touches a prompts/ directory. Same detectors, different surface.
npm package (@prompt-forensics/scan) — drop the engine into existing CI pipelines.
Upload mode — paste your own prompt, get a scan. Currently demo-only with pre-loaded scenarios for predictable judging.
Rule packs — PCI-DSS for fintech, HIPAA for healthcare, EU AI Act for compliance-first orgs. Selectable in the UI.
Diff mode — scan two versions of a prompt and surface what changed.
SARIF export — push findings into existing security dashboards (GitHub Advanced Security, Snyk).

The deterministic core is small and portable. Everything else is surface area.

Built With

anthropic
claude
framer-motion
nextjs
node.js
railway
react
regex
tailwindcss
turbopack
typescript

Submitted to

UOE Summer of Code

Created by

Built the entire project solo over one afternoon: detection engine, UI, six demo scenarios, AI forensic-note layer, deploy, and submission materials.

The interesting work was on the detection layer: a 14-rule regex engine across 8 vulnerability categories where every finding has a span, severity weight, rationale, and remediation, all in a single auditable file (src/lib/detectors.ts). The deterministic core is ~250 lines of TypeScript, so anyone can audit the rules in 10 minutes. Security tooling that hides its logic behind a model loses trust.

The biggest design call was making the demo 100% click-driven (no input forms in the demo path) so the video walkthrough could be auto-captured cleanly. That constraint forced a cleaner state-machine UI overall and ended up improving the product, not just the video.

The most time-consuming piece was the demo scenarios themselves: they had to look like real production prompts without being so realistic that GitHub's secret scanner would block the push. It did, twice, before I worked out that I could assemble the fake credentials from string fragments at runtime.

Stack: Next.js 16 App Router + Tailwind 4 + Framer Motion + Anthropic SDK with Claude Sonnet 4.6 (cache_control: ephemeral) + Railway.

Roberto Llanos

Updates

Roberto Llanos started this project — May 18, 2026 12:23 AM EDT
