LLMshield AI: Stop Economic DDoS Attacks Before They Drain Your Budget

Your LLM API just got attacked. Random gibberish flooded your endpoint. Your token bill: $10,000 for garbage.

Traditional rate limiting won't save you. Attackers are smarter. They exploit token-based pricing to turn your API into a cash extraction machine.

LLMshield AI stops them cold.

🛡️ The Problem

LLM APIs are gold mines for attackers. They can:

Flood your API with random characters (token stuffing)
Hijack your system prompts (role hijacking)
Override your instructions (prompt injection)
Turn $100 monthly bills into $10,000 overnight

Rate limiting? It blocks legitimate users. Firewalls? Don't understand context. You need dynamic security that knows the difference between malicious attacks and legitimate high-entropy content (like code or technical queries).

💡 The Solution

LLMshield AI is a production-ready security proxy that sits between your clients and LLM APIs. It doesn't just block attacks—it outsmarts them with 7 layers of defense and intelligent compression that saves you money.

Seven security layers. Zero false positives. Full observability.

🎯 What It Does

Core Protection

1. Identity Fingerprinting - Track users by X-User-ID + IP. Spot suspicious patterns before they become attacks.

2. Regex Threat Detection - Surgical pattern matching catches role hijacking and instruction override attempts instantly.

3. Entropy Analysis - Shannon entropy calculation detects randomness. Clean text passes. Random gibberish gets blocked. Simple.

4. LLM-as-Judge - Suspicious but not clearly malicious? Let Gemini evaluate it. Gray-zone protection with intelligence.

5. Adaptive Compression - Smart compression using TheTokenCompany's bear-1 model. Compresses user input. Preserves system prompts. Saves tokens. Reduces costs.

6. Penalty Box - Bad actors get flagged. Higher compression for repeat offenders. Time-bound (1-hour TTL). Automatic.

7. Full Observability - Every request traced. Every security decision logged. FinOps metrics in Phoenix. Know exactly what happened and why.

Interactive Demo UI

Built a Streamlit interface that makes security visible:

Real-time chat with live security metrics
Pre-configured attack scenarios (try them yourself)
Session statistics and token savings tracking
Beautiful visualizations of threat levels and entropy scores

See the attacks. See the blocks. See the savings.

🔧 How I Built It

Architecture

Backend (FastAPI)

Modular service architecture: detector, evaluator, sieve, penalty
OpenTelemetry instrumentation on every security layer
Production-ready error handling with fallbacks

Frontend (Streamlit)

Interactive demo UI
Real-time metrics visualization
One-click attack scenarios

Observability (Arize Phoenix)

Distributed tracing for every request
FinOps metrics (token savings, compression ratios)
Security event tagging

The Security Flow

Request hits FastAPI endpoint
Extract user fingerprint (identity tracking)
Run regex threat detection (pattern matching)
Calculate Shannon entropy (randomness detection)
Classify: CLEAN (≤5.5) → compress, SUSPICIOUS (5.5-6.5) → LLM judge, HIGH (>6.5) → BLOCK
Check penalty box (adaptive compression levels)
Compress user input only (system prompts pinned)
Forward to LLM API
Return response + security metrics
Trace everything to Phoenix

Tech Stack

FastAPI - Fast, modern Python framework
Google Gemini API - LLM provider + evaluator
TheTokenCompany bear-1 - Semantic compression
Streamlit - Interactive demo UI
Arize Phoenix + OpenTelemetry - Distributed tracing
Python 3.8+ - Core language

🚧 Challenges & Solutions

Challenge 1: False Positives

Problem: Block legitimate high-entropy content (code snippets, technical terms)

Solution: Three-tier system + LLM-as-judge for gray zone. Clever thresholds. Extensive testing.

Challenge 2: System Prompt Security

Problem: Compression can't touch system prompts (they contain security guardrails)

Solution: Message separation + system prompt pinning + strict delimiters. User content only.

Challenge 3: Observability

Problem: OpenTelemetry spans need proper naming for Phoenix visualization

Solution: Careful span attribute conventions + custom spans for each security layer + FinOps metrics

🏆 What I'm Proud Of

1. Seven-Layer Defense - Identity fingerprinting, regex detection, entropy analysis, LLM-as-judge, adaptive compression, penalty box, and distributed tracing. All working together hand-in-hand.

2. Intelligent Compression - System prompt pinning means security never gets compromised. Adaptive aggressiveness (0.5 default, 0.8 penalty) means bad actors pay more.

3. Zero False Positives - Three-tier classification with LLM-as-judge fallback means legitimate users never get blocked. Malicious attacks always get caught.

4. Production-Ready - Modular architecture. Comprehensive error handling. Extensive documentation. Ready to deploy.

5. Beautiful Demo - Interactive UI that makes security visible. Try attacks. See blocks. Understand what's happening.

6. Full Observability - Every decision traced. Every metric logged. Phoenix dashboard shows everything.

📚 Key Learnings

Entropy is powerful - Shannon entropy detects randomness brilliantly. But you need careful thresholds and fallback evaluation.

LLM-as-Judge works - Using an LLM to evaluate suspicious prompts is effective. Adds latency but catches edge cases.

System prompts are sacred - Never compress them. Always pin them. Security depends on it.

Observability is essential - Phoenix + OpenTelemetry gives incredible visibility. Worth the integration effort.

Adaptive beats static - One-size-fits-all security doesn't work. Penalty boxes let you be strict with bad actors and lenient with legitimate users.