LLMshield AI: Stop Economic DDoS Attacks Before They Drain Your Budget
Your LLM API just got attacked. Random gibberish flooded your endpoint. Your token bill: $10,000 for garbage.
Traditional rate limiting won't save you. Attackers are smarter. They exploit token-based pricing to turn your API into a cash extraction machine.
LLMshield AI stops them cold.
🛡️ The Problem
LLM APIs are gold mines for attackers. They can:
- Flood your API with random characters (token stuffing)
- Hijack your system prompts (role hijacking)
- Override your instructions (prompt injection)
- Turn $100 monthly bills into $10,000 overnight
Rate limiting? It blocks legitimate users. Firewalls? Don't understand context. You need dynamic security that knows the difference between malicious attacks and legitimate high-entropy content (like code or technical queries).
💡 The Solution
LLMshield AI is a production-ready security proxy that sits between your clients and LLM APIs. It doesn't just block attacks; it outsmarts them with seven layers of defense and intelligent compression that saves you money.
Seven security layers. Near-zero false positives. Full observability.
🎯 What It Does
Core Protection
1. Identity Fingerprinting - Track users by X-User-ID + IP. Spot suspicious patterns before they become attacks.
2. Regex Threat Detection - Surgical pattern matching catches role hijacking and instruction override attempts instantly.
3. Entropy Analysis - Shannon entropy calculation detects randomness. Clean text passes. Random gibberish gets blocked. Simple.
4. LLM-as-Judge - Suspicious but not clearly malicious? Let Gemini evaluate it. Gray-zone protection with intelligence.
5. Adaptive Compression - Smart compression using TheTokenCompany's bear-1 model. Compresses user input. Preserves system prompts. Saves tokens. Reduces costs.
6. Penalty Box - Bad actors get flagged. Higher compression for repeat offenders. Time-bound (1-hour TTL). Automatic.
7. Full Observability - Every request traced. Every security decision logged. FinOps metrics in Phoenix. Know exactly what happened and why.
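The entropy layer above (layer 3) can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and the thresholds are the ones stated later in the post (≤ 5.5 clean, 5.5-6.5 suspicious, > 6.5 high).

```python
import math
from collections import Counter

# Thresholds from the post's three-tier classification (bits per character).
CLEAN_MAX = 5.5
SUSPICIOUS_MAX = 6.5

def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution of `text`, in bits/char."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def classify(text: str) -> str:
    h = shannon_entropy(text)
    if h <= CLEAN_MAX:
        return "CLEAN"       # forward (after compression)
    if h <= SUSPICIOUS_MAX:
        return "SUSPICIOUS"  # escalate to LLM-as-judge
    return "HIGH"            # block outright
```

The thresholds work because ordinary English text sits around 4-4.5 bits/char, while random printable-ASCII gibberish approaches log2(95) ≈ 6.6 bits/char.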
Interactive Demo UI
Built a Streamlit interface that makes security visible:
- Real-time chat with live security metrics
- Pre-configured attack scenarios (try them yourself)
- Session statistics and token savings tracking
- Beautiful visualizations of threat levels and entropy scores
See the attacks. See the blocks. See the savings.
🔧 How I Built It
Architecture
Backend (FastAPI)
- Modular service architecture: detector, evaluator, sieve, penalty
- OpenTelemetry instrumentation on every security layer
- Production-ready error handling with fallbacks
Frontend (Streamlit)
- Interactive demo UI
- Real-time metrics visualization
- One-click attack scenarios
Observability (Arize Phoenix)
- Distributed tracing for every request
- FinOps metrics (token savings, compression ratios)
- Security event tagging
The Security Flow
1. Request hits the FastAPI endpoint
2. Extract the user fingerprint (identity tracking)
3. Run regex threat detection (pattern matching)
4. Calculate Shannon entropy (randomness detection)
5. Classify: CLEAN (≤ 5.5) → compress, SUSPICIOUS (5.5-6.5) → LLM judge, HIGH (> 6.5) → BLOCK
6. Check the penalty box (adaptive compression levels)
7. Compress user input only (system prompts stay pinned)
8. Forward to the LLM API
9. Return the response plus security metrics
10. Trace everything to Phoenix
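Steps 2-3 of the flow, fingerprinting and regex detection, might look roughly like the sketch below. The patterns shown are illustrative placeholders, not the project's real rule set, and the function names are assumptions.

```python
import hashlib
import re

# Hypothetical patterns; a real deployment would carry a much larger rule set.
THREAT_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),  # instruction override
    re.compile(r"you are now (a|an) ", re.I),                  # role hijacking
    re.compile(r"reveal .*system prompt", re.I),               # prompt extraction
]

def fingerprint(user_id: str, ip: str) -> str:
    """Stable identity key derived from X-User-ID + client IP."""
    return hashlib.sha256(f"{user_id}|{ip}".encode()).hexdigest()[:16]

def regex_threat(text: str) -> bool:
    """Surgical pattern matching for known attack phrasings."""
    return any(p.search(text) for p in THREAT_PATTERNS)
```

Hashing the identity pair keeps raw IPs out of logs while still giving the penalty box a stable key to flag.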
Tech Stack
- FastAPI - Fast, modern Python framework
- Google Gemini API - LLM provider + evaluator
- TheTokenCompany bear-1 - Semantic compression
- Streamlit - Interactive demo UI
- Arize Phoenix + OpenTelemetry - Distributed tracing
- Python 3.8+ - Core language
🚧 Challenges & Solutions
Challenge 1: False Positives
Problem: Legitimate high-entropy content (code snippets, technical terms) risks being blocked as gibberish.
Solution: Three-tier classification plus LLM-as-judge for the gray zone, with carefully tuned thresholds and extensive testing.
Challenge 2: System Prompt Security
Problem: Compression can't touch system prompts (they contain security guardrails)
Solution: Message separation + system prompt pinning + strict delimiters. User content only.
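The message separation above can be sketched as follows. This is a minimal illustration under stated assumptions: the delimiter strings and function names are hypothetical, and `compress` stands in for the bear-1 API call.

```python
DELIM_OPEN = "<<<USER_INPUT>>>"       # assumed delimiter convention
DELIM_CLOSE = "<<<END_USER_INPUT>>>"

def prepare_messages(messages, compress):
    """System messages pass through verbatim (pinned); user content is
    compressed and wrapped in strict delimiters before forwarding."""
    out = []
    for msg in messages:
        if msg["role"] == "system":
            out.append(dict(msg))  # pinned: never compressed or rewrapped
        else:
            out.append({**msg,
                        "content": DELIM_OPEN + compress(msg["content"]) + DELIM_CLOSE})
    return out
```

Keeping the system prompt byte-for-byte intact means the security guardrails it carries can never be weakened by a lossy compression pass.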
Challenge 3: Observability
Problem: OpenTelemetry spans need proper naming for Phoenix visualization
Solution: Careful span attribute conventions + custom spans for each security layer + FinOps metrics
🏆 What I'm Proud Of
1. Seven-Layer Defense - Identity fingerprinting, regex detection, entropy analysis, LLM-as-judge, adaptive compression, penalty box, and distributed tracing, all working in concert.
2. Intelligent Compression - System prompt pinning means security never gets compromised. Adaptive aggressiveness (0.5 default, 0.8 penalty) means bad actors pay more.
3. Minimal False Positives - Three-tier classification with an LLM-as-judge fallback is designed to keep legitimate users from being blocked while still catching malicious attacks.
4. Production-Ready - Modular architecture. Comprehensive error handling. Extensive documentation. Ready to deploy.
5. Beautiful Demo - Interactive UI that makes security visible. Try attacks. See blocks. Understand what's happening.
6. Full Observability - Every decision traced. Every metric logged. Phoenix dashboard shows everything.
📚 Key Learnings
Entropy is powerful - Shannon entropy detects randomness brilliantly. But you need careful thresholds and fallback evaluation.
LLM-as-Judge works - Using an LLM to evaluate suspicious prompts is effective. Adds latency but catches edge cases.
System prompts are sacred - Never compress them. Always pin them. Security depends on it.
Observability is essential - Phoenix + OpenTelemetry gives incredible visibility. Worth the integration effort.
Adaptive beats static - One-size-fits-all security doesn't work. Penalty boxes let you be strict with bad actors and lenient with legitimate users.
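The penalty-box idea above can be sketched with a time-bound flag store. This is a stdlib sketch with hypothetical names; the project's stack lists cachetools, whose TTLCache would handle the expiry for you. The 1-hour TTL and the 0.5/0.8 compression rates come from the post.

```python
import time
from typing import Dict, Optional

PENALTY_TTL = 3600.0   # 1-hour time-bound flag, per the post
DEFAULT_RATE = 0.5     # default compression aggressiveness
PENALTY_RATE = 0.8     # repeat offenders get compressed harder

class PenaltyBox:
    """Flags bad actors by fingerprint and raises their compression rate
    until the flag expires; expiry makes the release automatic."""

    def __init__(self, ttl: float = PENALTY_TTL) -> None:
        self.ttl = ttl
        self._flagged: Dict[str, float] = {}  # fingerprint -> expiry timestamp

    def flag(self, fp: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._flagged[fp] = now + self.ttl

    def compression_rate(self, fp: str, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        expiry = self._flagged.get(fp)
        if expiry is not None and now < expiry:
            return PENALTY_RATE
        self._flagged.pop(fp, None)  # expired: released without intervention
        return DEFAULT_RATE
```

The optional `now` parameter is just for deterministic testing; in production the wall clock is used.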
🔮 What's Next
- ML-Enhanced Detection - Train models to catch attacks better than rules
- Intelligent Rate Limiting - Dynamic limits based on threat scores
- Multi-LLM Support - OpenAI, Anthropic, and more
- Analytics Dashboard - Attack trends, cost savings, threat intelligence
- API Key Management - Per-user keys and quotas
- Webhook Alerts - Real-time notifications for high-threat attacks
- A/B Testing - Experiment with security configurations
- Cloud Deployment - Scale horizontally, load balance, production-hardened
🛠️ Technologies
FastAPI | Google Gemini API | TheTokenCompany API | Streamlit | Arize Phoenix | OpenTelemetry | Python | Uvicorn | Pydantic | SciPy | CacheTools | Cursor
Test the attacks:
- ✅ Normal Query - See clean requests pass
- 💰 Token Stuffing - See compression in action
- 🔴 High Entropy - See instant blocks
- ⚠️ Suspicious - See LLM-as-judge evaluation
Built to stop economic DDoS attacks. Built to save your budget.
