LLMshield AI: Stop Economic DDoS Attacks Before They Drain Your Budget

Your LLM API just got attacked. Random gibberish flooded your endpoint. Your token bill: $10,000 for garbage.

Traditional rate limiting won't save you. Attackers are smarter. They exploit token-based pricing to turn your API into a cash extraction machine.

LLMshield AI stops them cold.

๐Ÿ›ก๏ธ The Problem

LLM APIs are gold mines for attackers. They can:

  • Flood your API with random characters (token stuffing)
  • Hijack your system prompts (role hijacking)
  • Override your instructions (prompt injection)
  • Turn $100 monthly bills into $10,000 overnight

Rate limiting? It blocks legitimate users. Firewalls? Don't understand context. You need dynamic security that knows the difference between malicious attacks and legitimate high-entropy content (like code or technical queries).

๐Ÿ’ก The Solution

LLMshield AI is a production-ready security proxy that sits between your clients and LLM APIs. It doesn't just block attacksโ€”it outsmarts them with 7 layers of defense and intelligent compression that saves you money.

Seven security layers. Zero false positives. Full observability.

๐ŸŽฏ What It Does

Core Protection

1. Identity Fingerprinting - Track users by X-User-ID + IP. Spot suspicious patterns before they become attacks.

2. Regex Threat Detection - Surgical pattern matching catches role hijacking and instruction override attempts instantly.

3. Entropy Analysis - Shannon entropy calculation detects randomness. Clean text passes. Random gibberish gets blocked. Simple.

4. LLM-as-Judge - Suspicious but not clearly malicious? Let Gemini evaluate it. Gray-zone protection with intelligence.

5. Adaptive Compression - Smart compression using TheTokenCompany's bear-1 model. Compresses user input. Preserves system prompts. Saves tokens. Reduces costs.

6. Penalty Box - Bad actors get flagged. Higher compression for repeat offenders. Time-bound (1-hour TTL). Automatic.

7. Full Observability - Every request traced. Every security decision logged. FinOps metrics in Phoenix. Know exactly what happened and why.

Interactive Demo UI

Built a Streamlit interface that makes security visible:

  • Real-time chat with live security metrics
  • Pre-configured attack scenarios (try them yourself)
  • Session statistics and token savings tracking
  • Beautiful visualizations of threat levels and entropy scores

See the attacks. See the blocks. See the savings.

๐Ÿ”ง How I Built It

Architecture

Backend (FastAPI)

  • Modular service architecture: detector, evaluator, sieve, penalty
  • OpenTelemetry instrumentation on every security layer
  • Production-ready error handling with fallbacks

Frontend (Streamlit)

  • Interactive demo UI
  • Real-time metrics visualization
  • One-click attack scenarios

Observability (Arize Phoenix)

  • Distributed tracing for every request
  • FinOps metrics (token savings, compression ratios)
  • Security event tagging

The Security Flow

  1. Request hits FastAPI endpoint
  2. Extract user fingerprint (identity tracking)
  3. Run regex threat detection (pattern matching)
  4. Calculate Shannon entropy (randomness detection)
  5. Classify: CLEAN (โ‰ค5.5) โ†’ compress, SUSPICIOUS (5.5-6.5) โ†’ LLM judge, HIGH (>6.5) โ†’ BLOCK
  6. Check penalty box (adaptive compression levels)
  7. Compress user input only (system prompts pinned)
  8. Forward to LLM API
  9. Return response + security metrics
  10. Trace everything to Phoenix

Tech Stack

  • FastAPI - Fast, modern Python framework
  • Google Gemini API - LLM provider + evaluator
  • TheTokenCompany bear-1 - Semantic compression
  • Streamlit - Interactive demo UI
  • Arize Phoenix + OpenTelemetry - Distributed tracing
  • Python 3.8+ - Core language

๐Ÿšง Challenges & Solutions

Challenge 1: False Positives

Problem: Block legitimate high-entropy content (code snippets, technical terms)

Solution: Three-tier system + LLM-as-judge for gray zone. Clever thresholds. Extensive testing.

Challenge 2: System Prompt Security

Problem: Compression can't touch system prompts (they contain security guardrails)

Solution: Message separation + system prompt pinning + strict delimiters. User content only.

Challenge 3: Observability

Problem: OpenTelemetry spans need proper naming for Phoenix visualization

Solution: Careful span attribute conventions + custom spans for each security layer + FinOps metrics

๐Ÿ† What I'm Proud Of

1. Seven-Layer Defense - Identity fingerprinting, regex detection, entropy analysis, LLM-as-judge, adaptive compression, penalty box, and distributed tracing. All working together hand-in-hand.

2. Intelligent Compression - System prompt pinning means security never gets compromised. Adaptive aggressiveness (0.5 default, 0.8 penalty) means bad actors pay more.

3. Zero False Positives - Three-tier classification with LLM-as-judge fallback means legitimate users never get blocked. Malicious attacks always get caught.

4. Production-Ready - Modular architecture. Comprehensive error handling. Extensive documentation. Ready to deploy.

5. Beautiful Demo - Interactive UI that makes security visible. Try attacks. See blocks. Understand what's happening.

6. Full Observability - Every decision traced. Every metric logged. Phoenix dashboard shows everything.

๐Ÿ“š Key Learnings

Entropy is powerful - Shannon entropy detects randomness brilliantly. But you need careful thresholds and fallback evaluation.

LLM-as-Judge works - Using an LLM to evaluate suspicious prompts is effective. Adds latency but catches edge cases.

System prompts are sacred - Never compress them. Always pin them. Security depends on it.

Observability is essential - Phoenix + OpenTelemetry gives incredible visibility. Worth the integration effort.

Adaptive beats static - One-size-fits-all security doesn't work. Penalty boxes let you be strict with bad actors and lenient with legitimate users.

๐Ÿ”ฎ What's Next

  • ML-Enhanced Detection - Train models to catch attacks better than rules
  • Intelligent Rate Limiting - Dynamic limits based on threat scores
  • Multi-LLM Support - OpenAI, Anthropic, and more
  • Analytics Dashboard - Attack trends, cost savings, threat intelligence
  • API Key Management - Per-user keys and quotas
  • Webhook Alerts - Real-time notifications for high-threat attacks
  • A/B Testing - Experiment with security configurations
  • Cloud Deployment - Scale horizontally, load balance, production-hardened

๐Ÿ› ๏ธ Technologies

FastAPI | Google Gemini API | TheTokenCompany API | Streamlit | Arize Phoenix | OpenTelemetry | Python | Uvicorn | Pydantic | SciPy | CacheTools | Cursor

Test the attacks:

  • โœ… Normal Query - See clean requests pass
  • ๐Ÿ’ฐ Token Stuffing - See compression in action
  • ๐Ÿ”ด High Entropy - See instant blocks

- โš ๏ธ Suspicious - See LLM-as-judge evaluation

Built to stop economic DDoS attacks. Built to save your budget.

Built With

Share this project:

Updates