Inspiration

Every AI API has a hidden attack surface. Prompt injection, system prompt leakage, policy bypass -- these aren't theoretical risks. They happen every day when real users probe production AI systems. As AI APIs become the backbone of modern applications, developers need a way to test for adversarial failures before they ship -- not after users exploit them.

I was inspired by the chaos engineering philosophy: intentionally break things in controlled environments so they don't break in production. Tools like Chaos Monkey exist for infrastructure. But for AI APIs? Nothing. That gap inspired ShadowLab.

What it does

ShadowLab is an automated adversarial testing platform for AI APIs. Point it at any HTTP AI endpoint, click "Start Scan," and it will:

  • Generate adversarial attack prompts using DigitalOcean Gradient AI (prompt injection, system prompt extraction, policy bypass, encoding attacks, multi-language probes, and more)
  • Send each attack to your API and capture the response
  • Analyze every response with Gradient AI to detect security vulnerabilities -- not just keyword matching, but deep analysis for paraphrased leakage, roleplay compliance, and tone shifts
  • Iteratively refine attacks -- if your API defends against the first round, Gradient generates adaptive follow-up attacks to test deeper defenses
  • Produce a security report with a 0-100 safety score, severity-categorized findings, and AI-generated fix suggestions

Think of it as a "red team in a box" powered by DigitalOcean Gradient AI.
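Concretely, the loop behind a scan looks roughly like this. This is a sketch with stand-in helpers (`generate_attacks`, `judge_response`, etc. are illustrative names, not ShadowLab's actual code), with the AI calls faked so it runs on its own:

```python
# Rough sketch of ShadowLab's iterative scan loop. All helper names are
# illustrative stand-ins; the stubs below fake the Gradient AI calls.

def generate_attacks(n):
    # Stand-in for Gradient AI attack generation
    return [f"attack-{i}" for i in range(n)]

def send_to_target(target, attack):
    # Stand-in for the HTTP call to the API under test
    return f"{target} refuses: {attack}"

def judge_response(attack, response):
    # Stand-in for Gradient AI deep analysis of the response
    return {"vulnerable": "refuses" not in response, "attack": attack}

def refine_attacks(results, n):
    # Stand-in for adaptive follow-ups against defended attacks
    defended = [a for a, _, v in results if not v["vulnerable"]]
    return [f"refined {a}" for a in defended][:n]

def run_scan(target, rounds=2, attacks_per_round=3):
    findings = []
    attacks = generate_attacks(attacks_per_round)
    for _ in range(rounds):
        results = []
        for attack in attacks:
            response = send_to_target(target, attack)
            verdict = judge_response(attack, response)
            results.append((attack, response, verdict))
            if verdict["vulnerable"]:
                findings.append(verdict)
        # If the target defended, probe deeper next round
        attacks = refine_attacks(results, attacks_per_round)
    return findings
```

The real pipeline feeds the findings into scoring and reporting; the shape of the loop (generate, send, judge, refine) is the part this sketch illustrates.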

How I built it

Backend: FastAPI (Python) handles scan orchestration, attack generation, response evaluation, and safety scoring. The response judge uses a two-layer approach: fast heuristic rules for known patterns, plus Gradient AI deep analysis on every response.

Frontend: Next.js + TypeScript dashboard with a scan form, real-time Gradient connectivity indicator, security score gauge, severity-filterable findings table, and remediation recommendations.

DigitalOcean Gradient AI integration is the core of ShadowLab, used in four distinct ways:

  1. Attack Generation (GPT-OSS-20B) -- generates targeted adversarial prompts
  2. Vulnerability Detection (Llama 3.3 70B) -- analyzes every API response for security failures
  3. Attack Refinement (GPT-OSS-20B) -- generates adaptive follow-up attacks based on the target's defenses
  4. Fix Suggestions (Llama 3.3 70B) -- provides developer-friendly remediation

This two-model design optimizes cost and performance: a lightweight model handles bulk generation, while a stronger reasoning model performs deep analysis.

Deployment: DigitalOcean App Platform with Docker containers for both backend and frontend.

Testing: 42 backend tests (pytest) and 38 frontend tests (Jest + React Testing Library) ensure reliability.
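The two-model split can be pictured as a small routing table. The model IDs below are illustrative guesses, not verified Gradient AI catalog names:

```python
# Sketch of the two-model routing described above. Model IDs are
# illustrative, not verified Gradient AI catalog names.
MODEL_FOR_TASK = {
    "attack_generation": "openai-gpt-oss-20b",           # lightweight: bulk generation
    "attack_refinement": "openai-gpt-oss-20b",           # lightweight: adaptive follow-ups
    "vulnerability_detection": "llama3.3-70b-instruct",  # strong: deep response analysis
    "fix_suggestions": "llama3.3-70b-instruct",          # strong: remediation advice
}

def model_for(task: str) -> str:
    """Pick the Gradient AI model for a given pipeline task."""
    return MODEL_FOR_TASK[task]
```

Centralizing the mapping like this keeps the cost/quality trade-off in one place instead of scattered across call sites.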

Challenges I ran into

  • False positives from echo APIs -- Some APIs echo back the user's prompt, which contains adversarial phrases. The response judge flagged these as leakage. I solved this with echo detection: if the response is just a mirror of the input, it's not a vulnerability.
  • LLM judge reliability -- Gradient AI's analysis needed structured prompts to produce consistent, parseable verdicts. I iterated on the prompt format until settling on a VERDICT/SEVERITY/REASON/FIX structure that reliably extracts actionable data.
  • Balancing speed and depth -- Calling Gradient AI for every response adds latency. The two-model approach (lightweight for generation, stronger for analysis) keeps scans under 30 seconds while maintaining analysis quality.
  • SSRF protection -- Since the scan accepts arbitrary URLs, I had to implement guards against Server-Side Request Forgery: blocking private IPs, localhost (unless explicitly allowed), and DNS rebinding attacks.
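The SSRF guard can be sketched with the standard library alone. This is a minimal illustration of the idea, not ShadowLab's exact implementation: resolve the target host and reject private, loopback, and link-local addresses before scanning. Resolving once and connecting to the resolved IP is also the usual defence against DNS rebinding.

```python
# Minimal SSRF guard sketch: reject targets that resolve to internal
# addresses. Not ShadowLab's exact implementation.
import ipaddress
import socket
from urllib.parse import urlparse

def is_blocked_ip(ip: str, allow_localhost: bool = False) -> bool:
    addr = ipaddress.ip_address(ip)
    if addr.is_loopback:
        return not allow_localhost  # localhost only when explicitly allowed
    return addr.is_private or addr.is_link_local or addr.is_reserved

def check_target(url: str, allow_localhost: bool = False) -> None:
    host = urlparse(url).hostname
    if host is None:
        raise ValueError("URL has no host")
    # Resolve once; a full guard would pin the connection to this IP so a
    # second DNS lookup can't be rebound to an internal address.
    for info in socket.getaddrinfo(host, None):
        ip = info[4][0]
        if is_blocked_ip(ip, allow_localhost):
            raise ValueError(f"blocked target address: {ip}")
```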

Accomplishments that I'm proud of

  • Four distinct Gradient AI integrations that each serve a genuine purpose in the pipeline -- not superficial usage
  • Iterative attack refinement that adapts to the target's defenses, making the testing more thorough than static attack sets
  • Two-layer vulnerability detection that combines fast heuristics with AI deep analysis, catching both obvious and subtle issues
  • Production-ready architecture with SSRF protection, rate limiting, persistent storage, comprehensive tests, and Docker deployment
  • The meta demo -- scanning DigitalOcean's own Gradient AI endpoint with ShadowLab (powered by Gradient AI) and getting a perfect safety score

What I learned

  • The two-model approach (lightweight + strong) is a powerful pattern for production AI applications. Not every task needs the most capable model.
  • Structured prompts with explicit output formats make LLM-as-a-judge far more reliable than free-form analysis.
  • AI API security testing is a genuine gap in the developer toolkit. The concept resonates with everyone I've shown it to.
  • DigitalOcean's Gradient AI Serverless Inference is remarkably easy to integrate -- the OpenAI-compatible API means minimal code changes.
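To illustrate the structured-output point: once the judge is forced into a labelled format, extracting a verdict is a few lines of parsing. The field values below are illustrative examples, not ShadowLab's real schema:

```python
# Sketch of parsing the VERDICT/SEVERITY/REASON/FIX judge output described
# above. The sample reply and field values are illustrative only.
import re

FIELDS = ("VERDICT", "SEVERITY", "REASON", "FIX")

def parse_verdict(text: str) -> dict:
    """Extract labelled fields from a structured judge reply."""
    out = {}
    for field in FIELDS:
        m = re.search(rf"^{field}:\s*(.+)$", text, re.MULTILINE)
        out[field.lower()] = m.group(1).strip() if m else None
    return out

reply = """VERDICT: VULNERABLE
SEVERITY: HIGH
REASON: The response paraphrases the system prompt.
FIX: Add an output filter that rejects system-prompt lookalikes."""
```

A free-form judge reply would instead need fragile keyword matching or a second LLM pass; the fixed format makes failures (a missing field comes back as `None`) easy to detect and retry.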

What's next for ShadowLab -- Chaos Engineering for AI APIs

  • Real-time attack streaming -- Stream attack events and judge results as they complete (SSE/WebSocket) for a more interactive experience
  • CI/CD integration -- Fail builds or block deploys when the safety score drops below a threshold or critical vulnerabilities are found
  • Multi-generation attack evolution -- Use Gradient AI to mutate and evolve prompts across 3+ generations for even broader coverage
  • Comparative reporting -- Track safety score trends across scan history to measure improvement over time
  • Custom attack categories -- Let users define organization-specific attack types and evaluation criteria
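The CI/CD gate from the list above could be as small as a threshold check on the scan report. This is a sketch under assumed names; the report shape is illustrative, not ShadowLab's actual schema:

```python
# Sketch of a possible CI/CD gate: fail the build when the safety score
# drops below a threshold or any critical finding appears. The report
# structure here is illustrative.
import sys

def gate(report: dict, min_score: int = 80) -> bool:
    """Return True if the build should pass."""
    criticals = [f for f in report["findings"] if f["severity"] == "critical"]
    return report["safety_score"] >= min_score and not criticals

if __name__ == "__main__":
    report = {"safety_score": 72, "findings": [{"severity": "high"}]}
    sys.exit(0 if gate(report) else 1)  # nonzero exit blocks the deploy
```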

Built With

FastAPI · Python · Next.js · TypeScript · DigitalOcean Gradient AI · DigitalOcean App Platform · Docker · pytest · Jest · React Testing Library
