Inspiration

Every AI API has a hidden attack surface. Prompt injection, system prompt leakage, policy bypass -- these aren't theoretical risks. They happen every day when real users probe production AI systems. As AI APIs become the backbone of modern applications, developers need a way to test for adversarial failures before they ship -- not after users exploit them.

I was inspired by the chaos engineering philosophy: intentionally break things in controlled environments so they don't break in production. Tools like Chaos Monkey exist for infrastructure. But for AI APIs? Nothing. That gap inspired ShadowLab.

What it does

ShadowLab is an automated adversarial testing platform for AI APIs. Point it at any HTTP AI endpoint, click "Start Scan," and it will:

  • Generate adversarial attack prompts using DigitalOcean Gradient AI (prompt injection, system prompt extraction, policy bypass, encoding attacks, multi-language probes, and more)
  • Send each attack to your API and capture the response
  • Analyze every response with Gradient AI to detect security vulnerabilities -- not just keyword matching, but deep analysis for paraphrased leakage, roleplay compliance, and tone shifts
  • Iteratively refine attacks -- if your API defends against the first round, Gradient generates adaptive follow-up attacks to test deeper defenses
  • Produce a security report with a 0-100 safety score, severity-categorized findings, and AI-generated fix suggestions

Think of it as a "red team in a box" powered by DigitalOcean Gradient AI.
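Concretely, the loop behind a scan looks roughly like this. This is a sketch with stand-in helpers (`generate_attacks`, `judge_response`, etc. are illustrative names, not ShadowLab's actual code), with the AI calls faked so it runs on its own:

```python
# Rough sketch of ShadowLab's iterative scan loop. All helper names are
# illustrative stand-ins; the stubs below fake the Gradient AI calls.

def generate_attacks(n):
    # Stand-in for Gradient AI attack generation
    return [f"attack-{i}" for i in range(n)]

def send_to_target(target, attack):
    # Stand-in for the HTTP call to the API under test
    return f"{target} refuses: {attack}"

def judge_response(attack, response):
    # Stand-in for Gradient AI deep analysis of the response
    return {"vulnerable": "refuses" not in response, "attack": attack}

def refine_attacks(results, n):
    # Stand-in for adaptive follow-ups against defended attacks
    defended = [a for a, _, v in results if not v["vulnerable"]]
    return [f"refined {a}" for a in defended][:n]

def run_scan(target, rounds=2, attacks_per_round=3):
    findings = []
    attacks = generate_attacks(attacks_per_round)
    for _ in range(rounds):
        results = []
        for attack in attacks:
            response = send_to_target(target, attack)
            verdict = judge_response(attack, response)
            results.append((attack, response, verdict))
            if verdict["vulnerable"]:
                findings.append(verdict)
        # If the target defended, probe deeper next round
        attacks = refine_attacks(results, attacks_per_round)
    return findings
```

The real pipeline feeds the findings into scoring and reporting; the shape of the loop (generate, send, judge, refine) is the part this sketch illustrates.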

How I built it

Backend: FastAPI (Python) handles scan orchestration, attack generation, response evaluation, and safety scoring. The response judge uses a two-layer approach: fast heuristic rules for known patterns, plus Gradient AI deep analysis on every response.

Frontend: Next.js + TypeScript dashboard with a scan form, real-time Gradient connectivity indicator, security score gauge, severity-filterable findings table, and remediation recommendations.

DigitalOcean Gradient AI integration is the core of ShadowLab, used in four distinct ways:

  1. Attack Generation (GPT-OSS-20B) -- generates targeted adversarial prompts
  2. Vulnerability Detection (Llama 3.3 70B) -- analyzes every API response for security failures
  3. Attack Refinement (GPT-OSS-20B) -- generates adaptive follow-up attacks based on the target's defenses
  4. Fix Suggestions (Llama 3.3 70B) -- provides developer-friendly remediation

This two-model design optimizes cost and performance: a lightweight model handles bulk generation, while a stronger reasoning model performs deep analysis.

Deployment: DigitalOcean App Platform with Docker containers for both backend and frontend.

Testing: 42 backend tests (pytest) and 38 frontend tests (Jest + React Testing Library) ensure reliability.
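The two-model split can be pictured as a small routing table. The model IDs below are illustrative guesses, not verified Gradient AI catalog names:

```python
# Sketch of the two-model routing described above. Model IDs are
# illustrative, not verified Gradient AI catalog names.
MODEL_FOR_TASK = {
    "attack_generation": "openai-gpt-oss-20b",           # lightweight: bulk generation
    "attack_refinement": "openai-gpt-oss-20b",           # lightweight: adaptive follow-ups
    "vulnerability_detection": "llama3.3-70b-instruct",  # strong: deep response analysis
    "fix_suggestions": "llama3.3-70b-instruct",          # strong: remediation advice
}

def model_for(task: str) -> str:
    """Pick the Gradient AI model for a given pipeline task."""
    return MODEL_FOR_TASK[task]
```

Centralizing the mapping like this keeps the cost/quality trade-off in one place instead of scattered across call sites.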

Challenges I ran into

  • False positives from echo APIs -- Some APIs echo back the user's prompt, which contains adversarial phrases. The response judge flagged these as leakage. I solved this with echo detection: if the response is just a mirror of the input, it's not a vulnerability.
  • LLM judge reliability -- Gradient AI's analysis needed structured prompts to produce consistent, parseable verdicts. I iterated on the prompt format until settling on a VERDICT/SEVERITY/REASON/FIX structure that reliably extracts actionable data.
  • Balancing speed and depth -- Calling Gradient AI for every response adds latency. The two-model approach (lightweight for generation, stronger for analysis) keeps scans under 30 seconds while maintaining analysis quality.
  • SSRF protection -- Since the scan accepts arbitrary URLs, I had to implement guards against Server-Side Request Forgery: blocking private IPs, localhost (unless explicitly allowed), and DNS rebinding attacks.
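The SSRF guard can be sketched with the standard library alone. This is a minimal illustration of the idea, not ShadowLab's exact implementation: resolve the target host and reject private, loopback, and link-local addresses before scanning. Resolving once and connecting to the resolved IP is also the usual defence against DNS rebinding.

```python
# Minimal SSRF guard sketch: reject targets that resolve to internal
# addresses. Not ShadowLab's exact implementation.
import ipaddress
import socket
from urllib.parse import urlparse

def is_blocked_ip(ip: str, allow_localhost: bool = False) -> bool:
    addr = ipaddress.ip_address(ip)
    if addr.is_loopback:
        return not allow_localhost  # localhost only when explicitly allowed
    return addr.is_private or addr.is_link_local or addr.is_reserved

def check_target(url: str, allow_localhost: bool = False) -> None:
    host = urlparse(url).hostname
    if host is None:
        raise ValueError("URL has no host")
    # Resolve once; a full guard would pin the connection to this IP so a
    # second DNS lookup can't be rebound to an internal address.
    for info in socket.getaddrinfo(host, None):
        ip = info[4][0]
        if is_blocked_ip(ip, allow_localhost):
            raise ValueError(f"blocked target address: {ip}")
```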

Accomplishments that I'm proud of

  • Four distinct Gradient AI integrations that each serve a genuine purpose in the pipeline -- not superficial usage
  • Iterative attack refinement that adapts to the target's defenses, making the testing more thorough than static attack sets
  • Two-layer vulnerability detection that combines fast heuristics with AI deep analysis, catching both obvious and subtle issues
  • Production-ready architecture with SSRF protection, rate limiting, persistent storage, comprehensive tests, and Docker deployment
  • The meta demo -- scanning DigitalOcean's own Gradient AI endpoint with ShadowLab (powered by Gradient AI) and getting a perfect safety score

What I learned

  • The two-model approach (lightweight + strong) is a powerful pattern for production AI applications. Not every task needs the most capable model.
  • Structured prompts with explicit output formats make LLM-as-a-judge far more reliable than free-form analysis.
  • AI API security testing is a genuine gap in the developer toolkit. The concept resonates with everyone I've shown it to.
  • DigitalOcean's Gradient AI Serverless Inference is remarkably easy to integrate -- the OpenAI-compatible API means minimal code changes.
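To illustrate the structured-output point: once the judge is forced into a labelled format, extracting a verdict is a few lines of parsing. The field values below are illustrative examples, not ShadowLab's real schema:

```python
# Sketch of parsing the VERDICT/SEVERITY/REASON/FIX judge output described
# above. The sample reply and field values are illustrative only.
import re

FIELDS = ("VERDICT", "SEVERITY", "REASON", "FIX")

def parse_verdict(text: str) -> dict:
    """Extract labelled fields from a structured judge reply."""
    out = {}
    for field in FIELDS:
        m = re.search(rf"^{field}:\s*(.+)$", text, re.MULTILINE)
        out[field.lower()] = m.group(1).strip() if m else None
    return out

reply = """VERDICT: VULNERABLE
SEVERITY: HIGH
REASON: The response paraphrases the system prompt.
FIX: Add an output filter that rejects system-prompt lookalikes."""
```

A free-form judge reply would instead need fragile keyword matching or a second LLM pass; the fixed format makes failures (a missing field comes back as `None`) easy to detect and retry.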

What's next for ShadowLab -- Chaos Engineering for AI APIs

  • Real-time attack streaming -- Stream attack events and judge results as they complete (SSE/WebSocket) for a more interactive experience
  • CI/CD integration -- Fail builds or block deploys when the safety score drops below a threshold or critical vulnerabilities are found
  • Multi-generation attack evolution -- Use Gradient AI to mutate and evolve prompts across 3+ generations for even broader coverage
  • Comparative reporting -- Track safety score trends across scan history to measure improvement over time
  • Custom attack categories -- Let users define organization-specific attack types and evaluation criteria
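The CI/CD gate from the list above could be as small as a threshold check on the scan report. This is a sketch under assumed names; the report shape is illustrative, not ShadowLab's actual schema:

```python
# Sketch of a possible CI/CD gate: fail the build when the safety score
# drops below a threshold or any critical finding appears. The report
# structure here is illustrative.
import sys

def gate(report: dict, min_score: int = 80) -> bool:
    """Return True if the build should pass."""
    criticals = [f for f in report["findings"] if f["severity"] == "critical"]
    return report["safety_score"] >= min_score and not criticals

if __name__ == "__main__":
    report = {"safety_score": 72, "findings": [{"severity": "high"}]}
    sys.exit(0 if gate(report) else 1)  # nonzero exit blocks the deploy
```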

Built With

FastAPI · Python · Next.js · TypeScript · DigitalOcean Gradient AI · DigitalOcean App Platform · Docker · pytest · Jest · React Testing Library
