GAMBIT

Inspiration

The Arup deepfake incident, in which $25.6M was stolen via a single video call in January 2024, made it clear that social engineering scales with AI. Enterprises are now deploying AI copilots with real authority: refund processing, database access, account changes. Yet only 34% have AI-specific security controls in place. We wanted to build the stress-testing lab that should exist before any of those copilots go live.

What It Does

GAMBIT runs two AI agents through 100 rounds of Split or Steal, a classic game-theory scenario in which cooperation pays but betrayal pays more. Both agents start with identical, naive prompts and zero strategic priming. Through private reflection and accumulated experience, they independently discover deception, trust manipulation, and counter-deception. Nothing is pre-programmed. Everything emerges.
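For readers unfamiliar with the game, the classic Split or Steal payoff structure can be sketched as follows. This is a minimal illustration, not GAMBIT's code; the pot size `POT` and the names `Choice` and `payoff` are hypothetical.

```python
from enum import Enum


class Choice(str, Enum):
    SPLIT = "split"
    STEAL = "steal"


# Hypothetical pot size; the write-up does not state GAMBIT's actual stakes.
POT = 100.0


def payoff(a: Choice, b: Choice, pot: float = POT) -> tuple[float, float]:
    """Return (payoff_a, payoff_b) under classic Split or Steal rules."""
    if a is Choice.SPLIT and b is Choice.SPLIT:
        return pot / 2, pot / 2      # mutual cooperation: share the pot
    if a is Choice.STEAL and b is Choice.SPLIT:
        return pot, 0.0              # A betrays a trusting B
    if a is Choice.SPLIT and b is Choice.STEAL:
        return 0.0, pot              # B betrays a trusting A
    return 0.0, 0.0                  # mutual destruction: nobody wins
```

Stealing weakly dominates splitting: against a splitter it doubles your take, and against a stealer it ties at zero. That is why a purely self-interested agent is pulled toward stealing even though mutual splitting is the only outcome that pays both players.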

The simulation produces a full behavioral trace: when trust breaks (Round 6 was the inflection point in the observed run), how exploitation patterns develop, and whether the agents recover. GAMBIT then runs a skill distillation pipeline that converts these emergent patterns into deployable skill cards: prompt modules that harden Gradient-hosted agents against the exact social engineering behaviors observed in the run. You point GAMBIT at any Gradient model, stress-test it, and, when deception emerges, export the defenses.
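A skill card could be represented roughly as below. The write-up does not document the card format, so `SkillCard`, its fields, and `harden` are hypothetical; the sketch only shows the general idea of rendering an observed attack pattern plus defensive instructions into a prompt module that is appended to an agent's system prompt.

```python
from dataclasses import dataclass, field


@dataclass
class SkillCard:
    """Hypothetical shape of a distilled skill card (field names are illustrative)."""
    name: str
    observed_pattern: str     # the social-engineering behavior seen in the run
    first_seen_round: int     # round in which the pattern first appeared
    defenses: list[str] = field(default_factory=list)

    def to_prompt_module(self) -> str:
        """Render the card as a text block for an agent's system prompt."""
        lines = [
            f"## Defensive skill: {self.name}",
            f"Observed attack pattern (first seen in round {self.first_seen_round}): "
            f"{self.observed_pattern}",
            "When you detect this pattern:",
        ]
        lines += [f"- {d}" for d in self.defenses]
        return "\n".join(lines)


def harden(system_prompt: str, cards: list[SkillCard]) -> str:
    """Append every card's prompt module to a base system prompt."""
    modules = [card.to_prompt_module() for card in cards]
    return "\n\n".join([system_prompt, *modules])
```

Keeping cards as structured data rather than raw text makes it easy to version them per model and to regenerate the prompt when a new run surfaces a different pattern.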

Key results from a production run:

Mutual destruction rate: 86%
Cooperation rate: 6%
Deception Index: 22.9 / 100
First betrayal: Round 6
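The outcome rates above can be computed from a per-round trace. This sketch assumes definitions the write-up does not spell out: mutual destruction means both agents steal, cooperation means both split, and the first betrayal is the first round in which either agent steals. `RoundResult` and `summarize` are hypothetical names, and the Deception Index is not reproduced because its formula is not given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoundResult:
    round_no: int    # 1-based round number
    choice_a: str    # "split" or "steal"
    choice_b: str


def summarize(rounds: list[RoundResult]) -> dict:
    """Compute outcome rates under the assumed definitions above."""
    n = len(rounds)
    both_steal = sum(r.choice_a == "steal" and r.choice_b == "steal" for r in rounds)
    both_split = sum(r.choice_a == "split" and r.choice_b == "split" for r in rounds)
    # First round in which at least one agent stole, or None if nobody ever did.
    first_betrayal: Optional[int] = next(
        (r.round_no for r in rounds if "steal" in (r.choice_a, r.choice_b)), None
    )
    return {
        "mutual_destruction_rate": both_steal / n if n else 0.0,
        "cooperation_rate": both_split / n if n else 0.0,
        "first_betrayal_round": first_betrayal,
    }
```

If these definitions match GAMBIT's, the reported 86% and 6% would leave the remaining 8% of rounds as one-sided steals.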

How We Built It

DigitalOcean Gradient™ AI is the core inference and agent infrastructure. All LLM calls route through the Gradient Serverless Inference endpoint via its OpenAI-compatible API (llama3.3-70b-instruct by default, swappable through an environment variable with no code changes). Two Gradient Agents (gambit-player-a and gambit-player-b) represent the players; a third Game Master agent orchestrates turn routing natively through Gradient's agent routing. A Knowledge Base loaded with game theory content (Nash equilibrium, the iterated prisoner's dilemma, tit-for-tat strategies) is attached to both player agents. Gradient Guardrails (Content Moderation and Jailbreak) keep emergent deceptive behavior within safe boundaries.
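Because the endpoint is OpenAI-compatible, a chat call reduces to a standard `/chat/completions` POST. The sketch below uses only the Python standard library for portability (the project itself lists httpx). The base URL and the `GRADIENT_BASE_URL` and `GAMBIT_MODEL` environment variable names are assumptions to check against the DigitalOcean Gradient documentation; reading the model name from the environment is what makes the model swappable without code changes.

```python
import json
import os
import urllib.request

# Assumptions: base URL and env var names; verify against the Gradient docs.
BASE_URL = os.environ.get("GRADIENT_BASE_URL", "https://inference.do-ai.run/v1")
MODEL = os.environ.get("GAMBIT_MODEL", "llama3.3-70b-instruct")


def build_chat_request(messages: list[dict], api_key: str,
                       model: str = MODEL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(messages: list[dict], api_key: str) -> str:
    """Send the request and return the first choice's message content."""
    req = build_chat_request(messages, api_key)
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Separating request construction from sending keeps the model switch and payload shape testable without network access.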

Each round consists of three turns of conversation, an independent split/steal choice by each agent, and a private reflection. Each agent sees its own last 15 reflections but never its opponent's.
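The round structure can be sketched as a loop with a bounded, per-agent memory. This is a hypothetical reconstruction: it assumes each of the three turns gives both agents one message, uses plain dataclasses instead of the project's Pydantic models, and stands in for the Gradient-backed LLM calls with injected `speak`, `choose`, and `reflect` callables. `collections.deque(maxlen=15)` enforces the 15-reflection window, and storing reflections on each `Agent` keeps them out of the opponent's view.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

TURNS_PER_ROUND = 3
REFLECTION_WINDOW = 15


@dataclass
class Agent:
    name: str
    # This agent's own reflections only; the deque silently drops the oldest
    # entry once more than REFLECTION_WINDOW have been stored.
    reflections: deque = field(default_factory=lambda: deque(maxlen=REFLECTION_WINDOW))


# Hypothetical stand-ins for the Gradient-backed LLM calls.
SpeakFn = Callable[[Agent, list], str]               # (agent, transcript) -> message
ChooseFn = Callable[[Agent, list], str]              # (agent, transcript) -> "split"/"steal"
ReflectFn = Callable[[Agent, list, str, str], str]   # (agent, transcript, mine, theirs) -> note


def play_round(a: Agent, b: Agent, speak: SpeakFn, choose: ChooseFn,
               reflect: ReflectFn) -> tuple[list[str], str, str]:
    """Run one round: conversation, independent choices, private reflections."""
    transcript: list[str] = []
    for _ in range(TURNS_PER_ROUND):
        for agent in (a, b):  # assumed: each turn gives both agents one message
            transcript.append(f"{agent.name}: {speak(agent, transcript)}")
    # Both choices are collected before either is revealed to the agents.
    choice_a = choose(a, transcript)
    choice_b = choose(b, transcript)
    # Each reflection lands only in its author's memory, never the opponent's.
    a.reflections.append(reflect(a, transcript, choice_a, choice_b))
    b.reflections.append(reflect(b, transcript, choice_b, choice_a))
    return transcript, choice_a, choice_b
```

Injecting the LLM calls as callables lets the loop run against deterministic stubs, which is useful when debugging memory-window or routing changes without spending inference calls.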

ElevenLabs renders agent conversations as voiced audio in two distinct agent voices with emotion-mapped parameters, played back round by round on the dashboard with the voices alternating
Datadog LLM Observability traces every API call, every behavioral shift, and every round outcome
Braintrust logs each round with cooperation and deception scores for structured evaluation
Sentence Transformers powers the semantic similarity layer in the metrics pipeline
Pydantic models enforce strict type safety across all game state, agent memory, and skill bundle data
Static HTML/CSS/JS frontend (no framework overhead) with a split-screen agent dashboard, a strategy analysis view (entropy curves, mutual information decay, exploitation windows), and a distilled skill cards UI
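The strategy analysis view mentions entropy curves and mutual information decay. The write-up does not give formulas, so the following is one plausible reading, not GAMBIT's implementation: Shannon entropy of an agent's split/steal choices, and a plug-in estimate of the mutual information between the two agents' same-round choices, each computed over a sliding window. The function names and the default window of 10 rounds are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(choices: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a sequence of split/steal choices."""
    n = len(choices)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(choices).values())


def mutual_information(xs: Sequence[str], ys: Sequence[str]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired same-round choices."""
    if len(xs) != len(ys):
        raise ValueError("choice sequences must have equal length")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log2(p(x,y) / (p(x) p(y))), with the factors of n simplified out
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def entropy_curve(choices: Sequence[str], window: int = 10) -> list[float]:
    """Entropy over each sliding window of `window` consecutive rounds."""
    return [entropy(choices[i - window:i]) for i in range(window, len(choices) + 1)]


def mi_curve(xs: Sequence[str], ys: Sequence[str], window: int = 10) -> list[float]:
    """Mutual information over each sliding window of paired rounds."""
    return [mutual_information(xs[i - window:i], ys[i - window:i])
            for i in range(window, len(xs) + 1)]
```

Under this reading, falling entropy signals an agent settling into a fixed strategy, and falling mutual information signals that one agent's choice says less about the other's in the same round. Note that when both agents always steal, both quantities are zero, so the curves should be read alongside the raw outcome rates.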

What We Learned

Running the same experiment across model versions revealed a critical finding: stronger safety-trained models can completely suppress adversarial emergence. GAMBIT can therefore quantify the adversarial resilience gap between model versions. If you swap the Gradient model in production, your security posture changes, and GAMBIT tells you by how much. That makes it a reusable audit tool rather than a one-time demo.

Challenges

Getting the agents to actually betray each other was harder than expected. Early runs produced pure cooperation regardless of prompt configuration. We iterated through memory window sizes, selfishness directives, reflection structures, and prompt modes (balanced_competitive, hard_max, legacy) before discovering that the model itself was the critical variable. We also had to build a cleanup pipeline for prompt leakage: raw LLM output sometimes included internal reasoning scaffolding that bled into the dialogue layer.
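A cleanup pass for prompt leakage can be as simple as a few regular expressions applied before text reaches the dialogue layer. The markers below (`<think>` blocks and lines labeled `Reasoning:`, `Strategy:`, and similar) are assumptions about what the scaffolding looked like; the actual patterns GAMBIT strips are not documented, and `clean_dialogue` is a hypothetical name.

```python
import re

# Hypothetical leakage markers; GAMBIT's real patterns are not documented.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
SCAFFOLD_LINE = re.compile(
    r"^[ \t]*(?:internal reasoning|reasoning|reflection|strategy|thought)[ \t]*:.*$",
    re.IGNORECASE | re.MULTILINE,
)


def clean_dialogue(raw: str) -> str:
    """Strip reasoning scaffolding so only in-character dialogue remains."""
    text = THINK_BLOCK.sub("", raw)      # drop hidden-reasoning blocks entirely
    text = SCAFFOLD_LINE.sub("", text)   # drop lines that start with a scaffold label
    # Collapse the blank lines left behind, then trim surrounding whitespace.
    text = re.sub(r"\n\s*\n", "\n", text)
    return text.strip()
```

Label-based filtering is crude: a legitimate line of dialogue that starts with `Strategy:` would also be dropped, so the patterns should be tuned against real leaked output.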

Built With

css
datadog
digitalocean
elevenlabs
httpx
javascript
numpy
pydantic
python
sentence-transformers

Updates

Hammad Arifeen started this project — Mar 15, 2026 07:11 AM EDT
