What it does

Sentinel is a guardian system that supervises a fraud-detection model. It watches the model's decisions in real time, detects when its behavior drifts, uses Gemini to diagnose the cause, recommends a fix, and applies it only after a human approves.

The loop closes in six steps:

  1. A trained fraud model scores each transaction — approve, escalate, or decline.
  2. Every decision streams to Arize for observability.
  3. A monitor tracks the decline rate and trips when it leaves the normal band.
  4. Gemini investigates: targeted attack, or broad model drift? It explains why.
  5. Gemini recommends a concrete threshold change with a predicted effect.
  6. A human approves, the fix is applied, and the decline rate recovers.
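
The core of steps 1, 3, and 6 can be sketched in a few lines. This is a minimal illustration rather than the project's code: the class and function names, the window size, and the band limits are all assumptions chosen for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Recommendation:
    """A proposed threshold change awaiting human sign-off (hypothetical shape)."""
    new_decline_threshold: float
    rationale: str


class DeclineRateMonitor:
    """Rolling decline rate over the last `window` decisions."""

    def __init__(self, window: int, low: float, high: float):
        self.decisions = deque(maxlen=window)
        self.low, self.high = low, high

    def record(self, decision: str) -> None:
        self.decisions.append(decision)

    @property
    def decline_rate(self) -> float:
        if not self.decisions:
            return 0.0
        return sum(d == "decline" for d in self.decisions) / len(self.decisions)

    def tripped(self) -> bool:
        # Only judge once the window is full, to avoid noisy early readings.
        full = len(self.decisions) == self.decisions.maxlen
        return full and not (self.low <= self.decline_rate <= self.high)


def decide(score: float, escalate_at: float, decline_at: float) -> str:
    """Step 1: map a fraud score in [0, 1] to one of the three outcomes."""
    if score >= decline_at:
        return "decline"
    if score >= escalate_at:
        return "escalate"
    return "approve"


def apply_if_approved(current: float, rec: Recommendation, approved: bool) -> float:
    """Step 6: the threshold changes only when a human has approved."""
    return rec.new_decline_threshold if approved else current
```

The approval gate is deliberately a plain function argument here: nothing in the sketch can change the threshold without an explicit `approved=True`.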

Inspiration

Fraud models don't fail loudly — they rot quietly. A model that works today slowly degrades as fraud patterns shift, and the catch is a feedback delay: you don't learn an approval was fraud until a chargeback arrives weeks later. By then the damage is done, and the opposite failure — wrongly declining real customers — is completely invisible.

So we stopped trying to watch the outcome and started watching the model's own behavior. A sudden change in the decline rate is an early signal you can see in minutes, long before any chargeback exists. That single idea — supervise the AI's behavior, not its delayed ground truth — is the whole project. And it generalizes: the same guardian could watch any high-stakes decision model.

How we built it

  • Decision model — a scikit-learn classifier trained on a real public credit-card fraud dataset (~1.3M transactions), scoring on category, amount, and time of day.
  • Observability — every decision is logged to Arize AX as an OpenTelemetry trace via arize-otel.
  • Investigation — Gemini 3 Flash on Google Cloud (Vertex AI) reasons over the live evidence and produces a human-readable diagnosis.
  • Guardian loop — Python orchestration: monitor → detect → investigate → recommend → human approval → apply → recover.
  • Interface — a Flask server streams the live loop to a web UI (live chart, agent-reasoning feed, and a real Approve button) over Server-Sent Events.
  • Deployment — containerless deploy to Render; runs live end to end.
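
As a rough sketch of the interface layer, each loop update can be serialized as one Server-Sent Events frame. The helper names and the payload fields below are assumptions for illustration. In Flask, a generator like this is typically wrapped in a `Response` with `mimetype="text/event-stream"`; the snippet itself uses only the standard library.

```python
import json


def format_sse(payload: dict, event: str = "message") -> str:
    """Serialize one update as a Server-Sent Events frame."""
    data = json.dumps(payload, separators=(",", ":"))
    # json.dumps escapes newlines inside strings, so a single data line is safe.
    return f"event: {event}\ndata: {data}\n\n"


def loop_events(updates):
    """Generator that a streaming HTTP response could iterate over."""
    for update in updates:
        # The "type" key is a hypothetical field used to name the event.
        yield format_sse(update, event=update.get("type", "message"))
```

On the browser side, an `EventSource` can subscribe to each named event separately, which is one way to drive a chart and a reasoning feed from the same stream.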

The architecture has a clean seam between the loop and the integrations, so the whole system runs in a no-credential simulation mode and switches to live Arize + Gemini by flipping environment variables — nothing else changes.
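
One way such a seam might look, as a sketch under stated assumptions: the loop depends only on a small interface, and a factory picks the simulated or live implementation from the environment. The variable name `SENTINEL_MODE`, the class names, and the evidence fields are hypothetical, and the live client is left as a placeholder.

```python
import os
from typing import Protocol


class Investigator(Protocol):
    def diagnose(self, evidence: dict) -> str: ...


class SimulatedInvestigator:
    """No-credential stand-in that returns a simple rule-based diagnosis."""

    def diagnose(self, evidence: dict) -> str:
        top = evidence.get("top_category")
        share = evidence.get("top_category_share", 0.0)
        if top and share >= 0.5:
            return f"targeted attack concentrated in '{top}'"
        return "broad model drift"


class LiveInvestigator:
    """Placeholder for a real model-backed client (not implemented here)."""

    def diagnose(self, evidence: dict) -> str:
        raise NotImplementedError("wire up the live model client here")


def make_investigator(env=None) -> Investigator:
    env = os.environ if env is None else env
    # SENTINEL_MODE is a hypothetical variable name used for illustration.
    if env.get("SENTINEL_MODE", "simulation") == "live":
        return LiveInvestigator()
    return SimulatedInvestigator()
```

Because the loop only ever calls `diagnose`, swapping implementations touches the factory and nothing else.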

Challenges we ran into

  • Making the drift real, not staged. Instead of a fake "drift = true" switch, we inject a surge of real fraud transactions concentrated in one category, so the model genuinely misbehaves and the decline rate climbs on its own.
  • Observability latency. Arize has an ingestion delay, so a record you just logged isn't immediately queryable. We therefore log every decision to Arize for the dashboard and compute the fast loop signal in memory from the same decisions.
  • Calibrating a real model. A trained classifier behaves differently from a toy one, so detection and recovery thresholds had to be tunable rather than hard-coded.
  • SDK reality. The Gemini 3 preview is served only from the global Vertex endpoint, and the current Arize SDK sends data as traces rather than through the older inference logger. Both required adapting to the live APIs.
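
To illustrate the calibration point, here is a minimal sketch of tunable detection and recovery thresholds with hysteresis: the alarm trips above one level and clears only below a lower one, so it does not flap near the boundary. The names and default values are assumptions for the example, not the project's settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmConfig:
    detect_above: float = 0.20   # illustrative default, not the project's value
    recover_below: float = 0.10  # illustrative default, not the project's value


class DriftAlarm:
    """Trips when the decline rate exceeds detect_above; clears below recover_below."""

    def __init__(self, config: AlarmConfig):
        if config.recover_below >= config.detect_above:
            raise ValueError("recover_below must be lower than detect_above")
        self.config = config
        self.active = False

    def update(self, decline_rate: float) -> bool:
        if not self.active and decline_rate > self.config.detect_above:
            self.active = True
        elif self.active and decline_rate < self.config.recover_below:
            self.active = False
        return self.active
```

Keeping both levels in one config object means they can be retuned for a different model without touching the alarm logic.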

What we learned

  • The hard part of trustworthy AI isn't the model — it's knowing, in real time, when to stop trusting it.
  • Catching a leading behavioral signal beats waiting for lagging ground truth.
  • Autonomy and human control aren't opposites: the agent does all the detection and diagnosis, and the human makes only the one consequential decision.

What's next

  • Move the alarm fully into Arize monitors.
  • Point the same guardian at other decision systems (credit, content moderation, LLM agents).
  • Support multiple simultaneous drift signals beyond the decline rate.

Built with

Python · scikit-learn · Gemini 3 (Vertex AI) · Arize AX · Flask · Render
