Cerberus FinSec: Insider Threat Guardian

Inspiration

Insider threats cost financial institutions billions annually, yet most security tools only log activity after the damage is done. I wanted to build an agent that doesn't just monitor, but acts in real time. The Google Cloud Rapid Agent Hackathon was the perfect opportunity to combine Gemini 3 Flash, MongoDB Atlas, and a serverless architecture into a guardian that detects, explains, and autonomously blocks data exfiltration.

My initial idea was an HR assessment tool. But after talking to security engineers, I realized the same technology — tracking pastes, tab switches, copy attempts — could solve a much bigger problem: rogue employees copying proprietary trading code or client lists. That pivot turned a generic product into a high‑value financial security platform.

What it does

Cerberus FinSec is an agentic insider threat detection system for financial terminals.

Gemini 3 Flash generates compliance policies (AML, SOX, FINRA) as structured threat matrices.
A live terminal monitors every keystroke, paste, copy, and tab switch.
Behavioral + semantic analysis (85% Gemini / 15% counters) produces a real‑time risk score with 6 breakdown dimensions: data exfiltration, unauthorized access, policy violation, AML red flags, insider trading, and SOX non‑compliance.
When the risk score exceeds a threshold (76% in my demo), the agent locks the session — blocking all input, copy, and exfiltration.
Every event is persisted to MongoDB Atlas via a Model Context Protocol (MCP) server, creating an immutable audit trail.
A Flutter dashboard gives security officers a live view: animated risk gauge, event timeline with color‑coded severity, expandable behavioral flags, collapsible paste snippets, keystroke metrics, and full incident reports with one‑tap clipboard copy.

The system doesn't just alert — it acts autonomously, freezing the terminal until a compliance officer intervenes.
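
The 85/15 blend and the lock threshold can be sketched as follows. This is a minimal illustration, assuming both inputs are already normalized to a 0 to 100 scale; blendRiskScore, shouldLockSession, clampScore, and the constant names are hypothetical and not taken from the project's code.

```typescript
// Hypothetical names and normalization. Only the 85/15 weights and the
// 76% demo threshold come from the write-up.
const GEMINI_WEIGHT = 0.85;
const COUNTER_WEIGHT = 0.15;
const LOCK_THRESHOLD = 76;

/** Clamp a value into the 0..100 range. */
function clampScore(value: number): number {
  return Math.min(100, Math.max(0, value));
}

/** Blend the semantic (Gemini) score with the behavioral counter score. */
function blendRiskScore(geminiScore: number, counterScore: number): number {
  const blended =
    GEMINI_WEIGHT * clampScore(geminiScore) +
    COUNTER_WEIGHT * clampScore(counterScore);
  return Math.round(blended * 10) / 10; // keep one decimal place
}

/** The session locks only when the score exceeds the threshold. */
function shouldLockSession(riskScore: number): boolean {
  return riskScore > LOCK_THRESHOLD;
}
```

With these assumptions, a Gemini score of 80 and a counter score of 60 blend to 77, which is just above the demo threshold and would trigger the lock.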

How I built it

Backend — Hono.js on Cloud Run
I chose Hono.js as an ultra‑lightweight TypeScript web framework (≈40ms cold start vs. ≈300ms for Express). The API has 7 route handlers spread across five route files (generate.ts, guardian.ts, review.ts, health.ts, identity.ts), mounted with global CORS and structured logging. The config.ts loader reads all secrets from environment variables (never hardcoded) and pre‑warms the Gemini client during boot by fetching ADC tokens ahead of the first request. Together with the keep‑warm ping, this cuts cold‑start latency from 10–15s to under 3s.

AI Layer — Gemini 3 Flash with Dual Auth + 3‑Retry Pipeline
The Gemini client (agents/gemini-client.ts) uses the @google/genai SDK with two authentication paths auto‑detected at startup:

Primary: GEMINI_API_KEY → Google AI Studio endpoint (generativelanguage.googleapis.com)
Fallback: GCP_PROJECT_ID + ADC → Vertex AI endpoint (aiplatform.googleapis.com)
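
The auto‑detection between the two paths might look roughly like this. It is a sketch under stated assumptions: selectAuthPath, AuthPath, and Env are hypothetical names, and only the environment variable names and endpoint hosts are taken from the list above.

```typescript
// Sketch only: selectAuthPath, AuthPath, and Env are hypothetical names.
// The env variable names and endpoint hosts are the ones listed above.
type AuthPath =
  | { kind: "ai-studio"; apiKey: string; host: string }
  | { kind: "vertex-ai"; projectId: string; host: string };

type Env = Record<string, string | undefined>;

function selectAuthPath(env: Env): AuthPath {
  // Primary path: an API key routes to the AI Studio endpoint.
  const apiKey = env.GEMINI_API_KEY?.trim();
  if (apiKey) {
    return { kind: "ai-studio", apiKey, host: "generativelanguage.googleapis.com" };
  }
  // Fallback path: a project ID routes to Vertex AI. The ADC credentials
  // themselves would be resolved later by the Google auth libraries.
  const projectId = env.GCP_PROJECT_ID?.trim();
  if (projectId) {
    return { kind: "vertex-ai", projectId, host: "aiplatform.googleapis.com" };
  }
  throw new Error("Set GEMINI_API_KEY or GCP_PROJECT_ID to reach Gemini");
}
```

In a Node.js service this would typically be called once at startup as selectAuthPath(process.env), so a misconfigured deployment fails at boot rather than on the first request.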

Every Gemini call goes through a 3‑retry exponential‑backoff loop with jitter (1s → 2s → 4s), a 90‑second timeout, and a concurrency limiter (max 5 simultaneous requests) to stay under per‑minute quotas. A 4‑tier defensive JSON parsing pipeline (safeArray(), safeStringArray(), repairJson(), extractJsonObject()) handles Gemini's occasionally malformed output — preventing crashes when arrays arrive as strings.
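
A minimal sketch of the retry loop and two of the defensive parsers follows. The function names safeStringArray and extractJsonObject match the ones mentioned above, but their bodies here are guesses at plausible behaviour, not the project's implementation; backoffDelayMs, withRetry, and the 25% jitter range are assumptions, and the 90‑second timeout is left out for brevity.

```typescript
// Hypothetical sketch: signatures, jitter range, and parser bodies are
// assumptions. From the write-up: 3 retries with 1s -> 2s -> 4s base delays.

/** Base delay doubles per retry; up to 25% random jitter is added on top. */
function backoffDelayMs(retryIndex: number, random: () => number = Math.random): number {
  const base = 1000 * 2 ** retryIndex; // 1000, 2000, 4000
  return Math.round(base + base * 0.25 * random());
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

/** Run `call`, retrying up to `maxRetries` times after the first failure. */
async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}

/** Parse the span from the first "{" to the last "}" in noisy model output. */
function extractJsonObject(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return null;
  }
}

/** Accept a real array, or an array that arrived JSON-encoded as a string. */
function safeStringArray(value: unknown): string[] {
  if (Array.isArray(value)) return value.map(String);
  if (typeof value !== "string") return [];
  try {
    const parsed: unknown = JSON.parse(value);
    if (Array.isArray(parsed)) return parsed.map(String);
  } catch {
    // Not valid JSON: fall through and treat the text as a single entry.
  }
  return value.trim() ? [value] : [];
}
```

Injecting sleep and random keeps the retry logic testable without real delays, which is a design choice of this sketch rather than something the write-up describes.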

Semantic Gatekeeper — Gemini as the Sole Validator
Instead of regex heuristics, every /generate request passes through a two‑stage Gemini classifier (classifyComplianceIntent): Stage 1 validates the input is meaningful language (rejecting gibberish, single words, keyboard mashing), Stage 2 verifies it describes a compliance audit request, returning confidence, detectedDomain, and detectedComplianceType. This is the only validation gate — no pattern matching, no keyword lists.
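
The gate decision on top of the classifier output could be sketched like this. The field names come from this write-up (isInputMeaningful and isAssessmentRelated appear in the challenges section); the result shape, the 0.7 confidence cutoff, and evaluateGate itself are assumptions made for illustration.

```typescript
// Field names come from the write-up. The shape, the 0.7 cutoff, and
// evaluateGate are assumptions made for illustration.
interface ClassificationResult {
  isInputMeaningful: boolean; // Stage 1: real language, not gibberish
  isAssessmentRelated: boolean; // Stage 2: describes a compliance audit request
  confidence: number; // assumed to be in the 0..1 range
  detectedDomain: string | null;
  detectedComplianceType: string | null;
}

type RejectionReason = "not_meaningful" | "not_compliance_related" | "low_confidence";

type GateDecision =
  | { accepted: true }
  | { accepted: false; reason: RejectionReason };

/** Apply the two stages in order, then an assumed confidence cutoff. */
function evaluateGate(result: ClassificationResult, minConfidence = 0.7): GateDecision {
  if (!result.isInputMeaningful) return { accepted: false, reason: "not_meaningful" };
  if (!result.isAssessmentRelated) return { accepted: false, reason: "not_compliance_related" };
  if (result.confidence < minConfidence) return { accepted: false, reason: "low_confidence" };
  return { accepted: true };
}
```

Returning a reason alongside the rejection lets the API tell the user why a prompt was refused instead of failing silently.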

Telemetry Engine — Guardian Ingest Pipeline
POST /api/v1/guardian/ingest accepts batched micro‑events from the Flutter terminal. The guardian.ts route auto‑creates sessions in MongoDB if the referenced sessionId doesn't exist. A 4‑tier deduplication pipeline prevents redundant processing:

Code‑hash dedup: SHA‑256 of currentCode — skips Gemini if unchanged
Event fingerprint dedup: Rotating Set of last 128 event fingerprints (eventType + serialized payload) discards duplicates within the same second
Frontend payload dedup: Compares riskAssessmentId UUID before adding to timeline
Polling cache: _lastPolledRiskPayload suppresses re‑emission on poll cycles
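
The two backend tiers can be sketched as below. The SHA‑256 hash of currentCode and the 128‑entry window come from the list above; the class and method names are hypothetical, and the eviction strategy (drop the oldest entry) is an assumption about what "rotating" means here.

```typescript
import { createHash } from "node:crypto";

// Sketch of the first two tiers. Class and method names are hypothetical.

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

/** Tier 1: remember the last code hash so unchanged code skips Gemini. */
class CodeHashGate {
  private lastHash: string | null = null;

  /** Returns true when the code changed and analysis should run. */
  hasChanged(currentCode: string): boolean {
    const hash = sha256Hex(currentCode);
    if (hash === this.lastHash) return false;
    this.lastHash = hash;
    return true;
  }
}

/** Tier 2: bounded set of recent fingerprints, evicting the oldest first. */
class FingerprintWindow {
  private readonly seen = new Set<string>();
  private readonly capacity: number;

  constructor(capacity = 128) {
    this.capacity = capacity;
  }

  /** Returns true for a duplicate, otherwise records the event. */
  isDuplicate(eventType: string, payload: unknown): boolean {
    const fingerprint = `${eventType}:${JSON.stringify(payload)}`;
    if (this.seen.has(fingerprint)) return true;
    this.seen.add(fingerprint);
    if (this.seen.size > this.capacity) {
      // A Set iterates in insertion order, so the first value is the oldest.
      const oldest = this.seen.values().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return false;
  }
}
```

One caveat of this sketch: JSON.stringify is sensitive to key order, so two payloads with the same fields in a different order would produce different fingerprints.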

Agentic Action — Session Lock State Machine
The Guardian maintains an explicit state machine: active → locked → escalated. When the anomaly risk index crosses the threshold, the backend moves the session to the locked state (persisted as status: "frozen"), stores the incident in MongoDB via store_suspicion_report, and returns 403 Forbidden with sessionFrozen: true for all subsequent ingest requests. The Flutter workspace transitions to read‑only mode with a lock overlay. Only POST /api/v1/guardian/sessions/:id/unlock (requiring reviewerId + unlockReason) restores the session.
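
As a rough sketch, the transitions can be written as a pure function. The three state names and the reviewerId and unlockReason fields come from the description above; the event names, the exact transition table (including allowing an unlock from the escalated state), and ingestStatusCode are assumptions.

```typescript
// Hypothetical sketch: the three state names come from the write-up. The
// event names and the transition table are assumptions.
type SessionState = "active" | "locked" | "escalated";

type SessionEvent =
  | { type: "risk_threshold_exceeded" }
  | { type: "escalate" }
  | { type: "unlock"; reviewerId: string; unlockReason: string };

function nextState(state: SessionState, event: SessionEvent): SessionState {
  switch (event.type) {
    case "risk_threshold_exceeded":
      // Only an active session can be locked; other states are unchanged.
      return state === "active" ? "locked" : state;
    case "escalate":
      return state === "locked" ? "escalated" : state;
    case "unlock":
      // An unlock must be documented, so blank fields are rejected.
      if (!event.reviewerId.trim() || !event.unlockReason.trim()) {
        throw new Error("unlock requires reviewerId and unlockReason");
      }
      return "active";
  }
}

/** Ingest is refused (403 in the write-up) unless the session is active. */
function ingestStatusCode(state: SessionState): 200 | 403 {
  return state === "active" ? 200 : 403;
}
```

Keeping the transition logic pure makes it easy to persist each resulting state and to unit test the table without a database.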

Database — MongoDB Atlas + MCP Server (12 tools)
The MCP server (mcp-server/src/) uses the MongoDB Node.js native driver with Atlas connection pooling (maxPoolSize: 10, serverSelectionTimeoutMS: 5000). Twelve MCP tools are exposed via an HTTP adapter at POST /tools/:toolName: store_compliance_matrix, get_compliance_matrix, list_compliance_matrices, create_session, store_session, list_sessions, get_session, delete_session, store_session_event, list_session_events, store_suspicion_report, list_suspicion_reports. The ensureIndexes() function creates compound indexes on {sessionId, createdAt} and {employeeId} at startup. Session state survives Cloud Run scale‑to‑zero via a 3‑tier recovery pipeline: in‑memory → MCP list_sessions MongoDB fallback → Flutter dual‑endpoint merge.
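
A client call against the HTTP adapter might look like the sketch below. Only the POST /tools/:toolName route and the tool names come from the description above; the JSON request body, the response handling, and the callMcpTool, FetchLike, and HttpResponseLike names are assumptions.

```typescript
// Sketch of a client for the HTTP adapter. The body shape, error handling,
// and all names here are assumptions.
interface HttpResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResponseLike>;

async function callMcpTool(
  fetchImpl: FetchLike,
  baseUrl: string,
  toolName: string,
  args: Record<string, unknown>,
): Promise<unknown> {
  // Strip trailing slashes so the path is joined with exactly one "/".
  const url = `${baseUrl.replace(/\/+$/, "")}/tools/${encodeURIComponent(toolName)}`;
  const response = await fetchImpl(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(args),
  });
  if (!response.ok) {
    throw new Error(`MCP tool ${toolName} failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

On Node.js 22 the global fetch can be passed as fetchImpl; taking it as a parameter is a choice of this sketch so the call can be exercised with a stub in tests.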

Frontend — Flutter Material 3 Dashboard on Firebase Hosting
The Flutter app (sandbox/frontend/) uses Provider for state management across 6 providers (GuardianProvider, GenerateProvider, ReviewProvider, HealthProvider, IdentityProvider, ThemeProvider). The split‑panel layout features a code workspace (left) with real‑time keystroke/paste/tab‑switch telemetry and a security timeline (right) with an animated risk gauge, dimension score bars, and an expandable incident dialog with tabbed views. WidgetsBindingObserver detects AppLifecycleState.paused for tab‑switch events. SSE streaming provides live risk updates; polling at 5‑second intervals serves as fallback with _lastPolledRiskPayload caching to prevent duplicates.

Infrastructure

Cloud Run (us‑central1): Hono API + MCP sidecar via entrypoint.sh concurrent launch
Cloud Scheduler: Keep‑warm ping to /health every 5 minutes to prevent cold starts during judging
Cloud Build: cloudbuild.yaml CI/CD pipeline
Firebase Hosting: Flutter web deployment
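Native fetch only: no third‑party HTTP client libraries. The API's own outbound HTTP calls, such as those to the MCP sidecar, use Node.js 22 fetch(); Gemini calls go through the @google/genai SDK, and the MCP server reaches MongoDB Atlas through the native driver.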

Challenges I ran into

1. The input validation trap

I initially tried to filter garbage prompts (keyboard mash, profanity) with regex heuristics. They rejected perfectly valid assessment requests because tokenisation merged punctuation‑stripped text into giant “words”. I scrapped all heuristics and made Gemini the sole gatekeeper: it evaluates isInputMeaningful, isAssessmentRelated, and confidence far more accurately than the regex filters did. Lesson: don't fight the LLM with pattern matching.

2. Gemini 3 Flash preview regional availability on Vertex AI

The hackathon required gemini-3-flash-preview. I initially configured Vertex AI as the primary path, but the preview model had inconsistent availability across regions during the preview period. Rather than compromise on the mandated model, I adopted a dual‑auth architecture: the Gemini client auto‑detects whether GEMINI_API_KEY is set and routes to the AI Studio endpoint (where the preview model was reliably available), while keeping Vertex AI + ADC as a fully configured fallback. A model fallback to gemini-2.5-flash was coded defensively but never triggered in production, because the AI Studio key path resolved the availability problem. Lesson: preview models can be unpredictable on Vertex AI; ship with a parallel auth path that goes direct to the model API.

3. Duplicate events from polling fallback

When SSE was unavailable, the frontend polled /sessions/:id every 5 seconds and re‑added the same risk payload, creating dozens of identical timeline entries. I fixed it by adding deduplication in the provider (compare riskAssessmentId UUID, generatedAt, risk score, flags) and caching the last payload in the API service (_lastPolledRiskPayload).
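
The comparison logic is language‑agnostic. The project's version lives in the Flutter provider (Dart); the sketch below uses TypeScript and assumes a payload shape built from the four fields named above. RiskPayload, isSamePayload, and PollDeduplicator are hypothetical names.

```typescript
// Hypothetical payload shape built from the fields named in the write-up.
interface RiskPayload {
  riskAssessmentId: string;
  generatedAt: string;
  riskScore: number;
  flags: string[];
}

/** Two payloads match when all four compared fields are identical. */
function isSamePayload(a: RiskPayload, b: RiskPayload): boolean {
  return (
    a.riskAssessmentId === b.riskAssessmentId &&
    a.generatedAt === b.generatedAt &&
    a.riskScore === b.riskScore &&
    a.flags.length === b.flags.length &&
    a.flags.every((flag, i) => flag === b.flags[i])
  );
}

/** Caches the last polled payload and suppresses exact repeats. */
class PollDeduplicator {
  private lastPayload: RiskPayload | null = null;

  /** Returns the payload if it is new, or null if it repeats the last one. */
  accept(payload: RiskPayload): RiskPayload | null {
    if (this.lastPayload && isSamePayload(this.lastPayload, payload)) return null;
    this.lastPayload = payload;
    return payload;
  }

  /** Call when switching sessions so stale state cannot suppress new data. */
  reset(): void {
    this.lastPayload = null;
  }
}
```

The reset method mirrors the idea of clearing the polling cache on a session switch, so the first payload of a new session is never mistaken for a repeat.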

4. Session switching didn't clear the right panel

Switching sessions left the old risk gauge and timeline on screen. I added resetForNewSession() in GuardianProvider that clears events, cancels subscriptions, resets the polling cache, and reloads the new session's data.

5. Cold starts on Cloud Run

The first request after inactivity took 10–15 seconds — mostly from ADC token acquisition and Gemini client initialization. I implemented ADC pre‑warming in config.ts (prewarmGeminiClient() fetches the auth token during the boot phase) and a Cloud Scheduler job that pings /health every 5 minutes, keeping the container warm. This brought cold‑start response time under 3 seconds.

6. Stress test concurrency limits

100 concurrent Gemini calls hit Vertex AI's per‑minute quota (429 errors). I added a concurrency limiter (max 5 simultaneous Gemini requests, queued with a Promise‑based semaphore) and ran stress tests in 5 waves of 5. For the hackathon, all 25 calls succeeded with 0 failures.
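
A Promise‑based semaphore of the kind described can be sketched as follows. The limit of 5 comes from the paragraph above; the Semaphore class, its method names, and geminiLimiter are hypothetical.

```typescript
// Hypothetical sketch of a Promise-based semaphore. The class name and API
// are assumptions.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(maxConcurrent: number) {
    this.available = maxConcurrent;
  }

  private acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return Promise.resolve();
    }
    // No free slot: wait in FIFO order until release() hands one over.
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) next(); // pass the slot directly to the next waiter
    else this.available++;
  }

  /** Run `task` once a slot is free, always releasing the slot afterwards. */
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

// At most 5 Gemini requests in flight at once; the rest queue up.
const geminiLimiter = new Semaphore(5);
```

Each Gemini call would then be wrapped as geminiLimiter.run(() => callGemini(prompt)), where callGemini stands in for whatever function performs the request. Releasing in a finally block means a failed call still frees its slot.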

Accomplishments that I'm proud of

Autonomous agentic action: The system doesn't just log — it locks the session on high risk, blocks copy/paste/keyboard input, and requires a documented unlock (reviewerId + unlockReason). The full state machine (active → locked → escalated) is persisted to MongoDB so no threat event is lost on restart.
Production‑grade resilience pipeline: Dual auth (AI Studio API key / Vertex AI ADC), 3‑retry exponential backoff with jitter, concurrency limiter (max 5), 4‑layer JSON repair (safeArray → safeStringArray → repairJson → extractJsonObject), 4‑tier event deduplication (code‑hash → fingerprint → payload UUID → poll cache), and 3‑tier session recovery (in‑memory → MongoDB → Flutter dual‑endpoint merge).
Full cloud deployment: Cloud Run (Hono API + MCP sidecar), Firebase Hosting (Flutter dashboard), MongoDB Atlas (12‑tool MCP grounding layer), Cloud Scheduler (keep‑warm), and Cloud Build (CI/CD), all integrated and deployed.
Zero failures in final test suites: 13/13 endpoint smoke tests passed, 18/18 telemetry event types + lifecycle passed, 50/50 concurrent burst requests with 0 failures. Average micro‑event ingest latency: 193ms. MongoDB Atlas ops/sec peaked at 4.3 (well under the limit of 100 ops/sec).
Clean, operator‑focused UI: Dark‑themed Material 3 dashboard with animated risk gauge (green → yellow → red interpolation), 6‑dimension risk breakdown bars, tabbed incident dialog (Flags / Incident tabs), collapsible paste snippets with source metadata, keystroke metrics grid, and one‑tap clipboard copy for audit reports.

What I learned

For input validation, the LLM beat heuristics: Gemini's semantic classification of input meaningfulness and compliance relevance was far more accurate than the regex‑based filtering I tried. Ship the LLM as the gatekeeper, not as a fallback.
Agentic behavior requires explicit state machines — I added a status field (active → locked → escalated) with conditional branching in the ingest pipeline. Without it, “locking” is just a boolean flag with no audit trail.
Preview models demand parallel auth paths — never assume a preview model will be available on Vertex AI in your target region. Shipping with both AI Studio API key and Vertex AI ADC from day one prevented a last‑minute scramble.
MongoDB Atlas + MCP is a powerful combination: 12 tools (create, store, retrieve, list, delete across sessions, events, matrices, and suspicion reports) turned the database into an agent partner rather than just a persistence layer. The MCP HTTP adapter made it deployable as a Cloud Run sidecar.
Serverless cold starts are real and solvable — pre‑warming ADC tokens at boot + a 5‑minute keep‑alive ping via Cloud Scheduler brought latency from 15s to under 3s. Without this, the live demo would have been painful.

What's next for Cerberus FinSec

Incident escalation workflow – add a 30‑second timer after lock; if no override, auto‑escalate to compliance officer via Slack webhook.
Historical baseline comparison – Gemini compares current behaviour against employee’s last 3 sessions to detect drift from established patterns.
Multi‑tenant support – separate organisations, each with their own compliance matrices and audit logs.
Real‑time WebSocket push – replace polling fallback with persistent WebSocket connections for instant risk updates to all open dashboards.
Custom model fine‑tuning – fine‑tune Gemini on financial regulatory texts (FINRA, SEC rules, AML directives) for sharper compliance matrix generation and threat detection.
Open‑source release – publish the MCP server and Gemini client as standalone reference implementations for the community.
Google Cloud Agent Builder deep integration – register all 7 Hono endpoints as Agent Builder tools with conversation‑aware context threading for multi‑turn compliance workflows.

Built With

cloud-build
cloud-scheduler
dart
docker
firebase-hosting
flutter
gemini-3-flash
google-ai-studio
google-cloud-agent-builder
google-cloud-run
google-cloud-sdk
hono.js
model-context-protocol-(mcp)
mongodb-atlas
node.js
typescript
vertex-ai