Inspiration: Fraud reviews are either too manual (slow, inconsistent) or too opaque (“the model said so”). We built a drop-in, explainable layer you can bolt onto an existing microservices app—no schema changes or risky rewrites—that shows the clear “why” behind each decision.
What it does:
• Watches new Bank of Anthos transactions via a read-only gateway.
• Scores risk with Gemini using compact signals (amount spikes, geo velocity, time-of-day).
• Explains the rationale in plain language and stores an audit record.
• Calls existing BoA endpoints for actions (noop/notify/step-up/temp hold).
• Dashboard: live table with time | amount | score | action | why.
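To make the "compact signals" idea concrete, here is a minimal sketch of how the three signals could be derived from a transaction and its recent history. The `Txn` shape, the field names, and the function names are assumptions for illustration, not the project's actual schema or code.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt
from statistics import mean


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction shape; the real BoA record will differ."""
    amount: float
    timestamp: datetime
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def compact_signals(current: Txn, history: list) -> dict:
    """Reduce a transaction plus recent history to three small numeric signals."""
    # Amount spike: ratio of this amount to the mean of recent amounts.
    baseline = mean(t.amount for t in history) if history else current.amount
    amount_ratio = current.amount / baseline if baseline > 0 else 1.0

    # Geo velocity: implied travel speed since the most recent prior transaction.
    speed_kmh = 0.0
    if history:
        prev = max(history, key=lambda t: t.timestamp)
        hours = (current.timestamp - prev.timestamp).total_seconds() / 3600
        if hours > 0:
            speed_kmh = haversine_km(prev.lat, prev.lon, current.lat, current.lon) / hours

    return {
        "amount_ratio": round(amount_ratio, 2),
        "geo_speed_kmh": round(speed_kmh, 1),
        "hour_of_day": current.timestamp.hour,
    }
```

Keeping the output to a handful of rounded numbers is what lets the prompt stay short; the raw history never needs to reach the model.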
How we built it:
Bank of Anthos remains untouched (namespace boa); FraudGuard runs in namespace fraudguard. Services (5 pods):
• mcp-gateway: FastAPI, read-only BoA façade, MCP tools
• risk-scorer: Gemini scoring
• explain-agent: rationale + audit
• action-orchestrator: score→action
• dashboard: read-only
All five deploy from one reusable Helm workload chart (Deployment/Service/HPA/RBAC/NetworkPolicy/PSA) with per-service values.
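The score→action step in action-orchestrator can be sketched as a simple banded lookup. The 0.0 to 1.0 score range, the cut-off values, and the `choose_action` name are assumptions chosen for illustration; the write-up does not state the real policy.

```python
from bisect import bisect_right

# Assumed cut-offs on a 0.0-1.0 risk score; not the project's real policy.
THRESHOLDS = (0.30, 0.60, 0.85)
ACTIONS = ("noop", "notify", "step-up", "temp-hold")


def choose_action(score: float) -> str:
    """Map a risk score to the action whose band contains it."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    # bisect_right counts how many thresholds are <= score,
    # which is exactly the index of the matching action band.
    return ACTIONS[bisect_right(THRESHOLDS, score)]
```

Under these assumed bands, `choose_action(0.1)` yields `"noop"` and `choose_action(0.9)` yields `"temp-hold"`. Keeping the thresholds in one tuple makes the policy easy to audit and to move into per-service values later.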
Infra as code. Terraform modules in infra-gcp-gke for GKE Autopilot, Artifact Registry, Workload Identity Federation (WIF), Secret Manager, and Budgets.
CI/CD. GitHub Actions builds images → Trivy scan → SBOM → push to Artifact Registry → helm upgrade per service (matrix) using WIF (no long-lived keys).
Secrets. The Gemini API key (to be replaced by Vertex AI later) is stored in Secret Manager and mounted through the Secret Manager CSI driver on GKE; each Kubernetes service account (KSA) holds only the least-privilege secretAccessor role.
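Because the CSI driver surfaces the secret as a file inside the pod, a service only needs to read that file at startup. A minimal sketch follows; the mount path and the `GEMINI_API_KEY_FILE` variable name are hypothetical, and the real values would come from the chart.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical mount path; the real one is defined in the chart values.
DEFAULT_SECRET_PATH = "/var/secrets/gemini-api-key"


def load_api_key(path: Optional[str] = None) -> str:
    """Read the API key from the file mounted into the pod."""
    # GEMINI_API_KEY_FILE is an assumed variable name used to override the path.
    secret_path = Path(path or os.environ.get("GEMINI_API_KEY_FILE", DEFAULT_SECRET_PATH))
    key = secret_path.read_text(encoding="utf-8").strip()
    if not key:
        raise RuntimeError(f"secret file {secret_path} is empty")
    return key
```

Reading from a file rather than an environment variable keeps the key out of process listings and pod specs.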
Security & ops. NetworkPolicies (deny-by-default), Pod Security (non-root, read-only FS), /healthz & /readyz, HPA, structured JSON logs with txn_id/trace_id.
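As an illustration of the structured JSON logs mentioned above, the sketch below uses only the Python standard library to emit one JSON object per line carrying txn_id and trace_id. The class and function names, and the other field names such as `severity`, are assumptions rather than the project's actual log schema.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "severity": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Correlation IDs arrive via the `extra=` argument of the logging call.
            "txn_id": getattr(record, "txn_id", None),
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)


def get_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr, configured once."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger
```

A call such as `get_logger("risk-scorer").info("scored", extra={"txn_id": "t-1", "trace_id": "abc"})` then produces a single line that can be filtered by transaction across all five services.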
Challenges we ran into
• Rule: “Do not modify BoA” → adapter layer, with actions only via existing endpoints.
• Secrets at scale: Secret Manager CSI + Workload Identity (no keys).
• DRY deploys: one chart vs per-service differences (ports/env/RBAC/NP).
• Latency vs explainability: compact prompts, human-readable rationale.
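The latency vs explainability trade-off can be sketched as a short prompt that asks for a score plus a one-sentence reason, paired with strict validation of the reply. The prompt wording, the JSON reply shape, and the function names are assumptions for illustration; no Gemini call is made here.

```python
import json


def build_prompt(signals: dict) -> str:
    """Build a short prompt that asks for a score and a one-sentence reason."""
    return (
        "You are a fraud-risk reviewer. Given these signals, reply with JSON only: "
        '{"score": <number 0-1>, "why": "<one sentence>"}\n'
        f"signals: {json.dumps(signals, separators=(',', ':'), sort_keys=True)}"
    )


def parse_reply(text: str) -> dict:
    """Validate the model reply; return a score of None when it is unusable."""
    try:
        data = json.loads(text)
        score = float(data["score"])
        why = str(data["why"]).strip()
    except (ValueError, KeyError, TypeError):
        return {"score": None, "why": "unparseable model reply"}
    if not 0.0 <= score <= 1.0 or not why:
        return {"score": None, "why": "invalid model reply"}
    return {"score": score, "why": why}
```

Returning a record with `score` set to `None` instead of raising lets the caller route unusable replies to a safe default rather than acting on a malformed score.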
Accomplishments that we're proud of
• End-to-end on GKE Autopilot with zero BoA changes.
• Explainable decisions (score + why) stored for audit, easy to demo.
• Repro in ~10 minutes: single chart + per-service values + Make/Helm.
• Keyless CI deploys using WIF, plus Trivy + SBOM in the pipeline.
What we learned
• Safe agentic patterns on Kubernetes (MCP façade + action-orchestrator).
• Trade-offs: Gemini API key vs Vertex AI + Workload Identity.
• A single workload chart is a sweet spot for microservice scaffolding.
• Explainability improves trust for judges and users.
What's next for FraudGuard for Bank of Anthos
- Make Vertex path default; richer features; BigQuery/Feature Store; streaming (Pub/Sub)
- Alerting/notifications; policy-as-code for actions; audit viewer; expanded A2A/ADK/MCP tooling
Built With
- Bash
- CI/CD & security: GitHub Actions (OIDC Workload Identity Federation)
- Cloud Logging/Monitoring
- Google Cloud: Artifact Registry
- Helm (single workload chart + per-service values)
- HPA
- Kubernetes: GKE Autopilot
- Languages/frameworks: Python (FastAPI)
- NetworkPolicy
- optional)
- pod
- Pub/Sub
- Secret Manager (+ CSI)
- security
- Trivy
- UI: Next.js or Flask)
- vertex