Private user posted an update

Highlights

  • AI backend migrated from Flask + deprecated vertexai.generative_models → FastAPI + google-genai SDK (Gemini 2.5 Pro).
  • Reliability/Throughput: per-pod RPM throttle, semaphore concurrency, truncated exponential backoff with jitter + Retry-After.
  • K8s hardening: clean base/overlays, proper probes, Workload Identity (WI) bootstrap for Vertex, and FinOps autoscaling.
  • UI fixed: rebuilt/published budget-coach-ui v0.1.1, corrected in-cluster service ports, and aligned with new /api/* backend.
  • Smoke tests green end-to-end: core, data, e2e, fraud, spending, and coach all pass.
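
For readers unfamiliar with the retry policy named above, here is a minimal sketch of truncated exponential backoff with full jitter that also honors Retry-After. It is not the insight-agent code: backoff_delay, call_with_retries, RetryableError, and the base/cap defaults are hypothetical names and values chosen for illustration, and Retry-After is assumed to arrive in its delta-seconds form.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical error type carrying an optional Retry-After value (seconds)."""

    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after


def backoff_delay(attempt, base=1.0, cap=32.0, retry_after=None, rng=random.random):
    """Seconds to wait before retrying after failed attempt `attempt` (0-based)."""
    # Truncated exponential: base * 2^attempt, never above `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: uniform in [0, ceiling) so pods do not retry in lockstep.
    delay = rng() * ceiling
    # If the server asked for a longer pause, respect it.
    if retry_after is not None:
        delay = max(delay, float(retry_after))
    return delay


def call_with_retries(fn, max_attempts=5, sleep=time.sleep):
    """Call fn(); on RetryableError, back off and try again up to max_attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt, retry_after=exc.retry_after))
```

The cap keeps worst-case waits bounded, and the jitter spreads retries from multiple pods over time instead of letting them hit the quota together.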

What changed

Backend (insight-agent)

  • New FastAPI app (src/ai/insight-agent/main_vertex.py) with endpoints:

    • POST /api/budget/coach
    • POST /api/spending/analyze
    • POST /api/fraud/detect
    • GET /api/healthz
  • Switched to google-genai client (Vertex mode), model: gemini-2.5-pro.

  • JSON-schema responses + ThinkingConfig budgets; temperature=0.0 for repeatable output.

  • DSQ-friendly (Dynamic Shared Quota) controls via env:

    • GENAI_CONCURRENCY, GENAI_RPM, GENAI_MAX_TOKENS, GENAI_THINK_TOKENS.
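
As a rough illustration of how two of these env knobs could drive the per-pod limits, the sketch below reads GENAI_CONCURRENCY and GENAI_RPM and combines an asyncio semaphore with a simple evenly spaced RPM throttle. The helper names (env_int, RpmThrottle, guarded) and the fallback defaults are assumptions for this example, not the actual insight-agent implementation.

```python
import asyncio
import os
import time


def env_int(name, default):
    """Read an integer env var, falling back to `default` when unset or blank."""
    raw = os.environ.get(name, "")
    return int(raw) if raw.strip() else default


class RpmThrottle:
    """Spaces call starts 60/rpm seconds apart within this process (pod)."""

    def __init__(self, rpm, clock=time.monotonic, sleep=asyncio.sleep):
        self.interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_slot = 0.0

    async def wait(self):
        now = self._clock()
        slot = max(now, self._next_slot)
        # No await above this line, so reserving the slot is atomic in asyncio.
        self._next_slot = slot + self.interval
        if slot > now:
            await self._sleep(slot - now)


async def guarded(call, semaphore, throttle):
    """Run `call()` under both the concurrency cap and the RPM throttle."""
    async with semaphore:
        await throttle.wait()
        return await call()


async def build_limits():
    # Fallback values (4 and 60) are placeholders, not the project's defaults.
    semaphore = asyncio.Semaphore(env_int("GENAI_CONCURRENCY", 4))
    throttle = RpmThrottle(env_int("GENAI_RPM", 60))
    return semaphore, throttle
```

Both objects are created inside a coroutine so they belong to the running event loop; a request handler would then wrap each model call in guarded(...).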

Kubernetes

  • Service ports standardized: cluster port 80 → container 8080 (both mcp-server and insight-agent).
  • Dev overlay sets Vertex/env knobs; Vertex Dockerfile runs uvicorn.
  • Added HPA/VPA (Autopilot) manifests under kubernetes-manifests/finops/.

FinOps

  • Cloud Logging cost cut via exclusion filter on _Default sink.
  • Enabled Vertical Pod Autoscaling on cluster.
  • Added HPA/VPA for all app agents (userservice, transactionhistory, frontend, mcp-server, agent-gateway, insight-agent, etc.).
  • Insight-agent HPA set to minReplicas: 1 (keeps latency predictable for demos).

UI (Streamlit)

  • Image rebuilt & pushed: .../budget-coach-ui:v0.1.1.
  • Fixed stale deployment (judges overlay) & set envs:

    • INSIGHT=http://insight-agent/api
    • USERSVC=http://userservice:8080
    • Port fix: MCPSVC=http://mcp-server (service on 80; no :8080).
  • UI now transforms BoA txns to {date,label,amount} before POSTing to the new FastAPI APIs.
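
A sketch of what such a transform might look like follows. The input field names (fromAccountNum, toAccountNum, amount in integer cents, ISO-8601 timestamp) are assumptions about the Bank of Anthos transaction shape, and the label text and sign convention (outgoing is negative) are illustrative choices rather than the UI's actual behavior; to_coach_txn is a hypothetical name.

```python
from datetime import date


def to_coach_txn(txn, account_id):
    """Map one BoA-style transaction to the {date, label, amount} shape."""
    # Assumed: timestamp is an ISO-8601 string, so its first 10 characters are
    # the calendar date; date.fromisoformat raises ValueError if they are not.
    day = date.fromisoformat(txn["timestamp"][:10])
    outgoing = txn["fromAccountNum"] == account_id
    # Assumed: amount is stored as integer cents.
    amount = txn["amount"] / 100.0
    if outgoing:
        label = f"to {txn['toAccountNum']}"
    else:
        label = f"from {txn['fromAccountNum']}"
    return {
        "date": day.isoformat(),
        "label": label,
        "amount": -amount if outgoing else amount,
    }


def to_coach_payload(txns, account_id):
    """Transform a list of transactions before POSTing them to the API."""
    return [to_coach_txn(t, account_id) for t in txns]
```

Validating the date at transform time means a malformed timestamp fails in the UI rather than surfacing later as a 422 from the backend.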

Validation

  • make smoke-fast and make smoke-e2e passed:

    • Fraud: structured findings with SAR (Suspicious Activity Report) recommendations.
    • Spending: top categories + unusual count.
    • Coach: budget summary + buckets + tips.
  • Manual UI checks confirm end-to-end flow after env + port fix.

Notables / Footguns avoided

  • The earlier 422 Unprocessable Entity came from the stale UI posting the old timestamp shape; fixed by deploying the v0.1.1 UI.
  • ConnectTimeout came from calling http://mcp-server:8080 (service listens on 80). Env corrected.

Paths you’ll see in the repo

  • Backend: src/ai/insight-agent/* (FastAPI, prompts, Dockerfiles, k8s overlays)
  • UI: ui/* (Streamlit app, Dockerfile, overlays)
  • FinOps: kubernetes-manifests/finops/* (HPA/VPA)
  • Make targets: deploy, smoke, WI bootstrap, image pinning.

Quick commands (for reference)

# Update UI env to correct ports/paths
kubectl -n default set env deploy/budget-coach-ui \
  INSIGHT=http://insight-agent/api \
  USERSVC=http://userservice:8080 \
  MCPSVC=http://mcp-server

# Re-deploy judges UI overlay
make ui-judges-apply
