Highlights
- AI backend migrated from Flask + deprecated vertexai.generative_models → FastAPI + google-genai SDK (Gemini 2.5 Pro).
- Reliability/Throughput: per-pod RPM throttle, semaphore concurrency, truncated exponential backoff with jitter + Retry-After.
- K8s hardening: clean base/overlays, proper probes, WI bootstrap for Vertex, and FinOps autoscaling.
- UI fixed: rebuilt/published budget-coach-ui v0.1.1, corrected in-cluster service ports, and aligned with new /api/* backend.
- Smokes green end-to-end: core, data, e2e, fraud, spending, coach all pass.
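For readers unfamiliar with the retry policy named in the reliability bullet, here is a minimal sketch of truncated exponential backoff with full jitter that also honors a Retry-After hint. The function name backoff_delay and the base and cap defaults are illustrative assumptions, not values taken from the insight-agent code.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, retry_after=None, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    Hypothetical helper: the name and the default base/cap are assumptions.
    """
    # Truncated exponential: base * 2^attempt, never above cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly in [0, ceiling] so pods do not retry in lockstep.
    delay = rng.uniform(0.0, ceiling)
    # If the server sent Retry-After, never retry sooner than it asked.
    if retry_after is not None:
        delay = max(delay, retry_after)
    return delay
```

A caller would sleep for backoff_delay(attempt, retry_after=...) between attempts, passing the parsed Retry-After header value (in seconds) when the upstream returns one.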
What changed
Backend (insight-agent)
- New FastAPI app (src/ai/insight-agent/main_vertex.py) with endpoints:
  - POST /api/budget/coach
  - POST /api/spending/analyze
  - POST /api/fraud/detect
  - GET /api/healthz
- Switched to google-genai client (Vertex mode), model: gemini-2.5-pro.
- JSON-schema responses + ThinkingConfig budgets; deterministic (temperature=0.0).
- DSQ-friendly controls via env: GENAI_CONCURRENCY, GENAI_RPM, GENAI_MAX_TOKENS, GENAI_THINK_TOKENS.
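One way the GENAI_CONCURRENCY and GENAI_RPM knobs could gate outbound model calls is sketched below, using only the standard library. The PodLimiter class, the limiter_from_env helper, the sliding 60-second window, and the fallback defaults (4 and 60) are assumptions made for illustration; the actual implementation in main_vertex.py may differ.

```python
import asyncio
import os
import time
from collections import deque


class PodLimiter:
    """Hypothetical per-pod gate: a concurrency semaphore plus a requests-per-minute window."""

    def __init__(self, concurrency, rpm, clock=time.monotonic):
        self._sem = asyncio.Semaphore(concurrency)
        self._rpm = rpm
        self._clock = clock
        self._starts = deque()  # start times of calls within the last 60 s
        self._lock = asyncio.Lock()

    async def _wait_for_rpm_slot(self):
        while True:
            async with self._lock:
                now = self._clock()
                # Forget calls that started more than a minute ago.
                while self._starts and now - self._starts[0] >= 60.0:
                    self._starts.popleft()
                if len(self._starts) < self._rpm:
                    self._starts.append(now)
                    return
                wait = 60.0 - (now - self._starts[0])
            # Sleep outside the lock until the oldest call leaves the window.
            await asyncio.sleep(wait)

    async def run(self, coro_fn, *args):
        await self._wait_for_rpm_slot()
        async with self._sem:  # cap in-flight calls
            return await coro_fn(*args)


def limiter_from_env():
    # The fallback values here are assumed, not the project's real defaults.
    return PodLimiter(
        concurrency=int(os.environ.get("GENAI_CONCURRENCY", "4")),
        rpm=int(os.environ.get("GENAI_RPM", "60")),
    )
```

In a FastAPI app the limiter would typically be created once at startup, inside the running event loop, and each endpoint would wrap its model call in limiter.run(...).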
Kubernetes
- Service ports standardized: cluster port 80 → container 8080 (both mcp-server and insight-agent).
- Dev overlay sets Vertex/env knobs; Vertex Dockerfile runs uvicorn.
- Added HPA/VPA (Autopilot) manifests under kubernetes-manifests/finops/.
FinOps
- Cloud Logging cost cut via exclusion filter on _Default sink.
- Enabled Vertical Pod Autoscaling on cluster.
- Added HPA/VPA for all app agents (userservice, transactionhistory, frontend, mcp-server, agent-gateway, insight-agent, etc.).
- Insight-agent HPA set to minReplicas: 1 (keeps latency predictable for demos).
UI (Streamlit)
- Image rebuilt & pushed: .../budget-coach-ui:v0.1.1.
- Fixed stale deployment (judges overlay) & set envs:
  - INSIGHT=http://insight-agent/api
  - USERSVC=http://userservice:8080
  - Port fix: MCPSVC=http://mcp-server (service on 80; no :8080).
- UI now transforms BoA txns to {date,label,amount} before POSTing to the new FastAPI APIs.
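The transformation to {date,label,amount} might look roughly like the sketch below. The input field names (timestamp, amount in integer cents, fromAccountNum, toAccountNum), the sign convention, and the label text are all assumptions for illustration; only the three output keys come from the notes above.

```python
from datetime import datetime


def to_api_txn(raw, account_id):
    """Map one raw ledger transaction to the {date, label, amount} shape.

    Hypothetical helper: the input schema is assumed, not confirmed by the repo.
    """
    # Assumed: ISO-8601 timestamp, possibly with a trailing "Z" for UTC.
    ts = datetime.fromisoformat(raw["timestamp"].replace("Z", "+00:00"))
    # Assumed convention: money leaving the user's account is negative.
    outgoing = raw["fromAccountNum"] == account_id
    cents = -raw["amount"] if outgoing else raw["amount"]
    counterparty = raw["toAccountNum"] if outgoing else raw["fromAccountNum"]
    return {
        "date": ts.date().isoformat(),
        "label": ("To " if outgoing else "From ") + counterparty,
        "amount": cents / 100.0,  # assumed: source amounts are integer cents
    }
```

The UI would apply this to each transaction and send the resulting list in the POST body; the exact request envelope expected by the endpoints is not shown here.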
Validation
- make smoke-fast and make smoke-e2e passed:
  - Fraud: high-quality structured findings with SAR recommendations.
  - Spending: top categories + unusual count.
  - Coach: budget summary + buckets + tips.
- Manual UI checks confirm end-to-end flow after env + port fix.
Notables / Footguns avoided
- The earlier 422 Unprocessable Entity was caused by a stale UI posting the old timestamp shape; fixed by deploying the v0.1.1 UI.
- ConnectTimeout came from calling http://mcp-server:8080 (the service listens on 80). Env corrected.
Paths you’ll see in the repo
- Backend: src/ai/insight-agent/* (FastAPI, prompts, Dockerfiles, k8s overlays)
- UI: ui/* (Streamlit app, Dockerfile, overlays)
- FinOps: kubernetes-manifests/finops/* (HPA/VPA)
- Make targets: deploy, smoke, WI bootstrap, image pinning.
Quick commands (for reference)
# Update UI env to correct ports/paths
kubectl -n default set env deploy/budget-coach-ui \
INSIGHT=http://insight-agent/api \
USERSVC=http://userservice:8080 \
MCPSVC=http://mcp-server
# Re-deploy judges UI overlay
make ui-judges-apply