Intent Drift Radar

Inspiration

Long-running AI agents and copilots often fail when a user’s goal changes silently over time. Chat UIs don’t model “intent over time” or produce decisions agents can trust. I wanted to build something that treats Gemini 3 as a temporal reasoning engine, not a chatbot, and that outputs a deterministic, evidence-backed drift decision that downstream systems can act on (pause, re-plan, escalate). The idea was: what if we could detect intent drift the way we detect anomalies in logs, with traceability and consensus? That led to Intent Drift Radar and its optional Ensemble Mode (low/medium/high thinking in parallel, majority vote, evidence agreement).

What it does

Intent Drift Radar takes a time-ordered signal stream (e.g. Day 1 … Day 5: notes, decisions, declarations) and answers: Did the user’s original intent drift? If so, why, when, and how confidently?

Single-run analysis: One Gemini 3 call returns baseline intent, current intent, drift yes/no, confidence, evidence tied to specific days, reasoning cards, and a compact drift signature for agent orchestration.
Ensemble Mode (optional): Three parallel Gemini calls (low / medium / high thinking) with majority voting and evidence bucketing (3/3, 2/3, 1/3 agreement). One consensus result, no extra “meta” model call.
Judge Mode / Quick Demo: Load a demo dataset and see a cached result instantly so judges can evaluate without waiting for live API calls. A callout offers “Run Ensemble (Live)” for discoverability.
Traceability: The UI links evidence and reasoning cards to timeline days; hover or click to see which days drove the decision. Users can also copy a summary and submit feedback (confirm or reject the drift).
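
As a rough illustration of the Ensemble Mode consensus described above, the sketch below combines per-run results with a majority vote and buckets evidence days by how many runs cited them. The median confidence follows the description later in this write-up. The RunResult fields, the tie rule, and the output keys are assumptions made for the example, not the project's actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class RunResult:
    """One ensemble run. Field names are illustrative, not the project's schema."""
    thinking_level: str            # "low", "medium", or "high"
    drift_detected: bool
    confidence: float              # assumed to lie in [0, 1]
    evidence_days: frozenset[int]  # days this run cited as evidence


def consensus(runs: list[RunResult]) -> dict:
    """Combine runs in-process: majority vote, median confidence, evidence buckets."""
    if not runs:
        raise ValueError("need at least one successful run")

    # Majority vote on the drift decision. A tie (possible when only two
    # runs succeed) resolves to "no drift" here; that rule is an assumption.
    yes_votes = sum(r.drift_detected for r in runs)
    drift = yes_votes * 2 > len(runs)

    # Count how many runs cited each day, then bucket days by agreement,
    # e.g. "3/3" for days every run cited.
    day_counts = Counter(day for r in runs for day in r.evidence_days)
    buckets: dict[str, list[int]] = {}
    for day, count in sorted(day_counts.items()):
        buckets.setdefault(f"{count}/{len(runs)}", []).append(day)

    return {
        "drift_detected": drift,
        "confidence": median(r.confidence for r in runs),
        "evidence_agreement": buckets,
    }
```

Because everything here is plain counting over already-returned results, no additional model call is needed to reach a consensus.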

Built for builders of autonomous agents and copilots who need reliable intent-change detection, not another chat interface.

How I built it

Backend: FastAPI (Python 3.11). Two endpoints: POST /api/analyze (one Gemini 3 call, configurable thinking level) and POST /api/analyze/ensemble (three parallel calls, deterministic consensus). The prompt lives in docs/ai-studio/prompt.md; each response is validated with Pydantic, then postprocessed (guardrails, drift signature normalization). Reliability measures: retry-with-repair on invalid JSON, model fallback on 404, and per-call timeouts (25s for a single run, 50s per run in ensemble).
Frontend: React 18 + TypeScript, Vite. Timeline panel (days + signals), analysis panel (drift banner, evidence, reasoning cards, mode label, “prove it” DevTools hint), evidence panel with day refs, feedback form. Ensemble: toggle in settings, optional callout above analysis when viewing cached demo, expandable “Ensemble breakdown” (per-mode table + evidence agreement chips).
Infra: Terraform for GCP — Cloud Run service (120s request timeout for ensemble), Artifact Registry, Secret Manager for GEMINI_API_KEY. Single Dockerfile serves the built frontend + uvicorn backend.
Docs: README (quick demo, judge checks, timeout notes), architecture doc, release notes. Judge check script for pre-submit validation.
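
The validate, repair, and postprocess flow in the backend can be sketched roughly as follows. To keep the example dependency-free, a small hand-written validator stands in for the Pydantic model the project actually uses. The field names (drift_detected, confidence), the [0, 1] confidence range, the repair wording, and the call_model callable are all assumptions for illustration.

```python
import json
from typing import Callable


def parse_and_validate(raw: str) -> dict:
    """Parse model output and check two hypothetical fields (stand-in for Pydantic)."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    drift = data.get("drift_detected")
    confidence = data.get("confidence")
    if not isinstance(drift, bool):
        raise ValueError("drift_detected must be a boolean")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    # Postprocess guardrail: clamp confidence into the assumed [0, 1] range.
    data["confidence"] = min(1.0, max(0.0, float(confidence)))
    return data


def analyze_with_repair(prompt: str, call_model: Callable[[str], str]) -> dict:
    """Call the model once; on invalid output, retry exactly once with a repair prompt."""
    raw = call_model(prompt)
    try:
        return parse_and_validate(raw)
    except ValueError as err:
        repair_prompt = (
            f"{prompt}\n\nYour previous reply was invalid ({err}). "
            f"Fix this JSON and return only the corrected object:\n{raw}"
        )
        # A second failure propagates to the caller; there is no further retry.
        return parse_and_validate(call_model(repair_prompt))
```

Passing the model call in as a plain callable keeps the retry logic testable without network access.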

All built and deployed as a solo developer: backend, frontend, ensemble logic, Terraform, and copy.

Challenges I ran into

Structured output reliability: Gemini sometimes returned valid-looking JSON that failed Pydantic validation (e.g. extra fields, wrong types). I added a strict schema in the prompt, retry-with-repair (one retry with a “fix this JSON” instruction), and postprocess guardrails (normalize the drift signature, clamp confidence) so the API contract stays deterministic.
Ensemble timeouts: Running 3 Gemini calls in parallel hit Cloud Run’s default request timeout and sometimes per-call timeouts. Fixed by: increasing Cloud Run timeout to 120s in Terraform, raising per-call timeout to 50s for ensemble only, and documenting 504 behavior and curl checks in the README. Partial success (2/3 runs) still returns 200 with consensus.
Judge Mode without extra calls: Judges needed a fast path without triggering live Gemini. Implemented Quick Demo: load demo dataset, serve cached result from /api/demo, same UI and schema as live. Added X-IDR-Mode header and “prove it” instructions so evaluators can verify demo vs live in DevTools.
Discoverability of Ensemble: Wanted judges to see “you can also run Ensemble” without changing defaults or auto-calling the API. Added a small callout above the analysis panel when a cached result is shown, with a single “Run Ensemble (Live)” button that calls the ensemble endpoint and replaces the result.
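
The per-run timeout and partial-success behavior from the ensemble challenge above can be sketched with asyncio. This is a minimal illustration, not the project's code: call_model is a hypothetical async callable, and treating fewer than two successful runs as a failure is an assumption, since the write-up only states that 2/3 runs still produce a consensus.

```python
import asyncio
from typing import Awaitable, Callable

PER_RUN_TIMEOUT_S = 50.0  # per-run ensemble timeout mentioned in the write-up
LEVELS = ("low", "medium", "high")


async def run_ensemble(
    call_model: Callable[[str], Awaitable[dict]],
    timeout_s: float = PER_RUN_TIMEOUT_S,
) -> dict[str, dict]:
    """Run one call per thinking level in parallel and keep the runs that finish."""

    async def one(level: str) -> dict:
        # Each run gets its own deadline, so a slow "high" run cannot
        # block the results of the faster runs.
        return await asyncio.wait_for(call_model(level), timeout=timeout_s)

    # return_exceptions=True turns timeouts and errors into values instead
    # of cancelling the sibling runs.
    outcomes = await asyncio.gather(
        *(one(level) for level in LEVELS), return_exceptions=True
    )
    successes = {
        level: outcome
        for level, outcome in zip(LEVELS, outcomes)
        if not isinstance(outcome, BaseException)
    }
    # Assumed threshold: at least two runs are needed for a majority vote.
    if len(successes) < 2:
        raise RuntimeError("too few ensemble runs succeeded")
    return successes
```

The platform request timeout (120s on Cloud Run in this project) still has to exceed the per-run timeout, otherwise the whole request can fail before any consensus is computed.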

Accomplishments that I'm proud of

Production-ready contract: Drift signature (IDR:v1|dir=…|span=…|e=…|conf=…), evidence day refs, and reasoning cards give agents and humans a clear, parseable decision layer — not free-form chat.
Ensemble consensus without a fourth call: Consensus is computed in-process (majority vote, median confidence, evidence bucketed by agreement). No extra “arbitrator” model; the UI shows both consensus and per-run breakdown.
Full traceability: Evidence and reasoning cards link to timeline days; pinned/hover state shows which days drove the decision. Copy summary and feedback (confirm/reject) close the loop for evaluation.
Deployed and evaluable: Live app on Cloud Run, Quick Demo for one-click judge flow, health/version endpoints, Terraform for reproducible infra, and a judge check script so the project can be validated end-to-end.
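
To show how a downstream agent might consume the drift signature, here is a small parsing sketch based only on the shape shown above (an IDR:version head followed by pipe-separated key=value fields). The value formats are not specified in this write-up, so the sketch keeps every value as an uninterpreted string; the function name is hypothetical.

```python
def parse_drift_signature(signature: str) -> dict[str, str]:
    """Split an 'IDR:<version>|key=value|...' signature into a dict of strings."""
    head, *fields = signature.split("|")
    prefix, sep, version = head.partition(":")
    if prefix != "IDR" or not sep or not version:
        raise ValueError(f"not a drift signature: {signature!r}")

    parsed = {"version": version}
    for field in fields:
        key, sep, value = field.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed field: {field!r}")
        # Values stay as raw strings because their formats are unspecified here.
        parsed[key] = value
    return parsed
```

An orchestrator could branch on the parsed fields (for example, pausing or re-planning) once the actual value formats are known.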

What I learned

Treating the model as a reasoning engine changes the design. Once I stopped thinking “chat” and started thinking “temporal decision over a signal stream,” prompt structure, schema, and postprocessing became the main levers for reliability.
Timeouts and parallelism need to be tuned together. Ensemble’s 3 parallel calls required raising both per-call timeout (so “high” thinking could finish) and the Cloud Run request timeout so the whole request didn’t 504 before consensus.
Judge experience matters. Quick Demo + cached result + one “Run Ensemble (Live)” callout let evaluators see the product in seconds and still try the advanced path without changing default behavior.

What's next for Intent Drift Radar

Feedback loop in the pipeline: Use confirm/reject + comment from the UI to refine prompts or fine-tune (e.g. store feedback and periodically retrain or adjust few-shot examples).
More signal types and windows: Support different baseline/current window sizes and signal types (e.g. actions, errors) for richer temporal reasoning.
Agent SDK: Small client library (e.g. “call Intent Drift Radar with this signal buffer, get drift decision + signature”) so other apps and agents can embed drift detection without building their own UI.
Observability: Structured logging and optional metrics (e.g. drift rate, confidence distribution) for production operators.

Built With

fastapi
gar
google-cloud-run
google-gemini-3-pro-api
react-18
pydantic
python-3.11
secret-manager
typescript
vite
terraform

Updates

Private user started this project — Jan 29, 2026 04:53 PM EST
