Inspiration
Financial institutions process millions of complaints every year. Each one is manually read, categorized, risk-assessed, and routed by a human analyst — a process that takes 10-20 days to resolve a case, burns compliance budgets, and still misses critical escalations. Regulators in the UK, EU, and US are tightening resolution deadlines. The cost of failure is existential.
We asked: what if every complaint were handled with the precision of your best analyst, at the speed of a machine, with a full audit trail — automatically?
That question became Triage AI.
What it does
Triage AI is an autonomous, multi-agent complaint resolution platform built for financial services.
From the moment a complaint is submitted — via web form or voice — a coordinated pipeline of specialized AI agents takes over:
- Intake Agent — extracts the core issue, product, and customer intent from raw text or transcribed voice input, with PII redaction built in
- Classification Agent — categorizes the complaint using deterministic rules layered with Gemini reasoning, with contextual fallback for edge cases
- Risk Agent — scores severity (low → critical), flags regulatory exposure, and surfaces escalations immediately
- Compliance Agent — checks the case against financial regulation frameworks and internal policy rules
- Routing Agent — assigns the complaint to the correct team based on classification, risk, and domain context
- Resolution Agent — proposes an outcome with full reasoning, ready for one-click human approval
- QA Review Agent — audits the full pipeline output for consistency before closing the case
What used to take days is done in under a minute. Every decision is logged, every agent call is traced, every token is accounted for.
Operators get a real-time intelligence canvas: live queue with risk rankings, AI confidence scores, cost telemetry, and a step-by-step trace of every agent decision. Teams only see cases assigned to them. Admins see everything.
How we built it
Agentic Core: Built on Google ADK (Agent Development Kit), each specialist agent runs as an independent LLM-backed tool with its own prompt and output schema. A custom supervisor orchestrates state transitions across the full pipeline, deciding which agent to invoke next based on current workflow state.
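A minimal sketch of what that supervisor's transition logic looks like: a state enum, a transition table for the happy path, and a loop that merges each agent's output into shared case state. The stage names and the purely linear path are illustrative; the real supervisor branches on agent output.

```python
from enum import Enum, auto
from typing import Callable, Dict

class Stage(Enum):
    INTAKE = auto()
    CLASSIFY = auto()
    RISK = auto()
    COMPLIANCE = auto()
    ROUTE = auto()
    RESOLVE = auto()
    QA = auto()
    DONE = auto()

# Happy-path transitions; a production supervisor also branches on
# agent output (e.g. escalating straight from RISK on critical cases).
NEXT = {
    Stage.INTAKE: Stage.CLASSIFY,
    Stage.CLASSIFY: Stage.RISK,
    Stage.RISK: Stage.COMPLIANCE,
    Stage.COMPLIANCE: Stage.ROUTE,
    Stage.ROUTE: Stage.RESOLVE,
    Stage.RESOLVE: Stage.QA,
    Stage.QA: Stage.DONE,
}

def run_pipeline(case: dict, agents: Dict[Stage, Callable[[dict], dict]]) -> dict:
    """Invoke each specialist agent in turn, merging its output into shared state."""
    stage = Stage.INTAKE
    while stage is not Stage.DONE:
        case = {**case, **agents[stage](case)}
        stage = NEXT[stage]
    return case
```

Keeping the transition table explicit (rather than hard-coding calls) is what lets the supervisor decide the next agent from workflow state alone.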
LLM Layer: All agents run on Google Gemini (gemini-3.0-flash-preview as default, with gemini-2.0-flash-lite and gemini-1.5-flash as fallbacks) via a centralized LLMFactory that handles model selection, retries, and structured JSON output validation.
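The factory's fallback-and-retry behavior can be sketched like this (the `call_fn` hook and retry count are assumptions; the real implementation sits in front of the Gemini SDK):

```python
import json

class ModelError(Exception):
    """Raised by the transport layer when a model call fails."""

class LLMFactory:
    """Try each model in preference order; retry, then fall back on failure."""

    def __init__(self, models, call_fn, max_retries=2):
        self.models = models        # ordered preference list, best first
        self.call_fn = call_fn      # call_fn(model, prompt) -> raw JSON string
        self.max_retries = max_retries

    def complete_json(self, prompt):
        last_err = None
        for model in self.models:
            for _ in range(self.max_retries):
                try:
                    raw = self.call_fn(model, prompt)
                    return json.loads(raw)  # structured-output validation gate
                except (ModelError, json.JSONDecodeError) as err:
                    last_err = err          # retry, then fall through to next model
        raise RuntimeError(f"all models exhausted: {last_err}")
```

A malformed JSON response is treated the same as a transport failure, so an agent never receives unvalidated output.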
Knowledge & Retrieval: pgvector on PostgreSQL for semantic retrieval of past complaints, resolutions, and policy documents. HuggingFace sentence-transformers (BAAI/bge-small-en-v1.5) for local embeddings — no external embedding API needed.
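In production the embeddings come from BAAI/bge-small-en-v1.5 and the nearest-neighbor search runs inside PostgreSQL via pgvector's distance operators; the thresholded top-k logic those queries implement looks roughly like this (toy vectors stand in for real embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, corpus, k=3, threshold=0.5):
    """corpus: list of (doc_id, vector). Keep only matches above the
    similarity threshold, so weak precedents never reach the LLM prompt."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus]
    scored = [pair for pair in scored if pair[1] >= threshold]
    return sorted(scored, key=lambda pair: -pair[1])[:k]
```

The threshold is the knob mentioned under Challenges: too low and noisy context pollutes the prompt, too high and agents lose useful precedents.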
Backend: Python + FastAPI + SQLAlchemy + PostgreSQL. Every agent invocation writes a WorkflowStep record — node name, latency, token count, cost, input/output snapshots — giving full observability. Integrated with LangSmith for ADK agent tracing.
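The shape of that per-step record, shown here as a plain dataclass for brevity (the real system persists it as a SQLAlchemy model; the field names and per-token price are illustrative):

```python
from dataclasses import dataclass, field
import time

@dataclass
class WorkflowStep:
    """One row per agent invocation: the unit of observability."""
    node: str
    input_snapshot: dict
    output_snapshot: dict = field(default_factory=dict)
    latency_ms: float = 0.0
    tokens: int = 0
    cost_usd: float = 0.0
    status: str = "pending"

def traced(node, agent_fn, case, price_per_token=1e-7):
    """Run one agent and capture latency, tokens, and cost alongside its output."""
    start = time.perf_counter()
    out = agent_fn(case)
    step = WorkflowStep(
        node=node,
        input_snapshot=case,
        output_snapshot=out,
        latency_ms=(time.perf_counter() - start) * 1000,
        tokens=out.get("_tokens", 0),
        cost_usd=out.get("_tokens", 0) * price_per_token,
        status="ok",
    )
    return step, out
```

Wrapping every agent call in one tracer like this is what makes full trace replay in the UI possible.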
Voice Intake: ElevenLabs integration for voice-based complaint submission — transcribed and passed directly into the agent pipeline via a custom OpenAI-compatible endpoint.
Document Processing: Optional Google Document AI and Vertex AI Search integration for document extraction and managed retrieval.
Integrations: Jira REST API for automatic ticket creation on routed cases.
Evaluation Framework: A custom rubric-based judge benchmarks the classification pipeline against labeled datasets, tracking accuracy, precision, and recall — catching model drift before it hits production.
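The per-label metrics the judge tracks reduce to standard precision and recall over (predicted, gold) pairs, sketched here:

```python
def precision_recall(pairs, label):
    """pairs: list of (predicted, gold) labels from a benchmark run.
    Returns (precision, recall) for one class."""
    tp = sum(1 for pred, gold in pairs if pred == label and gold == label)
    fp = sum(1 for pred, gold in pairs if pred == label and gold != label)
    fn = sum(1 for pred, gold in pairs if pred != label and gold == label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Tracking these per class after each model or prompt change is what surfaces drift before it hits production.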
Frontend: Jinja2 + Tailwind CSS with a real-time queue, live trace streaming via SSE, and role-scoped views for admin and team users.
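Live trace streaming rides on the standard text/event-stream wire format; each completed agent step is framed as one event like this (the event name and payload fields are illustrative):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Frame one Server-Sent Event: an event name line, a JSON data line,
    and the blank line that terminates the event."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
```

On the FastAPI side these frames are yielded from a streaming response; the browser's `EventSource` dispatches them to the trace panel as they arrive.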
Challenges we ran into
- Agent coordination without state bleed — ensuring each agent's output was schema-valid and didn't corrupt downstream agent context required strict structured output enforcement and retry logic at every transition
- Google ADK async bridging — ADK's runner is async-first but parts of our FastAPI stack are synchronous; building a clean sync/async wrapper that didn't deadlock under load was non-trivial
- Cost vs. quality tradeoff — running 7 agents per complaint adds up fast. We implemented a deterministic pre-classification layer to short-circuit Gemini calls where rule-based logic suffices, significantly cutting cost per case
- Vector retrieval quality — tuning similarity thresholds for pgvector retrieval so agents pull genuinely relevant precedents without noisy context polluting the LLM prompt
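The cost-saving short-circuit described above boils down to running a deterministic rule table before any model call; the rules and category names below are hypothetical stand-ins:

```python
import re

# Hypothetical rule table: pattern -> category. The deterministic layer
# runs first, so obvious cases never cost a Gemini call.
RULES = [
    (re.compile(r"\bchargeback\b|\bunauthori[sz]ed charge\b", re.I), "card_dispute"),
    (re.compile(r"\bmortgage\b", re.I), "mortgage"),
]

def classify(text, llm_fallback):
    """Return (category, path). A rule hit short-circuits the LLM;
    everything else defers to Gemini-backed reasoning."""
    for pattern, category in RULES:
        if pattern.search(text):
            return category, "rules"
    return llm_fallback(text), "llm"
```

The returned path tag also feeds cost telemetry, so we can see what fraction of cases never touched the model.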
Accomplishments that we're proud of
- A fully autonomous end-to-end pipeline — a complaint submitted at one end emerges classified, risk-scored, compliance-checked, routed, and resolution-proposed at the other, with zero manual intervention
- Per-step observability — every agent call has latency, token count, cost, input/output, and status recorded — full trace replay in the UI
- Voice intake — a complaint can be spoken, transcribed via ElevenLabs, and processed through the full pipeline without touching a keyboard
- Hybrid retrieval — local vector search over policies, past complaints, and resolutions gives every agent grounded, institution-specific context
- Live LLM evaluation harness — a benchmark system that runs against labeled datasets and scores outputs with a deterministic judge, giving us confidence in every deployment
What we learned
- Multi-agent systems live or die by their state contracts — clean, validated schemas between agents matter more than any individual prompt
- Deterministic rules and LLMs are complementary — layering rule-based pre-classification under Gemini reasoning cuts cost and improves consistency without sacrificing flexibility on edge cases
- Observability is not optional in agentic systems — without per-step tracing, debugging a 7-agent pipeline is nearly impossible. Instrument from day one
- Google ADK is powerful but opinionated — working with its session and runner model requires upfront architectural commitment, but pays off in clean agent isolation
What's next for Triage AI
- Multi-tenancy — full company isolation so multiple financial institutions run on the same platform with separate models, rules, and routing logic
- Regulatory rule engine — a structured, updatable compliance rulebook mapped to FCA, CFPB, and GDPR frameworks, dynamically injected into the compliance agent at runtime
- Proactive escalation alerts — Slack/email webhooks when a critical case is detected, before a human even opens the dashboard
- Fine-tuned classification — train institution-specific classifiers on historical complaint data to eliminate LLM calls for well-understood complaint categories
- Resolution feedback loop — capture human overrides on proposed resolutions and feed them back into the evaluation harness to continuously improve agent quality over time
- Google Agent Engine deployment — move from local ADK runner to fully managed Google Cloud Agent Engine for production scale