Kaori Retail Agent — Decisions from Your Sales Data

## Inspiration

Every retailer in Vietnam sits on the same goldmine and the same problem: years of sales, customer, and inventory data — and no one with the time to turn it into a decision by Monday morning. Hiring a data team isn't realistic for most SMEs and mid-market enterprises, so the data just sits there and decisions get made on gut feel.

We asked a sharper question: what if the data engineer were an agent? Not a chatbot that answers questions, but an agent that ingests raw business data, cleans it, reasons over it, and hands a store manager a ranked list of what to do this week, in plain Vietnamese, with the money impact attached.

## What it does

Kaori Retail Agent turns a retailer's raw sales data into prioritized, explainable next-best-actions — no data engineer required.

- Upload anything: messy Excel/CSV exports of sales, customers, and transactions.
- Auto-pipeline (12-stage Medallion: Bronze → Silver → Gold): schema detection, cleaning, Vietnamese-aware PII redaction, a 7-dimension quality gate, and semantic enrichment. Bad data is flagged, not silently trusted.
- Agentic reasoning: surfaces revenue at risk (which customers are churning, which SKUs are bleeding margin) and generates next-best-actions ranked by money impact, each with an explanation and an audit trail.
- Decide in manager language: outputs use Vietnamese business terms such as "doanh thu có nguy cơ mất" (revenue at risk) and "khách cần giữ chân" (customers to retain), not "inference" and "dtype".
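
As a rough illustration of "ranked by money impact", here is a minimal Python sketch. The `Action` fields, the `rank_actions` helper, and the sample figures are all hypothetical and invented for this example; they are not Kaori's actual schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One candidate next-best-action (hypothetical fields)."""
    title: str
    revenue_at_risk_vnd: int  # estimated money impact, in VND
    explanation: str          # plain-language reason shown to the manager


def rank_actions(actions: list[Action], top_n: int = 5) -> list[Action]:
    """Return up to top_n actions, largest estimated money impact first."""
    ordered = sorted(actions, key=lambda a: a.revenue_at_risk_vnd, reverse=True)
    return ordered[:top_n]


# Made-up sample data, for illustration only.
candidates = [
    Action("Restock SKU A12", 18_000_000, "Stockouts on 3 of the last 7 days"),
    Action("Call 40 lapsed loyalty customers", 65_000_000, "No purchase in 60 days"),
    Action("Reprice SKU B07", 32_000_000, "Margin fell after a supplier price rise"),
]

for rank, action in enumerate(rank_actions(candidates), start=1):
    print(f"{rank}. {action.title}: {action.revenue_at_risk_vnd:,} VND ({action.explanation})")
```

Keeping the explanation on the same record as the money figure means a ranked list can never be shown without its reason.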

The North Star is deliberately blunt: revenue at risk that a human actually actioned.

## How we built it

Kaori is a production multi-tenant B2B platform, not a weekend prototype:

- 6 services: Java Spring Cloud Gateway + Auth, and Python FastAPI for the data pipeline, AI orchestrator, LLM gateway, and notifications.
- Local-first LLM: Qwen 2.5 14B + BGE-M3 embeddings run on our own infra via Ollama by default, so customer data never leaves the system. External vendors are strictly opt-in, gated behind consent + PII masking.
- CDFL reasoning engine ("học 1 hiểu 10", meaning "learn one, understand ten"): a grounding layer with an |OR| coverage gate. If the agent lacks enough grounded knowledge to answer, it declines instead of hallucinating.
- Memory palace + knowledge aging: the agent consolidates experience, reinforces what's verified, and decays what's stale.
- PostgreSQL 15 + pgvector with row-level-security tenant isolation, plus Redis, Kafka, Temporal, MinIO, ClickHouse, and OpenTelemetry tracing throughout.
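
The "reinforce what's verified, decay what's stale" idea can be sketched with a simple exponential half-life. This is an assumption-laden illustration, not the actual memory implementation: `MemoryItem`, `HALF_LIFE_DAYS`, and the reinforcement rule are all hypothetical choices.

```python
from dataclasses import dataclass

# Assumed: confidence halves for every 30 days a fact goes unverified.
HALF_LIFE_DAYS = 30.0


@dataclass
class MemoryItem:
    """A remembered fact with a confidence score (hypothetical structure)."""
    fact: str
    confidence: float      # between 0.0 and 1.0
    age_days: float = 0.0  # days since the fact was last verified


def decayed_confidence(item: MemoryItem) -> float:
    """Exponential decay: confidence * 0.5 ** (age / half-life)."""
    return item.confidence * 0.5 ** (item.age_days / HALF_LIFE_DAYS)


def reinforce(item: MemoryItem, boost: float = 0.5) -> None:
    """On verification, close part of the gap to 1.0 and reset the age."""
    item.confidence += boost * (1.0 - item.confidence)
    item.age_days = 0.0


item = MemoryItem("Weekend sales peak on Saturday evening", confidence=0.8, age_days=30.0)
print(decayed_confidence(item))  # one half-life has passed, so 0.8 becomes 0.4
reinforce(item)
print(item.confidence, item.age_days)
```

Computing decay at read time, rather than rewriting stored scores on a schedule, is one way to keep the stored value tied to the last real verification.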

## Challenges we ran into

- Grounding without hallucination. Our first coverage gate let the quantity of weak matches compensate for quality, so we switched the |OR| gate to max-aggregation: one strong grounded citation now beats ten weak ones.
- Running an LLM locally, fast enough. Keeping inference inside a 30-second budget meant bounding LLM calls in the request path and degrading gracefully per item instead of failing the whole run.
- Vietnamese-aware privacy. Names, phone numbers, and IDs had to be redacted correctly in Vietnamese text before any reasoning step.
- Multi-tenant isolation as an invariant, not a hope. We made "zero cross-tenant leak" something we test on every query, not something we trust.
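
To make the coverage-gate change concrete, the sketch below contrasts a sum-style gate with a max-aggregation gate. It is a generic illustration only: the threshold, the score scale, and the function names are assumptions, and the internals of the |OR| gate itself are not shown here.

```python
from typing import Callable, Sequence

# Assumed minimum grounding score required before the agent may answer.
THRESHOLD = 0.75

Gate = Callable[[Sequence[float]], bool]


def sum_gate(scores: Sequence[float]) -> bool:
    """Naive gate: many weak matches can add up past the threshold."""
    return sum(scores) >= THRESHOLD


def max_gate(scores: Sequence[float]) -> bool:
    """Max-aggregation gate: only the single strongest match counts."""
    return max(scores, default=0.0) >= THRESHOLD


def respond(scores: Sequence[float], gate: Gate) -> str:
    """Answer only when the gate passes; otherwise decline."""
    return "answer" if gate(scores) else "decline"


ten_weak = [0.1] * 10  # ten weak citations
one_strong = [0.9]     # one strong citation

print(respond(ten_weak, sum_gate))    # answer: weak matches pile up
print(respond(ten_weak, max_gate))    # decline: no single match is strong enough
print(respond(one_strong, max_gate))  # answer
```

With `max`, adding more weak evidence can never push a query over the threshold, which is the property described in the first challenge above.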

## Accomplishments that we're proud of

It's built to be deployed, not just demoed. Because this event is about deployment conversations, we built the governance an enterprise actually needs to say yes:

- EU AI Act compliance built in: risk classification per AI use case, human-oversight gates before high-risk side effects, machine-readable AI-output disclosure, Annex IV model cards, an incident register, and bias examination inside the quality gate.
- Every automated decision is auditable: confidence, alternatives, and lineage are logged, so a manager can always ask "why did the AI say this?"
- A real working platform: multi-tenant, privacy-first, with thousands of automated tests across services and a multi-language frontend (vi / en / ja / ko / zh).
- An agent that knows when to say "I don't know": the discipline to decline turned out to be the hardest and most valuable thing we shipped.
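
A logged decision with confidence, alternatives, and lineage might look roughly like the record below. This is a hedged sketch: the `DecisionRecord` fields, the `to_audit_line` helper, and the sample values are hypothetical and do not reflect the platform's real audit schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class DecisionRecord:
    """One audited automated decision (hypothetical fields)."""
    tenant_id: str
    action: str
    confidence: float
    alternatives: list[str]  # options considered but not chosen
    lineage: list[str]       # datasets or pipeline stages behind the decision
    created_at: str = field(default_factory=_utc_now)


def to_audit_line(record: DecisionRecord) -> str:
    """Serialize to one JSON line; ensure_ascii=False keeps Vietnamese text readable."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)


# Made-up sample values, for illustration only.
record = DecisionRecord(
    tenant_id="tenant-001",
    action="khách cần giữ chân: call 40 lapsed customers",
    confidence=0.82,
    alternatives=["send discount voucher", "no action"],
    lineage=["gold.customer_activity", "gold.sales_daily"],
)
print(to_audit_line(record))
```

One JSON object per decision keeps the log append-only and easy to query when someone asks why a recommendation was made.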

## What we learned

Teaching an agent a concept and letting it generalize beats hardcoding rules — a single "money" principle let it reason across cases we never explicitly coded. And the gap between an impressive demo and a deployable system is almost entirely trust: isolation, auditability, and the discipline to decline. Production-readiness isn't a feature you add at the end — it's the thing you design around from the first commit.

## What's next for Kaori Retail Agent — Decisions from Your Sales Data

- A guided pilot with a Vietnamese retailer (the Retail track brief).
- Self-hosted LLM tuning for Vietnamese retail vocabulary.
- Deeper process-mining and adoption analytics to close the loop from decision to measured outcome.
- Rolling out the full multilingual UI (i18n already in place across 5 languages) for regional ASEAN expansion.