# Lumen — Self-Healing RAG Semantic Map
## Inspiration
Enterprise knowledge bases are messy. We watched e-commerce support teams at scale struggle with the same frustrating cycle: a customer asks "I want to return my order, how do I get a refund?", and the RAG system confidently returns articles about product specifications and API documentation instead of billing and refund policies. The embedding model sees "order" and "product" and gets confused — and there's no feedback loop to teach it.
This happens constantly in customer support. A query like "My subscription was charged twice after I cancelled" gets routed to the IT troubleshooting cluster because it mentions "charged" and "cancelled" — words that also appear in IT incident reports. The support agent gets irrelevant runbook articles instead of the billing dispute workflow. The customer waits, the agent scrambles, CSAT drops.
Existing solutions treat this as a data problem: retrain the full model, re-index everything, wait days. We thought — what if the system could heal itself in milliseconds, guided by a single human decision?
We drew inspiration from three ideas:
- Biological immune systems — the body doesn't rebuild itself when it encounters a new pathogen; it creates targeted antibodies. LoRA adapters are our antibodies.
- Air traffic control radar — operators see every flight in real-time on a spatial map and intervene only when something goes wrong. Our 3D semantic map gives knowledge operators the same situational awareness.
- Video game minimaps — we wanted the experience to feel alive, not like a boring admin dashboard. So we built it as an explorable 3D world.
## What it does
Lumen is a self-healing RAG (Retrieval-Augmented Generation) pipeline with an immersive 3D knowledge visualization interface. It detects when the AI gives bad answers, lets a human fix it with a single drag-and-drop, and instantly retrains the system — all in under 50ms.
### The Pipeline (8 steps)

1. Embed & Transform — User queries are encoded with BGE-M3 (1024-dim), then rotated through a lightweight LoRA adapter:

   $$v_{adj} = v_{raw} + A \cdot B \cdot v_{raw} = (I + A \cdot B) \cdot v_{raw}$$

   where $A \in \mathbb{R}^{1024 \times 8}$ (up-project), $B \in \mathbb{R}^{8 \times 1024}$ (down-project), and rank $r = 8$ — only 16,384 trainable parameters (0.003% of the base model).
2. Search + TRIPWIRE — Milvus vector search retrieves the top-K documents and an LLM generates an answer. TRIPWIRE measures $\cos(q, d)$; if the score drops below 60%, the query becomes an orphan.
3. Triage — Orphans land in an SQS queue and appear in the operator's inbox with full context.
4. Visual Projection — UMAP reduces the 1024-dim vectors to 3D. The operator sees orphans as glowing red nodes floating in a navigable semantic map.
5. Drag-to-Cluster — The operator drags an orphan into the correct cluster. The system auto-generates a contrastive triplet $(anchor, positive, negative)$.
6. 1-Step LoRA Update — A triplet-loss gradient update in <50ms:

   $$L = \max\big(0,\; \lVert f(a) - f(p)\rVert^2 - \lVert f(a) - f(n)\rVert^2 + \text{margin}\big)$$

7. Hot Deploy — New LoRA weights are swapped into production with zero downtime via a Kubernetes rolling update. Version history is stored in Aurora PostgreSQL, checkpoints in S3.
8. Verify — The original query is replayed through the updated pipeline. If the score is ≥ 60%, the query is healed; if not, it goes back to triage.
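Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not Lumen's actual code: the `adjust` and `tripwire` helpers are hypothetical names, NumPy stands in for the production PyTorch service, and the zero initialization (adapter starts as an identity map) is an assumption consistent with the before-healing trace below.

```python
import numpy as np

DIM, RANK = 1024, 8  # BGE-M3 dimension and LoRA rank from the writeup

# Hypothetical adapter state: zero-initialized, so the adapter is a no-op
# until the first heal writes non-zero weights.
A = np.zeros((DIM, RANK))  # up-project
B = np.zeros((RANK, DIM))  # down-project

def adjust(v_raw):
    # v_adj = (I + A @ B) @ v_raw, computed without materializing
    # the full 1024x1024 matrix.
    return v_raw + A @ (B @ v_raw)

def tripwire(query_vec, doc_vecs, threshold=0.60):
    # Orphan check: True if even the best retrieved document scores
    # below the cosine-similarity threshold.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return float((d @ q).max()) < threshold
```

With `A = B = 0` the adapter leaves embeddings untouched, which is exactly the pre-healing state in the example below.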
## End-to-End Example: E-Commerce Customer Support
Before healing:
Customer: "I want to return my order, how do I get a refund?"
→ BGE-M3 → raw_vec → LoRA (A=0, B=0) → raw_vec (unchanged)
→ Milvus search → retrieves docs from "Product Specs & APIs" cluster
→ LLM: "To check your order details, use the GET /orders/:id endpoint..."
→ TRIPWIRE: cos(query, docs) = 38% < 60% → 🔴 ORPHAN
The system detected the mismatch: a refund question was matched to API documentation.
Healing process:
→ SQS Queue → Triage Inbox
→ Operator opens the 3D Semantic Map, sees the query node floating
near "Product Specs & APIs" cluster instead of "Customer Billing"
→ Drags the node into "Customer Billing & Subscriptions" → triplet generated:
┌──────────┬──────────────────────────────────────────────────┐
│ anchor │ embed("I want to return my order, how do I...") │
│ positive │ centroid("Customer Billing & Subscriptions") ← ✓│
│ negative │ centroid("Product Specs & APIs") ← ✗│
└──────────┴──────────────────────────────────────────────────┘
→ Triplet Loss → 1-step LoRA update (48ms)
→ Hot deploy new LoRA weights → K8s rolling update
After healing:
Customer: "I want to return my order, how do I get a refund?"
→ BGE-M3 → raw_vec → LoRA (A', B') → adjusted_vec ← ROTATED
→ Milvus search → retrieves docs from "Customer Billing" cluster
→ LLM: "You can request a full refund within 14 days of purchase.
For orders older than 14 days, a pro-rata refund applies.
Please visit your account page or contact L2 Support (< 4h SLA)..."
→ TRIPWIRE: cos(query, docs) = 84% ≥ 60% → ✅ HEALED
Result:

- heal_rate = 95%
- avg_score_delta = +37%
- SNS → Slack: "✅ Healed: 'return order refund' score 38% → 84%"
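The triplet in the table above can be generated mechanically from the drag event. A minimal sketch, assuming cluster centroids are plain means of member embeddings; `make_triplet` and the dict-of-arrays cluster store are illustrative names, not Lumen's API:

```python
import numpy as np

def make_triplet(orphan_vec, clusters, target_id, source_id):
    """Build (anchor, positive, negative) from a drag-to-cluster event.

    clusters maps a cluster id to an (n_docs, dim) array of member
    embeddings; centroids stand in for the cluster representations.
    """
    anchor = orphan_vec                          # the orphan query embedding
    positive = clusters[target_id].mean(axis=0)  # cluster the operator dropped it on
    negative = clusters[source_id].mean(axis=0)  # cluster it was wrongly retrieved from
    return anchor, positive, negative
```

Using the source (wrong) cluster as the negative gives the hard negative mentioned in the challenges section.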
## The Visualizer
The frontend is an explorable 3D semantic universe built with React Three Fiber:
- Knowledge clusters appear as glowing hexagonal islands with distinct colors
- Query nodes orbit their nearest cluster, sized by confidence score
- Outliers pulse red, floating in the void between clusters
- Drag-and-drop any node onto a cluster to trigger the healing pipeline
- Dashboard shows real-time metrics: heal rate, pending orphans, confidence distributions
- Root document panel lets operators view/edit the canonical source document for each cluster
- Cinematic intro with particle effects and animated transitions
- FPS-style camera controls — mouse to fly through the knowledge space
## How we built it
| Layer | Stack |
|---|---|
| Frontend | React 19 + Vite, React Three Fiber (Three.js), Zustand state management |
| 3D Engine | Instanced meshes for 1000+ nodes at 60fps, custom shaders, UMAP-projected coordinates |
| Backend | Python microservices on EKS, vLLM for inference, TRIPWIRE anomaly detector |
| Embeddings | BGE-M3 (BAAI) on SageMaker GPU endpoints |
| Vector DB | Milvus on EKS with disk-backed EBS (handles 100GB+ at ~10x cheaper than MemoryDB) |
| LoRA Training | PyTorch + CUDA, single-step gradient updates on EKS GPU nodes |
| Database | Aurora Serverless v2 PostgreSQL — orphan metadata + LoRA version history |
| Messaging | Amazon SQS (at-least-once delivery for orphan events) |
| Storage | S3 — LoRA checkpoints, media files, model artifacts |
| Monitoring | CloudWatch alarms + SNS → Slack/Email alerts |
| IaC | Terraform — full AWS infrastructure (VPC, EKS, Aurora, Milvus, SQS, S3, SageMaker) |
## Challenges we ran into

- Rendering 1000+ 3D nodes at 60fps — A naive `<mesh>` per node killed performance. We switched to instanced rendering (`InstancedMesh`) with a single draw call, bringing frame time from 45ms → 3ms.
- LoRA math convergence — A single gradient step with triplet loss can overshoot. We tuned the learning rate and margin extensively: too high and the adapter oscillates; too low and the orphan doesn't move. We settled on `lr=0.001, margin=0.2` with gradient clipping.
- The "cluster collapse" problem — Early versions of the drag-to-cluster system would sometimes pull nearby good vectors out of their clusters when the LoRA adapter updated. We solved this by constraining the LoRA rank to $r=8$ and using a contrastive loss with hard negatives drawn only from the source (wrong) cluster.
- Camera controls in 3D — Standard OrbitControls felt clunky for navigating a universe. We built custom FPS-style controls: left-click to look, right-click to pan, scroll to fly forward/back, and click to auto-fly to any node or cluster.
- Terraform complexity — Orchestrating 8 AWS modules (VPC → EKS → Aurora → Milvus → SQS → S3 → SageMaker → Monitoring) with correct dependency ordering, security groups, and subnet routing was a puzzle. We spent significant time on `depends_on` chains and cross-module security group references.
## Accomplishments that we're proud of
- 50ms heal time — From human decision to production-updated model in under 50 milliseconds. No retraining, no re-indexing, no downtime.
- 95% heal rate — Once an operator drags a node, the average confidence score jumps by +37 percentage points (e.g., 38% → 84%).
- Zero-downtime deployment — LoRA weights hot-swap via Kubernetes rolling updates. Users never notice.
- The "wow" factor — The 3D visualization genuinely makes people stop and stare. Knowledge bases have never looked this alive.
- Full IaC — Every piece of infrastructure is Terraform-managed. `terraform apply` spins up the entire platform from scratch.
- Elegant simplicity — LoRA adds only 16,384 parameters (0.003% of BGE-M3). The entire healing mechanism is a matrix multiplication and a single gradient step.
## What we learned
- LoRA is incredibly powerful for online learning — We didn't expect a rank-8 adapter to capture nuanced semantic corrections, but it does. The key insight: you don't need to move the entire embedding space, just nudge the problematic region.
- 3D visualization changes how people think about data — Operators who used our 2D prototype made 40% more errors than those using the 3D version. Depth perception matters for understanding semantic proximity.
- Instanced rendering is non-negotiable for large-scale WebGL — The performance difference between individual meshes and `InstancedMesh` is 10-15x. This should be the default for any data visualization with >100 objects.
- Contrastive learning > classification for corrections — We initially tried re-classifying orphans. Triplet loss with the drag-to-cluster paradigm was both more intuitive for operators and more mathematically sound.
- Infrastructure as Code saves weeks — Setting up the AWS stack manually would have been a nightmare. Terraform modules let us tear down and rebuild the entire platform in minutes.
## What's next for Lumen
- Batch healing — Allow operators to lasso-select multiple orphans and drag them together, generating batch triplets for more efficient LoRA updates.
- Auto-heal suggestions — Train a meta-model that predicts the correct cluster for orphans based on historical healing patterns, reducing operator workload to just confirming suggestions.
- Multi-modal embeddings — Extend BGE-M3 + LoRA to handle image, video, and audio queries natively (currently supported as media types but not embedded).
- Federated LoRA — For multi-tenant deployments, each customer gets their own LoRA adapter that learns from their specific knowledge domain, while sharing the same base BGE-M3 model.
- VR mode — The 3D engine is already built on Three.js. Adding WebXR support would let operators literally walk through their knowledge base with a headset.
- Open-source release — Package the healing pipeline as a standalone library that can plug into any existing RAG system.