AI3D memAdmin — DevPost Submission
Gemini Live Agent Challenge — Live Agents Category
Inspiration
AI agents fail silently. They fixate on narrow topics, drift away from relevant context, and degrade over time — and nobody can see it happening. Logs show individual calls but miss the spatial structure of cognition. We asked: What if you could see an AI agent's memory as a living 3D space, diagnose failures instantly, and fix them with your voice?
Current observability tools (LangSmith, Weights & Biases, Arize) show traces and metrics — flat data for a spatial problem. We built the tool we wished existed: a real-time 3D semantic atlas where every memory is a sphere, position is meaning, and an embedded agent can diagnose its own cognition.
What it does
AI3D memAdmin is a voice-driven 3D administration panel for AI agent semantic memory. It lets operators:
- See — Every memory rendered as a sphere in 3D space. Nearby nodes are semantically similar. Clusters, edges, and trajectory paths emerge in real time.
- Detect — Four trajectory patterns (EXPLORING, STABILIZING, REVISITING, MODALITY SHIFT) classify agent behavior live. When the agent fixates, a convergence sphere appears around the problem.
- Diagnose — An embedded ADK agent reads its own trajectory, searches its own memory graph, and explains what's wrong — with clickable citations that zoom the camera to specific nodes.
- Fix — Annotate, relabel, pin, hide, or delete memories directly. Inject cross-domain memories to rebalance retrieval. No retraining. No redeployment.
- Speak — Gemini Live API enables full voice control. Search by voice, inject memories by voice, receive proactive alerts when trajectory patterns shift — the agent narrates what's happening without being asked.
- Compare — A built-in "drifted" dataset (20 memories showing fixation/drift/forgetting) lets operators instantly see what broken agent memory looks like, then switch to a diverse dataset to see the contrast.
- Replay — Timeline scrubber reconstructs any incident in event-time. Watch a fixation form. Watch the fix land. The incident report writes itself.
The 3-minute demo loop:
- Load drifted dataset (20 fixated memories) → see them collapse into one cluster
- Switch to diverse dataset (25 memories) → watch healthy clusters bloom across 3D space
- Inject, search, and annotate memories with direct manipulation
- Force a fixation (5 ML memories) → trajectory collapses to STABILIZING
- Ask the agent what's wrong → it diagnoses itself with clickable citations
- Switch to voice → search, fix, and receive proactive alerts by speaking
- Rewind the timeline → replay the entire incident
How we built it
Frontend (React 18 + Three.js + React Three Fiber)
The 3D atlas renders semantic nodes as spheres positioned by 768-dimensional embeddings projected to 3D via Landmark Atlas PCA. Similarity edges (cosine >= 0.70), convex hull clusters, and trajectory paths update in real time via WebSocket. Nine HUD panels layer over the 3D viewport: StatusHUD, NodeInspector, TrajectoryHUD, EventInput, AgentChat, TimelineScrubber, MathTutorial, VoiceMic, and SubtitleBar. State flows through Zustand 5.0.
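For the curious, deriving similarity edges from raw embeddings fits in a few lines. This Python sketch uses illustrative names rather than our actual code:

```python
import numpy as np

def similarity_edges(embeddings: np.ndarray, threshold: float = 0.70):
    """Return index pairs (i, j) whose cosine similarity meets the edge
    threshold. embeddings: (n, d) array of memory vectors."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)   # rows now unit-length
    sim = unit @ unit.T                               # dot = cosine similarity
    i, j = np.triu_indices(len(embeddings), k=1)      # each edge once, no loops
    mask = sim[i, j] >= threshold
    return list(zip(i[mask].tolist(), j[mask].tolist()))
```

The dense n × n similarity matrix is cheap at demo scale; a larger atlas would lean on the vector store's k-NN search instead.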
Voice Pipeline (Gemini Live API)
The browser captures microphone audio at 16kHz via ScriptProcessorNode, encodes to PCM Base64, and streams directly to Gemini Live API using ephemeral tokens (no backend relay — sub-second latency). The model responds with audio, tool calls, and transcriptions. Five voice tools (search_memory, get_trajectory_summary, get_atlas_stats, edit_node, ingest_text) give full atlas control by voice. Proactive alerts are delivered via sendClientContent — the model speaks about trajectory shifts without being asked.
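The encoding itself happens in browser JavaScript, but the wire format is easiest to pin down in a few lines of Python (the function name is hypothetical):

```python
import base64
import numpy as np

def pcm16_base64(samples: np.ndarray) -> str:
    """float32 mono audio in [-1, 1] at 16 kHz -> Base64 of little-endian
    16-bit PCM, the frame format streamed to the Live API."""
    clipped = np.clip(samples, -1.0, 1.0)
    pcm = (clipped * 32767).astype("<i2")   # scale to int16, little-endian
    return base64.b64encode(pcm.tobytes()).decode("ascii")
```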
Backend (FastAPI + Google ADK)
The semantic pipeline: embed (Gemini Embedding 2 — Google's newest SOTA embedding model, released March 2026, up to 3072D) → project (Landmark Atlas PCA with barycentric interpolation) → cluster (k-means, auto-k) → store (Vertex AI Vector Search or Qdrant) → track (5-point trajectory window, 4-pattern classifier) → broadcast (WebSocket hub to all connected clients).
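As one concrete stage, the broadcast hub follows the standard FastAPI WebSocket fan-out pattern. This is a minimal sketch with an illustrative endpoint path, not our exact code:

```python
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
clients: set[WebSocket] = set()

@app.websocket("/ws/atlas")
async def atlas_ws(ws: WebSocket):
    await ws.accept()
    clients.add(ws)
    try:
        while True:
            await ws.receive_text()       # clients mostly listen
    except WebSocketDisconnect:
        clients.discard(ws)

async def broadcast(event: dict) -> None:
    # Called after each pipeline stage (embed -> project -> cluster -> ...).
    for ws in list(clients):
        try:
            await ws.send_json(event)
        except Exception:
            clients.discard(ws)           # prune dead connections
```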
The ADK agent (Gemini 2.5 Flash) has 7 tools for full read/write memory access. It can search its own embeddings, read its own trajectory, and diagnose its own fixation patterns — true agent self-awareness.
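In ADK terms, wiring one such tool looks roughly like this; the tool body and payload below are placeholders, not our actual implementation:

```python
from google.adk.agents import Agent

def get_trajectory_summary() -> dict:
    """Return the recent trajectory window and its classified pattern."""
    # The real tool reads the live 5-point window from the atlas store.
    return {"pattern": "STABILIZING", "window": []}  # placeholder payload

memory_admin_agent = Agent(
    name="memory_admin_agent",          # hypothetical name
    model="gemini-2.5-flash",
    instruction="Diagnose memory-atlas health; cite node IDs in answers.",
    tools=[get_trajectory_summary],     # one of the 7 read/write tools
)
```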
Deployment (Cloud Run)
Single-command deployment via infra/cloudrun/deploy.sh. Multi-stage Docker build (Node.js frontend → Python runtime). Cloud Run with session affinity, 1–5 autoscaling instances, 1 GiB memory, 2 vCPUs.
Challenges we ran into
Gemini Live API + ADK incompatibility — Live API tool calls bypass ADK's ToolContext, so voice tools execute via REST while the ADK agent handles text chat. Two parallel tool execution paths in one app.
Stable 3D projection — t-SNE and UMAP recompute globally, causing all nodes to jump when a new memory arrives. We invented Landmark Atlas PCA: freeze landmark positions, then interpolate new nodes via barycentric weights from k-nearest landmarks. New memories land smoothly without moving existing nodes.
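A minimal numpy sketch of the idea, shown here with inverse-distance weights over the k nearest landmarks (our exact barycentric weighting may differ):

```python
import numpy as np
from sklearn.decomposition import PCA

class LandmarkAtlas:
    """Fit PCA once on a landmark set, freeze the landmarks' 3D positions,
    then place each new vector by weights over its k nearest landmarks."""

    def __init__(self, landmarks: np.ndarray, k: int = 4):
        self.landmarks = landmarks                      # (m, 768), frozen
        self.k = k
        self.pca = PCA(n_components=3).fit(landmarks)
        self.positions = self.pca.transform(landmarks)  # frozen 3D anchors

    def project(self, vec: np.ndarray) -> np.ndarray:
        # Distances measured in the original embedding space.
        d = np.linalg.norm(self.landmarks - vec, axis=1)
        idx = np.argsort(d)[: self.k]                   # k nearest landmarks
        w = 1.0 / (d[idx] + 1e-9)                       # inverse-distance weights
        w /= w.sum()
        # Weighted blend of frozen anchors: existing nodes never move
        # when a new memory arrives.
        return (w[:, None] * self.positions[idx]).sum(axis=0)
```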
Client-direct audio — Server-relayed audio adds 200ms+ RTT, breaking conversational flow. Ephemeral tokens (v1alpha) let the browser connect directly to Gemini's WebSocket, but required dual AudioContexts (16kHz mic, default-rate playback) and careful sample rate handling.
Proactive voice alerts — Getting the model to speak without being asked required `sendClientContent` with `turnComplete: true` instead of `sendRealtimeInput`. The trajectory alert webhook sends context as a user turn, and the model responds naturally.
Event-time vs. wall-clock timeline — Wall-clock replay compresses bursts and stretches gaps. Event-time spacing gives each memory equal visual weight, making fixation patterns dramatically visible during replay.
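The difference between the two replay modes fits in a few lines (the step size here is an illustrative choice):

```python
def replay_offsets_ms(ts: list[float], mode: str = "event",
                      step_ms: float = 400.0) -> list[float]:
    """ts: event timestamps in seconds, ascending. 'wall' preserves real
    gaps; 'event' gives every memory the same spacing during replay."""
    if mode == "wall":
        return [(t - ts[0]) * 1000.0 for t in ts]   # bursts blur, gaps drag
    return [i * step_ms for i in range(len(ts))]    # uniform event-time ticks
```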
Accomplishments that we're proud of
- Drifted vs. healthy comparison — One click shows what broken agent memory looks like (collapsed cluster), another click shows healthy diversity. The visual contrast is immediate and compelling.
- Agent self-diagnosis — An AI agent that reads its own memory graph, detects its own fixation, and prescribes repairs with clickable citations. To our knowledge, no existing tool does this.
- 3-second fixation detection — What takes hours to find in log files appears as a convergence sphere in 3 seconds.
- Proactive voice monitoring — The agent narrates trajectory shifts without being asked. It's a monitoring system with a voice.
- Full math transparency — Every algorithm documented in KaTeX with a researcher mode that shows live statistics. Nothing is a black box.
- Sub-100ms injection-to-render — From API call to 3D animation: embed, project, cluster, broadcast, and animate in under 100ms.
What we learned
- Embedding spaces have rich spatial structure that 2D scatter plots fail to capture. Using Gemini Embedding 2 (SOTA as of March 2026), we see 768-dimensional vectors projected to 3D with clusters and trajectory paths that reveal patterns invisible in metrics dashboards.
- Voice changes the relationship with a monitoring tool. Speaking to the atlas and hearing it speak back creates an immediacy that typing can't match — especially for proactive alerts.
- Trajectory detection (5-point sliding window, step-distance thresholds) is a surprisingly simple algorithm that yields powerful diagnostics. Four patterns (exploring, stabilizing, revisiting, modality shift) cover the most important agent behavior classes; see the sketch after this list.
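A hedged sketch of such a classifier; the decision order and threshold values are assumptions for illustration, not our exact numbers:

```python
import numpy as np

def classify_trajectory(window: np.ndarray,
                        stable_thresh: float = 0.15,
                        revisit_thresh: float = 0.20,
                        shift_thresh: float = 0.60) -> str:
    """Classify a (5, 3) window of projected positions into one of the
    four patterns. Thresholds and ordering are illustrative."""
    steps = np.linalg.norm(np.diff(window, axis=0), axis=1)  # 4 step lengths
    if steps.mean() < stable_thresh:
        return "STABILIZING"          # shrinking steps: possible fixation
    if np.linalg.norm(window[-1] - window[0]) < revisit_thresh:
        return "REVISITING"           # looped back near the starting point
    if steps[-1] > shift_thresh and steps[-1] > 3 * steps[:-1].mean():
        return "MODALITY SHIFT"       # one abrupt jump dwarfs prior steps
    return "EXPLORING"                # sustained movement through the space
```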
What's next for AI3D memAdmin
- Multi-agent atlas — Multiple agents' memories in the same 3D space, with cross-agent edge detection
- Production pipeline integration — Stream embeddings from LangChain, LlamaIndex, or any RAG pipeline
- Automated remediation — Agent not only diagnoses but automatically annotates and rebalances
- Audio/video/document embeddings — Gemini Embedding 2 (SOTA, March 2026) supports native multimodal input — render audio clips and video thumbnails as 3D nodes
- Team collaboration — Real-time multi-user orbiting with cursor presence (WebSocket hub already supports multi-client sync)
Built With
- Gemini 2.5 Flash (ADK agent)
- Gemini 2.5 Flash Native Audio Preview (Live API voice)
- Gemini Embedding 2 (SOTA multimodal embeddings, released March 2026, up to 3,072D)
- Google ADK >= 1.0
- Google Cloud Run
- Vertex AI Vector Search
- React 18 + React Three Fiber + Three.js
- FastAPI + Python 3.11
- Zustand + WebSocket
- KaTeX
Architecture Diagram
Uploaded to the DevPost Image Gallery / File Upload as `architecture-diagram.png`. Also in the repository at `docs/architecture-diagram.png`.
Automated Cloud Deployment
For the "Automating Cloud Deployment" bonus: deployment is fully automated via a single shell script, `infra/cloudrun/deploy.sh`. It handles Cloud Build image creation, Artifact Registry push, and Cloud Run service deployment with all environment variables, resource limits, and session affinity configured. One command:

```bash
GOOGLE_API_KEY="your-key" bash infra/cloudrun/deploy.sh YOUR_PROJECT_ID us-central1
```
Reproducible Testing
See the Reproducible Testing section in README.md for a 10-step verification procedure covering all core features: atlas seeding, 3D visualization, memory injection, semantic search, memory editing, trajectory detection, agent self-diagnosis, voice control, timeline replay, and math tutorial.
Links
- Live Demo: https://ai3d-memadmin-591170576088.us-central1.run.app
- Repository: https://github.com/yosun/ai3d-memadmin
- Demo Video: https://www.youtube.com/watch?v=SLqnDJKYigs (raw v0, recorded at the hackathon deadline)
- Architecture Diagram: Uploaded to Image Gallery
- Automated Deployment Script: `infra/cloudrun/deploy.sh`


