Mnemosyne — A Human-AI Context Management System

Mnemosyne: the Greek Titaness of Memory, mother of the nine Muses.


Inspiration

Every chatbot interface today presents your conversation history the same way: a chronological list of sessions in a sidebar, each with a short title. You scroll down to find the one you want, open it, and scroll back through a long linear exchange to find the idea you were exploring three days ago. This is not how memory works.

Cognitive psychology has understood since the 1970s that human memory is organized around two complementary systems (Tulving, 1972):

  • Episodic memory — records of specific events, bound together by context, time, and the relationships between people and places.
  • Semantic memory — general knowledge and concepts, organized by meaning rather than chronology.

Modern AI retrieval systems are almost entirely semantic: embed a query, find the nearest vectors, return the top-$k$ results. This works well for factual recall but misses the relational, contextual richness of episodic memory — the how, where, and with-whom of knowing something.

Beyond retrieval, we were frustrated by a second problem: the linearity of the chat window itself. When you are deep in a conversation and a tangential question comes to mind, you have two choices — derail your current thread to explore it, or lose the thought entirely. There is no equivalent of a Git branch for your ideas.

Mnemosyne was born from these two frustrations: a memory system that thinks more like a human, and a conversation interface that lets you explore ideas the way your mind actually does.


What We Built

Mnemosyne is a two-surface application:

Surface 1 — The Memory Explorer

A full-screen, physics-inspired bubble visualization of your past chat history. Rather than a list, memories surface as floating clusters (bubbles) that group semantically and episodically related nodes together.

Each bubble contains a mix of node types extracted from past conversations:

  Node type     What it stores
  Fact          Declarative knowledge stated in conversation
  Reflection    Higher-order patterns and self-observations
  Episode       Specific conversational exchanges, linked by NEXT edges
  Entity        Named people, places, and concepts

Clicking a bubble opens a detail panel that shows the full node list, LLM-generated labels (short + full), and episode chains — sequences of past exchanges that can be navigated forward and backward in time.

Surface 2 — The Threaded Chat

A conversation interface built around branching threads, inspired by how Git manages diverging lines of work:

Main Thread  ────●────────────●────────────●─────▶
                              │
              Parallel ───────●────────────●─────▶  (saved to memory, can merge)
                                    │
                   Temporary ───────●─────▶          (ephemeral, not saved)

  • Main thread — the primary conversation, always persistent.
  • Parallel thread — a related sub-task that shares the same memory context. Saved to memory when ended. Can be merged back into the main thread.
  • Temporary thread — a quick side question. Not saved to memory, not merged. Disappears when ended.

An intent router (GPT-4o-mini classifier) watches the main thread and suggests branching when a message looks like it belongs in a separate context.


How We Built It

Memory Backend — GAAMA SDK

The memory layer is built on GAAMA, a graph-augmented memory system with three stores:

  • Vector store — semantic KNN retrieval over node embeddings
  • Graph store — concept graph with typed edges (NEXT, RELATED_TO, etc.)
  • Node store — structured node objects with typed payloads

Ingestion converts raw chat messages into TraceEvent objects, which GAAMA processes into episodic and semantic nodes and links them into the graph.
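The conversion step can be sketched as follows. The concrete `TraceEvent` schema belongs to the GAAMA SDK and is not documented here, so the field names below (`role`, `content`, `timestamp`, `session_id`) are assumptions for illustration:

```python
from dataclasses import dataclass, field
import time

# Hypothetical TraceEvent shape; the real GAAMA SDK type may differ.
@dataclass
class TraceEvent:
    role: str                 # "user" or "assistant"
    content: str
    session_id: str
    timestamp: float = field(default_factory=time.time)

def messages_to_trace_events(messages, session_id):
    """Convert raw chat messages into TraceEvents for GAAMA ingestion."""
    return [
        TraceEvent(role=m["role"], content=m["content"], session_id=session_id)
        for m in messages
    ]
```

GAAMA then takes these events, extracts episodic and semantic nodes, and wires them into the graph.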

Bubble Clustering Algorithm

The core of the explorer is a Union-Find (UFDS) clustering algorithm:

Algorithm:

  1. Fetch top-$N$ seed nodes from the vector store for the query.
  2. For each seed, compute its $K$-hop neighborhood $\mathcal{N}_K(s)$ in the concept graph.
  3. For any two seeds $a, b$: union them if $$|\mathcal{N}_K(a) \cap \mathcal{N}_K(b)| > 0$$ This condition implies a graph path of length $\leq 2K$ between $a$ and $b$ via a shared intermediate node.
  4. Each disjoint set becomes one bubble, sized by its member count.

With $K=2$, seeds within graph distance $\leq 4$ of each other cluster into the same "topic island." The UFDS structure caches merged neighborhoods, making incremental additions $O(|\mathcal{N}|)$ per new node rather than $O(N^2)$ pairwise.
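The steps above can be sketched in a minimal Python version. The adjacency-dict graph representation is an assumption, and this sketch does the naive pairwise intersection check rather than the neighborhood caching described above:

```python
from collections import defaultdict

class UnionFind:
    """Disjoint-set (UFDS) with path compression and union by size."""
    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def k_hop_neighborhood(graph, seed, k):
    """BFS out to depth k; graph is an adjacency dict {node: [neighbors]}."""
    frontier, seen = {seed}, {seed}
    for _ in range(k):
        frontier = {n for u in frontier for n in graph.get(u, ())} - seen
        seen |= frontier
    return seen

def cluster_seeds(graph, seeds, k=2):
    """Union two seeds whenever their K-hop neighborhoods intersect;
    each resulting disjoint set is one bubble."""
    hoods = {s: k_hop_neighborhood(graph, s, k) for s in seeds}
    uf = UnionFind(seeds)
    for i, a in enumerate(seeds):
        for b in seeds[i + 1:]:
            if hoods[a] & hoods[b]:
                uf.union(a, b)
    bubbles = defaultdict(set)
    for s in seeds:
        bubbles[uf.find(s)].add(s)
    return list(bubbles.values())
```

With $k=1$, two seeds that share a direct neighbor land in the same bubble; with $k=2$ the "topic island" radius grows to graph distance 4, as described above.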

Bubbles are ranked by the top semantic score of their members, so the most relevant clusters appear first.

Context Packing

When starting a chat from the explorer, we run a budgeted context packing step that fills per-kind quotas from the most-relevant bubbles:

Facts:       up to 5
Reflections: up to 3
Episodes:    up to 3
Skills:      up to 2

The packed context is injected into the LLM system prompt as a <memory> block, giving the assistant grounded access to relevant past experience without flooding the context window.
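The packing step reduces to a single relevance-ordered pass that decrements per-kind budgets. The node dict shape below is an assumption; the quotas come from the table above:

```python
# Per-kind budgets, taken from the quotas listed above.
QUOTAS = {"fact": 5, "reflection": 3, "episode": 3, "skill": 2}

def pack_context(nodes, quotas=QUOTAS):
    """Fill per-kind quotas from nodes pre-sorted by relevance (descending)."""
    remaining = dict(quotas)
    packed = []
    for node in nodes:                      # most relevant first
        kind = node["kind"]
        if remaining.get(kind, 0) > 0:
            packed.append(node)
            remaining[kind] -= 1
    return packed

def render_memory_block(packed):
    """Render packed nodes as the <memory> block for the system prompt."""
    lines = [f"[{n['kind']}] {n['text']}" for n in packed]
    return "<memory>\n" + "\n".join(lines) + "\n</memory>"
```

Because the input is already relevance-ordered, each kind's quota is spent on its best candidates without a second sort.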

Chat System

The chat backend is a FastAPI service with:

  • SSE streaming — raw HTTP streaming via sse-starlette, no WebSocket overhead
  • SQLite for chat state — sessions, threads, messages, intent logs in chat.sqlite, completely separate from the GAAMA memory graph
  • LRU session eviction — sessions are touched on access (last_accessed_at), and when a new session is created, old ones beyond the cap of 5 are ingested and evicted
  • Incremental ingestion — every 5 messages, uningested messages are flushed to GAAMA; parallel threads ingest on close; temporary threads never ingest
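The eviction and flush logic can be sketched with an in-memory stand-in for the SQLite session table; the class and field names here are illustrative, not the actual backend code:

```python
import time

SESSION_CAP = 5        # LRU cap from the description above
FLUSH_EVERY = 5        # messages between incremental ingests

class SessionStore:
    """In-memory stand-in for the SQLite-backed session table."""
    def __init__(self, ingest_fn):
        self.sessions = {}           # id -> {"last_accessed_at", "uningested"}
        self.ingest_fn = ingest_fn   # receives messages bound for GAAMA

    def touch(self, sid):
        self.sessions[sid]["last_accessed_at"] = time.monotonic()

    def create(self, sid):
        # Evict least-recently-accessed sessions beyond the cap,
        # ingesting their remaining uningested messages first.
        while len(self.sessions) >= SESSION_CAP:
            oldest = min(self.sessions,
                         key=lambda s: self.sessions[s]["last_accessed_at"])
            self.ingest_fn(self.sessions[oldest]["uningested"])
            del self.sessions[oldest]
        self.sessions[sid] = {"last_accessed_at": time.monotonic(),
                              "uningested": []}

    def add_message(self, sid, msg):
        self.touch(sid)
        buf = self.sessions[sid]["uningested"]
        buf.append(msg)
        if len(buf) >= FLUSH_EVERY:  # incremental ingestion
            self.ingest_fn(buf[:])
            buf.clear()
```

Temporary threads would simply never route through `ingest_fn`, which is what gives them a zero memory footprint.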

The frontend is Next.js 16 (App Router) + React 19 with a custom useChat hook that drives all thread switching reactively via useEffect — changing activeThreadId automatically triggers a fetch and load of the new thread's messages, eliminating the race conditions that plague callback-based approaches.


Challenges

Graph Memory is Hard to Visualize Intuitively

The GAAMA graph is rich but non-obvious. The biggest UI challenge was deciding what not to show. We went through several layouts — force-directed graphs, timeline views, tree structures — before settling on topic-island bubbles. The bubble metaphor maps naturally to how we talk about memory clusters ("that whole period when I was learning X"), and the physics-inspired floating reinforces that memories drift and cohere rather than stay fixed.

Thread Semantics Are Deceptively Subtle

The three-thread model (main / parallel / temporary) sounds simple but has many edge cases. What should happen when you end a thread? Merge? Ingest? Discard? We settled on a clear contract:

  • Temporary threads have no memory footprint. They are truly ephemeral — like a scratch pad.
  • Parallel threads are saved to memory when closed but their history does not flow into the main conversation unless explicitly merged.
  • Merging is the only operation that moves conversation history between threads.

Getting this right required coordinating across three layers: the UI (which buttons appear and when), the frontend logic (when to call ingest vs. close), and the backend (the session-level ingest skips temp threads).

React State Coordination Across Async Boundaries

Thread switching went through three iterations before working reliably. The first attempt used a prevThreadRef guard to prevent redundant fetches — this created subtle timing issues when multiple state updates batched together across await boundaries. The second used an imperative callback pattern — same problem. The final solution moved message fetching inside useChat as a useEffect on threadId. This means the hook is the single source of truth: change the ID, messages automatically update, always, from any call site.

Local-First, and Its Tradeoffs

Keeping everything local (SQLite, local GAAMA graph, no cloud sync) is both a strength and a constraint. You get full transparency, no data leaving the machine, and the ability to fine-tune the memory algorithm directly. But it means no cross-device sync and the memory graph is only as rich as the conversations you've had on that machine. We see this as a deliberate design choice — memory should be yours.


What We Learned

  • Episodic + semantic is the right framing. Users don't remember conversations by topic alone — they remember them by context, sequence, and relationship. The NEXT-edge episode chains turned out to be unexpectedly powerful: being able to navigate a past exchange turn-by-turn felt genuinely different from keyword search.

  • The branch metaphor resonates. Developers who use Git immediately understood the thread model. Non-developers needed the "scratch pad vs. sub-task vs. main conversation" framing. Both groups appreciated that the chat interface didn't force them to scroll up and down a single long thread.

  • LLM context injection is an art. The difference between injecting all retrieved nodes and injecting a curated budget is dramatic. Too much context dilutes the signal; too little misses important grounding. The per-kind budget (facts, reflections, episodes, skills) with relevance ordering gave consistently better answers than naive top-$k$ injection.

  • UI polish matters as much as the algorithm. The streaming dots animation, the blue branch-line tree, the system message markers when threads branch — none of these are technically hard, but they make the difference between a tool that feels like a prototype and one that feels like a product.


Tech Stack

  Layer           Technology
  Memory backend  GAAMA SDK (graph + vector + node stores)
  Chat backend    FastAPI + sse-starlette + aiosqlite
  LLM             OpenAI GPT-4o (chat) + GPT-4o-mini (intent, titles) + Claude Haiku (bubble labels)
  Frontend        Next.js 16 (App Router) + React 19 + TypeScript
  Visualization   D3.js (bubble canvas) + Tailwind CSS
  Storage         SQLite × 2 (memory.sqlite for GAAMA, chat.sqlite for conversation state)
  Tooling         uv (Python), bun (Node)
Tooling uv (Python), bun (Node)

Future Directions

  • Cross-device sync — GAAMA is local-first by design, but a conflict-free replication layer (CRDT over the graph) would allow sync without central storage
  • Richer thread visualization — the conversation tree view on the explorer home page hints at what a full branch-and-merge history visualization could look like
  • Memory editing — allowing users to annotate, correct, or redact specific nodes in the graph, making the memory system transparent and auditable
  • More memory node types — skills, habits, open questions, commitments — extending beyond facts and episodes toward a fuller model of personal knowledge
