Mnemo Desk

An agent that remembers what matters, forgets what is stale, and gets measurably smarter every session.

Mnemo Desk is a Qwen-powered personal operations assistant for freelancers and solo operators: the people who retype their rates, client preferences, and recurring context into AI tools every day. Instead of hoarding embeddings, it runs a memory governance layer that learns from your corrections, retires facts when they go out of date, resolves conflicts by recency and confidence, and shows that it gets better with a live before-and-after accuracy meter.

Inspiration

We kept watching freelancer friends paste the same context into AI tools every morning: their rate, how each client likes replies, which deadline is real this week. The tools either forgot everything or, worse, confidently repeated a number that changed months ago. Recent agent-memory research kept pointing at the same gap: storage is solved, but keeping memory correct as facts change is not. We wanted to build the agent that proves it learns from corrections and forgets on purpose.

What it does

Mnemo Desk is a personal operations assistant that extracts structured memories from everyday chat, supersedes stale facts when you correct it, resolves conflicts by recency and confidence, and forgets unreinforced facts on a schedule. A live accuracy meter runs a fixed question set through a memoryless Qwen baseline and through Mnemo side by side, so anyone can watch the score climb. Every answer cites the exact memories it used. The goal is to help freelancers and one- or two-person businesses stop re-explaining their context.
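The side-by-side accuracy meter can be pictured as scoring one fixed question set against two answerers. Below is a minimal TypeScript sketch, assuming exact-match grading after light normalization; the `Question` type, `scoreAnswerer`, `accuracyMeter`, and the example questions are hypothetical illustrations, not Mnemo Desk's actual code.

```typescript
// Hypothetical shapes: a fixed question with its expected answer,
// and any answerer (memoryless baseline or memory-backed agent).
type Question = { id: string; prompt: string; expected: string };
type Answerer = (prompt: string) => string;

// Normalize so " $95/HR " and "$95/hr" grade the same.
function normalize(s: string): string {
  return s.trim().toLowerCase();
}

// Fraction of questions answered exactly right (0..1).
function scoreAnswerer(questions: Question[], answer: Answerer): number {
  if (questions.length === 0) return 0;
  let correct = 0;
  for (const q of questions) {
    if (normalize(answer(q.prompt)) === normalize(q.expected)) correct++;
  }
  return correct / questions.length;
}

// Run the same fixed question set through both answerers side by side.
function accuracyMeter(questions: Question[], baseline: Answerer, withMemory: Answerer) {
  return {
    baseline: scoreAnswerer(questions, baseline),
    withMemory: scoreAnswerer(questions, withMemory),
  };
}

// Usage: a deterministic baseline that knows nothing about the user,
// versus an answerer backed by a tiny hypothetical memory map.
const questions: Question[] = [
  { id: "q1", prompt: "What is my hourly rate?", expected: "$95/hr" },
  { id: "q2", prompt: "How does Acme like replies?", expected: "bullet points" },
];
const memory = new Map<string, string>([
  ["What is my hourly rate?", "$95/hr"],
  ["How does Acme like replies?", "Bullet points"],
]);
const baseline: Answerer = () => "I don't know";
const withMemory: Answerer = (p) => memory.get(p) ?? "I don't know";

console.log(accuracyMeter(questions, baseline, withMemory)); // → { baseline: 0, withMemory: 1 }
```

Keeping both answerers deterministic is what makes the gain reproducible from run to run; a real meter would likely need fuzzier grading than exact match.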

How we built it

We built a single Next.js 14 app with TypeScript, Tailwind, and a shadcn/ui component layer. The interesting part is a typed memory governor that owns extraction, storage and reinforcement, supersession, confidence-weighted retrieval, conflict resolution, and decay-based forgetting, exposed through Next.js route handlers as MCP-style store, retrieve, supersede, and forget operations. Qwen 3.7, served from Alibaba Cloud Model Studio through its OpenAI-compatible endpoint, does the reasoning, with a deterministic local fallback so the demo never breaks.
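To make the four operations concrete, here is a minimal in-memory sketch of a typed governor. The `Memory` fields, the `MemoryGovernor` class, its method signatures, and the +0.1 reinforcement bump are assumptions chosen to mirror the store, retrieve, supersede, and forget operations described above; they are not Mnemo Desk's published API. Retrieval here is naive keyword overlap weighted by confidence, and the route-handler wiring is omitted.

```typescript
// Hypothetical memory record; field names are illustrative only.
type MemoryType = "rate" | "client_pref" | "deadline";

interface Memory {
  id: string;
  type: MemoryType;
  key: string; // what the fact is about, e.g. "hourly_rate"
  value: string; // the fact itself, e.g. "$95/hr"
  confidence: number; // 0..1
  reinforcements: number;
  createdAt: number; // simulated-clock timestamp
  status: "active" | "superseded" | "forgotten";
  supersededBy?: string;
}

class MemoryGovernor {
  private memories = new Map<string, Memory>();
  private nextId = 1;

  // store: repeating an active fact reinforces it instead of duplicating it.
  store(type: MemoryType, key: string, value: string, confidence: number, now: number): Memory {
    for (const m of this.memories.values()) {
      if (m.status === "active" && m.key === key && m.value === value) {
        m.reinforcements += 1;
        m.confidence = Math.min(1, m.confidence + 0.1);
        return m;
      }
    }
    const m: Memory = {
      id: `m${this.nextId++}`,
      type,
      key,
      value,
      confidence,
      reinforcements: 0,
      createdAt: now,
      status: "active",
    };
    this.memories.set(m.id, m);
    return m;
  }

  // supersede: store the corrected value, then retire every other active memory for the key.
  supersede(type: MemoryType, key: string, value: string, confidence: number, now: number): Memory {
    const next = this.store(type, key, value, confidence, now);
    for (const m of this.memories.values()) {
      if (m.status === "active" && m.key === key && m.id !== next.id) {
        m.status = "superseded";
        m.supersededBy = next.id;
      }
    }
    return next;
  }

  // retrieve: keyword overlap weighted by confidence, active memories only.
  retrieve(query: string, limit = 3): Memory[] {
    const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
    return [...this.memories.values()]
      .filter((m) => m.status === "active")
      .map((m) => {
        const text = `${m.key} ${m.value}`.toLowerCase();
        const hits = terms.filter((t) => text.includes(t)).length;
        return { m, score: hits * m.confidence };
      })
      .filter((r) => r.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
      .map((r) => r.m);
  }

  // forget: mark a memory forgotten; it stays in the map but is never retrieved.
  forget(id: string): boolean {
    const m = this.memories.get(id);
    if (!m || m.status === "forgotten") return false;
    m.status = "forgotten";
    return true;
  }
}

const governor = new MemoryGovernor();
governor.store("rate", "hourly_rate", "$80/hr", 0.8, 0);
governor.supersede("rate", "hourly_rate", "$95/hr", 0.9, 48);
console.log(governor.retrieve("hourly rate").map((m) => m.value).join(", ")); // → $95/hr
```

Marking records as superseded or forgotten instead of deleting them keeps the history available for an audit view.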

Challenges we ran into

Tuning the forgetting policy was the hardest: it had to decay aggressively enough to demonstrate governed forgetting, but never delete a fact a judge had set seconds earlier. We solved it with a simulated clock plus per-type relevance windows that extend with reinforcement and pinning. The second challenge was making the accuracy gain reproducible, so we scored a fixed question set and kept the baseline deterministic instead of running two live models.
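One way to express that forgetting policy is as a pure function of a simulated clock. In this sketch, which is an assumption about the policy's shape rather than the project's tuned values, each memory type has a base relevance window measured from the last reinforcement, every reinforcement stretches it, and pinned memories never expire. `BASE_WINDOW_HOURS`, `EXTENSION_PER_REINFORCEMENT`, `SimClock`, and `sweep` are hypothetical names, and the window sizes are placeholders.

```typescript
type MemoryType = "rate" | "client_pref" | "deadline";

interface Memory {
  id: string;
  type: MemoryType;
  lastReinforcedAt: number; // simulated hours
  reinforcements: number;
  pinned: boolean;
}

// Placeholder windows in simulated hours; real values would be tuned.
const BASE_WINDOW_HOURS: Record<MemoryType, number> = {
  rate: 24 * 90, // rates change rarely
  client_pref: 24 * 30,
  deadline: 24 * 7, // deadlines go stale fast
};
// Each reinforcement stretches the window by this fraction of the base.
const EXTENSION_PER_REINFORCEMENT = 0.5;

// A simulated clock lets a demo fast-forward weeks without waiting.
class SimClock {
  private hours = 0;
  now(): number {
    return this.hours;
  }
  advance(hours: number): void {
    this.hours += hours;
  }
}

function relevanceWindow(m: Memory): number {
  return BASE_WINDOW_HOURS[m.type] * (1 + EXTENSION_PER_REINFORCEMENT * m.reinforcements);
}

// Age is measured from the last reinforcement, so a fact set just now has age 0.
function isExpired(m: Memory, now: number): boolean {
  if (m.pinned) return false;
  return now - m.lastReinforcedAt > relevanceWindow(m);
}

// Split memories into survivors and those to forget at the current simulated time.
function sweep(memories: Memory[], clock: SimClock): { kept: Memory[]; forgotten: Memory[] } {
  const kept: Memory[] = [];
  const forgotten: Memory[] = [];
  for (const m of memories) {
    if (isExpired(m, clock.now())) forgotten.push(m);
    else kept.push(m);
  }
  return { kept, forgotten };
}

const clock = new SimClock();
const memories: Memory[] = [
  { id: "deadline-fresh", type: "deadline", lastReinforcedAt: 0, reinforcements: 0, pinned: false },
  { id: "deadline-reinforced", type: "deadline", lastReinforcedAt: 0, reinforcements: 2, pinned: false },
  { id: "deadline-pinned", type: "deadline", lastReinforcedAt: 0, reinforcements: 0, pinned: true },
];
clock.advance(24 * 10); // fast-forward ten simulated days
const { forgotten } = sweep(memories, clock);
console.log(forgotten.map((m) => m.id).join(", ")); // → deadline-fresh
```

Because expiry depends only on elapsed time since the last reinforcement, a freshly stored fact can never be swept in the same session, however aggressive the windows are.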

Accomplishments that we're proud of

The supersession story works end to end: correct your rate and the old value goes stale, the new one gets cited, and the meter holds at the right answer. We are proud that the whole governance layer (supersession, decay, conflict resolution, citations, and an audit trail) is visible and inspectable in the UI rather than hidden behind a vector database.
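The rate-correction flow reduces to two rules: a correction marks the previous value as superseded, and answers draw only on active memories and cite them by id. Here is a self-contained sketch under those assumptions; `correct`, `answerWithCitations`, and the record shape are hypothetical, and a real answer would come from the model rather than from joining stored values.

```typescript
interface Memory {
  id: string;
  key: string;
  value: string;
  status: "active" | "superseded";
  supersededBy?: string;
}

interface CitedAnswer {
  text: string;
  citations: string[]; // ids of the memories the answer relied on
}

// Record a correction: retire the active value for this key, then add the new one.
function correct(memories: Memory[], key: string, value: string, id: string): Memory[] {
  const updated = memories.map((m): Memory =>
    m.key === key && m.status === "active" ? { ...m, status: "superseded", supersededBy: id } : m,
  );
  return [...updated, { id, key, value, status: "active" }];
}

// Answer from active memories only, citing every memory used.
function answerWithCitations(memories: Memory[], key: string): CitedAnswer {
  const used = memories.filter((m) => m.key === key && m.status === "active");
  if (used.length === 0) return { text: "No memory for that yet.", citations: [] };
  return { text: used.map((m) => m.value).join("; "), citations: used.map((m) => m.id) };
}

let store: Memory[] = [{ id: "m1", key: "hourly_rate", value: "$80/hr", status: "active" }];
store = correct(store, "hourly_rate", "$95/hr", "m2");

const reply = answerWithCitations(store, "hourly_rate");
console.log(`${reply.text} (cited: ${reply.citations.join(", ")})`); // → $95/hr (cited: m2)
```

The superseded record is kept with a `supersededBy` pointer, which is enough to render a "replaced by" link in an audit view.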

What we learned

We learned that the value in agent memory is mostly in governance, not retrieval. Modeling confidence, reinforcement, and a relevance window per memory gave us conflict resolution and forgetting almost for free. We also learned how much trust a citation chip buys: showing why a memory was used changes how people feel about an agent's answer.
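As a rough illustration of why per-memory confidence and recency make conflict resolution cheap, the sketch below picks a winner among conflicting memories for the same fact by multiplying confidence by an exponential recency weight. The weighting scheme, the `HALF_LIFE_HOURS` constant, and `resolveConflict` are assumptions for illustration; the writeup does not say how Mnemo Desk combines the two signals.

```typescript
interface Candidate {
  id: string;
  value: string;
  confidence: number; // 0..1
  observedAt: number; // simulated hours
}

// Hypothetical tuning: the recency weight halves every 14 simulated days.
const HALF_LIFE_HOURS = 24 * 14;

function recencyWeight(observedAt: number, now: number): number {
  const age = Math.max(0, now - observedAt);
  return Math.pow(0.5, age / HALF_LIFE_HOURS);
}

// Highest confidence x recency wins; exact ties go to the newer observation.
function resolveConflict(candidates: Candidate[], now: number): Candidate | undefined {
  let best: Candidate | undefined;
  let bestScore = -Infinity;
  for (const c of candidates) {
    const score = c.confidence * recencyWeight(c.observedAt, now);
    const newerTie = score === bestScore && best !== undefined && c.observedAt > best.observedAt;
    if (score > bestScore || newerTie) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}

const now = 24 * 30;
const winner = resolveConflict(
  [
    { id: "old", value: "Acme deadline: Friday", confidence: 0.9, observedAt: 0 },
    { id: "new", value: "Acme deadline: Monday", confidence: 0.7, observedAt: 24 * 28 },
  ],
  now,
);
console.log(winner?.id); // → new
```

A single score means a confident, recent fact beats a stale one, while a low-confidence recent guess still loses to a well-established fact from yesterday.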

What's next for Mnemo Desk

  • Swap the in-memory store for a managed Alibaba Cloud database with embeddings for fuzzy retrieval
  • Expose the store, retrieve, supersede, and forget tools as a published MCP server
  • Run the LongMemEval and LoCoMo suites to publish a measured gain over the baseline
  • Add multi-user workspaces so a small team shares one governed memory
  • Harden the poison-and-drift guardrail with an approval queue for suspicious writes

Built With

  • alibaba-cloud-model-studio
  • docker
  • next.js-14
  • qwen-3.7
  • radix-ui
  • react-18
  • shadcn-ui
  • tailwindcss
  • typescript