MEMORIA CLEW

Remember everything. Find anything.
“Let’s separate signal from hype with a cool head and a sharp knife.”

Inspiration

I read constantly to learn and build: repositories, documentation, newsletters, and technical writing. But recalling that knowledge when it matters is harder than collecting it.

I would read something genuinely useful, forget it weeks later, and then rediscover the same idea only after I had already struggled through a solution. Notes apps helped me store information, but they never helped me remember it at the moment it mattered.

Memoria Clew was inspired by that gap: the space between learning something valuable and being able to recall it when your context changes.


What it does

Memoria Clew is a personal knowledge system that captures everything you read — repos, README files, URLs, newsletters, and headlines — and then proactively recalls relevant knowledge based on your current work context.

Instead of relying on manual search or asking an AI vague questions, Memoria Clew:

  • Captures knowledge once
  • Stores it with structured summaries and tags
  • Monitors changes in your active context
  • Surfaces relevant past items automatically, with explanations

The goal is not to generate new information, but to resurface the right information you already encountered.


Features

✅ Capture + Archive

  • Paste URL or text snippet
  • LLM summary + tag extraction (Claude or Gemini, with automatic fallback)
  • Archive persisted via Firestore (user-scoped collections)
  • Inspectable structured records (title, summary, tags, source, timestamps)
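As a rough sketch, a captured record might look like the following (field names are illustrative assumptions based on the description above, not the project's actual Firestore schema):

```typescript
// Illustrative shape of a structured archive record.
interface ArchiveItem {
  id: string;
  title: string;
  summary: string;   // LLM-generated summary
  tags: string[];    // LLM-extracted tags
  source: string;    // original URL or "pasted-text"
  createdAt: string; // ISO timestamp
}

// A record as it might look after capture:
const example: ArchiveItem = {
  id: "abc123",
  title: "React Server Components",
  summary: "Overview of streaming server rendering in React.",
  tags: ["react", "ssr", "performance"],
  source: "https://example.com/rsc",
  createdAt: new Date().toISOString(),
};
```

Keeping records this small and flat is what makes them inspectable: every field is human-readable, and nothing in the archive depends on opaque model state.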

✅ Context Seeding (GitHub)

  • GitHub context sync endpoint
  • Pulls lightweight signals from public repo metadata and READMEs
  • Treated as weak signal (biasing recall, not pretending to "understand your whole codebase")
  • GitHub repos automatically indexed in archive for discovery

✅ Recall Engine (Deterministic + Explainable)

  • Transparent scoring (tag overlap, query match, tool/tech match, recency boost)
  • Confidence thresholding to reduce noise
  • Explanations like: "Matches tags: REACT, TYPESCRIPT"
  • Fallback behavior to avoid empty UX
  • Case-insensitive matching with multi-factor scoring
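The scoring approach above can be sketched as a pure function. The weights, field names, and threshold here are illustrative assumptions, not the project's actual values; the point is that every score component attaches a human-readable reason:

```typescript
// Minimal sketch of transparent, multi-factor recall scoring.
interface Item { title: string; tags: string[]; capturedAt: number }

function scoreItem(
  item: Item,
  contextTags: string[],
  query: string,
  now = Date.now(),
): { score: number; reasons: string[] } {
  const reasons: string[] = [];
  let score = 0;

  // Tag overlap (case-insensitive)
  const ctx = new Set(contextTags.map((t) => t.toLowerCase()));
  const overlap = item.tags.filter((t) => ctx.has(t.toLowerCase()));
  if (overlap.length > 0) {
    score += overlap.length * 2;
    reasons.push(`Matches tags: ${overlap.map((t) => t.toUpperCase()).join(", ")}`);
  }

  // Query match against the title
  if (query && item.title.toLowerCase().includes(query.toLowerCase())) {
    score += 3;
    reasons.push("Title matches query");
  }

  // Recency boost for items captured in the last 7 days
  const ageDays = (now - item.capturedAt) / 86_400_000;
  if (ageDays < 7) {
    score += 1;
    reasons.push("Captured recently");
  }

  return { score, reasons };
}

// Confidence threshold: suppress weak matches instead of showing noise.
const THRESHOLD = 2;
const result = scoreItem(
  { title: "TypeScript narrowing tricks", tags: ["typescript"], capturedAt: Date.now() },
  ["typescript", "react"],
  "narrowing",
);
```

Because the scorer is deterministic, the same archive and context always produce the same ranking, which is what makes the explanations trustworthy.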

✅ Pattern Analysis (Agent-Powered Intelligence)

  • Analyzes your complete research archive via LLM (Claude/Gemini)
  • Detects learning themes (what technologies/patterns you focus on)
  • Identifies knowledge gaps (what you haven't explored yet)
  • Generates recommendations (what to learn next based on your trajectory)
  • Exposed via Dashboard UI ("ANALYZE_PATTERNS" button) + MCP tool
  • All reasoning is logged and inspectable (no black-box decisions)
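A rough sketch of the report shape, plus a non-LLM stand-in for theme detection (counting tag frequency across the archive); field names and the heuristic are assumptions for illustration, since the real analysis goes through Claude/Gemini:

```typescript
// Illustrative shape of a pattern-analysis result.
interface PatternReport {
  themes: string[];          // recurring technologies/patterns
  gaps: string[];            // areas not yet explored
  recommendations: string[]; // suggested next topics
}

// Heuristic stand-in for theme detection: the most frequent
// tags across all archived items are treated as themes.
function topThemes(allTags: string[][], n = 3): string[] {
  const counts = new Map<string, number>();
  for (const tags of allTags)
    for (const t of tags) counts.set(t, (counts.get(t) ?? 0) + 1);
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([t]) => t);
}

const themes = topThemes([["react", "ts"], ["react"], ["docker"]], 2);
```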

✅ Manual "Proactive" Scan (MVP)

  • One manual scan of a single source (the Hacker News RSS feed)
  • Produces a small set of context-matched items (top 5)
  • No background agents, no always-on crawlers

✅ Trust & Safety Guardrails

  • Rate limiting service + tests
  • Security middleware + tests
  • Explicit "suggestive not authoritative" posture
  • Per-IP rate limiting to prevent abuse
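A fixed-window per-IP limiter like the one described can be sketched in a few lines (the limit and window size here are illustrative, not the service's actual configuration):

```typescript
// Sketch of a fixed-window per-IP rate limiter.
class RateLimiter {
  private hits = new Map<string, { count: number; windowStart: number }>();
  constructor(private limit = 60, private windowMs = 60_000) {}

  // Returns true if the request is allowed, false if the IP
  // has exceeded its quota for the current window.
  allow(ip: string, now = Date.now()): boolean {
    const entry = this.hits.get(ip);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(ip, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}

// Allow at most 2 requests per IP per second.
const limiter = new RateLimiter(2, 1000);
```

Injecting `now` as a parameter is what makes this kind of guardrail unit-testable without sleeping in tests.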

✅ Docker for Local Dev + Deployment Readiness

  • docker-compose.yml included (frontend + MCP server)
  • server/Dockerfile included (containerizable backend)

  • frontend/Dockerfile.dev included (dev container)

LeanMCP Integration & Agent Capabilities

Memoria Clew includes a real MCP server powered by @leanmcp/core.

Tools Exposed

  1. memoria_recall - Surface relevant research based on current project context
    • Input: userId, projectTags, projectDescription, query
    • Output: Matched items with relevance scores and explanations
    • Use case: Agent helping you build asks "what have I researched about this?"
  2. memoria_patterns - Analyze research patterns and get learning recommendations
    • Input: userId
    • Output: Detected themes, gaps, recommendations
    • Use case: Agent asks "what should this developer learn next?"
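The payload shapes for these two tools might look roughly like this (the tool names match the text; the exact field details are assumptions about the payloads, not the server's published schema):

```typescript
// Illustrative request/response shapes for the two MCP tools.
interface RecallInput {
  userId: string;
  projectTags: string[];
  projectDescription?: string;
  query?: string;
}

interface RecallMatch {
  title: string;
  score: number;
  explanation: string; // e.g. "Matches tags: REACT, TYPESCRIPT"
}

interface PatternsInput { userId: string }
interface PatternsOutput {
  themes: string[];
  gaps: string[];
  recommendations: string[];
}

// An agent invoking memoria_recall might send:
const input: RecallInput = {
  userId: "u1",
  projectTags: ["react", "typescript"],
  query: "state management",
};
```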

Why This Approach

  • Agent capabilities are explicit, not hidden - you can see exactly what tools agents can invoke
  • Reasoning is transparent - every recall and every pattern analysis logs its scoring and reasoning
  • Integration is clean - tools use the same service layer as the UI
  • Audit trail is complete - all tool invocations are logged
  • No background autonomy - agents only act on explicit invocation, never silently

How It's Used

  • The backend boots REST endpoints (for the UI) and an MCP server (for tool invocation)
  • Agents can invoke tools to:
    • Recall relevant research: "What have I learned about Docker?"
    • Understand patterns: "What should I learn next?"
  • Example agent workflow: "I see you're building with React. You've researched async patterns heavily. You're ready for performance optimization. Here's what you captured about it..."

This keeps agent behavior auditable and user-controlled: no background autonomy, only structured tool calls with visible reasoning.

How it's built

Frontend

  • Built with React and TypeScript
  • Fully wired to real backend APIs for capture, archive, recall, and GitHub context sync
  • An HTML prototype defines the UI contract and visual system for the dashboard

Backend

  • Node.js service architecture
  • Endpoints for capture, archive, recall, and context ingestion
  • Deterministic logging to make system behavior observable
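One way to keep an endpoint observable is to factor its handler into a pure function, with the HTTP framework as a thin wrapper. This is a sketch under assumed body fields and log format, not the project's actual code:

```typescript
// Sketch of a capture handler factored out of the HTTP layer.
interface CaptureRequest { userId: string; url?: string; text?: string }
interface CaptureResponse { ok: boolean; id?: string; error?: string }

function handleCapture(body: CaptureRequest): CaptureResponse {
  // Validation failures return structured errors, not thrown exceptions.
  if (!body.userId) return { ok: false, error: "userId required" };
  if (!body.url && !body.text) return { ok: false, error: "url or text required" };

  const id = `cap_${Date.now().toString(36)}`;
  // Deterministic, structured log line makes behavior observable.
  console.log(`[capture] user=${body.userId} id=${id}`);
  return { ok: true, id };
}
```

Pure handlers like this can be unit-tested directly, with no server or network involved.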

Recall Engine

  • Same deterministic scorer described under Features: tag overlap, query match, tool/tech match, and a recency boost
  • Confidence thresholding and fallback behavior keep results useful without noise or empty states

AI Integration

  • Uses function-based LeanMCP API calls to interact with LLMs
  • Ensures structured inputs and outputs rather than opaque prompt chains
  • Multiple models were used during development to compare behavior and reliability

Infrastructure

  • docker-compose.yml orchestrates the frontend and MCP server for local development
  • server/Dockerfile and frontend/Dockerfile.dev containerize the backend and the dev frontend respectively

Challenges I ran into

  • Cold start problem
    Solved by seeding context from high-signal sources like GitHub repos and README files instead of guessing user intent.

  • Avoiding over-automation
    I intentionally kept recall suggestive rather than authoritative to avoid false confidence.

  • Testing under time pressure
    Unit tests exist, but rapid iteration caused some failures late in development. I prioritized functional correctness and demo stability over perfect coverage.

  • Resisting hype
    The hardest challenge was not overclaiming. I focused on what actually works rather than what sounds impressive.


Accomplishments that I'm proud of

  • Built a real recall engine, not a mocked demo
  • Frontend is fully wired to working APIs
  • Recall results are explainable and deterministic
  • Failure modes are handled gracefully
  • The system never leaves the user with “no results found”
  • Dockerized and ready for deployment
  • Honest scope control within a hackathon timeline

What I've learned

  • Capturing information is easy; recalling it at the right time is hard
  • Explainability builds trust faster than clever AI behavior
  • Happy-path demos are acceptable if you are transparent about scope
  • AI-assisted development works best with clear constraints and guardrails
  • Memory systems should respect past effort, not replace it

What's next for Memoria Clew

  • Stronger recall signals using embeddings alongside heuristic scoring
  • Additional context sources like issue trackers and local notes
  • Improved search and filtering across the knowledge archive
  • Expanded accessibility coverage and end-to-end testing
  • User-controlled tuning of recall sensitivity and thresholds

Memoria Clew is intentionally small today — focused on recall, not hype — and designed to grow without losing trust.

Built With

  • React + TypeScript (frontend)
  • Node.js (backend services + MCP server)
  • Firestore (persistence)
  • @leanmcp/core (MCP tooling)
  • Claude / Gemini (LLM summarization and pattern analysis)
  • Docker (local dev and deployment)