Eywa | Devpost

Inspiration

I kept hitting the same wall building with AI agents: they forget everything between sessions. Every conversation started from zero, the agent re-learned the same facts, and when it did remember something I had no way to know why it believed it or where it came from. I tried the existing memory tools. They are cloud-first: they send your conversations to a hosted API to store and recall, they need a key even to read, and on my own data recall was weak and came with no provenance. None of them were built for private, local use. So I started building the memory layer I actually wanted: one that lives on my machine, answers with receipts, and that I fully own.

What it does

Eywa is a local-first memory engine for AI agents. Every fact it stores carries a receipt: the source message it came from, the time it was true, and what it replaced. Recall is deterministic and runs with zero LLM calls, so it is fast (median ~15ms, measured), costs nothing per query, and is fully auditable. The whole store is one directory you own. Reads need no API key and work offline. And forgetting is real: forget() erases a fact from every store instead of just hiding it.
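To make the receipt idea concrete, here is a minimal sketch of supersede-not-overwrite storage using Python's built-in sqlite3. The table and column names (`facts`, `source_message`, `valid_from`, `valid_to`, `recorded_at`, `supersedes`) and the helpers `remember`, `current`, and `history` are hypothetical illustrations, not Eywa's actual schema or API. It keeps only a simplified form of bitemporal versioning: a validity interval plus a recorded-at timestamp.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema (not Eywa's): each row is a fact plus its receipt.
SCHEMA = """
CREATE TABLE facts (
    id INTEGER PRIMARY KEY,
    subject TEXT NOT NULL,
    value TEXT NOT NULL,
    source_message TEXT NOT NULL,             -- where the fact came from
    valid_from TEXT NOT NULL,                 -- when it became true
    valid_to TEXT,                            -- NULL while still true
    recorded_at TEXT NOT NULL,                -- when the store learned it
    supersedes INTEGER REFERENCES facts(id)   -- which fact it replaced
)
"""


def remember(db: sqlite3.Connection, subject: str, value: str,
             source_message: str, when: str) -> int:
    """Store a new fact, closing (not deleting) the previous current one."""
    with db:  # commit on success, roll back on error
        row = db.execute(
            "SELECT id FROM facts WHERE subject = ? AND valid_to IS NULL",
            (subject,),
        ).fetchone()
        old_id = row[0] if row else None
        if old_id is not None:
            # Supersede: end the old fact's validity instead of overwriting it.
            db.execute("UPDATE facts SET valid_to = ? WHERE id = ?", (when, old_id))
        cur = db.execute(
            "INSERT INTO facts (subject, value, source_message, valid_from,"
            " recorded_at, supersedes) VALUES (?, ?, ?, ?, ?, ?)",
            (subject, value, source_message, when,
             datetime.now(timezone.utc).isoformat(), old_id),
        )
    return cur.lastrowid


def current(db: sqlite3.Connection, subject: str):
    """Return (value, source_message) for the fact that is true now."""
    return db.execute(
        "SELECT value, source_message FROM facts"
        " WHERE subject = ? AND valid_to IS NULL", (subject,)).fetchone()


def history(db: sqlite3.Connection, subject: str):
    """Return every version of a fact, oldest first, with its receipt."""
    return db.execute(
        "SELECT value, source_message, valid_from, valid_to, supersedes"
        " FROM facts WHERE subject = ? ORDER BY id", (subject,)).fetchall()


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
first = remember(db, "user.city", "Rome", "msg-1: I live in Rome", "2025-01-01")
remember(db, "user.city", "Milan", "msg-7: We moved to Milan", "2025-06-01")
print(current(db, "user.city"))  # ('Milan', 'msg-7: We moved to Milan')
for row in history(db, "user.city"):
    print(row)
# ('Rome', 'msg-1: I live in Rome', '2025-01-01', '2025-06-01', None)
# ('Milan', 'msg-7: We moved to Milan', '2025-06-01', None, 1)
```

Because the old row is closed rather than overwritten, "why do you believe the user lives in Milan, and what did you believe before?" is a plain query over `history`.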

How we built it

The engine is Python. Facts live in SQLite with bitemporal versioning, so a fact is superseded rather than overwritten and its history stays queryable. Vectors live in LanceDB, and a knowledge graph of entities and relations is held in memory with rustworkx. A query fans out three ways at once (dense vectors, BM25 keyword search, and graph traversal); the three result lists are merged with reciprocal rank fusion and then reranked by a cross-encoder. The embedding model is a local ONNX model, so nothing leaves the machine. Contradictions are caught by a small NLI model and resolved by supersession. Eywa ships as a Python SDK, a CLI with a terminal UI, an MCP server so Claude and Codex can use it as memory, and a REST service with a TypeScript client. At HackRome we built voice memory directly into the CLI and TUI: dictate a thought and it becomes a fact with a receipt, transcribed by local Whisper with no key (ElevenLabs is optional for cloud quality). We also wired in the OpenAI Codex CLI as an extraction backend, so GPT-5.5 distills clean facts on the write side while reads stay local and keyless.
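The fusion step in the retrieval pipeline above can be sketched with standard reciprocal rank fusion: each result scores the sum of 1 / (k + rank) over every list it appears in. This is a generic illustration, not Eywa's code; the function name, the fact ids, and k = 60 (the constant used in the original RRF paper) are assumptions, since the value Eywa uses is not stated.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of ids: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical fact ids returned by each retriever, best first.
dense = ["f3", "f1", "f7"]
bm25 = ["f1", "f9", "f3"]
graph = ["f1", "f3"]
print(reciprocal_rank_fusion([dense, bm25, graph]))  # ['f1', 'f3', 'f9', 'f7']
```

RRF uses only ranks, so there is no need to calibrate cosine similarities against BM25 scores or graph hop counts, and it involves no model call. In a pipeline like the one described, the fused list would then go to the cross-encoder for reranking, which this sketch omits.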

Challenges we ran into

Keeping the read path completely free of LLM calls while staying accurate took real work: the quality comes from fusing three retrieval signals and reranking, not from asking a model. True erasure was harder than it sounds, because a fact lives in four places (SQLite, vectors, graph, evidence) and all of them have to be cleaned in one cascade. Provisioning the local models so they work on any machine with no API key took several iterations. The graph also picked up noise, such as dates and numbers extracted as entities, so during the event we audited it and started moving entity extraction onto the LLM. "Tests green" did not mean "it works", so we drove every surface like a real user and fixed the bugs the green suites missed.
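The erase-everywhere cascade can be sketched as one function that touches every store and then verifies that nothing still references the fact. Here plain dicts and a set are hypothetical stand-ins for SQLite, LanceDB, the rustworkx graph, and the evidence store, and `MemoryStores` is an illustrative name rather than Eywa's API. The sketch deliberately ignores the hard part of doing this across separate real stores, where deletes are not atomic together.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStores:
    """Hypothetical in-memory stand-ins for the four places a fact lives."""
    facts: dict[str, str] = field(default_factory=dict)            # SQLite rows
    vectors: dict[str, list[float]] = field(default_factory=dict)  # LanceDB embeddings
    # Graph edges as (subject, relation, object, fact_id); rustworkx in Eywa.
    graph_edges: set[tuple[str, str, str, str]] = field(default_factory=set)
    evidence: dict[str, list[str]] = field(default_factory=dict)   # source messages

    def forget(self, fact_id: str) -> dict[str, bool]:
        """Erase fact_id from every store, then verify no trace remains."""
        removed = {
            "facts": self.facts.pop(fact_id, None) is not None,
            "vectors": self.vectors.pop(fact_id, None) is not None,
            "evidence": self.evidence.pop(fact_id, None) is not None,
        }
        edges = {e for e in self.graph_edges if e[3] == fact_id}
        self.graph_edges -= edges
        removed["graph"] = bool(edges)
        # Verification pass: hiding is not erasure, so check every store.
        leftovers = [
            name for name, present in (
                ("facts", fact_id in self.facts),
                ("vectors", fact_id in self.vectors),
                ("evidence", fact_id in self.evidence),
                ("graph", any(e[3] == fact_id for e in self.graph_edges)),
            ) if present
        ]
        if leftovers:
            raise RuntimeError(f"fact {fact_id} still present in: {leftovers}")
        return removed


stores = MemoryStores()
stores.facts["f1"] = "user lives in Milan"
stores.vectors["f1"] = [0.12, -0.40, 0.33]
stores.evidence["f1"] = ["msg-7: We moved to Milan"]
stores.graph_edges |= {("user", "lives_in", "Milan", "f1"),
                       ("user", "works_at", "Acme", "f2")}
print(stores.forget("f1"))  # {'facts': True, 'vectors': True, 'evidence': True, 'graph': True}
print(stores.graph_edges)   # {('user', 'works_at', 'Acme', 'f2')}
```

Returning a per-store report makes it easy to audit a deletion afterwards; an unrelated fact (`f2`) that shares the `user` entity is left untouched.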

Accomplishments and what we learned

We published the work as a paper (arXiv 2605.30771) with reproducible benchmarks: 90.19% on LoCoMo, 88.2% on LongMemEval-S, and 81.45% on BEAM. Retrieval makes zero LLM calls; the accuracy comes from what Eywa returns, not from a model guessing. The biggest lesson: provenance is the unlock. Once every fact carries a receipt, you can audit it, correct it, and truly delete it, which is exactly what a guess-based memory cannot do. We also learned that for some teams local-first is not a preference but a requirement, which is why Eywa matters most to health, legal, and finance assistants.

What's next

Multimodal memory, an optional hosted version for teams who would rather not run their own store, and moving knowledge-graph entity extraction fully onto the LLM for cleaner entities. The local engine stays open source. We shipped Eywa publicly during the hackathon: pip install eywa-core, npm install @getagentseal/eywa, github.com/getagentseal/eywa, eywa.to.