Inspiration
Local LLM agents are only as good as what they remember, but most memory systems just throw away the oldest context when the window fills. Watching agents forget critical instructions mid-task while retaining irrelevant filler made it clear: recency is a terrible proxy for importance. We wanted to build something smarter.
What it does
Engram is a relevance-ranked memory layer for local LLM agents. When context fills, it scores every chunk by importance and evicts the least relevant, not the oldest. It then surfaces exactly which chunks were dropped, giving agents and developers full visibility into what the model no longer knows. Everything runs fully on-device on Nemotron and GX-10, with zero data leaving the local environment.
How we built it
We built Engram as a middleware layer that intercepts context before each inference pass. Each memory chunk is scored using a semantic importance function that weighs relevance to the current task, recency, and retrieval frequency. When the context budget is exceeded, the lowest-scoring chunks are evicted and logged. The system runs entirely on-device, leveraging Nemotron for inference and the GX-10 for efficient local compute.
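Our actual scoring pipeline is tuned for on-device latency, but the core idea can be sketched in a few lines. Everything below is illustrative: the chunk fields, the weights, the half-life, and the saturating frequency term are assumptions for the sketch, not our production values.

```python
import math
import time
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    embedding: list[float]   # precomputed embedding of the chunk
    created: float           # timestamp when the chunk was stored
    retrievals: int = 0      # how often the chunk has been retrieved

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def importance(chunk, task_embedding, now, half_life=600.0,
               w_rel=0.6, w_rec=0.25, w_freq=0.15):
    # Weighted blend of task relevance, recency decay, and retrieval frequency.
    relevance = cosine(chunk.embedding, task_embedding)
    recency = 0.5 ** ((now - chunk.created) / half_life)  # exponential decay
    frequency = 1.0 - 1.0 / (1.0 + chunk.retrievals)      # saturates in [0, 1)
    return w_rel * relevance + w_rec * recency + w_freq * frequency

def evict(chunks, task_embedding, budget_tokens, count_tokens, now=None):
    """Drop lowest-scoring chunks until the context fits; return (kept, evicted)."""
    now = now or time.time()
    ranked = sorted(chunks, key=lambda c: importance(c, task_embedding, now))
    kept, evicted = list(chunks), []
    total = sum(count_tokens(c.text) for c in kept)
    for c in ranked:
        if total <= budget_tokens:
            break
        kept.remove(c)
        evicted.append(c)  # kept around so callers can inspect what was forgotten
        total -= count_tokens(c.text)
    return kept, evicted
```

Returning the evicted list, rather than discarding it, is what makes the "here's what the model no longer knows" surface possible.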
Challenges we ran into
Defining "importance" is harder than it sounds. A chunk that seems irrelevant early in a conversation can become critical later, so we had to design a scoring function that's adaptive rather than static. Running the scoring pipeline without adding meaningful latency on-device was also a key constraint we had to engineer around.
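One way to make scoring adaptive, shown here purely as a sketch with made-up embeddings and an assumed blend factor, is to track the task as a moving average of per-turn query embeddings and re-rank chunks against it every turn. A chunk that scored low early can climb as the conversation drifts toward it.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

def blend(prev, new, alpha=0.5):
    # Exponential moving average of task embeddings: old goals fade,
    # new ones gain weight gradually instead of a static snapshot.
    return [(1 - alpha) * p + alpha * n for p, n in zip(prev, new)]

# Hypothetical 2-D embeddings; real ones would come from an embedding model.
task = [1.0, 0.0]                                # initial task embedding
turns = [[0.9, 0.1], [0.2, 0.8], [0.0, 1.0]]     # per-turn query embeddings
chunks = {"setup": [1.0, 0.0], "goal": [0.0, 1.0]}

for q in turns:
    task = blend(task, q)                        # task vector drifts each turn

# Re-rank stored chunks against the evolved task vector.
ranking = sorted(chunks, key=lambda k: cosine(chunks[k], task), reverse=True)
```

Under a static scoring function, "goal" would stay ranked last forever; with the drifting task vector it overtakes "setup" once the conversation turns that way.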
Accomplishments that we're proud of
We're proud of building a memory layer that actually tells you what it forgot. Most systems silently drop context; Engram makes eviction transparent and inspectable, which is a meaningful step toward more reliable, debuggable local agents.
What we learned
Efficient memory isn't just a context-window problem; it's an information-prioritization problem. We also learned a lot about the constraints of on-device inference and how much headroom you need to leave for memory management without killing latency.
What's next for Engram
We want to make the scoring function pluggable so developers can define importance for their specific use case. Longer term, we're exploring letting the agent itself flag which memories matter: a step toward truly self-managing agent memory.