Inspiration
We are team "Cache Me If You Can". We noticed that most AI chatbots are stateless: every conversation starts from scratch, with no memory of who you are or what you care about. We wanted to fix that by building a system that gives AI agents genuine long-term memory, one that learns, retains, and reasons over time.
What it does
Cache Me If You Can is an AI chatbot with persistent, semantic long-term memory. Instead of forgetting everything between sessions, it stores memories in a vector database, retrieves relevant context on each query, and uses that context to give personalized, informed responses. It manages the full memory lifecycle (creation, consolidation, and forgetting) so memory stays relevant over time.
How we built it
We built a stateful orchestration pipeline using LangGraph for agent logic and Temporal for durable, fault-tolerant workflow execution. The frontend is Next.js, backed by Python services. Memories are stored as vector embeddings in PostgreSQL with pgvector, and we used the Gemini API for language understanding and generation. Everything runs containerized via Docker.
The architecture flows like this: a user query hits a Memory Retrieval Agent, which embeds the query and searches the vector DB, then passes the relevant memory context to a Decision Agent, which generates a response and triggers a memory update back into PostgreSQL.
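As a rough sketch of that retrieval step, here is a simplified in-memory analogue. The real pipeline uses Gemini embeddings and pgvector's nearest-neighbour search; the `embed` function below is a toy stand-in invented for illustration, not the actual model:

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model (the project uses Gemini embeddings):
    # a toy bag-of-letters vector, just to make the flow runnable.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, memories: list[str], k: int = 2) -> list[str]:
    # What pgvector's similarity search does, in miniature:
    # rank stored memories by similarity to the query embedding, keep top k.
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]

memories = [
    "User's name is Alice",
    "User prefers Python over Java",
    "User is allergic to peanuts",
]
top = retrieve("alice", memories, k=1)
```

In production the `sorted(...)` step is a single SQL query against pgvector (e.g. ordering by a distance operator over an indexed embedding column), so retrieval stays fast as the memory store grows.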
Challenges we ran into
- Keeping the chatbot interface stateless while the backend manages stateful, durable context
- Designing memory lifecycle logic (i.e., when to create, consolidate, or forget memories)
- Ensuring retryable, fault-tolerant workflows with Temporal so no memory writes are lost
- Tuning semantic retrieval so the right memories surface at the right time
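To illustrate the durability concern above: Temporal handles retries, persistence, and replay for us, but the reason it works is that memory writes are safe to retry. A minimal plain-Python sketch of that idea (`fails_twice` and the attempt counting are invented for the example, not our production code):

```python
import time

class MemoryStore:
    """Toy store standing in for PostgreSQL + pgvector."""
    def __init__(self):
        self.rows = {}

    def write(self, memory_id: str, text: str):
        # Idempotent upsert: retrying never duplicates a memory.
        self.rows[memory_id] = text

def durable_write(store, memory_id, text, flaky, max_attempts=5):
    # Crude analogue of a Temporal activity with a retry policy:
    # keep retrying the write until it succeeds or attempts run out.
    for attempt in range(1, max_attempts + 1):
        try:
            flaky(attempt)           # simulated transient failure
            store.write(memory_id, text)
            return attempt
        except ConnectionError:
            time.sleep(0)            # real code would back off
    raise RuntimeError("memory write lost after retries")

def fails_twice(attempt):
    # Simulates a DB outage on the first two attempts.
    if attempt <= 2:
        raise ConnectionError("transient DB outage")

store = MemoryStore()
attempts = durable_write(store, "m1", "User prefers Python", fails_twice)
```

Temporal adds what this sketch cannot: the retry state survives process crashes, so a memory write in flight when a worker dies is resumed rather than lost.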
Accomplishments that we're proud of
- Built a fully working end-to-end memory pipeline from user query to vector storage and retrieval
- Implemented durable workflow execution so memory operations survive failures
- Designed a system architecture that separates concerns cleanly across agents
What we learned
- How to integrate vector databases (pgvector) for semantic search at the application layer
- How durable execution frameworks like Temporal differ from traditional task queues
- The nuances of memory lifecycle management: recency, relevance, and decay
What's next for Cache Me If You Can
- Human-in-the-Loop feedback — let users confirm or reject memory updates, improving embeddings over time
- Smarter memory prioritization — score and rank memories by relevance, recency, and impact with decay
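A prioritization rule like the one above could combine those signals multiplicatively with exponential decay, e.g. score = relevance x impact x 0.5^(age / half_life). A hypothetical sketch (the weights and 30-day half-life are illustrative choices, not what we shipped):

```python
def memory_score(relevance: float, impact: float, age_days: float,
                 half_life_days: float = 30.0) -> float:
    # Exponential decay: a memory loses half its weight every half_life_days.
    decay = 0.5 ** (age_days / half_life_days)
    return relevance * impact * decay

def rank(memories):
    # memories: list of (text, relevance, impact, age_days) tuples,
    # returned best-first.
    return sorted(memories, key=lambda m: memory_score(m[1], m[2], m[3]),
                  reverse=True)

mems = [
    ("likes hiking", 0.9, 0.5, 90.0),    # relevant but stale
    ("new job at ACME", 0.8, 0.9, 2.0),  # fresh and high-impact
]
best = rank(mems)[0][0]
```

The same score could also drive forgetting: memories whose score falls below a threshold become candidates for consolidation or deletion.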
Built With
- docker
- fastapi
- gemini
- langgraph
- nextjs
- pandas
- pgvector
- postgresql
- python
- temporal