Inspiration
We are team "Cache Me If You Can". We noticed that most AI chatbots are stateless: every conversation starts from scratch, with no memory of who you are or what you care about. We wanted to fix that by building a system that gives AI agents genuine long-term memory, one that learns, retains, and reasons over time.
What it does
Cache Me If You Can is an AI chatbot with persistent, semantic long-term memory. Instead of forgetting everything between sessions, it stores memories in a vector database, retrieves relevant context on each query, and uses that context to give personalized, informed responses. It manages the full memory lifecycle (creation, consolidation, and forgetting) so memory stays relevant over time.
How we built it
We built a stateful orchestration pipeline using LangGraph for agent logic and Temporal for durable, fault-tolerant workflow execution. The frontend is Next.js, backed by Python services. Memories are stored as vector embeddings in PostgreSQL with pgvector, and we used the Gemini API for language understanding and generation. Everything runs containerized via Docker.
The architecture flows like this: a user query hits a Memory Retrieval Agent, which embeds the query and searches the vector DB, then passes the relevant memory context to a Decision Agent, which generates a response and triggers a memory update back into PostgreSQL.
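As a rough sketch of that retrieval step, here is a simplified in-memory analogue. The real pipeline uses Gemini embeddings and pgvector's nearest-neighbour search; the `embed` function below is a toy stand-in invented for illustration, not the actual model:

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model (the project uses Gemini embeddings):
    # a toy bag-of-letters vector, just to make the flow runnable.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, memories: list[str], k: int = 2) -> list[str]:
    # What pgvector's similarity search does, in miniature:
    # rank stored memories by similarity to the query embedding, keep top k.
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]

memories = [
    "User's name is Alice",
    "User prefers Python over Java",
    "User is allergic to peanuts",
]
top = retrieve("alice", memories, k=1)
```

In production the `sorted(...)` step is a single SQL query against pgvector (e.g. ordering by a distance operator over an indexed embedding column), so retrieval stays fast as the memory store grows.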
Challenges we ran into
- Keeping the chatbot interface stateless while the backend manages stateful, durable context
- Designing memory lifecycle logic (i.e., when to create, consolidate, or forget memories)
- Ensuring retryable, fault-tolerant workflows with Temporal so no memory writes are lost
- Tuning semantic retrieval so the right memories surface at the right time
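To illustrate the durability concern above: Temporal handles retries, persistence, and replay for us, but the reason it works is that memory writes are safe to retry. A minimal plain-Python sketch of that idea (`fails_twice` and the attempt counting are invented for the example, not our production code):

```python
import time

class MemoryStore:
    """Toy store standing in for PostgreSQL + pgvector."""
    def __init__(self):
        self.rows = {}

    def write(self, memory_id: str, text: str):
        # Idempotent upsert: retrying never duplicates a memory.
        self.rows[memory_id] = text

def durable_write(store, memory_id, text, flaky, max_attempts=5):
    # Crude analogue of a Temporal activity with a retry policy:
    # keep retrying the write until it succeeds or attempts run out.
    for attempt in range(1, max_attempts + 1):
        try:
            flaky(attempt)           # simulated transient failure
            store.write(memory_id, text)
            return attempt
        except ConnectionError:
            time.sleep(0)            # real code would back off
    raise RuntimeError("memory write lost after retries")

def fails_twice(attempt):
    # Simulates a DB outage on the first two attempts.
    if attempt <= 2:
        raise ConnectionError("transient DB outage")

store = MemoryStore()
attempts = durable_write(store, "m1", "User prefers Python", fails_twice)
```

Temporal adds what this sketch cannot: the retry state survives process crashes, so a memory write in flight when a worker dies is resumed rather than lost.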
Accomplishments that we're proud of
- Built a fully working end-to-end memory pipeline from user query to vector storage and retrieval
- Implemented durable workflow execution so memory operations survive failures
- Designed a system architecture that separates concerns cleanly across agents
What we learned
- How to integrate vector databases (pgvector) for semantic search at the application layer
- How durable execution frameworks like Temporal differ from traditional task queues
- The nuances of memory lifecycle management: recency, relevance, and decay
What's next for Cache Me If You Can
- Human-in-the-Loop feedback — let users confirm or reject memory updates, improving embeddings over time
- Smarter memory prioritization — score and rank memories by relevance, recency, and impact with decay
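A prioritization rule like the one above could combine those signals multiplicatively with exponential decay, e.g. score = relevance x impact x 0.5^(age / half_life). A hypothetical sketch (the weights and 30-day half-life are illustrative choices, not what we shipped):

```python
def memory_score(relevance: float, impact: float, age_days: float,
                 half_life_days: float = 30.0) -> float:
    # Exponential decay: a memory loses half its weight every half_life_days.
    decay = 0.5 ** (age_days / half_life_days)
    return relevance * impact * decay

def rank(memories):
    # memories: list of (text, relevance, impact, age_days) tuples,
    # returned best-first.
    return sorted(memories, key=lambda m: memory_score(m[1], m[2], m[3]),
                  reverse=True)

mems = [
    ("likes hiking", 0.9, 0.5, 90.0),    # relevant but stale
    ("new job at ACME", 0.8, 0.9, 2.0),  # fresh and high-impact
]
best = rank(mems)[0][0]
```

The same score could also drive forgetting: memories whose score falls below a threshold become candidates for consolidation or deletion.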
Built With
- docker
- fastapi
- gemini
- langgraph
- nextjs
- pandas
- pgvector
- postgresql
- python
- temporal