Inspiration
I wanted an assistant that actually remembers — one that feels like a real digital butler, not a forgetful chatbot. The goal was to build a local, privacy-first AI that recalls past chats and uses tools intelligently.
What it does
Agentic AI with RAG stores every chat session, summarizes it, embeds it, and recalls relevant context when needed. It can find old notes, tell you what you discussed before, and use tools like checking time or searching memory — all locally.
How I built it
Built in Python using Ollama for local LLMs and embeddings.
- Sessions and summaries are saved as JSON files.
- Embeddings are generated with `nomic-embed-text` and compared via cosine similarity for recall.
- Tool calls (like `time_now`, `search_notes`) are triggered through strict JSON, executed, and the results are fed back into the model.
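The recall step described above can be sketched in plain Python: score every stored session summary against the query embedding with cosine similarity and return the best matches. The `sessions` directory name and the `{"summary": ..., "embedding": ...}` JSON layout are assumptions for illustration, not the project's exact schema.

```python
import json
import math
from pathlib import Path

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recall(query_embedding, memory_dir="sessions", top_k=3):
    """Rank stored session summaries by similarity to the query embedding.

    Assumes each JSON file holds {"summary": str, "embedding": [float, ...]}
    (a hypothetical layout; adapt to however sessions are actually saved).
    """
    scored = []
    for path in Path(memory_dir).glob("*.json"):
        record = json.loads(path.read_text())
        score = cosine_similarity(query_embedding, record["embedding"])
        scored.append((score, record["summary"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [summary for _, summary in scored[:top_k]]
```

In practice the query embedding would come from the same `nomic-embed-text` model via Ollama, so that query and memory vectors live in the same space.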
Challenges I ran into
- Forcing the model to output only valid JSON for tool calls.
- Managing multi-session memory without bloating storage.
- Keeping summaries short but meaningful.
- Ensuring stable retrieval relevance using embeddings.
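One way to handle the strict-JSON tool-call problem above is to treat the model's output as a tool call only if it parses as a single JSON object naming a known tool, and fall back to plain chat otherwise. The registry below uses stub implementations and a hypothetical `{"tool": ..., "args": {...}}` shape for illustration.

```python
import json

# Hypothetical tool registry; names mirror the tools mentioned in this write-up.
TOOLS = {
    "time_now": lambda args: "2024-01-01T12:00:00",  # stub return for illustration
    "search_notes": lambda args: f"results for {args.get('query', '')}",
}

def parse_tool_call(model_output):
    """Return a validated tool call dict, or None if the output is chat text.

    Accepts only a single strict-JSON object of the assumed form
    {"tool": <known name>, "args": {...}}.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or call.get("tool") not in TOOLS:
        return None
    if not isinstance(call.get("args", {}), dict):
        return None
    return call

def run_tool(call):
    """Execute a validated call; the result is fed back into the model."""
    return TOOLS[call["tool"]](call.get("args", {}))
```

Rejecting anything that fails validation, instead of retrying blindly, keeps malformed outputs from crashing the agent loop.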
Accomplishments that I am proud of
- Created a fully local RAG-based agent that works offline.
- Built reliable semantic memory retrieval.
- Designed a streaming chat experience that feels natural.
- Achieved a stable and elegant architecture with persistent sessions.
What I learned
How to integrate retrieval-augmented memory, design safe tool-calling protocols, and make LLMs behave predictably in long conversations — all while maintaining privacy and full local control.
What's next for Agentic_AI_with_RAG
Next steps include:
- Adding voice input/output for a natural assistant feel.
- Expanding tool support (weather, system control, scheduling).
- Building a web dashboard for managing memory visually.
- Improving retrieval quality with advanced vector stores.