Inspiration
I wanted an assistant that actually remembers — one that feels like a real digital butler, not a forgetful chatbot. The goal was to build a local, privacy-first AI that recalls past chats and uses tools intelligently.
What it does
Agentic AI with RAG stores every chat session, summarizes it, embeds it, and recalls relevant context when needed. It can find old notes, tell you what you discussed before, and use tools like checking time or searching memory — all locally.
How I built it
Built in Python using Ollama for local LLMs and embeddings.
- Sessions and summaries are saved as JSON files.
- Embeddings are generated with `nomic-embed-text` and compared via cosine similarity for recall.
- Tool calls (like `time_now`, `search_notes`) are triggered through strict JSON, executed, and the results are fed back into the model.
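The recall step described above can be sketched in plain Python: score every stored session summary against the query embedding with cosine similarity and return the best matches. The `sessions` directory name and the `{"summary": ..., "embedding": ...}` JSON layout are assumptions for illustration, not the project's exact schema.

```python
import json
import math
from pathlib import Path

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recall(query_embedding, memory_dir="sessions", top_k=3):
    """Rank stored session summaries by similarity to the query embedding.

    Assumes each JSON file holds {"summary": str, "embedding": [float, ...]}
    (a hypothetical layout; adapt to however sessions are actually saved).
    """
    scored = []
    for path in Path(memory_dir).glob("*.json"):
        record = json.loads(path.read_text())
        score = cosine_similarity(query_embedding, record["embedding"])
        scored.append((score, record["summary"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [summary for _, summary in scored[:top_k]]
```

In practice the query embedding would come from the same `nomic-embed-text` model via Ollama, so that query and memory vectors live in the same space.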
Challenges I ran into
- Forcing the model to output only valid JSON for tool calls.
- Managing multi-session memory without bloating storage.
- Keeping summaries short but meaningful.
- Ensuring stable retrieval relevance using embeddings.
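One way to handle the strict-JSON tool-call problem above is to treat the model's output as a tool call only if it parses as a single JSON object naming a known tool, and fall back to plain chat otherwise. The registry below uses stub implementations and a hypothetical `{"tool": ..., "args": {...}}` shape for illustration.

```python
import json

# Hypothetical tool registry; names mirror the tools mentioned in this write-up.
TOOLS = {
    "time_now": lambda args: "2024-01-01T12:00:00",  # stub return for illustration
    "search_notes": lambda args: f"results for {args.get('query', '')}",
}

def parse_tool_call(model_output):
    """Return a validated tool call dict, or None if the output is chat text.

    Accepts only a single strict-JSON object of the assumed form
    {"tool": <known name>, "args": {...}}.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or call.get("tool") not in TOOLS:
        return None
    if not isinstance(call.get("args", {}), dict):
        return None
    return call

def run_tool(call):
    """Execute a validated call; the result is fed back into the model."""
    return TOOLS[call["tool"]](call.get("args", {}))
```

Rejecting anything that fails validation, instead of retrying blindly, keeps malformed outputs from crashing the agent loop.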
Accomplishments that I am proud of
- Created a fully local RAG-based agent that works offline.
- Built reliable semantic memory retrieval.
- Designed a streaming chat experience that feels natural.
- Achieved a stable and elegant architecture with persistent sessions.
What I learned
How to integrate retrieval-augmented memory, design safe tool-calling protocols, and make LLMs behave predictably in long conversations — all while maintaining privacy and full local control.
What's next for Agentic_AI_with_RAG
Next steps include:
- Adding voice input/output for a natural assistant feel.
- Expanding tool support (weather, system control, scheduling).
- Building a web dashboard for managing memory visually.
- Improving retrieval quality with advanced vector stores.