Inspiration

The inspiration for DocuMind came from observing how students and professionals struggle with information overload in academic environments. Traditional search methods often return fragmented results, forcing users to piece together information from multiple sources by hand. We envisioned an intelligent assistant that could understand the context of a question, retrieve relevant information from local documents, and combine it into a single, comprehensive answer.

The idea came during late-night study sessions at NITC, where we found ourselves constantly switching between multiple PDFs, research papers, and web searches to find coherent answers to complex questions.

What We Learned

Building DocuMind gave us hands-on experience with several cutting-edge technologies:

Technical Skills

  • Vector Embeddings: Learned to measure semantic similarity with Sentence Transformers embeddings and to index those embeddings in FAISS for efficient similarity search
  • RAG Architecture: Gained experience with Retrieval-Augmented Generation, combining retrieval systems with large language models
  • LLM Integration: Learned to work with Ollama and Llama 3.1, including prompt engineering and temperature tuning
  • Hybrid Search: Implemented a fallback mechanism that combines local document search with the Google Custom Search API
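To make the semantic-similarity idea concrete, here is a minimal pure-Python sketch of exact nearest-neighbour search by cosine similarity. It is only a stand-in for what a FAISS flat index does over normalized vectors, not DocuMind's code; the three-dimensional toy vectors are made up, whereas real Sentence Transformers embeddings have hundreds of dimensions.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query: Sequence[float], corpus: List[Sequence[float]], k: int = 2) -> List[Tuple[int, float]]:
    """Return (index, cosine similarity) pairs for the k most similar corpus vectors.

    Exact brute-force search, similar in spirit to a FAISS flat inner-product
    index built over normalized vectors.
    """
    q = normalize(query)
    scores = []
    for i, doc in enumerate(corpus):
        d = normalize(doc)
        scores.append((i, sum(a * b for a, b in zip(q, d))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical toy "embeddings" for three document chunks.
chunks = [
    [0.9, 0.1, 0.0],  # chunk about vector search
    [0.0, 1.0, 0.2],  # chunk about GPU memory
    [0.8, 0.3, 0.1],  # chunk about embeddings
]
query = [1.0, 0.2, 0.0]
print([i for i, _ in top_k(query, chunks, k=2)])  # [0, 2]
```

A brute-force scan is fine for a handful of documents; FAISS earns its place when the corpus grows and approximate indexes become worthwhile.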

Development Skills

  • Problem decomposition: breaking complex AI workflows into manageable components
  • Performance optimization through caching strategies for vector stores
  • User experience design for complex backend systems

How We Built It

Architecture

DocuMind follows a RAG pipeline architecture:

User Query -> Document Retrieval -> Context Enhancement -> LLM Processing -> Response
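The pipeline above can be sketched as three composable stages. This is a hedged, dependency-free illustration: `retrieve`, `enhance_context`, and `answer` are hypothetical names, word overlap stands in for embedding search, and a fake function stands in for Llama 3.1.

```python
from typing import Callable, List


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Document Retrieval: rank documents by word overlap with the query.

    Hypothetical stand-in for embedding search, keeping the sketch dependency-free.
    """
    q_words = set(query.lower().split())
    ranked = sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]


def enhance_context(query: str, passages: List[str]) -> str:
    """Context Enhancement: pack the retrieved passages and the question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


def answer(query: str, documents: List[str], llm: Callable[[str], str]) -> str:
    """User Query -> Retrieval -> Context Enhancement -> LLM Processing -> Response."""
    passages = retrieve(query, documents)
    prompt = enhance_context(query, passages)
    return llm(prompt)


def echo_llm(prompt: str) -> str:
    """Fake LLM (stand-in for Llama 3.1) that reports how many lines its prompt had."""
    return f"(prompt had {len(prompt.splitlines())} lines)"


docs = ["FAISS stores vectors", "Streamlit builds the UI", "Ollama serves Llama models"]
print(answer("how are vectors stored", docs, echo_llm))  # (prompt had 6 lines)
```

Passing the model in as a callable keeps each stage testable on its own and makes it easy to swap the real LLM client in later.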

Tech Stack

  • Frontend: Streamlit for user interaction
  • Embeddings: Sentence Transformers for semantic understanding
  • Vector Database: FAISS for similarity search
  • Language Model: Ollama with Llama 3.1
  • Document Processing: LangChain for text handling
  • Fallback Search: Google Custom Search API
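The stack pairs Ollama with Llama 3.1, and temperature tuning is mentioned above. As a hedged illustration, this sketch builds a request for Ollama's local REST endpoint (`/api/generate` on the default port 11434) using only the standard library. We do not know whether DocuMind called this endpoint directly or went through a wrapper such as LangChain's; the `llama3.1` model tag and the 0.2 temperature are assumptions.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(prompt: str, model: str = "llama3.1", temperature: float = 0.2) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON object instead of a token stream
        "options": {"temperature": temperature},  # lower values give more deterministic output
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (needs a running `ollama serve`)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs), timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Building the request needs no server; calling generate() does.
req = build_generate_request("Summarize RAG in one sentence.")
print(req.get_method(), req.full_url)  # POST http://localhost:11434/api/generate
```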

Key Features

  • Intelligent document retrieval with semantic search
  • Web search fallback when local knowledge is insufficient
  • Hallucination detection using cosine similarity scoring
  • Persistent vector store caching for performance
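One plausible reading of "hallucination detection using cosine similarity scoring" is sketched below: embed the generated answer, compare it with each retrieved source chunk, and flag the answer when even the best match falls under a threshold. The max-over-chunks rule, the 0.5 threshold, and the toy vectors are all assumptions for illustration; in practice the vectors would come from the same Sentence Transformers model used for retrieval.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def grounding_score(answer_vec: Sequence[float], source_vecs: List[Sequence[float]]) -> float:
    """Highest similarity between the answer and any retrieved chunk (0.0 if none)."""
    return max((cosine(answer_vec, s) for s in source_vecs), default=0.0)


def is_possible_hallucination(answer_vec, source_vecs, threshold: float = 0.5) -> bool:
    """Flag answers whose best-matching source is below the (hypothetical) threshold."""
    return grounding_score(answer_vec, source_vecs) < threshold


sources = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0]]  # toy source-chunk embeddings
grounded = [0.9, 0.2, 0.0]    # answer close to the first source
off_topic = [0.0, 0.1, 1.0]   # answer unrelated to either source
print(is_possible_hallucination(grounded, sources))   # False
print(is_possible_hallucination(off_topic, sources))  # True
```

Embedding similarity only measures topical closeness, so a check like this can catch answers that drift away from the sources but cannot prove that every stated fact is supported.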

Challenges We Faced

Technical Challenges

  1. GPU Memory Management: Running Llama 3.1 on Colab's T4 GPU required careful optimization
  2. Vector Store Persistence: Avoiding regenerating embeddings on every run, which we addressed with FAISS index caching
  3. Response Quality: Developed similarity-based validation to check that responses are grounded in the source documents
  4. Server Management: Managing the Ollama server's lifecycle within the Colab environment
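The persistence idea in item 2 boils down to "embed each chunk once, reuse it afterwards." FAISS itself offers `faiss.write_index` and `faiss.read_index` for saving an index; to stay dependency-free, this hedged sketch instead caches raw embeddings in a JSON file keyed by a SHA-256 hash of each chunk's text. `load_or_embed` and `fake_embed` are hypothetical names, and the fake embedder stands in for a Sentence Transformers model.

```python
import hashlib
import json
import os
import tempfile
from typing import Callable, Dict, List


def load_or_embed(texts: List[str], embed: Callable[[str], List[float]], cache_path: str) -> List[List[float]]:
    """Return embeddings for texts, reusing cached ones and embedding only new chunks."""
    cache: Dict[str, List[float]] = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cache = json.load(f)
    changed = False
    vectors = []
    for text in texts:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()  # content hash as cache key
        if key not in cache:
            cache[key] = embed(text)
            changed = True
        vectors.append(cache[key])
    if changed:  # only rewrite the file when something new was embedded
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(cache, f)
    return vectors


calls: List[str] = []


def fake_embed(text: str) -> List[float]:
    """Hypothetical embedder: records each call and returns a trivial 2-D vector."""
    calls.append(text)
    return [float(len(text)), float(text.count(" "))]


path = os.path.join(tempfile.mkdtemp(), "embeddings.json")
load_or_embed(["chunk one", "chunk two"], fake_embed, path)    # embeds both chunks
load_or_embed(["chunk one", "chunk three"], fake_embed, path)  # embeds only "chunk three"
print(calls)  # ['chunk one', 'chunk two', 'chunk three']
```

Keying on content rather than file name means an edited document is re-embedded automatically, while unchanged chunks are served from disk.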

Solutions

  • Model quantization to fit Llama 3.1 within the T4 GPU's memory limits
  • Smart caching strategies for faster response times
  • Hierarchical retrieval system with graceful fallback
  • Comprehensive dependency management and testing
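The "hierarchical retrieval system with graceful fallback" can be sketched as follows: search local documents first, go to the web only when the best local score is weak, and degrade instead of crashing if the web call fails. This is a hedged illustration, not DocuMind's code; the 0.6 threshold, the source labels, and the stub search functions are assumptions, and no real Google Custom Search call is made.

```python
from typing import Callable, List, Tuple

Passages = List[str]


def hierarchical_retrieve(
    query: str,
    local_search: Callable[[str], Tuple[Passages, float]],
    web_search: Callable[[str], Passages],
    min_score: float = 0.6,
) -> Tuple[str, Passages]:
    """Try local documents first and fall back to web search when local evidence is weak.

    Returns (source_label, passages) so the UI can show where an answer came from.
    """
    passages, best_score = local_search(query)
    if passages and best_score >= min_score:
        return "local", passages
    try:
        web_passages = web_search(query)
    except Exception:
        web_passages = []  # network or quota failure: degrade gracefully
    if web_passages:
        return "web", web_passages
    return "local-low-confidence", passages  # best effort: weak (possibly empty) local hits


# Hypothetical stubs standing in for FAISS search and the Google Custom Search API.
def strong_local(q: str) -> Tuple[Passages, float]:
    return ["local note on FAISS"], 0.82


def weak_local(q: str) -> Tuple[Passages, float]:
    return ["loosely related note"], 0.31


def fake_web(q: str) -> Passages:
    return ["web snippet about " + q]


def broken_web(q: str) -> Passages:
    raise OSError("network down")


print(hierarchical_retrieve("faiss", strong_local, fake_web))  # ('local', ['local note on FAISS'])
print(hierarchical_retrieve("faiss", weak_local, fake_web))    # ('web', ['web snippet about faiss'])
```

Returning a source label alongside the passages supports the goal of being transparent about where an answer came from.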

Impact

DocuMind demonstrates how modern AI can enhance information retrieval while remaining transparent about its sources and their reliability. It helps students, researchers, and professionals make informed decisions faster and more accurately.

The project combines the reliability of local document search with the breadth of web knowledge, creating a practical tool for everyday knowledge work.
