Inspiration
The inspiration for DocuMind came from observing how students and professionals struggle with information overload in academic environments. Traditional search methods often return fragmented results, forcing users to piece together information from multiple sources manually. We envisioned an intelligent assistant that could understand context, retrieve relevant information from local documents, and provide comprehensive answers.
The idea came during late-night study sessions at NITC, where we found ourselves constantly switching between multiple PDFs, research papers, and web searches to find coherent answers to complex questions.
What We Learned
Building DocuMind gave us hands-on experience with several technologies and practices:
Technical Skills
- Vector Embeddings: Learned to measure semantic similarity with Sentence Transformers and to store the resulting vectors efficiently in FAISS
- RAG Architecture: Gained experience with Retrieval-Augmented Generation, combining retrieval systems with large language models
- LLM Integration: Learned to work with Ollama and Llama 3.1, including prompt engineering and temperature tuning
- Hybrid Search: Implemented fallback mechanisms combining local document search with Google Custom Search API
Development Skills
- Problem decomposition: breaking complex AI workflows into manageable components
- Performance optimization through caching strategies for vector stores
- User experience design for complex backend systems
How We Built It
Architecture
DocuMind follows a RAG pipeline architecture:
User Query -> Document Retrieval -> Context Enhancement -> LLM Processing -> Response
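To make the stages concrete, here is a minimal sketch of how such a pipeline can be wired together. It is not DocuMind's actual code: the function names `retrieve`, `build_prompt`, and `answer` are hypothetical, the word-overlap retriever is a stand-in for embedding search, and the `llm` argument is any callable that turns a prompt into text (in DocuMind's case that role is played by Llama 3.1 served through Ollama).

```python
from typing import Callable, List


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank documents by shared query words."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(doc.lower().split())), doc) for doc in documents
    ]
    # Highest overlap first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query: str, context: List[str]) -> str:
    """Context enhancement: put the retrieved chunks in front of the question."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def answer(query: str, documents: List[str], llm: Callable[[str], str]) -> str:
    """Run the stages in order: retrieval, context enhancement, LLM call."""
    context = retrieve(query, documents)
    prompt = build_prompt(query, context)
    return llm(prompt)
```

Passing the model in as a plain callable keeps the pipeline testable without a running model server; a real retriever would replace the word-overlap scoring with a vector search.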
Tech Stack
- Frontend: Streamlit for user interaction
- Embeddings: Sentence Transformers for semantic understanding
- Vector Database: FAISS for similarity search
- Language Model: Ollama with Llama 3.1
- Document Processing: LangChain for text handling
- Fallback Search: Google Custom Search API
Key Features
- Intelligent document retrieval with semantic search
- Web search fallback when local knowledge is insufficient
- Hallucination detection using cosine similarity scoring
- Persistent vector store caching for performance
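The cosine similarity check behind the hallucination detection feature can be illustrated with a short sketch. It assumes the answer and the source chunks have already been embedded as vectors of equal length; the function names and the `0.6` threshold are illustrative assumptions, not values taken from DocuMind.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_grounded(
    answer_vec: Sequence[float],
    source_vecs: Sequence[Sequence[float]],
    threshold: float = 0.6,  # assumed cut-off; tune on real data
) -> bool:
    """Treat an answer as grounded if it is close to at least one source chunk."""
    best = max(
        (cosine_similarity(answer_vec, vec) for vec in source_vecs),
        default=0.0,
    )
    return best >= threshold
```

An answer whose best score falls below the threshold can be flagged to the user rather than silently returned. A suitable threshold depends on the embedding model, so it is worth calibrating against examples of good and bad answers.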
Challenges We Faced
Technical Challenges
- GPU Memory Management: Running Llama 3.1 on Colab's T4 GPU required careful optimization
- Vector Store Persistence: Implemented FAISS index caching to avoid regenerating embeddings
- Response Quality: Developed similarity-based validation to ensure responses are grounded in source documents
- Server Management: Managing the Ollama server lifecycle properly in the Colab environment
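One way to avoid regenerating embeddings is to key the cached index on a fingerprint of the documents, so the index is rebuilt only when the documents change. The sketch below shows that idea with the standard library alone. The names `fingerprint` and `load_or_build` are hypothetical, and the index is stored as JSON purely for illustration; with a real FAISS index, `faiss.write_index` and `faiss.read_index` would take the place of the JSON calls.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


def fingerprint(documents: List[str]) -> str:
    """Hash the document contents so any change yields a new cache key."""
    digest = hashlib.sha256()
    for doc in documents:
        digest.update(doc.encode("utf-8"))
        digest.update(b"\x00")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return digest.hexdigest()


def load_or_build(
    documents: List[str],
    cache_dir: Path,
    build: Callable[[List[str]], list],
) -> list:
    """Return the cached index for these documents, building it on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{fingerprint(documents)}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    index = build(documents)  # the expensive step, e.g. computing embeddings
    cache_file.write_text(json.dumps(index), encoding="utf-8")
    return index
```

Because the cache key depends on content rather than file names or timestamps, editing a document invalidates the cache automatically, while restarting a Colab session with the same documents reuses it (provided the cache directory is on persistent storage).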
Solutions
- Model quantization for GPU constraints
- Smart caching strategies for faster response times
- Hierarchical retrieval system with graceful fallback
- Comprehensive dependency management and testing
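The hierarchical retrieval with graceful fallback can be sketched as follows. This is an illustration under assumptions rather than the project's implementation: `retrieve_with_fallback`, the `min_score` threshold, and the shape of the two search callables are all hypothetical, and the actual Google Custom Search request is left to whatever function is passed in as `web_search`.

```python
from typing import Callable, List, Tuple

ScoredChunk = Tuple[float, str]  # (similarity score, chunk text)


def retrieve_with_fallback(
    query: str,
    local_search: Callable[[str], List[ScoredChunk]],
    web_search: Callable[[str], List[str]],
    min_score: float = 0.5,  # assumed relevance cut-off for local hits
) -> Tuple[str, List[str]]:
    """Prefer local documents; fall back to the web; never raise to the caller."""
    hits = [text for score, text in local_search(query) if score >= min_score]
    if hits:
        return "local", hits
    try:
        return "web", web_search(query)
    except Exception:
        # Graceful degradation: a failed web call yields an empty result
        # instead of crashing the app.
        return "none", []
```

Returning the source label alongside the results lets the interface tell the user whether an answer came from their own documents or from the web.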
Impact
DocuMind demonstrates how modern AI can enhance information retrieval while remaining transparent about its sources and their reliability. It helps students, researchers, and professionals make informed decisions faster and more accurately.
The project combines the reliability of local document search with the breadth of web knowledge, creating a practical tool for everyday knowledge work.
Built With
- faiss
- python
- streamlit