Inspiration

The idea for the Offline Document QA Assistant stemmed from the need to process confidential documents securely, without uploading them to external servers. Traditional question-answering tools rely on cloud-hosted models, a dependency that poses privacy risks for sensitive data such as legal contracts, medical records, or internal company documents.

What it does

Offline Document QA Assistant lets you ask questions about uploaded files and receive answers and summaries without any internet connectivity. The system runs entirely locally: it uses open-weight generative models (gpt-oss) for responses and summarization, and performs retrieval-augmented generation by converting documents into vector embeddings and using a vector search engine to fetch the chunks most relevant to your query. A simple command-line interface displays each answer along with the source passages it was derived from.
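To illustrate the CLI output described above, here is a minimal sketch of how an answer and its sources might be rendered. The function name and source labels are hypothetical, not the project's actual code:

```python
def format_answer(answer: str, sources: list[str]) -> str:
    """Render an answer plus the source chunks it was derived from,
    roughly in the style a command-line interface would print."""
    lines = ["Answer:", answer, "", "Sources:"]
    lines += [f"  [{i + 1}] {src}" for i, src in enumerate(sources)]
    return "\n".join(lines)

print(format_answer(
    "The contract term is 24 months.",
    ["contract.pdf, chunk 3", "contract.pdf, chunk 7"],
))
```

Numbering the sources makes it easy to cross-check an answer against the exact passages it was grounded in.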

How we built it

  • We used the gpt-oss open-weight language models for local inference. Using HuggingFace Transformers and bitsandbytes, we quantized the models to 4-bit and loaded them into memory on GPU or CPU.
  • Document ingestion uses Python libraries to read PDFs and text files, chunk them into overlapping segments, and embed each chunk using a local sentence transformer.
  • We store the embeddings in a FAISS index; for each question, we retrieve the most relevant chunks based on cosine similarity and then construct a prompt with the retrieved context for the generative model.
  • We orchestrate these components with LangChain to manage the chains and memory.
  • The entire stack is containerized to run offline on any machine with at least 8GB of RAM.
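The 4-bit loading step above can be sketched with the Transformers and bitsandbytes APIs. This is a configuration sketch, not our exact code: the model identifier is a placeholder for whichever gpt-oss checkpoint you use, and 4-bit loading requires a bitsandbytes-compatible environment plus a local model download:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization: weights stored in 4-bit, matmuls computed in bf16
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "openai/gpt-oss-20b"  # placeholder: substitute the checkpoint you run
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on GPU when available, else CPU
)
```

With `device_map="auto"`, Transformers spreads layers across available devices, which is what lets the same container run on both GPU and CPU-heavy machines.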

Challenges and learning

Running large models locally required careful memory optimization and quantization. Building an end-to-end pipeline of ingestion, indexing, retrieval and generation taught us a lot about vector stores, chunking strategies, and prompt engineering. We also learned how to tune the gpt-oss models to produce concise, grounded answers.
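The retrieval side of that pipeline can be shown in miniature. The sketch below substitutes a deterministic hashed bag-of-words embedding and brute-force numpy search for the real sentence transformer and FAISS index, so it runs with no model downloads; the structure (chunk with overlap, embed, rank by cosine similarity, build a grounded prompt) mirrors the steps we described:

```python
import zlib
import numpy as np

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str, dim: int = 512) -> np.ndarray:
    """Toy stand-in for a sentence transformer: hashed bag of words, L2-normalized."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by cosine similarity (dot product of unit vectors),
    as a FAISS index would at larger scale."""
    index = np.stack([embed(c) for c in chunks])
    scores = index @ embed(query)
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

doc = ("FAISS builds vector indexes for fast similarity search. "
       "Sentence transformers embed text chunks into vectors. "
       "The command-line interface shows the answer with its sources.")
chunks = chunk_text(doc, size=70, overlap=20)
context = retrieve("vector similarity search index", chunks, k=2)
prompt = ("Answer using only the context below.\n\n" + "\n---\n".join(context)
          + "\n\nQuestion: How does similarity search work?")
```

The overlap between chunks is what keeps a sentence that straddles a boundary retrievable from at least one window; tuning that overlap was one of the chunking-strategy lessons mentioned above.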

What's next

We plan to extend the interface with a desktop GUI and add support for incremental indexing as documents change. We also want to explore multilingual support and fine-tuning the model on domain-specific corpora. Longer term, we plan to support smaller models that can run on CPU-only machines, making offline QA accessible to more users.
