Inspiration

We were driven by the need to extract actionable insights from long, complex documents—be they legal briefs, research reports, or financial statements. The goal? Transform static files into dynamic, queryable resources.

What it does

📄 Document Analysis: Uses Azure Document Intelligence to extract text, tables, and document structure (such as titles).

🔍 Semantic Search: Converts content into embeddings using NVIDIA NV‑EmbedQA‑E5‑v5 and indexes them in FAISS for fast similarity search.
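The retrieval idea behind this pipeline can be illustrated with an exact search in plain Python. This is a minimal sketch that assumes cosine similarity as the score. The three-dimensional vectors and the `cosine` and `top_k` helpers are hypothetical stand-ins for real NV‑EmbedQA‑E5‑v5 output and for QuantaDoc's own code. FAISS avoids scanning every chunk like this, but it ranks by the same kind of score.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k chunk IDs most similar to the query, best first (exact linear scan)."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings standing in for real model output.
chunks = {
    "revenue_table": [0.9, 0.1, 0.0],
    "risk_factors": [0.1, 0.9, 0.2],
    "board_members": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]
print([chunk_id for chunk_id, _ in top_k(query, chunks)])  # ['revenue_table', 'risk_factors']
```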

How we built it

Azure Document Intelligence (Layout model): Extracts structured data such as text, tables, and structural hierarchy

NVIDIA NV‑EmbedQA‑E5‑v5 embeddings: A high-quality, instruction-tuned model tailored for information retrieval tasks

FAISS with HNSW indexing: Supports efficient approximate k-NN retrieval in large embedding spaces

Exposed via a FastAPI server (endpoints /, /analyze, /query) and packaged in Docker for portability

Challenges we ran into

Long-document support – managing context without performance degradation

Precision vs. Fluency – extractive summaries can miss nuance, while abstractive summaries are costly to compute

Scaling embeddings + search – balancing embedding generation latency and FAISS index updates
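One common way to keep long documents manageable for an embedding model is to split the extracted text into overlapping windows, so a passage cut at one boundary still appears whole in the neighboring chunk. The sketch below is an assumption, not QuantaDoc's documented method: it counts whitespace-separated words rather than model tokens, and the `chunk_words` name and its default sizes are hypothetical.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the document
    return chunks


doc = " ".join(f"w{i}" for i in range(10))
print(chunk_words(doc, size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Larger overlaps preserve more context across boundaries but produce more chunks, which increases both embedding time and index size.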

Accomplishments that we're proud of

Full PDF + table extraction powered by Azure's layout model

Seamless integration: end-to-end pipeline from ingestion to semantic search

Highly accurate retrieval: embeddings + FAISS deliver relevant snippets

What we learned

A hybrid embedding strategy works best: extract structured data (like tables), then embed the textual context

The Azure Layout model excels at capturing document structure and tables (learn.microsoft.com)

FAISS HNSW indexing provides the speed and scalability needed for interactive applications
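The hybrid strategy can be sketched by flattening each extracted table into one line per row, pairing every cell with its column header, so the embedding sees "Revenue: 4.2" rather than a bare number. This is a hypothetical illustration: the `(row, column, text)` tuples are a simplified stand-in for the cell list the Layout model returns (which reports a row index, column index, and content per cell), `table_to_text` is an invented helper, and the sample values are placeholders.

```python
def table_to_text(cells: list[tuple[int, int, str]], caption: str = "") -> list[str]:
    """Flatten a table into embeddable lines, treating row 0 as the header row.

    Each cell is (row_index, column_index, text). Output is one line per data row,
    formatted as "header: value; header: value".
    """
    grid: dict[int, dict[int, str]] = {}
    for row, col, text in cells:
        grid.setdefault(row, {})[col] = text.strip()
    headers = grid.get(0, {})
    prefix = f"{caption} | " if caption else ""
    lines = []
    for row in sorted(r for r in grid if r != 0):
        pairs = [
            f"{headers.get(col, f'column {col}')}: {value}"  # fall back to a generic label
            for col, value in sorted(grid[row].items())
        ]
        lines.append(prefix + "; ".join(pairs))
    return lines


cells = [
    (0, 0, "Quarter"), (0, 1, "Revenue"),
    (1, 0, "Q1"), (1, 1, "4.2"),
    (2, 0, "Q2"), (2, 1, "4.8"),
]
for line in table_to_text(cells, caption="Quarterly revenue"):
    print(line)
# Quarterly revenue | Quarter: Q1; Revenue: 4.2
# Quarterly revenue | Quarter: Q2; Revenue: 4.8
```

Repeating the header in every line costs a few tokens per row, but it lets each row stand on its own as a retrievable chunk.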

What's next for QuantaDoc

Multi-document / cross-document search across entire libraries

Custom summaries (e.g., legal briefs, executive summaries)

Integrations: Slack, Teams, Drive workflows for in-context retrieval

Optional GPU-backed FAISS for ultra-fast, large-scale indexing
