Inspiration
We were driven by the need to extract actionable insights from long, complex documents—be they legal briefs, research reports, or financial statements. The goal? Transform static files into dynamic, queryable resources.
What it does
📄 Document Analysis: Uses Azure Document Intelligence to extract text, tables, structure (titles,
🔍 Semantic Search: Converts content into embeddings using NVIDIA NV‑EmbedQA‑E5‑v5, then indexes them in FAISS for lightning-fast similarity search
How we built it
Azure Document Intelligence (Layout model): Extracts structured data—text, tables, hierarchies
NVIDIA NV‑EmbedQA‑E5‑v5 embeddings: High-quality, instruction-tuned model tailored for information retrieval tasks
FAISS with HNSW indexing: Supports efficient k-NN retrieval in large embedding spaces
Exposed via a FastAPI server (endpoints /, /analyze, /query), wrapped in Docker for portability
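The flow above (embed, index, query) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the project's code: `toy_embed` is a hypothetical stand-in for the NV‑EmbedQA‑E5‑v5 model (it just counts words), and `ToyIndex` is a hypothetical exact brute-force search standing in for the FAISS HNSW index. We assume that /analyze roughly corresponds to `add` and /query to `search`; the source does not spell that out.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for the embedding model: a sparse bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyIndex:
    """Stand-in for the vector index: exact search over every stored chunk."""

    def __init__(self) -> None:
        self.vectors: list[Counter] = []
        self.chunks: list[str] = []

    def add(self, chunk: str) -> None:
        # Assumed to mirror what happens after a document is analyzed.
        self.vectors.append(toy_embed(chunk))
        self.chunks.append(chunk)

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        # Score every chunk, then keep the k most similar ones.
        q = toy_embed(query)
        scored = [(cosine(q, vec), chunk)
                  for vec, chunk in zip(self.vectors, self.chunks)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = ToyIndex()
for chunk in [
    "Revenue grew 12 percent year over year",
    "The defendant filed a motion to dismiss",
    "Operating expenses were flat",
]:
    index.add(chunk)

results = index.search("motion to dismiss", k=2)
for score, chunk in results:
    print(f"{score:.3f}  {chunk}")
```

An HNSW index trades this exhaustive scan for an approximate graph search, which is what keeps latency low once the number of chunks grows large.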
Challenges we ran into
Long-document support – managing context without performance degradation
Precision vs. Fluency – extractive summaries can miss nuance; abstractive ones are costly to compute
Scaling embeddings + search – balancing embedding generation latency and FAISS index updates
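One common way to handle the long-document challenge is to split the text into overlapping windows before embedding, so each chunk fits the model's input and sentences near a boundary appear in two chunks. The source does not say which approach QuantaDoc uses; the sketch below is an assumption, and `chunk_words`, `size`, and `overlap` are hypothetical names chosen for illustration.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word windows of `size`, each sharing `overlap` words
    with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
sample_chunks = chunk_words(sample, size=4, overlap=1)
for chunk in sample_chunks:
    print(chunk)
```

Smaller windows mean more embedding calls and more index entries, so `size` and `overlap` are the knobs that trade retrieval granularity against embedding latency.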
Accomplishments that we're proud of
Full PDF + table extraction powered by Azure's layout model
Seamless integration: end-to-end pipeline from ingestion to semantic search
Highly accurate retrieval: embeddings + FAISS delivering relevant snippets
What we learned
A hybrid embedding strategy works best: extract structured data (like tables) first, then embed the textual context
The Azure Layout model excels at capturing document structure and tables (learn.microsoft.com)
FAISS HNSW indexing provides the speed and scalability needed for interactive applications
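One possible reading of the hybrid strategy noted above is to turn each extracted table row into a short text passage that carries its column names and caption, and then embed that passage like any other chunk. The sketch below assumes this reading; `table_to_passages` and its parameters are hypothetical, and the table is plain Python data rather than the output format of the Azure Layout model.

```python
def table_to_passages(caption: str, header: list[str],
                      rows: list[list[str]]) -> list[str]:
    """Serialize each table row as 'caption | column: value; ...' text."""
    passages = []
    for row in rows:
        cells = "; ".join(f"{name}: {value}"
                          for name, value in zip(header, row))
        passages.append(f"{caption} | {cells}")
    return passages


passages = table_to_passages(
    "Q1 results",
    ["Metric", "Value"],
    [["Revenue", "10"], ["Costs", "7"]],
)
for passage in passages:
    print(passage)
```

Keeping the caption and column names in every passage lets a query such as "Q1 revenue" match a single row without needing the rest of the table.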
What's next for QuantaDoc
Multi-document / cross-document search across entire libraries
Custom summaries (e.g., legal briefs, executive summaries)
Integrations: Slack, Teams, Drive workflows for in-context retrieval
Optional GPU-backed FAISS for ultra-fast, large-scale indexing