Inspiration

Medical benefit drug policy review is operationally painful. Analysts often need to answer practical questions like:

  • Is the drug covered?
  • Does it require prior authorization?
  • What step therapy applies?
  • How do payer rules differ?
  • What changed from the previous version?

The source documents are usually long, inconsistent PDFs filled with policy history, references, and administrative text. Generic PDF chat tools can summarize them, but they do not consistently return analyst-ready policy intelligence.

We built DOC Parsers to turn those PDFs into a structured workflow for Q&A, comparison, and change tracking.

What it does

DOC Parsers supports five main user flows:

1. Ask

Users ask a payer-specific question for a drug and receive:

  • a readable summary
  • a detailed explanation
  • evidence snippets
  • links back to the exact PDF pages
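
The Ask payload can be sketched as a small dataclass. The field names here mirror the bullets above, and the "#page=N" link format is an assumption about how the frontend might deep-link into PDFs, not a description of the actual implementation:

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceSnippet:
    text: str       # quoted policy language
    pdf_path: str   # source document in the local corpus
    page: int       # 1-based page number in the source PDF


@dataclass
class AskResponse:
    summary: str
    explanation: str
    evidence: list[EvidenceSnippet] = field(default_factory=list)

    def page_links(self) -> list[str]:
        # "file.pdf#page=N" opens most PDF viewers at the cited page
        return [f"{e.pdf_path}#page={e.page}" for e in self.evidence]
```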

2. Compare

Users compare multiple payers for the same drug and see normalized differences across:

  • coverage
  • prior authorization
  • step therapy
  • site of care
  • effective date
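
A minimal sketch of the normalization step: pivot each payer's extracted fields into a field-by-field table so differences line up side by side. The dict keys are our assumed internal field names, matching the bullets above:

```python
COMPARE_FIELDS = [
    "coverage",
    "prior_authorization",
    "step_therapy",
    "site_of_care",
    "effective_date",
]


def compare_payers(policies: dict[str, dict]) -> dict[str, dict[str, str]]:
    """Pivot per-payer policy dicts into a field -> {payer: value} table."""
    table: dict[str, dict[str, str]] = {}
    for f in COMPARE_FIELDS:
        # missing fields are surfaced explicitly rather than silently dropped
        table[f] = {payer: p.get(f, "not stated") for payer, p in policies.items()}
    return table
```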

3. Changes

Users compare two policy versions and get:

  • meaningful change summaries
  • field-level differences
  • version-aware evidence
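
Once both versions have been extracted into flat field dicts, field-level differencing reduces to a symmetric comparison; this is a sketch of that idea, not the production diff logic:

```python
def diff_versions(old: dict, new: dict) -> dict[str, tuple]:
    """Return only the fields whose values changed between two policy versions."""
    changed = {}
    for key in set(old) | set(new):  # union catches added and removed fields
        before, after = old.get(key), new.get(key)
        if before != after:
            changed[key] = (before, after)
    return changed
```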

4. Upload

Users can upload a new PDF into the local corpus so it can be indexed and analyzed.
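
One way an ingest step can avoid re-indexing the same document twice is content hashing. The directory layout and function name here are illustrative assumptions, not the actual implementation:

```python
import hashlib
from pathlib import Path


def ingest_pdf(filename: str, data: bytes, corpus_dir: Path = Path("corpus")) -> Path:
    """Store an uploaded PDF under a content-hash name so re-uploads dedupe."""
    digest = hashlib.sha256(data).hexdigest()[:16]
    dest = corpus_dir / f"{digest}_{Path(filename).name}"
    corpus_dir.mkdir(exist_ok=True)
    if not dest.exists():  # identical bytes -> same path -> skip re-indexing
        dest.write_bytes(data)
    return dest
```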

5. History

Users can reopen stored runs and review:

  • prior requests
  • saved responses
  • evidence-backed explanations
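
Saved run history maps naturally onto a single SQLite table; this is a minimal sketch (the table and column names are assumptions) that stores requests and responses as JSON:

```python
import json
import sqlite3


def init_history(conn: sqlite3.Connection) -> None:
    # kind distinguishes ask / compare / changes runs
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "id INTEGER PRIMARY KEY, kind TEXT, request TEXT, response TEXT)"
    )


def save_run(conn: sqlite3.Connection, kind: str, request: dict, response: dict) -> int:
    cur = conn.execute(
        "INSERT INTO runs (kind, request, response) VALUES (?, ?, ?)",
        (kind, json.dumps(request), json.dumps(response)),
    )
    conn.commit()
    return cur.lastrowid


def load_run(conn: sqlite3.Connection, run_id: int) -> dict:
    kind, req, resp = conn.execute(
        "SELECT kind, request, response FROM runs WHERE id = ?", (run_id,)
    ).fetchone()
    return {"kind": kind, "request": json.loads(req), "response": json.loads(resp)}
```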

How we built it

Frontend

  • React
  • Vite

The frontend separates the workflow into dedicated tabs:

  • Home
  • Ask
  • Compare
  • Changes
  • Upload
  • History

It also preserves per-tab state using the browser's localStorage.

Backend

  • FastAPI

The backend handles:

  • document discovery
  • upload and ingest
  • retrieval orchestration
  • OpenAI extraction
  • comparison and change tracking
  • saved run history

Retrieval

  • PageIndex

This is one of the core differentiators.

Instead of treating a PDF as flat text, PageIndex helps us retrieve structured sections like:

  • Criteria for Initial Approval
  • Coverage Rationale
  • Policy
  • Scope of Policy

That improves downstream extraction quality: the model sees the policy logic in context instead of arbitrary text chunks.
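
We are not reproducing PageIndex's actual API here, but the core idea — retrieving by section tree rather than by flat chunks — can be sketched as a title-match walk over a hypothetical Section tree:

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    text: str
    page: int
    children: list["Section"] = field(default_factory=list)


def find_sections(root: Section, query: str) -> list[Section]:
    """Walk the section tree, keeping sections whose titles share terms with the query."""
    terms = set(query.lower().split())
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if terms & set(node.title.lower().split()):
            hits.append(node)
        stack.extend(node.children)
    return hits
```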

Extraction

  • OpenAI API

We use OpenAI to convert retrieved evidence into structured policy outputs that are useful for analysts, not just generic summaries.
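
A hedged sketch of that extraction step: the model is asked for JSON and the reply is validated into a fixed set of policy fields. The model name and prompt below are placeholders, and the API call is shown only in comments; the parsing helper is the part exercised here:

```python
import json

POLICY_FIELDS = (
    "coverage",
    "prior_authorization",
    "step_therapy",
    "site_of_care",
    "effective_date",
)


def parse_extraction(raw: str) -> dict:
    """Validate the model's JSON reply, defaulting any missing field to 'not stated'."""
    data = json.loads(raw)
    return {f: data.get(f, "not stated") for f in POLICY_FIELDS}


# Calling the model (sketch; model choice and prompt wording are placeholders):
#
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model="gpt-4o-mini",
#     response_format={"type": "json_object"},
#     messages=[
#         {"role": "system",
#          "content": "Extract these policy fields as JSON: " + ", ".join(POLICY_FIELDS)},
#         {"role": "user", "content": evidence_text},
#     ],
# )
# fields = parse_extraction(reply.choices[0].message.content)
```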

Storage

  • SQLite for local operational storage and cached results
  • local filesystem for PDFs, cache, and PageIndex artifacts
  • Neo4j for cross-document graph relationships

Architecture Summary

High-level system flow:

  1. user interacts with the React frontend
  2. FastAPI receives the request
  3. PageIndex retrieves the right policy sections from local PDFs
  4. OpenAI extracts structured policy outputs
  5. SQLite and local files store operational data and artifacts
  6. Neo4j stores graph relationships across payers, policies, drugs, versions, requirements, and evidence
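
To keep the graph free of duplicate nodes and edges (one of the normalization challenges noted below), writes can use Cypher MERGE with lightly normalized keys. The labels and relationship types here are illustrative, and the query would be executed with the Neo4j Python driver's session.run(query, params):

```python
def upsert_requirement_query(payer: str, drug: str, requirement: str):
    """Build an idempotent MERGE query so repeated ingests don't duplicate the graph."""
    query = (
        "MERGE (p:Payer {name: $payer}) "
        "MERGE (d:Drug {name: $drug}) "
        "MERGE (r:Requirement {text: $req}) "
        "MERGE (p)-[:REQUIRES {drug: $drug}]->(r) "
        "MERGE (d)-[:SUBJECT_TO]->(r)"
    )
    # light key normalization keeps "aetna" and " Aetna " from becoming two nodes
    params = {
        "payer": payer.strip().title(),
        "drug": drug.strip().title(),
        "req": requirement.strip(),
    }
    return query, params
```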

Challenges we ran into

  • payer policy PDFs are inconsistent in structure and quality
  • version tracking is hard when documents are very similar or partially duplicated
  • section-aware retrieval matters more than generic chunking, because retrieving the wrong section leads to the wrong answer
  • graph persistence needs normalization to avoid noisy or duplicated relationships
  • keeping the system local-first while still supporting structured extraction and graph analysis required careful layering

Accomplishments that we're proud of

  • built a working end-to-end workflow across Ask, Compare, Changes, Upload, and History
  • integrated PageIndex into a real policy-analysis pipeline
  • connected the extracted outputs into Neo4j for graph-aware analysis
  • added PDF page links and evidence-backed results instead of freeform answers
  • preserved both backend history and frontend local session memory for demo and auditability

What we learned

  • for document intelligence, retrieval quality is often more important than prompt complexity
  • domain-specific normalization is critical; analysts need structured outputs, not just summaries
  • graph persistence becomes much more useful once the evidence and requirement relationships are clean
  • local-first architecture makes a demo more stable and easier to reason about during development

What's next for DOC Parsers

  • improve version-diff accuracy for near-duplicate policy versions
  • strengthen extraction for weaker payer documents
  • add richer graph visualizations in the product
  • expand the upload-to-index workflow for faster out-of-the-box onboarding
  • improve export and analyst reporting flows

Built With

  • React
  • Vite
  • FastAPI
  • PageIndex
  • OpenAI API
  • SQLite
  • Neo4j
