PolicyIQ

Inspiration

Every quarter, pharma market access teams face the same grind: download a dozen PDFs from Cigna, UHC, BCBS, Florida Blue, Priority Health, each formatted differently, each burying prior authorization rules and step therapy requirements in slightly different language, and manually transcribe them into a spreadsheet to compare coverage. A single drug like bevacizumab can have meaningfully different rules at every payer, and those rules change quarterly with zero automated notification.

We talked to analysts who described spending 4+ hours per drug per quarter just on this intake process, before any actual analysis even begins. The data exists. It's just locked in PDFs. We wanted to unlock it.


What It Does

PolicyIQ is an AI-powered medical benefit drug policy tracker that turns payer PDFs into structured, searchable, comparable data in about 20 seconds per document.

  • Ingest any payer PDF: drag-and-drop upload, or paste a direct URL from a payer portal. PolicyIQ parses the document, extracts 15 structured fields (drug name, J-code, coverage status, prior auth, step therapy, ICD-10 codes, clinical criteria, site-of-care, quantity limits, access tier, and more), and stores them in a searchable database.
  • Multi-drug mega-document support: some payers (like Priority Health) publish a single Medical Drug List covering dozens of drugs. PolicyIQ detects this automatically and extracts each drug independently from its relevant section.
  • Cross-payer comparison: select any drug and see a side-by-side table across all payers, including a Rebate Tier Card showing preferred/non-preferred access status, the exact signal that matters for contracting decisions.
  • AI Q&A assistant: ask natural language questions like "Does Cigna require step therapy for bevacizumab?" and get a streamed, cited answer backed by actual policy text, not hallucination.
  • Change detection: when a new version of a policy is uploaded, PolicyIQ diffs it field-by-field and categorizes changes as Clinical (coverage, PA, step therapy, affecting patient access) vs. Administrative (dates, formatting, lower urgency).
  • CSV export: one click to download the full normalized database for use in existing analyst workflows.
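
The field-by-field change detection described in the list above can be sketched roughly as follows. This is a minimal illustration, not PolicyIQ's actual code: the field names and the contents of `CLINICAL_FIELDS` are assumptions standing in for the real 15-field schema.

```python
# Hypothetical subset of fields treated as clinically significant.
# The real schema and its Clinical/Administrative split are not shown in this write-up.
CLINICAL_FIELDS = {
    "coverage_status",
    "prior_auth_required",
    "step_therapy",
    "clinical_criteria",
}


def diff_policy(old: dict, new: dict) -> list[dict]:
    """Compare two extracted policy versions field by field."""
    changes = []
    for field in sorted(old.keys() | new.keys()):
        before, after = old.get(field), new.get(field)
        if before != after:
            changes.append({
                "field": field,
                "before": before,
                "after": after,
                # Anything outside the clinical set is treated as administrative.
                "category": "Clinical" if field in CLINICAL_FIELDS else "Administrative",
            })
    return changes
```

Sorting the field names keeps the diff output stable between runs, which makes it easier to render in a table or export.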

How We Built It

Backend: Python 3.11 + FastAPI, chosen for fast iteration and zero-config static file serving. All data lives in a single SQLite file with FTS5 for BM25 full-text search, with no database infrastructure to manage during a hackathon.

PDF parsing: PyMuPDF (fitz) for layout-aware text extraction. It preserves column order and table structure far better than plain text extractors, which matters when clinical criteria span multi-column layouts.

LLM pipeline (the core): Two Groq models at different points in the pipeline:

$$\text{PDF bytes} \xrightarrow{\text{PyMuPDF}} \text{raw text} \xrightarrow{\text{smart truncate}} \text{12K chars} \xrightarrow{\texttt{llama-3.3-70b}} \text{JSON} \xrightarrow{\text{normalize}} \text{SQLite}$$

  • llama-3.3-70b-versatile for structured extraction, reliably following a strict 15-field JSON schema at temperature=0.1
  • llama-3.1-8b-instant for streaming Q&A and multi-drug detection, 4x faster for tasks where latency matters more than extraction precision

Smart truncation: Long PDFs can't fit in a single LLM context window. Instead of naive head-truncation (which drops clinical criteria buried in the middle), we use keyword-aware windowing. Each processed document is assembled from three parts: the first 3,000 characters (to capture header and drug identity information), the single 4,000-character sliding window with the highest density of clinical keywords like "prior auth," "step therapy," and "ICD codes," and the final 2,000 characters (to catch any trailing criteria). This keeps the densest clinical-criteria passage in the prompt even when it sits deep in a long document, and the assembled text (at most 9,000 characters) stays under the 12K-character budget shown in the pipeline above.
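
A minimal sketch of the head + densest window + tail assembly described above. The keyword list, the 500-character stride, and the `"\n...\n"` separator are assumptions made for illustration; only the 3,000 / 4,000 / 2,000 sizes come from the description.

```python
# Illustrative keyword subset; the real list is presumably longer.
KEYWORDS = ("prior auth", "step therapy", "icd")


def keyword_hits(chunk: str) -> int:
    """Count case-insensitive keyword occurrences in a chunk."""
    lowered = chunk.lower()
    return sum(lowered.count(k) for k in KEYWORDS)


def smart_truncate(text: str, head: int = 3000, window: int = 4000,
                   tail: int = 2000, stride: int = 500) -> str:
    """Keep the head, the keyword-densest middle window, and the tail."""
    if len(text) <= head + window + tail:
        return text  # short documents pass through untouched

    middle_end = len(text) - tail
    last_start = middle_end - window  # last start that stays out of the tail
    starts = list(range(head, last_start + 1, stride))
    if starts[-1] != last_start:
        starts.append(last_start)

    best_start, best_score = head, -1
    for start in starts:
        score = keyword_hits(text[start:start + window])
        if score > best_score:  # strict '>' keeps the earliest best window
            best_start, best_score = start, score

    parts = [text[:head], text[best_start:best_start + window], text[-tail:]]
    return "\n...\n".join(parts)
```

The middle window is searched only between the head and the tail, so the three parts never overlap and the output length is bounded regardless of document size.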

Frontend: Vanilla HTML + CSS + JavaScript with no build step, no bundler, and instant reload during development. Streaming chat is implemented with the browser's native EventSource API consuming Server-Sent Events from FastAPI.


Challenges We Ran Into

JSON reliability from LLMs. Even with a tight system prompt, models occasionally return markdown-wrapped JSON or truncate mid-object. We built a two-attempt recovery loop: if json.loads() fails, we send the malformed output back to the model with a correction prompt at temperature=0.0. This brought parse reliability from ~85% to ~99%.
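
The recovery loop can be sketched like this. `call_model` is a hypothetical callable standing in for the Groq client (it takes a prompt and a temperature and returns text), and the fence-stripping step and the correction prompt wording are assumptions added for the markdown-wrapped case.

```python
import json
import re
from typing import Callable

FENCE = "`" * 3  # markdown code fence, built this way to avoid nesting fences
FENCE_RE = re.compile(rf"^{FENCE}(?:json)?\s*|\s*{FENCE}$", re.IGNORECASE)


def strip_fences(raw: str) -> str:
    """Remove a leading/trailing markdown fence if the model added one."""
    return FENCE_RE.sub("", raw.strip())


def parse_with_retry(raw: str, call_model: Callable[[str, float], str]) -> dict:
    """First attempt: parse as-is. Second attempt: ask the model to repair it."""
    try:
        return json.loads(strip_fences(raw))
    except json.JSONDecodeError:
        pass

    # Hypothetical correction prompt; sent at temperature 0.0 as described above.
    prompt = (
        "The following output is not valid JSON. "
        "Return only the corrected JSON object.\n\n" + raw
    )
    repaired = call_model(prompt, 0.0)
    # A second failure propagates as json.JSONDecodeError for the caller to handle.
    return json.loads(strip_fences(repaired))
```

Taking the model call as a parameter keeps the parsing logic testable without network access.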

Multi-drug mega-documents. Priority Health's 2026 Medical Drug List covers dozens of drugs in one PDF. A naive single-extraction call returns only the first drug it finds. We solved this with a two-pass approach: first use the fast 8B model to detect all drug names, then anchor each drug's extraction to the densest cluster of its mentions in the document, finding the best section center by:

$$c^* = \operatorname{median}\bigl(\text{positions}(d)\bigr)$$

where positions(d) is the set of character offsets at which drug d is mentioned, and extracting a 10,000-character window centered there.
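
A minimal sketch of the median-anchored window, assuming mentions are found with a case-insensitive literal match on the drug name. The function name and the matching strategy are illustrative, not taken from the project.

```python
import re
from statistics import median


def drug_window(text: str, drug: str, width: int = 10_000) -> str:
    """Return a window of `width` characters centered on the median mention of `drug`."""
    positions = [
        m.start() for m in re.finditer(re.escape(drug), text, re.IGNORECASE)
    ]
    if not positions:
        return ""  # drug not mentioned; nothing to extract

    center = int(median(positions))
    start = max(0, center - width // 2)  # clamp at the start of the document
    return text[start:start + width]
```

The median is less sensitive than the mean to a stray mention in a table of contents or index, which is presumably why it works as a cheap proxy for the drug's main section.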

FTS5 content table synchronization. SQLite's FTS5 in external-content (content=) mode requires manual sync, because the virtual table doesn't auto-update when the base table changes. We hit stale search results until we understood that INSERT INTO policies_fts(policies_fts) VALUES('rebuild') is needed on startup to resynchronize the index, and that individual row deletions and insertions must be mirrored explicitly.
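
The manual sync can be sketched with the standard-library `sqlite3` module, assuming the SQLite build includes FTS5. Only the `policies_fts` table name comes from the text; the `policies` columns and the helper function names are hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create a base table plus an external-content FTS5 index over it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE policies (id INTEGER PRIMARY KEY, payer TEXT, body TEXT);
        CREATE VIRTUAL TABLE policies_fts USING fts5(
            payer, body, content='policies', content_rowid='id', tokenize='porter'
        );
    """)
    return conn


def insert_policy(conn: sqlite3.Connection, payer: str, body: str) -> int:
    cur = conn.execute(
        "INSERT INTO policies (payer, body) VALUES (?, ?)", (payer, body))
    # Mirror the insert into the index; FTS5 will not do this on its own.
    conn.execute(
        "INSERT INTO policies_fts (rowid, payer, body) VALUES (?, ?, ?)",
        (cur.lastrowid, payer, body))
    return cur.lastrowid


def delete_policy(conn: sqlite3.Connection, policy_id: int) -> None:
    row = conn.execute(
        "SELECT payer, body FROM policies WHERE id = ?", (policy_id,)).fetchone()
    if row is None:
        return
    # The special 'delete' command needs the old column values to unindex them.
    conn.execute(
        "INSERT INTO policies_fts (policies_fts, rowid, payer, body) "
        "VALUES ('delete', ?, ?, ?)", (policy_id, row[0], row[1]))
    conn.execute("DELETE FROM policies WHERE id = ?", (policy_id,))


def rebuild_index(conn: sqlite3.Connection) -> None:
    """Discard the index and rebuild it from the base table."""
    conn.execute("INSERT INTO policies_fts (policies_fts) VALUES ('rebuild')")


def search(conn: sqlite3.Connection, query: str) -> list[int]:
    rows = conn.execute(
        "SELECT rowid FROM policies_fts WHERE policies_fts MATCH ? "
        "ORDER BY bm25(policies_fts)", (query,))
    return [r[0] for r in rows]
```

Triggers on the base table are an alternative to mirroring writes in application code; the explicit version is shown here because it matches the description above.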

SSE streaming across FastAPI + browser. FastAPI's StreamingResponse sends chunks as they are produced, but a reverse proxy such as NGINX buffers upstream responses by default. The X-Accel-Buffering: no response header was required to get true token-by-token streaming. Without it, the entire response would arrive at once, defeating the point.
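
A small standard-library sketch of the server side of this: framing tokens as Server-Sent Events and the response headers that go with them. The `sse_frames` name, the `Cache-Control` header, and the closing `done` event are assumptions for illustration; only the `X-Accel-Buffering: no` header comes from the text.

```python
from typing import Iterable, Iterator

# Headers to attach to the streaming response.
SSE_HEADERS = {
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no",  # tells NGINX not to buffer this response
}


def sse_frames(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in an SSE frame: 'data: ...' lines ended by a blank line."""
    for token in tokens:
        # A payload containing newlines needs one 'data:' line per line.
        lines = token.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"
    # Hypothetical end-of-stream marker so the client knows when to close.
    yield "event: done\ndata: [DONE]\n\n"
```

With FastAPI, a generator like this could be passed as `StreamingResponse(sse_frames(tokens), media_type="text/event-stream", headers=SSE_HEADERS)`. On the browser side, an EventSource strips exactly one space after `data:`, so tokens that begin with a space survive intact.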


Accomplishments We're Proud Of

  • A real, working PDF-to-structured-data pipeline that handles documents we'd never seen before, not just the sample PDFs we tuned the prompt on
  • A smart truncation algorithm that consistently captures clinical criteria sections that would be lost with naive truncation, without ever exceeding the LLM context budget
  • Multi-drug segmentation that correctly handles biosimilar aliasing (Mvasi and Zirabev both map to bevacizumab) vs. truly distinct molecules
  • A clean, zero-infrastructure stack that any analyst could run locally with one command, with no Docker, no cloud setup, and no database migration scripts

What We Learned

  • Constrained extraction beats open-ended summarization. Telling the model "fill this exact schema" with hard rules (coverage_status must be one of four values, prior_auth_required must be boolean) is far more reliable than asking it to summarize a document. The schema itself is a forcing function.
  • Two models beat one. Using a large model for heavy extraction and a small fast model for everything else (detection, Q&A, streaming) cuts latency on user-facing operations dramatically without sacrificing extraction quality.
  • SQLite FTS5 is underrated. BM25 full-text search with Porter stemming, zero setup, no external dependency. It handled every search workload we threw at it without a single query taking more than a few milliseconds.
  • Healthcare data is structurally messy. Payers don't standardize terminology, field names, or document structure, whether intentionally or not. Robustness to that variation is the core engineering challenge.

What's Next for PolicyIQ

  • Automated policy monitoring: scheduled crawls of payer portals to detect and alert on policy changes without manual re-upload
  • OCR support: scanned PDFs (image-only) are currently unsupported; integrating Tesseract or a cloud OCR API would close this gap
  • Rebate impact scoring: combine access tier data with claims volume estimates to quantify the revenue impact of each payer's preferred/non-preferred designation
  • Formulary expansion: extend beyond injectable/infused drugs to oral specialty and retail formulary PDFs, which have different document structures but the same underlying extraction problem
  • CRM integration: push policy change alerts directly into Salesforce or Veeva CRM so payer relations teams are notified in their existing workflow
