Inspiration
For the entire build, the load-bearing constraint was FDA's revised CDS guidance (2026-01-06): a read-only documentation-quality tool stays outside medical-device regulation only if every finding exposes an independently reviewable basis the clinician can verify. We built the architecture around that constraint: sourceSpan, discreteConfidence, and reasoning on all 10 tools.
Mid-build, ACDIS and AHIMA published the 2026 Draft Query Practice Guidelines for AI-assisted CDI (public comment closes 2026-05-16). Section II requires technology-generated queries to cite "clinical indicators sourced from the health record, with location, free from subjective interpretation", plus an organizational audit trail. The FDA requirement and the ACDIS draft are isomorphic: the architecture we built to stay outside FDA's lane is the architecture a CDI organization needs to satisfy ACDIS auditors. Hospitals lose ~$120K/year/physician to under-documentation. We took the lane the enterprise CDI players couldn't, with a retrospective, read-only, document-framed perspective, and the regulatory structure turned out to be a moat.
Copy-forward is a Joint Commission-flagged sentinel event contributor, not only an audit liability. An 81%-copied progress note with stale findings is a care coordination failure. Detect → alert physician → accurate status → better-informed handoffs → fewer missed transitions.
What it does
ClinicalLens exposes 10 MCP tools for clinical documentation quality analysis:
- run_full_cdi_review: single-call orchestrator that chains completeness, quality, E/M support, specificity gaps, and query opportunities, then AI-synthesises a prioritised narrative. Zero-input mode (useLatestNote: true) when SHARP FHIR context is active.
- generate_documentation_review: AI-synthesised review with streaming progress notifications.
- assess_note_completeness: per-encounter-type CMS-element checks (20 elements, weighted).
- measure_note_quality: composite 0–100 score across 5 dimensions, RAG-banded.
- detect_copy_forward: Jaccard similarity + temporal analysis on sequential notes.
- check_code_documentation_alignment: ICD-10 vs note content + problem-list cross-reference.
- assess_em_level_support: 2025 CMS MDM-criteria validation.
- detect_specificity_gaps: under-specified diagnoses (HCC / risk-adjustment surface).
- analyze_documentation_trends: longitudinal multi-note quality drift.
- check_query_opportunities: ACDIS-compliant CDI query generation, LLM-judged for non-leading rubric compliance.
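The Jaccard component of detect_copy_forward can be illustrated with a minimal sketch. The tokenisation (lower-cased word sets) and the names tokenize and jaccard are assumptions for illustration; the write-up does not say how ClinicalLens tokenises notes or how the score is combined with the temporal analysis.

```typescript
// Hypothetical helper: reduce a note to a set of lower-cased word tokens.
function tokenize(note: string): Set<string> {
  const words: string[] = note.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set(words);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, always in the range [0, 1].
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1; // two empty notes are identical
  let intersection = 0;
  a.forEach((token) => {
    if (b.has(token)) intersection++;
  });
  const union = a.size + b.size - intersection;
  return intersection / union;
}

// Two sequential notes: the second copies the first and adds one sentence.
const previousNote = "Patient stable. Lungs clear. Continue metformin.";
const currentNote =
  "Patient stable. Lungs clear. Continue metformin. Add lisinopril.";
const similarity = jaccard(tokenize(previousNote), tokenize(currentNote));
// 6 shared tokens out of 8 distinct tokens, so similarity is 0.75
```

A set-based score like this ignores word order and repetition, which is one reason a real detector would pair it with the temporal checks the tool description mentions.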
Every finding emits sourceSpan (where), discreteConfidence (how certain), and reasoning (why); no finding is a bare score. Every tool also surfaces curated references[] and suggestedNextSteps[] on both the markdown and structured channels.
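As a rough illustration of the finding contract described above, here is one way the three fields might be typed. Only the field names sourceSpan, discreteConfidence, and reasoning come from the write-up; the character-offset span, the three confidence levels, and the quoteSource helper are assumptions.

```typescript
// Assumed levels: the write-up names the field but not its allowed values.
type DiscreteConfidence = "high" | "medium" | "low";

// Assumed span shape: character offsets into the note text.
interface SourceSpan {
  start: number; // offset where the cited text begins
  end: number; // offset just past the cited text
}

interface Finding {
  sourceSpan: SourceSpan; // where
  discreteConfidence: DiscreteConfidence; // how certain
  reasoning: string; // why
}

// Hypothetical helper: resolve a span back to the note text so a
// reviewer can verify the basis of the finding independently.
function quoteSource(note: string, finding: Finding): string {
  return note.slice(finding.sourceSpan.start, finding.sourceSpan.end);
}

const exampleNote = "Assessment: heart failure. Plan: continue furosemide.";
const cited = "heart failure";
const spanStart = exampleNote.indexOf(cited);

const exampleFinding: Finding = {
  sourceSpan: { start: spanStart, end: spanStart + cited.length },
  discreteConfidence: "high",
  reasoning: "Heart failure is documented without type or acuity.",
};
```

Calling quoteSource(exampleNote, exampleFinding) returns the cited phrase, which is the property that makes a finding reviewable rather than a bare assertion.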
How we built it
The AI layer has one job rules cannot do: detecting whether a CDI query is leading. That judgment needs language understanding.
- Read-only on FHIR via SHARP context propagation. Three HTTP headers, no installer, no EHR write access.
- Citation anchoring on every finding. sourceSpan + discreteConfidence + reasoning on all 10 tools: the clause satisfying FDA §3060(a)(1)(D) and 2026 ACDIS draft Section II simultaneously.
- LLM-as-judge compliance gate. A cross-family judge scores every CDI query draft on the ACDIS 2022 6-criterion rubric and suppresses non-compliant drafts. Fail-open with audit trail. Without it, leading queries ship undetected.
- Regulatory-boundary output validator. Every LLM finding passes a runtime validator; CDS-leaning prose is suppressed into suppressedFindings[] so it's gated, not deleted. The §3060(a)(1) admin-vs-device boundary made mechanical.
- Single-call orchestrator + zero-input mode (v0.6.0/v0.6.1). run_full_cdi_review chains five analyses + AI synthesis in one call. useLatestNote: true reads X-Patient-ID from SHARP and auto-resolves the latest DocumentReference. No patient prompt needed.
- Tiered model posture + prompt caching (v0.5.5+). Synthesis runs on a cheaper, faster model with cache_control: { type: "ephemeral" }; reasoning models are reserved for the leading-language judge gate where accuracy beats latency.
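The fail-open behaviour of the judge gate can be sketched as a pure decision function. This is an illustration under stated assumptions, not the project's code: the type names, the audit-entry fields, and the rule that all six criteria must pass are hypothetical; the write-up states only that the judge scores drafts on a 6-criterion rubric, suppresses non-compliant ones, and fails open with an audit trail.

```typescript
// Outcome of one judge call: either per-criterion results or a failure.
type JudgeOutcome =
  | { kind: "scored"; criteriaPassed: boolean[] } // one flag per rubric criterion
  | { kind: "error"; message: string }; // judge unavailable or timed out

// Hypothetical audit record written for every decision.
interface AuditEntry {
  draftId: string;
  decision: "released" | "suppressed" | "released_unjudged";
  detail: string;
}

interface GateResult {
  release: boolean;
  audit: AuditEntry;
}

function gateQueryDraft(draftId: string, outcome: JudgeOutcome): GateResult {
  if (outcome.kind === "error") {
    // Fail-open: a judge outage does not block the draft, but the
    // unjudged release is recorded so it can be reviewed later.
    return {
      release: true,
      audit: { draftId, decision: "released_unjudged", detail: outcome.message },
    };
  }

  // Collect 1-based indices of the criteria the draft failed.
  const failed: number[] = [];
  outcome.criteriaPassed.forEach((passed, index) => {
    if (!passed) failed.push(index + 1);
  });

  if (failed.length > 0) {
    // Assumed rule: any failed criterion suppresses the draft.
    return {
      release: false,
      audit: {
        draftId,
        decision: "suppressed",
        detail: "failed criteria: " + failed.join(", "),
      },
    };
  }

  return {
    release: true,
    audit: { draftId, decision: "released", detail: "all criteria passed" },
  };
}
```

Keeping the decision separate from the model call means the suppress, release, and fail-open paths can each be unit-tested without a live judge.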
Challenges we ran into
The most expensive single mistake was discovering, after v0.5.1 had shipped, that the typed structuredContent channel emitted grade: "C" while the human-readable Markdown rendered "Amber: improvable". The same tool speaking two positioning languages on its two channels. The fix wasn't more documentation; it was a type-signature change. (Captured as Insight #020 in our internal log: the structural type is the positioning surface.)
A second challenge: every architectural decision had to be traceable back to one of two regulatory frames (FDA §3060(a)(1) for the device boundary, or ACDIS 2022 + 2026 draft for the query-generation discipline). The framing kept us honest: "is this a positioning move that survives audit?" became the gating question for every PR.
Accomplishments we're proud of
- The FDA ↔ ACDIS isomorphism: the architecture we built to stay outside FDA device regulation is (honestly, by accident) the architecture a CDI organization will need to satisfy ACDIS auditors when the 2026 draft finalises.
- LLM-as-judge compliance gate running on every CDI query draft, with a fail-open audit trail.
- Two-layer regulatory enforcement, observed live in production. (1) The tool-side validator caught "differential diagnoses" in a live LLM finding and suppressed it with an audit entry. (2) When asked a pure CDS question, the chat agent self-declined without invoking any tool and answered with 4 documentation-framed gaps — a posture adopted from our consistent disclaimer placement, not our code. We engineered one layer; the second emerged.
- 735 unit/integration tests + 25-fixture synthetic-data gold set + static-source lints + 10-prompt production smoke test = four layers of regression defence.
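The tool-side validator layer described in this list can be sketched as a filter that moves flagged findings into suppressedFindings[] rather than dropping them. The phrase list, the LlmFinding shape, and the function name are assumptions for illustration; only the "differential diagnoses" catch and the suppressedFindings[] field are attested in the write-up.

```typescript
interface LlmFinding {
  text: string;
}

interface SuppressedFinding {
  finding: LlmFinding;
  matchedPhrase: string; // audit detail: why the finding was gated
}

interface ValidationResult {
  findings: LlmFinding[];
  suppressedFindings: SuppressedFinding[];
}

// Assumed blocklist of CDS-leaning phrases. Only "differential diagnoses"
// is mentioned in the write-up; a real list would be longer.
const CDS_PHRASES: string[] = ["differential diagnoses"];

function validateFindings(input: LlmFinding[]): ValidationResult {
  const result: ValidationResult = { findings: [], suppressedFindings: [] };
  for (const finding of input) {
    const lower = finding.text.toLowerCase();
    let matched: string | null = null;
    for (const phrase of CDS_PHRASES) {
      if (lower.indexOf(phrase) !== -1) {
        matched = phrase;
        break;
      }
    }
    if (matched === null) {
      result.findings.push(finding);
    } else {
      // Gated, not deleted: the finding stays available for audit.
      result.suppressedFindings.push({ finding, matchedPhrase: matched });
    }
  }
  return result;
}

const validated = validateFindings([
  { text: "Laterality is not documented for the knee pain." },
  { text: "Consider differential diagnoses including pulmonary embolism." },
]);
```

Here the first finding passes through and the second lands in suppressedFindings with the phrase that triggered it.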
What we learned
The single most generalisable lesson: positioning leaks through any surface that emits text a reviewer reads, including the type signature. Encoding a constraint in the README is a claim; encoding it in the type system is a proof. When the README says "we don't grade like a school transcript" and the type emits "A" | "B" | ... | "F", the type wins. v0.5.2 was the discipline pass that aligned the type-level surface with the positioning we'd already decided on. A future contributor cannot re-introduce letter grades without first changing the type, which is what "structurally impossible" means.
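A small sketch of what encoding the positioning in the type can look like. The band names, the thresholds, and the QualityResult shape are hypothetical; the write-up attests only that the letter-grade field was replaced and that the Markdown channel rendered a label such as "Amber: improvable".

```typescript
// Assumed RAG bands replacing a letter-grade union such as "A" | ... | "F".
type QualityBand = "red" | "amber" | "green";

interface QualityResult {
  score: number; // composite 0-100
  band: QualityBand;
}

// Hypothetical thresholds, for illustration only.
function toBand(score: number): QualityBand {
  if (score >= 80) return "green";
  if (score >= 50) return "amber";
  return "red";
}

function buildResult(score: number): QualityResult {
  return { score, band: toBand(score) };
}

// Re-introducing a letter grade is rejected at compile time, because
// "C" is not assignable to QualityBand:
// const bad: QualityResult = { score: 72, band: "C" };

const review = buildResult(72);
```

Because both output channels would be derived from the same QualityBand value, the structured and human-readable surfaces cannot drift into different vocabularies without a type change.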
What's next for ClinicalLens
- Finalise the 2026 ACDIS draft cross-walk once the public comment period closes (2026-05-16). We can then publish a one-page tool-by-tool mapping showing which output satisfies which Section II clause.
- Per-organization audit-trail export: extend queryRegister to a downloadable CSV/JSON suitable as ACDIS-audit evidence.
- CCDS-O contractor validation pass: engage a US-credentialed Outpatient CDI specialist to run 10–15 synthetic notes through the production tools and grade output against ACDIS rubrics.
- Continue exploring monetization options. We've identified several avenues to a win-win solution for mid-tier providers who don't have budgets for the highest-tier offerings.
Try it yourself — demo flow for judges
- The MCP server is live at https://clinical-lens-production.up.railway.app/mcp (Streamable HTTP). Health check: https://clinical-lens-production.up.railway.app/health → {"status":"ok","service":"clinical-lens","version":"0.6.0"}.
- Add to a Prompt Opinion workspace: Workspace Hub → Add MCP Server → paste the URL → select "Streamable HTTP" → check "pass FHIR context" → click "Test" (auto-discovers all 10 tools) → Save.
- Start with the single-call full CDI review, the fastest way to see everything at once:
  - "Run a full CDI review." (no patient name required; SHARP context auto-resolves the active patient and the latest DocumentReference)
  - or, explicitly: "Run a full CDI review on [patient name]'s most recent note."
This triggers run_full_cdi_review, which chains completeness, quality scoring, E/M level support, specificity gaps, and CDI query opportunities, then synthesises all findings with AI into a prioritised narrative. One prompt, complete picture.
Optional: install the CCDS-O BYO Agent. Paste the system prompt from private/ccds-o-byo-agent-prompt.md (sourced from this repo) into a Prompt Opinion BYO Agent slot to spin up a CCDS-O outpatient-CDI persona that drives the tools per ACDIS outpatient principles.
Probe individual tools to test specific capabilities:
| What to test | Prompt to type |
|---|---|
| Note completeness | "Check whether [patient]'s note has all required documentation elements for a hospital admission." |
| Quality score | "Score the documentation quality of [patient]'s most recent progress note." |
| E/M level support | "Does [patient]'s note support the E/M level that was billed?" |
| Copy-forward detection | "Compare [patient]'s last two notes. Does it look like copy-forward?" |
| ICD-10 alignment | "Do the diagnosis codes in [patient]'s note match what the documentation actually says?" |
| Specificity gaps | "What diagnoses in [patient]'s note could be coded more specifically for HCC risk adjustment?" |
| CDI query opportunities | "What CDI query opportunities exist in [patient]'s note? Draft the queries." |
| Trends across visits | "Show me documentation quality trends across [patient]'s last several notes." |
| Full AI synthesis | "Generate a comprehensive documentation review for [patient]'s most recent note." |
Built With
- claude
- express.js
- helmet
- mcp
- node.js
- railway
- typescript