ClinicalLens

Inspiration

For the entire build, the load-bearing constraint was FDA's revised CDS guidance (2026-01-06): a read-only documentation-quality tool stays outside medical-device regulation only if every finding exposes an independently reviewable basis the clinician can verify. We built the architecture around that constraint: sourceSpan, discreteConfidence, and reasoning on all 10 tools.

Mid-build, ACDIS and AHIMA published the 2026 Draft Query Practice Guidelines for AI-assisted CDI (public comment closes 2026-05-16). Section II requires technology-generated queries to cite "clinical indicators sourced from the health record, with location, free from subjective interpretation", plus an organizational audit trail. The FDA requirement and the ACDIS draft are isomorphic: the architecture we built to stay outside FDA's lane is the architecture a CDI organization needs to satisfy ACDIS auditors. Hospitals lose ~$120K/year/physician to under-documentation. We took the lane the enterprise CDI players couldn't, with a retrospective, read-only, document-framed perspective, and the regulatory structure turned out to be a moat.

Copy-forward is a Joint Commission-flagged sentinel event contributor, not only an audit liability. An 81%-copied progress note with stale findings is a care coordination failure. Detect → alert physician → accurate status → better-informed handoffs → fewer missed transitions.

What it does

ClinicalLens exposes 10 MCP tools for clinical documentation quality analysis:

run_full_cdi_review — single-call orchestrator: chains completeness, quality, E/M support, specificity gaps, and query opportunities, then AI-synthesises a prioritised narrative. Zero-input mode (useLatestNote: true) when SHARP FHIR context is active.
generate_documentation_review — AI-synthesised review with streaming progress notifications.
assess_note_completeness — per-encounter-type CMS-element checks (20 elements, weighted)
measure_note_quality — composite 0–100 score across 5 dimensions, RAG-banded
detect_copy_forward — Jaccard similarity + temporal analysis on sequential notes
check_code_documentation_alignment — ICD-10 vs note content + problem-list cross-reference
assess_em_level_support — 2025 CMS MDM-criteria validation
detect_specificity_gaps — under-specified diagnoses (HCC / risk-adjustment surface)
analyze_documentation_trends — longitudinal multi-note quality drift
check_query_opportunities — ACDIS-compliant CDI query generation, LLM-judged for non-leading rubric compliance
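
As a rough illustration of the Jaccard step behind detect_copy_forward, the sketch below compares two notes as sets of lowercase word tokens. The tokenizer and the function names are assumptions made for this example, and the temporal analysis the tool also performs is not shown.

```typescript
// Hypothetical helper: split a note into a set of lowercase word tokens.
function tokenize(note: string): Set<string> {
  return new Set(note.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, in the range 0..1.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1; // two empty notes are identical
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

const previous = "Lungs clear. No edema. Continue metformin.";
const current = "Lungs clear. No edema. Continue metformin. Start lisinopril.";

// 6 shared tokens out of 8 distinct tokens overall.
const similarity = jaccard(tokenize(previous), tokenize(current));
console.log(similarity); // 0.75
```

A word-level token set is the simplest choice; shingles of several words would be more sensitive to reordered text, at the cost of larger sets.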

Every finding emits sourceSpan (where), discreteConfidence (how certain), and reasoning (why) — nothing is pure score. Every tool also surfaces curated references[] and suggestedNextSteps[] on both the markdown and structured channels.
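
One way to picture that finding contract is sketched below. Only the field names sourceSpan, discreteConfidence, and reasoning come from this write-up; the character-offset span, the three confidence levels, and the quoteSpan helper are assumptions for illustration.

```typescript
// Assumed confidence levels; the write-up only says the value is discrete.
type DiscreteConfidence = "high" | "medium" | "low";

// Assumed span representation: character offsets into the note text.
interface SourceSpan {
  start: number; // inclusive
  end: number;   // exclusive
}

interface Finding {
  sourceSpan: SourceSpan;                 // where
  discreteConfidence: DiscreteConfidence; // how certain
  reasoning: string;                      // why
}

// Hypothetical helper: return the exact note text a finding points at,
// so a reviewer can check the basis independently.
function quoteSpan(note: string, finding: Finding): string {
  return note.slice(finding.sourceSpan.start, finding.sourceSpan.end);
}

const note = "Assessment: heart failure, on furosemide.";
const phrase = "heart failure";
const start = note.indexOf(phrase);

const finding: Finding = {
  sourceSpan: { start, end: start + phrase.length },
  discreteConfidence: "medium",
  reasoning: "Heart failure is documented without type or acuity.",
};

const quoted = quoteSpan(note, finding);
console.log(quoted); // "heart failure"
```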

How we built it

The AI layer has one job rules cannot do: detecting whether a CDI query is leading. That judgment needs language understanding.

Read-only on FHIR via SHARP context propagation. Three HTTP headers, no installer, no EHR write access.
Citation anchoring on every finding. sourceSpan + discreteConfidence + reasoning on all 10 tools — the clause satisfying FDA §3060(a)(1)(D) and 2026 ACDIS draft Section II simultaneously.
LLM-as-judge compliance gate. A cross-family judge scores every CDI query draft on the ACDIS 2022 6-criterion rubric and suppresses non-compliant drafts. Fail-open with audit trail. Without it, leading queries ship undetected.
Regulatory-boundary output validator. Every LLM finding passes a runtime validator; CDS-leaning prose is suppressed into suppressedFindings[] so it's gated, not deleted. The §3060(a)(1) admin-vs-device boundary made mechanical.
Single-call orchestrator + zero-input mode (v0.6.0/v0.6.1). run_full_cdi_review chains five analyses + AI synthesis in one call. useLatestNote: true reads X-Patient-ID from SHARP and auto-resolves the latest DocumentReference. No patient prompt needed.
Tiered model posture + prompt caching (v0.5.5+). Synthesis runs on a cheaper, faster model with cache_control: { type: "ephemeral" }; reasoning models are reserved for the leading-language judge gate where accuracy beats latency.
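
The fail-open behaviour of the judge gate can be sketched as follows. This is a minimal illustration under assumptions, not the production code: the Judge signature, the verdict shape, and the audit outcome labels are all hypothetical, and the judge is injected so that any model client could sit behind it.

```typescript
// Hypothetical verdict shape returned by the judge model.
interface JudgeVerdict {
  compliant: boolean;
  failedCriteria: string[]; // rubric criteria the draft violated
}
type Judge = (draft: string) => Promise<JudgeVerdict>;

// Hypothetical audit record; one is written for every draft, whatever happens.
interface AuditEntry {
  draft: string;
  outcome: "released" | "suppressed" | "released-judge-unavailable";
  detail: string;
}

interface GateResult {
  released: string[];
  audit: AuditEntry[];
}

async function gateQueries(drafts: string[], judge: Judge): Promise<GateResult> {
  const result: GateResult = { released: [], audit: [] };
  for (const draft of drafts) {
    try {
      const verdict = await judge(draft);
      if (verdict.compliant) {
        result.released.push(draft);
        result.audit.push({ draft, outcome: "released", detail: "passed rubric" });
      } else {
        // Non-compliant drafts are withheld, with the failed criteria recorded.
        result.audit.push({
          draft,
          outcome: "suppressed",
          detail: verdict.failedCriteria.join(", "),
        });
      }
    } catch (err) {
      // Fail-open: a judge outage releases the draft but leaves an audit record.
      result.released.push(draft);
      result.audit.push({
        draft,
        outcome: "released-judge-unavailable",
        detail: String(err),
      });
    }
  }
  return result;
}
```

Failing open keeps the tool usable when the judge model is unreachable; the audit entry is what lets a reviewer later find every draft that went out unjudged.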

Challenges we ran into

The most expensive single mistake was discovering, after v0.5.1 had shipped, that the typed structuredContent channel emitted grade: "C" while the human-readable Markdown rendered "Amber: improvable". The same tool was speaking two positioning languages on its two channels. The fix wasn't more documentation; it was a type-signature change. (Captured as Insight #020 in our internal log: the structural type is the positioning surface.)

A second challenge: every architectural decision had to be traceable back to one of two regulatory frames (FDA §3060(a)(1) for the device boundary, or ACDIS 2022 + 2026 draft for the query-generation discipline). The framing kept us honest: "is this a positioning move that survives audit?" became the gating question for every PR.

Accomplishments we're proud of

The FDA ↔ ACDIS isomorphism: the architecture we built to stay outside FDA device regulation is (honestly, by accident) the architecture a CDI organization will need to satisfy ACDIS auditors when the 2026 draft finalises.
LLM-as-judge compliance gate running on every CDI query draft, with a fail-open audit trail.
Two-layer regulatory enforcement, observed live in production. (1) The tool-side validator caught "differential diagnoses" in a live LLM finding and suppressed it with an audit entry. (2) When asked a pure CDS question, the chat agent self-declined without invoking any tool and answered with 4 documentation-framed gaps — a posture adopted from our consistent disclaimer placement, not our code. We engineered one layer; the second emerged.
735 unit/integration tests + 25-fixture synthetic-data gold set + static-source lints + 10-prompt production smoke test = four layers of regression defence.
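
The tool-side validator described above can be approximated with a simple phrase gate. The sketch below is an assumption-heavy illustration: only the suppressedFindings name and the "differential diagnoses" example come from this write-up, while the term list, the finding shape, and validateFindings are hypothetical.

```typescript
// Hypothetical minimal finding shape for this example.
interface LlmFinding {
  text: string;
}

interface ValidatedOutput {
  findings: LlmFinding[];
  suppressedFindings: { finding: LlmFinding; matchedTerm: string }[];
}

// Assumed examples of CDS-leaning phrasing; a real list would need clinical review.
const CDS_TERMS = ["differential diagnos", "recommend starting", "consider prescribing"];

function validateFindings(candidates: LlmFinding[]): ValidatedOutput {
  const output: ValidatedOutput = { findings: [], suppressedFindings: [] };
  for (const finding of candidates) {
    const lower = finding.text.toLowerCase();
    const matchedTerm = CDS_TERMS.find((term) => lower.indexOf(term) !== -1);
    if (matchedTerm === undefined) {
      output.findings.push(finding);
    } else {
      // Gated, not deleted: keep the finding alongside the term that tripped the gate.
      output.suppressedFindings.push({ finding, matchedTerm });
    }
  }
  return output;
}

const checked = validateFindings([
  { text: "Differential diagnoses include sepsis and pneumonia." },
  { text: "Smoking status is not documented in the social history." },
]);
console.log(checked.findings.length, checked.suppressedFindings.length); // 1 1
```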

What we learned

The single most generalisable lesson: positioning leaks through any surface that emits text a reviewer reads, including the type signature. Encoding a constraint in the README is a claim; encoding it in the type system is a proof. When the README says "we don't grade like a school transcript" and the type emits "A" | "B" | ... | "F", the type wins. v0.5.2 was the discipline pass that aligned the type-level surface with the positioning we'd already decided on. A future contributor cannot re-introduce letter grades without first changing the type, which is what "structurally impossible" means.
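
A minimal sketch of what that type-level constraint could look like is shown here. The band names other than "Amber", the thresholds, and the QualityResult and toQualityResult names are assumptions; the write-up only confirms a RAG-banded 0–100 score.

```typescript
// Assumed band names: only "Amber" appears in the write-up.
type QualityBand = "Green" | "Amber" | "Red";

interface QualityResult {
  score: number;     // composite 0-100
  band: QualityBand; // there is no letter-grade field on this type
}

// Hypothetical thresholds, chosen only for illustration.
function toQualityResult(score: number): QualityResult {
  const clamped = Math.min(100, Math.max(0, score));
  const band: QualityBand =
    clamped >= 80 ? "Green" : clamped >= 50 ? "Amber" : "Red";
  return { score: clamped, band };
}

const result = toQualityResult(64);
console.log(result.band); // "Amber"

// The compiler rejects a letter grade, because "C" is not a QualityBand:
// const bad: QualityResult = { score: 64, band: "C" };
```

With the union in place, both output channels can render from the same band value, so the Markdown and the structured channel cannot drift apart without a type change.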

What's next for ClinicalLens

Finalise the 2026 ACDIS draft cross-walk once the public comment period closes (2026-05-16). We can then publish a one-page tool-by-tool mapping showing which output satisfies which Section II clause.
Per-organization audit-trail export — extend queryRegister to a downloadable CSV/JSON suitable as ACDIS-audit evidence.
CCDS-O contractor validation pass — engage a US-credentialed Outpatient CDI specialist to run 10–15 synthetic notes through the production tools and grade output against ACDIS rubrics.
Continue exploring monetization options. We've identified several avenues toward a win-win for mid-tier providers who don't have budgets for the highest-tier offerings.

Try it yourself — demo flow for judges

The MCP server is live at https://clinical-lens-production.up.railway.app/mcp (Streamable HTTP). Health check: https://clinical-lens-production.up.railway.app/health → {"status":"ok","service":"clinical-lens","version":"0.6.0"}.
Add to a Prompt Opinion workspace: Workspace Hub → Add MCP Server → paste the URL → select "Streamable HTTP" → check "pass FHIR context" → click "Test" (auto-discovers all 10 tools) → Save.
Start with the single-call full CDI review — the fastest way to see everything at once:
- "Run a full CDI review." (no patient name required — SHARP context auto-resolves the active patient and the latest DocumentReference)
- or, explicitly: "Run a full CDI review on [patient name]'s most recent note."

This triggers run_full_cdi_review, which chains completeness, quality scoring, E/M level support, specificity gaps, and CDI query opportunities, then synthesises all findings with AI into a prioritised narrative. One prompt, complete picture.

Optional: install the CCDS-O BYO Agent. Paste the system prompt from private/ccds-o-byo-agent-prompt.md (sourced from this repo) into a Prompt Opinion BYO Agent slot to spin up a CCDS-O outpatient-CDI persona that drives the tools per ACDIS outpatient principles.
Probe individual tools to test specific capabilities:

| What to test | Prompt to type |
|---|---|
| Note completeness | "Check whether [patient]'s note has all required documentation elements for a hospital admission." |
| Quality score | "Score the documentation quality of [patient]'s most recent progress note." |
| E/M level support | "Does [patient]'s note support the E/M level that was billed?" |
| Copy-forward detection | "Compare [patient]'s last two notes. Does it look like copy-forward?" |
| ICD-10 alignment | "Do the diagnosis codes in [patient]'s note match what the documentation actually says?" |
| Specificity gaps | "What diagnoses in [patient]'s note could be coded more specifically for HCC risk adjustment?" |
| CDI query opportunities | "What CDI query opportunities exist in [patient]'s note? Draft the queries." |
| Trends across visits | "Show me documentation quality trends across [patient]'s last several notes." |
| Full AI synthesis | "Generate a comprehensive documentation review for [patient]'s most recent note." |

Built With

claude
express.js
helmet
mcp
node.js
railway
typescript

Updates

Jason Tofte started this project — May 11, 2026 10:46 PM EDT
