TrueNorth Health — finding real care gaps, not just missing rows

What inspired us

Virtue Foundation doesn't lack good intentions or skilled volunteers — it lacks a trustworthy way to decide where a short-term medical mission does the most good. The hard part isn't plotting hospitals on a map; it's that web-scraped facility data is noisy and self-reported, and a blank cell can mean two completely different things: a genuine care desert (people, real need, no care) or a data gap (we simply haven't scraped the records). Treating those the same sends teams to the wrong place.

We were inspired by a simple, uncomfortable question: can we tell a real care desert apart from a data gap — and prove it — so a non-technical mission planner can act with confidence? That question shaped every design decision.

What we built

A map-first Databricks App (Streamlit) that, for any clinical capability (maternity, ICU, NICU, emergency, oncology, trauma) and geography, ranks districts by how big and how trustworthy their care gap is, then helps plan a volunteer mission.

The deterministic spine turns messy claims into auditable numbers:

Every facility's claimed capability is treated as a claim to verify, graded by corroborating its capability text against its own procedure/equipment text: high (claimed + corroborated), medium (claimed only), unverified (a flag with no supporting text). We cite the exact text behind every grade.
Trust-weighted supply rewards evidence over assertion: $$\text{TWS} = 1.0\,\cdot\,n_{\text{high}} + 0.6\,\cdot\,n_{\text{medium}}$$
Supply adequacy saturates (the 10th provider matters less than the 1st): $$\text{adequacy} = \frac{\text{TWS}}{\text{TWS} + k}$$
Care-gap (desert) score combines measured NFHS-5 demand with the unmet gap: $$\text{desert} = \text{demand} \times (1 - \text{adequacy})$$ where demand is honest — an NFHS-5 proxy where one exists, otherwise the district is ranked on supply scarcity alone and labeled as such.
A deployment optimizer ranks districts by need addressed per dollar from a chosen home base: $$\frac{\text{desert}}{\text{cost}}, \qquad \text{cost} = 0.70\,d_{\text{km}} + 60\,(\text{team}\times\text{days}) + 200\,(\text{drive_hrs}\times\text{team})$$ with $\text{drive_hrs} = d_{\text{km}} / 45$ so a closer district never costs more than a farther one — a monotonicity property we test explicitly.

The AI layer adds human judgment, never the numbers. Databricks ai_query (an open Foundation Model) reasons over each district's computed metrics to produce a one-line recommended action for the Foundation — grounded in what VF actually does (deploy a volunteer mission, train local providers, donate equipment, or send a needs-assessment scout), explicitly not building or staffing facilities. A single AI copilot orchestrates the deterministic tools and can run Genie text-to-SQL for ad-hoc questions — but it can only state numbers the tools or Genie returned.

How we built it (and the Databricks surface we used)

The whole thing runs natively on Databricks Free Edition:

Unity Catalog + Delta Sharing to read the shared Virtue Foundation dataset.
A serverless Lakeflow Job (DABs) that resolves every facility to a district by point-in-polygon (99.98% coverage), grades claims, computes the analytics, runs ai_query, and publishes curated UC Delta tables.
Lakebase (Postgres) for sub-10ms app reads and persisting user actions (saved scenarios, reviews, notes).
Databricks Apps to host the Streamlit UI, Genie for conversational analytics, Foundation Model API for the AI column and copilot, plus Secrets, the SDK, and a DABs bundle for one-command deploys.

What we learned

Honesty is a feature, not a caveat. The most valuable thing the tool does is say "this might be a scrape gap, not a desert — verify first." Surfacing uncertainty earned more trust than hiding it.
Put AI where it adds judgment, keep the spine deterministic. Decisions that affect funding must be reproducible and explainable; AI is perfect for translating those numbers into an action a coordinator can read in one line — and dangerous if it produces the numbers.
Ground the model in the organization. Generic LLM advice suggested building and staffing clinics; once we grounded the prompt in what Virtue Foundation actually does, the recommendations became things the Foundation can genuinely execute.

Challenges we faced

No population data anywhere in the three source tables, so absolute need is unknowable. We kept need relative and configurable rather than fabricating a denominator — and made that limitation visible.
Noisy, self-reported claims. ~4,600 facilities flagged maternal care, but only ~2,000 had text that actually corroborated it. The claims-grading layer is what makes the coverage numbers defensible.
Free-Edition limits, met head-on. Databricks-served Claude/Gemini are rate-limited to zero, so the copilot uses an external key via the same OpenAI-compatible client while the AI column uses an open served model. The Lakebase bulk-load OOMs on serverless, so the Job owns the Delta/AI half and Lakebase loads via a local step. Genie's space-creation schema is undocumented — we recovered it from a populated sample space via the raw REST API.
Subtle correctness bugs. An early cost model made a closer district cost more (travel time wasn't derived from distance); we fixed it and added a regression test that asserts cost monotonicity for every origin.

The result is a planner that doesn't just show where facilities are — it shows where care is genuinely missing, how much we can trust that, and where a volunteer team turns the most unmet need into impact per dollar.

What's next

A richer mission-cost model. Today the cost model uses road transport and a flat per-diem: $\text{cost} = 0.70\,d_{km} + 60\,(\text{team}\times\text{days}) + 200\,(\text{drive_hrs}\times\text{team})$. Next we'd add multi-modal transport (air / train / road, chosen by distance and availability), place-specific lodging & boarding (a night in a metro costs very differently from a rural district), and live routing for any origin rather than a precomputed set — turning cost-per-impact from a good estimate into a near-quotable budget.
From relative to absolute need. The source data has no population, so need is currently relative. With a population denominator we'd switch to per-capita rates and surface absolute people-reached.
Semantic facility matching (Mosaic AI Vector Search). Embedding search to match a mission's specific needs to the best-fit facilities, beyond keyword corroboration.
Always-fresh data + trajectory. Schedule the ingest Job and fold in NFHS-6 trends so a district's gap shows whether it's improving or worsening.
A multi-turn copilot. Carry context across questions for true follow-up planning.