Databricks Copilot: Medical Desert Map

Built By

Built by Steven Yang, Ji Chen, Ayush Mishra, Changbin Gong

Inspiration

Healthcare access data is often messiest in the places where planning decisions matter most. A facility list can say a hospital exists, but a planner still needs to know whether the claims are source-backed and whether the district-level gap is real.

We built Databricks Copilot: Medical Desert Map for that workflow: a confidence-aware planning assistant for non-technical healthcare planners, NGO coordinators, and analysts working across India.

What it does

The app helps users answer three operational questions:

Where should we act first? The Map and Top Care Gaps views rank districts by health need, provider scarcity, and uncertainty.
Can we trust the supply evidence? Provider cards show source-backed claims, automated check status, and trust tiers instead of treating every facility row as ground truth.
What should happen next? The product suggests actions such as deploy/build, call or verify, fix records, route referrals, or monitor.

Key workflows include:

a district-level medical desert map copilot,
a Top Care Gaps shortlist for doctor deployment planning,
provider-claim cards with source links and service chips, verified against external datasets,
filters for districts with provider claims versus no mapped claims,
compact uncertainty explanations for health need, gap confidence, and provider trust, and
explanations of the statistical methodology.

How we built it

We built the project as a Databricks App with a Streamlit front end and a Databricks-style lakehouse data flow.

The data pipeline organizes the messy facility records into:

Bronze: raw facility and geography inputs.
Silver: normalized provider, address, PIN code, district, and service-claim features.
Feature tables: facility evidence features, geo-quality indicators, supply estimates, and district health-need inputs.
Gold: app-facing outputs such as care-gap rankings, provider trust, intervention recommendations, conformal sets, and review queues.

The decision layer combines several statistical and ML components:

CatBoost supply imputers: Capacity and doctor-count gaps are imputed with CatBoost regressors trained on observed rows. The models use native categorical handling, log1p target transforms, 5-fold out-of-fold validation, cohort-median baselines, clipped predictions, and MLflow logging. Capacity imputation improved MAE by about 13.6% versus the cohort-median baseline; doctor-count imputation was more conservative, improving MAE by about 1.5%.
Bayesian-style provider validity posterior: Facility evidence is scored from source URLs, geography consistency, service-claim evidence, missingness, contradiction flags, contact evidence, recency, and semantic quality.
Calibrated provider trust: District cards show a simple High/Medium/Low trust tier. The score blends the row-level Bayesian validity posterior with an empirical-Bayes-smoothed hard-check pass rate, so a thin sample such as 0/1 checks does not collapse into a misleading visible 0%.
Wilson confidence intervals: District and facility rates use Wilson 95% intervals for finite-row uncertainty, including check-pass rates, review-needed rates, critical supply-gap rates, and planning category volumes. These intervals are more stable than naive normal intervals for small or extreme proportions.
Split-conformal coverage: Facility trust sets are built as split-conformal prediction sets over the validity posterior. The current artifact targets alpha = 0.10 (a 90% coverage target), uses a nonconformity score s = 1 - validity_posterior, and reaches about 91.8% empirical coverage on the automated proxy-valid class. Because the calibration target is the automated proxy label trustworthy_supply_signal rather than a human-verified label, the app labels this as provisional proxy coverage, not human-verified accuracy.
Active review queues: The system ranks facilities and districts where one more source check, geocode check, or claim review is most likely to change the decision.
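
The uncertainty pieces above can be illustrated with a minimal Python sketch. It is not the project's code: the function names (wilson_interval, smoothed_pass_rate, conformal_threshold, prediction_set) are hypothetical, and the prior settings (prior_rate = 0.5, prior_strength = 4.0) are assumed values chosen only to show how shrinkage keeps a 0/1 check record from displaying as 0%. The conformal part assumes the common convention of scoring each candidate label by one minus its posterior probability, which matches s = 1 - validity_posterior for the valid class.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def smoothed_pass_rate(passes, checks, prior_rate=0.5, prior_strength=4.0):
    """Beta-binomial shrinkage toward a prior rate.

    prior_rate and prior_strength are assumed values for illustration; an
    empirical-Bayes fit would estimate them from the pooled data instead.
    """
    return (passes + prior_rate * prior_strength) / (checks + prior_strength)


def conformal_threshold(cal_scores, alpha=0.10):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration rows for this alpha
    return sorted(cal_scores)[k - 1]


def prediction_set(validity_posterior, threshold):
    """Keep every label whose nonconformity score is within the threshold."""
    scores = {
        "valid": 1 - validity_posterior,  # s = 1 - validity_posterior
        "invalid": validity_posterior,    # assumed symmetric score
    }
    return [label for label, s in scores.items() if s <= threshold]


# A single failed check: the raw rate is 0%, but the interval stays wide
# and the smoothed rate is pulled toward the prior.
low, high = wilson_interval(0, 1)
print(f"Wilson 95% interval for 0/1: [{low:.2f}, {high:.2f}]")
print(smoothed_pass_rate(0, 1))  # 0.4 under the assumed prior settings
```

With a calibration set of nonconformity scores from held-out rows, conformal_threshold gives the cutoff and prediction_set turns a facility's posterior into a set that may contain one label, both, or neither. An empty or two-label set is a natural trigger for the review queue.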

Operationally, we used:

Unity Catalog / Databricks SQL for governed tables and queryable outputs,
MLflow for model and scoring-policy tracking,
Databricks Apps for the live application,
Streamlit for the planner workflow, and
Cloud Run as an additional public demo deployment path.

Challenges we ran into

The main challenge was not rendering a map; it was avoiding false certainty.

Specific issues included:

facility claims were noisy, duplicated, incomplete, or contradicted by geography,
many rows lacked capacity or doctor counts,
no large human-verified gold label set existed for facility truth,
Indian addresses and PIN-code joins needed careful normalization,
geocoding had to be treated as source agreement rather than ground truth, and
the UI had to stay understandable for a planner rather than exposing every intermediate data artifact.

That pushed us toward a product posture where the app says what it knows, what it does not know, and what a planner should verify before staffing.

Accomplishments that we're proud of

We turned messy facility records into a live, multi-view planning app rather than a static dashboard.
We separated health need, gap confidence, and provider trust so users can distinguish real care gaps from weak local evidence.
We added provider-claim evidence cards with service chips, source links, and check status.
We implemented CatBoost supply imputers, Wilson confidence intervals, Bayesian-style trust scoring, and split-conformal uncertainty sets.
We kept the product honest: proxy trust is never presented as measured medical truth.