Built By

Built by Steven Yang, Ji Chen, Ayush Mishra, and Changbin Gong

Inspiration

Healthcare access data is often messiest in the places where planning decisions matter most. A facility list can say a hospital exists, but a planner still needs to know whether the claims are source-backed and whether the district-level gap is real.

We built Databricks Copilot: Medical Desert Map for that workflow. It is a confidence-aware planning assistant for non-technical healthcare planners, NGO coordinators, and analysts working across India.

What it does

The app helps users answer three operational questions:

  • Where should we act first? The Map and Top Care Gaps views rank districts by health need, provider scarcity, and uncertainty.
  • Can we trust the supply evidence? Provider cards show source-backed claims, automated check status, and trust tiers instead of treating every facility row as ground truth.
  • What should happen next? The product suggests actions such as deploy/build, call or verify, fix records, route referrals, or monitor.

Key workflows include:

  • a district-level medical desert map copilot,
  • a Top Care Gaps shortlist for doctor deployment planning,
  • provider-claim cards with source links and service chips, verified with external datasets,
  • filters for districts with provider claims versus no mapped claims,
  • compact uncertainty explanations for health need, gap confidence, and provider trust, and
  • explanations for statistical methodology.

How we built it

We built the project as a Databricks App with a Streamlit front end and a Databricks-style lakehouse data flow.

The data pipeline organizes the messy facility records into:

  • Bronze: raw facility and geography inputs.
  • Silver: normalized provider, address, PIN code, district, and service-claim features.
  • Feature tables: facility evidence features, geo-quality indicators, supply estimates, and district health-need inputs.
  • Gold: app-facing outputs such as care-gap rankings, provider trust, intervention recommendations, conformal sets, and review queues.

The decision layer combines several statistical and ML components:

  • CatBoost supply imputers: Capacity and doctor-count gaps are imputed with CatBoost regressors trained on observed rows. The models use native categorical handling, log1p target transforms, 5-fold out-of-fold validation, cohort-median baselines, clipped predictions, and MLflow logging. Capacity imputation improved MAE by about 13.6% versus the cohort-median baseline; doctor-count imputation was more conservative, improving MAE by about 1.5%.
  • Bayesian-style provider validity posterior: Facility evidence is scored from source URLs, geography consistency, service-claim evidence, missingness, contradiction flags, contact evidence, recency, and semantic quality.
  • Calibrated provider trust: District cards show a simple High/Medium/Low trust tier. The score blends the row-level Bayesian validity posterior with an empirical-Bayes-smoothed hard-check pass rate, so a thin sample such as 0/1 checks does not collapse into a misleading visible 0%.
  • Wilson confidence intervals: District and facility rates use Wilson 95% intervals for finite-row uncertainty, including check-pass rates, review-needed rates, critical supply-gap rates, and planning category volumes. These intervals are more stable than naive normal intervals for small or extreme proportions.
  • Split-conformal coverage: Facility trust scores are wrapped in split-conformal prediction sets built over the validity posterior. The current artifact targets alpha = 0.10 (a 90% coverage target), uses a nonconformity score s = 1 - validity_posterior, and reaches about 91.8% empirical coverage on the automated proxy-valid class. Because the calibration target is trustworthy_supply_signal, the app labels this as provisional proxy coverage, not human-verified accuracy.
  • Active review queues: The system ranks facilities and districts where one more source check, geocode check, or claim review is most likely to change the decision.
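
The split-conformal step described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names and the label strings "valid" and "not_valid" are hypothetical, and scoring the "not_valid" label with the posterior itself (the mirror of s = 1 - validity_posterior) is an assumption the writeup does not confirm.

```python
import math


def conformal_threshold(cal_scores, alpha=0.10):
    """Return the split-conformal quantile of calibration nonconformity scores.

    cal_scores: one score per calibration row, computed against that row's
    proxy label (s = 1 - validity_posterior for proxy-valid rows).
    """
    n = len(cal_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration rows for this alpha: every label must be kept.
        return float("inf")
    return sorted(cal_scores)[rank - 1]


def prediction_set(validity_posterior, q_hat):
    """Return every label whose nonconformity score is within the threshold."""
    labels = []
    # Score for "valid" is 1 - posterior, as in the writeup.
    if 1.0 - validity_posterior <= q_hat:
        labels.append("valid")
    # ASSUMPTION: score for "not_valid" is 1 - (1 - posterior) = posterior.
    if validity_posterior <= q_hat:
        labels.append("not_valid")
    return labels


def empirical_coverage(posteriors, q_hat):
    """Share of proxy-valid rows whose prediction set contains "valid"."""
    hits = sum("valid" in prediction_set(p, q_hat) for p in posteriors)
    return hits / len(posteriors)
```

Under these assumptions, a row whose set contains both labels or neither label is one the model cannot settle at the chosen alpha, so it could be routed to a review queue rather than shown as trusted.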

Operationally, we used:

  • Unity Catalog / Databricks SQL for governed tables and queryable outputs,
  • MLflow for model and scoring-policy tracking,
  • Databricks Apps for the live application,
  • Streamlit for the planner workflow, and
  • Cloud Run as an additional public demo deployment path.

Challenges we ran into

The main challenge was not rendering a map; it was avoiding false certainty.

Specific issues included:

  • facility claims were noisy, duplicated, incomplete, or contradicted by geography,
  • many rows lacked capacity or doctor counts,
  • no large human-verified gold label set existed for facility truth,
  • India addresses and PIN-code joins needed careful normalization,
  • geocoding had to be treated as source agreement rather than ground truth, and
  • the UI had to stay understandable for a planner rather than exposing every intermediate data artifact.
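
As one illustration of the PIN-code normalization issue listed above, here is a minimal sketch of a cleaning helper. The function name and the specific cleanup rules are hypothetical, since the writeup does not describe the actual normalization. The only domain fact relied on is that an Indian PIN code is six digits and does not start with 0.

```python
import re

# An Indian PIN code is six digits with a non-zero first digit.
_PIN_RE = re.compile(r"[1-9][0-9]{5}")


def normalize_pin(raw):
    """Return a clean 6-digit PIN string, or None if it cannot be recovered.

    Hypothetical helper: the rules below are illustrative assumptions,
    not the project's actual pipeline logic.
    """
    if raw is None:
        return None
    text = str(raw).strip()
    # Numeric columns are often exported as floats, e.g. 110001.0.
    if text.endswith(".0"):
        text = text[:-2]
    # Drop the spaces and hyphens people insert for readability ("110 001").
    digits = re.sub(r"[\s\-]", "", text)
    return digits if _PIN_RE.fullmatch(digits) else None
```

Returning None instead of guessing keeps unrecoverable rows visible, so they can be counted as a geo-quality issue rather than silently joined to the wrong district.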

That pushed us toward a product posture where the app says what it knows, what it does not know, and what a planner should verify before staffing.

Accomplishments that we're proud of

  • We turned messy facility records into a live, multi-view planning app rather than a static dashboard.
  • We separated health need, gap confidence, and provider trust so users can distinguish real care gaps from weak local evidence.
  • We added provider-claim evidence cards with service chips, source links, and check status.
  • We implemented CatBoost supply imputers, Wilson confidence intervals, Bayesian-style trust scoring, and split-conformal uncertainty sets.
  • We kept the product honest: proxy trust is never presented as measured medical truth.
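
The Wilson intervals and the smoothed pass rate mentioned above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the function names are hypothetical, and the prior mean and prior strength used for smoothing are assumed values, since the writeup does not say how the empirical-Bayes prior was fit.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% normal quantile


def wilson_interval(successes, n, z=Z_95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def smoothed_pass_rate(passes, n, prior_mean, prior_strength):
    """Shrink an observed pass rate toward a prior mean.

    Equivalent to the posterior mean under a Beta prior with
    prior_strength pseudo-observations. In an empirical-Bayes setup the
    prior would be estimated from pooled data; here both values are
    ASSUMED inputs.
    """
    return (passes + prior_mean * prior_strength) / (n + prior_strength)


# A district with 0 passes out of 1 check (hypothetical prior values).
low, high = wilson_interval(0, 1)
rate = smoothed_pass_rate(0, 1, prior_mean=0.6, prior_strength=4.0)
```

With these assumed prior values, the 0/1 case yields a smoothed rate of 0.48 and a Wilson interval of roughly 0.00 to 0.79, which communicates "very little evidence" instead of a hard 0%.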
