Q_DBX_HCK — Virtue Health Equity Sentinel

Inspiration

India's health data is abundant but fragmented. The NFHS-5 survey captures rich demand-side signals — maternal care, child nutrition, infrastructure, behavioral risk — at the district level, while facility information lives in scattered, inconsistent, and often unverifiable sources. A health planner deciding where to invest next has no single place to ask: "Which districts need the most help, and can I trust what we claim to have on the ground there?"

We were struck by a recurring problem in public-health planning: tools either show demand (how sick a region is) or supply (how many facilities exist), but rarely both — and almost never with an honest signal about how reliable the underlying data is. A glossy dashboard that hides its own uncertainty can be worse than no dashboard at all. We wanted to build a decision aid that planners could actually trust, because it tells them exactly how much to trust each number.

What it does

Virtue Health Equity Sentinel ranks Indian districts by health-equity priority, blending two views into a single score:

Priority = Demand × (1 / Supply)

  • Demand — a 6-pillar District Health Index (Infrastructure, Empowerment, Maternal, Child Health, NCDs, Behavioral) built from NFHS-5 indicators, scored 0–100.
  • Supply — a trust-weighted measure of facility coverage, not a raw count: recency, evidence, field completeness, and known capacity all discount unreliable claims.
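
As a rough illustration of the formula above, here is a minimal Python sketch. It is not the project's actual scoring code: the field names (capacity, recency, evidence, completeness), the multiplicative trust factor, the default capacity, and the supply floor are all assumptions made for the example. It also assumes a higher demand score means greater need.

```python
# Hypothetical sketch of Priority = Demand x (1 / Supply).
# All field names, the trust formula, and the supply floor are assumptions.

PILLARS = ("infrastructure", "empowerment", "maternal",
           "child_health", "ncds", "behavioral")


def demand_score(pillar_scores: dict, weights: dict) -> float:
    """Weighted mean of 0-100 pillar scores; weights are renormalized."""
    total = sum(weights[p] for p in PILLARS)
    if total <= 0:
        raise ValueError("at least one pillar weight must be positive")
    return sum(pillar_scores[p] * weights[p] for p in PILLARS) / total


def trust_weighted_supply(facilities: list, default_capacity: float = 1.0) -> float:
    """Sum facility capacity, discounting each facility by a 0-1 trust factor."""
    total = 0.0
    for facility in facilities:
        capacity = facility.get("capacity")
        if capacity is None:
            # Unknown capacity falls back to a conservative default (assumed).
            capacity = default_capacity
        trust = (facility["recency"] * facility["evidence"]
                 * facility["completeness"])
        total += capacity * trust
    return total


def priority(demand: float, supply: float, supply_floor: float = 1.0) -> float:
    """Demand divided by supply, floored so zero supply cannot divide by zero."""
    return demand / max(supply, supply_floor)
```

Renormalizing the weights means slider positions only need to express relative importance. The floor keeps districts with no trusted facilities at the top of the ranking instead of raising an error.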

Planners can:

  • Explore an interactive India choropleth — recolor the map live by moving pillar-weight sliders, switch between Strategic (balanced), Tactical (maternal-focus), and Custom modes, and click any district to drill in.
  • Inspect facilities — see AI-extracted capability flags (ICU, NICU, maternity, etc.) with a verbatim citation showing the exact source sentence behind every claim.
  • Judge reliability at a glance — every score carries a 🟢/🟡/🔴 data-quality badge and a confidence band that widens when data is thin.
  • Save their work — notes, capability overrides, shortlists, and named weighting scenarios are persisted, scoped to the signed-in user.

How we built it

  • Platform: Databricks Apps, structured to the official streamlit-data-app-obo-user template.
  • Data layer: Precomputed Delta tables in Unity Catalog (district_pillar_scores, master_health_planner, facility_evidence_enriched, user_actions), queried through a SQL Warehouse.
  • Auth: On-behalf-of-user (OBO) — every query runs as the logged-in planner using their forwarded access token (X-Forwarded-Access-Token), so data access respects each user's own Unity Catalog permissions. The warehouse is bound as a declared resource via manifest.yaml.
  • Frontend: A multi-page Streamlit app (Heatmap, Facility Detail, Planner Desk, Methodology) with a shared sidebar that drives live re-scoring entirely in pandas, keeping map recolors well under two seconds.
  • Mapping: Plotly choropleth over a GADM India district-boundary GeoJSON, with normalized district-name matching and click-to-drill navigation.
  • Evidence enrichment: An offline LLM pass extracts structured capability flags and citations from messy free-text facility descriptions, returning strict JSON with a confidence score.
  • Persistence: User actions are written back to Delta through parameterized queries with safely bound values.
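
To make the evidence-enrichment step more concrete, the sketch below shows one way a batch job could reject model output whose citation is not actually verbatim. The JSON keys (capability, present, citation, confidence) and the 0 to 1 confidence range are assumptions for illustration. The source only states that the output is strict JSON with a citation and a confidence score.

```python
import json

# Hypothetical schema; the real extraction output may use different keys.
REQUIRED_KEYS = {"capability", "present", "citation", "confidence"}


def validate_extraction(raw_json: str, source_text: str) -> dict:
    """Parse one extraction record and check that its citation is verbatim."""
    record = json.loads(raw_json)  # raises json.JSONDecodeError on non-JSON
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(record["present"], bool):
        raise ValueError("'present' must be true or false")
    confidence = record["confidence"]
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("'confidence' must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("'confidence' must be between 0 and 1")
    if record["present"]:
        citation = record["citation"]
        # A positive claim needs a non-empty quote found in the source text.
        if not isinstance(citation, str) or not citation.strip():
            raise ValueError("positive claim without a citation")
        if citation not in source_text:
            raise ValueError("citation is not a verbatim quote from the source")
    return record
```

A plain substring test is deliberately strict: a paraphrased or lightly edited quote fails, which is the behavior wanted if the UI presents the citation as the exact source sentence.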

Challenges we ran into

  • Auth model. Moving from a service-principal connection to OBO user-token auth changed the whole connection layer and the way we identify users: identity now comes from forwarded request headers rather than a manually entered email field. Getting the Config() + user-token pattern right was the trickiest plumbing.
  • The 34 MB map. The full-resolution India district GeoJSON was huge — slow to sync to the deployed app and sluggish in the browser. We simplified it to ~1.6 MB (95% smaller) with no visible quality loss at national zoom, while keeping all 594 districts.
  • Name matching. District names differ across the NFHS, facility, and boundary datasets (older spellings, parenthetical suffixes, casing). We had to normalize aggressively on both sides so the choropleth join didn't silently drop districts.
  • Trustworthy AI extraction. Free-text facility descriptions are noisy and inconsistent. Forcing the model to cite verbatim evidence — and surfacing that citation in the UI — was essential to keep the tool honest rather than confidently wrong.
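
For readers unfamiliar with the Config() + user-token pattern from the auth challenge above, a minimal sketch follows. It is modelled on the general shape of the Databricks Streamlit OBO template as we understand it and is not the project's actual connection layer. The helper names are hypothetical, and the databricks-sql-connector and databricks-sdk packages are assumed to be installed in the app environment.

```python
from typing import Mapping

TOKEN_HEADER = "X-Forwarded-Access-Token"


def user_token_from_headers(headers: Mapping[str, str]) -> str:
    """Return the forwarded user token, or fail loudly if it is absent."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == TOKEN_HEADER.lower() and value:
            return value
    raise PermissionError(f"{TOKEN_HEADER} header missing; is OBO enabled?")


def connect_as_user(headers: Mapping[str, str], http_path: str):
    """Open a SQL Warehouse connection that runs as the signed-in user."""
    # Imported lazily so the header helper stays usable without these packages.
    from databricks import sql
    from databricks.sdk.core import Config

    cfg = Config()  # picks up the workspace host from the app environment
    return sql.connect(
        server_hostname=cfg.host,
        http_path=http_path,
        access_token=user_token_from_headers(headers),
    )
```

In a Streamlit page the headers would typically come from st.context.headers, and http_path would point at the bound warehouse. Failing with an explicit error when the header is missing avoids silently falling back to a broader identity.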

Accomplishments that we're proud of

  • Honesty by design. Every score shows its data-quality badge, confidence band, and evidence citation. The tool never asks to be trusted blindly.
  • Live, interactive prioritization. Planners reshape the entire national priority map in real time just by adjusting what they care about.
  • A genuine demand-meets-supply blend. Few tools combine survey-based need with trust-weighted facility coverage in one defensible score.
  • Clean Databricks-native architecture. OBO auth, Unity Catalog governance, and a template-compliant Apps deployment — secure and reproducible, not a one-off demo.
  • A 95% lighter map that deploys cleanly without sacrificing coverage.

What we learned

  • Uncertainty is a feature, not a footnote. Communicating how much to trust a number is as valuable as the number itself for real decision-making.
  • OBO auth is the right default for data apps — it keeps Unity Catalog as the single source of truth for who can see what, instead of duplicating permissions in app code.
  • Geospatial data needs deliberate weight management — full-resolution boundaries are almost never worth their size for a national view.
  • Most "bugs" live at the boundaries — paths, filenames, caches, and deploy sync — not in the core logic. The deployed environment is not your laptop.
  • Entity resolution is unglamorous but decisive — if district names don't match, the prettiest map shows nothing.
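
The entity-resolution lesson above can be turned into a small check. The sketch below is an illustrative normalizer plus a join audit, not the project's actual rules. The specific steps (stripping accents, dropping parenthetical suffixes, collapsing punctuation and case) are assumptions based on the kinds of mismatch described earlier.

```python
import re
import unicodedata


def normalize_district(name: str) -> str:
    """Reduce a district name to a lowercase, punctuation-free join key."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"\([^)]*\)", " ", text)   # drop parenthetical suffixes
    text = text.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9]+", " ", text)  # punctuation becomes spaces
    return " ".join(text.split())            # collapse repeated whitespace


def audit_join(score_names, boundary_names):
    """Return score-table names with no matching boundary after normalizing."""
    boundary_keys = {normalize_district(n) for n in boundary_names}
    return sorted(n for n in score_names
                  if normalize_district(n) not in boundary_keys)
```

Running audit_join before rendering makes dropped districts visible as a list instead of as blank polygons. Because several states share district names, a real join would likely need the state as part of the key, since dropping parenthetical suffixes alone could merge distinct districts.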

What's next for Q_DBX_HCK

  • Sharper district name resolution with fuzzy matching and an alias map so no district falls through the cracks.
  • On-demand evidence extraction inside the app, so newly added facilities are enriched live rather than only in batch.
  • Time-series trends — track how district priorities shift across NFHS rounds.
  • Collaborative planning — shared scenarios, team annotations, and exportable shortlists for funding proposals.
  • Broader pillars and sources — incorporate climate, sanitation, and supply-chain signals beyond NFHS.
  • Optimization layer — given a budget, recommend the highest-impact set of districts and interventions, not just a ranked list.
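
As a starting point for the first item above, an alias map with a fuzzy fallback could look like the sketch below. It uses the standard-library difflib module. The two aliases are well-known renamings included only as examples, and the 0.85 cutoff is an untuned assumption.

```python
import difflib

# Illustrative aliases (old name -> current name); a real map would be
# curated per dataset.
ALIASES = {
    "gurgaon": "gurugram",
    "allahabad": "prayagraj",
}


def resolve_district(name, canonical, cutoff=0.85):
    """Map a raw name to a canonical lowercase name, or None if unresolved."""
    key = " ".join(name.lower().split())
    key = ALIASES.get(key, key)  # explicit aliases take precedence
    if key in canonical:
        return key
    # Fall back to the closest canonical name above the similarity cutoff.
    matches = difflib.get_close_matches(key, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Returning None rather than the nearest guess keeps low-confidence matches out of the map, so they can be reviewed and added to the alias map by hand.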
