FacilityLens — About the Project

Inspiration

The brief was a CSV of 10,088 Indian healthcare facilities. The challenge was to build something useful for the Virtue Foundation, which deploys surgical teams to underserved populations across 72 low-income countries. They had a spreadsheet. They needed a decision tool.

The gap between those two things is wider than it looks. A field coordinator sitting in front of a list of 10,000 hospitals faces a hard question: "Which of these can I actually trust?" A hospital that claims cardiac surgery capability and lists one doctor isn't obviously wrong — until you notice it also reports 1,200 beds. The data contradicts itself, and there's no flag. A planner who doesn't know to look for that will send a surgical team to the wrong place.

That was the first insight: the data's own internal contradictions are the most dangerous thing in it, not the missing values. Missing values are visible. Internal contradictions are invisible unless you look for them.

The second insight came from the medical desert problem. India has districts with zero matched facilities in this dataset. Our first instinct was to call them deserts. It would have been wrong — many are simply not crawled. There's a difference between "we know there's no hospital here" and "we don't know if there's a hospital here." Conflating them felt like the kind of thing that gets a surgical team sent to the wrong place, or worse, an underserved area dismissed because the data happens to look adequate on a map.

That two-axis thinking — numeric score and explicit evidence quality label — became the architectural spine of the whole project.

What it does

FacilityLens is a Databricks App that helps non-technical healthcare planners explore 10,088 Indian healthcare facilities across four integrated tracks:

Trust Desk — Every facility gets a trust score (0–100) derived from five signals: source domain diversity, record completeness, text specificity, digital activity, and authoritative sources. A 20-point consistency penalty flags records where the data actively contradicts itself (a 1,200-bed hospital reporting one doctor). Scores appear with colour-coded uncertainty badges and clickable source URLs — the underlying facility text is always visible, never hidden behind the score.

Medical Desert Planner — A district-level map that overlays health burden (anaemia, diabetes, hypertension, unmet family planning, institutional births, cervical screening from NFHS-5) against hospital supply (trusted facilities per 100,000 population from Census 2011). Districts are tiered as Confirmed Desert, Likely Desert, Data Gap, or Adequate — and "Data Gap" is rendered grey, not red, because 35% of districts have no matched facility in this dataset. "We don't know" is a more honest answer than "we know it's bad."

Referral Copilot — Semantic search over facility capability text (powered by Databricks Vector Search with databricks-gte-large-en embeddings) combined with trust score and haversine proximity into a single composite rank. An honesty strip always renders above results: it announces weak result sets, missing Green-band facilities, and zero-match queries instead of silently returning whatever it found.

Data Readiness Desk — A stewardship surface showing a pre-computed column audit (the dataset's nine worst quality problems), a flag inbox where planner-flagged records surface for triage, and a Genie Space for long-tail SQL questions. The audit trail is append-only; the inbox self-clears via a window function on the latest action per facility.

Planner actions (notes, shortlists, flags, overrides, referral decisions) persist to a single user_reviews table. Nothing is ephemeral.

How we built it

FacilityLens is structured around a medallion pipeline (Bronze → Silver → Gold) with a Dash frontend reading gold directly via the Databricks SQL Warehouse.

The Data Pipeline

Bronze is read-only: the raw Virtue Foundation dataset plus the India Post pincode directory and NFHS-5 district health indicators.

Silver is where we fought the data. The bronze facilities table stores missing values as the literal string "null" — not SQL NULL, not an empty string, the word null. Every query needs a triple guard:

WHERE col != 'null' AND TRIM(col) != '' AND col IS NOT NULL

Free-text fields (capability, procedure, equipment, specialties) are JSON arrays stored as strings. We parse them with TRY_CAST(FROM_JSON(col, 'array<string>') AS array<string>) — about 99.7% parse cleanly; the 0.3% that fail silently become NULL rather than crashing downstream queries.

The pincode → district join has a 7.7% fan-out problem: a single pincode maps to multiple districts in the India Post directory. We deduplicate to FIRST(district) per pincode and carry a geo_confidence column (matched / ambiguous / unmatched) to every downstream table so the UI can surface honest uncertainty at the row level.

District name reconciliation across three datasets (facilities → pincode directory → NFHS-5 → Census 2011) required a district_name_mapping override table. NFHS-5 district names have trailing whitespace. Census 2011 uses romanized spellings that differ from administrative records. We always TRIM() before joining and maintain a curated override CSV for the ~40 residual mismatches.

Gold is four analytical tables — one per track — plus the Vector Search source projection.

Track 1 — The Trust Formula

The trust score is a weighted sum of five signals derived from the data's own behavior, not from intuition:

$$\text{overall_score} = 100 \times \max!\left(0,\, \min!\left(1,\; 0.40 \cdot s_\text{domain}

0.25 \cdot s_\text{complete}
0.20 \cdot s_\text{text}
0.10 \cdot s_\text{activity}
0.05 \cdot s_\text{auth} \right)\right) - 20 \cdot \mathbf{1}[\text{consistency_flag}]$$

Where:

Signal	Definition	Why
$s_\text{domain}$	$\min(n_\text{distinct_domains} / 15,\; 1)$	Domain diversity is the hardest signal to inflate — each domain represents an independent source, not just another Practo listing
$s_\text{complete}$	$n_\text{fields_populated} / 5$	Quality-of-record, independent of provenance
$s_\text{text}$	Binary: 1 if description > 50 chars AND \	capability\
$s_\text{activity}$	$(\text{recency_score} + \text{social_score}) / 2$	Web presence; small weight because it measures visibility, not quality
$s_\text{auth}$	Binary: 1 if `.gov.in` / `.nic.in` / `pubmed` / `wikipedia` source present	Rare (12.4% of facilities), earns a small weight as a tiebreaker

The consistency penalty ($-20$ points applied outside the weighted sum) targets a specific data failure mode: large hospitals that report numberDoctors = 1 because that's the pipeline's default when no doctor count was scraped. A 1,200-bed hospital with one recorded doctor isn't "low confidence" — the record actively contradicts itself. The penalty is large enough to push such facilities reliably out of Green (≥ 70) into Amber (40–69), while keeping them visible so planners can investigate rather than ignore.

Track 2 — The Desert Planner

The desert score is a two-axis model:

$$\text{desert_score} = \text{health_burden_index} - \text{hospital_supply_index} \;\in\; [-1, +1]$$

$$\text{health_burden_index} = \frac{1}{6}!\left(\frac{\text{anaemia}}{100} + \frac{\text{diabetes}}{100} + \frac{\text{hypertension}}{100} + \left(1 - \frac{\text{institutional_births}}{100}\right) + \frac{\text{fp_unmet}}{100} + \left(1 - \frac{\text{cervical_screen}}{100}\right)\right)$$

$$\text{hospital_supply_index} = \min!\left(\frac{\text{hospitals_per_100k}}{5},\; 1\right)$$

The supply index counts only hospitals (not clinics or dentists) with trust_score ≥ 40. Phantom-data facilities don't count as supply.

The grey "Data Gap" tier is load-bearing. 244 of 706 NFHS-5 districts (35%) have no matched facility. Calling those deserts would be a lie. The grey tier says: "we don't know" — which is a more honest and more useful answer than a false red.

Track 3 — The Referral Copilot

The referral composite ranking:

$$\text{final_score} = 0.40 \cdot s_\text{trust} + 0.40 \cdot s_\text{match} + 0.20 \cdot s_\text{proximity} - 30 \cdot \mathbf{1}[\text{band}=\text{Red}] - 15 \cdot \mathbf{1}[\text{consistency_flag}]$$

Where $s_\text{proximity} = 100 \cdot e^{-d / 50}$ for haversine distance $d$ in kilometers (decay constant 50 km: a facility at 50 km scores ≈ 37/100).

Trust and vector match are weighted equally (both 0.40) because under-rewarding match collapses the referral page into "show the high-trust hospitals" — which is already what Track 1 does. The search is powered by Databricks Vector Search with a mandatory state pre-filter and an over-fetch of 30 results for rerank headroom. Everything else — distance ranking, snippet extraction, honesty strip — is deterministic Python. No LLM rerank. Defensible in one tooltip.

Track 4 — Data Readiness Desk

A stewardship surface with three panels: a pre-computed column audit (9 rows covering the dataset's worst quality problems), a flag inbox that closes the loop between planner-flagged records and steward triage, and a Genie Space for long-tail questions. The inbox is self-clearing via a single SQL window function: once a steward writes verify / needs_fix / duplicate for a facility, that becomes the latest action and the row drops out of the inbox.

Challenges we ran into

The string "null" problem. The dataset stores missing values as the string "null" — not SQL NULL. Standard IS NOT NULL checks pass. Every null-guarding filter had to be written explicitly as col != 'null' AND TRIM(col) != '' AND col IS NOT NULL. Missing this on even one column produces subtly wrong aggregations that are hard to spot.

unique_id is not unique. The field that looks like a primary key has 11 duplicates. We derive a synthetic PK (facility_pk = MD5(unique_id || name || zip)) in the silver layer and use it everywhere.

The consistency_flag edge case. 68 facilities have capacity ≥ 200 and numberDoctors BETWEEN 1 AND 3. The largest is Medanta — a 1,391-bed hospital reporting one doctor. This isn't noisy data; it's a specific pipeline artifact where the FDR aggregation defaults numberDoctors to 1 when no count is scraped. Treating it as a missing value (score unaffected) was wrong; treating it as a hard exclusion was too aggressive. The 20-point penalty outside the weighted sum was the right calibration.

Census 2011 coverage gaps. Rows 121–150 of the district population file are missing (a contiguous gap at mid-population ~2.7–3M range). We surface them as population_unknown in the data gap reason rather than silently zeroing the denominator.

Genie iframe CSP. The workspace's frame-ancestors policy blocks rendering cloud.databricks.com/genie/* inside a Databricks App (the app runs on databricksapps.com — different origin). Rather than chasing a workspace-admin exemption, Track 4's Genie panel renders an "Open Genie Space" button that opens the curated space in a new tab. Same flow, one extra click, no blocked iframe.

The VS endpoint provisioning timeline. Vector Search delta-sync on 9,863 rows with databricks-gte-large-en takes 5–10 minutes for the initial sync. Building Track 3 required having the full silver + gold pipeline landed first, then provisioning the endpoint, then waiting, then smoke-testing with hand-picked clinical queries (cardiac, dialysis, fistula repair, MRI, general surgery). The build order matters and the wait time is non-negotiable.

Accomplishments that we're proud of

The grey tier held. Every instinct — product instinct, design instinct, judge-pleasing instinct — was to colour low-supply districts red. We made 244 of them grey instead, with the label "Data Gap." That was the right call, and it held through every review. "We don't know" is a more credible answer than a false red, and credibility is what the Virtue Foundation actually needs.

The honesty strip. Most referral tools bury weak results. Ours announces them — above the fold, before the results render. Three distinct states: normal, weak result set, zero matches. It's the most important piece of UX in the app and it took maybe 40 lines of Python.

Source domain diversity as the primary trust signal. Early drafts used source count. We caught the problem (five aggregators inflate counts without adding independent evidence), rewrote the formula around domain diversity, and the trust scores became meaningfully better. That's not a minor tweak — it changed the rank order of thousands of facilities.

A complete four-track submission. Trust Desk, Desert Planner, Referral Copilot, Data Readiness — all four tracks landed, all four are functional, all four have live data behind them. Under hackathon time pressure, scope discipline is the hardest accomplishment. We shipped everything we said we would.

Every claim is cited. The rubric says "cite the underlying facility text." Every score, every ranking, every recommendation in FacilityLens traces back to a source URL and the raw claim text. No claim exists without evidence.

What we learned

The data's honesty constraints are features, not limitations. We spent the first half of the build trying to work around missing values and crawl gaps. By the end, we'd turned them into the most visible parts of the UI: the grey Data Gap tier, the uncertainty badge, the honesty strip, the consistency warning banner. The temptation to paper over weak signals with confident-looking numbers is strong; resisting it was the right call.

Source count is a terrible trust signal. Five aggregators — Practo, Facebook, JustDial, Sulekha, IndiaMart — account for roughly half of all source mentions in the dataset. A facility with 40 Practo-adjacent listings looks well-sourced in a raw count. Domain diversity catches this; source count doesn't. That insight cost us one full EDA pass and rewrote the trust formula.

District name reconciliation at scale is always the second hardest problem. The facilities table, the pincode directory, NFHS-5, and Census 2011 all spell Indian district names differently. No one dataset is authoritative. The solution (a curated district_name_mapping override table populated by fuzzy-join residue inspection) is boring and correct. Budget time for it; it will take twice as long as you think.

The pincode lat/lon in the India Post directory is unusable. 12,007 of 165,627 rows are the literal string 'NA'. Parseable values include extremes like 233M and coordinates outside India. We had to derive the haversine distance origin for Track 3 from the facility coordinates themselves — building a pincode_centroid table by averaging the lat/lon of all in-bounds facilities sharing a pincode. Self-consistent, ~3,217 rows, accurate enough for a 50 km decay constant.

Deterministic beats clever. Every place where we considered adding an LLM (query expansion, rerank, claim attribution, specialty tagging), the deterministic alternative was faster to build, easier to explain to a judge, and honestly often more accurate for this specific domain. We kept one LLM surface: the Vector Search embedding model for semantic clinical matching. That's the piece that genuinely needs it — "obstetric fistula repair" shouldn't require the planner to know what keywords are in the database.

What's next for FacilityLens

Real-time facility updates. The current pipeline is a point-in-time snapshot. The next version would run the FDR pipeline as a scheduled DLT job, incrementally merging new facility records and recalculating trust scores as source data changes. Planners would see a "last updated" timestamp on every record.

Expanded geographic coverage. The Virtue Foundation operates across 72 low-income countries. The medallion architecture and trust formula are dataset-agnostic — the same pipeline could ingest facility data from sub-Saharan Africa, Southeast Asia, or Latin America with minimal schema changes. The district name reconciliation problem gets harder, not easier, but the approach scales.

Collaborative triage workflows. The current user_reviews table supports individual planner actions. A team-facing version would add assignment, comment threads, and resolution status — turning the flag inbox into a proper stewardship queue for the Foundation's field coordinators.

Trust score feedback loop. When a planner marks a facility as verified or needs_fix after a site visit, that ground truth should flow back into the trust score. A small Bayesian update on completeness_score or authoritative_source_bonus for human-verified records would make the scores improve with use rather than staying static.

Richer health burden indicators. NFHS-5 covers six indicators. The next version would incorporate disease burden data (tuberculosis incidence, surgical case volume estimates) and infrastructure gaps (road access, nearest tertiary centre) to give the desert score more clinical grounding for the Foundation's specific mission.