# Inspiration

A healthcare planner, NGO coordinator, or analyst is handed 10,000 scraped, uneven healthcare-facility records. Half the "capabilities" are unverified claims, key fields are sparse (capacity is present in ~25% of rows, doctor counts in ~36%), and some values are physically impossible. The questions they actually need answered are simple to ask and hard to answer responsibly:

Can this facility really do what it claims?
Where are the real care gaps — not just the places we happened to measure poorly?
Where should this specific patient go right now?
And before anyone spends money: will building more facilities actually fix the health outcome, or is that just a correlation?

You cannot act on hope. You need evidence behind every claim, an honest confidence number, and the discipline to tell a correlation from a cause. That is facilitiesHelp.io.

Every number in the app is computed live from the complete dataset — no sampling, no placeholders, no fabricated statistics — and every claim cites the facility's own text.

# What it does — all 4 tracks + 2 bonus tracks, in one app

The hackathon offered four tracks and teams could pick just one. facilitiesHelp.io delivers all four, plus two bonus tracks, in a single coherent app where every answer carries both its evidence and its uncertainty.

## Track 1 — Facility Trust Desk (Can this facility do what it claims?)

Grades every specialty a facility lists — all 2,580 distinct specialties, 117,993 graded claims in total — as STRONG, PARTIAL, WEAK / SUSPICIOUS, or CLAIMED. Each claim shows a 0–100 confidence score, the ground-truth signals that produced the grade, the cited source text and links, a SHAP-style evidence attribution bar, an evidence→verdict graph, and a human override saved to an audit log (app_user_actions).

## Track 2 — Medical Desert Planner (Where are the real gaps, and how sure are we?)

Aggregates trust-weighted evidence to the district level across all 2,518 capabilities and, crucially, separates a real care gap from a data-poor district:

DATA-POOR — fewer than 3 facilities found: we don't know, and we say so.
APPARENT CARE GAP — facilities exist but fewer than 2 carry STRONG evidence.
EVIDENCED SUPPLY — enough corroborated capacity.

It includes a facilities table (capacity · doctors · equipment), an interactive map, and a causal layer that asks the honest planning question: will building more facilities actually move the outcome?
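The three district labels above reduce to two threshold checks. The following is a minimal sketch, not the app's actual code: the `DistrictSupply` row and its field names are hypothetical, and it assumes "enough corroborated capacity" means at least 2 facilities with STRONG evidence, which is the complement of the APPARENT CARE GAP rule.

```python
from dataclasses import dataclass

# Thresholds taken from the three rules listed for this track.
MIN_FACILITIES = 3  # fewer than this: DATA-POOR
MIN_STRONG = 2      # fewer than this: APPARENT CARE GAP


@dataclass
class DistrictSupply:
    """Hypothetical summary row for one district and one capability."""
    facilities_found: int
    strong_facilities: int


def classify_district(row: DistrictSupply) -> str:
    """Return the planner label for one district/capability pair."""
    # Check data coverage first, so a thinly measured district is never
    # reported as a care gap.
    if row.facilities_found < MIN_FACILITIES:
        return "DATA-POOR"
    if row.strong_facilities < MIN_STRONG:
        return "APPARENT CARE GAP"
    return "EVIDENCED SUPPLY"
```

The order of the checks matters: testing coverage before evidence is what keeps "we did not measure" separate from "care is missing".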

## Track 3 — Referral Copilot (Where should a patient go?)

A non-technical person doesn't need to know clinical terms. They can reach the right care three ways:

Describe it in plain words — "left foot hurting" → Orthopedic Surgery, "eye paining" → Ophthalmology (tested across 20 common complaints).
Upload a photo — a clinic board, a prescription, or even a photo of the condition itself (an infected eye, a swollen leg). Our multimodal Llama 4 Maverick model on Databricks Model Serving reads the image and routes to the right specialty — using the same serving endpoint as the text agents, no extra service.
Or pick directly from any of the 2,580 specialties.

A query such as "dialysis near Guntur" is then geocoded against the 165,627-office India-Post directory, so any Indian city or district resolves (not a hardcoded list). The result is an evidence-attached shortlist ranked by great-circle (haversine) distance, each entry with a one-tap Google Maps Directions link. It adds two things most referral tools don't:

"You may also need" — related capabilities ranked by facility co-occurrence P(B|A) (e.g., facilities offering Cardiology also offer Internal Medicine ~93% of the time).
Care pathway — start from a health concern (e.g., child stunting), get routed to the right specialist nearby, then see causally-related care as a clearly-labelled "maybe", driven by NFHS district correlations (stunting travels with child anaemia, r = +0.33; wasting, r = +0.25).
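The "You may also need" ranking rests on a conditional frequency: of the facilities offering A, what share also offer B. A minimal sketch of that calculation follows, assuming each facility is represented as a set of capability names. The function names and the input shape are illustrative, not the contents of gold_specialty_cooccur.

```python
from collections import Counter
from itertools import permutations


def cooccurrence(facility_caps: list[set[str]]) -> dict[tuple[str, str], float]:
    """Return P(B|A) for every ordered pair (A, B) seen at the same facility."""
    single = Counter()  # number of facilities offering A
    pair = Counter()    # number of facilities offering both A and B
    for caps in facility_caps:
        single.update(caps)
        pair.update(permutations(sorted(caps), 2))
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


def you_may_also_need(a: str, table: dict, top: int = 3) -> list[tuple[str, float]]:
    """Rank related capabilities for A by descending P(B|A), ties by name."""
    related = [(b, p) for (x, b), p in table.items() if x == a]
    return sorted(related, key=lambda item: (-item[1], item[0]))[:top]
```

Note that P(B|A) is not symmetric: a rare specialty can almost always sit alongside a common one without the reverse being true.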

Every clinical suggestion carries the caveat "population correlation ≠ individual diagnosis; not a medical diagnosis."
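The distance ranking used for the referral shortlist in this track can be sketched with the standard haversine formula. This is an illustrative version only: the facility dictionaries with `lat` and `lon` keys are a hypothetical shape, and the mean Earth radius is an assumed constant.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def rank_by_distance(origin: tuple[float, float], facilities: list[dict]) -> list[dict]:
    """Return facilities sorted nearest-first from origin (lat, lon)."""
    return sorted(
        facilities,
        key=lambda f: haversine_km(origin[0], origin[1], f["lat"], f["lon"]),
    )
```

Great-circle distance is a straight-line measure, so it will understate real travel distance by road.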

## Track 4 — Data Readiness Desk (What to fix before planning?)

Surfaces the structural problems that would silently corrupt any analysis — impossible values, internal contradictions, capacity recorded with zero doctors, and column-misaligned rows — ranked by review leverage. For any flagged record it shows the specific evidence and the likely root cause (e.g., column misalignment during scraping).
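A few of these checks can be written as simple per-record rules. The sketch below is an assumption-laden illustration: the field names `capacity` and `doctors`, the flag wording, and the idea that a non-numeric value in a numeric field hints at column misalignment are all hypothetical, and the "review leverage" ranking is not reproduced because the text does not define it.

```python
def _is_count(value) -> bool:
    """True for plain integers (bool is excluded even though it subclasses int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def readiness_flags(record: dict) -> list[str]:
    """Return human-readable data-quality flags for one facility record."""
    flags = []
    for name in ("capacity", "doctors"):
        value = record.get(name)
        if value is None:
            continue  # not reported: left as missing, never imputed
        if not _is_count(value):
            flags.append(f"{name}: non-numeric value, possible column misalignment")
        elif value < 0:
            flags.append(f"{name}: impossible negative value")
    capacity, doctors = record.get("capacity"), record.get("doctors")
    if _is_count(capacity) and _is_count(doctors) and capacity > 0 and doctors == 0:
        flags.append("capacity recorded with zero doctors")
    return flags
```

Each flag names the field and the reason, so a reviewer sees the specific evidence rather than a bare "invalid" marker.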

## Bonus — Ask the Data

One assistant for everything. It answers in plain language with cited evidence and an honest confidence level, renders the relevant chart, and — on demand — lets native Databricks AI/BI Genie write and run the SQL over the gold tables. It is grounded in the gold tables and refuses to invent numbers when it has no data. For real-world reach it answers in 11 languages (English, Hindi, Telugu, Tamil, Bengali, Marathi, Kannada, Gujarati, Spanish, French, Arabic) and can read the answer aloud (text-to-speech).

## Bonus — Data Science Lab

Interactive EDA across all three datasets, plus an interactive d3.js causal-graph studio: drag the nodes, switch between six causal modules (the full NFHS map, the wealth-confounding fork, the trust evidence→verdict graph, the care-pathway co-occurrence graph, supply-vs-demand, and the maternal/adolescent chain), and hover any edge to see the evidence behind it.

# The trust engine — how a grade is actually assigned

The rubric is deterministic and fully auditable — it reads only the facility's own record:

STRONG = a matching clinical specialty code AND equipment/procedure evidence in the free text AND a second independent source.
PARTIAL = exactly one type of evidence present.
WEAK / SUSPICIOUS = the claim contradicts the facility type (e.g., a small clinic claiming ICU/NICU).
CLAIMED = the capability appears only in a structured field, with no corroboration.
Confidence (0–100) = min(95, 70 + 5 × number of independent sources) — read as a probability the claim is genuine, not a guarantee.
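The confidence formula in the last rule is simple enough to state directly in code. This sketch covers only that formula. It does not implement the grade rubric, whose precedence between rules is not spelled out here.

```python
def confidence(independent_sources: int) -> int:
    """Confidence score from the rubric: min(95, 70 + 5 * independent sources)."""
    if independent_sources < 0:
        raise ValueError("independent_sources must be non-negative")
    return min(95, 70 + 5 * independent_sources)
```

As written, the formula yields values from 70 (no independent source) to a cap of 95 (five or more). How claims map onto the lower part of the 0–100 scale is not specified in this section.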

We validated that this grade reflects evidence, not reputation: a logistic model reproduces the grade from its evidence signals (5-fold AUC = 1.00), while a model predicting STRONG from facility metadata alone (size, doctor count, web presence) scores only AUC ≈ 0.57 — barely above chance. The grade is earned by cited evidence, not by hospital fame.
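To read those two AUC figures: AUC is the probability that a randomly chosen positive case (here, a STRONG claim) receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it from that definition in plain Python. It is a didactic stand-in, not the cross-validated logistic model described above.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Pairwise AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

On this scale 1.00 means the evidence signals separate the grades perfectly, and 0.5 means the scores carry no ranking information, which is why a metadata-only score near 0.57 counts as barely better than chance.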

# The causal layer — the part most teams skip

On 706 NFHS-5 districts we ran the full causal ladder rather than stopping at correlation: PC structure-learning → a Bayesian network → multi-method effect estimation (OLS with state fixed-effects, Double-ML, propensity-score matching) → E-value sensitivity analysis, plus statistical ML (penalized regression, multilevel models, GAMs, quantile regression, conformal prediction) and geometric deep learning (a spatial graph neural network).
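For readers unfamiliar with the last rung of that ladder, the E-value for a risk ratio RR is RR + sqrt(RR × (RR − 1)), computed after inverting RR when it is below 1. It is the minimum strength of association an unmeasured confounder would need with both exposure and outcome to fully explain away the estimate. The sketch below shows only this formula. How the project's estimates were converted to a risk-ratio scale is not stated, so no project numbers are used.

```python
import math


def e_value(risk_ratio: float) -> float:
    """E-value for a risk ratio; protective ratios (< 1) are inverted first."""
    if risk_ratio <= 0:
        raise ValueError("risk_ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))
```

An E-value of 1 means no unmeasured confounding is needed to explain the result. Larger values indicate an estimate that is harder to explain away.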

The honest headline:

Sanitation ↔ child stunting looks strong (r = −0.51) but collapses to ≈ 0 once we adjust for household wealth — confounded, not causal.
Female schooling → less child marriage (−0.65) survives adjustment — likely causal.
ANC4 antenatal visits → institutional birth (+0.60) survives — likely causal.
Women's overweight → high blood pressure (+0.57) strengthens under adjustment, so it is likely causal.
Facility count is only weakly linked to health outcomes after adjustment — so "build more" is often not the right lever; demand-side levers (schooling, antenatal care) frequently matter more.
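The sanitation result is an instance of a confounding fork: one variable drives both others, so they correlate without either causing the other. The sketch below reproduces that pattern on purely synthetic data using a first-order partial correlation. This is a simpler technique than the OLS, Double-ML, and matching estimators named above. The variable names mirror the example, but every number is generated, not drawn from NFHS.

```python
import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x: list[float], y: list[float], z: list[float]) -> float:
    """Correlation of x and y after linearly adjusting both for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Synthetic fork: wealth drives both variables; there is no direct link.
rng = random.Random(0)
wealth = [rng.gauss(0, 1) for _ in range(2000)]
sanitation = [w + rng.gauss(0, 1) for w in wealth]
stunting = [-w + rng.gauss(0, 1) for w in wealth]

raw = pearson(sanitation, stunting)                 # strongly negative
adjusted = partial_corr(sanitation, stunting, wealth)  # close to zero
```

In this construction the raw correlation is clearly negative while the adjusted one sits near zero, which is the signature the headline describes.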

# Solving the real supply–demand problem

Healthcare is a supply-and-demand problem: supply is the facilities and what they can actually do; demand is the population's real health needs. facilitiesHelp.io matches them honestly in three steps — (1) measure real supply, not claimed supply (the ML-validated trust grade); (2) find real demand gaps, not measurement gaps (DATA-POOR vs APPARENT CARE GAP); and (3) test whether adding supply actually helps (the causal layer — facility count is only weakly linked to outcomes). The answer is never just "build here," it's: here is where trusted care really exists, here are the real gaps, and here is whether more supply will move the outcome — so money goes to evidence, not hope.

# How we built it

Everything runs on Databricks Free Edition.

Data / medallion: a Unity Catalog medallion (bronze → silver → gold) on a serverless SQL Warehouse. Silver dedups facilities by cluster_id, assembles the evidence text, counts independent source types, and joins facilities → PIN → district.
Gold (the decision layer): gold_facility_capability_trust, gold_facility_specialty (2,580 specialties, 117,993 graded claims), gold_facility_contact, gold_all_gaps / gold_district_gaps (2,518 capabilities), gold_referral, gold_specialty_cooccur, gold_condition_corr, gold_data_readiness, gold_nfhs, gold_pin_state, and app_user_actions.
App: a single Streamlit Databricks App serving all six tracks, reliability-first — renders instantly from cached gold queries; every AI call is on-demand.
AI: Model Serving (Llama 4 Maverick) powers three text agents, a grounded copilot that refuses to fabricate numbers, an 11-language translator, and — through the same endpoint — multimodal vision that reads an uploaded photo and routes it to the right specialty. AI/BI Genie answers natural language as SQL.
Built with the Databricks agent skills (databricks aitools).
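The silver step described above (dedup by cluster_id, assemble evidence text, count independent source types) can be sketched in plain Python. The real pipeline runs on a SQL Warehouse, so this is only a logic illustration. The row keys `text` and `source_type`, the separator, and the output field names are hypothetical.

```python
from collections import defaultdict


def build_silver(rows: list[dict]) -> list[dict]:
    """Collapse scraped rows into one record per cluster_id."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["cluster_id"]].append(row)

    silver = []
    for cluster_id, members in clusters.items():
        # Concatenate the free text of every duplicate so citations survive dedup.
        texts = [m["text"] for m in members if m.get("text")]
        # Count distinct source types, not rows, so repeats do not inflate evidence.
        source_types = {m["source_type"] for m in members if m.get("source_type")}
        silver.append({
            "cluster_id": cluster_id,
            "evidence_text": " | ".join(texts),
            "independent_sources": len(source_types),
        })
    return silver
```

Counting distinct source types rather than rows is what stops a facility scraped five times from one directory from looking five times as well corroborated.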

# Challenges we ran into

The data is genuinely messy: duplicates, misspellings like pharmacy/farmacy, impossible bed counts, corrupt GPS, sparse fields. Our principle: flag, never hide; show "not reported," never silently impute.
Correlation ≠ causation. The sanitation↔stunting result is the perfect trap — strong and intuitive, but confounded by wealth. Building the full causal ladder was the hardest analytical work.
Keeping the LLM honest. We ground every answer in the gold tables and make the assistant refuse and ask for specifics when it has no matching data.
Rendering an interactive causal graph inside Databricks Apps. A CDN library was blocked by the app's content-security policy, so we inlined d3.js and built a draggable force-directed layout.

# Accomplishments that we're proud of

All four tracks plus two bonus tracks in one coherent, reliability-first app.
117,993 evidence-graded claims across 2,580 specialties — complete coverage, no sampling.
A validation that the grade reflects cited evidence, not hospital fame (metadata-only AUC ≈ 0.57).
A full causal layer on 706 NFHS districts (PC, OLS+FE, Double-ML, PSM, E-values), statistical ML (GAM, quantile, multilevel, conformal), and geometric deep learning (spatial GNN).
A care pathway that connects a health concern to the right specialist and to related care needs.
Multimodal & multilingual access — describe a symptom or upload a photo (vision via Llama 4 Maverick), answers in 11 languages, spoken aloud — one Databricks endpoint for text and images.
A fully Databricks-native, live app with every claim cited.

# What we learned

The hardest part of "for good" data work isn't prediction — it's honesty: showing the evidence, communicating uncertainty, and telling a correlation from a cause. The most valuable thing the app does is sometimes to say "we don't know" (DATA-POOR) instead of guessing.

# What's next for facilitiesHelp.io

Stronger, temporal causation — linking NFHS-4 (2015–16) to NFHS-5 (2019–21) turns our cross-section into a panel, enabling difference-in-differences, fixed-effects panels, and event-study designs.
Causal forecasting — Bayesian structural time-series to forecast outcomes and the counterfactual effect of a planned facility, plus demand forecasting.
Stronger identification — instrumental variables, regression-discontinuity, synthetic control.
Faster reads + exact citations — Lakebase (Postgres + pgvector).
Connectivity — FHIR interoperability, real-time gold refresh, bring-your-own-data, and low-connectivity (SMS / offline-first) delivery.
Sharper desert detection — 2SFCA accessibility scoring and facility-placement optimization.

# Built With

causal
computer-vision
databricks
databricks-ai-bi-genie
databricks-apps
databricks-model-serving
databricks-sql-warehouse
delta-lake
gnn
inference
lakebase
llama-4-maverick
llama-4-maverick-vision
ml
multimodal
nvidia
pandas
plotly
python
scikit-learn
statsmodels
streamlit
text-to-speech
unity-catalog

Updates

koushik Telaprolu started this project — Jun 16, 2026 04:43 PM EDT
