# Inspiration
A healthcare planner, NGO coordinator, or analyst is handed 10,000 scraped, uneven healthcare-facility records. Half the "capabilities" are unverified claims, key fields are sparse (capacity is present in ~25% of rows, doctor counts in ~36%), and some values are physically impossible. The questions they actually need answered are simple to ask and hard to answer responsibly:
- Can this facility really do what it claims?
- Where are the real care gaps — not just the places we happened to measure poorly?
- Where should this specific patient go right now?
- And before anyone spends money: will building more facilities actually fix the health outcome, or is that just a correlation?
You cannot act on hope. You need evidence behind every claim, an honest confidence number, and the discipline to tell a correlation from a cause. That is facilitiesHelp.io.
Every number in the app is computed live from the complete dataset — no sampling, no placeholders, no fabricated statistics — and every claim cites the facility's own text.
# What it does — all 4 tracks + 2 bonus tracks, in one app
The hackathon offered four tracks and teams could pick just one. facilitiesHelp.io delivers all four, plus two bonus tracks, in a single coherent app where every answer carries both its evidence and its uncertainty.
## Track 1 — Facility Trust Desk (Can this facility do what it claims?)
Grades every specialty a facility lists — all 2,580 distinct specialties, 117,993 graded claims in total — as STRONG, PARTIAL, WEAK / SUSPICIOUS, or CLAIMED. Each claim shows a 0–100 confidence score, the ground-truth signals that produced the grade, the cited source text and links, a SHAP-style evidence attribution bar, an evidence→verdict graph, and a human override saved to an audit log (app_user_actions).
## Track 2 — Medical Desert Planner (Where are the real gaps, and how sure are we?)
Trust-weighted aggregation to the district level across all 2,518 capabilities, which crucially separates a real care gap from a data-poor district:
- DATA-POOR — fewer than 3 facilities found: we don't know, and we say so.
- APPARENT CARE GAP — facilities exist but fewer than 2 carry STRONG evidence.
- EVIDENCED SUPPLY — enough corroborated capacity.
It includes a facilities table (capacity · doctors · equipment), an interactive map, and a causal layer that asks the honest planning question: will building more facilities actually move the outcome?
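The three district labels above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the app's own code: the function name, the constant names, and the argument names are hypothetical, and it assumes the counts are taken per district and capability.

```python
# Thresholds as stated in the text; the constant names are hypothetical.
MIN_FACILITIES = 3  # fewer than 3 facilities found -> DATA-POOR
MIN_STRONG = 2      # fewer than 2 facilities with STRONG evidence -> APPARENT CARE GAP


def classify_district(n_facilities: int, n_strong: int) -> str:
    """Label one district/capability cell.

    n_facilities: facilities found in the district (assumed: for this capability).
    n_strong: how many of those carry STRONG evidence for the capability.
    """
    # Data poverty is checked first, so a sparsely measured district is
    # never reported as a care gap.
    if n_facilities < MIN_FACILITIES:
        return "DATA-POOR"
    if n_strong < MIN_STRONG:
        return "APPARENT CARE GAP"
    return "EVIDENCED SUPPLY"


if __name__ == "__main__":
    for cell in [(2, 2), (5, 1), (5, 2)]:
        print(cell, "->", classify_district(*cell))
```

Checking the data-poor condition before the care-gap condition is what keeps "we don't know" distinct from "there is a gap".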
## Track 3 — Referral Copilot (Where should a patient go?)
A non-technical person doesn't need to know clinical terms. They can reach the right care three ways:
- Describe it in plain words — "left foot hurting" → Orthopedic Surgery, "eye paining" → Ophthalmology (tested across 20 common complaints).
- Upload a photo — a clinic board, a prescription, or even a photo of the condition itself (an infected eye, a swollen leg). Our multimodal Llama 4 Maverick model on Databricks Model Serving reads the image and routes to the right specialty — using the same serving endpoint as the text agents, no extra service.
- Or pick directly from any of the 2,580 specialties.
A query such as "dialysis near Guntur" is then geocoded against the 165,627-office India Post directory (any Indian city or district, not a hardcoded list) and returns an evidence-attached shortlist ranked by great-circle (haversine) distance, each entry with a one-tap Google Maps Directions link. It adds two things most referral tools don't:
- "You may also need" — related capabilities ranked by facility co-occurrence P(B|A) (e.g., facilities offering Cardiology also offer Internal Medicine ~93% of the time).
- Care pathway — start from a health concern (e.g., child stunting), get routed to the right specialist nearby, then see causally-related care as a clearly-labelled "maybe", driven by NFHS district correlations (stunting travels with child anaemia, r = +0.33; wasting, r = +0.25).
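The "You may also need" ranking rests on a conditional co-occurrence estimate, P(B|A) = (facilities offering both A and B) / (facilities offering A). A minimal sketch of that calculation follows; the function names and the four toy facilities are hypothetical and are not drawn from the dataset.

```python
from collections import Counter
from itertools import permutations


def cooccurrence(facility_specialties):
    """Return {(a, b): P(b | a)} estimated across facilities."""
    count_a = Counter()   # facilities offering a
    count_ab = Counter()  # facilities offering both a and b (ordered pairs)
    for specs in facility_specialties:
        unique = set(specs)
        count_a.update(unique)
        count_ab.update(permutations(sorted(unique), 2))
    return {(a, b): n / count_a[a] for (a, b), n in count_ab.items()}


def also_need(a, probs, top_k=3):
    """Rank related specialties b by P(b | a), highest first."""
    ranked = sorted(((p, b) for (x, b), p in probs.items() if x == a), reverse=True)
    return [(b, p) for p, b in ranked[:top_k]]


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    toy = [
        {"Cardiology", "Internal Medicine"},
        {"Cardiology", "Internal Medicine", "Radiology"},
        {"Cardiology"},
        {"Internal Medicine"},
    ]
    print(also_need("Cardiology", cooccurrence(toy)))
```

Note that P(B|A) is not symmetric: in the toy data every Radiology facility also offers Cardiology, but only one in three Cardiology facilities offers Radiology.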
Every clinical suggestion carries an explicit caveat: a population correlation is not an individual diagnosis, and nothing shown is a medical diagnosis.
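The distance-ranked shortlist can be sketched with the standard haversine formula. This is an illustration only: the function names, the `lat`/`lon` keys, and the sample coordinates are hypothetical, and the origin is assumed to have already been geocoded.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def rank_by_distance(origin, facilities):
    """Sort facility dicts (with 'lat' and 'lon' keys) by distance from origin."""
    lat0, lon0 = origin
    return sorted(facilities, key=lambda f: haversine_km(lat0, lon0, f["lat"], f["lon"]))


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    origin = (16.30, 80.40)
    shortlist = [
        {"name": "Facility A", "lat": 16.50, "lon": 80.60},
        {"name": "Facility B", "lat": 16.31, "lon": 80.45},
    ]
    for f in rank_by_distance(origin, shortlist):
        km = haversine_km(*origin, f["lat"], f["lon"])
        print(f["name"], round(km, 1), "km")
```

Great-circle distance is a straight-line measure, so it will understate real travel distance by road.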
## Track 4 — Data Readiness Desk (What to fix before planning?)
Surfaces the structural problems that would silently corrupt any analysis — impossible values, internal contradictions, capacity recorded with zero doctors, and column-misaligned rows — ranked by review leverage. For any flagged record it shows the specific evidence and the likely root cause (e.g., column misalignment during scraping).
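A few of these checks can be sketched as simple rules over one record. Everything specific here is an assumption for illustration: the field names (`capacity`, `doctors`, `lat`, `lon`), the flag names, the plausibility ceiling, and the use of flag count as a stand-in for "review leverage", whose actual definition is not given above.

```python
def readiness_flags(record, max_plausible_beds=10_000):
    """Return a list of structural problems found in one facility record.

    Missing values (None) are left alone: "not reported" is not an error.
    max_plausible_beds is a hypothetical ceiling, not a value from the project.
    """
    flags = []
    capacity = record.get("capacity")
    doctors = record.get("doctors")
    lat, lon = record.get("lat"), record.get("lon")

    if capacity is not None and (capacity < 0 or capacity > max_plausible_beds):
        flags.append("impossible_capacity")
    if doctors is not None and doctors < 0:
        flags.append("impossible_doctor_count")
    if capacity is not None and capacity > 0 and doctors == 0:
        flags.append("capacity_with_zero_doctors")
    if lat is not None and lon is not None:
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            flags.append("corrupt_gps")
    return flags


def rank_for_review(records):
    """Keep only flagged records, most flags first (a stand-in for review leverage)."""
    flagged = [(readiness_flags(r), r) for r in records]
    flagged = [(f, r) for f, r in flagged if f]
    return sorted(flagged, key=lambda pair: len(pair[0]), reverse=True)


if __name__ == "__main__":
    sample = [
        {"capacity": 50, "doctors": 0},
        {"capacity": None, "doctors": None},
        {"capacity": -5, "doctors": 3, "lat": 95.0, "lon": 80.0},
    ]
    for flags, rec in rank_for_review(sample):
        print(flags, rec)
```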
## Bonus — Ask the Data
One assistant for everything. It answers in plain language with cited evidence and an honest confidence level, renders the relevant chart, and — on demand — lets native Databricks AI/BI Genie write and run the SQL over the gold tables. It is grounded in the gold tables and refuses to invent numbers when it has no data. For real-world reach it answers in 11 languages (English, Hindi, Telugu, Tamil, Bengali, Marathi, Kannada, Gujarati, Spanish, French, Arabic) and can read the answer aloud (text-to-speech).
## Bonus — Data Science Lab
Interactive EDA across all three datasets, plus an interactive d3.js causal-graph studio: drag the nodes, switch between six causal modules (the full NFHS map, the wealth-confounding fork, the trust evidence→verdict graph, the care-pathway co-occurrence graph, supply-vs-demand, and the maternal/adolescent chain), and hover any edge to see the evidence behind it.
# The trust engine — how a grade is actually assigned
The rubric is deterministic and fully auditable — it reads only the facility's own record:
- STRONG = a matching clinical specialty code AND equipment/procedure evidence in the free text AND a second independent source.
- PARTIAL = exactly one type of evidence present.
- WEAK / SUSPICIOUS = the claim contradicts the facility type (e.g., a small clinic claiming ICU/NICU).
- CLAIMED = the capability appears only in a structured field, with no corroboration.
- Confidence (0–100) = min(95, 70 + 5 × number of independent sources) — read as a probability the claim is genuine, not a guarantee.
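The rubric and the confidence formula can be sketched as two small functions. The confidence formula is taken directly from the bullet above. The grade function is only one possible reading: the argument names are hypothetical, and the rubric as written does not say which rule wins when several apply or how a claim with exactly two evidence types is graded. This sketch assumes that a contradiction takes precedence and that two evidence types are graded PARTIAL.

```python
def confidence(n_independent_sources: int) -> int:
    """Confidence as stated in the rubric: min(95, 70 + 5 * sources)."""
    return min(95, 70 + 5 * n_independent_sources)


def grade(has_specialty_code: bool,
          has_text_evidence: bool,
          has_second_source: bool,
          contradicts_facility_type: bool) -> str:
    """One possible reading of the rubric (see the assumptions in the lead-in)."""
    # ASSUMPTION: a contradiction with the facility type overrides other evidence.
    if contradicts_facility_type:
        return "WEAK / SUSPICIOUS"
    n_types = sum([has_specialty_code, has_text_evidence, has_second_source])
    if n_types == 3:
        return "STRONG"
    # ASSUMPTION: the rubric says "exactly one"; two types are also treated
    # as PARTIAL here because they fall short of STRONG.
    if n_types >= 1:
        return "PARTIAL"
    return "CLAIMED"


if __name__ == "__main__":
    for n in range(0, 7):
        print(n, "sources ->", confidence(n))
```

As written, the formula yields values from 70 (no independent sources) up to the cap of 95, which is reached at five sources.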
We validated that this grade reflects evidence, not reputation: a logistic model reproduces the grade from its evidence signals (5-fold AUC = 1.00), while a model predicting STRONG from facility metadata alone (size, doctor count, web presence) scores only AUC ≈ 0.57 — barely above chance. The grade is earned by cited evidence, not by hospital fame.
# The causal layer — the part most teams skip
On 706 NFHS-5 districts we ran the full causal ladder rather than stopping at correlation: PC structure-learning → a Bayesian network → multi-method effect estimation (OLS with state fixed-effects, Double-ML, propensity-score matching) → E-value sensitivity analysis, plus statistical ML (penalized regression, multilevel models, GAMs, quantile regression, conformal prediction) and geometric deep learning (a spatial graph neural network).
The honest headline:
- Sanitation ↔ child stunting looks strong (r = −0.51) but collapses to ≈ 0 once we adjust for household wealth — confounded, not causal.
- Female schooling → less child marriage (−0.65) survives adjustment — likely causal.
- ANC4 antenatal visits → institutional birth (+0.60) survives — likely causal.
- Women's overweight → high blood pressure (+0.57) strengthens under adjustment, so it is likely causal.
- Facility count is only weakly linked to health outcomes after adjustment — so "build more" is often not the right lever; demand-side levers (schooling, antenatal care) frequently matter more.
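One simple way to see how a strong raw correlation can vanish under adjustment is the first-order partial correlation, r_xy·z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). This is a simpler stand-in for intuition, not the estimators listed above (OLS with fixed effects, Double-ML, matching). Only the −0.51 figure comes from the text; the two wealth correlations in the example are hypothetical values chosen to illustrate the mechanism.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after linearly removing z from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    r_sanitation_stunting = -0.51  # raw correlation reported in the text
    r_sanitation_wealth = 0.80     # HYPOTHETICAL, for illustration only
    r_stunting_wealth = -0.64      # HYPOTHETICAL, for illustration only
    adjusted = partial_corr(r_sanitation_stunting,
                            r_sanitation_wealth,
                            r_stunting_wealth)
    print(f"raw r = {r_sanitation_stunting:+.2f}, adjusted for wealth = {adjusted:+.3f}")
```

When the product of the two correlations with the confounder is close to the raw correlation, the numerator shrinks toward zero, which is the pattern of a fork (wealth drives both sanitation and stunting) rather than a direct effect.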
# Solving the real supply–demand problem
Healthcare is a supply-and-demand problem: supply is the facilities and what they can actually do; demand is the population's real health needs. facilitiesHelp.io matches them honestly in three steps — (1) measure real supply, not claimed supply (the ML-validated trust grade); (2) find real demand gaps, not measurement gaps (DATA-POOR vs APPARENT CARE GAP); and (3) test whether adding supply actually helps (the causal layer — facility count is only weakly linked to outcomes). The answer is never just "build here," it's: here is where trusted care really exists, here are the real gaps, and here is whether more supply will move the outcome — so money goes to evidence, not hope.
# How we built it
Everything runs on Databricks Free Edition.
- Data / medallion: a Unity Catalog medallion (bronze → silver → gold) on a serverless SQL Warehouse. Silver deduplicates facilities by cluster_id, assembles the evidence text, counts independent source types, and joins facilities → PIN → district.
- Gold (the decision layer): gold_facility_capability_trust, gold_facility_specialty (2,580 specialties, 117,993 graded claims), gold_facility_contact, gold_all_gaps / gold_district_gaps (2,518 capabilities), gold_referral, gold_specialty_cooccur, gold_condition_corr, gold_data_readiness, gold_nfhs, gold_pin_state, and app_user_actions.
- App: a single Streamlit Databricks App serving all six tracks, built reliability-first: it renders instantly from cached gold queries, and every AI call is on-demand.
- AI: Model Serving (Llama 4 Maverick) powers three text agents, a grounded copilot that refuses to fabricate numbers, an 11-language translator, and, through the same endpoint, multimodal vision that reads an uploaded photo and routes it to the right specialty. AI/BI Genie turns natural-language questions into SQL.
- Built with the Databricks agent skills (databricks aitools).
# Challenges we ran into
- The data is genuinely messy: duplicates, misspellings like pharmacy/farmacy, impossible bed counts, corrupt GPS, sparse fields. Our principle: flag, never hide; show "not reported," never silently impute.
- Correlation ≠ causation. The sanitation↔stunting result is the perfect trap — strong and intuitive, but confounded by wealth. Building the full causal ladder was the hardest analytical work.
- Keeping the LLM honest. We ground every answer in the gold tables and make the assistant refuse and ask for specifics when it has no matching data.
- Rendering an interactive causal graph inside Databricks Apps. A CDN library was blocked by the app's content-security policy, so we inlined d3.js and built a draggable force-directed layout.
# Accomplishments that we're proud of
- All four tracks plus two bonus tracks in one coherent, reliability-first app.
- 117,993 evidence-graded claims across 2,580 specialties — complete coverage, no sampling.
- A validation that the grade reflects cited evidence, not hospital fame (metadata-only AUC ≈ 0.57).
- A full causal layer on 706 NFHS districts (PC, OLS+FE, Double-ML, PSM, E-values), statistical ML (GAM, quantile, multilevel, conformal), and geometric deep learning (spatial GNN).
- A care pathway that connects cause → related needs.
- Multimodal & multilingual access — describe a symptom or upload a photo (vision via Llama 4 Maverick), answers in 11 languages, spoken aloud — one Databricks endpoint for text and images.
- A fully Databricks-native, live app with every claim cited.
# What we learned
The hardest part of "for good" data work isn't prediction — it's honesty: showing the evidence, communicating uncertainty, and telling a correlation from a cause. The most valuable thing the app does is sometimes to say "we don't know" (DATA-POOR) instead of guessing.
# What's next for facilitiesHelp.io
- Stronger, temporal causation — linking NFHS-4 (2015–16) to NFHS-5 (2019–21) turns our cross-section into a panel, enabling difference-in-differences, fixed-effects panels, and event-study designs.
- Causal forecasting — Bayesian structural time-series to forecast outcomes and the counterfactual effect of a planned facility, plus demand forecasting.
- Stronger identification — instrumental variables, regression-discontinuity, synthetic control.
- Faster reads + exact citations — Lakebase (Postgres + pgvector).
- Connectivity — FHIR interoperability, real-time gold refresh, bring-your-own-data, and low-connectivity (SMS / offline-first) delivery.
- Sharper desert detection — 2SFCA accessibility scoring and facility-placement optimization.
# Built With
- causal
- computer-vision
- databricks
- databricks-ai-bi-genie
- databricks-apps
- databricks-model-serving
- databricks-sql-warehouse
- delta-lake
- gnn
- inference
- lakebase
- llama-4-maverick
- llama-4-maverick-vision
- ml
- multimodal
- nvidia
- pandas
- plotly
- python
- scikit-learn
- statsmodels
- streamlit
- text-to-speech
- unity-catalog