Inspiration
We kept running into the same problem: a healthcare planner trying to pick facilities has tons of data and no way to trust it. Scraped listings make a hospital that writes "world-class, state-of-the-art care" look identical to one that actually documents "1.5T MRI, 22-bed ICU, caesarean section." We wanted to give planners one place to see every facility in their district, judge whether claims are backed by real evidence, verify what matters, and fix what's wrong — instead of trusting marketing copy.
What it does
The Facility Trust Desk scores Indian healthcare facilities on documented clinical substance, not how good their description sounds. Each facility–capability pair gets an explainable trust score, a STRONG → NO_CLAIM signal, a dashboard to explore by state/district, a plain-English Genie search, and an approval workflow so verifiers and managers can authenticate records and improve accuracy over time.
How we built it
- The LLM substance score is a weighted average of the four 0/1 evidence flags:
$$S_{llm} = 0.30 \cdot f_{equip} + 0.25 \cdot f_{proc} + 0.30 \cdot f_{cap} + 0.15 \cdot f_{desc}$$
The final score blends substance, social presence, and human verification:
$$Final = 0.50 \cdot S_{llm} + 0.20 \cdot S_{social} + 0.30 \cdot S_{human}$$
When human verification is unavailable, we re-normalise over the remaining weights so the score still spans 0–100%:
$$Final_{no_human} = \frac{0.50 \cdot S_{llm} + 0.20 \cdot S_{social}}{0.70}$$
All weights live in code, never inferred by the model, so scores are reproducible and auditable.
Challenges we ran into
- "Substantive" isn't "has a number." Our first prompt scored "ranked #1, 10,000+ patients" as strong and missed "performs haemodialysis." We rewrote the definition around clinical specificity.
- ai_query doesn't exist on Databricks Free Edition — our SQL scoring broke, so we moved to the Python SDK serving-endpoint API, which worked.
- We ran out of free compute mid-run, which forced us to test on small samples and design the pipeline to run in resumable batches. ## What we learned For a trust tool, over-crediting a marketed record is worse than under-crediting a vague one, so we kept the LLM's job small and auditable. We also learned Genie is only as good as its grounding — the column comments and ranking rules are what make its answers reliable — and that data completeness and data credibility are genuinely different problems.
Built With
- databrickapps
- databrickssdk
- genie
Log in or sign up for Devpost to join the conversation.