Inspiration

India's healthcare system faces a silent crisis beneath the data: government and NGO facility registries are full of self-reported capabilities ("We have an ICU," "We offer Maternity care") with no verification behind them. When a patient in a rural district is directed to the nearest hospital during an emergency, the listing may be a lie. Equipment inventories, staff records, and procedure histories often flatly contradict what a facility claims to offer. We wanted to build a system that treats data quality as a patient-safety issue and fixes it at scale.

What it does

ArogyaTrust is an AI-powered healthcare facility trust platform built on Databricks.

Trust Scoring — For every facility in the Virtue Foundation registry, it cross-references the facility's description, specialty list, equipment inventory, and procedure records using a weighted keyword-matching engine. It then calls Meta-Llama-3.1-70B-Instruct to generate per-capability verdicts ("True / False / Unclear") backed by exact supporting phrases from the facility's own data. Each facility is assigned a trust tier: Strong Evidence, Partial Evidence, or Weak/Suspicious.
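The keyword-matching step above can be illustrated with a minimal sketch. The source weights, keyword list, and tier thresholds below are assumptions chosen for illustration; the project's actual values are not stated, and the LLM verdict step is omitted.

```python
import re

# Hypothetical weights per evidence source (assumed, not the project's real values).
SOURCE_WEIGHTS = {
    "equipment": 0.4,
    "procedures": 0.3,
    "specialties": 0.2,
    "description": 0.1,
}

# Hypothetical keyword list for a single capability.
CAPABILITY_KEYWORDS = {
    "icu": ["icu", "intensive care", "critical care", "ventilator"],
}


def mentions(text, keyword):
    """True if a word in `text` starts with `keyword` (so 'ventilators' matches,
    but 'difficult' does not match 'icu')."""
    return re.search(r"\b" + re.escape(keyword), text.lower()) is not None


def capability_score(capability, facility):
    """Sum the weights of every evidence source that mentions the capability."""
    keywords = CAPABILITY_KEYWORDS[capability]
    score = 0.0
    for source, weight in SOURCE_WEIGHTS.items():
        text = facility.get(source, "")
        if any(mentions(text, kw) for kw in keywords):
            score += weight
    return score


def trust_tier(score, strong=0.6, partial=0.3):
    """Map a score in [0, 1] to one of the three tiers (thresholds are assumed)."""
    if score >= strong:
        return "Strong Evidence"
    if score >= partial:
        return "Partial Evidence"
    return "Weak/Suspicious"


facility = {
    "description": "24x7 hospital with ICU",
    "specialties": "General Medicine",
    "equipment": "2 ventilators, ECG machine",
    "procedures": "",
}
tier = trust_tier(capability_score("icu", facility))
```

In this sketch a claim supported only by the facility's own description scores low, while agreement from the equipment inventory or procedure records moves it toward a stronger tier.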

Referral Copilot & Human Overrides — A natural-language facility search resolves a geographic anchor from free text (e.g., "Maternity hospital near Patna") and ranks nearby facilities by a composite of trust score and proximity. Field reviewers can flag AI-detected contradictions and submit structured corrections that write back to Delta Lake, closing the human-in-the-loop feedback cycle.
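One way the composite ranking described above could work is sketched below. The 70/30 weighting, the 50 km search radius, the linear proximity falloff, and the field names (`trust`, `lat`, `lon`) are all assumptions for illustration, not the project's confirmed formula.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def rank_facilities(anchor, facilities, max_km=50.0, trust_weight=0.7):
    """Rank facilities within `max_km` of `anchor` by a weighted blend of
    trust score (0..1) and proximity (1 at the anchor, 0 at `max_km`)."""
    ranked = []
    for f in facilities:
        dist = haversine_km(anchor[0], anchor[1], f["lat"], f["lon"])
        if dist > max_km:
            continue  # outside the search radius
        proximity = 1.0 - dist / max_km
        composite = trust_weight * f["trust"] + (1.0 - trust_weight) * proximity
        ranked.append({**f, "distance_km": dist, "composite": composite})
    return sorted(ranked, key=lambda x: x["composite"], reverse=True)


patna = (25.5941, 85.1376)  # approximate coordinates of the resolved anchor
candidates = [
    {"name": "A", "lat": 25.5941, "lon": 85.1376, "trust": 0.2},
    {"name": "B", "lat": 25.6800, "lon": 85.1376, "trust": 0.9},
    {"name": "C", "lat": 27.0000, "lon": 85.1376, "trust": 1.0},
]
results = rank_facilities(patna, candidates)
```

Here facility B outranks the closer facility A because its trust score dominates the blend, and C is dropped for being outside the radius.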

How we built it

We built a three-stage medallion ETL pipeline in Databricks notebooks using PySpark and Delta Lake.

The app itself is served as a Databricks App — a Streamlit frontend backed by a FastAPI REST API querying Delta Lake via the Databricks SQL Warehouse Statement Execution API. Interactive maps are built with PyDeck and Plotly Express.
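As a rough illustration of the backend query path, the sketch below builds (but does not send) a request for the Statement Execution API's `POST /api/2.0/sql/statements` endpoint. The table name `gold.facility_trust`, its columns, and the `min_trust` parameter are hypothetical; the host, token, and warehouse ID are placeholders.

```python
import json
import urllib.request


def build_statement_request(host, token, warehouse_id, min_trust):
    """Build a Statement Execution API request that selects facilities at or
    above a trust threshold. Table and column names are hypothetical."""
    payload = {
        "warehouse_id": warehouse_id,
        "statement": (
            "SELECT facility_id, trust_tier "
            "FROM gold.facility_trust "
            "WHERE trust_score >= :min_trust"
        ),
        # Named parameter bound to the :min_trust marker in the statement.
        "parameters": [
            {"name": "min_trust", "value": str(min_trust), "type": "DOUBLE"}
        ],
        "wait_timeout": "30s",
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/statements",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_statement_request(
    host="example.cloud.databricks.com",  # placeholder workspace host
    token="dapi-placeholder",             # placeholder access token
    warehouse_id="abc123",                # placeholder warehouse ID
    min_trust=0.6,
)
```

Passing the threshold as a named parameter rather than formatting it into the SQL string keeps user-supplied values out of the statement text. Sending the request would be a matter of calling `urllib.request.urlopen(req)` against a real workspace.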

What's next for ArogyaTrust

Continuous trust decay — Trust scores should degrade over time if a facility hasn't been re-verified. We want to add a recency-weighted scoring pass that penalizes stale or unconfirmed data.

Photograph and document verification — Integrate facility-uploaded images or PDF equipment certificates into the evidence pipeline via a vision model, adding a third verification channel beyond text.

State health ministry dashboards — Package the district-level risk data as a read-only dashboard for state health departments to use for resource allocation planning.

Expanded coverage — The trust engine is dataset-agnostic. We want to run it against other public-health facility registries across South and Southeast Asia.
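The continuous trust decay idea could take several forms. One simple option, sketched here purely as an assumption, is exponential decay with a configurable half-life; the 180-day default and the function name are hypothetical, and this pass is not yet part of the project.

```python
from datetime import date


def decayed_trust(score, last_verified, today, half_life_days=180.0):
    """Halve the trust score for every `half_life_days` elapsed since the
    facility was last verified. Dates in the future are treated as age zero."""
    age_days = max((today - last_verified).days, 0)
    return score * 0.5 ** (age_days / half_life_days)


fresh = decayed_trust(0.8, date(2024, 6, 29), date(2024, 6, 29))
stale = decayed_trust(0.8, date(2024, 1, 1), date(2024, 6, 29))
```

With these inputs a facility verified the same day keeps its full score, while one last verified 180 days earlier drops to half of it.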
