Medical Desert Planner

Inspiration

600 million Indians live in districts where healthcare supply doesn't match health need. But existing gap analyses hide data quality issues, treat noisy data as ground truth, and give planners overconfident recommendations. We wanted to build a tool that says "we're not sure" when the data is weak — because in healthcare planning, a confident wrong answer is worse than an honest uncertainty flag.

What it does

Scores every Indian district using a per-capita gap metric: health need (NFHS-5 z-scored composite) minus facility supply per 100K people
Displays an interactive map of 706 districts colored by care-gap severity
Provides confidence flags (🟢 measured / 🟡 real but sparse / 🔴 low confidence) on every district
Surfaces the actual facility records (name, capability, specialties) behind each score — not just numbers
Includes an AI Analyst (Llama 3.3 70B via Databricks Foundation Models) that answers natural-language questions with cited evidence
Persists planner actions (shortlists, notes) for workflow continuity
Audits its own data quality in a Data Readiness tab

How we built it

Data: Virtue Foundation FDR facilities dataset (6,663 records) + NFHS-5 district health indicators (706 districts, 12 indicators) + Census 2011 population + India Post pincode directory for geocoding
Scoring: Z-scored need composite minus per-capita supply, with scipy cKDTree for facility-to-district geocoding
Stack: Python, Streamlit, Plotly, Pandas, PyArrow
AI Agent: Databricks Foundation Model serving (databricks-meta-llama-3-3-70b-instruct) called via REST API with managed identity auth
Data layer: Databricks Unity Catalog for live facility queries, pre-computed parquets for district gaps
Deployment: Databricks Apps (zero-infrastructure managed hosting)
Persistence: SQLite (portable) with Lakebase/Postgres-ready upsert layer

Challenges we ran into

Facility dataset has messy address_stateOrRegion values (cities mixed with states) — solved with geocoding by coordinates instead of text matching
14.5MB facility parquet exceeded Databricks workspace 10MB file limit — pivoted to live Unity Catalog SQL queries at runtime
Databricks Apps CSP blocks external map tile CDNs — switched from interactive MapLibre to Plotly's built-in scatter_geo
SDK's serving_endpoints.query() had a serialization bug with dict messages — bypassed with direct REST API calls using SDK-managed auth
67 post-2011 split districts have no Census population — rather than guessing, we flag them separately and rank by need only

Accomplishments we're proud of

Honest uncertainty: Every district has a confidence flag; we never present weak evidence as fact
Every claim is cited: Drill into any score and see the NFHS indicators + facility records behind it
61 million people identified in the 40 worst-scored districts — actionable for planners
AI grounded in data: The LLM agent gets 160+ real facility records as context, not just vibes
Full data audit: The app scores its own datasets for completeness and known biases

What we learned

Data quality IS the product in healthcare planning — surfacing uncertainty builds more trust than hiding it
Pre-computing scores in parquet + live-querying details from Unity Catalog is the right split for performance
Databricks Apps managed identity auth requires REST API calls (SDK has edge cases)
Per-capita normalization is essential — without it, large states always "win" the gap ranking regardless of actual density

What's next

Add public PHC/CHC facility data (government sources) for a complete supply picture
State-level GeoJSON choropleth for visual policy reports
Multi-scenario planning: "What if we add 5 facilities to district X?"
Integration with Databricks Workflows for automated monthly data refresh

Built With

databricks
databricks-apps
foundation-models
llama-3.3-70b
pandas
plotly
pyarrow
python
scipy
streamlit
unity-catalog

Submitted to

Databricks Apps & Agents for Good Hackathon 2026

Created by

Worked on the infra for the hackathon (GH etc.) and Databricks for the first time (beyond the tutorial). Hustled with the Virtue Foundation, to understand the use case, with Databricks mentor team for the issues encountered and the MLH team.

Nitin Muppalaneni
Sri Harsha Madireddy
Darren Moore
goddarm74 Mike Goddard

Updates

Sri Harsha Madireddy started this project — Jun 16, 2026 05:50 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.