Inspiration
Every downstream step depends on the quality of the data beneath it. We built this after seeing firsthand how messy healthcare data can be.
What it does
Data Readiness Desk builds a heuristic trust score that triages a scraped, LLM-assembled dataset of ~10,000 hospitals, judging whether each facility truly offers the specialties it claims. It flags the merged, mislabeled, and unsupported records, ranks them by impact, and routes them to human review — with an AI assistant for plain-English questions.
How we built it
A deterministic Python pipeline cleans the data and computes a transparent, claim-level trust score, while a Databricks LLM step extracts the supporting evidence. It's served from Lakebase (Postgres) through a Databricks App, with a Genie + web-search OpenAI assistant on top.
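To make "transparent, claim-level" concrete, here is a minimal sketch of how such a score could be computed. The signal names, the weights, and the `Claim` and `score_claim` names are all assumptions made for illustration; they are not the project's actual code. The idea shown is that each claim (one facility, one specialty) gets a weighted sum of signals, and the per-signal contributions are returned with the total so a reviewer can see where every point came from.

```python
from dataclasses import dataclass, field

# Hypothetical signals and weights, chosen only for illustration.
WEIGHTS = {"evidence": 0.5, "source_agreement": 0.3, "completeness": 0.2}


@dataclass
class Claim:
    """One facility/specialty pair: the unit that gets scored."""
    facility_id: str
    specialty: str
    signals: dict = field(default_factory=dict)  # each value assumed to be in [0, 1]


def score_claim(claim):
    """Return (score, breakdown); missing signals contribute 0."""
    breakdown = {
        name: weight * claim.signals.get(name, 0.0)
        for name, weight in WEIGHTS.items()
    }
    return round(sum(breakdown.values()), 3), breakdown


if __name__ == "__main__":
    example = Claim("h-001", "cardiology", {"evidence": 1.0, "source_agreement": 0.5})
    total, parts = score_claim(example)
    print(total, parts)
```

Returning the breakdown next to the total is what keeps the number auditable: the same dictionary can be stored with the record and shown in the review UI.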
Challenges we ran into
The hardest part was keeping the score auditable instead of a black box, and stopping unsupported "list-only" claims from scoring as trustworthy. We also navigated a mid-build switch to Lakebase Autoscaling and a Postgres dialect bug that only surfaced after deployment.
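One way to stop "list-only" claims from scoring as trustworthy is a hard cap: a claim with no supporting evidence text cannot exceed a fixed ceiling, however strong its other signals are. The sketch below assumes that approach. The cap value, the function name, and the status labels are hypothetical, and the project may handle this differently.

```python
# Hypothetical ceiling for claims that appear only in a specialty list.
LIST_ONLY_CAP = 0.4


def apply_list_only_cap(raw_score, evidence_snippets):
    """Cap the score when no non-empty evidence snippet backs the claim.

    Returns (score, status) so the reason for any cap is recorded.
    """
    has_evidence = any(snippet.strip() for snippet in evidence_snippets)
    if has_evidence:
        return raw_score, "evidence_backed"
    return min(raw_score, LIST_ONLY_CAP), "list_only_capped"


if __name__ == "__main__":
    print(apply_list_only_cap(0.9, []))
    print(apply_list_only_cap(0.9, ["Cardiac cath lab staffed 24/7"]))
```

Recording the status alongside the score keeps the cap visible to reviewers instead of silently lowering the number.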
Accomplishments that we're proud of
A trust score that's both explainable and grounded in real evidence, with a coherence signal that catches merged records other approaches miss. And a fully working end-to-end Databricks app — Genie, Lakebase, and an OpenAI agent — deployed and live.
What we learned
Choosing the right unit of analysis (the claim, not the row) was most of the battle, and letting coherence gate everything turned "looks sketchy" into a precise signal. We also learned to use the LLM only where it is strong, which is reading messy text, and to keep the scoring math transparent.
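As an illustration of coherence acting as a gate, the sketch below assumes each record carries a coherence value in [0, 1] and that a record below a threshold has all of its claims sent to review, regardless of their individual scores. The threshold, the function name, and the output fields are assumptions, and how coherence itself is computed is deliberately left out.

```python
# Hypothetical threshold below which a record is treated as possibly merged.
COHERENCE_FLOOR = 0.6


def gate_by_coherence(claim_scores, coherence):
    """Apply the record-level coherence gate to every claim on the record.

    claim_scores maps specialty -> claim-level score for one facility record.
    """
    if coherence < COHERENCE_FLOOR:
        # An incoherent record may mix several facilities, so none of its
        # claim scores are trusted; everything is routed to human review.
        return {
            specialty: {"score": 0.0, "status": "needs_review", "reason": "low_coherence"}
            for specialty in claim_scores
        }
    return {
        specialty: {"score": score, "status": "scored", "reason": None}
        for specialty, score in claim_scores.items()
    }


if __name__ == "__main__":
    scores = {"cardiology": 0.82, "oncology": 0.35}
    print(gate_by_coherence(scores, coherence=0.3))
    print(gate_by_coherence(scores, coherence=0.9))
```

Gating at the record level rather than adjusting each claim separately means a single merged record cannot pass with a few high-scoring claims.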
What's next for Data Readiness Desk
Next up are continuous reverse-ETL refresh and field-level correction capture in the reviewer flow. Ultimately, we want to feed accepted reviewer decisions back into the source to close the data-quality loop.
Built With
- agents
- apps
- databricks
- genie
- lakebase