Data Readiness Desk

Inspiration

Every downstream step depends on the data underneath it. This project grew out of our own experience with how messy healthcare data can be.

What it does

Data Readiness Desk builds a heuristic trust score that triages a scraped, LLM-assembled dataset of ~10,000 hospitals, judging whether each facility truly offers the specialties it claims. It flags the merged, mislabeled, and unsupported records, ranks them by impact, and routes them to human review — with an AI assistant for plain-English questions.

How we built it

A deterministic Python pipeline cleans the data and computes a transparent, claim-level trust score, while a Databricks LLM step extracts the supporting evidence. It's served from Lakebase (Postgres) through a Databricks App, with a Genie + web-search OpenAI assistant on top.
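As a rough illustration of what a transparent, claim-level score could look like, the Python sketch below sums a few named signals with fixed weights and returns the per-signal breakdown alongside the total. The signal names, weights, and the `Claim` shape are hypothetical placeholders, not the values used in the project.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical signal weights; the project's real signals and weights are not shown here.
WEIGHTS = {
    "evidence_found": 0.5,    # the LLM step extracted supporting text for the claim
    "source_agreement": 0.3,  # more than one source mentions the specialty
    "field_complete": 0.2,    # required facility fields are populated
}


@dataclass
class Claim:
    facility_id: str
    specialty: str
    signals: dict  # signal name -> value in [0.0, 1.0]


def score_claim(claim: Claim) -> tuple[float, dict]:
    """Return (total score, per-signal contributions) so every point is traceable."""
    contributions = {
        name: weight * float(claim.signals.get(name, 0.0))
        for name, weight in WEIGHTS.items()
    }
    return round(sum(contributions.values()), 4), contributions


claim = Claim("H-001", "cardiology", {"evidence_found": 1.0, "field_complete": 0.5})
total, breakdown = score_claim(claim)
```

Returning the breakdown with the total is what keeps a score like this auditable: a reviewer can see which signal contributed which share without rerunning anything.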

Challenges we ran into

The hardest part was keeping the score auditable instead of a black box, and stopping unsupported "list-only" claims from scoring as trustworthy. We also navigated a mid-build switch to Lakebase Autoscaling and a Postgres dialect bug that only surfaced after deployment.
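One way to keep list-only claims from looking trustworthy is to cap their score below the trusted threshold whenever no supporting evidence was extracted. The sketch below assumes a hypothetical threshold, cap, and function name; the project's actual rule may differ.

```python
from __future__ import annotations

TRUSTED_THRESHOLD = 0.7  # hypothetical cutoff for "trustworthy"
LIST_ONLY_CAP = 0.4      # hypothetical ceiling for claims with no evidence


def apply_list_only_cap(raw_score: float, evidence_snippets: list[str]) -> float:
    """Cap the score when a specialty appears only in a list, with no supporting text."""
    has_evidence = any(snippet.strip() for snippet in evidence_snippets)
    if has_evidence:
        return raw_score
    return min(raw_score, LIST_ONLY_CAP)


supported = apply_list_only_cap(0.9, ["24/7 cardiac catheterization lab on site"])
list_only = apply_list_only_cap(0.9, [])
```

Because the cap sits below the threshold, a claim with no evidence cannot cross into the trusted band no matter how strong its other signals are.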

Accomplishments that we're proud of

We're proud of a trust score that's both explainable and grounded in real evidence, with a coherence signal that catches merged records other approaches miss. We also shipped a fully working end-to-end Databricks app (Genie, Lakebase, and an OpenAI agent), deployed and live.

What we learned

Choosing the right unit of analysis (the claim, not the row) was most of the battle, and letting coherence gate everything turned "looks sketchy" into a precise signal. We also learned to use the LLM only where it's strong, which is reading messy text, and to keep the scoring math transparent.
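The two ideas above, scoring claims rather than rows and letting coherence gate everything, can be sketched as follows. A facility row is expanded into one record per claimed specialty, and a record-level coherence value scales every claim score, so a row that looks like two merged hospitals drags all of its claims down. The field names and the multiplicative gate are assumptions made for illustration.

```python
from __future__ import annotations


def explode_claims(row: dict) -> list[dict]:
    """Turn one facility row into one record per claimed specialty."""
    return [
        {"facility_id": row["facility_id"], "specialty": s, "coherence": row["coherence"]}
        for s in row["specialties"]
    ]


def gated_score(claim_score: float, coherence: float) -> float:
    """Scale a claim score by record coherence (0.0 = incoherent, 1.0 = coherent)."""
    coherence = max(0.0, min(1.0, coherence))  # clamp out-of-range inputs
    return claim_score * coherence


row = {"facility_id": "H-002", "specialties": ["oncology", "pediatrics"], "coherence": 0.25}
claims = explode_claims(row)
scores = [gated_score(0.8, c["coherence"]) for c in claims]
```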

What's next for Data Readiness Desk

Next up are continuous reverse-ETL refresh and field-level correction capture in the reviewer flow. Ultimately, we want to feed accepted reviewer decisions back into the source to close the data-quality loop.

Built With

agents
apps
databricks
genie
lakebase

Updates

Praveen Sundaresan Ramesh started this project — Jun 16, 2026 05:51 PM EDT
