Inspiration

Real-world healthcare data is messy, incomplete, and often outdated. The Virtue Foundation challenge inspired us to tackle a critical problem: helping experts decide which healthcare facility records can be trusted, since those records ultimately shape patient access to care in low- and middle-income countries.

While AI agents can rapidly gather evidence and context from fragmented data sources, decisions that affect real people should remain in human hands. We believe the best outcomes come from combining AI-driven investigation with expert oversight, enabling decision-makers to act with greater confidence and transparency.

What it does

Facility Trust Operating System (FTOS) is a Databricks application that transforms messy healthcare facility records into trusted, actionable data.

The system automatically cleans, enriches, and verifies facility records, then presents recommendations for human review. Users can choose to:

  • Keep records as-is
  • Enhance records with additional information
  • Merge duplicate facilities
  • Purge erroneous records
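The four review actions can be modeled as a small enum plus a pure function that applies a reviewer's choice. The sketch below is illustrative only: `ReviewAction`, `FacilityRecord`, and `apply_action` are hypothetical names, and the merge rule (the surviving record wins conflicts, the duplicate only fills gaps) is an assumption rather than FTOS's documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from enum import Enum
from typing import Optional


class ReviewAction(Enum):
    KEEP = "keep"
    ENHANCE = "enhance"
    MERGE = "merge"
    PURGE = "purge"


@dataclass(frozen=True)
class FacilityRecord:
    facility_id: str
    name: str
    fields: dict = field(default_factory=dict)


def apply_action(
    records: dict[str, FacilityRecord],
    action: ReviewAction,
    facility_id: str,
    *,
    enhancements: Optional[dict] = None,
    duplicate_of: Optional[str] = None,
) -> dict[str, FacilityRecord]:
    """Return a new record mapping with a reviewer's decision applied.

    The input mapping is never mutated, so the pre-decision state can be
    kept for auditing.
    """
    updated = dict(records)
    if action is ReviewAction.KEEP:
        return updated
    if action is ReviewAction.ENHANCE:
        rec = updated[facility_id]
        # Newly supplied values override existing values for the same field.
        updated[facility_id] = replace(rec, fields={**rec.fields, **(enhancements or {})})
    elif action is ReviewAction.MERGE:
        if duplicate_of is None or duplicate_of == facility_id:
            raise ValueError("MERGE needs a different surviving facility_id")
        duplicate = updated.pop(facility_id)
        survivor = updated[duplicate_of]
        # The surviving record wins conflicts; the duplicate only fills gaps.
        updated[duplicate_of] = replace(survivor, fields={**duplicate.fields, **survivor.fields})
    elif action is ReviewAction.PURGE:
        del updated[facility_id]
    return updated
```

Returning a new mapping instead of editing in place keeps every decision reversible until it is persisted, which fits the human-in-the-loop review flow.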

During verification, a supervisory agent coordinates five specialized agents that independently evaluate trustworthiness across five domains:

  1. Source Authority
  2. Website Presence
  3. Contact Information
  4. Contextual Consistency
  5. Social Signals
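The fan-out from the supervisory agent to the five domain agents can be sketched as a parallel dispatch that collects one finding per domain. This is a hypothetical illustration: the real agents are LLM-backed investigators, whereas `field_presence_agent` below is a placeholder that only checks whether a single field is populated, and the field names (`source_url`, `social_links`, and so on) and scores are assumptions.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    domain: str
    score: float      # 0.0 = strong evidence against trust, 1.0 = strong evidence for it
    rationale: str


DomainAgent = Callable[[dict], Finding]


def field_presence_agent(domain: str, field_name: str) -> DomainAgent:
    """Placeholder for an LLM-backed domain agent: only checks one field."""
    def agent(record: dict) -> Finding:
        present = bool(record.get(field_name))
        status = "present" if present else "missing"
        return Finding(domain, 0.8 if present else 0.4, f"{field_name} {status}")
    return agent


# Hypothetical field names; the real agents gather external evidence.
DOMAIN_AGENTS: dict[str, DomainAgent] = {
    "source_authority": field_presence_agent("source_authority", "source_url"),
    "website_presence": field_presence_agent("website_presence", "website"),
    "contact_information": field_presence_agent("contact_information", "phone"),
    "contextual_consistency": field_presence_agent("contextual_consistency", "specialties"),
    "social_signals": field_presence_agent("social_signals", "social_links"),
}


def supervise(record: dict, agents: dict[str, DomainAgent]) -> dict[str, Finding]:
    """Run every domain agent on the same record in parallel and collect findings."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(agent, record) for name, agent in agents.items()}
        return {name: future.result() for name, future in futures.items()}
```

Each agent sees only the record, not the other agents' output, which keeps the five evaluations independent before the supervisor synthesizes them.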

The platform provides full transparency into each agent's findings, historical review decisions, and validation logs. Users can contribute additional evidence, defer decisions, or rerun validations before making a final determination.
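One way to support deferring, adding evidence, and rerunning validations while preserving history is an append-only event log per facility. The sketch below is an in-memory stand-in for whatever persistent store (such as Lakebase) holds this history in the app; the event kinds and the `ReviewLog` class are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical event kinds covering validations and reviewer actions.
EVENT_KINDS = {"validation_run", "evidence_added", "deferred", "decided"}


@dataclass(frozen=True)
class ReviewEvent:
    facility_id: str
    kind: str
    detail: str
    at: datetime


@dataclass
class ReviewLog:
    """Append-only history of validations and reviewer actions per facility."""
    events: list[ReviewEvent] = field(default_factory=list)

    def history(self, facility_id: str) -> list[ReviewEvent]:
        return [e for e in self.events if e.facility_id == facility_id]

    def status(self, facility_id: str) -> str:
        hist = self.history(facility_id)
        if not hist:
            return "unreviewed"
        # The most recent event determines where the record stands.
        return {"decided": "final", "deferred": "deferred"}.get(hist[-1].kind, "in_review")

    def record(self, facility_id: str, kind: str, detail: str = "") -> None:
        if kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        if self.status(facility_id) == "final":
            raise ValueError(f"{facility_id} already has a final decision")
        self.events.append(ReviewEvent(facility_id, kind, detail, datetime.now(timezone.utc)))
```

Because events are never overwritten, a reviewer can see exactly which evidence and which validation run preceded each final decision.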

How we built it

We began with extensive exploratory data analysis to understand the structure of the facility records and identify common data quality issues.
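A first pass at this kind of analysis is often a per-field missingness profile. The sketch below assumes records are plain dictionaries and treats `None`, blank strings, and empty collections as missing; the field names in any call are hypothetical, and this is not the team's actual EDA code.

```python
from __future__ import annotations


def is_missing(value: object) -> bool:
    """Treat None, blank strings, and empty collections as missing.

    Numeric zero is a real value (e.g. zero doctors), not a gap.
    """
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missingness_profile(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of records in which each field is absent or empty, worst first."""
    if not records:
        raise ValueError("no records to profile")
    rates = {
        f: sum(is_missing(r.get(f)) for r in records) / len(records)
        for f in fields
    }
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))
```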

From there, we mapped the factors that contribute to trustworthiness and grouped them into five evaluation domains: source authority, website presence, contact information, contextual consistency, and social signals. We developed a specialized agent for each domain, iteratively refining its instructions and evaluation criteria to improve the quality and consistency of its investigations.

Once the domain agents were performing reliably, we built a supervisory agent to orchestrate investigations and synthesize findings. Finally, we leveraged Lakebase and Databricks Apps to create a user-friendly interface that supports human review and decision-making.

Challenges we ran into

Handling incomplete data proved to be one of our biggest challenges. Missing information is not always a sign of an untrustworthy record. For example, a missing doctor count could indicate that the facility information is outdated, but it could also result from a temporary data collection or parsing failure.

Rather than treating missing values as inherently negative, we developed a contextual approach that evaluates missingness based on:

  • The likely causes of missing data for a specific field
  • The presence or absence of supporting information in related fields

This allowed us to make more nuanced trust assessments while avoiding unfair penalties for incomplete records.
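The contextual approach can be sketched as per-field rules: a base penalty that reflects how likely a gap in that field is to signal stale data (rather than a collection or parsing failure), scaled down by how many related fields are populated. The rule values and field names below are hypothetical, chosen only to illustrate the mechanism.

```python
from __future__ import annotations

from dataclasses import dataclass


def is_missing(value: object) -> bool:
    """Treat None, blank strings, and empty collections as missing."""
    return value is None or (isinstance(value, str) and not value.strip()) or (
        isinstance(value, (list, tuple, set, dict)) and len(value) == 0)


@dataclass(frozen=True)
class MissingnessRule:
    field_name: str
    # Penalty when the field is missing and nothing corroborates the record.
    # Higher for fields whose absence more often signals stale data than a
    # collection or parsing failure.
    base_penalty: float
    # Fields whose presence suggests the gap is benign.
    related_fields: tuple[str, ...] = ()


def missingness_penalty(record: dict, rule: MissingnessRule) -> float:
    if not is_missing(record.get(rule.field_name)):
        return 0.0
    if not rule.related_fields:
        return rule.base_penalty
    present = sum(not is_missing(record.get(f)) for f in rule.related_fields)
    support = present / len(rule.related_fields)
    # Full support from related fields removes the penalty entirely.
    return rule.base_penalty * (1.0 - support)


# Hypothetical rules and weights for illustration only.
RULES = [
    MissingnessRule("doctor_count", 0.3, ("specialties", "bed_count")),
    MissingnessRule("website", 0.2, ("phone", "email")),
]


def total_missingness_penalty(record: dict, rules: list[MissingnessRule] = RULES) -> float:
    return sum(missingness_penalty(record, r) for r in rules)
```

Under these assumed rules, a facility with no doctor count but with listed specialties and a bed count is not penalized at all, while an otherwise empty record receives the full base penalty.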

Accomplishments that we're proud of

Our biggest achievement was successfully implementing a supervisory-agent architecture that delegates investigations to specialized agents and consolidates their findings into a single recommendation. This approach produced explainable results, improved coverage across trust domains, and kept humans in control of final decisions. It also demonstrated the potential of multi-agent systems for complex data quality workflows.
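Consolidation can be sketched as a weighted average of per-domain scores mapped to one of the review actions, with the weakest domains surfaced for explainability and the result always left pending human review. The thresholds, equal default weights, and the `duplicate_of` hint (assumed to come from a separate duplicate-detection step) are hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical thresholds on a 0..1 trust score.
KEEP_THRESHOLD = 0.75
ENHANCE_THRESHOLD = 0.4


def consolidate(
    scores: dict[str, float],
    weights: Optional[dict[str, float]] = None,
    duplicate_of: Optional[str] = None,
) -> dict:
    """Combine per-domain scores into one recommendation for a human reviewer."""
    if not scores:
        raise ValueError("no domain findings to consolidate")
    weights = weights or {domain: 1.0 for domain in scores}
    total_weight = sum(weights[d] for d in scores)
    trust = sum(scores[d] * weights[d] for d in scores) / total_weight

    if duplicate_of is not None:
        action = "merge"
    elif trust >= KEEP_THRESHOLD:
        action = "keep"
    elif trust >= ENHANCE_THRESHOLD:
        action = "enhance"
    else:
        action = "purge"

    return {
        "trust_score": round(trust, 3),
        "recommended_action": action,
        # Surface the two weakest domains so the reviewer sees why.
        "weakest_domains": sorted(scores, key=scores.get)[:2],
        "status": "awaiting_human_review",
    }
```

The function only recommends; nothing is applied until a reviewer confirms the action.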

What we learned

This project gave us hands-on experience designing and orchestrating supervisory agents, refining agent instructions through iterative testing, and building production-ready applications on Databricks. We also gained valuable insights into Lakebase, Databricks Apps, and the challenges of evaluating trustworthiness when data is sparse, noisy, or incomplete.

What's next

We hope to continue improving the agents through larger and more diverse datasets, including facilities from additional countries beyond India. Future enhancements include expanding the evidence sources available to agents, improving trust-scoring accuracy, and incorporating expert feedback loops that continuously improve the system's recommendations over time.
