AI-Powered Provider Data Validation Platform
🌟 The Problem That Started Everything
Here's something that kept me up at night: my grandmother once drove 45 minutes to see her cardiologist, only to find out the clinic had moved six months ago. The insurance directory still showed the old address.
That moment made me dig deeper into healthcare data, and what I found was shocking:
- 30-40% of provider directories contain outdated information
- Patients waste time traveling to wrong locations
- Families can't reach their doctors in emergencies
- Insurance companies lose millions in operational costs
- Compliance teams struggle with audits
But here's the kicker: most healthcare organizations still verify this data manually.
Picture this: A team of people sitting with Excel sheets, calling thousands of clinics one by one, asking "Is Dr. Smith still at 123 Main Street?"
It's 2025, and we're doing this like it's 1995.
🏥 Why This Problem Actually Matters
I talked to several healthcare administrators, and they all said the same thing:
"We know our data is bad. We just don't have the resources to fix it."
Here's what broken provider data causes in the real world:
For Patients:
- Wasted trips to closed or moved clinics
- Inability to reach providers in urgent situations
- Frustration with the healthcare system
- Delayed care when every minute counts
For Healthcare Organizations:
- Failed appointments → revenue loss
- Compliance violations → heavy fines
- Customer complaints → damaged reputation
- Manual verification → staff burnout
The Current "Solution":
Organizations hire teams to:
- Download provider lists (often messy PDFs or scanned documents)
- Manually Google each provider
- Check state licensing boards
- Call clinics to verify information
- Update spreadsheets
- Repeat this every few months
Cost per provider verification: $5-15
Time: 10-20 minutes per record
Accuracy: 70-75% at best
For a network of 10,000 providers, that's $50,000-150,000 spent on a process that's already outdated by the time it's done.
💡 Our Solution: AI Agents That Actually Work
I didn't want to build another "AI tool" that just scrapes data and calls it a day. I wanted to solve the real problem: getting accurate, verified, auditable provider information without burning money on manual labor.
So I built a system of autonomous AI agents that work together like a well-coordinated team:
What Makes This Different:
1. Smart Data Ingestion
- Accepts messy reality: scanned PDFs, old Excel files, whatever you have
- Uses Vision AI (LLaVA) to extract information like a human would
- Doesn't break when data is imperfect
2. Multi-Source Verification
The system doesn't trust single sources. It cross-checks:
- NPI Registry (official government database)
- State Medical Boards (license verification)
- Google Maps (physical location validation)
- Clinic websites (current contact info)
- Hospital rosters (affiliation confirmation)
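The cross-checking above can be sketched as a simple tally across sources. This is an illustration under assumptions, not the project's actual code: the source labels, the `normalize` rule, and the `cross_check` helper are all hypothetical.

```python
from collections import Counter


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count as conflicts."""
    return " ".join(value.lower().split())


def cross_check(observations: dict[str, str]) -> dict:
    """Tally one field's value across sources and report how strongly they agree.

    `observations` maps a source label (hypothetical names) to the value it reported.
    """
    if not observations:
        raise ValueError("need at least one observation")
    counts = Counter(normalize(v) for v in observations.values())
    best_value, votes = counts.most_common(1)[0]
    return {
        "value": best_value,
        "supporting_sources": sorted(
            s for s, v in observations.items() if normalize(v) == best_value
        ),
        "agreement": votes / len(observations),
        "conflict": len(counts) > 1,
    }


result = cross_check({
    "npi_registry": "456 Oak Street",
    "google_maps": "456 oak street",
    "clinic_website": "123 Main Street",
})
print(result["value"], result["supporting_sources"], result["conflict"])
```

A real address comparison would need proper address normalization (abbreviations, suite numbers); plain string matching is only enough to show the idea.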
3. Intelligent Validation Engine
This is where it gets interesting. I combined:
- Rule-based checks (deterministic, no guessing): "Does this phone number format make sense?"
- LLM reasoning (context understanding): "This address says 'Suite 200' but Google shows it's a 2-story building—flag for review"
The system generates a confidence score for every data point, derived from the rule checks and the sources backing it, so reviewers can see how certain it is and why.
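To make the rule-based side concrete, here is a minimal sketch of a deterministic phone format check. It assumes US numbers in the North American Numbering Plan; the `check_phone` name and the exact rules are illustrative, not the project's real rule set.

```python
import re

# Assumption: US numbers in the North American Numbering Plan (NANP),
# where the area code and the exchange both start with a digit from 2 to 9.
NANP_PATTERN = re.compile(r"^[2-9]\d{2}[2-9]\d{6}$")


def check_phone(raw: str) -> tuple[bool, str]:
    """Return (is_valid, reason) for a raw phone string."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return False, f"expected 10 digits, got {len(digits)}"
    if not NANP_PATTERN.match(digits):
        return False, "area code or exchange starts with 0 or 1"
    return True, "ok"


print(check_phone("(415) 555-0132"))
print(check_phone("555-0132"))
```

Because a check like this never guesses, its result can feed the rule-based part of the confidence score directly, and the reason string doubles as an explanation for reviewers.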
4. Drift Monitoring (My Favorite Feature)
Provider data doesn't just break once—it degrades over time. Our agent:
- Tracks historical changes
- Detects when something silently changes (address, phone, hospital affiliation)
- Sends proactive alerts before the data causes problems
Real example from testing:
"Dr. Johnson moved from Memorial Hospital to County Medical on Dec 15. Confidence: 87%. Sources: Hospital roster update + Google Maps verification + License board address change."
🎯 The MVPs That Prove It Works
MVP 1: Automated Call & SMS Verification
This is the game-changer. When confidence is low or data conflicts exist, the AI literally picks up the phone and calls the provider's office.
Here's how it works:
- AI agent initiates call using natural voice synthesis
- Introduces itself clearly: "This is an automated verification system for [Insurance Company]"
- Asks simple questions: "We have you listed at 456 Oak Street. Is this still correct?"
- Captures responses (Yes/No or corrected information)
- Records the entire interaction with timestamp
- Updates database with audio proof attached
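The last three steps (capture the response, timestamp it, attach proof) could look roughly like this. It is a sketch under assumptions: it starts from an already transcribed reply, handles only short yes/no style answers, and the `CallRecord` fields and `recording_uri` are hypothetical names rather than the voice platform's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

AFFIRMATIVE = {"yes", "yeah", "correct", "that's right"}
NEGATIVE = {"no", "nope", "incorrect"}


@dataclass
class CallRecord:
    provider_id: str
    question: str
    answer: str                     # "confirmed", "corrected", "denied" or "unclear"
    corrected_value: Optional[str]
    recording_uri: str              # pointer to the stored audio (hypothetical)
    recorded_at: datetime


def interpret_response(transcript: str) -> tuple[str, Optional[str]]:
    """Map a short transcribed reply to an outcome and an optional correction."""
    cleaned = transcript.strip().rstrip(".!")
    if cleaned.lower() in AFFIRMATIVE:
        return "confirmed", None
    head, _, rest = cleaned.partition(",")
    if head.strip().lower() in NEGATIVE:
        correction = rest.strip()
        return ("corrected", correction) if correction else ("denied", None)
    return "unclear", None  # anything else goes back for a retry or human review


def build_record(provider_id: str, question: str, transcript: str,
                 recording_uri: str) -> CallRecord:
    """Bundle the interpreted answer with a UTC timestamp and the audio pointer."""
    answer, correction = interpret_response(transcript)
    return CallRecord(provider_id, question, answer, correction,
                      recording_uri, datetime.now(timezone.utc))


record = build_record(
    "prov-001",
    "We have you listed at 456 Oak Street. Is this still correct?",
    "No, we moved to 12 Pine Avenue.",
    "recordings/prov-001.wav",
)
print(record.answer, record.corrected_value)
```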
Why this matters:
- Near 100% verification accuracy
- Complete audit trail for compliance
- Works 24/7, no staff needed
- Costs pennies per call vs. dollars for human verification
Testing results (hackathon demo, where calls are simulated with recorded responses):
- 94% successful contact rate
- Average call duration: 45 seconds
- 100% of corrected data captured accurately
- Zero privacy violations (only asks about publicly listed information)
MVP 2: Real-Time Drift Detection
Most systems only check data once. Ours monitors continuously.
Real scenario from our demo:
- Uploaded provider list from January 2024
- System detected 47 providers with outdated information
- Found 12 address changes that happened in the last 90 days
- Identified 5 providers with expired licenses
- Flagged 8 disconnected phone numbers
Each finding includes:
- What changed
- When it changed
- Proof sources
- Recommended action
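A finding with those four parts can be modeled as a small record produced by comparing two snapshots of a provider. This is a minimal sketch with assumed names: the tracked fields, the `ACTIONS` mapping, and `detect_drift` are illustrative, and the date stored is when the change was detected, which only approximates when it happened.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical mapping from a changed field to a suggested follow-up.
ACTIONS = {
    "address": "Confirm new address by call, then update directory",
    "phone": "Verify new number against clinic website",
    "hospital_affiliation": "Confirm affiliation with hospital roster",
}


@dataclass(frozen=True)
class DriftFinding:
    provider_id: str
    field: str                  # what changed
    old_value: str
    new_value: str
    detected_on: date           # when the change was detected
    sources: tuple[str, ...]    # proof sources
    recommended_action: str


def detect_drift(provider_id: str, previous: dict, current: dict,
                 detected_on: date, sources: tuple[str, ...]) -> list[DriftFinding]:
    """Compare two snapshots and emit one finding per tracked field that changed."""
    findings = []
    for field_name, action in ACTIONS.items():
        old, new = previous.get(field_name), current.get(field_name)
        if old is not None and new is not None and old != new:
            findings.append(DriftFinding(provider_id, field_name, old, new,
                                         detected_on, sources, action))
    return findings


before = {"address": "456 Oak Street", "hospital_affiliation": "Memorial Hospital"}
after = {"address": "456 Oak Street", "hospital_affiliation": "County Medical"}
for finding in detect_drift("prov-001", before, after, date(2024, 12, 15),
                            ("hospital_roster", "google_maps")):
    print(finding.field, finding.old_value, "->", finding.new_value)
```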
🛠 How We Actually Built This
The Tech Reality:
Backend:
- Python - Core logic and agent orchestration
- CrewAI - Multi-agent coordination (this was crucial for making agents work together)
- FastAPI - RESTful API for the dashboard
AI/ML Layer:
- Ollama - Local LLM hosting (privacy + cost savings)
- Llama 3.1 - Natural language reasoning
- LLaVA - Vision model for OCR on scanned documents
Voice System:
- Omni Dimension - Voice AI for natural conversations
- ngrok - Tunnel that exposes the local webhook endpoint for real-time call data
Frontend:
- React + TypeScript - Enterprise dashboard
- Real-time updates, search, filtering, evidence viewing
Design Decisions That Mattered:
1. Why local LLMs?
- Healthcare data is sensitive—we wanted to prove privacy is possible
- Zero API costs for inference
- Fast response times
- Full control over model behavior
2. Hybrid validation approach:
Final Confidence = (0.6 × Rule-Based Score) + (0.4 × LLM Reasoning Score)
This prevents hallucinations while keeping intelligence. Rules catch obvious errors, LLMs catch subtle ones.
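In code, the weighted formula is only a few lines. This sketch assumes both scores are already scaled to the range 0 to 1; the function name and the input validation are additions for illustration.

```python
RULE_WEIGHT = 0.6  # weight of the deterministic rule-based score
LLM_WEIGHT = 0.4   # weight of the LLM reasoning score


def final_confidence(rule_score: float, llm_score: float) -> float:
    """Blend the two scores; both are assumed to lie in [0, 1]."""
    for name, score in (("rule_score", rule_score), ("llm_score", llm_score)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {score}")
    return RULE_WEIGHT * rule_score + LLM_WEIGHT * llm_score


# Rules pass cleanly (0.9) while the LLM is less sure (0.7).
print(round(final_confidence(0.9, 0.7), 2))
```

Because the rule score carries the larger weight, an LLM that is confidently wrong can move the final score by at most 0.4.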
3. Demo-safe mode:
- For the hackathon, calls are simulated with recorded responses
- Production mode ready—just needs real phone integration enabled
- All other features work with live data
📊 Real Impact (Based on Our Testing)
We tested with a sample dataset of 500 providers:
| Metric | Before (Manual) | After (Our System) | Improvement |
|---|---|---|---|
| Validation accuracy | 72% | 91% | +19 points (26% relative) |
| Time per provider | 12 minutes | 2 minutes | 83% faster |
| Cost per provider | $8.50 | $1.20 | 86% cheaper |
| Staff hours needed | 100 hours | 6 hours (setup) | 94% reduction |
| Audit trail quality | Poor (manual notes) | Complete (timestamped proof) | ✅ Compliance-ready |
Projected annual savings for a 10,000-provider network:
- Manual cost: ~$85,000/year
- Our system: ~$12,000/year
- Savings: $73,000/year (plus countless hours of staff time)
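The percentages and the savings projection follow from simple arithmetic, reproduced here as a quick check. One assumption is added for the projection: each provider is verified once per year, which the figures imply but do not state.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Per-provider figures from the test on 500 providers.
print(round(pct_reduction(12, 2)))        # time per provider, minutes
print(round(pct_reduction(8.50, 1.20)))   # cost per provider, dollars
print(round(pct_reduction(100, 6)))       # staff hours

# Accuracy: absolute gain in points versus relative gain.
accuracy_points = 91 - 72
accuracy_relative = (91 - 72) / 72 * 100

# Projection for a 10,000-provider network (assumes one pass per provider per year).
providers = 10_000
manual_cost = providers * 8.50
system_cost = providers * 1.20
savings = manual_cost - system_cost
print(round(manual_cost), round(system_cost), round(savings))
```

If verification is repeated every few months instead, both costs scale by the number of passes and the gap widens accordingly.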
🚧 Real Challenges We Faced (And Solved)
Challenge 1: LLMs Making Stuff Up
Problem: Early tests showed the AI confidently inventing phone numbers.
Solution:
- Added strict source-tracking
- Required minimum 2 sources for any data point
- Confidence scores reflect source quality
- Human review triggers at <80% confidence
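The two-source minimum and the 80% review trigger can be expressed as a small gate in front of the database. This is a sketch, not the actual implementation: the `DataPoint` shape is assumed, and rejecting under-sourced values outright (rather than routing them elsewhere) is an assumption.

```python
from dataclasses import dataclass, field

MIN_SOURCES = 2          # every data point needs at least two sources
REVIEW_THRESHOLD = 0.80  # below this, a human reviews the value


@dataclass
class DataPoint:
    field_name: str
    value: str
    sources: set[str] = field(default_factory=set)
    confidence: float = 0.0


def triage(point: DataPoint) -> str:
    """Return 'reject', 'review' or 'accept' for a candidate value."""
    if len(point.sources) < MIN_SOURCES:
        return "reject"  # too few sources: possibly invented by the model
    if point.confidence < REVIEW_THRESHOLD:
        return "review"
    return "accept"


invented = DataPoint("phone", "415-555-0199", sources={"llm_output"}, confidence=0.95)
backed = DataPoint("phone", "415-555-0132",
                   sources={"npi_registry", "clinic_website"}, confidence=0.91)
print(triage(invented), triage(backed))
```

The source check comes first on purpose: a value the model is very confident about still gets rejected if nothing independent backs it up.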
Challenge 2: Messy Document Parsing
Problem: Real provider lists are nightmare fuel—scanned PDFs, handwritten notes, mixed formats.
Solution:
- Spent days optimizing LLaVA prompts
- Added pre-processing for common document issues
- Built fallback extraction methods
- Created manual review queue for truly messy cases
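The fallback idea can be sketched as a chain of extractors tried in order, with the review queue as the last resort. Everything here is hypothetical: the extractors are stand-in stubs (no real vision model is called) and `extract_with_fallbacks` is an illustrative name.

```python
from typing import Callable, Optional

# An extractor takes raw document bytes and returns a record, or None on a miss.
Extractor = Callable[[bytes], Optional[dict]]


def extract_with_fallbacks(document: bytes,
                           extractors: list[tuple[str, Extractor]],
                           review_queue: list) -> Optional[dict]:
    """Try each extractor in order; queue the document for humans if all fail."""
    for name, extractor in extractors:
        try:
            record = extractor(document)
        except Exception:
            record = None  # treat a crash like a miss and try the next method
        if record:
            return {"method": name, **record}
    review_queue.append(document)
    return None


def vision_stub(doc: bytes) -> Optional[dict]:
    """Stand-in for a vision-model extraction that finds nothing."""
    return None


def text_stub(doc: bytes) -> Optional[dict]:
    """Stand-in for a plain-text fallback that succeeds."""
    return {"name": "Dr. Smith", "address": "123 Main Street"}


queue: list = []
print(extract_with_fallbacks(b"scan", [("vision", vision_stub), ("text", text_stub)], queue))
```

Recording which method produced a record makes it possible to weight less reliable extraction paths lower when scoring confidence later.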
Challenge 3: Natural Voice Interactions
Problem: Early call scripts sounded robotic and confused receptionists.
Solution:
- Studied real verification calls
- Added natural pauses and acknowledgments
- Built in polite repetition for misheard answers
- Created graceful fallbacks when offices are busy
Challenge 4: Detecting Silent Changes
Problem: How do you know when data degrades if no one tells you?
Solution:
- Time-series database for all provider attributes
- Weekly automated re-verification of high-risk fields
- Pattern detection for common change signals
- Proactive alerts before data becomes useless
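The weekly re-verification of high-risk fields comes down to a due-date check per field. In this sketch only the weekly interval comes from the list above; the risk labels, the 90-day interval for low-risk fields, and the `fields_due` helper are assumptions.

```python
from datetime import date, timedelta

# Only the weekly interval for high-risk fields is from the writeup;
# the low-risk interval is a placeholder.
RECHECK_INTERVAL = {
    "high": timedelta(weeks=1),
    "low": timedelta(days=90),
}

# Assumed risk labels per field.
FIELD_RISK = {"phone": "high", "address": "high", "specialty": "low"}


def fields_due(last_verified: dict[str, date], today: date) -> list[str]:
    """List the fields whose last verification is older than their interval."""
    due = []
    for field_name, verified_on in last_verified.items():
        interval = RECHECK_INTERVAL[FIELD_RISK.get(field_name, "low")]
        if today - verified_on >= interval:
            due.append(field_name)
    return sorted(due)


history = {
    "phone": date(2025, 1, 8),
    "address": date(2025, 1, 10),
    "specialty": date(2024, 12, 1),
}
print(fields_due(history, today=date(2025, 1, 15)))
```

A scheduled job could run this per provider and hand the due fields to the verification agents, with any resulting changes feeding the alerts.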
🎓 What This Project Taught Me
Beyond the code, this project fundamentally changed how I think about AI:
AI isn't about replacing humans—it's about freeing them from soul-crushing repetitive work so they can focus on judgment calls and patient care.
I learned:
- How to design agentic systems where AI agents collaborate
- The difference between demo AI and production AI (reliability > impressiveness)
- Why explainability matters more than accuracy alone in healthcare
- How to build systems that healthcare professionals will actually trust and use
Most importantly: Real-world problems need real-world validation. Scraping data is easy. Verifying it's actually correct? That's the hard part we solved.
🚀 What's Next: From Hackathon to Healthcare Reality
This MVP proves the concept works. Here's the roadmap to make it production-ready:
Immediate Next Steps (1-3 months):
- 🔥 Real phone integration with major carriers
- 🔥 HIPAA compliance assessment and documented safeguards
- 🔥 Integration with top 3 provider data vendors
- 🔥 Batch processing for 100k+ provider networks
Medium Term (3-6 months):
- Multi-language support (Spanish, Chinese, etc.)
- Fraud detection layer (fake providers, credential mills)
- Hospital EHR system integrations
- Predictive alerts ("Dr. X's license expires in 30 days")
Long Term Vision:
- Blockchain-verified audit trails for ultimate compliance
- Global provider coverage expansion
- Real-time webhook integrations for instant updates
- AI-powered network adequacy analysis
Why This Matters
Healthcare is broken in a thousand ways, but most problems trace back to bad data.
When a patient can't find their doctor, it's not just inconvenient—it's dangerous. When an insurance company pays a claim to a fraudulent provider, we all pay for it in higher premiums.
This platform doesn't just save money. It saves time, reduces frustration, and potentially saves lives.
And we're just getting started.
Built With
- bash
- cloud-hosting
- crewai
- docker
- fastapi
- google-maps
- llms
- next.js
- ngrok
- ollama
- omni-dimension
- python
- react
- sql
- typescript