💡 Inspiration

Every year, over 50 million Indian families face hospitalization. Most leave not just with medical trauma - but with bills they cannot understand, cannot verify, and cannot challenge.

Hospital billing fraud in India is systemic. Medicines are billed at up to 10x MRP. Procedures are duplicated. Consumables are invented. Government pricing protections - NPPA drug price caps, CGHS schedules, PMJAY rate limits - exist on paper but are practically invisible to the patients who need them most.

The patient sits across from a hospital administrator with a 3-page bill and zero leverage. Sanjeevani exists to change that power dynamic permanently.


🏥 What It Does

Sanjeevani is a fully agentic AI platform that audits hospital bills end-to-end - from a raw scanned image to a court-ready dispute report - in under 60 seconds, with zero manual effort from the patient.

It is not a chatbot. It is not a search tool. It is an autonomous multi-stage AI agent that perceives, reasons, cross-references, and acts - giving every Indian patient the capability of a healthcare lawyer in their pocket.

Core capabilities:

  • 📄 Upload any hospital bill - photo, scanned PDF, or image
  • 🔍 OCR pipeline extracts every line item with high accuracy
  • 🧬 Fuzzy matching resolves drug names, procedure codes, and consumables
  • 💰 Cross-references against NPPA, CGHS, and PMJAY government pricing databases
  • 🚨 Flags overcharges with exact price differentials and legal basis
  • 📋 Generates a structured, ready-to-submit dispute report in seconds

🔧 How We Built It

Agentic Pipeline - 5 Stages

Stage 1 - Document Intelligence
Multi-stage OpenCV preprocessing corrects skew, removes noise, enhances contrast, and isolates text regions. Tesseract OCR with custom Hindi/English mixed-language models extracts every line item - medicine names, dosages, procedure codes, room charges, consumables, and surgeon fees.

Stage 2 - Semantic Normalization
Raw extracted text is messy: "Augmentin 625" vs "Co-Amoxiclav 625mg" vs "AMC-625" are the same drug. A custom RapidFuzz normalization engine resolves brand names → generic names → INN across 40,000+ drug variants. Procedure codes are mapped against MedDRA and ICD-10 taxonomies.
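To make the normalization step concrete, here is a minimal sketch of alias resolution. It is not the production engine: it swaps RapidFuzz for the standard library's difflib so it runs with no dependencies, and the alias table, the `clean` and `resolve` helpers, and the 0.8 threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical alias table (illustration only): cleaned alias -> generic name.
ALIASES = {
    "augmentin": "amoxicillin + clavulanic acid",
    "co amoxiclav": "amoxicillin + clavulanic acid",
    "amc": "amoxicillin + clavulanic acid",
    "crocin": "paracetamol",
    "dolo": "paracetamol",
}


def clean(raw: str) -> str:
    """Lowercase, strip strengths such as '625mg', drop punctuation."""
    text = raw.lower()
    text = re.sub(r"\d+(\.\d+)?\s*(mg|ml|mcg|g)?\b", " ", text)
    text = re.sub(r"[^a-z ]", " ", text)
    return " ".join(text.split())


def resolve(raw: str, threshold: float = 0.8):
    """Return (generic_name, score) for the closest alias.

    The generic name is None when no alias reaches the threshold.
    """
    key = clean(raw)
    best_alias, best_score = None, 0.0
    for alias in ALIASES:
        score = SequenceMatcher(None, key, alias).ratio()
        if score > best_score:
            best_alias, best_score = alias, score
    if best_alias is not None and best_score >= threshold:
        return ALIASES[best_alias], best_score
    return None, best_score
```

With this table, "Augmentin 625", "Co-Amoxiclav 625mg", and "AMC-625" all resolve to the same generic name, and an OCR slip such as "Augmentn 625 mg" still clears the threshold. A real table would need far more aliases and a tuned cutoff.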

Stage 3 - Multi-Database Cross-Reference
Each normalized line item is cross-referenced against three government pricing authorities simultaneously:

  • 🔵 NPPA - National Pharmaceutical Pricing Authority (drug MRP caps)
  • 🟢 CGHS - Central Government Health Scheme (procedure & room rate schedules)
  • 🟠 PMJAY - Pradhan Mantri Jan Arogya Yojana (package rate limits)
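The cross-reference pass can be sketched roughly as follows. Everything in the block is illustrative: the `PRICE_TABLES` entries and rupee values are invented placeholders (not real NPPA, CGHS, or PMJAY figures), and `Finding` and `cross_reference` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Finding:
    item: str
    source: str       # which pricing authority the ceiling came from
    billed: Decimal   # unit price on the hospital bill
    ceiling: Decimal  # maximum allowed unit price

    @property
    def overcharge(self) -> Decimal:
        return self.billed - self.ceiling


# Placeholder ceilings in rupees, keyed by normalized item name.
# These numbers are made up for the example.
PRICE_TABLES = {
    "NPPA": {"paracetamol 500mg tablet": Decimal("1.00")},
    "CGHS": {"general ward per day": Decimal("1500.00")},
    "PMJAY": {"appendicectomy package": Decimal("20000.00")},
}


def cross_reference(item: str, billed_unit_price: Decimal) -> list[Finding]:
    """Check one normalized item against every table in a single pass."""
    findings = []
    for source, table in PRICE_TABLES.items():
        ceiling = table.get(item)
        if ceiling is not None and billed_unit_price > ceiling:
            findings.append(Finding(item, source, billed_unit_price, ceiling))
    return findings
```

Decimal is used instead of float so that price differentials stay exact to the paisa. Items that appear in no table simply produce no findings.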

Stage 4 - Agentic Reasoning
Claude AI classifies each line item, resolves ambiguities, detects duplicate billing, identifies phantom charges, and contextualizes overcharges - reasoning across the full bill, not line by line.

Stage 5 - Dispute Report Generation
Generates a legally formatted dispute document with an itemized overcharge table, regulatory citations, total recoverable amount, and a step-by-step patient action guide.
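As a rough illustration of the report stage, the sketch below renders an itemized overcharge table and a total as plain text. The `render_report` function, the column layout, and the dictionary keys are assumptions for this example. The real report also carries the patient action guide, which is omitted here.

```python
from decimal import Decimal


def render_report(patient: str, findings: list[dict]) -> str:
    """Render findings as a fixed-width text table with a total.

    Each finding is a dict with keys: item, billed, ceiling, basis
    (hypothetical field names chosen for this sketch).
    """
    lines = [f"Dispute report for {patient}", ""]
    lines.append(
        f"{'Item':<30}{'Billed':>12}{'Ceiling':>12}{'Excess':>12}  Basis"
    )
    total = Decimal("0")
    for row in findings:
        excess = row["billed"] - row["ceiling"]
        total += excess
        lines.append(
            f"{row['item']:<30}{row['billed']:>12.2f}"
            f"{row['ceiling']:>12.2f}{excess:>12.2f}  {row['basis']}"
        )
    lines.append("")
    lines.append(f"Total recoverable amount: Rs. {total:.2f}")
    return "\n".join(lines)
```

Keeping the regulatory basis on the same row as each overcharge means every claimed rupee can be traced to the rule it relies on.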

Layer | Technology
OCR & Vision | Tesseract, OpenCV, Pillow
NLP & Matching | RapidFuzz, custom normalization pipeline
AI Models | HuggingFace Transformers (Python)
Backend | Node.js, FastAPI (Python)
Frontend | React.js, mobile-first responsive UI
Deployment | Vercel (frontend), Render (backend)

🚧 Challenges We Ran Into

Real-World OCR on Indian Hospital Bills
Indian hospital bills are among the hardest documents to OCR - rubber stamps over prices, mixed scripts, handwritten annotations, thermal-printed faded text, and wildly inconsistent layouts across 50,000+ private hospitals. Building a robust preprocessing pipeline that handles this at scale was the first major hurdle.

Drug Name Normalization at Scale
The same molecule appears under 200+ brand names across manufacturers, regions, and hospital procurement systems. A naive string match fails catastrophically. The fuzzy normalization engine uses phonetic matching + edit distance + synonym graphs to resolve drug identity with >94% accuracy.
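One way to picture the phonetic + edit distance combination is the sketch below. The `phonetic_key` rules are a deliberately crude stand-in (not Soundex or Metaphone, and not the project's actual scheme), difflib replaces RapidFuzz to avoid dependencies, and the equal 0.5/0.5 weighting in `combined_score` is an arbitrary assumption.

```python
import re
from difflib import SequenceMatcher


def phonetic_key(name: str) -> str:
    """Crude phonetic key: fold sound-alike letters, drop later vowels."""
    s = re.sub(r"[^a-z]", "", name.lower())
    s = s.replace("ph", "f")
    s = s.translate(str.maketrans("cqzy", "kksi"))
    head, rest = s[:1], s[1:]
    rest = re.sub(r"[aeiou]", "", rest)
    # Collapse runs of the same letter, e.g. "tt" -> "t".
    return re.sub(r"(.)\1+", r"\1", head + rest)


def edit_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] on the lowercased raw spellings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_score(a: str, b: str) -> float:
    """Blend spelling similarity with phonetic-key similarity."""
    key_a, key_b = phonetic_key(a), phonetic_key(b)
    phonetic = SequenceMatcher(None, key_a, key_b).ratio()
    return 0.5 * edit_ratio(a, b) + 0.5 * phonetic
```

Two spellings that sound alike, such as "Phensedyl" and "Fensedil", share the same key, so the blended score comes out higher than plain edit similarity would give. The synonym-graph part of the engine is not covered by this sketch.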

Government Database Structuring
NPPA, CGHS, and PMJAY publish data in inconsistent formats - PDFs, Excel dumps, HTML tables - updated on irregular schedules. Building automated scrapers, validators, and a versioned local database to keep pricing data current and auditable was a significant engineering challenge.
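A versioned price store could look something like the sketch below, which uses SQLite from the standard library. The schema, the table and column names, and the helper functions are all hypothetical. The idea shown is simply that every published ceiling is kept with its effective date, so a bill can be audited against the price in force on the day of billing.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with one append-only price table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE price_ceiling (
            source         TEXT    NOT NULL,  -- e.g. 'NPPA'
            item           TEXT    NOT NULL,  -- normalized item name
            ceiling_paise  INTEGER NOT NULL,  -- integer paise avoids float error
            effective_from TEXT    NOT NULL,  -- ISO date, sorts lexically
            PRIMARY KEY (source, item, effective_from)
        )
        """
    )
    return conn


def add_ceiling(conn, source, item, ceiling_paise, effective_from):
    """Record a new version; older versions are never overwritten."""
    conn.execute(
        "INSERT INTO price_ceiling VALUES (?, ?, ?, ?)",
        (source, item, ceiling_paise, effective_from),
    )


def ceiling_on(conn, source, item, on_date):
    """Return the ceiling in force on on_date, or None if none applies."""
    row = conn.execute(
        "SELECT ceiling_paise FROM price_ceiling "
        "WHERE source = ? AND item = ? AND effective_from <= ? "
        "ORDER BY effective_from DESC LIMIT 1",
        (source, item, on_date),
    ).fetchone()
    return row[0] if row else None
```

Because rows are only ever added, an old dispute report can be reproduced later against exactly the data that was current when it was generated.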

Communicating to Distressed, Non-Technical Users
The AI agent's output needed to be understood by a 65-year-old patient who just left the ICU. Every finding had to be expressed in plain language, with emotional sensitivity and clear action steps - not legal or technical jargon.


🏆 Accomplishments That We're Proud Of

  • End-to-end agentic pipeline - raw bill image to dispute report in under 60 seconds
  • 94%+ drug name resolution accuracy across 40,000+ drug variants
  • Three government databases integrated and queryable in a single cross-reference pass
  • Live and deployed at med-clear-teal.vercel.app
  • Zero healthcare background required - any patient, anywhere in India, can use it
  • ✅ Built as a solo project from scratch - OCR pipeline, AI agent, database, frontend, and deployment

🧠 What We Learned

Agentic AI delivers its highest value when it eliminates asymmetric information - the gap between institutional knowledge and individual access.

Sanjeevani doesn't just answer questions. It perceives a complex real-world document, reasons across multiple authoritative databases, and takes action on behalf of someone who had no recourse before. That is what agentic AI is built for.

Technically: real-world OCR is 80% preprocessing and 20% model. Government open data in India is a largely untapped goldmine. And building for distressed users forces a level of UX clarity that makes every product better.


🔭 What's Next for Sanjeevani - AI Hospital Bill Auditor

  • 🌐 Multilingual support - 12 Indian regional languages for Tier 2/3 city penetration
  • 📱 WhatsApp bot - zero-app-install access for rural patients via conversational interface
  • 🤝 Insurance claim integration - auto-generate TPA dispute letters alongside hospital disputes
  • 🏛️ Hospital Compliance Dashboard - B2B product for hospital administrators to proactively audit billing before patient complaints
  • 🔗 ABDM Integration - connect with Ayushman Bharat Digital Mission health records for longitudinal billing analysis
  • 🤖 Predictive fraud detection - flag hospitals with statistically anomalous billing patterns before patients are overcharged
  • 🌍 Global expansion - adapt the model for healthcare systems in Southeast Asia, Africa, and other regions with weak patient billing protections
