Inspiration
India processes over 50 million hospitalizations every year, yet the average patient walks into a hospital with almost no idea what it will cost. In Tier 2 cities like Indore, Bhopal, Nagpur, and Lucknow, where nearly 65% of India's population lives, this problem is dramatically worse. Fewer than 18% of Indians have any form of health insurance, and of those who do, fewer than half understand what their policy actually covers or which hospitals are empanelled under it.
When a medical emergency strikes, a family faces a cascade of unanswered questions simultaneously: Which hospital should we go to? How far is it? What do other patients say about it? Will our insurance work there? If we take a loan, how much do we actually need? Too much means paying extra interest for years; too little means paying out of pocket at the worst possible moment. What will the surgery actually cost at this specific hospital, in this city, in this tier? Does the doctor here have good outcomes for this condition?
These are not luxury questions. They are survival questions. And today, no single tool answers them together.
We also recognized a parallel problem on the lender side. Medical loan approvals at NBFCs like Poonawalla Fincorp are often delayed or mispriced because the loan officer has no reliable procedure-level cost benchmark. They rely on rough estimates, patient-provided bills, or broad city averages, none of which accounts for hospital tier, room type, comorbidities, or NPPA-regulated implant costs. This leads to under-sanctioned loans that leave patients scrambling, or over-sanctioned amounts that increase NPA risk.
Precure_ was built to solve both sides of this problem at once: giving patients clarity and lenders confidence.
What it does
Precure_ is an AI-powered healthcare cost navigator and hospital recommendation engine, accessible as a conversational agent on the Prompt Opinion platform.
For patients:
When a user describes their situation ("my father needs a knee replacement in Delhi, we have a 4 lakh insurance policy"), the system immediately calculates a fully itemized cost estimate broken into two buckets: variable costs (surgery package, room charges, ICU, medications) and fixed/capped costs (NPPA-regulated implants and technology fees). It tells them exactly what their insurance covers, what the out-of-pocket gap is, and whether a medical loan is needed, and if so, precisely how much to borrow.
It then suggests the top hospitals for that procedure in their city, ranked not just by Google rating but by a multi-factor algorithm combining real patient review sentiment (using VADER NLP), specialty focus, accreditation status, doctor reputation from patient reviews, and geographic proximity. Users see which doctors at those hospitals are specifically praised for that specialty, what past patients said, and what to watch out for.
For patients anxious about a procedure, the system walks them through the complete clinical pathway, from pre-operative preparation and tests through surgery day, ICU, ward recovery, and the discharge timeline, in plain language.
For users who describe symptoms rather than a procedure, the system identifies the right specialist to consult, explains the urgency level, lists the tests likely to be ordered, and highlights red-flag symptoms that require emergency care.
All of this works across 9 cities, 6 hospital tiers, over 2,300 hospitals, 12 surgical procedures with sub-variants, and 12 diagnostic categories, in English and Hinglish.
For lenders:
Every cost estimate includes a structured breakdown that a loan officer can use directly for sanction decisions: procedure type, sub-procedure variant, hospital tier, city adjustment factor, comorbidity loading, NPPA implant cost, and a recommended loan amount with a 10% contingency buffer. No guesswork. No broad averages. A per-case capital requirement with full transparency.
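As a sketch, the two-bucket arithmetic behind the recommended loan amount looks roughly like this. The function name and all figures are illustrative placeholders, not the production rate cards:

```python
# Illustrative sketch of the two-bucket loan-sizing arithmetic.
# All numbers below are hypothetical placeholders, not real rate cards.

def recommend_loan_amount(variable_base, tier_multiplier, city_factor,
                          fixed_nppa_cost, comorbidity_loading,
                          insurance_cover, contingency=0.10):
    """Return (total_cost, out_of_pocket_gap, recommended_loan)."""
    # The variable bucket scales with hospital tier and city;
    # the NPPA-capped fixed bucket does not.
    variable_cost = variable_base * tier_multiplier * city_factor
    variable_cost *= (1 + comorbidity_loading)
    total_cost = variable_cost + fixed_nppa_cost
    gap = max(0.0, total_cost - insurance_cover)
    # Add the 10% contingency buffer, rounded to the nearest hundred rupees.
    recommended_loan = round(gap * (1 + contingency), -2)
    return total_cost, gap, recommended_loan

# Hypothetical TKR case: 2 lakh variable base, 2.2x tier multiplier,
# 1.1 city factor, 80k NPPA-capped implant, 5% comorbidity loading,
# 4 lakh insurance cover (all amounts in INR).
total, gap, loan = recommend_loan_amount(200_000, 2.2, 1.1, 80_000, 0.05, 400_000)
```

The key design point is that `fixed_nppa_cost` is added after the tier and city multipliers, so a premium hospital cannot inflate the implant component.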
How we built it
Step 1: City and hospital scope definition. We started by defining scope: 9 cities across 3 tiers: metro (Delhi, Mumbai, Bangalore), Tier 2 (Jaipur, Lucknow, Nagpur, Bhopal), and emerging (Dehradun, Indore). This ensured coverage of high-end, mid-market, and cost-sensitive patient profiles.
Step 2: Hospital data collection and enrichment. We sourced the base hospital list from NABH (National Accreditation Board for Hospitals) because NABH-accredited hospitals are the primary target for medical loan disbursement: they meet minimum quality standards and are preferred by insurance providers. We then enriched each hospital record by scraping the Google Places API, hospital websites, and public review platforms to collect specialities offered, accreditation type, Google ratings, review counts, lat/long coordinates, phone numbers, and website URLs. After merging and deduplication, we had a cleaned dataset of approximately 2,300 hospitals across the 9 cities.
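The lat/long coordinates collected in this step feed the geographic-proximity factor used later in ranking. A minimal sketch of the standard haversine great-circle distance:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # 6371 km: mean Earth radius
```

For example, `haversine_km(28.6139, 77.2090, 19.0760, 72.8777)` (Delhi to Mumbai) comes out at roughly 1,150 km.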
Step 3: Hospital tier classification. We initially classified hospitals into 4 tiers (Tertiary Corporate, Advanced Multispecialty, Standard Secondary, Government) using specialty count, accreditation, web presence, and phone availability as signals. This produced a poor distribution: too many hospitals were classified as Tertiary, the gap between Tertiary and Advanced was unclear, and boutique specialty hospitals (standalone ortho, cardiac, and eye care centres) were incorrectly merged with large multispecialty chains. We reclassified into 6 tiers using a combination of regex-based name tokenization (identifying brand signals like Apollo, Fortis, and Max for Tertiary; Heart Institute, Eye Care, and Spine Centre for Boutique), specialty code pattern matching on NABH data, and structural signals like NABH specialty count and empanelment type. This produced a significantly cleaner distribution with meaningful cost differentiation between tiers.
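A minimal sketch of the regex-based name tokenization described above. The brand tokens follow the write-up; the thresholds, the precedence order, and the name of the remaining tier are illustrative guesses, and the real classifier weighs more signals:

```python
import re

# Brand tokens from the write-up; thresholds and the final tier name
# ("Basic Secondary") are illustrative assumptions.
TERTIARY_BRANDS = re.compile(r"\b(apollo|fortis|max)\b", re.IGNORECASE)
BOUTIQUE_TOKENS = re.compile(r"\b(heart institute|eye care|spine centre)\b",
                             re.IGNORECASE)

def classify_tier(name, nabh_specialty_count, has_website, has_phone):
    """Assign a tier from name tokens plus structural signals."""
    if TERTIARY_BRANDS.search(name):
        return "Tertiary Corporate"
    if BOUTIQUE_TOKENS.search(name):
        return "Boutique Super Specialty"
    if nabh_specialty_count >= 10 and has_website:
        return "Advanced Multispecialty"
    if "government" in name.lower() or "civil hospital" in name.lower():
        return "Government"
    if nabh_specialty_count >= 4:
        return "Standard Secondary"
    return "Basic Secondary"  # hypothetical label for the remaining tier
```

In the real pipeline these name signals are combined with NABH specialty code patterns and empanelment type rather than applied as a strict cascade.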
Step 4: Rate card construction. We used CGHS (Central Government Health Scheme) rates as the base reference because they represent the government-defined floor price and are the benchmark used by most insurance companies and lending institutions. However, CGHS rates are significantly below private market rates: a TKR at a Tertiary Corporate hospital costs roughly 2x the CGHS rate, while cataract surgery can be 4x. Rather than applying a single blanket multiplier, we scraped procedure pricing from hospital websites and third-party healthcare marketplaces across all 9 cities and 6 tiers, then computed per-procedure, per-tier, per-city multipliers by comparing actual market prices against CGHS baselines. We then split each procedure cost into two buckets: Variable Base (surgery package, room, ICU, and medications, subject to tier and city multipliers) and Fixed/Capped Base (implants and technology, capped at NPPA-regulated ranges regardless of hospital tier). This two-bucket structure means the cost model scales correctly: a premium hospital charges more for the room and surgery package, but cannot overcharge on an NPPA-capped CoCr knee implant.
The final rate cards cover 12 surgical procedures (including bilateral and unilateral TKR with standard CoCr implants, CABG, cataract, angioplasty with stent, and more) and 12 diagnostic and treatment categories, with sub-procedure variants for each: a total of over 40 distinct procedure-variant combinations.
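The multiplier derivation in Step 4 boils down to a median-over-baseline computation per (procedure, tier, city) group. A sketch with made-up scraped prices and illustrative CGHS baselines:

```python
from collections import defaultdict
from statistics import median

# Hypothetical scraped quotes: (procedure, tier, city, market_price in INR).
scraped = [
    ("TKR", "Tertiary Corporate", "Delhi", 440_000),
    ("TKR", "Tertiary Corporate", "Delhi", 450_000),
    ("TKR", "Tertiary Corporate", "Delhi", 430_000),
    ("Cataract", "Boutique Super Specialty", "Indore", 95_000),
    ("Cataract", "Boutique Super Specialty", "Indore", 105_000),
]
cghs = {"TKR": 200_000, "Cataract": 25_000}  # illustrative baselines

# Group quotes by (procedure, tier, city), then divide the median
# market price by the CGHS floor to get the multiplier.
prices = defaultdict(list)
for procedure, tier, city, price in scraped:
    prices[(procedure, tier, city)].append(price)

multipliers = {
    key: round(median(vals) / cghs[key[0]], 2)
    for key, vals in prices.items()
}
```

Using the median rather than the mean keeps one outlier quote from skewing a sparse (procedure, tier, city) cell.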
Step 5: ICD-10 mapping and clinical pathways. Every procedure was mapped to its corresponding ICD-10 code to enable clinical interoperability and insurance compatibility checks. We also built step-by-step clinical care pathways for each procedure, covering pre-operative preparation, surgery day, ICU stay, ward recovery, physiotherapy, and discharge, with realistic timelines and day-by-day guidance.
Step 6: Hospital ranking algorithm. We ran VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis on patient reviews, extracting sentence-level sentiment scores for specialty-specific mentions. For each hospital-specialty pair, we computed a weighted specialty score combining sentiment strength, mention count, and a confidence weight based on review volume. We then built a 7-factor composite ranking score incorporating VADER specialty sentiment, Google rating, log-normalized review count, an NABH accreditation bonus, primary specialty focus (a specificity score based on the number of specialties: a 4-specialty hospital is more focused than a 14-specialty one), doctor reputation from review mentions, and a market alignment score derived from structural hospital signals. The ranking surfaces the top 2 review-validated hospitals followed by 3 market-aligned hospitals, ensuring a mix of review-backed and structurally strong suggestions. For rare specialties like Neurology and Oncology, where review data is sparse, we implemented association rule mining on NABH specialty co-occurrence patterns to build fallback mappings: if a hospital has strong cardiac review scores, it likely also delivers quality neuro care.
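The 7-factor composite can be sketched as a weighted sum over normalized signals. The weights and the normalization choices below are illustrative placeholders (the production weights were tuned against known hospital benchmarks), and the sentiment score is taken as a precomputed input rather than calling VADER here:

```python
from math import log1p

# Illustrative weights for the 7 signals; not the production values.
WEIGHTS = {
    "vader_specialty": 0.25,
    "google_rating": 0.15,
    "review_volume": 0.10,
    "accreditation": 0.10,
    "specialty_focus": 0.15,
    "doctor_reputation": 0.10,
    "market_alignment": 0.15,
}

def composite_score(h):
    """h: dict of raw signals for one hospital-specialty pair."""
    signals = {
        "vader_specialty": h["vader_sentiment"],        # assumed in [0, 1]
        "google_rating": h["google_rating"] / 5.0,
        # Log-normalize review count so 10k reviews don't swamp 300.
        "review_volume": log1p(h["review_count"]) / log1p(10_000),
        "accreditation": 1.0 if h["nabh_accredited"] else 0.0,
        # Fewer listed specialties => more focused hospital.
        "specialty_focus": 1.0 / h["specialty_count"],
        "doctor_reputation": h["doctor_score"],
        "market_alignment": h["market_alignment"],
    }
    return sum(WEIGHTS[k] * v for k, v in signals.items())
```

With these weights, a focused 4-specialty hospital with strong specialty sentiment can outrank a larger, higher-rated general hospital with weak specialty-specific reviews, which is the behavior the ranking is tuned for.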
Step 7: Doctor recommendation engine. Doctor names mentioned in patient reviews were extracted and deduplicated using prefix and last-name matching. Each doctor was scored by the sentiment of reviews mentioning them and the number of mentions, producing a ranked list of up to 3 recommended doctors per hospital-specialty combination.
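A minimal sketch of the prefix plus last-name deduplication and mention-weighted scoring. The names are hypothetical, and keying purely on the last name is a simplification of the matching described above:

```python
from collections import defaultdict

# "Dr. A. Sharma", "Dr Anil Sharma", and similar mentions from different
# reviews should collapse to one doctor key. Names here are hypothetical.
PREFIXES = ("dr", "prof")

def doctor_key(name):
    """Strip honorific prefixes and key on the last name."""
    tokens = [t.strip(".,") for t in name.lower().split()]
    tokens = [t for t in tokens if t not in PREFIXES]
    return tokens[-1] if tokens else ""

def rank_doctors(mentions, top_n=3):
    """mentions: list of (raw_name, sentiment) pairs -> ranked doctor keys."""
    stats = defaultdict(lambda: [0, 0.0])  # key -> [mention_count, sentiment_sum]
    for raw, sentiment in mentions:
        key = doctor_key(raw)
        stats[key][0] += 1
        stats[key][1] += sentiment
    # Total sentiment = mean sentiment weighted by mention count.
    ranked = sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True)
    return [key for key, _ in ranked[:top_n]]
```

For instance, "Dr. A. Sharma" and "Dr Anil Sharma" both normalize to the key `sharma` and their sentiment contributions are pooled.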
Step 8 MCP server and agent integration With the core engine built, we exposed all capabilities as a Model Context Protocol (MCP) server with 13 tools covering cost estimation, insurance gap analysis, hospital search, specialty hospital ranking, doctor recommendations, clinical pathways, procedure variants, rate card lookup, and input validation. The MCP server was integrated into the Prompt Opinion platform as a BYO Agent with FHIR Context Extension enabled, A2A communication active, and a structured system prompt guiding tool orchestration. The agent was published to the Prompt Opinion Marketplace, making it discoverable and invokable by other agents in the ecosystem.
Challenges we ran into
Data quality at scale. The NABH dataset contained significant noise: duplicate hospital entries under slightly different names, incorrect or missing lat/long coordinates, inconsistent specialty code formats, and outdated contact information. Cleaning this required a multi-pass pipeline: fuzzy name deduplication, Google Places API verification for coordinates and names, regex normalization of specialty codes, and manual spot-checking of high-frequency errors.
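The fuzzy-name deduplication pass can be sketched with the stdlib `difflib` (listed in the stack). The similarity threshold and hospital names are illustrative, and the real pipeline also checks city and coordinates before merging two records:

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.88):
    """True when two hospital names are near-duplicates (threshold is illustrative)."""
    a, b = a.lower().strip(), b.lower().strip()
    return SequenceMatcher(None, a, b).ratio() >= threshold

def dedupe(names):
    """Greedy single-pass dedup: keep a name only if it matches no kept name."""
    keepers = []
    for name in names:
        if not any(similar(name, kept) for kept in keepers):
            keepers.append(name)
    return keepers
```

This catches near-duplicates like "Apollo Hospitals Indore" vs. "Apollo Hospital Indore" while leaving genuinely distinct hospitals in the same city untouched.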
Tier classification boundary problem. The initial 4-tier system collapsed too many distinct hospital archetypes into the same category. The jump from CGHS rates to actual market prices is not uniform: a Boutique Super Specialty ortho hospital charges very differently from a large Tertiary Corporate chain for the same procedure, even within the same city. Getting the 6-tier classification right required multiple iterations of signal weighting and validation against known hospital benchmarks.
Multiplier extraction per procedure. CGHS rates are a reasonable floor, but the markup varies dramatically by procedure, tier, and city. TKR at a Tertiary hospital in Delhi is roughly 2.2x CGHS; cataract at a Boutique eye hospital in Indore is 3.8x. There is no published table for these markups; we had to scrape and cross-reference hundreds of hospital procedure price pages, normalize for room type, and compute multipliers that produced realistic ranges when compared against real quoted prices. Getting the Variable/Fixed split right for NPPA-regulated implants required separate validation against NPPA published ceiling prices.
Sparse review data for rare specialties. For common specialties like Orthopaedics and Cardiology, VADER scoring worked well. For Neurology, Oncology, and Nephrology, many hospitals had zero specialty-specific review sentences, making the ranking degenerate to pure market signals. The association rule fallback partially addressed this, but it remains an area needing more data.
Ranking instability when VADER data is absent. When no hospital in a city had specialty-specific review data, the composite score was dominated by Google rating and the market alignment score, causing highly rated general hospitals to outrank focused specialty hospitals. We fixed this by adding a specialty listing bonus and increasing the weight of the specificity score when VADER data is absent, ensuring that a focused 4-specialty ortho hospital ranks above a 12-specialty general hospital that merely lists ortho among its services.
Accomplishments that we're proud of
Building a cost estimation engine that correctly separates NPPA-regulated fixed costs from variable, tier-adjusted costs: a distinction that matters enormously for loan sizing and insurance gap calculation, and that no existing consumer tool makes.
Assembling and cleaning a hospital dataset of 2,300+ hospitals across 9 cities, with specialties, accreditations, review sentiments, doctor mentions, coordinates, and tier classifications, entirely from public sources.
Deriving per-procedure, per-tier, per-city multipliers by cross-referencing hundreds of real hospital price quotes against CGHS baselines, producing cost ranges validated against actual market prices rather than estimated.
A suggestion algorithm that combines 7 independent signals (review sentiment, rating, review volume, accreditation, specialty focus, doctor reputation, and market alignment) into a single ranked output that surfaces the right hospital type for the right patient in the right city.
A dual sided value proposition: the same system that helps a patient pick a hospital also gives a lender the structured cost breakdown they need to approve a medical loan in minutes rather than days.
Full integration into the Prompt Opinion MCP + A2A ecosystem with FHIR context support, making the entire engine invokable by any compliant agent in the platform.
Designing with India's financially vulnerable population at the center: millions of families who, during a medical emergency, cannot afford to choose the wrong hospital or borrow the wrong amount. Precure_ surfaces the best hospital within whatever budget a patient states, filters recommendations accordingly, and connects those who need financial support directly to loan providers with a structured, pre-justified borrowing requirement. The system was built with the explicit vision that no family should spiral into a debt trap simply because they lacked access to cost intelligence at the moment it mattered most.
What we learned
More factors do not always mean better answers. Early versions of the ranking algorithm used 12+ signals and produced counterintuitive results. Reducing to 7 well-calibrated factors with clear separation of concerns produced significantly better suggestions. Feature engineering matters more than feature count.
Designing for two users simultaneously, patient and lender, forces better product decisions. Every output had to be both human-readable and structurally parseable. That constraint led to the two-bucket cost model, which turned out to be the right abstraction for both audiences.
Removing financial anxiety is as much an information problem as a product problem. Patients don't fear medical loans because loans are bad; they fear them because they don't know how much they need. Giving a precise, justified number with a 10% buffer removes that uncertainty. Insurance fear works the same way: patients avoid using insurance because they don't know whether a hospital is covered. Mapping empanelment status per hospital directly into the suggestion flow is the fix, and it is entirely a data problem.
What's next for Precure_
ML-based multiplier optimization. Current multipliers were derived through manual cross-referencing of scraped prices. With the dataset now assembled, training a gradient-boosted model on procedure × tier × city × room type × comorbidity combinations will produce tighter, more accurate ranges and automatically update as new pricing data is scraped.
Advanced NLP for review sentiment. VADER is a lexicon-based model with known limitations on medical terminology and mixed Hindi-English reviews. Replacing it with a fine-tuned transformer model (BioBERT or a multilingual variant) will significantly improve specialty sentiment scores, especially for rare specialties.
Insurance empanelment mapping. Collecting and maintaining a structured database of which hospitals are empanelled under which insurance providers (Star Health, HDFC ERGO, Niva Bupa, PMJAY, CGHS, ECHS), and surfacing this directly in hospital suggestions ("this hospital accepts your Star Health policy"), is the single highest-impact addition for reducing patient anxiety.
Pre-loan approval workflow. Integrating the cost estimation output into a structured pre-approval API that an NBFC loan officer can trigger directly from the agent, receiving a sanctionable amount recommendation with procedure justification, hospital tier confirmation, and comorbidity loading, completes the lender-side value proposition.
Production architecture refactor. The current prototype has a monolithic structure, well suited for rapid development but not for scale. Refactoring into independent microservices (cost engine, hospital ranking, FHIR gateway, review pipeline) with proper caching and a vector search layer for hospital lookup will support production-grade load and enable faster iteration on individual components.
Built With
- a2a-protocol
- antigravity
- beautiful-soup
- cghs-rate-cards
- claude
- csv-(hospital-database)
- difflib-(fuzzy-matching)
- dotenv
- excel
- excel-(rate-cards.xlsx)
- fastapi
- fhir-context-extension
- fhir-coverage-modeling
- gipsa-ppn-insurance-data
- google-places
- gpt-4o-mini
- haversine
- icd-10
- model-context-protocol-(mcp)
- n8n
- nabh-registry
- ngrok
- node.js
- nppa-implant-price-data
- numpy
- openrouter-api
- pandas
- prompt-opinion-platform
- python
- snomed-ct
- vader-nlp-(vadersentiment)