Inspiration

Healthcare claim denials cost the US healthcare system $20.5-25.7 billion annually, causing stress for patients and an administrative burden for providers. We witnessed families struggling with unexpected claim denials that forced them to choose between medical care and financial stability. Our team was inspired to create an AI system that could predict claim outcomes before submission, giving patients and providers transparency and control over their healthcare decisions. As soon as the physician's notes and EHR data are submitted, our model predicts whether the claim is likely to go through or needs more paperwork, and what exactly that paperwork would be.

What it does

ClaimAssist is an advanced AI system that predicts whether healthcare insurance claims will be approved, denied, or need manual review with up to 96% accuracy. It provides:

  • Real-time Predictions - Instant claim outcome forecasting
  • Clinical Reasoning - Detailed explanations for each decision
  • Risk Assessment - Identifies potential denial factors
  • Cost Analysis - Financial impact predictions
  • Next Steps - Actionable recommendations that healthcare providers or patients can follow to secure reimbursement

How we built it

We combined ClinicalBERT with OpenAI models in an ensemble to generate clinical reasoning grounded in medical knowledge. On the dataset, we trained a traditional Random Forest for fast and reliable baseline predictions. We also tried reinforcement learning for cost-optimized business decisions and compared how the models performed against each other. Finally, we adopted a weighting mechanism that incorporates rule-based guidelines for regulatory compliance checks, depending on the ICD-10 codes.
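As a rough illustration of how a rule-based compliance check keyed on ICD-10 codes could be blended with a model score, here is a minimal sketch. The rule table, the document names, the `Claim` type, and the `rule_weight` value are all hypothetical placeholders chosen for the example; they are not real payer policy and not necessarily how ClaimAssist implements it.

```python
from dataclasses import dataclass

# Hypothetical rule table: ICD-10 code prefix -> documents the rule expects.
# Placeholder entries for illustration only, not real payer policy.
REQUIRED_DOCS_BY_PREFIX = {
    "M54": {"conservative_treatment_notes"},
    "E11": {"hba1c_lab_result"},
}


@dataclass
class Claim:
    icd10_code: str
    attached_docs: set


def missing_documents(claim):
    """Return the documents the matching rule expects but the claim lacks."""
    for prefix, required in REQUIRED_DOCS_BY_PREFIX.items():
        if claim.icd10_code.startswith(prefix):
            return required - claim.attached_docs
    return set()  # no rule applies to this code


def blend_with_rules(model_approval_prob, claim, rule_weight=0.3):
    """Mix a model's approval probability with a rule score.

    The rule score is 1.0 when no required document is missing, else 0.0.
    rule_weight is an assumed tuning knob, not a value from the project.
    """
    rule_score = 0.0 if missing_documents(claim) else 1.0
    return (1 - rule_weight) * model_approval_prob + rule_weight * rule_score
```

The set returned by `missing_documents` doubles as the "what paperwork is still needed" hint, which is why the sketch keeps it separate from the score.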

We created a comprehensive synthetic healthcare claim dataset using OpenAI, supplemented with publicly available data from Kaggle. We tested 4 different AI approaches. The final system uses ensemble learning, which combines the models for better performance. Because ensembling comes with tradeoffs, we used a weighting mechanism to assign a weight to each model and tuned those weights until we reached the best-performing combination. For validation, we tested on real-world-style scenarios (these were simulated, with some data drawn from MIMIC-III) and on edge cases.
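The per-model weighting described above can be sketched as a weighted average of class probabilities. This is only an illustration under assumptions: the model names, the example probabilities, and the weights below are made up, and the project's actual weighting scheme may differ.

```python
LABELS = ("approved", "denied", "manual_review")


def combine(predictions, weights):
    """Weighted average of per-model class probabilities.

    predictions: {model_name: {label: probability}}
    weights:     {model_name: non-negative weight}
    Weights are normalized over the models that actually produced a prediction.
    """
    total = sum(weights[name] for name in predictions)
    combined = {label: 0.0 for label in LABELS}
    for name, probs in predictions.items():
        for label in LABELS:
            combined[label] += weights[name] * probs[label] / total
    return combined


def decide(predictions, weights):
    """Pick the label with the highest combined probability."""
    combined = combine(predictions, weights)
    return max(combined, key=combined.get)


# Hypothetical example values, for illustration only.
EXAMPLE_WEIGHTS = {"random_forest": 0.6, "clinical_bert": 0.4}
EXAMPLE_PREDICTIONS = {
    "random_forest": {"approved": 0.7, "denied": 0.2, "manual_review": 0.1},
    "clinical_bert": {"approved": 0.4, "denied": 0.5, "manual_review": 0.1},
}

if __name__ == "__main__":
    print(decide(EXAMPLE_PREDICTIONS, EXAMPLE_WEIGHTS))
```

Normalizing by the sum of the weights keeps the output a valid probability distribution even if one model is skipped for a given claim.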

Challenges we ran into

  • Challenge: Healthcare claims involve complex medical coding (ICD-10, CPT) and clinical guidelines
  • Solution: Integrated ClinicalBERT, which is pretrained on clinical text, and built a comprehensive medical knowledge base

  • Challenge: Healthcare data is sensitive and difficult to obtain
  • Solution: Generated realistic synthetic data using actual medical coding standards and real denial patterns

  • Challenge: Healthcare decisions require explainable AI for regulatory compliance
  • Solution: Built reasoning components that provide clinical justifications for each prediction

Accomplishments that we're proud of

With our prediction model, we achieved:

  • 96% accuracy with the traditional ML model
  • 90% accuracy with the Clinical AI model, which provides full reasoning
  • Real-time predictions, averaging 5-6 seconds each

  • Multi-modal ensemble combining 4 different AI approaches
  • Dynamic weighting system adapting to claim characteristics
  • Clean JSON API ready for production integration
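To give a feel for what a JSON prediction response might look like, here is a small sketch. The field names (`claim_id`, `outcome`, `confidence`, `risk_factors`, `next_steps`) and the example values are assumptions made for illustration; the actual ClaimAssist API schema is not documented here.

```python
import json

VALID_OUTCOMES = {"approved", "denied", "manual_review"}


def build_response(claim_id, outcome, confidence, risk_factors, next_steps):
    """Serialize one prediction as a JSON string (hypothetical schema)."""
    if outcome not in VALID_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    payload = {
        "claim_id": claim_id,
        "outcome": outcome,
        "confidence": round(confidence, 3),
        "risk_factors": list(risk_factors),
        "next_steps": list(next_steps),
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    # Illustrative values only.
    print(build_response(
        "CLM-0001",
        "manual_review",
        0.6234,
        ["missing supporting documentation"],
        ["attach the physician's notes before resubmitting"],
    ))
```

Validating the outcome and confidence before serializing keeps malformed predictions from reaching downstream consumers.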

We also provide metrics that show AI performance through different charts, a medical insights dashboard revealing healthcare patterns, and an analysis of the decision-making processes.

What we learned

Healthcare AI requires much more than traditional machine learning - it needs medical knowledge, regulatory compliance, and clinical reasoning capabilities. Combining different AI approaches (symbolic reasoning, neural networks, rule-based systems) creates more robust and reliable predictions than any single model.

In healthcare, "black box" AI isn't acceptable. Every decision must be explainable with clinical reasoning and evidence. Realistic, high-quality synthetic data often outperforms large datasets with quality issues, especially for specialized domains like healthcare.

What's next for ClaimAssist

The best next step is to test our model on real claims data, which is hard to obtain, and see whether it performs as expected.
