Inspiration

Cardiovascular disease (CVD) is one of the leading causes of death worldwide, yet most people discover their risk only after symptoms appear. We were inspired by a simple question:

“What if people could understand their future CVD risk early, and also know why that risk exists and how to reduce it?”

Most existing systems only give a risk score, which is hard for non-medical users to trust or act upon. We wanted to build a system that not only predicts risk but also explains the contributing factors and provides actionable insights in a simple, understandable way.

What it does

CardioInsight AI is an AI-powered cardiovascular risk prediction system that:

Predicts 10-year CVD risk percentage

Classifies users into Low Risk / High Risk

Explains which factors contribute most to the risk

Simulates “what-if” scenarios to show how reducing certain factors (like BP, BMI, cholesterol, smoking) can lower future risk

Works with both:

Complete medical data

Limited real-world user data

The system is designed as a decision-support tool, not a replacement for medical diagnosis.

How we built it

🧠 Data Strategy

Training Dataset: We trained our models using the Framingham Heart Study dataset, a well-known clinical dataset used in cardiovascular research.

Evaluation / Hackathon Dataset: The hackathon-provided cardiovascular dataset was used during inference to validate model behavior in a real-world scenario.

🤖 Dual-Model Architecture (Key Design Choice)

We built two separate models to handle different real-world situations:

Model A – Full Clinical Model

Uses all available medical features

Designed for hospitals or detailed health records

Higher accuracy with comprehensive data

Model B – Lightweight Practical Model

Uses only 8 commonly available features, including Age, Gender, BMI, Blood Pressure, Cholesterol, Glucose, and Smoking

Designed for public health tools, surveys, or limited data scenarios

Faster and more accessible

👉 This dual-model approach makes the system flexible and realistic, instead of assuming perfect medical data.
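One way to picture the routing between the two models is a small check on which fields a user record actually contains. This is only a sketch: the feature names below are hypothetical (the real column names are not listed here), and the "full-only" fields are invented stand-ins for whatever extra clinical measurements Model A expects.

```python
from typing import Mapping, Optional

# Hypothetical names for the commonly available inputs of the lightweight model.
LIGHT_FEATURES = (
    "age", "gender", "bmi", "systolic_bp", "cholesterol", "glucose", "smoker",
)
# Hypothetical extra fields that only a detailed clinical record would carry.
FULL_ONLY_FEATURES = ("diastolic_bp", "heart_rate", "diabetes", "bp_medication")


def choose_model(record: Mapping[str, Optional[float]]) -> str:
    """Return "A" for the full clinical model or "B" for the lightweight one."""

    def present(name: str) -> bool:
        return record.get(name) is not None

    # Both models need the basic inputs, so fail early if any are missing.
    missing = [name for name in LIGHT_FEATURES if not present(name)]
    if missing:
        raise ValueError(f"missing required features: {missing}")

    # Use the full model only when every extra clinical field is available.
    if all(present(name) for name in FULL_ONLY_FEATURES):
        return "A"
    return "B"
```

A survey response with only the basic fields would be routed to Model B, while a complete health record would be routed to Model A.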

⚙️ Technical Approach

XGBoost classifier for strong performance on tabular medical data

Handled class imbalance using scale_pos_weight

Hyperparameter tuning with RandomizedSearchCV

Probability calibration using CalibratedClassifierCV

Threshold optimization focused on high recall (minimizing missed high-risk cases)
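Two of the steps above are simple enough to sketch in plain Python: deriving a value for XGBoost's scale_pos_weight parameter from the label counts, and picking a decision threshold that reaches a target recall. The negative-to-positive ratio is a common starting value rather than necessarily the one we used, and the helper names and the target recall are assumptions for illustration. In practice the threshold should be chosen on held-out, calibrated probabilities.

```python
import math
from typing import Sequence


def scale_pos_weight(labels: Sequence[int]) -> float:
    """Ratio of negative to positive examples, a common starting value."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive examples in labels")
    return negatives / positives


def threshold_for_recall(
    labels: Sequence[int], probs: Sequence[float], target_recall: float
) -> float:
    """Highest threshold t such that flagging p >= t reaches the target recall."""
    positive_probs = sorted(
        (p for y, p in zip(labels, probs) if y == 1), reverse=True
    )
    if not positive_probs:
        raise ValueError("no positive examples in labels")
    # Number of true high-risk cases that must be flagged; the small epsilon
    # guards against floating-point noise such as 0.7 * 10 = 7.000000000000001.
    needed = max(1, math.ceil(target_recall * len(positive_probs) - 1e-9))
    return positive_probs[needed - 1]


def recall_at(labels: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Share of true positives whose probability is at or above the threshold."""
    positives = [p for y, p in zip(labels, probs) if y == 1]
    return sum(1 for p in positives if p >= threshold) / len(positives)
```

Lowering the threshold this way trades more false alarms for fewer missed high-risk cases, which matches the recall-first priority described above.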

🔍 Explainability

Integrated SHAP (SHapley Additive Explanations) to:

Identify top risk-increasing factors

Provide human-readable explanations
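To show what a SHAP-style attribution means, the sketch below computes exact Shapley values by brute force for a tiny toy scoring function. It does not use the shap library, which relies on much faster algorithms for tree models; the toy model, its coefficients, and the feature names are invented for illustration, and enumeration like this is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    instance: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values of each feature relative to a baseline input."""
    names = list(instance)
    n = len(names)

    def value(subset: set) -> float:
        # Features in the subset take the person's values; the rest stay at baseline.
        mixed = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return predict(mixed)

    result = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        result[name] = total
    return result


def top_factors(values: Dict[str, float], k: int) -> List[str]:
    """Names of the k features that push the score up the most."""
    increasing = [name for name in values if values[name] > 0]
    return sorted(increasing, key=lambda name: values[name], reverse=True)[:k]


def toy_score(f: Dict[str, float]) -> float:
    # Hypothetical linear score; the coefficients are made up.
    return 0.01 * f["age"] + 0.002 * f["systolic_bp"] + 0.15 * f["smoker"]
```

The attributions always sum to the difference between the person's score and the baseline score, which is what lets an explanation say how much each factor contributes to the total.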

🔁 What-If Simulation

Simulated medically safe improvements

Example: “If systolic BP is reduced from 150 → 120, risk decreases by X%”
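A what-if simulation of this kind can be sketched as: copy the user's features, apply the proposed change within allowed bounds, and compare the predicted risk before and after. Everything named below is an assumption for illustration: the toy logistic model and its coefficients are invented, and the bounds in SAFE_RANGES are placeholders, not clinical guidance.

```python
import math
from typing import Callable, Dict

# Placeholder bounds that keep simulated values in a plausible range.
SAFE_RANGES = {"systolic_bp": (110.0, 180.0), "bmi": (18.5, 40.0)}


def toy_risk(features: Dict[str, float]) -> float:
    # Hypothetical logistic model standing in for the trained classifier.
    z = (
        -7.0
        + 0.05 * features["age"]
        + 0.02 * features["systolic_bp"]
        + 0.6 * features["smoker"]
    )
    return 1.0 / (1.0 + math.exp(-z))


def what_if(
    predict: Callable[[Dict[str, float]], float],
    features: Dict[str, float],
    changes: Dict[str, float],
) -> Dict[str, float]:
    """Compare predicted risk before and after applying bounded changes."""
    modified = dict(features)  # copy so the caller's record is left untouched
    for name, new_value in changes.items():
        low, high = SAFE_RANGES.get(name, (-math.inf, math.inf))
        modified[name] = min(max(new_value, low), high)
    before = predict(features)
    after = predict(modified)
    return {"before": before, "after": after, "absolute_change": after - before}
```

Calling what_if(toy_risk, person, {"systolic_bp": 120.0}) returns the risk before and after the change, which can then be phrased as a sentence like the example above.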

Challenges we ran into

Handling imbalanced medical data where high-risk cases are rare

Avoiding data leakage during calibration and cross-validation

Making AI explanations understandable for non-technical users

Mapping hackathon dataset features to clinical equivalents safely

Ensuring “what-if” suggestions remain medically reasonable

Accomplishments that we're proud of

Built a fully explainable AI system, not just a black-box predictor

Successfully implemented dual-model architecture

Added actionable insights, not just predictions

Maintained strong performance while prioritizing recall

Designed the system to work in real-world, imperfect data conditions

What we learned

In healthcare AI, explainability is as important as accuracy

A single model is often not enough for real-world deployment

Risk prediction becomes meaningful only when users understand why

Small, interpretable improvements can have a big impact on trust

What's next for CardioInsight AI: Explainable CVD Risk Prediction

Deploy as a web application for public access

Integrate with wearable or EHR data

Add time-based risk progression tracking

Collaborate with healthcare professionals for clinical validation

Expand to include preventive recommendations aligned with guidelines
