Inspiration
Cardiovascular Disease (CVD) is one of the leading causes of death worldwide, yet most people only discover their risk after symptoms appear. We were inspired by a simple question:
“What if people could understand their future CVD risk early — and also know why that risk exists and how to reduce it?”
Most existing systems only give a risk score, which is hard for non-medical users to trust or act upon. We wanted to build a system that not only predicts risk but also explains the contributing factors and provides actionable insights in a simple, understandable way.
What it does
CardioInsight AI is an AI-powered cardiovascular risk prediction system that:
Predicts 10-year CVD risk percentage
Classifies users into Low Risk / High Risk
Explains which factors contribute most to the risk
Simulates “what-if” scenarios to show how reducing certain factors (like BP, BMI, cholesterol, smoking) can lower future risk
Works with both:
Complete medical data
Limited real-world user data
The system is designed as a decision-support tool, not a substitute for medical diagnosis.
How we built it
🧠 Data Strategy
Training Dataset
We trained our models using the Framingham Heart Study dataset, a well-known clinical dataset used in cardiovascular research.
Evaluation / Hackathon Dataset
The hackathon-provided cardiovascular dataset was used during inference to validate model behavior in a real-world scenario.
🤖 Dual-Model Architecture (Key Design Choice)
We built two separate models to handle different real-world situations:
Model A – Full Clinical Model
Uses all available medical features
Designed for hospitals or detailed health records
Higher accuracy with comprehensive data
Model B – Lightweight Practical Model
Uses only a small set of commonly available features (Age, Gender, BMI, Blood Pressure, Cholesterol, Glucose, Smoking)
Designed for public health tools, surveys, or limited data scenarios
Faster and more accessible
👉 This dual-model approach makes the system flexible and realistic, instead of assuming perfect medical data.
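One way to picture the routing between the two models is a small dispatcher that checks which fields a record actually contains. This is a minimal sketch, not the project's code: the field names, the `FULL_ONLY_FEATURES` list, and the `choose_model` helper are all assumptions made for illustration.

```python
# Inputs listed for Model B in the write-up; the snake_case names are assumed.
LIGHT_FEATURES = ["age", "gender", "bmi", "blood_pressure",
                  "cholesterol", "glucose", "smoking"]

# Hypothetical extra clinical fields that only the full model would need.
FULL_ONLY_FEATURES = ["diabetes", "bp_medication", "heart_rate"]


def choose_model(record):
    """Return 'full' or 'light' depending on which fields are present."""
    present = {name for name, value in record.items() if value is not None}

    # Both models need the basic fields, so fail early if any are missing.
    missing = [f for f in LIGHT_FEATURES if f not in present]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # Use the full clinical model only when every extra field is available.
    if all(f in present for f in FULL_ONLY_FEATURES):
        return "full"
    return "light"
```

A survey response with only the basic fields would be routed to the lightweight model, while a complete health record would go to the full one.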
⚙️ Technical Approach
XGBoost classifier for strong performance on tabular medical data
Handled class imbalance using scale_pos_weight
Hyperparameter tuning with RandomizedSearchCV
Probability calibration using CalibratedClassifierCV
Threshold optimization focused on high recall (minimizing missed high-risk cases)
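Two of the steps above can be sketched in plain Python: deriving a starting value for `scale_pos_weight` and picking a recall-driven decision threshold. The negative-to-positive ratio is the commonly suggested starting point for that XGBoost parameter. The helper names and the `min_recall=0.9` target are assumptions, not values from the project.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive examples in a 0/1 label list."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0:
        raise ValueError("no positive examples")
    return neg / pos


def recall_at(threshold, labels, probs):
    """Fraction of true positives whose predicted probability >= threshold."""
    pos = sum(1 for y in labels if y == 1)
    hits = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    return hits / pos


def pick_threshold(labels, probs, min_recall=0.9):
    """Highest threshold that still reaches the recall target.

    Recall can only fall as the threshold rises, so scanning candidate
    thresholds from high to low and stopping at the first one that meets
    the target yields the largest acceptable threshold.
    """
    for t in sorted(set(probs), reverse=True):
        if recall_at(t, labels, probs) >= min_recall:
            return t
    return 0.0
```

In practice the threshold would be chosen on held-out calibrated probabilities rather than on the training data, so the reported recall is not optimistic.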
🔍 Explainability
Integrated SHAP (SHapley Additive Explanations) to:
Identify top risk-increasing factors
Provide human-readable explanations
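Turning per-feature contributions into readable text can be as simple as filtering and ranking them. The sketch below assumes the SHAP values for one person have already been computed and are passed in as a plain dictionary. The `explain` helper, the feature names, and the wording of the sentences are illustrative assumptions.

```python
def explain(contributions, top_k=3):
    """Describe the top risk-increasing factors for one prediction.

    contributions maps a feature name to its SHAP-style contribution,
    where a positive value pushes the predicted risk up.
    """
    # Keep only factors that raise risk, largest contribution first.
    increasing = [(name, value) for name, value in contributions.items()
                  if value > 0]
    increasing.sort(key=lambda item: item[1], reverse=True)

    return [
        f"{name} increases your estimated risk (contribution {value:+.2f})"
        for name, value in increasing[:top_k]
    ]
```

Depending on how the explainer is configured, contributions for tree classifiers are often in log-odds units rather than percentage points, so the raw numbers may need translating before they are shown to non-technical users.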
🔁 What-If Simulation
Simulated medically safe improvements
Example: “If systolic BP is reduced from 150 → 120, risk decreases by X%”
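A what-if simulation amounts to scoring the same person twice: once as they are, and once with a single factor changed. The sketch below assumes a hypothetical `toy_risk` stand-in for the trained model, with invented coefficients, and placeholder `SAFE_RANGES` bounds that are not clinical guidance.

```python
import math

# Placeholder bounds for illustration only, not clinical guidance.
SAFE_RANGES = {"systolic_bp": (90, 180)}


def toy_risk(patient):
    """Stand-in logistic model; the coefficients are invented."""
    z = (-7.0 + 0.05 * patient["age"]
         + 0.02 * patient["systolic_bp"]
         + 0.6 * patient["smoker"])
    return 1.0 / (1.0 + math.exp(-z))


def what_if(predict, patient, changes):
    """Return (baseline risk, new risk, reduction) for the proposed changes."""
    # Reject changes that fall outside the allowed range for a feature.
    for name, value in changes.items():
        low, high = SAFE_RANGES.get(name, (float("-inf"), float("inf")))
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the allowed range")

    baseline = predict(patient)
    modified = {**patient, **changes}  # copy, so the input is not mutated
    new_risk = predict(modified)
    return baseline, new_risk, baseline - new_risk
```

Calling `what_if(toy_risk, patient, {"systolic_bp": 120})` for a patient whose current systolic BP is 150 returns the two risks and their difference, which can then be formatted into a sentence like the example above.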
Challenges we ran into
Handling imbalanced medical data where high-risk cases are rare
Avoiding data leakage during calibration and cross-validation
Making AI explanations understandable for non-technical users
Mapping hackathon dataset features to clinical equivalents safely
Ensuring “what-if” suggestions remain medically reasonable
Accomplishments that we're proud of
Built a fully explainable AI system, not just a black-box predictor
Successfully implemented a dual-model architecture
Added actionable insights, not just predictions
Maintained strong performance while prioritizing recall
Designed the system to work in real-world, imperfect data conditions
What we learned
In healthcare AI, explainability is as important as accuracy
A single model is often not enough for real-world deployment
Risk prediction becomes meaningful only when users understand why
Small, interpretable improvements can have a big impact on trust
What's next for CardioInsight AI: Explainable CVD Risk Prediction
Deploy as a web application for public access
Integrate with wearable or EHR data
Add time-based risk progression tracking
Collaborate with healthcare professionals for clinical validation
Expand to include preventive recommendations aligned with guidelines
Built With
- github
- joblib
- jupyter/colab
- numpy
- pandas
- python
- scikit-learn
- shap
- streamlit
- streamlit-cloud
- xgboost
