H.E.A.R.T — Hybrid Early Assessment & Risk Tool

"Detect the risk. Before it becomes the reality."

Inspiration

Cardiovascular disease kills one person every 33 seconds in India alone. Yet the tragedy isn't just the disease — it's the silence before it. Most patients arrive at a hospital only after a cardiac event, when intervention is already late.

We were inspired by a simple, uncomfortable question:

What if a doctor could know — with mathematical certainty — which patient in the waiting room is about to have a heart attack?

The answer, we discovered, was hiding in plain sight. Routine clinical measurements — blood pressure, cholesterol, resting ECG, exercise heart rate — already contain the signal. The problem wasn't data. It was interpretation at scale.

That's why we built H.E.A.R.T — not just another classifier, but a clinically grounded, explainable AI system that puts a probability and a reason in front of every doctor, for every patient, in real time.

What We Built

H.E.A.R.T is a full-stack machine learning platform for cardiovascular disease detection, covering three tracks:

1. Early Detection

We trained and compared four classifiers across three independent biomedical datasets:

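| Dataset | Rows | Features | Best model | AUC-ROC |
|---------|------|----------|------------|---------|
| Cleveland Heart Disease (UCI) | 303 | 13 | Gradient Boosting | 0.9341 |
| Framingham Heart Study | 4,240 | 15 | XGBoost | 0.8812 |
| Kaggle CVD Dataset | ~70,000 | 11 | XGBoost | 0.9198 |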

The final pipeline uses 5-fold stratified cross-validation, SMOTE oversampling to correct class imbalance, and automated model selection by AUC-ROC.
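To make "model selection by AUC-ROC" concrete, here is a minimal standard-library sketch. It computes AUC as the probability that a randomly chosen positive case outranks a randomly chosen negative one, then picks the model with the highest mean AUC across folds. The function names and the per-fold numbers are hypothetical and are not the project's results.

```python
from statistics import mean

def auc_roc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count a win as 1, a tie as 0.5, a loss as 0 over all positive/negative pairs.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def select_best(fold_aucs):
    """Return the model name with the highest mean AUC across folds."""
    return max(fold_aucs, key=lambda name: mean(fold_aucs[name]))

example_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Hypothetical per-fold AUCs, for illustration only (not the project's results).
toy_fold_aucs = {
    "logistic_regression": [0.80, 0.82, 0.79, 0.81, 0.80],
    "xgboost": [0.86, 0.88, 0.85, 0.87, 0.86],
}
best_model = select_best(toy_fold_aucs)
```

In the toy input, three of the four positive/negative pairs are ordered correctly, so `example_auc` is 0.75.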

2. Progression Forecasting

We extended the baseline with a PyTorch MLP (CVDNet), a neural network with three hidden layers, BatchNorm, Dropout, and AdamW optimisation, to provide a deep learning comparison point:

$$\text{CVDNet: } \mathbf{x} \in \mathbb{R}^{13} \xrightarrow{FC_{64}} \xrightarrow{BN+ReLU} \xrightarrow{FC_{128}} \xrightarrow{BN+ReLU} \xrightarrow{FC_{64}} \xrightarrow{FC_1} \sigma(\cdot) \in [0,1]$$
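For a sense of scale, the sketch below counts the learnable parameters implied by the layer widths in the diagram. It assumes BatchNorm follows only the first two fully connected layers, as drawn, and that every layer has a bias term. Both are assumptions about the actual CVDNet code.

```python
# Layer widths read off the diagram: 13 inputs -> 64 -> 128 -> 64 -> 1 output.
widths = [13, 64, 128, 64, 1]
# Assumption: BatchNorm follows only the first two FC layers, as the diagram shows.
batchnorm_widths = [64, 128]

def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

fc_params = sum(linear_params(a, b) for a, b in zip(widths, widths[1:]))
# Each BatchNorm layer learns one scale and one shift per feature.
bn_params = sum(2 * w for w in batchnorm_widths)
total_params = fc_params + bn_params
```

Under these assumptions the network has roughly 18 thousand parameters, which is small compared with the ~70,000-row Kaggle dataset but large compared with the 303 rows in Cleveland.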

3. Risk Interpretability (SHAP)

Black-box models are clinically useless if doctors can't trust them. We used SHAP (SHapley Additive exPlanations) to produce both global feature importance and per-patient explanations.

The SHAP value for feature $i$ on patient $x$ is computed as:

$$\phi_i(f, x) = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!} \left[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \right]$$

where $F$ is the full feature set and $f_S$ is the model trained on feature subset $S$.
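The formula can be evaluated directly for a small feature set by enumerating every subset. The sketch below does that with the standard library. `toy_value` is a hypothetical stand-in for $f_S(x_S)$ with made-up effect sizes. The SHAP library itself uses faster algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values by enumerating every subset S that excludes feature i.

    `value` maps a frozenset of feature names to the model output f_S(x_S).
    """
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|F| - |S| - 1)! / |F|! from the formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

# Hypothetical toy "model": additive effects plus one interaction term.
def toy_value(subset):
    out = 0.10  # baseline risk with no features known
    if "thalach" in subset:
        out += 0.20
    if "ca" in subset:
        out += 0.15
    if "thalach" in subset and "ca" in subset:
        out += 0.10  # interaction, shared equally between the two features
    if "cp" in subset:
        out += 0.05
    return out

features = ["thalach", "ca", "cp"]
phi = shapley_values(features, toy_value)
```

The values sum to `toy_value(all features) - toy_value(no features)`, which is the additivity property that lets a per-patient explanation account for the whole gap between the baseline and the predicted risk.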

Our top 5 discovered risk drivers, ranked by mean $|\phi_i|$:

| Feature | Mean absolute SHAP | Clinical interpretation |
|---------|--------------------|-------------------------|
| thalach (max heart rate) | 0.312 | Low maximum HR → reduced cardiac reserve |
| ca (major vessels) | 0.287 | More vessels coloured → advanced disease |
| cp (chest pain type) | 0.241 | Asymptomatic type paradoxically carries the highest risk |
| oldpeak (ST depression) | 0.198 | ECG marker of ischaemia |
| thal (thallium stress test) | 0.165 | Reversible defect → known CVD marker |
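A ranking like this can be derived from a matrix of per-patient SHAP values by averaging absolute values per feature. The sketch below shows the idea with a hypothetical three-patient matrix. The numbers are invented and unrelated to the values reported above.

```python
from statistics import mean

def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean absolute SHAP value over all patients."""
    columns = zip(*shap_rows)  # one column of SHAP values per feature
    scores = {
        name: mean(abs(v) for v in col)
        for name, col in zip(feature_names, columns)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical SHAP values for three patients (rows) and three features (columns).
names = ["thalach", "ca", "cp"]
toy_shap = [
    [-0.30, 0.10, 0.05],
    [0.40, -0.20, 0.00],
    [-0.20, 0.30, -0.10],
]
ranking = rank_by_mean_abs_shap(toy_shap, names)
```

Taking the absolute value first matters: a feature that pushes risk up for some patients and down for others would otherwise average out to near zero and look unimportant.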

How We Built It

Tech stack

Data          → pandas, numpy, ucimlrepo
ML models     → scikit-learn, XGBoost, LightGBM
Imbalance     → imbalanced-learn (SMOTE)
Neural net    → PyTorch (CVDNet MLP)
Explainability→ SHAP (TreeExplainer + LinearExplainer)
Tuning        → GridSearchCV (108 parameter combinations)
Frontend      → Streamlit (Python) + React (hospital UI)
Reports       → docx + pptxgenjs (Node.js)

Pipeline architecture

The pipeline was designed to be modular and dataset-agnostic:

Raw data
  └─ load_cleveland() / load_framingham() / load_kaggle()
       └─ EDA (correlation heatmap, class balance, boxplots)
            └─ Preprocessing (NaN drop → LabelEncode → StandardScaler)
                 └─ SMOTE (training set only — no leakage)
                      └─ 5-fold StratifiedKFold CV
                           └─ Model comparison (LR / RF / GBM / XGB)
                                └─ GridSearchCV hyperparameter tuning
                                     └─ Hold-out evaluation (AUC, F1, ROC)
                                          └─ SHAP (global + per-patient)
                                               └─ Streamlit / React deployment
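The stratified 5-fold step can be pictured as dealing each class out across folds separately, so that every fold keeps roughly the same class ratio. The sketch below is a simplified standard-library illustration of that idea, not scikit-learn's `StratifiedKFold` implementation. The function name and toy labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Toy labels: 20 positives and 80 negatives (hypothetical, for illustration).
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels, k=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

With these toy labels each fold receives 4 positives and 16 negatives, so a metric such as AUC is computed on the same class balance in every fold.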

A key design decision: SMOTE is applied strictly after the train/test split, preventing synthetic samples from leaking into evaluation and inflating metrics.
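The ordering can be sketched as follows: split first, then oversample using training rows only. The `smote` function here is a compact standard-library version of the SMOTE idea (interpolating between a minority point and one of its nearest minority neighbours). It is an illustration, not the imbalanced-learn implementation the project uses, and the data is synthetic.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating towards near minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the line segment base -> target
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, target)))
    return synthetic

# Hypothetical 2-feature data as (features, label) pairs: 80 negatives, 20 positives.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(80)]
data += [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(20)]
rng.shuffle(data)

# 1) Split first. The test set is not touched again until evaluation.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# 2) Oversample the minority class using training rows only.
train_pos = [x for x, y in train if y == 1]
train_neg = [x for x, y in train if y == 0]
new_pos = smote(train_pos, n_new=len(train_neg) - len(train_pos))
train_balanced = train + [(x, 1) for x in new_pos]
```

Because synthetic points are built only from `train_pos`, no test-set patient contributes to any training sample. The same reasoning applies inside cross-validation, where oversampling would need to be refitted on each training fold.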

Hyperparameter tuning

We ran a full grid search over XGBoost with:

$$\Theta = \{\texttt{n\_estimators},\ \texttt{max\_depth},\ \texttt{learning\_rate},\ \texttt{subsample},\ \texttt{colsample\_bytree}\}$$

$$3 \times 3 \times 3 \times 2 \times 2 = 108 \text{ combinations}, \qquad 108 \times 5\text{-fold CV} = 540 \text{ model fits}$$

Best configuration: n_estimators=300, max_depth=5, learning_rate=0.1. Moderate tree depth combined with a moderate learning rate is a common outcome on small tabular datasets, where deeper trees tend to overfit.
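The count of 108 combinations and 540 fits can be checked by enumerating the grid. In the sketch below only the grid shape (3 × 3 × 3 × 2 × 2) and the reported best values (300, 5, 0.1) come from the text. The other candidate values are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical candidate values; only the grid shape and the reported best
# values (n_estimators=300, max_depth=5, learning_rate=0.1) come from the text.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.8, 1.0],
    "colsample_bytree": [0.8, 1.0],
}

# One dict per point of the Cartesian product of all candidate lists.
grid = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
n_folds = 5
total_fits = len(grid) * n_folds
```

Each extra hyperparameter multiplies the cost, which is why exhaustive search stays practical here only because each list has two or three candidates.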

Challenges We Faced

1. Class imbalance

The Framingham dataset has only 15.1% positive cases (10-year CHD). A naive classifier reaches about 85% accuracy by predicting "no disease" every time, which is catastrophically wrong for clinical use because it misses every at-risk patient. SMOTE mitigated this, but required careful implementation to avoid data leakage.
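The arithmetic behind that accuracy figure is worth spelling out. The sketch below uses the row count from the dataset table and the 15.1% positive rate quoted above. The positive count is rounded from those two numbers rather than taken from the data itself.

```python
n_patients = 4240          # Framingham rows, from the dataset table
positive_rate = 0.151      # share of positive (10-year CHD) cases, from the text
n_pos = round(n_patients * positive_rate)
n_neg = n_patients - n_pos

# A "classifier" that predicts "no disease" for everyone.
true_negatives = n_neg     # every healthy patient is labelled correctly
true_positives = 0         # every at-risk patient is missed

accuracy = (true_positives + true_negatives) / n_patients
recall = true_positives / n_pos
```

Accuracy comes out near 0.85 while recall is exactly 0, which is why the pipeline selects models by AUC-ROC rather than accuracy.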

2. Interpretability vs. performance

Tree-based ensembles outperformed logistic regression by ~5% AUC, but are harder to explain. We resolved this tension using SHAP TreeExplainer, which provides exact Shapley values for tree models in $O(TLD^2)$ time (where $T$ = trees, $L$ = leaves, $D$ = depth) — fast enough for real-time per-patient explanations.
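A rough operation count shows why this matters. The sketch below plugs in the tree count and depth from the reported best configuration and the 13 Cleveland features. It assumes the leaf count is bounded by $2^D$ and compares against brute-force subset enumeration, which scales as $O(TL2^M)$ for $M$ features. Constants are ignored, so the result is an order-of-magnitude guide only.

```python
T = 300          # trees: n_estimators from the reported best configuration
D = 5            # maximum depth from the reported best configuration
L = 2 ** D       # assumption: upper bound on leaves in a binary tree of depth D
M = 13           # features in the Cleveland dataset

tree_shap_ops = T * L * D ** 2      # O(T L D^2)
naive_ops = T * L * 2 ** M          # O(T L 2^M): enumerate every feature subset
speedup = naive_ops / tree_shap_ops
```

Under these assumptions the tree-specific algorithm needs a few hundred times fewer operations per patient, and the gap grows exponentially with the number of features.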

3. Clinical validity of findings

SHAP revealed that asymptomatic chest pain (cp = 0) is a stronger CVD predictor than typical angina. This is counterintuitive but clinically well-documented — silent ischaemia is a known high-risk presentation. Discovering this through data, and finding it confirmed in cardiology literature, was one of the most satisfying moments of the project.

4. Building for hospitals, not just hackathons

A Streamlit slider demo is fine for a proof-of-concept. But doctors need patient registries, ward-level dashboards, batch screening, and audit trails. We built a full React hospital frontend with:

Patient registry with search and risk filtering
Per-patient SHAP assessment with clinical note-taking
Batch CSV upload for ward-level screening
Population analytics and model performance monitoring
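As a rough illustration of the batch-screening flow, the sketch below reads an uploaded CSV, scores each row, and returns the patients above a risk threshold. Everything in it is hypothetical: the `patient_id` column, the threshold, and the logistic scoring function with made-up weights, which stands in for the trained model.

```python
import csv
import io
import math

def risk_score(row, weights, bias):
    """Hypothetical stand-in for the trained model: a logistic score in [0, 1]."""
    z = bias + sum(w * float(row[name]) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))

def screen_batch(csv_file, weights, bias, threshold=0.5):
    """Score every row of an uploaded CSV and return high-risk patients, highest first."""
    flagged = []
    for row in csv.DictReader(csv_file):
        score = risk_score(row, weights, bias)
        if score >= threshold:
            flagged.append((row["patient_id"], round(score, 3)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Made-up weights and patients, for illustration only.
toy_weights = {"oldpeak": 0.9, "ca": 0.8}
upload = io.StringIO("patient_id,oldpeak,ca\nP001,0.0,0\nP002,2.5,2\nP003,1.0,1\n")
high_risk = screen_batch(upload, toy_weights, bias=-2.0)
```

Sorting by score puts the highest-risk patients at the top of the ward list, which matches how the registry's risk filter would be used.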

What We Learned

Medical AI is only as useful as its explanations. A 0.93 AUC model that a cardiologist can't interpret will never be used. SHAP bridged this gap.
Dataset choice matters enormously. The same pipeline reached AUC 0.93 on Cleveland (Gradient Boosting) and 0.88 on Framingham (XGBoost). Differences between populations and outcome definitions are real and must be communicated to users.
SMOTE is powerful but dangerous if misapplied. We learned this the hard way when early experiments showed suspiciously high validation scores from pre-split oversampling.
The gap between an ML model and a deployable clinical tool is enormous. We now have deep respect for the engineering, regulatory, and human-factors work that goes into real clinical decision support systems.

What's Next for H.E.A.R.T

Federated learning — train across hospital networks without sharing patient data
MIMIC-IV integration — longitudinal ICU data for progression modelling
Prospective validation — partner with a hospital to validate predictions against real outcomes
LIME comparison — benchmark SHAP against LIME for clinical acceptance studies
Mobile app — point-of-care risk scoring for rural and primary health centres

Acknowledgements

Built on the UCI Cleveland Heart Disease Dataset (Janosi et al., 1988), the Framingham Heart Study, and the open-source ecosystem of scikit-learn, XGBoost, SHAP, PyTorch, and Streamlit.