H.E.A.R.T — Hybrid Early Assessment & Risk Tool

"Detect the risk. Before it becomes the reality."


Inspiration

Cardiovascular disease kills one person every 33 seconds in India alone. Yet the tragedy isn't just the disease — it's the silence before it. Most patients arrive at a hospital only after a cardiac event, when intervention is already late.

We were inspired by a simple, uncomfortable question:

What if a doctor could know, with a quantified probability, which patient in the waiting room is about to have a heart attack?

The answer, we discovered, was hiding in plain sight. Routine clinical measurements — blood pressure, cholesterol, resting ECG, exercise heart rate — already contain the signal. The problem wasn't data. It was interpretation at scale.

That's why we built H.E.A.R.T — not just another classifier, but a clinically grounded, explainable AI system that puts a probability and a reason in front of every doctor, for every patient, in real time.


What We Built

H.E.A.R.T is a full-stack machine learning platform for cardiovascular disease detection, covering three tracks:

1. Early Detection

We trained and compared four classifiers across three independent biomedical datasets:

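| Dataset | Rows | Features | Best model | AUC-ROC |
|---------|------|----------|------------|---------|
| Cleveland Heart Disease (UCI) | 303 | 13 | Gradient Boosting | 0.9341 |
| Framingham Heart Study | 4,240 | 15 | XGBoost | 0.8812 |
| Kaggle CVD Dataset | ~70,000 | 11 | XGBoost | 0.9198 |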

The final pipeline uses 5-fold stratified cross-validation, SMOTE oversampling to correct class imbalance, and automated model selection by AUC-ROC.

2. Progression Forecasting

We extended the baseline with a PyTorch MLP (CVDNet), a feed-forward network with three hidden layers (64, 128 and 64 units), BatchNorm, Dropout, and AdamW optimisation, to provide a deep learning comparison point:

$$\text{CVDNet: } \mathbf{x} \in \mathbb{R}^{13} \xrightarrow{FC_{64}} \xrightarrow{BN+ReLU} \xrightarrow{FC_{128}} \xrightarrow{BN+ReLU} \xrightarrow{FC_{64}} \xrightarrow{FC_1} \sigma(\cdot) \in [0,1]$$
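
A PyTorch sketch of CVDNet that follows the layer sizes in the diagram. Several details are assumptions, not taken from the write-up: Dropout with p = 0.3 after every hidden block, a BatchNorm + ReLU block after the third hidden layer as well (the diagram omits it), and AdamW with `lr=1e-3` and `weight_decay=1e-2`. The batch is random placeholder data.

```python
import torch
from torch import nn


class CVDNet(nn.Module):
    """13 -> 64 -> 128 -> 64 -> 1 MLP. Dropout p and the third BN+ReLU are assumptions."""

    def __init__(self, n_features: int = 13, dropout: float = 0.3):
        super().__init__()

        def block(n_in: int, n_out: int) -> list:
            return [nn.Linear(n_in, n_out), nn.BatchNorm1d(n_out), nn.ReLU(), nn.Dropout(dropout)]

        self.net = nn.Sequential(
            *block(n_features, 64),
            *block(64, 128),
            *block(128, 64),
            nn.Linear(64, 1),  # single logit
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Returns logits; sigma(.) is applied only when probabilities are needed
        return self.net(x).squeeze(-1)


torch.manual_seed(0)
model = CVDNet()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy, numerically stable

# One training step on a random placeholder batch of 32 patients
x = torch.randn(32, 13)
y = torch.randint(0, 2, (32,)).float()
model.train()
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

# Inference: BatchNorm uses running stats, Dropout is disabled
model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(x))  # risk probabilities in [0, 1]
```

Returning logits and training with `BCEWithLogitsLoss` avoids the numerical issues of a separate sigmoid followed by `BCELoss`.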

3. Risk Interpretability (SHAP)

Black-box models are clinically useless if doctors can't trust them. We used SHAP (SHapley Additive exPlanations) to produce both global feature importance and per-patient explanations.

The SHAP value for feature $i$ on patient $x$ is computed as:

$$\phi_i(f, x) = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!} \left[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \right]$$

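where $F$ is the full feature set, $x_S$ holds patient $x$'s values for the features in $S$, and $f_S$ is the model trained on feature subset $S$. Because retraining on all $2^{|F|}$ subsets is impractical, SHAP approximates $f_S(x_S)$ with the model's expected output conditioned on $x_S$.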
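
For a handful of features the formula can be evaluated directly by enumerating every coalition. This sketch swaps in a common simplification for $f_S$: instead of retraining, features outside $S$ are replaced by baseline values (an assumption, not the project's method). `toy_model` is a hypothetical risk score with one interaction term.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values: for each feature i, sum over all coalitions S of the other features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # |S|! (|F| - |S| - 1)! / |F|!
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for S in combinations(others, size):
                phi[i] += weight * (value(set(S) | {i}) - value(set(S)))
    return phi


def make_value_fn(model, x, baseline):
    """v(S): evaluate the model with features outside S set to baseline values (assumption)."""
    def value(S):
        z = [x[j] if j in S else baseline[j] for j in range(len(x))]
        return model(z)
    return value


def toy_model(z):
    # Hypothetical score: two linear terms plus an interaction between features 0 and 2
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(make_value_fn(toy_model, x, baseline), n_features=3)
print(phi)
```

The linear terms are credited entirely to features 0 and 1, and the interaction term $0.2\,z_0 z_2$ is split equally between features 0 and 2, giving $\phi \approx (0.8, 0.6, 0.3)$. These sum to $f(x) - f(\text{baseline}) = 1.7$, the efficiency property. Enumeration needs $O(2^{|F|})$ model evaluations, which is why tree-specific algorithms matter for real models.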

Our top 5 discovered risk drivers, ranked by mean $|\phi_i|$:

| Feature | Mean \|SHAP\| | Clinical interpretation |
|---------|---------------|-------------------------|
| thalach (max heart rate) | 0.312 | Lower max HR → reduced cardiac reserve |
| ca (major vessels) | 0.287 | More vessels coloured by fluoroscopy → more advanced disease |
| cp (chest pain type) | 0.241 | Asymptomatic presentation paradoxically carries the highest risk |
| oldpeak (ST depression) | 0.198 | ECG marker of ischaemia |
| thal (thallium stress test) | 0.165 | Reversible defect → known CVD marker |
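
The global ranking above is the column-wise mean of absolute SHAP values. Below is a minimal numpy sketch. It assumes a SHAP matrix of shape (patients, features), such as the one `shap.TreeExplainer(model).shap_values(X)` typically returns for a binary XGBoost model. The helper name `rank_by_mean_abs_shap` is hypothetical, and the toy numbers are made up, not the project's results.

```python
import numpy as np


def rank_by_mean_abs_shap(shap_values, feature_names, top_k=5):
    """Rank features by mean |phi_i| across patients (global importance)."""
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(importance)[::-1][:top_k]
    return [(feature_names[j], float(importance[j])) for j in order]


# Toy SHAP matrix: 3 patients x 3 features (made-up values)
phi = np.array([[0.4, -0.1, 0.0],
                [-0.2, 0.3, 0.1],
                [0.3, -0.2, 0.05]])
ranking = rank_by_mean_abs_shap(phi, ["thalach", "ca", "cp"], top_k=3)
print(ranking)
```

Taking absolute values before averaging measures how strongly a feature moves predictions regardless of direction; the sign is what the per-patient explanations add.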


How We Built It

Tech stack

Data          → pandas, numpy, ucimlrepo
ML models     → scikit-learn, XGBoost, LightGBM
Imbalance     → imbalanced-learn (SMOTE)
Neural net    → PyTorch (CVDNet MLP)
Explainability→ SHAP (TreeExplainer + LinearExplainer)
Tuning        → GridSearchCV (108 parameter combinations)
Frontend      → Streamlit (Python) + React (hospital UI)
Reports       → docx + pptxgenjs (Node.js)

Pipeline architecture

The pipeline was designed to be modular and dataset-agnostic:

Raw data
  └─ load_cleveland() / load_framingham() / load_kaggle()
       └─ EDA (correlation heatmap, class balance, boxplots)
            └─ Preprocessing (NaN drop → LabelEncode → StandardScaler)
                 └─ SMOTE (training set only — no leakage)
                      └─ 5-fold StratifiedKFold CV
                           └─ Model comparison (LR / RF / GBM / XGB)
                                └─ GridSearchCV hyperparameter tuning
                                     └─ Hold-out evaluation (AUC, F1, ROC)
                                          └─ SHAP (global + per-patient)
                                               └─ Streamlit / React deployment

A key design decision: SMOTE is applied strictly after the train/test split, preventing synthetic samples from leaking into evaluation and inflating metrics.
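
The ordering can be illustrated with a simplified, hand-written version of SMOTE's interpolation step. This is for illustration only; the project uses imbalanced-learn's `SMOTE`, and the function name `simple_smote` and the random data are hypothetical. The split happens first, and synthetic rows are generated from the training portion only.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def simple_smote(X, y, minority=1, k=5, seed=None):
    """Simplified SMOTE: interpolate between a minority row and one of its k nearest
    minority neighbours until both classes have the same count."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_needed = int((y != minority).sum()) - len(X_min)
    if n_needed <= 0 or len(X_min) < 2:
        return X, y
    # Pairwise Euclidean distances among minority rows; exclude each row from its own neighbours
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_needed)
    nn = neighbours[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))  # interpolation factor in [0, 1)
    X_syn = X_min[base] + gap * (X_min[nn] - X_min[base])
    X_out = np.vstack([X, X_syn])
    y_out = np.concatenate([y, np.full(n_needed, minority, dtype=y.dtype)])
    return X_out, y_out


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.15).astype(int)  # roughly 15% positives

# 1) Split first, stratified so both sides keep the same prevalence
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
# 2) Oversample the training portion only; the test set contains only real rows
X_tr_bal, y_tr_bal = simple_smote(X_tr, y_tr, seed=0)
print(np.bincount(y_tr), np.bincount(y_tr_bal), np.bincount(y_te))
```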

Hyperparameter tuning

We ran a full grid search over XGBoost with:

$$\Theta = \{\text{n\_estimators}, \text{max\_depth}, \text{learning\_rate}, \text{subsample}, \text{colsample\_bytree}\}$$

$$3 \times 3 \times 3 \times 2 \times 2 = 108 \text{ combinations}, \qquad 108 \times 5 \text{ folds} = 540 \text{ model fits}$$
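
The grid arithmetic can be checked with scikit-learn's `ParameterGrid`. Only the grid's shape (3, 3, 3, 2, 2 values) and the best configuration come from this write-up; the other candidate values are hypothetical placeholders. The hypothetical helper `make_search` shows how the grid would plug into `GridSearchCV`; it needs `xgboost` installed and is not called here.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical value lists; only their lengths and the best values (300, 5, 0.1) are from the write-up
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.8, 1.0],
    "colsample_bytree": [0.8, 1.0],
}
n_combinations = len(ParameterGrid(param_grid))
n_fits = n_combinations * 5  # one fit per combination per CV fold
print(n_combinations, n_fits)  # 108 540


def make_search():
    """Build the XGBoost grid search (requires the xgboost package)."""
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from xgboost import XGBClassifier
    return GridSearchCV(
        XGBClassifier(eval_metric="logloss"),
        param_grid,
        scoring="roc_auc",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
        n_jobs=-1,
    )
```

With the default `refit=True`, `GridSearchCV` performs one extra fit of the best configuration on the full training set after the 540 cross-validation fits.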

Best configuration: n_estimators=300, max_depth=5, learning_rate=0.1 — consistent with the bias-variance trade-off literature for tabular medical data.


Challenges We Faced

1. Class imbalance

The Framingham dataset has only 15.1% positive cases (10-year CHD). A naive classifier that predicts "no disease" for every patient scores about 85% accuracy while missing every at-risk patient, which is catastrophic for clinical use. SMOTE resolved this, but required careful implementation to avoid data leakage.
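
The accuracy trap is easy to reproduce with scikit-learn's `DummyClassifier`. This sketch uses 1,000 hypothetical labels at Framingham's 15.1% prevalence; the features are placeholders because this baseline ignores them.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

# 1,000 hypothetical patients, 151 positives (15.1% prevalence)
y = np.array([1] * 151 + [0] * 849)
X = np.zeros((len(y), 1))  # placeholder features, ignored by the baseline

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = baseline.predict(X)                 # always predicts 0 ("no disease")
proba = baseline.predict_proba(X)[:, 1]    # constant score for every patient

acc = accuracy_score(y, pred)   # 0.849: looks respectable
rec = recall_score(y, pred)     # 0.0: every at-risk patient is missed
auc = roc_auc_score(y, proba)   # 0.5: no ability to rank patients by risk
print(acc, rec, auc)
```

This is why the project selects models by AUC-ROC rather than accuracy.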

2. Interpretability vs. performance

Tree-based ensembles outperformed logistic regression by roughly 0.05 AUC, but are harder to explain. We resolved this tension using SHAP TreeExplainer, which computes exact Shapley values for tree models in $O(TLD^2)$ time (where $T$ = trees, $L$ = leaves, $D$ = depth), fast enough for real-time per-patient explanations.

3. Clinical validity of findings

SHAP revealed that asymptomatic chest pain (cp = 0) is a stronger CVD predictor than typical angina. This is counterintuitive but clinically well-documented — silent ischaemia is a known high-risk presentation. Discovering this through data, and finding it confirmed in cardiology literature, was one of the most satisfying moments of the project.

4. Building for hospitals, not just hackathons

A Streamlit slider demo is fine for a proof-of-concept. But doctors need patient registries, ward-level dashboards, batch screening, and audit trails. We built a full React hospital frontend with:

  • Patient registry with search and risk filtering
  • Per-patient SHAP assessment with clinical note-taking
  • Batch CSV upload for ward-level screening
  • Population analytics and model performance monitoring

What We Learned

  • Medical AI is only as useful as its explanations. A 0.93 AUC model that a cardiologist can't interpret will never be used. SHAP bridged this gap.
  • Dataset choice matters enormously. Our best gradient-boosted models reached AUC 0.93 on Cleveland (Gradient Boosting) but only 0.88 on Framingham (XGBoost); demographic differences between populations are real and must be communicated to users.
  • SMOTE is powerful but dangerous if misapplied. We learned this the hard way when early experiments showed suspiciously high validation scores from pre-split oversampling.
  • The gap between an ML model and a deployable clinical tool is enormous. We now have deep respect for the engineering, regulatory, and human-factors work that goes into real clinical decision support systems.

What's Next for H.E.A.R.T

  • Federated learning — train across hospital networks without sharing patient data
  • MIMIC-IV integration — longitudinal ICU data for progression modelling
  • Prospective validation — partner with a hospital to validate predictions against real outcomes
  • LIME comparison — benchmark SHAP against LIME for clinical acceptance studies
  • Mobile app — point-of-care risk scoring for rural and primary health centres

Acknowledgements

Built on the UCI Cleveland Heart Disease Dataset (Janosi et al., 1988), the Framingham Heart Study, and the open-source ecosystem of scikit-learn, XGBoost, SHAP, PyTorch, and Streamlit.

"The goal of medicine is not to prolong life — it is to improve the quality of the time we have. H.E.A.R.T is our contribution to that goal."


Team H.E.A.R.T · Hack4Health 2025 · Chennai Institute of Technology
