H.E.A.R.T — Hybrid Early Assessment & Risk Tool
"Detect the risk. Before it becomes the reality."
Inspiration
Cardiovascular disease kills one person every 33 seconds in India alone. Yet the tragedy isn't just the disease — it's the silence before it. Most patients arrive at a hospital only after a cardiac event, when intervention is already late.
We were inspired by a simple, uncomfortable question:
What if a doctor could know, with a quantified probability, which patient in the waiting room is most likely to have a heart attack?
The answer, we discovered, was hiding in plain sight. Routine clinical measurements — blood pressure, cholesterol, resting ECG, exercise heart rate — already contain the signal. The problem wasn't data. It was interpretation at scale.
That's why we built H.E.A.R.T — not just another classifier, but a clinically grounded, explainable AI system that puts a probability and a reason in front of every doctor, for every patient, in real time.
What We Built
H.E.A.R.T is a full-stack machine learning platform for cardiovascular disease detection, covering three tracks:
1. Early Detection
We trained and compared four classifiers across three independent biomedical datasets:
| Dataset | Rows | Features | Best Model | AUC-ROC |
|---|---|---|---|---|
| Cleveland Heart Disease (UCI) | 303 | 13 | Gradient Boosting | 0.9341 |
| Framingham Heart Study | 4,240 | 15 | XGBoost | 0.8812 |
| Kaggle CVD Dataset | ~70,000 | 11 | XGBoost | 0.9198 |
The final pipeline uses 5-fold stratified cross-validation, SMOTE oversampling to correct class imbalance, and automated model selection by AUC-ROC.
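The selection step described above can be sketched as follows. This is a minimal illustration, not the project's code: it runs on synthetic data from `make_classification`, covers only three scikit-learn models (XGBoost and SMOTE are left out to keep it dependency-light), and every setting shown is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a 13-feature clinical table (not the project's data).
X, y = make_classification(n_samples=300, n_features=13, n_informative=6,
                           weights=[0.7, 0.3], random_state=0)

candidates = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "GBM": GradientBoostingClassifier(random_state=0),
}

# Stratified folds keep the class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
mean_auc = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

# Automated selection: keep the candidate with the highest mean AUC-ROC.
best_name = max(mean_auc, key=mean_auc.get)
print(best_name, round(mean_auc[best_name], 3))
```

Placing the scaler inside the pipeline means it is refit on each training fold, so no statistics from the validation fold reach the model.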
2. Progression Forecasting
We extended the baseline with a PyTorch MLP (CVDNet), a network with three hidden layers (64, 128, and 64 units) that uses BatchNorm, Dropout, and AdamW optimisation, to provide a deep learning comparison point:
$$\text{CVDNet: } \mathbf{x} \in \mathbb{R}^{13} \xrightarrow{FC_{64}} \xrightarrow{BN+ReLU} \xrightarrow{FC_{128}} \xrightarrow{BN+ReLU} \xrightarrow{FC_{64}} \xrightarrow{FC_1} \sigma(\cdot) \in [0,1]$$
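A minimal PyTorch sketch of the layer sequence in the formula above. The dropout rate, the placement of Dropout after each ReLU, the learning rate, and the weight decay are illustrative assumptions, since the write-up does not state them. The module returns logits and the sigmoid is applied at inference time, a common pairing with `BCEWithLogitsLoss`.

```python
import torch
from torch import nn


class CVDNet(nn.Module):
    """MLP following 13 -> 64 -> 128 -> 64 -> 1, as in the formula above."""

    def __init__(self, n_features: int = 13, p_drop: float = 0.3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(128, 64),  # the formula shows no BN/ReLU after this layer
        )
        self.head = nn.Linear(64, 1)

    def forward(self, x):
        # Returns one logit per patient; apply sigmoid to get a probability.
        return self.head(self.body(x)).squeeze(-1)


model = CVDNet()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

# One training step on random placeholder data (8 patients, 13 features).
x = torch.randn(8, 13)
y = torch.randint(0, 2, (8,)).float()
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

# Inference: eval mode disables Dropout and uses BatchNorm running statistics.
model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(x))
```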
3. Risk Interpretability (SHAP)
Black-box models are clinically useless if doctors can't trust them. We used SHAP (SHapley Additive exPlanations) to produce both global feature importance and per-patient explanations.
The SHAP value for feature $i$ on patient $x$ is computed as:
$$\phi_i(f, x) = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!} \left[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \right]$$
where $F$ is the full feature set and $f_S$ is the model trained on feature subset $S$.
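To make the sum concrete, here is a brute-force sketch that enumerates every subset $S$ for a tiny model. It makes one simplifying assumption that differs from the definition above: instead of retraining a model per subset, features outside $S$ are replaced with a baseline value. The `toy_model` function and its inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating every subset S not containing i."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take the patient's values; the rest take the baseline.
        z = [x[j] if j in subset else baseline[j] for j in features]
        return f(z)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|F| - |S| - 1)! / |F|! from the formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                total += weight * (value(set(S) | {i}) - value(set(S)))
        phi.append(total)
    return phi


# Hypothetical toy risk score with an interaction term (not the project's model).
def toy_model(z):
    return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[2]


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phi])  # [0.65, 0.4, 0.15]
```

The values sum to `toy_model(x) - toy_model(baseline) = 1.2`, and the interaction term is split evenly between the two features involved. The enumeration costs on the order of $2^{|F|}$ model evaluations per patient, which is why specialised explainers are needed beyond a handful of features.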
Our top 5 discovered risk drivers, ranked by mean $|\phi_i|$:
| Feature | Mean absolute SHAP | Clinical interpretation |
|---------|-------------|------------------------|
| thalach (max heart rate) | 0.312 | Low HR → reduced cardiac reserve |
| ca (major vessels) | 0.287 | More vessels coloured by fluoroscopy → more advanced disease |
| cp (chest pain type) | 0.241 | Asymptomatic presentation paradoxically carries the highest risk |
| oldpeak (ST depression) | 0.198 | ECG marker of ischaemia |
| thal (thallium stress test result) | 0.165 | Reversible defect → known CVD marker |
How We Built It
Tech stack
Data → pandas, numpy, ucimlrepo
ML models → scikit-learn, XGBoost, LightGBM
Imbalance → imbalanced-learn (SMOTE)
Neural net → PyTorch (CVDNet MLP)
Explainability → SHAP (TreeExplainer + LinearExplainer)
Tuning → GridSearchCV (108 parameter combinations)
Frontend → Streamlit (Python) + React (hospital UI)
Reports → docx + pptxgenjs (Node.js)
Pipeline architecture
The pipeline was designed to be modular and dataset-agnostic:
Raw data
└─ load_cleveland() / load_framingham() / load_kaggle()
└─ EDA (correlation heatmap, class balance, boxplots)
└─ Preprocessing (NaN drop → LabelEncode → StandardScaler)
└─ SMOTE (training set only — no leakage)
└─ 5-fold StratifiedKFold CV
└─ Model comparison (LR / RF / GBM / XGB)
└─ GridSearchCV hyperparameter tuning
└─ Hold-out evaluation (AUC, F1, ROC)
└─ SHAP (global + per-patient)
└─ Streamlit / React deployment
A key design decision: SMOTE is applied strictly after the train/test split, preventing synthetic samples from leaking into evaluation and inflating metrics.
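To see why the order matters, here is a small standard-library sketch. It uses plain duplication of minority rows as a stand-in for SMOTE (an assumption made to keep the example dependency-free; SMOTE leaks interpolated neighbours rather than exact copies, but the mechanism is the same). The data and helper names are hypothetical.

```python
import random


def oversample(rows, rng):
    """Duplicate random minority rows until both classes have equal counts."""
    pos = [r for r in rows if r["label"] == 1]
    neg = [r for r in rows if r["label"] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra


def split(rows, rng, test_frac=0.2):
    """Shuffle a copy of the rows and return (train, test)."""
    rows = rows[:]
    rng.shuffle(rows)
    cut = int(len(rows) * test_frac)
    return rows[cut:], rows[:cut]


def leaked(train, test):
    """Count test rows whose source patient also appears in the training set."""
    train_ids = {r["id"] for r in train}
    return sum(r["id"] in train_ids for r in test)


rng = random.Random(0)
data = [{"id": i, "label": int(i < 15)} for i in range(100)]  # 15% positive

# Wrong order: oversample the whole dataset, then split.
train_bad, test_bad = split(oversample(data, rng), rng)

# Right order: split first, then oversample the training rows only.
train_raw, test_ok = split(data, rng)
train_ok = oversample(train_raw, rng)

print("leaked test rows (oversample first):", leaked(train_bad, test_bad))
print("leaked test rows (split first):", leaked(train_ok, test_ok))
```

The same reasoning applies inside cross-validation: resampling should happen within each training fold. One way to arrange that, if the project's stack is used, is to place the sampler in imbalanced-learn's `Pipeline`, which applies samplers only during fitting.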
Hyperparameter tuning
We ran a full grid search over XGBoost with:
$$\Theta = \{\texttt{n\_estimators},\ \texttt{max\_depth},\ \texttt{learning\_rate},\ \texttt{subsample},\ \texttt{colsample\_bytree}\}$$
$$\text{grid size} = 3 \times 3 \times 3 \times 2 \times 2 = 108 \text{ combinations}, \qquad 108 \times 5\text{-fold CV} = 540 \text{ model fits}$$
Best configuration: n_estimators=300, max_depth=5, learning_rate=0.1. A moderate tree depth paired with a moderate learning rate is a common outcome on small tabular datasets, where deeper trees tend to overfit.
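The grid arithmetic above can be checked with a few lines of Python. The candidate values below are hypothetical. Only the number of values per parameter (3, 3, 3, 2, 2) and the reported best setting come from this write-up.

```python
from itertools import product

# Hypothetical candidate values; the counts per parameter match the grid above.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.8, 1.0],
    "colsample_bytree": [0.8, 1.0],
}

# Every combination is one point in the Cartesian product of the value lists.
combos = list(product(*param_grid.values()))
n_folds = 5
n_fits = len(combos) * n_folds
print(len(combos), n_fits)  # 108 540
```

A dictionary of this shape is what scikit-learn's `GridSearchCV` accepts as `param_grid`. With the default `refit=True`, it performs one additional fit of the best configuration on the full training set after the search.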
Challenges We Faced
1. Class imbalance
The Framingham dataset has only 15.1% positive cases (10-year CHD). A naive classifier reaches about 85% accuracy by predicting "no disease" every time while missing every true case, which is catastrophically wrong for clinical use. SMOTE mitigated this, but required careful implementation to avoid data leakage.
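The accuracy trap is easy to reproduce. This sketch uses made-up labels with the same 15.1% positive rate (151 positives per 1,000 patients), not the actual Framingham rows.

```python
# Labels chosen to match the 15.1% positive rate quoted above.
y_true = [1] * 151 + [0] * 849
y_pred = [0] * len(y_true)  # always predict "no disease"

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)  # share of true cases the classifier catches
print(f"accuracy={accuracy:.3f} recall={recall:.3f}")  # accuracy=0.849 recall=0.000
```

Accuracy looks respectable while recall is zero, which is why the pipeline ranks models by AUC-ROC and reports F1 instead of accuracy.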
2. Interpretability vs. performance
Tree-based ensembles outperformed logistic regression by ~5% AUC, but are harder to explain. We resolved this tension using SHAP TreeExplainer, which provides exact Shapley values for tree models in $O(TLD^2)$ time (where $T$ = trees, $L$ = leaves, $D$ = depth) — fast enough for real-time per-patient explanations.
3. Clinical validity of findings
SHAP revealed that asymptomatic chest pain (cp = 0) is a stronger CVD predictor than typical angina. This is counterintuitive but clinically well-documented — silent ischaemia is a known high-risk presentation. Discovering this through data, and finding it confirmed in cardiology literature, was one of the most satisfying moments of the project.
4. Building for hospitals, not just hackathons
A Streamlit slider demo is fine for a proof-of-concept. But doctors need patient registries, ward-level dashboards, batch screening, and audit trails. We built a full React hospital frontend with:
- Patient registry with search and risk filtering
- Per-patient SHAP assessment with clinical note-taking
- Batch CSV upload for ward-level screening
- Population analytics and model performance monitoring
What We Learned
- Medical AI is only as useful as its explanations. A 0.93 AUC model that a cardiologist can't interpret will never be used. SHAP bridged this gap.
- Dataset choice matters enormously. Gradient-boosted trees achieved AUC 0.93 on Cleveland and 0.88 on Framingham. The populations and the prediction targets differ between these datasets, and those differences must be communicated to users.
- SMOTE is powerful but dangerous if misapplied. We learned this the hard way when early experiments showed suspiciously high validation scores from pre-split oversampling.
- The gap between an ML model and a deployable clinical tool is enormous. We now have deep respect for the engineering, regulatory, and human-factors work that goes into real clinical decision support systems.
What's Next for H.E.A.R.T
- Federated learning — train across hospital networks without sharing patient data
- MIMIC-IV integration — longitudinal ICU data for progression modelling
- Prospective validation — partner with a hospital to validate predictions against real outcomes
- LIME comparison — benchmark SHAP against LIME for clinical acceptance studies
- Mobile app — point-of-care risk scoring for rural and primary health centres
Acknowledgements
Built on the UCI Cleveland Heart Disease Dataset (Janosi et al., 1988), the Framingham Heart Study, and the open-source ecosystem of scikit-learn, XGBoost, SHAP, PyTorch, and Streamlit.
"The goal of medicine is not to prolong life — it is to improve the quality of the time we have. H.E.A.R.T is our contribution to that goal."
Team H.E.A.R.T · Hack4Health 2025 · Chennai Institute of Technology
Built With
- apis
- cvd
- data
- dataset
- datasets
- deployment
- framingham
- heart
- imbalanced-learn
- javascript
- kaggle
- latex
- lightgbm
- markdown
- matplotlib
- ml
- node.js
- numpy
- pandas
- pptxgenjs
- python
- python-docx
- pytorch
- react
- repository
- science
- scikit-learn
- seaborn
- shap
- smote
- streamlit
- study
- uci
- ucimlrepo
- visualisation
- xgboost