Inspiration

Every 33 seconds, someone in the United States dies from cardiovascular disease.
Despite decades of research, most clinical risk tools, like the Framingham Risk Score, are rooted in the Framingham Heart Study, a small, homogeneous cohort first enrolled in 1948, and rely on simple linear models.
We wanted to ask: what if we gave modern machine learning access to the same data a primary care doctor sees, and made it explain itself?

What it does

CardioSight predicts an individual's cardiovascular disease (CVD) risk from routine clinical biomarkers — blood pressure, cholesterol panel, HbA1c, BMI, smoking history — and stratifies them into four actionable tiers:

| Tier | Probability | Clinical Action |
|------|-------------|-----------------|
| 🟢 Low | < 10% | Annual monitoring |
| 🟡 Moderate | 10–20% | Lifestyle intervention |
| 🔴 High | 20–40% | Prompt cardiology referral |
| 🟣 Very High | > 40% | Immediate evaluation |
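
The tier mapping above can be sketched as a small function. The boundary handling at exactly 10%, 20%, and 40% is an assumption here, not something the writeup specifies:

```python
def risk_tier(p: float) -> str:
    """Map a predicted CVD probability to one of the four tiers.

    Cutoffs follow the table above; which tier owns an exact
    boundary value (e.g. p == 0.40) is an illustrative choice.
    """
    if p < 0.10:
        return "Low"
    if p < 0.20:
        return "Moderate"
    if p < 0.40:
        return "High"
    return "Very High"
```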

Every prediction comes with a SHAP explanation — a ranked list of the exact features driving that patient's risk score, so a clinician can understand why, not just what.

How we built it

Dataset: NHANES 2017-2018 (CDC) — a nationally representative survey of 9,254 Americans combining physical exams, lab results, and medical history. We merged 11 data modules and engineered 26 clinical features including Pulse Pressure, Cholesterol/HDL Ratio, Metabolic Syndrome Score, and a computed Framingham Risk Score.
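
Two of the engineered features are simple arithmetic over merged lab columns. A minimal sketch with hypothetical column names (the real pipeline joins 11 NHANES modules on the respondent ID and uses CDC variable codes such as BPXSY1):

```python
import pandas as pd

# Toy rows standing in for the merged NHANES table; column names are illustrative.
df = pd.DataFrame({
    "systolic_bp":  [120, 148],
    "diastolic_bp": [80, 92],
    "total_chol":   [190, 240],
    "hdl":          [55, 38],
})

# Two of the 26 engineered clinical features described above.
df["pulse_pressure"] = df["systolic_bp"] - df["diastolic_bp"]
df["chol_hdl_ratio"] = df["total_chol"] / df["hdl"]
```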

Pipeline:

  1. MICE imputation (Multiple Imputation by Chained Equations) for missing clinical values
  2. SMOTE oversampling to counter the class imbalance (11.3% positive cases)
  3. Optuna hyperparameter tuning — 150 trials across Random Forest, XGBoost, and LightGBM
  4. Stacking ensemble — Logistic Regression meta-learner on 5-fold out-of-fold predictions
  5. Threshold calibration — swept 0.10→0.70, maximised F1 on validation set (optimal: 0.11)
  6. SHAP TreeExplainer — global importance plots + per-patient waterfall explanations
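
Step 4, the stacking ensemble, can be sketched with scikit-learn alone. This swaps the tuned XGBoost/LightGBM base learners for stock sklearn estimators and omits the MICE, SMOTE, and Optuna stages, so it is a shape-of-the-pipeline sketch rather than the actual configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the NHANES feature matrix, with ~11% positives
# to mirror the class imbalance noted above.
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.89], random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=30, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(),   # meta-learner
    cv=5,                                   # 5-fold out-of-fold predictions
    stack_method="predict_proba",
)
stack.fit(X, y)
proba = stack.predict_proba(X)[:, 1]
```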

Results:

CV AUROC = 0.815 ± 0.009

| Model | AUROC | F1 (calibrated) |
|-------|-------|-----------------|
| Soft Ensemble | 0.8026 | 0.393 |
| Stacking | 0.7942 | 0.392 |
| Framingham Score (baseline) | ~0.73 | n/a |

Risk stratification validated — actual CVD rates rise monotonically: 3.7% → 16.9% → 30.1% → 31.5% across Low / Moderate / High / Very High tiers.
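
The monotonicity check above amounts to grouping observed outcomes by predicted tier and verifying the event rate is non-decreasing. A toy sketch with illustrative data, not the actual NHANES results:

```python
import pandas as pd

# Illustrative outcomes: observed CVD rate per predicted tier
# should be non-decreasing from Low to Very High.
obs = pd.DataFrame({
    "tier": ["Low", "Low", "Moderate", "Moderate",
             "High", "High", "Very High", "Very High"],
    "cvd":  [0, 0, 0, 1, 1, 1, 1, 1],
})
order = ["Low", "Moderate", "High", "Very High"]
rates = obs.groupby("tier")["cvd"].mean().reindex(order)
monotone = bool(rates.is_monotonic_increasing)
```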

Website built and deployed using Coder, displaying all results, plots, and an interactive patient risk calculator — no backend required.

Challenges we ran into

  • CDC restructured their data portal mid-build — the NHANES URL pattern changed, causing all downloads to silently return HTML 404 pages instead of XPT files. We debugged the file headers byte-by-byte and found the correct new URL structure.
  • Self-reported CVD labels introduce irreducible recall bias: patients forget or under-report past events. Our Optuna CV AUROC hit 0.98 on the training folds but plateaued at 0.80 on the held-out test set, a gap that points to label noise rather than overfitting.
  • Threshold misconception: the default 0.5 decision threshold is nearly useless on data where only 11% of cases are positive. Calibrating the threshold to 0.11 raised F1 by +0.19 absolute, the single biggest gain in the entire pipeline.
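
The threshold sweep from the last bullet can be sketched as a loop over candidate cutoffs, with F1 computed by hand to keep the example dependency-light. The function name and step size are illustrative:

```python
import numpy as np

def best_f1_threshold(y_true, y_prob, lo=0.10, hi=0.70, step=0.01):
    """Sweep decision thresholds and return the (threshold, F1) pair
    maximising F1 on the given labels and predicted probabilities."""
    best_t, best_f1 = lo, 0.0
    for t in np.arange(lo, hi + 1e-9, step):
        y_pred = (y_prob >= t).astype(int)
        tp = int(((y_pred == 1) & (y_true == 1)).sum())
        fp = int(((y_pred == 1) & (y_true == 0)).sum())
        fn = int(((y_pred == 0) & (y_true == 1)).sum())
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```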

Accomplishments that we're proud of

  • Built a fully reproducible pipeline that downloads real CDC data, trains 6 models, tunes hyperparameters, and generates a 5-page PDF report — in a single script.
  • Achieved AUROC 0.815 on a nationally representative real-world cohort, meaningfully outperforming the 70-year-old Framingham baseline.
  • Every prediction is explainable — not a black box.
  • A live interactive risk calculator that estimates your CVD risk from 9 inputs in under a second, with factor-level explanations.

What we learned

  • Clinical ML lives and dies by target quality — self-reported labels are the hardest ceiling to break through, more than model architecture or hyperparameters.
  • Threshold calibration matters more than model choice in imbalanced medical datasets.
  • SHAP explanations are not just nice-to-have — they are the difference between a tool clinicians trust and one they ignore.
  • Feature engineering with domain knowledge (Metabolic Syndrome Score, Trig/HDL ratio, BP Stage) meaningfully outperforms throwing raw variables at a model.

What's next for CardioSight

  • Validate on MIMIC-IV (adjudicated EHR outcomes, no recall bias) to push past 0.85 AUROC.
  • Add longitudinal risk trajectories — not just current risk, but projected 10-year curves.
  • Integrate NHANES 2019-2023 cycles once CDC migrates them to the new CDN.
  • Build a clinician-facing dashboard with patient cohort views and population-level insights.
