Inspiration

Every 33 seconds, someone in the United States dies from cardiovascular disease.
Despite decades of research, most clinical risk tools, like the Framingham Risk Score, are rooted in the Framingham Heart Study, a small, homogeneous cohort first enrolled in 1948, and rely on simple linear models.
We wanted to ask: what if we gave modern machine learning access to the same data a primary care doctor sees, and made it explain itself?

What it does

CardioSight predicts an individual's cardiovascular disease (CVD) risk from routine clinical biomarkers — blood pressure, cholesterol panel, HbA1c, BMI, smoking history — and stratifies them into four actionable tiers:

| Tier | Probability | Clinical Action |
|------|-------------|-----------------|
| 🟢 Low | < 10% | Annual monitoring |
| 🟡 Moderate | 10–20% | Lifestyle intervention |
| 🔴 High | 20–40% | Prompt cardiology referral |
| 🟣 Very High | > 40% | Immediate evaluation |
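
The tier mapping above can be sketched as a small function. The boundary handling at exactly 10%, 20%, and 40% is an assumption here, not something the writeup specifies:

```python
def risk_tier(p: float) -> str:
    """Map a predicted CVD probability to one of the four tiers.

    Cutoffs follow the table above; which tier owns an exact
    boundary value (e.g. p == 0.40) is an illustrative choice.
    """
    if p < 0.10:
        return "Low"
    if p < 0.20:
        return "Moderate"
    if p < 0.40:
        return "High"
    return "Very High"
```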

Every prediction comes with a SHAP explanation — a ranked list of the exact features driving that patient's risk score, so a clinician can understand why, not just what.

How we built it

Dataset: NHANES 2017-2018 (CDC) — a nationally representative survey of 9,254 Americans combining physical exams, lab results, and medical history. We merged 11 data modules and engineered 26 clinical features including Pulse Pressure, Cholesterol/HDL Ratio, Metabolic Syndrome Score, and a computed Framingham Risk Score.
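
Two of the engineered features are simple arithmetic over merged lab columns. A minimal sketch with hypothetical column names (the real pipeline joins 11 NHANES modules on the respondent ID and uses CDC variable codes such as BPXSY1):

```python
import pandas as pd

# Toy rows standing in for the merged NHANES table; column names are illustrative.
df = pd.DataFrame({
    "systolic_bp":  [120, 148],
    "diastolic_bp": [80, 92],
    "total_chol":   [190, 240],
    "hdl":          [55, 38],
})

# Two of the 26 engineered clinical features described above.
df["pulse_pressure"] = df["systolic_bp"] - df["diastolic_bp"]
df["chol_hdl_ratio"] = df["total_chol"] / df["hdl"]
```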

Pipeline:

  1. MICE imputation (Multiple Imputation by Chained Equations) for missing clinical values
  2. SMOTE oversampling to counter the class imbalance (11.3% positive cases)
  3. Optuna hyperparameter tuning — 150 trials across Random Forest, XGBoost, and LightGBM
  4. Stacking ensemble — Logistic Regression meta-learner on 5-fold out-of-fold predictions
  5. Threshold calibration — swept 0.10→0.70, maximised F1 on validation set (optimal: 0.11)
  6. SHAP TreeExplainer — global importance plots + per-patient waterfall explanations
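
Step 4, the stacking ensemble, can be sketched with scikit-learn alone. This swaps the tuned XGBoost/LightGBM base learners for stock sklearn estimators and omits the MICE, SMOTE, and Optuna stages, so it is a shape-of-the-pipeline sketch rather than the actual configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the NHANES feature matrix, with ~11% positives
# to mirror the class imbalance noted above.
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.89], random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=30, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(),   # meta-learner
    cv=5,                                   # 5-fold out-of-fold predictions
    stack_method="predict_proba",
)
stack.fit(X, y)
proba = stack.predict_proba(X)[:, 1]
```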

Results:

CV AUROC = 0.815 ± 0.009

| Model | AUROC | F1 (calibrated) |
|-------|-------|-----------------|
| Soft Ensemble | 0.8026 | 0.393 |
| Stacking | 0.7942 | 0.392 |
| Framingham Score (baseline) | ~0.73 | n/a |

Risk stratification validated — actual CVD rates rise monotonically: 3.7% → 16.9% → 30.1% → 31.5% across Low / Moderate / High / Very High tiers.
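
The monotonicity check above amounts to grouping observed outcomes by predicted tier and verifying the event rate is non-decreasing. A toy sketch with illustrative data, not the actual NHANES results:

```python
import pandas as pd

# Illustrative outcomes: observed CVD rate per predicted tier
# should be non-decreasing from Low to Very High.
obs = pd.DataFrame({
    "tier": ["Low", "Low", "Moderate", "Moderate",
             "High", "High", "Very High", "Very High"],
    "cvd":  [0, 0, 0, 1, 1, 1, 1, 1],
})
order = ["Low", "Moderate", "High", "Very High"]
rates = obs.groupby("tier")["cvd"].mean().reindex(order)
monotone = bool(rates.is_monotonic_increasing)
```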

Website built and deployed using Coder, displaying all results, plots, and an interactive patient risk calculator — no backend required.

Challenges we ran into

  • CDC restructured their data portal mid-build — the NHANES URL pattern changed, causing all downloads to silently return HTML 404 pages instead of XPT files. We debugged the file headers byte-by-byte and found the correct new URL structure.
  • Self-reported CVD labels introduce irreducible recall bias: patients forget or under-report past events. Our Optuna CV AUROC hit 0.98 on the training folds but plateaued at 0.80 on the held-out test set, a gap that points to label noise rather than overfitting.
  • Threshold misconception: the default 0.5 decision threshold is nearly useless on data where only 11% of cases are positive. Calibrating the threshold to 0.11 raised F1 by +0.19 absolute, the single biggest gain in the entire pipeline.
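
The threshold sweep from the last bullet can be sketched as a loop over candidate cutoffs, with F1 computed by hand to keep the example dependency-light. The function name and step size are illustrative:

```python
import numpy as np

def best_f1_threshold(y_true, y_prob, lo=0.10, hi=0.70, step=0.01):
    """Sweep decision thresholds and return the (threshold, F1) pair
    maximising F1 on the given labels and predicted probabilities."""
    best_t, best_f1 = lo, 0.0
    for t in np.arange(lo, hi + 1e-9, step):
        y_pred = (y_prob >= t).astype(int)
        tp = int(((y_pred == 1) & (y_true == 1)).sum())
        fp = int(((y_pred == 1) & (y_true == 0)).sum())
        fn = int(((y_pred == 0) & (y_true == 1)).sum())
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```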

Accomplishments that we're proud of

  • Built a fully reproducible pipeline that downloads real CDC data, trains 6 models, tunes hyperparameters, and generates a 5-page PDF report — in a single script.
  • Achieved AUROC 0.815 on a nationally representative real-world cohort, meaningfully outperforming the 70-year-old Framingham baseline.
  • Every prediction is explainable — not a black box.
  • A live interactive risk calculator that estimates your CVD risk from 9 inputs in under a second, with factor-level explanations.

What we learned

  • Clinical ML lives and dies by target quality — self-reported labels are the hardest ceiling to break through, more than model architecture or hyperparameters.
  • Threshold calibration matters more than model choice in imbalanced medical datasets.
  • SHAP explanations are not just nice-to-have — they are the difference between a tool clinicians trust and one they ignore.
  • Feature engineering with domain knowledge (Metabolic Syndrome Score, Trig/HDL ratio, BP Stage) meaningfully outperforms throwing raw variables at a model.

What's next for CardioSight

  • Validate on MIMIC-IV (adjudicated EHR outcomes, no recall bias) to push past 0.85 AUROC.
  • Add longitudinal risk trajectories — not just current risk, but projected 10-year curves.
  • Integrate NHANES 2019-2023 cycles once CDC migrates them to the new CDN.
  • Build a clinician-facing dashboard with patient cohort views and population-level insights.
