Cardiovascular Disease Risk Prediction Using Machine Learning

CARDIO RISK PREDICTION

Inspiration

Cardiovascular disease (CVD) is one of the leading causes of death worldwide. I wanted to build a simple, practical ML model that can flag CVD risk early using only basic clinical and lifestyle information.

What it does

This project predicts whether a person is likely to have cardiovascular disease (cardio = 1) or not (cardio = 0) using patient health attributes such as age, blood pressure, cholesterol, glucose, and lifestyle indicators.

How I built it

- Loaded the processed cardiovascular dataset (70,000 records)
- Cleaned the data by dropping non-informative index columns (Unnamed: 0, id)
- Built a reproducible preprocessing and training pipeline with scikit-learn
- Trained and compared two models:
  - Logistic Regression (interpretable baseline)
  - Random Forest (the better-performing of the two)
- Evaluated both models with ROC-AUC and accuracy
- Visualized feature importances to show which attributes the model relies on most