Inspiration

Cardiovascular disease is the leading cause of death worldwide, yet many cases can be prevented through early detection and better health monitoring. This inspired the development of CardioProgress AI, a machine learning system designed to analyze patient health data and identify patterns associated with cardiovascular disease risk.

The goal of this project is to explore how artificial intelligence and data science can support preventative healthcare by helping identify potential risk factors earlier. By combining predictive machine learning models with explainable AI techniques, the project aims to make data-driven health predictions both accurate and interpretable.


What it does

CardioProgress AI analyzes patient health metrics such as age, blood pressure, cholesterol levels, BMI, and lifestyle indicators to predict whether a patient may be at risk of cardiovascular disease.

The system trains several machine learning models and compares their performance to determine which algorithm produces the most accurate predictions. In addition to predicting disease risk, the project also uses explainable AI techniques to identify the most influential factors contributing to the model’s predictions.

This allows the system not only to generate predictions but also to provide insights into which health variables are most strongly associated with cardiovascular disease.
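
As a rough illustration of the comparison step, the sketch below scores three sets of predictions against the same true labels and picks the one with the highest accuracy. It uses only the Python standard library. The labels and predictions are made up for the example and are not results from this project, and the helper names (`confusion_matrix`, `accuracy`, `best_model`) are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 means disease present."""
    counts = Counter(zip(y_true, y_pred))
    return counts[(0, 0)], counts[(0, 1)], counts[(1, 0)], counts[(1, 1)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tp + tn) / len(y_true)


def best_model(y_true, predictions):
    """Return the name of the model whose predictions have the highest accuracy."""
    return max(predictions, key=lambda name: accuracy(y_true, predictions[name]))


# Illustrative data only: NOT output from the project's trained models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "logistic_regression": [1, 0, 0, 1, 0, 1, 1, 0],
    "random_forest":       [1, 0, 1, 1, 0, 1, 1, 0],
    "xgboost":             [1, 0, 1, 0, 0, 0, 1, 1],
}

for name, y_pred in predictions.items():
    print(name, accuracy(y_true, y_pred), confusion_matrix(y_true, y_pred))
print("best:", best_model(y_true, predictions))
```

Accuracy alone can hide an imbalance between false positives and false negatives, which is why the confusion matrix is printed alongside it.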


How we built it

The project was built using a full machine learning workflow:

1. Data preprocessing
The dataset was cleaned by removing unnecessary columns, converting age from days to years, and deriving a Body Mass Index (BMI) feature from height and weight (weight in kilograms divided by the square of height in meters).

2. Exploratory data analysis
Visualizations such as age distribution charts, cardiovascular disease frequency plots, and correlation heatmaps were used to better understand relationships between variables.

3. Machine learning models
Three classification algorithms were implemented and compared:

  • Logistic Regression
  • Random Forest
  • XGBoost

4. Model evaluation
The models were evaluated using accuracy scores, confusion matrices, and classification reports to determine the most effective prediction model.

5. Explainable AI
SHAP (SHapley Additive exPlanations) was used to analyze feature importance and interpret how different health metrics influence predictions.

6. Prediction function
A simple prediction function was created to demonstrate how the trained model can analyze new patient data and estimate cardiovascular disease risk.
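
A prediction function of the kind described in step 6 might look like the sketch below. It assumes the trained model exposes a scikit-learn style `predict_proba` method. The feature names, the 0.5 threshold, and the `StandInModel` class are all hypothetical. The stand-in exists only so the example runs on its own, and its scoring rule has no clinical meaning.

```python
FEATURE_ORDER = ("age_years", "systolic_bp", "bmi")  # hypothetical feature names


class StandInModel:
    """Placeholder for a trained classifier. NOT the project's model."""

    def predict_proba(self, rows):
        # Mimics the scikit-learn output shape: [[P(no disease), P(disease)], ...].
        # The rule below is arbitrary and only depends on systolic blood pressure.
        output = []
        for row in rows:
            systolic_bp = row[1]
            p = min(max((systolic_bp - 100) / 100, 0.0), 1.0)
            output.append([1.0 - p, p])
        return output


def predict_risk(model, patient, threshold=0.5):
    """Estimate risk for one patient given as a dict of feature name to value."""
    row = [patient[name] for name in FEATURE_ORDER]  # keep training column order
    probability = model.predict_proba([row])[0][1]
    return {"probability": probability, "at_risk": probability >= threshold}


patient = {"age_years": 55, "systolic_bp": 160, "bmi": 29.4}
result = predict_risk(StandInModel(), patient)
print(result)
```

Building the input row from a fixed `FEATURE_ORDER` guards against passing features in a different order than the one used during training.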


Challenges we ran into

One of the main challenges was ensuring that the dataset was properly cleaned and formatted before training the machine learning models. Small preprocessing steps such as converting age into years and creating derived features like BMI were important for improving model performance.
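
The two preprocessing steps mentioned above can be sketched in a few lines. This is a minimal, standard-library example, not the project's actual code. The field names (`id`, `age_days`, `height_cm`, `weight_kg`), the choice of 365.25 days per year, and the assumption that height is recorded in centimeters and weight in kilograms are all illustrative.

```python
DAYS_PER_YEAR = 365.25  # assumption: the project may have used 365 instead


def preprocess(record, drop=("id",)):
    """Return a cleaned copy of one patient record with age in years and BMI added."""
    cleaned = {key: value for key, value in record.items() if key not in drop}
    # Convert age from days to years and remove the original column.
    cleaned["age_years"] = cleaned.pop("age_days") / DAYS_PER_YEAR
    # BMI = weight (kg) / height (m) squared.
    height_m = cleaned["height_cm"] / 100
    cleaned["bmi"] = cleaned["weight_kg"] / height_m ** 2
    return cleaned


# Made-up record for illustration.
raw = {"id": 1, "age_days": 18263, "height_cm": 170, "weight_kg": 72.25}
row = preprocess(raw)
print(round(row["age_years"], 1), round(row["bmi"], 1))
```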

Another challenge involved interpreting the model’s predictions. While advanced algorithms like XGBoost can provide strong predictive results, understanding why the model makes certain predictions is essential in healthcare applications. Implementing SHAP explainability helped address this challenge by revealing which features contributed most strongly to predictions.
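
To give a sense of what a Shapley-based explanation computes, the sketch below calculates exact Shapley values for a tiny hand-written scoring function by averaging each feature's marginal contribution over every feature ordering. It does not use the SHAP library, which relies on faster model-specific or approximate algorithms, and `toy_risk_score`, its weights, and the baseline values are hypothetical stand-ins for a trained model and a reference patient.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings.

    Features that have not been "revealed" yet keep their baseline value.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk_score(x):
    # Hypothetical linear score. The weights are arbitrary, not learned.
    return 0.02 * x["age_years"] + 0.01 * x["systolic_bp"] + 0.03 * x["bmi"]


baseline = {"age_years": 50, "systolic_bp": 120, "bmi": 25}
patient = {"age_years": 60, "systolic_bp": 150, "bmi": 30}
contributions = shapley_values(toy_risk_score, patient, baseline)
print(contributions)
```

The contributions always sum to the difference between the patient's score and the baseline score, which is the property that makes this kind of attribution easy to read. Exact enumeration grows factorially with the number of features, so it is only practical for toy cases like this one.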


Accomplishments that we're proud of

We are proud of building a complete end-to-end machine learning pipeline that includes data preprocessing, exploratory analysis, model training, evaluation, and explainability.

Another accomplishment was successfully applying explainable AI techniques to better understand how the model evaluates cardiovascular disease risk. This makes the predictions more transparent and demonstrates how machine learning models can provide meaningful insights into health data.


What we learned

Through this project, we learned how to build and evaluate multiple machine learning models and compare their predictive performance. We also gained experience in feature engineering, exploratory data analysis, and implementing explainable AI methods.

Additionally, we learned the importance of transparency and interpretability when applying machine learning to healthcare-related problems, where understanding how predictions are made is just as important as the predictions themselves.


What's next for CardioProgress AI

Future improvements for CardioProgress AI could include:

  • Performing hyperparameter tuning to further improve model performance
  • Incorporating additional health and lifestyle data
  • Building an interactive user interface for real-time predictions
  • Deploying the model as a web application or API
  • Testing the model with larger and more diverse healthcare datasets

Ultimately, the goal is to continue developing the system so it can better support early detection and preventative healthcare strategies.
