Added images to show the workflow, as I forgot to record a step-by-step visual demo of the working application.

Inspiration

Improve health care using Machine Learning. #AIforgood #AIinHealthcare

What it does

The goal is to predict the presence or absence of heart disease in a patient. I got the data from the UCI Repository and used the Cleveland database. It contains 14 attributes: age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal, and num (the predicted attribute). The num attribute is integer valued from 0 (no presence) to 4. We concentrated on simply distinguishing presence (values 1, 2, 3, 4) from absence (value 0).
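The mapping from the five-valued num attribute to a binary label can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing code; the function name `to_binary_label` and the sample values are hypothetical.

```python
def to_binary_label(num):
    """Map the Cleveland `num` value (0-4) to 0 = absence, 1 = presence."""
    if num not in (0, 1, 2, 3, 4):
        raise ValueError(f"unexpected num value: {num!r}")
    return 0 if num == 0 else 1


# Hypothetical sample of raw `num` values, for illustration only.
raw_labels = [0, 2, 1, 0, 4, 3]
binary_labels = [to_binary_label(n) for n in raw_labels]
print(binary_labels)  # [0, 1, 1, 0, 1, 1]
```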

How we built it

We registered the dataset in Azure, created a compute cluster, and ran AutoML and HyperDrive runs. For the HyperDrive run, we tuned a logistic regression model over two hyperparameters: C (the inverse of regularization strength; smaller values cause stronger regularization) and max_iter (the maximum number of iterations to converge). We used RandomParameterSampling, with max_iter taking values from 100, 200, 300, 400 and C taking values from 0.001, 0.01, 0.1, 1, 10, 100, 1000, together with a Bandit Policy with evaluation_interval (the frequency for applying the policy) set to 2 and slack_factor (the ratio used to calculate the allowed distance from the best performing experiment run) set to 0.1.
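To make the sampling and early-termination ideas concrete, here is a plain-Python sketch that does not use the Azure ML SDK. It assumes the primary metric is maximized, in which case a bandit policy with a slack factor stops a run whose metric falls below best / (1 + slack_factor). The helper names `sample_params` and `bandit_should_stop` and the accuracy values are hypothetical, chosen only for illustration.

```python
import random

# Search space as described above.
SEARCH_SPACE = {
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "max_iter": [100, 200, 300, 400],
}


def sample_params(rng):
    """Random sampling: pick one value per hyperparameter, uniformly."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def bandit_should_stop(run_metric, best_metric, slack_factor=0.1):
    """Return True if a run falls outside the slack a bandit policy allows.

    Assumes the metric is maximized: the run is stopped when its metric
    is below best_metric / (1 + slack_factor).
    """
    return run_metric < best_metric / (1 + slack_factor)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
params = sample_params(rng)
print(params)

# Hypothetical accuracies: with best = 0.88 the cutoff is 0.88 / 1.1 = 0.80.
print(bandit_should_stop(0.78, best_metric=0.88))  # True
print(bandit_should_stop(0.85, best_metric=0.88))  # False
```

With evaluation_interval set to 2, a check like this would be applied at every second reporting interval instead of at every one. The full grid over this space has 7 x 4 = 28 combinations, which is what random sampling draws from.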

The AutoML run was configured with experiment_timeout_minutes (the time after which the experiment times out), model_explainability (so that the best model is explained), and compute_cluster (so that multiple runs can execute at a time). The task is a binary classification task, as we are trying to predict the presence or absence of heart disease. I selected accuracy as the primary metric because the dataset is balanced.

Deployed the best model using Azure Container Instance (ACI) and enabled Application Insights.

Challenges we ran into

What to do with NA values?
Ended up removing them
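One way to do this is to drop every row that contains a missing value, sketched below with the standard library only. This is an assumption about how the removal was done, not the project's actual code. The three sample rows are made up, and the set of missing-value markers ("", "?", "NA") is also an assumption.

```python
import csv
import io

# Hypothetical three-row excerpt in the 14-column layout; values are made up.
SAMPLE = """age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,num
63,1,1,145,233,1,2,150,0,2.3,3,0,6,0
67,1,4,160,286,0,2,108,1,1.5,2,?,3,2
41,0,2,130,204,0,2,172,0,1.4,1,0,3,0
"""

MISSING_MARKERS = {"", "?", "NA"}  # assumed spellings of a missing value


def drop_incomplete_rows(rows):
    """Keep only rows in which no field is a missing-value marker."""
    return [
        row
        for row in rows
        if not any((value or "").strip() in MISSING_MARKERS for value in row.values())
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
clean_rows = drop_incomplete_rows(rows)
print(len(rows), len(clean_rows))  # 3 2
```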

Which model to use? How to tune hyperparameters?
Used a LogisticRegression model with HyperDrive for hyperparameter tuning. Also used the AutoML service provided by Azure ML to check different models.

Accomplishments that we're proud of

The AutoML best run accuracy is 0.84870 and the HyperDrive best run accuracy is 0.88888888. Deployed the model with the best accuracy (the LogisticRegression model) using Azure Container Instance. Enabled Application Insights for the web service.

What we learned

Learnt about developing and deploying models with Azure and the services it provides, such as AutoML and Application Insights.

What's next for Heart Disease Prediction With Azure Machine Learning

The AutoML model can be improved by further exploring the AutoML config (for example, adding a custom FeaturizationConfig).

The logistic regression model can be improved further by exploring different sampling techniques and early termination policies. For sampling: grid sampling runs a grid search over the hyperparameter search space, and Bayesian sampling tries to intelligently pick the next sample of hyperparameters based on how the previous samples performed, such that the new sample improves the reported primary metric. For early termination: the median stopping policy is based on running averages of the primary metric across all runs, and the truncation selection policy cancels a given percentage of runs at each evaluation interval.
