I added images to show the workflow, as I forgot to record a step-by-step visual demo of the working application.
Improve health care using Machine Learning. #AIforgood #AIinHealthcare
What it does
The goal is to detect the presence or absence of heart disease in a patient. The data comes from the UCI Machine Learning Repository; I used the Cleveland database. It contains 14 attributes: age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal, and num (the predicted attribute). The num attribute is an integer from 0 (no presence) to 4. We concentrated on distinguishing presence (values 1, 2, 3, 4) from absence (value 0).
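The presence/absence framing amounts to collapsing num into a binary label. A minimal sketch in plain Python, assuming each row has already been parsed into 14 numeric values in the column order listed above; the helper names and the extra `target` field are hypothetical, not taken from the project:

```python
# The 14 Cleveland attributes, in the order listed above; `num` is the diagnosis.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def binarize_target(num: int) -> int:
    """Collapse the 0-4 `num` value to 0 (absence) or 1 (presence)."""
    if num not in (0, 1, 2, 3, 4):
        raise ValueError(f"unexpected num value: {num}")
    return 0 if num == 0 else 1


def to_record(values: list) -> dict:
    """Pair one parsed row with the column names and add a hypothetical binary `target` field."""
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    record = dict(zip(COLUMNS, values))
    record["target"] = binarize_target(int(record["num"]))
    return record


if __name__ == "__main__":
    # Illustrative row with made-up values, not a record from the dataset.
    row = [50.0, 1.0, 4.0, 130.0, 250.0, 0.0, 0.0, 150.0, 0.0, 1.0, 2.0, 0.0, 3.0, 2.0]
    print(to_record(row)["target"])  # 1
```

Because the label only separates 0 from non-zero, a patient with num = 2 and one with num = 4 both end up in the presence class.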
How we built it
We registered the dataset in Azure, created a compute cluster, and ran both AutoML and HyperDrive experiments. For hyperparameter tuning we used a logistic regression model with two hyperparameters: C (inverse of regularization strength; smaller values cause stronger regularization) and max_iter (maximum number of iterations taken to converge). We used RandomParameterSampling over max_iter (values 100, 200, 300, 400) and C (values 0.001, 0.01, 0.1, 1, 10, 100, 1000), together with a Bandit policy with evaluation_interval (the frequency for applying the policy) set to 2 and slack_factor (the ratio used to calculate the allowed distance from the best performing run) set to 0.1.
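To make the search concrete, here is a plain-Python sketch (not the Azure ML SDK) of what random sampling over this space and the Bandit cut-off do. It assumes accuracy is maximized and uses the Bandit rule described in the Azure ML documentation, where a run is cancelled once its metric falls below best / (1 + slack_factor); evaluation_interval, which only controls how often that check is applied, is left out, and the function names are hypothetical:

```python
import random

# Discrete search space described above.
SEARCH_SPACE = {
    "max_iter": [100, 200, 300, 400],
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
}


def random_sample(space: dict, n: int, seed: int = 0) -> list:
    """Random sampling: draw n configurations, choosing each value uniformly."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()} for _ in range(n)]


def bandit_keeps_run(run_metric: float, best_metric: float, slack_factor: float = 0.1) -> bool:
    """Bandit rule for a metric that is maximized (accuracy).

    A run survives if its metric is at least best / (1 + slack_factor);
    with slack_factor=0.1 that is roughly 91% of the best run so far.
    """
    return run_metric >= best_metric / (1 + slack_factor)


if __name__ == "__main__":
    for config in random_sample(SEARCH_SPACE, n=3):
        print(config)
    # Best run so far at 0.85 gives a cut-off of 0.85 / 1.1, about 0.773.
    print(bandit_keeps_run(0.80, best_metric=0.85))  # True
    print(bandit_keeps_run(0.70, best_metric=0.85))  # False
```

Random sampling does not guarantee every combination is tried, which is why it pairs well with an early termination policy that frees compute from clearly weaker runs.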
The AutoML run was configured with experiment_timeout_minutes (the time after which the experiment times out) and model_explainability (so that the best model is explained), and it ran on the compute cluster so that multiple runs could execute at a time. The task is binary classification, as we are trying to predict the presence or absence of heart disease. I selected accuracy as the primary metric because the dataset is balanced.
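The reasoning behind choosing accuracy can be checked directly: on a balanced dataset, a model that always predicts the majority class scores only about 0.5, while on a skewed dataset the same trivial model looks deceptively good. A small stdlib sketch, with hypothetical helper names and toy labels rather than the project's data:

```python
from collections import Counter


def class_shares(labels: list) -> dict:
    """Fraction of examples in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in sorted(counts.items())}


def accuracy(y_true: list, y_pred: list) -> float:
    """Share of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(labels: list) -> float:
    """Accuracy of a trivial model that always predicts the most common class."""
    majority = Counter(labels).most_common(1)[0][0]
    return accuracy(labels, [majority] * len(labels))


if __name__ == "__main__":
    balanced = [0, 1] * 50
    skewed = [0] * 90 + [1] * 10
    print(class_shares(balanced), majority_baseline_accuracy(balanced))  # {0: 0.5, 1: 0.5} 0.5
    print(class_shares(skewed), majority_baseline_accuracy(skewed))  # {0: 0.9, 1: 0.1} 0.9
```

On the skewed toy labels, 0.9 accuracy says nothing about detecting the minority class; with balanced classes that trap largely disappears, which is the justification for using accuracy here.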
We deployed the best model as a web service on Azure Container Instances (ACI) and enabled Application Insights for it.
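Once deployed, the ACI web service is called over HTTP with a JSON body. A client-side sketch using only the standard library; the `{"data": [...]}` payload shape is an assumption (it depends on the entry script used at deployment), the scoring URI is a placeholder, and the Bearer header only applies if key authentication was enabled on the service:

```python
import json
import urllib.request


def build_scoring_request(scoring_uri: str, rows: list, api_key: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying rows of feature values as JSON."""
    payload = json.dumps({"data": rows}).encode("utf-8")  # assumed payload shape
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(scoring_uri, data=payload, headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder URI; use the scoring URI reported for your own deployment.
    request = build_scoring_request(
        "http://example.com/score",
        [[50.0, 1.0, 4.0, 130.0, 250.0, 0.0, 0.0, 150.0, 0.0, 1.0, 2.0, 0.0, 3.0]],
    )
    # Sending it would be: urllib.request.urlopen(request, timeout=30).read()
    print(request.get_method(), request.get_header("Content-type"))  # POST application/json
```

The example row carries the 13 input features only; num is what the service predicts, so it is not sent.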
Challenges we ran into
What to do with NA values?
We ended up removing the rows that contained them.
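A minimal sketch of that cleaning step in plain Python, assuming the raw comma-separated Cleveland file, where missing entries are written as "?"; `drop_missing_rows` is a hypothetical helper, and with pandas the equivalent would be reading the file with `na_values="?"` and then calling `dropna()`:

```python
def drop_missing_rows(lines, missing_marker="?"):
    """Parse comma-separated lines, skipping any row that contains a missing value.

    Returns the kept rows as lists of floats plus the number of rows dropped.
    """
    kept, dropped = [], 0
    for line in lines:
        line = line.strip()
        if not line:  # ignore blank lines, e.g. a trailing newline
            continue
        fields = [field.strip() for field in line.split(",")]
        if any(field in (missing_marker, "") for field in fields):
            dropped += 1
            continue
        kept.append([float(field) for field in fields])
    return kept, dropped


if __name__ == "__main__":
    # Made-up sample lines in the file's layout, not real records.
    sample = [
        "55.0,1.0,3.0,140.0,240.0,0.0,0.0,160.0,0.0,1.2,2.0,0.0,3.0,0",
        "50.0,1.0,4.0,130.0,250.0,0.0,0.0,150.0,0.0,1.0,2.0,?,3.0,2",
        "",
    ]
    rows, dropped = drop_missing_rows(sample)
    print(len(rows), dropped)  # 1 1
```

Dropping rows is the simplest option when only a handful are affected; imputation would keep them at the cost of inventing values.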
Which model to use? How to tune hyperparameters?
We used a LogisticRegression model with HyperDrive for hyperparameter tuning. We also used the AutoML service provided by Azure ML to try out different models.
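Assuming the model is scikit-learn's LogisticRegression with its default L2 penalty (the write-up names only the class, so this is an assumption), the role of C is visible in the objective the solver minimizes, with labels y_i in {-1, 1}, weights w, and intercept c:

```latex
\min_{w,\,c}\; \frac{1}{2}\, w^\top w \;+\; C \sum_{i=1}^{n} \log\!\left(1 + \exp\!\left(-y_i \left(x_i^\top w + c\right)\right)\right)
```

A larger C weights the data-fit term more heavily relative to the penalty, so smaller C means stronger regularization; max_iter does not change the objective and only caps how many solver iterations are allowed to reach its minimum.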
Accomplishments that we're proud of
The best AutoML run reached an accuracy of 0.84870, and the best HyperDrive run reached 0.88888888. We deployed the model with the best accuracy (the LogisticRegression model) using Azure Container Instances and enabled Application Insights for the web service.
What we learned
We learned how to develop and deploy models with Azure and the services it provides, such as AutoML and Application Insights.
What's next for Heart Disease Prediction With Azure Machine Learning
The AutoML model can be improved by further exploring the AutoML configuration (for example, by adding a custom FeaturizationConfig).
The logistic regression model can be improved further by exploring other sampling techniques, such as grid sampling (an exhaustive search over every combination in the hyperparameter search space) and Bayesian sampling (which picks the next set of hyperparameters based on how previous samples performed, so that the new sample improves the reported primary metric). It could also use other early termination policies, such as the median stopping policy (based on running averages of the primary metric across all runs) and the truncation selection policy (which cancels a given percentage of the lowest-performing runs at each evaluation interval).
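The grid sampling and early termination ideas above can be sketched in plain Python. These are simplified illustrations, assuming a maximized metric such as accuracy, and not the Azure ML implementations (which also handle details like evaluation intervals and delayed evaluation); all names are hypothetical:

```python
import itertools
import statistics

# Same discrete search space as the HyperDrive run.
SEARCH_SPACE = {
    "max_iter": [100, 200, 300, 400],
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
}


def grid_sample(space: dict) -> list:
    """Grid sampling: every combination of the discrete hyperparameter values."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in itertools.product(*(space[n] for n in names))]


def median_stopping(histories: dict) -> list:
    """Stop runs whose latest metric is below the median of all runs' running averages."""
    averages = [statistics.mean(history) for history in histories.values()]
    median = statistics.median(averages)
    return sorted(run_id for run_id, history in histories.items() if history[-1] < median)


def truncation_selection(latest: dict, truncation_percentage: int) -> list:
    """Cancel the lowest-performing truncation_percentage% of runs (ties broken by run id)."""
    n_cancel = len(latest) * truncation_percentage // 100
    ranked = sorted(latest, key=lambda run_id: (latest[run_id], run_id))
    return sorted(ranked[:n_cancel])


if __name__ == "__main__":
    print(len(grid_sample(SEARCH_SPACE)))  # 28
    print(median_stopping({"a": [0.8, 0.9], "b": [0.6, 0.6], "c": [0.7, 0.8]}))  # ['b']
    print(truncation_selection({"a": 0.9, "b": 0.6, "c": 0.8, "d": 0.7}, 50))  # ['b', 'd']
```

With only 4 max_iter values and 7 C values, the full grid is 28 configurations, so an exhaustive search is affordable for this model.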