Semi-supervised arrhythmia classification using capsules

Team Members

Andrei Petrus (apetrus)
Jason Hooker (jhooker2)
Varun Kumar (vkumar24)

Introduction

Arrhythmia is a form of cardiovascular disease (the leading cause of death globally as of 2019) in which the rate or rhythm of the heartbeat is irregular. Prolonged arrhythmia can be fatal, so tools such as electrocardiograms (ECGs) are used to monitor heartbeats for abnormalities. ECGs are non-invasive and can now be recorded by commonly owned devices such as the Apple Watch, and they are most useful when the arrhythmias they capture are classified automatically. Several deep learning models have been built to address this problem, with the best achieving classification accuracies in the mid-to-high 90% range. We will build a semi-supervised 2-D convolutional neural network with capsules to classify ECG signals into 18 categories (16 arrhythmia classes, normal sinus rhythm, and noise), an architecture that, to our knowledge, has not been implemented before. Training a semi-supervised model will allow us to work with smaller labeled datasets, which are common with medical data because of privacy concerns and the labor required to label data manually. Transforming the one-dimensional ECG signals into two-dimensional images removes the need for noise filtering, a step that unavoidably discards some relevant data. Using capsules instead of pooling layers should allow us to reduce the depth of our network while maintaining or improving performance. We seek to improve upon previous semi-supervised models with this novel architecture.

Related Work

The idea of combining capsules with CNNs comes from Sabour et al. (2017), whose capsule networks were reported to outperform conventional CNN architectures on classification tasks [1]. Other researchers have also used CNNs to classify ECG signals, as machine learning for predicting heart disease has grown in popularity over the years. Our idea to apply capsule-based CNNs to arrhythmia classification builds on the work of Jun et al. [2], who used a 2-D CNN on the MIT-BIH Arrhythmia Database to classify eight types of ECG beats. Their approach of converting a 1-D ECG signal into a 2-D image before passing it to a CNN is distinctive: it lets the model extract relevant information from signals that are usually noisy and would otherwise require filtering algorithms during pre-processing. Using a 6-layer CNN, the researchers achieved a classification accuracy of up to 99%. A publicly available implementation of this architecture can be found at [3]. Our objective is to apply a capsule-based CNN to the 2-D arrhythmia data and achieve equivalent accuracy more efficiently.

References:
[1] Sabour, Sara, Nicholas Frosst, and Geoffrey E. Hinton. "Dynamic routing between capsules." Advances in neural information processing systems 30 (2017).
[2] Jun, Tae Joon, et al. "ECG arrhythmia classification using a 2-D convolutional neural network." arXiv preprint arXiv:1804.06812 (2018).
[3] https://github.com/daimenspace/ECG-arrhythmia-classification-using-a-2-D-convolutional-neural-network

Data

We chose the MIT-BIH Arrhythmia Database both because it spans a heterogeneous sample of patients and pathologies and because it is widely used in deep learning research on cardiology, which lets us easily benchmark our approach against more conventional networks. The ECGs in the database were drawn from a set of over 4000 ambulatory recordings obtained by Beth Israel Hospital from a mixed population of inpatients (about 60%) and outpatients (about 40%). Of the 48 records, 23 were chosen at random from this set as a control, while the remaining 25 were selected to include a variety of rare but clinically important phenomena that would not be well represented in a small random sample. Each record is slightly over 30 minutes long.

Each ECG signal will be split into non-overlapping segments of 3600 samples (10 seconds at 360 Hz), and each segment will be labeled with the rhythm that dominates its time window. There are 18 labels in total: 16 arrhythmia classes from the MIT-BIH database (AFib, AFL, APB, Bigeminy, Fusion, IVR, LBBBB, PR, PVC, RBBBB, SDHB, SVTA, Trigeminy, VFL, VT, WPW), one class for normal sinus rhythm (NSR), and one outlier class for noisy signals.
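The segmentation and dominant-label rule above can be sketched as follows. This is a minimal illustration rather than our final pipeline: it assumes the MIT-BIH rhythm annotations (which mark where each rhythm begins) have already been expanded into a per-sample label array, and the function name `segment_record` is hypothetical. Trailing samples that do not fill a complete 10-second window are dropped.

```python
import numpy as np
from collections import Counter

FS = 360               # MIT-BIH sampling rate in Hz
SEGMENT_LEN = 10 * FS  # 3600 samples = 10 seconds


def segment_record(signal, sample_labels, seg_len=SEGMENT_LEN):
    """Split a 1-D ECG signal into non-overlapping windows and attach the
    dominant (most frequent) per-sample label within each window.

    `sample_labels` is assumed to hold one rhythm label per sample
    (a hypothetical pre-processing output, not the raw annotation format).
    """
    signal = np.asarray(signal)
    n_segments = len(signal) // seg_len  # incomplete trailing window is dropped
    if n_segments == 0:
        return np.empty((0, seg_len)), []
    segments, labels = [], []
    for k in range(n_segments):
        start, stop = k * seg_len, (k + 1) * seg_len
        segments.append(signal[start:stop])
        # Dominant label = the label covering the most samples in this window
        labels.append(Counter(sample_labels[start:stop]).most_common(1)[0][0])
    return np.stack(segments), labels
```

For example, a 7400-sample record whose first 3000 samples are labeled NSR and the rest AFIB yields two segments labeled NSR and AFIB; the last 200 samples are discarded.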

The signal segments are then converted into spectrogram images and divided into training and testing sets.
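A spectrogram is the squared magnitude of a short-time Fourier transform (STFT). The sketch below computes one for a single segment using only NumPy so that each step is explicit. The window length (256 samples), overlap (128 samples), Hann window, and log scaling are illustrative assumptions rather than settings we have chosen, and `spectrogram` here is a hypothetical helper, not a library call.

```python
import numpy as np


def spectrogram(segment, fs=360, nperseg=256, noverlap=128):
    """Log-power STFT of a 1-D segment, returned as a 2-D (freq x time) image.

    Assumes len(segment) >= nperseg. Window/overlap values are illustrative.
    """
    segment = np.asarray(segment, dtype=float)
    step = nperseg - noverlap
    n_frames = 1 + (len(segment) - nperseg) // step
    window = np.hanning(nperseg)
    # Slice overlapping frames and taper each one with the Hann window
    frames = np.stack([segment[i * step: i * step + nperseg] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # shape (time, freq)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)       # bin centers in Hz
    # log1p compresses the dynamic range before the image is fed to a CNN
    image = np.log1p(power.T)                           # shape (freq, time)
    return freqs, image
```

With these settings a 3600-sample segment yields a 129 × 27 image: 129 frequency bins from 0 to 180 Hz (the Nyquist frequency at 360 Hz) and 27 time frames.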

Methodology

Our network architecture will consist of multiple 2-D convolution layers, with the pooling layers replaced by capsule layers. For pre-processing, we will convert each ECG signal into a 2-D ECG image. The dataset already provides labels for each signal, which will be used to minimize a cross-entropy loss during training. During data preparation, we will also apply class balancing so that all 18 classes we are classifying are equally represented. Capsule-based CNNs may be useful for ECG signal classification for the following reasons:

1. Capsules detect the orientation of features along with the features themselves, which may allow noisy data to be handled with minimal pre-processing. ECG data corrupted by equipment noise are commonplace, and current techniques require filtering these signals before they can be used for classification. Capsule-based CNNs can also achieve equivariance, which is difficult with standard CNN architectures because pooling discards precise positional information.
2. The dynamic connections between lower-level and higher-level capsules enable better feature extraction.
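The class balancing mentioned above could take several forms. One simple option, assumed here for illustration, is random oversampling: examples of the rarer classes are duplicated at random until every class matches the size of the largest one. The function `oversample_to_balance` is hypothetical.

```python
import numpy as np


def oversample_to_balance(X, y, seed=0):
    """Randomly duplicate examples of smaller classes until every class has
    as many examples as the largest one (random oversampling)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    indices = []
    for c in classes:
        members = np.flatnonzero(y == c)
        # Draw the shortfall with replacement from this class's own examples
        extra = rng.choice(members, size=target - len(members), replace=True)
        indices.append(np.concatenate([members, extra]))
    # Shuffle so that duplicated examples are not grouped together
    order = rng.permutation(np.concatenate(indices))
    return X[order], y[order]
```

Balancing of this kind should be applied only to the training split: duplicated segments that leak into the test set would inflate the evaluation metrics.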

One of the most challenging aspects of implementing a capsule-based architecture is passing information between capsule layers so that the most relevant features are extracted. This process, known as dynamic routing, is often the most complex part of the architecture. It has also been reported that these dynamic connections between capsules can sometimes reduce accuracy. As a fallback, we also intend to investigate semi-supervised training as outlined in [4], which may help us improve accuracy with a simpler model architecture if our objectives cannot be met with a capsule-based CNN alone.
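To make the routing step concrete, the sketch below implements the squash non-linearity and routing-by-agreement described by Sabour et al. [1] in NumPy, for a single example and with no training. It assumes the lower-level capsules' prediction vectors (`u_hat`) have already been computed by learned transformation matrices; the shapes in the example (32 input capsules, 18 output capsules of dimension 16, one per class) are illustrative assumptions. In this formulation, the length of each output vector can be read as the model's confidence that the corresponding class is present.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Capsule non-linearity: keeps a vector's direction, maps its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, n_iters=3):
    """Routing-by-agreement for one example.

    u_hat: predictions from lower capsules, shape (n_in, n_out, dim_out).
    Returns output capsules v (n_out, dim_out) and coupling coefficients c (n_in, n_out).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, start uniform
    for _ in range(n_iters):
        # Coupling coefficients: softmax of the logits over output capsules
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = np.einsum("ij,ijd->jd", c, u_hat)  # weighted sum of predictions
        v = squash(s)
        # Raise logits where a lower capsule's prediction agrees with the output
        # (the update after the final iteration is unused but harmless)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)
    return v, c


# Example: 32 input capsules routed to 18 class capsules of dimension 16
rng = np.random.default_rng(0)
v, c = dynamic_routing(rng.normal(size=(32, 18, 16)))
class_scores = np.linalg.norm(v, axis=1)  # one length per class, each in [0, 1)
```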

References:
[4] Zhang, Dedong. "Semi-supervised learning for electrocardiography signal classification." Diss. 2018.

Metrics

We will train our network on 80% of the data in each class and reserve the remaining 20% for evaluating test accuracy. Based on previous work, we plan to train for 20 epochs, and we expect the loss to plateau within at most 50 epochs. In testing, we will consider four evaluation metrics: Accuracy (Acc), Sensitivity (Se), Specificity (Sp), and Area Under the Curve (AUC).
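Holding out the same fraction of every class is a stratified split. A minimal NumPy sketch, with the hypothetical helper name `stratified_split` and a fixed random seed for reproducibility:

```python
import numpy as np


def stratified_split(y, train_frac=0.8, seed=0):
    """Return (train_idx, test_idx) so that `train_frac` of each class is in train."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    train_idx, test_idx = [], []
    for c in np.unique(y):
        # Shuffle this class's indices, then cut at the requested fraction
        members = rng.permutation(np.flatnonzero(y == c))
        n_train = int(round(train_frac * len(members)))
        train_idx.append(members[:n_train])
        test_idx.append(members[n_train:])
    return np.concatenate(train_idx), np.concatenate(test_idx)
```

Splitting by class in this way keeps rare arrhythmias present in both sets, which a purely random split of a small, imbalanced dataset does not guarantee.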

Accuracy is a broad measure of how often the model's predictions are correct: the ratio of the sum of true positives and true negatives to the total number of inputs.

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$

Sensitivity measures the model's ability to correctly identify signals that belong to a given arrhythmia class: the ratio of true positives to the sum of true positives and false negatives.

$$\mathrm{Se} = \frac{TP}{TP + FN}$$

Specificity measures the model's ability to avoid assigning signals from other classes to a given arrhythmia class: the ratio of true negatives to the sum of true negatives and false positives.

$$\mathrm{Sp} = \frac{TN}{TN + FP}$$
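With 18 classes, each of these metrics is computed per class by treating that class as positive and all the others as negative (one-vs-rest). The sketch below derives TP, FN, FP, and TN for every class from a confusion matrix and applies the Acc, Se, and Sp formulas above. The function name is hypothetical, and a class that never appears in the test set would cause a division by zero.

```python
import numpy as np


def one_vs_rest_metrics(cm):
    """Per-class accuracy, sensitivity and specificity from a K x K confusion matrix.

    cm[i, j] counts signals whose true class is i and predicted class is j.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp  # true class k, predicted as another class
    fp = cm.sum(axis=0) - tp  # predicted as k, true class is another class
    tn = total - tp - fn - fp
    acc = (tp + tn) / total
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return acc, se, sp


cm = np.array([[5, 1, 0],
               [2, 6, 2],
               [0, 0, 4]])
acc, se, sp = one_vs_rest_metrics(cm)
# se ≈ [0.833, 0.6, 1.0], sp ≈ [0.857, 0.9, 0.875]
```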

Finally, the Area Under the Receiver Operating Characteristic (ROC) curve measures how well the model separates the classes in a multi-class setting. The ROC curve plots the True Positive Rate (Sensitivity) against the False Positive Rate (1 - Specificity). The following guidelines for interpreting AUC values are adapted from Hosmer and Lemeshow:

Excellent discrimination: 0.9 – 1.0
Good discrimination: 0.8 – 0.9
Acceptable discrimination: 0.7 – 0.8
Poor discrimination: 0.6 – 0.7
No discrimination: 0.5 – 0.6
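AUC can be computed without plotting the ROC curve: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below uses that rank formulation for each class one-vs-rest, averages the per-class values (macro-averaging, an assumption on our part), and maps a value onto the bands listed above. All function names are hypothetical, and the pairwise comparison uses memory proportional to the number of positive-negative pairs, which suits a sketch but not very large test sets.

```python
import numpy as np


def binary_auc(scores, is_positive):
    """P(random positive outscores random negative); ties count as one half."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos, neg = scores[is_positive], scores[~is_positive]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def macro_ovr_auc(probs, y_true):
    """Mean one-vs-rest AUC over classes; probs has shape (n_samples, n_classes)."""
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true)
    return np.mean([binary_auc(probs[:, k], y_true == k)
                    for k in range(probs.shape[1])])


# Lower bound of each band from the guideline table
BANDS = [(0.9, "Excellent"), (0.8, "Good"), (0.7, "Acceptable"),
         (0.6, "Poor"), (0.5, "No")]


def interpret_auc(auc):
    for lower, label in BANDS:
        if auc >= lower:
            return f"{label} discrimination"
    return "Below the listed ranges"
```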

Our baseline metrics are the evaluation results obtained by GoogLeNet on the MIT-BIH Arrhythmia Database; ideally, we aim to improve on its sensitivity by at least 5-10%. Since capsule networks have been reported in previous literature to make more specific feature comparisons, our stretch goal is to exceed 80% sensitivity.

Ethics

Why is Deep Learning a good approach to this problem?

Given the type of data used and the nature of this problem, deep learning is a strong approach here. Since we will be converting the ECG signals into 2-D images, we are working on an image classification problem, a type of problem to which deep learning is known to be well suited. Rather than engineering the features ourselves, we can let a neural network learn them automatically. We must also typically consider an accuracy-interpretability tradeoff when choosing between classical machine learning and deep learning models: the complexity of neural networks often makes them more accurate, but at the cost of reduced interpretability compared with classical models. When classifying arrhythmias from ECG signals, we care most about making correct decisions, and we are less concerned with how the model reaches them than we would be if, say, we were using a model to determine a person's eligibility for a bank loan. For these reasons, deep learning is a suitable approach to this problem.

Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm?

The major stakeholders in this scenario are the people taking ECGs, the companies or organizations that would use our model to process ECG signals, and the medical providers of the ECG users. People who regularly take ECGs to monitor their heart rhythm need confidence in the product they use to record and process them. They must be assured that if potentially harmful abnormalities arise in their heart rhythm, these will be detected, properly classified, and reported back to them in a timely manner. Organizations deploying our model in their ECG products share the same priorities: customer safety and satisfaction are essential, so they want near certainty that no harmful arrhythmia goes undetected. Medical providers can best do their jobs when harmful arrhythmias are caught early rather than late or not at all, so they, too, would prioritize detecting such irregularities. Consequently, mistakenly classifying irregular heart rhythms as normal can be more dire than incorrectly classifying normal rhythms as abnormal. The model's ability to avoid these two kinds of mistakes is measured by sensitivity and specificity, respectively. Mistakes of the former kind could be fatal, while those of the latter may lead to unnecessary visits to medical providers. We would like to maximize sensitivity while keeping specificity as high as possible, so as not to strain the healthcare system unnecessarily or place financial burdens on ECG users.