Predicting Response to Cancer Immunotherapy Using Automatic & Unsupervised Feature Extraction From Chest CT Scans (AI in Medical Imaging Track, Challenge #1)
Inspiration
Most current AI algorithms in medical imaging depend on labeled data. Labeling is costly in both money and time. It often requires radiologists' expertise, and radiologists may be busy with clinical work. Especially when the aim is to predict the response to a treatment prior to its administration, labeling for research could put some patients at risk. All these factors may hold back advances in AI research that could translate into future improvements in clinical treatment and/or diagnosis.
Moreover, clinical image data are difficult to handle, with millions of voxels per acquisition. This makes it even harder for clinicians to extract relevant biomarkers from images.
Consequently, we decided to build a fully unsupervised model for automatic feature extraction from high-dimensional image data that can be integrated with non-imaging metadata. This model could ease the path towards personalized diagnosis or treatment, without the need for manual data labeling. The main application is chest CT of lung cancer patients treated with immunotherapy, for which few biomarkers are known.
What it does
Automated, fully unsupervised feature extraction from chest CT images using Variational AutoEncoders (VAEs), without any need for human labels. The extracted features can be combined with non-imaging metadata.
How we built it
We used an open-source dataset from Kaggle (https://www.kaggle.com/kmader/siim-medical-images), consisting of 100 CT slices from 65 lung cancer patients treated with immunotherapy, as a preliminary dataset. The CT slices in that dataset were preprocessed by:
- Resampling all slices to a common resolution of 1.2 mm and a common matrix size of 256x256 pixels
- Adjusting image contrast with Contrast-Limited Adaptive Histogram Equalization (CLAHE), using a window size of 4 pixels. This makes it easier to visualize lung regions that could be affected by lung tumors in our dataset.
- z-scoring all images (mean subtraction followed by division by the standard deviation)
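The z-scoring step above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the statistics are computed per slice (the write-up does not say whether they were per slice or over the whole dataset), the function name `zscore` and the small `eps` guard are hypothetical, and the input is synthetic noise rather than a real CT slice. Resampling and CLAHE are not shown.

```python
import numpy as np


def zscore(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Subtract the slice mean and divide by the slice standard deviation.

    eps is an assumed guard against division by zero on constant images.
    """
    image = image.astype(np.float64)
    return (image - image.mean()) / (image.std() + eps)


# Synthetic stand-in for a preprocessed 256x256 slice (not real data).
rng = np.random.default_rng(0)
slice_2d = rng.normal(loc=-600.0, scale=300.0, size=(256, 256))
normalized = zscore(slice_2d)
```

After this step each slice has approximately zero mean and unit standard deviation, which keeps input intensities on a comparable scale across patients.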
All images were additionally augmented to reduce overfitting on the preliminary dataset. The random transformations applied during augmentation were elastic deformations, flips, rotations from -60 to +60 degrees, and contrast and brightness changes from -2% to +2%.
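A rough sketch of how such augmentation parameters might be sampled and partly applied is given below. Several things here are assumptions: the function names are hypothetical, the 50% flip probability is not stated in the write-up, and the ±2% changes are interpreted as a multiplicative contrast factor around the mean and a brightness shift relative to the intensity range. The rotation angle is only sampled, and elastic deformation is omitted, since both would normally be delegated to an image-processing library.

```python
import numpy as np


def sample_params(rng: np.random.Generator) -> dict:
    """Draw one random set of augmentation parameters."""
    return {
        "flip_lr": bool(rng.random() < 0.5),           # assumed 50% probability
        "flip_ud": bool(rng.random() < 0.5),           # assumed 50% probability
        "angle_deg": float(rng.uniform(-60.0, 60.0)),  # sampled only, not applied here
        "contrast": float(rng.uniform(0.98, 1.02)),    # -2% to +2%
        "brightness": float(rng.uniform(-0.02, 0.02)), # -2% to +2%
    }


def apply_flips_and_intensity(image: np.ndarray, p: dict) -> np.ndarray:
    """Apply the flips and the contrast/brightness change to one 2D slice."""
    out = image.astype(np.float64)
    if p["flip_lr"]:
        out = np.fliplr(out)
    if p["flip_ud"]:
        out = np.flipud(out)
    mean = out.mean()
    out = (out - mean) * p["contrast"] + mean                  # scale around the mean
    out = out + p["brightness"] * (image.max() - image.min())  # shift by a fraction of the range
    return out


rng = np.random.default_rng(0)
params = sample_params(rng)
augmented = apply_flips_and_intensity(rng.normal(size=(256, 256)), params)
```

Sampling a fresh parameter set for every slice and every epoch means the network rarely sees exactly the same input twice, which is the usual motivation for augmenting a small dataset.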
The model chosen for automatic feature extraction from these data is a convolutional Variational AutoEncoder (VAE). It consists of an encoder that filters and downsamples the data into a low-dimensional representation called the "latent space", and a decoder that learns to reconstruct the original data from that latent space by upsampling and filtering. Non-imaging metadata can be included in the latent space to improve the reconstruction provided by the decoder. The loss function driving the learning process consists of a Mean-Squared Error (MSE) reconstruction term, together with a regularizer, the Kullback-Leibler divergence term, that pushes the distribution of the latent space towards a normal distribution.
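The loss described above can be illustrated with a small NumPy sketch. It assumes the common VAE setup in which the encoder outputs a mean and a log-variance per latent dimension and the prior is a standard normal, so the Kullback-Leibler term has a closed form. The function names and the `beta` weight between the two terms are hypothetical; the write-up does not state how the terms were weighted.

```python
import numpy as np


def reparameterize(mu: np.ndarray, log_var: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_divergence(mu: np.ndarray, log_var: np.ndarray) -> float:
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dims, averaged over the batch."""
    per_sample = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)
    return float(per_sample.mean())


def vae_loss(x: np.ndarray, x_hat: np.ndarray, mu: np.ndarray, log_var: np.ndarray, beta: float = 1.0) -> float:
    """MSE reconstruction term plus beta times the KL regularizer (beta is an assumption)."""
    mse = float(np.mean((x - x_hat) ** 2))
    return mse + beta * kl_divergence(mu, log_var)


# Toy batch: 2 images of 256x256 pixels, latent space of size 4.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 256, 256))
mu = np.zeros((2, 4))
log_var = np.zeros((2, 4))
z = reparameterize(mu, log_var, rng)
loss = vae_loss(x, x, mu, log_var)  # perfect reconstruction and a standard-normal posterior
```

The KL term is zero only when the encoder outputs exactly a standard normal for every input, so during training it competes with the MSE term, which rewards latent codes that carry image-specific information.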
The basic architecture of the VAE contains five symmetrical layers for downsampling and upsampling (four convolutional and one linear), with either an increasing or a decreasing number of hidden units. The learnable kernels have a size of five. Regularization approaches such as batch normalization and L2 penalties of 1e-5 were also applied. We also studied the ideal size of the latent space and whether to include external metadata in it. Transfer learning with a VGG-11 encoder pre-trained on ImageNet was attempted as well. Data were randomly split into 85% for training and 15% for testing.
The configuration that minimized the combined MSE and Kullback-Leibler loss, with the closest reconstructions and the best-tuned latent space, was a VAE with four layers of 64/32/16/8 hidden units, a latent space of size four, and neither transfer learning nor external metadata in the latent space.
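The random 85%/15% split mentioned above could look like the sketch below. It is an assumption that the split was made over individual slices; the write-up does not say whether slices or patients were the unit. The function name `split_indices` and the fixed seed are hypothetical.

```python
import random


def split_indices(n_items: int, train_fraction: float = 0.85, seed: int = 0):
    """Shuffle item indices and cut them into train and test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


# 100 slices, as in the preliminary dataset.
train_idx, test_idx = split_indices(100)
```

Because the dataset has more slices than patients, a slice-level split like this one can place slices from the same patient in both sets; splitting over patient identifiers instead would avoid that overlap.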
Challenges we ran into
All the VAE configurations we tested converged quickly to a stable loss during training, regardless of the number of epochs. All CT slices in the test set were reconstructed into a similar image showing something resembling the lungs and the heart. This might be caused by the small size of the preliminary dataset. We hypothesize that deeper models with more layers, trained on larger datasets, could alleviate this problem and yield a more suitable latent space and better reconstructed images.
Accomplishments that we're proud of
We are proud of having worked with unsupervised learning for medical image data, a field where much less progress has been made than in supervised learning research. Even though our results on lung CT scans are modest, this preliminary work could help in developing future unsupervised approaches that reduce the effort needed to produce new medical imaging models. This would facilitate the automatic extraction of low-dimensional features that could be combined with external metadata to support personalized diagnosis or treatment.
What we learned
Even though unsupervised approaches seem more challenging to apply to clinical data, there is a need to keep working on them, as they could simplify data processing by removing the need for human labeling, eventually helping to provide better care for the patient. Given the short time available, this was just a preliminary study of the potential of fully unsupervised architectures such as VAEs to extract relevant information in a clinical setting.
What's next for The AI Team
We would like to improve our model by exploring the inclusion of more layers, training on other, larger external datasets, and applying more aggressive augmentation, which could allow better reconstructions and more suitable latent spaces. It would also be interesting to work with 3D versions of the autoencoder, where depth information also plays an important role. This would increase the chances of the algorithm being applied in a clinical setting in the future.
