Cancer as Anomaly

Note: This project is a comparison study and is not intended for medical use.

Inspiration

To diagnose cancer, doctors must analyze suspicious tissue to determine whether it is malignant. State-of-the-art machine learning methods can help with this task by learning discriminative rules from data labeled by experts. However, the scarcity of malignant tissue data and our incomplete understanding of cancer limit what discriminative methods can achieve. For this reason, this project presents an anomaly detection approach to breast cancer detection.

What it does

In this project, I predict breast cancer from histopathology images using both a discriminative method (Azure Custom Vision) and an anomaly detection method (an architecture I developed myself). Breast cancer detection is offered as a web app for both approaches. The discriminative approach reaches over 90% accuracy, while the anomaly detection approach reaches 73%. Note that the discriminative approach is trained on both benign and malignant tissue images, whereas the anomaly detection approach is trained exclusively on benign data.

How I built it

My idea was to detect malignant breast tissue using only benign tissue data. The reason is that common discriminative approaches, such as CNNs (Convolutional Neural Networks) used for classification, are limited by the quantity of labeled data available for training, and cancer is still not well understood. Autoencoders, on the other hand, overcome these limitations because they can be trained on benign images alone. I requested the data used for development from the authors of this paper:

Spanhol, Fabio A., et al. "A dataset for breast cancer histopathological image classification." IEEE Transactions on Biomedical Engineering 63.7 (2015): 1455-1462.

For my first anomaly detection attempt, I tried PCA-based anomaly detection in Azure Machine Learning Studio, using the image pixel values as input columns. When that did not succeed, I decided to try an autoencoder neural network. I used Keras to build a model with convolutional, pooling, and upsampling layers. The main idea is to train the autoencoder to reconstruct benign images and then measure the reconstruction error. The image below shows how the autoencoder approach works.
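A minimal sketch of a convolutional autoencoder of this kind in Keras might look like the following. The function name `build_autoencoder`, the layer sizes, the 128x128 RGB input shape, and the mean-absolute-error loss are illustrative assumptions, not the configuration used in the project. The input height and width are assumed to be divisible by 4 so that two pooling steps and two upsampling steps restore the original size.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_autoencoder(input_shape=(128, 128, 3)):
    """Build a small convolutional autoencoder (illustrative sizes only)."""
    inputs = keras.Input(shape=input_shape)

    # Encoder: convolution + pooling shrink the image to a compact code.
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D(2, padding="same")(x)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    encoded = layers.MaxPooling2D(2, padding="same")(x)

    # Decoder: convolution + upsampling rebuild the image from the code.
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(encoded)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)

    # Sigmoid output assumes pixel values scaled to [0, 1].
    outputs = layers.Conv2D(
        input_shape[-1], 3, activation="sigmoid", padding="same"
    )(x)

    model = keras.Model(inputs, outputs)
    # The model is trained to reproduce its own input.
    model.compile(optimizer="adam", loss="mae")
    return model
```

Training such a model would pass the benign images as both input and target, for example `model.fit(benign_images, benign_images, epochs=..., batch_size=...)`, where `benign_images` is a hypothetical array of images scaled to [0, 1].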

The hypothesis is that the autoencoder will not be able to properly reconstruct a malignant image, so a threshold on the reconstruction error can be used for classification. For training, I used the Azure Machine Learning service, where I designed multiple screening experiments. These experiments covered training on grayscale images, color images, and images at different zoom levels. I tested multiple autoencoder architectures, including autoencoders for image completion and transfer learning approaches. The image below gives an idea of the effort invested in this development.
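The thresholding step in this hypothesis can be sketched as follows. This is only an illustration under stated assumptions: the function names `fit_threshold` and `classify` are hypothetical, and choosing the 95th percentile of benign validation errors as the threshold is one possible rule, not necessarily the one used in the project.

```python
from statistics import quantiles


def fit_threshold(benign_errors, percentile=95):
    """Return the error value below which `percentile`% of benign images fall.

    `benign_errors` holds one reconstruction error score per benign
    validation image. The percentile choice is an assumption.
    """
    # quantiles(..., n=100) returns the 99 percentile cut points.
    cuts = quantiles(benign_errors, n=100, method="inclusive")
    return cuts[percentile - 1]


def classify(error, threshold):
    """Flag an image as malignant when its error exceeds the threshold."""
    return "malignant" if error > threshold else "benign"
```

With this rule, roughly 5% of benign images would be flagged by construction, so the percentile trades false alarms against missed malignant samples.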

In parallel, I used Azure Custom Vision (part of Azure Cognitive Services) to build an image classification model. For this task, I uploaded a sample of the same images used in the autoencoder approach. Finally, I deployed both approaches behind an example web app built with Vue.js, where you can perform breast cancer anomaly detection (not intended for medical use). The image below shows how this application works.

Challenges I ran into

I faced multiple challenges, especially in deciding how to evaluate the reconstruction error of the trained autoencoders. Because the reconstruction error of an image is a distribution of values rather than a single number, I tested different properties of that distribution, such as the mean, kurtosis, skewness, and sum of absolute errors. The best-performing property varied with the architecture, but in the end the sum of absolute errors performed best. Another challenge was making the most of the free credits available in the Azure Machine Learning service for model training, since I needed to test multiple configurations.
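The distribution properties mentioned above can be computed from the pixel-wise differences between an image and its reconstruction. The sketch below is an assumption about how such features might be derived, not the project's actual code: the function name `error_features` is hypothetical, images are represented as flat lists of pixel values for simplicity, and the skewness and excess kurtosis use the population (biased) formulas.

```python
import math


def error_features(original, reconstructed):
    """Summarize the pixel-wise reconstruction error of one image.

    Both arguments are flat sequences of pixel values of equal length.
    """
    diffs = [o - r for o, r in zip(original, reconstructed)]
    n = len(diffs)
    mean = sum(diffs) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)

    if std == 0:
        # All differences are identical: no asymmetry or tails to measure.
        skewness = 0.0
        kurtosis = 0.0
    else:
        skewness = sum(((d - mean) / std) ** 3 for d in diffs) / n
        # Excess kurtosis: 0 for a normal distribution.
        kurtosis = sum(((d - mean) / std) ** 4 for d in diffs) / n - 3.0

    return {
        "mean": mean,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "sum_abs_error": sum(abs(d) for d in diffs),
    }
```

Any one of these values can then serve as the single score per image that is compared against a threshold.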

Accomplishments that I'm proud of

I am proud of achieving good accuracy with the anomaly detection approach and of producing understandable reconstruction errors on malignant histopathology images. The performance of the best model achieved so far is shown below.

What I learned

I learned how easy it is to organize machine learning experiments in the Azure Machine Learning service; its interface makes it easy to follow a scientific method. I also learned a lot about autoencoder architecture design and the factors that influence how autoencoders learn.

What's next for Cancer as Anomaly

I would like to use Azure AutoML for hyperparameter optimization and to contact other universities or research centers about applying the Cancer as Anomaly approach to other types of cancer.

Updates

Horacio Canales started this project — Sep 09, 2019 12:15 AM EDT
