Lung-Disease Prediction

Changing the healthcare industry.
View the demo »

About The Project

Disease Prediction

An intelligent platform to predict diseases from chest x-rays.

This platform uses a machine learning model to detect diseases in chest x-ray images. Artificial Intelligence (AI) has emerged as one of the most disruptive forces behind digital transformation, and this applies to healthcare and medicine too, where AI is accelerating change and empowering physicians to achieve more.

Resources used in the project

  • National Institutes of Health (NIH) chest x-ray dataset, a publicly available, medically curated dataset.

  • State-of-the-art DenseNet architecture for image classification. DenseNet is an open-source deep learning architecture with implementations available in Keras (using TensorFlow as a back-end). We also explored the PyTorch version of DenseNet.

  • Class Activation Maps (CAMs) to understand and visualize what the model activates on.
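To sketch the CAM idea from the last bullet: the feature maps from the final convolutional block are combined using the classifier weights for one class, giving a heat map of the regions the model responded to. The following is a minimal, framework-free illustration of that computation (the function name and shapes are our own assumptions, not the project's code):

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """Classic CAM: weight the last conv feature maps by the
    dense-layer weights of one class, then ReLU and normalize.

    feature_maps:  (H, W, C) activations from the last conv block
    class_weights: (C,) weights linking GAP features to the class logit
    """
    cam = feature_maps @ class_weights   # (H, W) weighted sum over channels
    cam = np.maximum(cam, 0.0)           # keep only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting (H, W) map is typically upsampled to the input image size and overlaid on the x-ray as a heat map.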


Some facts:

  • Two-thirds of the world's population lacks access to trained radiologists, even when imaging equipment is readily available.
  • The lack of image interpretation by experts may lead to delayed diagnosis and could potentially increase morbidity or mortality rates for treatable diseases like pneumonia.
  • Approximately 2.5 million people die from lung diseases such as pneumonia every year.

Built With

With a lot of love 💖, motivation to help others 💪🏼 and Python 🐍, using:

Inspired by the CheXNet work done by Stanford University ML Group, we explore how we can build a deep learning model to predict diseases from chest x-ray images.


Data Exploration

We use a labelled dataset that was released by the NIH. The dataset is described in this paper, and you can download it from here. It includes over 30,805 unique patients and 112,120 frontal-view X-ray images with 14 different pathology labels (e.g. atelectasis, pneumonia, etc.) mined from radiology reports using NLP methods such as keyword search and semantic data integration. The NIH-released data also has 983 hand-labelled images covering 8 pathologies, which can be considered as strong labels.
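The NIH metadata file stores each image's pathology labels as a single pipe-separated string (with "No Finding" for healthy studies), so multi-label training first requires converting each string into a 14-way multi-hot vector. A minimal sketch, assuming that label format (the function name is ours):

```python
# The 14 pathology labels in the NIH ChestX-ray dataset.
PATHOLOGIES = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
    "Effusion", "Emphysema", "Fibrosis", "Hernia", "Infiltration",
    "Mass", "Nodule", "Pleural_Thickening", "Pneumonia", "Pneumothorax",
]

def to_multi_hot(finding_labels):
    """Turn one 'Finding Labels' string, e.g. 'Effusion|Pneumonia',
    into a 14-element 0/1 vector ('No Finding' maps to all zeros)."""
    present = set(finding_labels.split("|"))
    return [1 if p in present else 0 for p in PATHOLOGIES]
```

One vector per image then forms the target matrix for multi-label training.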

Model Training

Deep neural networks are notoriously hard to train well, especially when the neural networks get deeper. We use the DenseNet-121 architecture with pre-trained weights from ImageNet as initialization parameters.

DenseNet's dense connections let gradients flow more directly to earlier layers, which both alleviates the vanishing-gradient problem and enables feature-map reuse, making it possible to train very deep networks.
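A minimal Keras sketch of this setup: a DenseNet-121 backbone initialized from ImageNet weights, topped with a fresh 14-way sigmoid head. Because one x-ray can carry several findings at once, each pathology gets an independent sigmoid output with binary cross-entropy loss (the exact head, input size, and optimizer here are our assumptions, not necessarily the project's settings):

```python
import tensorflow as tf

def build_model(num_classes=14, weights="imagenet"):
    """DenseNet-121 backbone with a new multi-label classification head."""
    base = tf.keras.applications.DenseNet121(
        include_top=False,            # drop the 1000-way ImageNet head
        weights=weights,              # ImageNet weights as initialization
        input_shape=(224, 224, 3),
        pooling="avg",                # global average pooling after the backbone
    )
    # One independent sigmoid per pathology (multi-label, not softmax).
    outputs = tf.keras.layers.Dense(num_classes, activation="sigmoid")(base.output)
    model = tf.keras.Model(base.input, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```

In practice the backbone can first be trained with its weights frozen, then fine-tuned end-to-end at a lower learning rate.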

We used the AUROC score to measure per-disease performance, selecting the model checkpoint with the lowest validation loss.
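Since each pathology is an independent binary classification, AUROC is computed separately per label column. A small sketch of that evaluation using scikit-learn (the helper name is ours):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def per_disease_auroc(y_true, y_score, names):
    """One AUROC per pathology column.

    y_true:  (N, K) binary ground-truth matrix
    y_score: (N, K) predicted sigmoid probabilities
    names:   K pathology names, in column order
    """
    return {name: roc_auc_score(y_true[:, i], y_score[:, i])
            for i, name in enumerate(names)}
```

A score of 0.5 corresponds to random guessing, so values well above 0.5 indicate the model learned a useful signal for that pathology.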

| Disease      | AUC Score | Disease            | AUC Score |
|--------------|-----------|--------------------|-----------|
| Atelectasis  | 0.689804  | Effusion           | 0.769636  |
| Cardiomegaly | 0.699429  | Consolidation      | 0.725847  |
| Infiltration | 0.655084  | Edema              | 0.817075  |
| Mass         | 0.601279  | Emphysema          | 0.603675  |
| Nodule       | 0.571633  | Fibrosis           | 0.660121  |
| Pneumonia    | 0.634000  | Pleural_Thickening | 0.650140  |
| Pneumothorax | 0.677171  | Hernia             | 0.647572  |

What's next for Disease Prediction using X-RAY

  • Develop a phone application that can recognise the diseases.
  • Improve the user interface of the Angular web app.
  • Partner with doctors to build a real-world chest x-ray database.
  • Test the prototype with a radiologist.


Early diagnosis and treatment of pneumonia and other lung diseases can be challenging, especially in geographical locations with limited access to trained radiologists.

Database limitations

There are several limitations of the dataset that may limit its clinical applicability or its performance in a real-world setting. First, radiologists often interpret chest x-rays acquired in two projections, frontal and lateral, which aids in both disease classification and localization; the NIH dataset we used only provides frontal projections (PA and AP). Second, clinical information is often necessary for a radiologist to render a specific diagnosis, or at least a reasonable differential diagnosis, and that context is absent here.

Share this project: