Lung-Disease Prediction
Changing the healthcare industry.
View the demo »
About The Project
An intelligent platform to predict diseases from chest x-rays.
This platform uses a machine learning algorithm capable of tracking and detecting diseases. Artificial Intelligence (AI) has emerged as one of the most disruptive forces behind digital transformation and is revolutionizing the way we live and work. This applies to healthcare and medicine too, where AI is accelerating change and empowering physicians to achieve more.
Resources used in the project
National Institutes of Health (NIH) chest x-ray dataset. This dataset is publicly available and medically curated.
Technique
State-of-the-art DenseNet for image classification. DenseNet is a convolutional neural network architecture with open-source implementations available in Keras (using TensorFlow as a back end). We also explored the PyTorch version of DenseNet.
Class Activation Maps are used to understand model activation and visualize it.
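As a rough illustration of how a class activation map can be computed, here is a minimal NumPy sketch. It assumes the network ends with global average pooling followed by a single linear classifier (as DenseNet does), so the map for one class is the weighted sum of the last convolutional feature maps. The function name and the toy inputs are hypothetical and not taken from the project code.

```python
import numpy as np


def class_activation_map(feature_maps, class_weights):
    """Return a heatmap in [0, 1] for one class.

    feature_maps:  array of shape (K, H, W), activations of the last conv layer
    class_weights: array of shape (K,), classifier weights for the chosen class
    """
    # Weighted sum over the K channels gives an (H, W) map.
    cam = np.tensordot(class_weights, feature_maps, axes=1)
    # Min-max normalise so the map can be overlaid on the x-ray.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: two 2x2 feature maps, the first weighted twice as much.
features = np.array([[[1.0, 0.0], [0.0, 0.0]],
                     [[0.0, 0.0], [0.0, 1.0]]])
weights = np.array([2.0, 1.0])
heatmap = class_activation_map(features, weights)
```

In practice the heatmap would be upsampled to the input image size before being overlaid on the x-ray.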
Motivation
Some facts:
- Two-thirds of the world's population lacks access to trained radiologists, even when imaging equipment is readily available.
- The lack of image interpretation by experts may lead to delayed diagnosis and could potentially increase morbidity or mortality rates for treatable diseases like pneumonia.
- Approx. 2.5 million people die from lung diseases.
Built With
With a lot of love, motivation to help others, and Python, using:
- PyTorch
- Google Colab (with its wonderful GPUs)
- A real-time Flask and Dash integration (along with Dash Bootstrap Components)
- A real-time database, of course, from Firebase
- Vercel (hosting repository)
- Angular 10
Inspired by the CheXNet work done by Stanford University ML Group, we explore how we can build a deep learning model to predict diseases from chest x-ray images.
Usage
Data Exploration
We use a labelled dataset that was released by the NIH. The dataset is described in this paper, and you can download it from here. It includes 30,805 unique patients and 112,120 frontal-view X-ray images with 14 different pathology labels (e.g. atelectasis, pneumonia, etc.) mined from radiology reports using NLP methods such as keyword search and semantic data integration. The NIH-released data also has 983 hand-labelled images covering 8 pathologies, which can be considered strong labels.
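Because one image can carry several pathology labels, each image needs a multi-hot target vector for training. The sketch below assumes the labels arrive as a single pipe-separated string per image (for example `"Atelectasis|Effusion"`, with `"No Finding"` for a normal scan); that format and the helper name `encode_labels` are assumptions for illustration, so check them against the metadata file you download.

```python
# The 14 pathology names, as they appear in the results table of this project.
PATHOLOGIES = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Effusion",
    "Emphysema", "Fibrosis", "Hernia", "Infiltration", "Mass", "Nodule",
    "Pleural_Thickening", "Pneumonia", "Pneumothorax",
]


def encode_labels(finding_labels):
    """Turn a pipe-separated label string into a 14-element multi-hot list."""
    findings = set(finding_labels.split("|"))
    # Anything not in PATHOLOGIES (such as "No Finding") maps to all zeros.
    return [1 if name in findings else 0 for name in PATHOLOGIES]


target = encode_labels("Atelectasis|Effusion")
```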
Model Training
Deep neural networks are notoriously hard to train well, especially when the neural networks get deeper. We use the DenseNet-121 architecture with pre-trained weights from ImageNet as initialization parameters.
DenseNet feeds each layer the concatenated feature maps of all preceding layers in its dense block. This passes the gradient more efficiently, alleviates the vanishing-gradient problem, and enables feature-map reuse, which makes it possible to train very deep neural networks.
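To make the feature-reuse idea concrete, the following sketch tracks how the channel count grows through DenseNet-121. It assumes the standard configuration (dense blocks of 6, 12, 24 and 16 layers, growth rate 32, 64 initial features, and transitions that halve the channels); the function name is hypothetical. Every layer appends `growth_rate` new channels to what it received, so nothing computed earlier in a block is discarded.

```python
def densenet_channels(block_layers=(6, 12, 24, 16), growth_rate=32,
                      init_features=64, compression=0.5):
    """Return the number of channels at the output of each dense block."""
    channels = init_features
    per_block = []
    for index, n_layers in enumerate(block_layers):
        # Each layer concatenates growth_rate new feature maps onto its input.
        channels += n_layers * growth_rate
        per_block.append(channels)
        # A transition layer between blocks compresses the channel count.
        if index < len(block_layers) - 1:
            channels = int(channels * compression)
    return per_block


block_outputs = densenet_channels()
```

Under these assumptions the last block outputs 1024 channels, which is the feature size the final classifier layer sees after pooling.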
We selected the model with the lowest validation loss and used the AUROC score to measure its performance on each disease.
| Disease | AUC Score | Disease | AUC Score |
|---|---|---|---|
| Atelectasis | 0.689804 | Effusion | 0.769636 |
| Cardiomegaly | 0.699429 | Consolidation | 0.725847 |
| Infiltration | 0.655084 | Edema | 0.817075 |
| Mass | 0.601279 | Emphysema | 0.603675 |
| Nodule | 0.571633 | Fibrosis | 0.660121 |
| Pneumonia | 0.634000 | Pleural_Thickening | 0.650140 |
| Pneumothorax | 0.677171 | Hernia | 0.647572 |
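The evaluation described above can be sketched in plain Python. AUROC is the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one, with ties counted as half. The function names and the `history` records below are hypothetical, and the pairwise loop is quadratic, so a library routine is a better fit for the full test set.

```python
def auroc(labels, scores):
    """AUROC for one disease from binary labels and predicted scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def best_epoch(history):
    """Pick the training record with the lowest validation loss."""
    return min(history, key=lambda record: record["val_loss"])


# Made-up numbers, for illustration only.
score = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
chosen = best_epoch([
    {"epoch": 1, "val_loss": 0.21},
    {"epoch": 2, "val_loss": 0.17},
    {"epoch": 3, "val_loss": 0.19},
])
```

Running `auroc` once per label column gives a per-disease table like the one above.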
What's next for Disease Prediction using X-RAY
- Develop a phone application that can recognise the diseases.
- Improve the user interface of the Angular web app.
- Partner with doctors to build a real-world chest x-ray database.
- Test the prototype with a radiologist.
Challenges
Early diagnosis and treatment of pneumonia and other lung diseases can be challenging, especially in geographical locations with limited access to trained radiologists.
Database limitations
The dataset has several limitations that may reduce its clinical applicability or performance in a real-world setting. First, radiologists often interpret chest x-rays acquired in two projections, frontal and lateral, which aids in both disease classification and localization. The NIH dataset we used in this project only provides frontal projections (PA and AP). Second, clinical information is often necessary for a radiologist to render a specific diagnosis, or at least provide a reasonable differential diagnosis.


