An example CT scan from a patient who was diagnosed with lung cancer within one year of the scan.

Lung Cancer Visualization & Detection

This project tackles the Kaggle Data Science Bowl 2017 challenge.

Why do it?

Air quality is getting worse globally, and millions of people breathe excessive aerosols every day. Early diagnosis of lung cancer can help limit the damage caused by polluted air. Traditionally, diagnoses are made by experienced doctors who visually identify malignant lesions in CT scans of patients' lungs. This makes diagnosis slow, expensive, and inaccessible to many people.
Everybody likes something cool at a hackathon, but it is also fun to try something that may have a larger impact on people. Kaggle is a great way to learn for people without a computer science background (like myself). This project would have been even slower without the tutorials posted by Guido Zuidhof here and by Jonathan Mulholland and Aaron Sander here.

Where do the data come from?

CT scan images with diagnostic information are made available by the National Lung Screening Trial. My goal is to use these labelled images to train machine learning models to diagnose lung cancer.

What are the challenges?

The images are in a special format (DICOM), and the resolution differs from patient to patient. I have finished preprocessing these images, so that 3D arrays of consistent scale and size are produced and fed to 3D convolutional neural networks.
The training images are too big to fit in memory (~150 GB as numpy float arrays), so I used a Keras image data generator.
Training 3D convolutional neural networks is computationally expensive and did not finish before the end of McHacks.
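
To make the first challenge concrete, here is a minimal sketch of bringing scans with different voxel spacings to one consistent scale and size. It is not the project's actual preprocessing code. The function names, the 1 mm target spacing, the example spacings, and the target shape are all assumptions for illustration. It uses nearest-neighbour lookup to stay numpy-only; an interpolating resampler such as scipy.ndimage.zoom would probably give smoother results.

```python
import numpy as np

def resample_nearest(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a 3D volume to new_spacing (mm per voxel) by nearest-neighbour lookup."""
    spacing = np.asarray(spacing, dtype=float)
    new_spacing = np.asarray(new_spacing, dtype=float)
    # The physical extent along each axis stays the same; only the voxel count changes.
    new_shape = np.maximum(
        np.round(np.array(volume.shape) * spacing / new_spacing), 1
    ).astype(int)
    # For every output index, pick the source index that covers the same position.
    idx = [
        np.minimum(
            (np.arange(n) * new_spacing[ax] / spacing[ax]).astype(int),
            volume.shape[ax] - 1,
        )
        for ax, n in enumerate(new_shape)
    ]
    return volume[np.ix_(*idx)]

def pad_or_crop(volume, target_shape, fill_value=-1000):
    """Centre-crop or pad so the output has exactly target_shape.

    fill_value=-1000 is the Hounsfield value of air (an assumed padding choice).
    """
    out = np.full(target_shape, fill_value, dtype=volume.dtype)
    src, dst = [], []
    for size, target in zip(volume.shape, target_shape):
        n = min(size, target)          # number of voxels copied along this axis
        s0 = (size - n) // 2           # start of the centred crop in the source
        d0 = (target - n) // 2         # start of the centred region in the output
        src.append(slice(s0, s0 + n))
        dst.append(slice(d0, d0 + n))
    out[tuple(dst)] = volume[tuple(src)]
    return out

# Hypothetical scan: 40 slices, 2.5 mm apart, with 0.7 mm in-plane pixels.
scan = np.random.randint(-1000, 400, size=(40, 64, 64)).astype(np.int16)
resampled = resample_nearest(scan, spacing=(2.5, 0.7, 0.7))
fixed = pad_or_crop(resampled, (96, 48, 48))
print(resampled.shape, fixed.shape)
```

After these two steps every patient yields an array of the same shape, which is what a 3D convolutional network with a fixed input size needs.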

What's next?

Will train 3D conv nets on small batch flows;
Will use U-Net for segmentation to reduce the size of the data that enters the 3D convolutional nets;
Will try recurrent 2D conv nets.
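
As a rough illustration of the "small batch flows" idea, the sketch below streams preprocessed volumes from disk a few at a time, so the full ~150 GB never has to sit in memory. It is a plain Python generator written for this example, not the Keras image data generator mentioned above. The function name batch_generator, the one-.npy-file-per-patient layout, and the tiny random volumes are assumptions.

```python
import os
import tempfile
import numpy as np

def batch_generator(paths, labels, batch_size, shuffle=True, seed=0):
    """Yield (x, y) batches forever, loading only batch_size volumes at a time."""
    rng = np.random.default_rng(seed)
    order = np.arange(len(paths))
    while True:
        if shuffle:
            rng.shuffle(order)  # new patient order for every pass over the data
        for start in range(0, len(order), batch_size):
            chosen = order[start:start + batch_size]
            # Only the volumes of the current batch are read from disk.
            volumes = [np.load(paths[i]) for i in chosen]
            # Add a trailing channel axis: (batch, depth, height, width, 1).
            x = np.stack(volumes).astype(np.float32)[..., np.newaxis]
            y = np.asarray([labels[i] for i in chosen], dtype=np.float32)
            yield x, y

# Demo with five small random volumes standing in for preprocessed scans.
tmp_dir = tempfile.mkdtemp()
paths, labels = [], []
for i in range(5):
    path = os.path.join(tmp_dir, f"patient_{i}.npy")
    np.save(path, np.random.rand(8, 16, 16).astype(np.float32))
    paths.append(path)
    labels.append(i % 2)

gen = batch_generator(paths, labels, batch_size=2)
x, y = next(gen)
print(x.shape, y.shape)
```

With five patients and a batch size of 2, one pass over the data takes three batches, and the last one holds a single volume. A smaller batch size trades training speed for a lower peak memory footprint.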

Built With

keras
matplotlib
pydicom
python
scikit-learn
skimage
tensorflow

Updates

Qi Feng started this project — Jan 29, 2017 05:09 AM EST
