{ON} · {CO} · {BOT}

Oncology meets machine learning.

Inspiration

We were interested in creating a project that uses machine learning and computer vision.

Automating the analysis of CT scans to deliver fast, concise results can free up doctors' time and ultimately lower healthcare costs for patients.

What it does

oncoBot is a machine learning tool that uses computer vision to identify lung nodules in CT scans.

Using our web interface, doctors can submit a CT scan and get instant data on the locations of potential lung nodules and the probability that each one is cancerous.

How we built it

The project was built using OpenCV, Python, Flask, and C++.

We use cloud computing on a Linode server to run our machine learning algorithms.

Our training set comes from a detailed study done at Cornell containing roughly 250 CT scans of healthy and cancerous lungs. Each scan comes with detailed annotations, such as nodule size, nodule location, and whether the tumor is benign or malignant.

We run OpenCV's SIFT feature detection and k-means clustering on each of the Cornell CT scans and combine the results with each image's annotations to build our training set.
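A bag-of-visual-words pipeline is one common way to turn a variable number of SIFT descriptors per image into a fixed-length feature vector that a classifier can use. The sketch below assumes the descriptors have already been extracted, one array of shape (N, 128) per image; the function names, the vocabulary size `k`, and the hand-rolled NumPy k-means are illustrative assumptions, not oncoBot's actual code.

```python
import numpy as np


def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns a (k, d) array of cluster centers.
    Requires at least k rows in `data`."""
    rng = np.random.default_rng(seed)
    # Initialize with k randomly chosen descriptors.
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        # Squared distance from every descriptor to every center: shape (n, k).
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old center if a cluster empties out
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest 'visual word' and count them."""
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float64)
    total = hist.sum()
    # Normalize so images with many keypoints compare fairly with sparse ones.
    return hist / total if total else hist


def build_training_set(per_image_descriptors, labels, k=50):
    """Learn a shared vocabulary, then return (X, y, vocabulary),
    where row i of X is the normalized histogram for image i."""
    vocabulary = kmeans(np.vstack(per_image_descriptors), k)
    X = np.array([bow_histogram(d, vocabulary) for d in per_image_descriptors])
    y = np.asarray(labels)
    return X, y, vocabulary
```

In practice each descriptor array would come from something like `_, desc = sift.detectAndCompute(gray_slice, None)`, which returns `None` when no keypoints are found, so empty images need filtering first. OpenCV's own `cv2.kmeans` could replace the hand-written clustering.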

Our user login and image upload interface runs on a Flask server (also hosted on our Linode server), which lets doctors register, log in, and securely upload their images.
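The write-up does not describe how the user database stores credentials. As one hedged illustration, the sketch below uses only the Python standard library: SQLite for the user table and salted PBKDF2 hashing so plain-text passwords are never stored. The table layout, function names, and iteration count are assumptions; in a Flask app, `register` and `login` would be called from the corresponding routes.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 200_000  # assumed PBKDF2 work factor; tune for your hardware


def open_db(path=":memory:"):
    """Open (or create) the hypothetical users table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        " username TEXT PRIMARY KEY, salt BLOB NOT NULL, pw_hash BLOB NOT NULL)"
    )
    return conn


def _hash(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(conn, username, password):
    """Store a per-user random salt and the derived hash.
    Returns False if the username is already taken."""
    salt = os.urandom(16)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (username, salt, pw_hash) VALUES (?, ?, ?)",
                (username, salt, _hash(password, salt)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def login(conn, username, password):
    """Re-derive the hash with the stored salt and compare."""
    row = conn.execute(
        "SELECT salt, pw_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(_hash(password, salt), stored)
```

Werkzeug, which Flask is built on, also ships `generate_password_hash` and `check_password_hash` in `werkzeug.security`, which would be the lighter-weight choice inside a Flask project.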

The CT scan is processed in the cloud and compared against our training set (currently 250 images from the Cornell study, available at http://www.via.cornell.edu/databases/).
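The write-up does not name the classifier used for this comparison. One simple possibility is k-nearest neighbors over fixed-length feature vectors (for example, normalized histograms of SIFT visual words), where the fraction of malignant neighbors serves as a rough probability. The function name, the label convention (1 = malignant), Euclidean distance, and the default `k` are all assumptions in this sketch.

```python
import numpy as np


def malignancy_probability(query, train_X, train_y, k=5):
    """Estimate P(malignant) for one feature vector as the fraction of its
    k nearest training vectors (Euclidean distance) labeled 1."""
    train_X = np.asarray(train_X, dtype=np.float64)
    train_y = np.asarray(train_y)
    query = np.asarray(query, dtype=np.float64)
    k = min(k, len(train_X))  # cannot ask for more neighbors than we have
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    return float(np.mean(train_y[nearest] == 1))
```

Vote fractions like this are coarse (multiples of 1/k) and uncalibrated, so they indicate relative risk rather than a true clinical probability.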

Challenges we ran into

The CT scans were only available in an image format specific to the medical industry, so we had to write a set of batch image processing scripts to convert the several hundred images into a standard format before we could start analyzing them.
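The write-up does not name the format; DICOM is the standard for CT, so the sketch below assumes it. A typical standardization step converts stored pixel values to Hounsfield units (HU) with the file's rescale slope and intercept, then applies a display window to produce an 8-bit image OpenCV can work with. With pydicom, the inputs would come from `ds.pixel_array`, `ds.RescaleSlope`, and `ds.RescaleIntercept` after `ds = pydicom.dcmread(path)`. The function name and the default lung window are assumptions, and the function itself uses only NumPy so it stays self-contained.

```python
import numpy as np


def to_uint8(raw, slope=1.0, intercept=0.0, level=-600.0, width=1500.0):
    """Convert raw CT pixel values to an 8-bit grayscale image.

    raw * slope + intercept gives Hounsfield units; values are clipped to the
    window [level - width/2, level + width/2] and scaled linearly to 0..255.
    The default window (level -600, width 1500) is an assumed lung setting.
    """
    hu = np.asarray(raw, dtype=np.float64) * slope + intercept
    lo = level - width / 2.0
    hi = level + width / 2.0
    scaled = (np.clip(hu, lo, hi) - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)
```

Looping this over a directory with `pathlib.Path.glob` and saving each result with `cv2.imwrite` would give a simple batch converter.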

Our initial Firebase setup did not support file uploads, so we switched to running a Flask server to implement our user interface.

Writing machine learning algorithms from scratch is challenging, but we managed to reach an accuracy above 70%.

Some of OpenCV's functionality was removed from recent versions, which made it difficult to find open-source implementations of all the computer vision operations we needed.
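The write-up does not list which functions were affected. One well-known case is SIFT: OpenCV 3.0 moved it, along with SURF, out of the main package into the opencv_contrib `xfeatures2d` module because of patent restrictions, and OpenCV 4.4 brought SIFT back into the main module as `cv2.SIFT_create` after the patent expired. The helper below, whose name is hypothetical, is a sketch that picks whichever constructor the installed build provides.

```python
def create_sift():
    """Return a SIFT detector from whichever OpenCV layout is installed."""
    import cv2  # imported lazily so the failure point is clear if OpenCV is missing

    if hasattr(cv2, "SIFT_create"):
        # OpenCV >= 4.4: SIFT lives in the main module.
        return cv2.SIFT_create()
    if hasattr(cv2, "xfeatures2d"):
        # OpenCV 3.x to 4.3 built with the opencv_contrib modules.
        return cv2.xfeatures2d.SIFT_create()
    raise RuntimeError(
        "This OpenCV build has no SIFT; install OpenCV >= 4.4 "
        "or a build that includes opencv_contrib"
    )
```

Checking for the attribute rather than parsing `cv2.__version__` keeps the helper working with custom builds that include or omit the contrib modules.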

What we learned

None of us were particularly well-versed in building and running a server, so using Flask and managing a cloud server was a new experience for us.

Machine learning can still produce useful results with a small dataset: we reached an accuracy above 70% with only 250 images.

What's next for oncoBOT

We would like to increase the size of our image database to raise our accuracy, although obtaining medical data is difficult because patient information is sensitive. We would also like to expand oncoBot to other forms of cancer; breast cancer and skin cancer are strong candidates.
