Inspiration
For this project, I wanted to use my fledgling machine learning skills in order to solve a problem in the medical industry. According to the World Cancer Research Fund, Breast Cancer is the most common cancer in women, and the second most common type of cancer overall. I decided to try to classify breast tumors into cancerous and non-cancerous. I chose breast cancer because there is lots of data for it because it’s so common. My aim with this project was to train both a logistic regression classifier and a neural network on a dataset of various breast tumors to develop a way to accurately and confidently predict whether a tumor is benign or malignant.
What it does
Given certain attributes of a breast tumor, it predicts whether the tumor is cancerous or not using two different models: a neural network and a logistic regression classifier.
How I built it
First, I formatted the data, removing attributes that didn’t correlate with being malignant or benign. To do this, I plotted histograms of each attribute for malignant and benign tumors. Then, I trained two logistic regression classifiers, which plot the data in many dimensions and try to find a complex surface that separates the benign tumors from the malignant ones. I trained both linear and polynomial logistic regression classifiers. I also tried training a neural network, which is able to learn more complex prediction equations and is modeled after the human brain.
Accomplishments that I'm proud of
I trained the logistic regression classifier using both linear and quadratic regression. Both produced the same quality results. The neural network produced many more false positives than false negatives, which is good because a false positive is better than a false negative. However, the logistic regression classifiers were more accurate overall.
Accuracy of logistic regression (linear): 97.54% Accuracy of logistic regression (quadratic): 97.54% Accuracy of neural network: 82.78%
What I learned
This increased my knowledge of neural networks and logistic regression classifiers, and my knowledge of Matlab in general.
What's next for Breast Cancer Prediction
One way to improve these algorithms would be to collect even more data on malignant and benign tumors to further increase accuracy. These algorithms are general and would work for other types of cancer as well, provided you have good data.
Log in or sign up for Devpost to join the conversation.