IASBC - Insights and Solutions to Breast Cancer

Inspiration

Breast cancer remains a significant health concern in the modern world, and making treatment affordable for all is critical to improving survival rates. Yet the average cost of treatment per patient is $35,000 during the initial care phase and $76,100 during the late care phase, so care becomes far more expensive once the cancer has progressed. That's why we created IASBC: to help detect breast cancer sooner, when it is easier to treat. Early detection is crucial for reducing health care spending, especially for low-income families, because breast cancers diagnosed at an early stage are much less expensive to treat than those diagnosed at a late stage. Beyond reducing costs, catching the disease earlier with IASBC aims to lower the risk of breast cancer cells spreading to other parts of the body and to keep the cancer from progressing to a late stage.

What it does

IASBC is a machine learning model deployed on our interactive cloud web app. It takes in measurements of a user's breast cells (mean area, mean perimeter, etc.) and classifies them as malignant (cancerous) or benign (non-cancerous) based on these size measurements.

Then, if a user's cells are classified as malignant, IASBC draws on our research into different types of breast cancer treatment, the stages at which each treatment is most appropriate, and the cell characteristics each is best suited for, and returns a personalized treatment plan based on the user's input data.

How we built it

Reading Data and Preprocessing:

Our pipeline first reads our breast cancer API dataset from a CSV file into a pandas DataFrame.
Then, the target variable (diagnosis) is converted from categorical values ('Malignant' and 'Benign') to binary numeric values (1 and 0, respectively).
The predictors (feature columns) are stored in X, while the target variable is stored in y.
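The loading and encoding steps above can be sketched roughly as follows. This is a minimal sketch, assuming the CSV has a diagnosis column holding the string labels 'Malignant' and 'Benign'. The column names (area_mean, perimeter_mean) and the values are made-up placeholders, read from an in-memory string so the example runs on its own; with a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the project's CSV: made-up rows with two example
# feature columns and the 'diagnosis' target column.
CSV_TEXT = """area_mean,perimeter_mean,diagnosis
1000.0,120.0,Malignant
400.0,80.0,Benign
1300.0,130.0,Malignant
450.0,75.0,Benign
"""

# Read the CSV text into a DataFrame (pd.read_csv also accepts a file path).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Encode the categorical target as binary: Malignant -> 1, Benign -> 0.
df["diagnosis"] = df["diagnosis"].map({"Malignant": 1, "Benign": 0})

# Predictors (every column except the target) go in X; the target goes in y.
X = df.drop(columns=["diagnosis"])
y = df["diagnosis"]

print(X.shape, y.tolist())
```

Using an explicit dictionary with Series.map makes the label-to-number mapping visible in one place; any label not in the dictionary becomes NaN, which makes unexpected category spellings easy to spot.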

Feature Scaling and Setup:

Predictor values in X are normalized using StandardScaler.
Our dataset is then split into training and testing sets. The test set is 75% of the entire dataset, leaving the remaining 25% for training, and random_state is set to 100 for reproducibility.
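A rough sketch of the scaling and splitting steps with scikit-learn follows. As an assumption, it uses scikit-learn's bundled breast cancer dataset as stand-in data, since the project's own CSV is not available here; that dataset encodes 0 = malignant and 1 = benign, so the labels are flipped to match the encoding described above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset.
# Its target uses 0 = malignant, 1 = benign, so flip it to 1 = malignant.
X, y_raw = load_breast_cancer(return_X_y=True)
y = 1 - y_raw

# Standardize each predictor column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Hold out 75% of the rows for testing; random_state=100 fixes the shuffle
# so the same split is produced on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.75, random_state=100
)

print(X_train.shape, X_test.shape)
```

This mirrors the order described above (scale, then split). Fitting the scaler on the full dataset lets statistics from the test rows influence the scaling; a common alternative is to split first, call fit_transform on X_train, and apply only transform to X_test.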

Model Training and Predictions:

Our Logistic Regression model is initialized as LogReg.
The model is then trained on the training data using the fit method, with X_train and y_train passed in as arguments.
The trained model is used to predict the target labels on the test set (X_test), and the predictions are stored in y_predict.
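The training and prediction steps above can be sketched end to end as below. It reuses the same stand-in assumption (scikit-learn's bundled breast cancer dataset in place of the project's CSV) and keeps the names LogReg and y_predict from the description. The accuracy check at the end is an illustrative addition, not a step the writeup describes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption), relabeled so 1 = malignant and 0 = benign.
X, y_raw = load_breast_cancer(return_X_y=True)
y = 1 - y_raw

# Scale, then split 25% train / 75% test with a fixed random_state.
X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.75, random_state=100
)

# Initialize the logistic regression classifier with default settings.
LogReg = LogisticRegression()

# Train on the training split.
LogReg.fit(X_train, y_train)

# Predict labels (1 = malignant, 0 = benign) for the held-out test rows.
y_predict = LogReg.predict(X_test)

# Illustrative addition: compare predictions with the true test labels.
print(f"Test accuracy: {accuracy_score(y_test, y_predict):.3f}")
```

In a web app, a single user's measurements would go through the same fitted scaler and then LogReg.predict, shaped as a one-row 2D array.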

Challenges we ran into

We ran into many optimization challenges and technical bugs, but that is what programming is all about. While optimizing our original model to make it more efficient and distinctive, we spent significant time finding the best subset of columns to train and predict on, testing many different combinations across our runs. Even more challenging was learning how to build and host our own web app on Streamlit, since we weren't familiar with it.

Accomplishments that we're proud of

We are proud of so many things. We got to experience using Streamlit, an open-source Python framework for building web apps, which we had not used before. After seeing how easy it is to implement and use, we definitely plan on using it more in the future. We are also proud that we combined all of our skills to not only construct an accurate machine learning model but go above and beyond by optimizing it. We also love that we were able to present our work on a cloud webpage and help users with our product by making it fully interactive. Lastly, we are proud of the amount of work we pulled off in the given time. We never would have thought we could accomplish this much in such a short amount of time.