Breast Cancer Data Challenge

Inspiration

Breast cancer is one of the most common cancers affecting women worldwide. Early detection through screening and diagnostic imaging (e.g., mammograms, biopsies) significantly improves treatment outcomes.

What it does

This machine learning model helps classify whether breast cancer tumors are malignant (cancerous) or benign (non-cancerous). Exploratory analysis showed that the data was dense and had two classes. Because this is a classification problem, accuracy and F1 score are our key metrics.
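As a rough illustration of the two metrics, here is a minimal sketch that computes accuracy and F1 score from scratch. The function name `accuracy_and_f1` and the toy labels are made up for this example and are not taken from the project's data.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; `positive` marks the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


# Toy labels (hypothetical): 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

Accuracy counts every correct prediction equally, while F1 focuses on how well the positive class is found, which is why it is useful to report both when the classes are not perfectly balanced.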

How we built it

It was built in Python using Google Colab. Data exploration was performed first to understand the different aspects of the dataset. It confirmed that this was a binary classification problem and showed some class imbalance, though not a significant one. Because the data had so many potential predictors, balancedRandomclassifier was used to select key features. The data was then split into training and test sets, and multiple machine learning models were applied. A stacked ensemble model gave the best result.
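The workflow described above could look roughly like the sketch below. It is not the project's actual code, and several parts are assumptions: scikit-learn's built-in breast cancer dataset stands in for the challenge data, a class-weighted `RandomForestClassifier` stands in for the feature-selection step, and the base models in the stack (a random forest and a logistic regression) are chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the built-in dataset is a stand-in for the challenge data.
# In this dataset, label 0 is malignant and label 1 is benign.
X, y = load_breast_cancer(return_X_y=True)

# Quick class-balance check, as done during exploration.
class_counts = np.bincount(y)
print("samples per class:", class_counts)

# Stratified split keeps the class ratio the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Assumption: a class-weighted random forest ranks the features, and
# features with importance at or above the median are kept.
selector = SelectFromModel(
    RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=42
    ),
    threshold="median",
)

# Stacked ensemble: base models feed their predictions to a final estimator.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Selection happens inside the pipeline, so it is fit on training data only.
model = make_pipeline(selector, stack)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)  # scored for label 1 (benign in this dataset)
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
```

Placing the feature selector inside the pipeline means it never sees the test set, which keeps the reported scores honest. The exact scores will differ from the ones reported for this project, since the data and models here are stand-ins.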

Challenges we ran into

The main challenge was testing and experimenting with different machine learning models.

Accomplishments that we're proud of

We were able to build a fairly robust ML model with an accuracy and an F1 score of 95% each.

What we learned

Exploratory analysis is key, as it helped uncover the quality of the data. This in turn helped shape the ML modelling process.

What's next

In future, the classification model could be incorporated into a web app that health practitioners can use to help predict the tumour type. AutoML tools such as PyCaret could also be incorporated.

Built With

google-colab
python

Updates

Ed U started this project — Oct 22, 2025 04:06 PM EDT
