Inspiration

We chose to create the BCP because of our friends and family members who have experienced breast cancer. When we first came across the Alliance Medical Innovation Challenge, breast cancer was the first thing that came to mind, since around one in ten women develop breast cancer in their lifetime. The question we needed to answer was how we could use artificial intelligence to help those with breast cancer.

What it does

The BCP is a Python-based application that uses a pre-trained binary classification model to analyze a patient's clinical and tumour data. The analysis incorporates eight key data points: patient age, menopausal status, tumour size in centimetres, presence of invasive lymph nodes, tumour location and quadrant, presence of metastasis, and history of previous breast cancer. The application then predicts whether the tumour is malignant or benign, that is, whether or not it is cancerous. The trained model has an overall accuracy of 91.7% and only a 2% chance of producing a false negative.
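
A scikit-learn classifier expects each patient as a fixed-order row of numbers, so the form inputs have to be encoded first. The sketch below shows one way to do that. The field names, the category codes (for example 0 = right breast, 1 = left breast), and the quadrant labels are all hypothetical and chosen for illustration. It also assumes that "location" means which breast the tumour is in. The BCP's real encoding may differ.

```python
from dataclasses import dataclass

# Hypothetical category codes; the actual BCP encoding may differ.
QUADRANTS = {"upper inner": 0, "upper outer": 1, "lower inner": 2, "lower outer": 3}
SIDES = {"right": 0, "left": 1}


@dataclass
class PatientRecord:
    """One patient's inputs (illustrative field names, not the BCP's own)."""
    age: int
    postmenopausal: bool
    tumour_size_cm: float
    invasive_nodes: bool
    breast_side: str   # assumed to be "left" or "right"
    quadrant: str      # one of the QUADRANTS keys
    metastasis: bool
    prior_history: bool

    def to_features(self) -> list[float]:
        """Return the eight inputs as a fixed-order numeric vector."""
        if self.age <= 0 or self.tumour_size_cm < 0:
            raise ValueError("age must be positive and tumour size non-negative")
        return [
            float(self.age),
            float(self.postmenopausal),
            self.tumour_size_cm,
            float(self.invasive_nodes),
            float(SIDES[self.breast_side.lower()]),     # KeyError on unknown side
            float(QUADRANTS[self.quadrant.lower()]),    # KeyError on unknown quadrant
            float(self.metastasis),
            float(self.prior_history),
        ]


record = PatientRecord(52, True, 2.5, False, "left", "upper outer", False, False)
print(record.to_features())  # [52.0, 1.0, 2.5, 0.0, 1.0, 1.0, 0.0, 0.0]
```

Keeping the column order in one method makes it harder for the GUI and the training script to drift apart. If the order used at prediction time differs from the order used in training, the model will silently give wrong results.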

How we built it

The classification model was built using the RandomForestClassifier from the scikit-learn (sklearn) Python library. We chose this model because it gave the highest accuracy of the models we tried. We trained it on a breast cancer dataset of over 200 entries containing data from cases recorded in 2019 and 2020. Once the model was trained, we used a Python library called Joblib to serialize it into a file that the application can load. From there, we used Tkinter, Python's standard GUI library, to build the application, as it provides a quick and easy way to create an interface that takes input from a user.
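
The training-and-export step can be sketched as follows, using scikit-learn's `RandomForestClassifier` and `joblib.dump`/`joblib.load`. The real dataset is not included here, so the sketch generates a synthetic stand-in of 200 rows with 8 features. Its labeling rule is arbitrary and exists only so the example runs. The hyperparameters, the 75/25 split, and the file name `bcp_model.joblib` are likewise assumptions, not the BCP's actual settings.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the real dataset): 200 rows x 8 features,
# label 1 = malignant, 0 = benign, from an arbitrary learnable rule.
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = (X[:, 2] + X[:, 6] > 1.0).astype(int)

# Hold out a stratified test set so accuracy is measured on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out rows
print(f"held-out accuracy: {accuracy:.3f}")

# Serialize the fitted model so the GUI can load it without retraining.
model_path = os.path.join(tempfile.gettempdir(), "bcp_model.joblib")
joblib.dump(model, model_path)
loaded = joblib.load(model_path)
```

In the GUI, the loaded model would then be called as `loaded.predict([features])`, where `features` holds the eight values collected from the form in training-column order. `predict` expects a 2-D input, hence the extra list.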

Challenges we ran into

The main challenge we ran into was turning the model into an application. Even though we are both primarily Python developers, this was our first time building an app, so there was a lot of trial and error in learning what worked and what didn't.

Another challenge that stumped us at the beginning was finding a suitable dataset. Many public datasets do not have enough entries to train a model properly, while the datasets with more data were priced at a premium. It took us several days to find a dataset we could use to train the model.

Accomplishments that we're proud of

Many AI-based solutions currently used in the medical field are tailored towards administrative work and online medical filing. Using AI to predict whether a tumour is cancerous is a niche application that is not yet seen in mainstream hospitals.

False negatives are the worst-case scenario, since they could lead a patient to overlook a cancerous tumour. We are happy that the model not only reaches 91.7% accuracy but also rarely produces false negatives, which was no easy feat.
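
"Chance of a false negative" can be measured in two ways. One is the share of all predictions that are false negatives. The other is the share of truly malignant cases that the model misses, which is the false-negative rate, or 1 minus recall. The second is usually the more meaningful number for a screening tool. The pure-Python sketch below computes both from predicted and true labels, assuming label 1 means malignant. It illustrates the metrics and is not the BCP's own evaluation code.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, false-negative measures, and confusion-matrix counts.

    `positive` is the label treated as malignant (assumed to be 1 here).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # malignant, correctly flagged
            else:
                fn += 1  # malignant, missed: the worst case
        elif pred == positive:
            fp += 1      # benign, flagged as malignant
        else:
            tn += 1      # benign, correctly cleared
    total = len(y_true)
    return {
        "accuracy": (tp + tn) / total,
        # Share of truly malignant cases the model called benign.
        "false_negative_rate": fn / (tp + fn) if (tp + fn) else 0.0,
        # Share of all predictions that were false negatives.
        "false_negative_share": fn / total,
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
    }


metrics = evaluate([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                   [1, 1, 1, 0, 0, 0, 0, 0, 1, 0])
print(metrics["accuracy"], metrics["false_negative_rate"])  # 0.8 0.25
```

In this toy example only 10% of all predictions are false negatives, yet the model misses 25% of the malignant cases. Reporting the false-negative rate alongside accuracy makes that gap visible.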

What we learned

We learned how to apply the engineering design process to develop an early idea into a functioning application through organized documentation and software development.

What's next for Breast Cancer Predictor (BCP)

  • Expanding the types of data that can be input, such as breast measurements or even image scans. This could be done by integrating several models, each trained on one of these input types.
  • Improving the UI of the application
