Inspiration

When presented with the prompts for the Datathon, we decided that a skin cancer project, if we could build it successfully, would have the greatest impact. Dataset: International Skin Imaging Collaboration. SLICE-3D 2024 Challenge Dataset. International Skin Imaging Collaboration https://doi.org/10.34970/2024-slice-3d (2024).

What it does

Our website lets the user upload a picture of their skin lesion and uses machine learning to predict the probability that the lesion is malignant.

How we built it

We trained a convolutional neural network (CNN) to predict the likelihood that an uploaded picture of a lesion is malignant. We then built a website front end where people can upload an image and see the predicted chance of malignant skin cancer. For the back end, we used FastAPI to connect the front end to the CNN model.

Challenges we ran into

Our biggest challenge was that our data was highly imbalanced, with approximately 1,000 benign samples for every malignant one, so in our initial training runs the model learned to predict only zeros (benign). We addressed this by oversampling the malignant data to a 10:1 benign-to-malignant ratio during training and by increasing the batch size from 32 to 100, so that each batch was very likely to contain at least one malignant sample. On the website side, linking the back end to the front end was difficult because we had no prior experience doing so. We also struggled to code collaboratively, especially since our 2 GB of image data could not easily be uploaded to most cloud platforms built for collaborative coding (such as DeepNote).
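The two imbalance fixes can be sketched in Python. This is a minimal illustration under stated assumptions, not our actual training code: `oversample_indices` and `prob_batch_has_malignant` are hypothetical helpers, labels are assumed to be 0 (benign) and 1 (malignant), and oversampling is assumed to duplicate malignant indices with replacement. The second function estimates the chance that a batch contains at least one malignant sample. It treats each draw as independent with malignant probability 1/(ratio + 1).

```python
import random


def oversample_indices(labels, target_ratio=10, seed=0):
    """Hypothetical helper: return shuffled training indices in which malignant
    samples (label 1) are duplicated until there are about `target_ratio`
    benign samples (label 0) per malignant sample."""
    rng = random.Random(seed)
    benign = [i for i, y in enumerate(labels) if y == 0]
    malignant = [i for i, y in enumerate(labels) if y == 1]
    if not malignant:
        raise ValueError("need at least one malignant sample to oversample")
    # Malignant count required for a benign:malignant ratio of target_ratio:1.
    needed = max(len(malignant), len(benign) // target_ratio)
    # Duplicate randomly chosen malignant indices (sampling with replacement).
    extra = [rng.choice(malignant) for _ in range(needed - len(malignant))]
    indices = benign + malignant + extra
    rng.shuffle(indices)
    return indices


def prob_batch_has_malignant(batch_size, ratio=10):
    """Probability that a batch of independent draws contains at least one
    malignant sample when the data is ratio:1 benign:malignant."""
    p_malignant = 1 / (ratio + 1)
    return 1 - (1 - p_malignant) ** batch_size


labels = [0] * 10_000 + [1] * 10  # toy data with a 1000:1 imbalance
idx = oversample_indices(labels)
print(len(idx), sum(labels[i] for i in idx))
for bs in (32, 100):
    print(bs, round(prob_batch_has_malignant(bs), 4))
```

At a 10:1 ratio, a batch of 32 contains a malignant sample with probability of roughly 0.95. A batch of 100 raises that probability above 0.9999, which illustrates why the larger batch size helps.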

Accomplishments that we're proud of

We are proud of building a functional website within the short time frame, since we initially expected it to take much longer. We also built a separate model based only on the metadata. It could not be used on the website, because an uploaded image does not come with that metadata, but building it was a valuable learning experience.

What we learned

In terms of skills, we learned how to connect the front end and back end in web development, and we learned new machine learning models (CNNs and random forests). We also learned how to collaborate on code live. Finally, we learned firsthand how difficult it is to train a model on an imbalanced dataset.

What's next for Skin Cancer Prediction With CNN

If we can obtain more image data (particularly malignant cases), we can train our model more effectively. Currently, we duplicate the malignant data so that the model does not always predict 0, which means the model is likely overfitting to the small set of malignant images. Another approach would be to write a custom loss function that penalizes false negatives much more heavily than false positives during gradient descent. For the website's front end, we could add more resources and information about skin cancer to make it more useful.
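A loss function of the kind described above can be sketched as a weighted binary cross-entropy. This is an illustrative assumption, not something we implemented. `weighted_bce` and its `fn_weight` parameter are hypothetical names, and the weight of 10 is an arbitrary example value. Scaling the loss on malignant (label 1) samples makes a confident miss on a malignant lesion cost `fn_weight` times more than an equally confident false alarm.

```python
import math


def weighted_bce(y_true, p_pred, fn_weight=10.0, eps=1e-7):
    """Hypothetical loss: mean binary cross-entropy in which the term for
    malignant samples (label 1) is multiplied by fn_weight, so predicting a low
    probability for a malignant lesion (a false negative) is penalized more."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("y_true and p_pred must be non-empty and equal length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(fn_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# A confident miss on a malignant lesion vs. an equally confident false alarm.
false_negative = weighted_bce([1], [0.1])
false_positive = weighted_bce([0], [0.9])
print(false_negative / false_positive)
```

The framework is not specified above. If the model is trained in PyTorch, `torch.nn.BCEWithLogitsLoss` provides the same positive-class weighting through its `pos_weight` argument.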
