Beneath the Surface: Predicting Skin Lesions with ML

Inspiration

This project was inspired by the need for early detection of malignant skin lesions, particularly in settings where access to dermatological care is limited. Using a dataset of 3D total body photography (TBP), we developed a machine learning workflow to explore missingness at the lesion level and evaluate how different features contribute to malignancy prediction.

What it does

Slice 3D is a logistic regression-based model designed to predict whether a skin lesion is benign or malignant. It uses lesion-level features, including color (hue, chroma), contrast, size, and location, to support triage in clinical workflows by flagging high-risk lesions for further inspection.

How we built it

We began by cleaning the data and selecting the variables with the lowest missingness. To address the severe class imbalance, we undersampled the majority class so the model could learn effectively from the minority class (malignant lesions). We then used a random forest for feature selection, which guided which variables we prioritized for the logistic regression. We evaluated the final model by AUC on the full dataset.
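The steps above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not our exact code: the data is synthetic, and the 1:1 undersampling ratio, the choice of the top 5 features, the scaling step, and all hyperparameters are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, heavily imbalanced stand-in for the lesion table (NOT the real TBP data).
X, y = make_classification(
    n_samples=20000, n_features=12, n_informative=4, n_redundant=0,
    n_clusters_per_class=1, weights=[0.99, 0.01], flip_y=0, random_state=0,
)

def undersample(X, y, ratio=1.0, seed=0):
    """Keep every positive row and a random `ratio` negatives per positive."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_neg = min(len(neg), int(ratio * len(pos)))
    keep = np.concatenate([pos, rng.choice(neg, size=n_neg, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]

X_bal, y_bal = undersample(X, y)

# Rank features by random forest importance on the balanced sample.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
top_k = 5  # assumed cutoff, chosen only for illustration
selected = np.argsort(forest.feature_importances_)[::-1][:top_k]

# Fit an interpretable logistic regression on the selected features only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_bal[:, selected], y_bal)

# Score every row of the full, imbalanced data (this includes the training rows).
auc_full = roc_auc_score(y, model.predict_proba(X[:, selected])[:, 1])
print(f"AUC on full synthetic set: {auc_full:.3f}")
```

AUC depends only on how the scores rank cases, so it can be compared between the balanced training sample and the imbalanced full dataset even though the class proportions differ.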

Challenges we ran into

The dataset was heavily imbalanced and contained numerous missing values. We had to carefully consider which features were usable, and balancing the dataset via undersampling meant shrinking it, potentially sacrificing information. Evaluating how well the model generalized beyond the undersampled training data was another key challenge, which we addressed by testing on the full dataset.
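One simple way to decide which features are usable is to rank columns by their fraction of missing values and keep those under a cutoff. The sketch below uses pandas on a toy table; the column names, values, and the 25% cutoff are all made up for illustration and are not taken from our dataset.

```python
import numpy as np
import pandas as pd

# Toy lesion table; columns and values are hypothetical.
lesions = pd.DataFrame({
    "size_mm": [3.1, 2.4, np.nan, 5.0, 4.2],
    "hue": [55.0, np.nan, np.nan, 61.2, 58.9],
    "site": ["torso", "arm", None, "leg", "torso"],
    "target": [0, 0, 0, 1, 0],
})

# Fraction of missing values per column, lowest first.
missing_frac = lesions.isna().mean().sort_values()

max_missing = 0.25  # assumed cutoff, chosen only for this example
usable = missing_frac[missing_frac <= max_missing].index.tolist()

print(missing_frac)
print("usable columns:", usable)
```

Here `hue` is missing in 2 of 5 rows (40%) and is dropped, while the other columns stay under the cutoff.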

Accomplishments we’re proud of

We built an interpretable logistic regression model that reached an AUC of 0.737 when trained on the undersampled dataset and tested on the full dataset to evaluate generalizability. The model was trained on only 1,300 observations, yet it achieved this AUC when scored on over 400,000 cases.

What we learned

We deepened our understanding of handling imbalanced clinical datasets, the impact of feature completeness, and how to use random forest outputs to inform model design. We also gained experience evaluating models beyond accuracy, using metrics like AUC, sensitivity, and specificity.
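To illustrate why accuracy alone can mislead on imbalanced data, here is a small plain-Python sketch that computes sensitivity and specificity from binary labels. The labels and predictions are invented for the example, and the function name is our own.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = malignant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Made-up example: 2 malignant lesions out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A "model" that always predicts benign looks accurate but finds no malignancies.
always_benign = [0] * 10
print(accuracy(y_true, always_benign))                 # 0.8
print(sensitivity_specificity(y_true, always_benign))  # (0.0, 1.0)

# A model that catches one malignant lesion at the cost of one false alarm.
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(accuracy(y_true, y_pred))                        # 0.8
print(sensitivity_specificity(y_true, y_pred))         # (0.5, 0.875)
```

Both sets of predictions have the same accuracy, but only sensitivity shows that the first one misses every malignant lesion.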

What’s next?

Next, we plan to integrate more domain knowledge into the feature selection process, explore calibration techniques, and further investigate how variables like tbp_lv_dnn_lesion_confidence (a neural net-based score) relate to model confidence versus the diagnosis itself. Ultimately, we aim to build a reliable, deployable tool to support dermatologists in making early, accurate assessments.
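As one example of a calibration technique we could explore: a model trained on undersampled data tends to overstate the probability of the minority class on the original data. If negatives were kept at random with probability beta, a standard prior-shift correction maps the model's probability p_s back to the original class balance as p = beta * p_s / (beta * p_s - p_s + 1). The sketch below is a possible starting point, not something we have implemented; the function name and the beta value are assumptions for illustration.

```python
def correct_for_undersampling(p_s, beta):
    """Adjust a probability from a model trained on undersampled data.

    p_s:  predicted probability of the positive class from the model.
    beta: fraction of negative cases kept during undersampling (0 < beta <= 1).
    """
    if not 0 < beta <= 1:
        raise ValueError("beta must be in (0, 1]")
    if not 0 <= p_s <= 1:
        raise ValueError("p_s must be in [0, 1]")
    return beta * p_s / (beta * p_s - p_s + 1)

# Hypothetical case: only 1% of negatives were kept for training.
beta = 0.01
for p_s in (0.1, 0.5, 0.9):
    print(p_s, round(correct_for_undersampling(p_s, beta), 4))
```

The correction is monotonic, so it leaves the ranking of lesions (and therefore AUC) unchanged and only rescales the probabilities.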