Inspiration
I was almost a no_grad of high school. torch.no_grad, that is. My final presentation as one of only 20 Research Interns at the Stanford Center for Artificial Intelligence in Medicine and Imaging was mere days away, and my team's ultrasound analysis model was spitting out double-digit loss numbers.
After hours of frustrated debugging, we realized we had forgotten one key statement: torch.no_grad().
Never again.
While that experience was certainly memorable, it wasn’t the defining moment of my internship at the Stanford AIMI Center. What truly shaped my journey was discovering AIMI’s open-access healthcare datasets. While browsing through them, I stumbled upon the Diverse Dermatology Images (DDI) dataset, and was shocked to learn that the implicit racial bias in healthcare extends to AI models. Technology should improve healthcare access, yet there's still a significant gap in AI-based skin classification for people with darker skin. I couldn’t believe that we were still failing to provide equitable healthcare services that worked regardless of skin color. It wasn't just a technical flaw; it was a social one.
The more I read, the more determined I became to make a difference.
That’s how MelaninMed was born: an intuitive mobile application designed to deliver rapid, accurate skin cancer detection for all skin tones. MelaninMed not only improves access to skin cancer diagnostics for underrepresented populations, but aims to set a precedent for equity in healthcare and technology. What started as a coding challenge has become a mission to ensure AI serves all communities equally.
What it does
Skin cancer is the most common cancer in the United States. Early detection and treatment of skin cancer are critical, increasing survival rates from 35% to 94%.
Artificial intelligence-based skin classification mobile applications offer a popular, cost-effective, and accessible way to identify skin cancer. However, almost all of these platforms are biased against darker skin, since dermatology data of dark skin is scarce. Only 0.45% of open-access dermatology data is from brown/black skin, and artificial intelligence-based skin classifiers have been proven to perform significantly worse on darker skin.
MelaninMed addresses the gap as the first AI skin classifier and mobile application developed on a diverse dermatology image dataset. MelaninMed integrates two machine learning models: a malignancy classifier (ResNet-18 with an attention mechanism) and a lesion diagnosis system (custom ViT adaptor with a SqueezeExcite block). The malignancy classifier has ~99% accuracy, sensitivity, and specificity. The lesion diagnosis system diagnoses 78 different skin conditions with 95% accuracy and sensitivity, as well as 99% specificity. There is no significant performance difference between skin tones.
These rapid, highly-accurate models are combined in an easy-to-use, visually appealing mobile application. Users can quickly upload a picture of their lesion and get a risk classification and diagnosis in seconds.
During onboarding of the application, users create an account, securely stored in Firebase, which links their scan history and personal data. They also set up notifications for suspicious lesion follow-ups. After logging in, the first tab, Home, details information about the application, scan history, and pending dermatology advice. The Profile tab allows users to edit personal information and access nearby dermatologists' ratings and websites. The third tab, Scan, integrates both models to allow users to receive a near instantaneous AI diagnosis for a submitted skin lesion. The fourth tab, Reading, includes a curated library of topical dermatology resources. Lastly, the Chat tab connects users with a state-of-the-art GPT-4-powered AI dermatologist for personalized advice.
With MelaninMed, diagnostic tools are available to all, regardless of skin tone. Access to early detection and life-saving treatment for skin cancer is streamlined, saving lives. With MelaninMed, healthcare is colorblind.
How we built it
Note: After creating the video demonstration, I integrated AWS Amplify services. I used Amplify to enable authentication, allowing users to sign in through platforms like Google. Additionally, I leveraged Amazon SageMaker to deploy my two custom-built machine learning models to the cloud, enabling reliable, real-time predictions for users. SageMaker further supported model improvements by enabling hyperparameter tuning and fine-tuning for enhanced performance. With Amplify's analytics capabilities, I can also track app feature usage and manage user engagement. I only discovered these services after recording my video presentation and thus did not demonstrate them.
Methods
In order to be considered an alternative to traditional dermatology, MelaninMed must meet the following four criteria: (1) possess an intuitive, user-friendly interface; (2) achieve accuracy metrics over 90%, focusing on diagnostic accuracy, sensitivity, and specificity; (3) provide rapid diagnoses in under 30 seconds; and (4) perform consistently across all skin types (Fitzpatrick Skin Types I-VI). This is achieved with a three-part solution: a malignancy classifier to determine the malignancy of a lesion, a diagnosis system to classify a lesion's potential condition, and an easy-to-use mobile application interface.
Data Sourcing and Preprocessing
In order to minimize racial bias in AI, we need to increase diversity in the data. By training the machine learning model on a balanced variety of skin tones, it is possible to minimize accuracy differences between skin tones. To address this, we used the Diverse Dermatology Images (DDI) dataset, the first dermatology dataset intentionally curated to include all skin tones. All images have been deidentified and are publicly available. Preprocessing the DDI dataset involved several steps. First, libraries such as PyTorch, PIL, pandas, and NumPy were imported into Google Colaboratory. Transformation parameters were defined, specifying the means ([0.485, 0.456, 0.406]) and standard deviations ([0.229, 0.224, 0.225]) for normalization. An image transformation pipeline was then built: it converted images to RGB format, resized them to 299x299 pixels, and finally converted the images to tensors. Additionally, a custom DDI_Dataset class was created, inheriting from torch.utils.data.Dataset.
For the first model, the malignancy classifier, images were filtered and sorted into two folders: malignant and benign. For the lesion diagnosis system, the 78 disease categories were encoded, assigning a unique numerical value to each label.
Malignancy Binary Classification
After preprocessing to reduce noise and increase uniformity in the lesion images, the DDI dataset can be applied. The primary objective in this phase is to classify a detected lesion as malignant (cancerous) or benign (noncancerous).
To begin, a DataLoader (batch size 32) was instantiated to shuffle the data and avoid training bias. An attention mechanism was developed as a custom AttentionLayer class, relying on a residual connection for enhanced performance. This layer performs self-attention by applying convolution operations for query, key, and value, followed by softmax-based attention score calculation. The attention layer was then integrated into the ResNet-18 backbone to optimize the model for binary classification. The model used a cross-entropy loss function and an Adam optimizer to minimize the difference between predicted and actual outputs. Training was completed over 13 epochs, each of which included forward and backward passes with gradient optimization. Subsequently, features extracted by the ResNet-18 backbone were split into training and testing sets. These features were standardized with a StandardScaler before a logistic regression model produced the final classification and its accuracy was evaluated.
Lesion Diagnosis System
The DDI was also incorporated into the lesion diagnosis system. The objective of this phase was to classify an identified lesion into one of 78 conditions. These conditions include abscesses, basal cell carcinoma, cystic acne, dermatofibroma, epidermal cysts, folliculitis, graft vs host disease, hematomas, hyperpigmentations, squamous cell carcinomas, and warts among others.
As with the malignancy classifier, a DataLoader with multi-threaded loading was instantiated for the lesion diagnosis system. The core of the lesion diagnoser was a custom Vision Transformer adaptor (ViT-B/16), adapted for 78-class classification. To enhance channel-wise attention, a SqueezeExcitation layer was implemented, effectively reweighting each channel's importance. The ViT adaptor was trained over 28 epochs, using a cross-entropy loss function for multi-class classification and an Adam optimizer with a learning rate of 1e-4. Within each epoch, image batches were loaded, predictions were computed, loss was calculated, and backpropagation was applied to optimize model parameters. The most important features were extracted from the trained ViT model as output embeddings. These features were run through a logistic regressor for the final classification, and the relevant accuracy metrics, detailed in the Results section, were calculated.
Mobile Application
These two models were exported to Core ML format and brought into Xcode. MelaninMed was developed using SwiftUI, with backend support provided by Google Firebase to handle secure data storage and user authentication. Users create profiles to store both user and attached scan data securely in Firebase. Upon initial download, users encounter onboarding screens, which provide an introduction to the application's purpose and instructions for creating a user profile. Users can also enable notifications at this stage.
After onboarding, users are directed to a login screen upon reopening the app. The Home view, the main interface, provides an overview of the app's functionality, including a list of all diagnosable conditions, and allows users to access their scan history, manage or delete past scans, and receive reminders to consult a specialist for potentially malignant lesions.
In the Profile tab, users can edit personal details and quickly access information on nearby dermatologists. They can either manually input an address or, via the Apple MapKit API, simply allow access to their current location. Once location information is received, the application automatically displays local dermatologists and links each one's website and rating information to its map marker.
The Scan view guides users through selecting the area of the body to be scanned and provides instructions to ensure a high-quality image. Using UIKit, users can capture and retake photos as needed. The AI models, detailed previously, then process the scan and output malignancy predictions and a potential diagnosis. This data is securely attached to the user profile in Firebase.
The Reading tab provides a curated library of topical dermatology resources.
Lastly, the Chatbot view integrates a state-of-the-art AI chatbot, powered by the GPT-4 API, which allows users to ask questions regarding treatment plans or diagnoses. The chatbot provides conversational guidance as an “AI Dermatologist”, enhancing user support within the app.
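A sketch of how such a chatbot request could be assembled. The system prompt wording and the `build_messages` helper are illustrative assumptions, not the app's actual code.

```python
def build_messages(history, user_message):
    """Prepend an 'AI Dermatologist' system prompt to the running chat
    and append the user's newest message, in the OpenAI chat format."""
    system = {
        "role": "system",
        "content": ("You are an AI dermatologist. Offer general guidance on "
                    "skin conditions and always recommend seeing a clinician "
                    "for diagnosis or treatment decisions."),
    }
    return [system, *history, {"role": "user", "content": user_message}]

# The request itself would then be sent with the official client, e.g.:
#   from openai import OpenAI
#   client = OpenAI()
#   reply = client.chat.completions.create(
#       model="gpt-4", messages=build_messages(history, msg))
```

Keeping the system prompt out of the visible history lets the app enforce the "consult a clinician" framing on every turn.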
Results
Evaluation of Solution
The first criterion of a successful solution was an intuitive interface. This requirement is met by the easy-to-use, attractive mobile application, which streamlines the AI diagnosis process.
The second criterion was accuracy metrics exceeding 90%. To assess the performance of the models, several key metrics were evaluated: overall accuracy, sensitivity, and specificity, with the aim of achieving values above 90% on each. Additionally, the false positive and false negative rates were assessed, with the goal of keeping both under 10%.
Accuracy Metrics of Malignancy Classifier

| Metric | Value |
|--------|-------|
| Accuracy | 0.9937 |
| Sensitivity | 1.0000 |
| Specificity | 0.9894 |
| False Positive Rate | 0.0106 |
| False Negative Rate | 0.0000 |

Accuracy Metrics of Lesion Diagnosis System

| Metric | Value |
|--------|-------|
| Accuracy | 0.9452 |
| Sensitivity | 0.9518 |
| Specificity | 0.9991 |
| False Positive Rate | 0.0009 |
| False Negative Rate | 0.0482 |
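For reference, each reported metric follows directly from a model's confusion matrix. This sketch uses made-up labels, not the project's actual test split:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative ground-truth and predicted labels (1 = malignant, 0 = benign)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 1, 1, 0, 0, 1, 0, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)           # true positive rate
specificity = tn / (tn + fp)           # true negative rate
false_positive_rate = fp / (fp + tn)   # = 1 - specificity
false_negative_rate = fn / (fn + tp)   # = 1 - sensitivity
```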
As accuracy, sensitivity, and specificity were over 90% for both models, the second criterion of a successful solution is satisfied. Both models are highly accurate and viable in a clinical context.
The third criterion was the ability to provide rapid diagnoses, in under 30 seconds. To validate that the speed requirement was met, 30 trials were run. In each trial, a randomly selected image was submitted and screen-recorded, capturing the time from pressing the "diagnose" button to receiving the result. These recordings verified the application's ability to provide rapid diagnoses, consistently showing response times under 1 second, even with the app loading additional elements for prediction display.
Lastly, the fourth criterion of a viable implementation is similar model performance across all skin types. To evaluate this, 36 images from each of FST 1-2, 3-4, and 5-6 were deliberately held out from training to serve as test data. In a series of 36 trials, the model's performance was assessed; any incorrect classification from either the malignancy classifier or the diagnosis system was recorded as a failure. The results were statistically analyzed using the Kruskal-Wallis test, yielding an H statistic of 1.1428 and a p-value of 0.5647, indicating no significant difference in accuracy across the skin types tested. The findings, presented in Table 3, demonstrate consistent performance among all skin types, confirming that MelaninMed meets the fourth criterion and is an equitable alternative to current AI skin detection mobile applications.
Accuracy Stratified by Skin Type

| Fitzpatrick Skin Type | Total Images | Correct (Trial 1) | Correct (Trial 2) | Correct (Trial 3) |
|-----------------------|--------------|-------------------|-------------------|-------------------|
| FST 1-2 | 36 | 36 | 35 | 36 |
| FST 3-4 | 36 | 36 | 36 | 36 |
| FST 5-6 | 36 | 36 | 36 | 35 |
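The Kruskal-Wallis result quoted above can be reproduced with SciPy, assuming each trial's correct count is treated as one observation per skin-type group:

```python
from scipy.stats import kruskal

# Correct classifications per trial for each Fitzpatrick group
fst_1_2 = [36, 35, 36]
fst_3_4 = [36, 36, 36]
fst_5_6 = [36, 36, 35]

# SciPy applies the tie correction by default
h_stat, p_value = kruskal(fst_1_2, fst_3_4, fst_5_6)
print(h_stat, p_value)  # H ≈ 1.1429, p ≈ 0.5647
```

With a p-value well above 0.05, the null hypothesis of equal performance across groups cannot be rejected.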
By satisfying all four criteria, it is evident that MelaninMed offers an effective and equitable mobile application solution for skin lesion classification.
Challenges we ran into
Building MelaninMed posed the best kind of challenge: going into a project knowing that even if you don't succeed, you'll learn so much in the process. I spent almost more time debugging and watching tutorials than coding.
One of the first hurdles I encountered was my lack of experience with Swift. As someone who had primarily worked with Python and Java for back-end applications, transitioning to Swift and front-end development felt daunting. It was a challenge to learn enough Swift to implement the complex, attractive functionality I wanted to design.
Incorporating the Apple MapKit API also presented its own set of challenges. Understanding how to utilize the API for geolocation, and automatically search for dermatologists, as well as link each map marker to a Google search or website required a steep learning curve. There were moments when I felt lost deep in the Apple support forums, but perseverance paid off as I successfully leveraged the API.
Moreover, I employed advanced machine learning techniques to enhance the app's accuracy. Custom-building a Vision Transformer adaptor, instead of relying on pretrained models or off-the-shelf neural networks, was a massive time sink. Another obstacle arose when I finally completed the models but couldn't export them. Through trial and error, I learned that Core ML best preserved the integrity of my original model, but I had to upload the entire package to Google Drive since my computer would not support web downloads that large. The first successful export and application integration broke the machine learning model, which classified every submitted image as a squamous cell carcinoma - freaking me out when I tested it on myself!
I even almost leaked my GPT API key, for the AI dermatologist chatbot, on GitHub.
Despite these hurdles, MelaninMed was successfully created, strengthening my skills as a developer and making headway toward a future of more equitable healthcare and technology.
Accomplishments that we're proud of
I'm proud of several characteristics that define MelaninMed's creation and impact on the healthcare industry. MelaninMed achieved a diagnostic accuracy of 99%, with equally high performance across all skin tones, setting it apart from other skin cancer detection mobile applications. Obtaining such high accuracy took lots of trial and error, testing different computer vision models like VGG16, CNNs, YOLO, and U-Net. I experimented with different hyperparameters, like learning rates, dropout, and batch sizes, and integrated techniques like data augmentation and normalization to enhance performance. My limited computing resources made this process lengthy. However, persevering to develop two rapid, highly accurate computer vision models is a defining accomplishment of this project.
In addition, I am proud of MelaninMed’s multi-platform functionality. I was able to integrate several key technologies that enhanced the app’s user experience. By securing user data in Google Firebase and using Firebase for authentication, I ensured that user information remains easily accessible, yet completely private.
I also leveraged APIs for the first time, integrating the GPT-4 API for personalized dermatological advice, ensuring that users receive expert-level recommendations directly within the app. This feature not only improves user engagement but also provides valuable, real-time assistance. Additionally, incorporating Apple's MapKit API allowed for accurate location services, helping users find nearby dermatologists (with websites, ratings, and contact information) quickly and easily. I used UIKit to build an intuitive interface, enabling users to take and retake high-quality lesion scans. My use of three completely new APIs is a success I am proud of.
What we learned
The most important lesson I learned from MelaninMed is just how dependent AI systems are on their data. With "bad" data that isn’t diverse or inclusive of all populations, AI models simply cannot be effective, no matter how complex or accurate the algorithms are. Without a diverse dataset to build on, MelaninMed would've perpetuated the racial bias in skin cancer detection - possibly unknowingly. Especially in sensitive fields like healthcare, efforts to curate inclusive datasets are essential to advancing technology.
Moreover, I strengthened my technical skills in a completely new language, Swift. Prior to this project, I hadn't dived into mobile application development as deeply as I did with MelaninMed. Learning Swift allowed me to build a high-performance iOS application while also gaining a deeper understanding of how mobile apps function in Apple's ecosystem.
What's next for MelaninMed
The results of MelaninMed reveal two points of interest: a high-accuracy application (99%) that surpasses several existing AI skin detection tools, and a new standard for minimizing racial bias in skin cancer classification. The solution's performance across various skin tones has shown equitable results, ensuring that individuals from diverse backgrounds receive the same high-quality care.
For the 2.0 version of MelaninMed, I would focus on vertical integration to streamline the entire process from diagnosis to treatment. Currently, the app excels in providing rapid and accurate skin cancer diagnoses, but the next step is making it easier for users to transition from receiving a diagnosis to receiving the care they need.
I would like to condense the transition from diagnosis to treatment plan by partnering with telehealth services and enabling users to browse, schedule appointments with, and connect with dermatologists, all in the app. This ensures a more comprehensive experience with the app. It could even allow users to receive real-time medical advice and obtain prescriptions without needing to leave the app. This would be particularly beneficial for users in rural or underserved areas, where access to specialized care is often limited.
Moreover, I would build a support forum where users can connect with one another during this difficult process. A positive diagnosis is an incredibly difficult experience, and it might help to allow users to anonymously connect with others going through the same emotions. This community-driven feature would help users feel supported throughout their healthcare journey.
By focusing on support from diagnosis to treatment to emotional support, MelaninMed can become more than just a diagnostic tool: it can become a community.