Inspiration

Skin cancer is one of the most common cancers globally, and early diagnosis is crucial for survival. However, developing accurate diagnostic AI models requires large, diverse image datasets, which are often scarce, expensive, or biased.

At the same time, synthetic skin cancer images can help train dermatologists to diagnose accurately, because a generative AI model can produce a very large number of skin images. Ideally, senior dermatologists, acting as Subject Matter Experts (SMEs), should validate the synthetic images first.

What it does

This project proposes a novel pipeline that generates synthetic skin cancer images from patient tabular data (such as age, sex, lesion location, and metadata). The generative model creates realistic-looking medical images, which can be used to augment existing datasets and help train more robust skin cancer classifiers.

How we built it

• Step 1: Clean and preprocess the ISIC dataset.
• Step 2: Separate the tabular data as the features (X) and the images as the target (y).
• Step 3: Add one column of random numbers to the features so that the generative AI model always produces new synthetic images rather than monotonous, near-identical ones.
• Step 4: Divide the dataset into batches of 16 rows each and normalize the data, e.g. divide the image pixels by 255.
• Step 5: Train the GAN with a custom training loop, using a GradientTape block for both the generator and the discriminator, to produce realistic synthetic dermatology images.
• Step 6: Visualize the outputs and have them validated by Subject Matter Experts (if available).
• Step 7: Benchmark against another generative AI model, such as Convolutional 2D Transpose, to see which one generates the most realistic synthetic images.
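Steps 2 to 4 above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual notebook code: the function names, the toy array shapes, and the choice of a standard normal distribution for the random column are all assumptions. Only the batch size of 16 and the division by 255 come from the steps themselves.

```python
import numpy as np

BATCH_SIZE = 16  # batch size stated in Step 4


def add_noise_column(features, rng):
    """Append one column of random numbers to the tabular features (Step 3)."""
    # Assumption: standard normal noise; the write-up does not name a distribution.
    noise = rng.normal(size=(features.shape[0], 1)).astype(np.float32)
    return np.concatenate([features.astype(np.float32), noise], axis=1)


def normalize_images(images):
    """Scale uint8 pixel values from [0, 255] to [0, 1] (Step 4)."""
    return images.astype(np.float32) / 255.0


def make_batches(features, images, batch_size=BATCH_SIZE):
    """Yield (features, images) batches of at most `batch_size` rows each."""
    for start in range(0, features.shape[0], batch_size):
        stop = start + batch_size
        yield features[start:stop], images[start:stop]


# Toy stand-ins for the real data (hypothetical shapes):
# 40 rows, 3 tabular features, 32x32 RGB images.
rng = np.random.default_rng(0)
X = rng.random((40, 3))
y = rng.integers(0, 256, size=(40, 32, 32, 3), dtype=np.uint8)

X_noisy = add_noise_column(X, rng)
y_scaled = normalize_images(y)
batches = list(make_batches(X_noisy, y_scaled))
```

Because the random column is resampled on every call, the same patient row can be paired with a different noise value each time, which is what lets the generator produce varied images for identical tabular inputs.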

Challenges we ran into

• The generated images are synthetic and may not capture all real-world nuances.
• The model is trained on a publicly available dataset and has not yet been clinically validated.
• Due to limited GPU access on Kaggle and hackathon time constraints, this GAN and Conv2DTranspose demo was trained for only a small number of epochs. It demonstrates the core idea and the model pipeline, and can be scaled up for better results.

Accomplishments that we're proud of

• Converts tabular patient data into synthetic images.
• Built in Python using a GAN (Generative Adversarial Network) and a 2D Conv Transpose model.
• Can be integrated into a full-stack application, with JavaScript, Vue.js, Bootstrap, and CSS on the front end, and Laravel and Python on the back end.
• Can be integrated into a mobile app with multi-language support (English, Japanese, Indonesian) and dual mode (dark/light), following best practices such as Jetpack Compose for UI development, a clean code structure using MVP, and clean HTTP requests using Retrofit, so that the code is readable and easy to maintain and collaborate on.
• Trained on an open Kaggle competition dataset, the ISIC 2024 skin cancer dataset.
• Runs in a Kaggle Notebook for reproducibility and zero-cost access.

What we learned

• Good infrastructure greatly supports AI/ML development, so it is important to build strong and reliable data centres for AI training, research, collaboration, and inference.
• Good datasets greatly support AI/ML development, so it is important to establish strong collaborations among multiple, diverse institutions that share data, in the way ISIC (International Skin Imaging Collaboration) established collaborations across 6 continents to tackle skin cancer together and build a better future for the next generations.

What's next for Synthetic Skin Cancer Image Generation Using GAN & 2D Deconv

• Convert 2D synthetic images into 3D representations for better examination of, and education about, skin cancer.
• Incorporate multimodal inputs.
• Augment the synthetic image dataset with noise, cropping, blur, etc. Real-world medical imaging is often imperfect, yet doctors need to make a good diagnosis whether the images are in good or bad condition. Training doctors on imperfect images therefore prepares them to diagnose well when patient imaging is imperfect.
• Build a real-time assistant to generate and diagnose synthetic skin lesions.
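The augmentation idea in the list above (noise, crop, blur) could look roughly like the following NumPy sketch. It is an illustration under assumptions rather than planned project code: the function names, the noise level `sigma`, the crop size, and the use of a simple box (mean) filter for blurring are all hypothetical choices, and images are assumed to be float arrays in [0, 1] with shape (height, width, channels).

```python
import numpy as np


def add_gaussian_noise(image, rng, sigma=0.05):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)


def random_crop(image, rng, size):
    """Cut a random size x size window out of an (H, W, C) image."""
    height, width = image.shape[:2]
    top = int(rng.integers(0, height - size + 1))
    left = int(rng.integers(0, width - size + 1))
    return image[top:top + size, left:left + size]


def box_blur(image, k=3):
    """Blur with a k x k mean filter (k odd); edge padding keeps the shape."""
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros(image.shape, dtype=np.float32)
    # Sum the k*k shifted copies of the image, then divide to get the mean.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (k * k)


# Toy stand-in for one synthetic image (hypothetical 64x64 RGB).
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3)).astype(np.float32)

noisy = add_gaussian_noise(image, rng)
cropped = random_crop(image, rng, size=48)
blurred = box_blur(image)
```

In practice a library such as PIL or TensorFlow's image utilities would likely replace these hand-written functions; the sketch only shows what each degradation does to the pixel array.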

Built With

  • h5py
  • kaggle-notebook
  • keras
  • matplotlib
  • numpy
  • pandas
  • pickle
  • pil
  • python
  • scikitlearn
  • seaborn
  • sequence
  • tensorflow