Synthetic Brain MRI Generation & Classification with AI
Abstract
This project uses generative AI to address ethical and privacy concerns in healthcare by generating synthetic brain MRI scans. The aim is to create realistic synthetic datasets that mimic real patient data, so that ML models can be trained effectively without compromising patient privacy.
Introduction
Training machine learning models on sensitive healthcare data raises serious ethical and legal issues.
To tackle this, we explored the use of Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) to generate synthetic brain MRI scans.
We then evaluated the performance of classification models, notably Convolutional Neural Networks (CNNs) and VAEs, on real, synthetic, and hybrid datasets.
Dataset
We used the Br35H dataset, which includes:
- 3000 MRI scans
  - 1500 tumor-positive
  - 1500 tumor-negative
- Varied image sizes & scanning techniques
- Non-IID structure, posing challenges for consistent model training
Example Images
(a) 'FLAIR' brain MRI, size: 587×630
(b) 'T2' brain MRI, size: 197×256
Methodology
Pre-processing
- Resized all images and normalized pixel values to [0, 1]
- Maintained aspect ratios
- Unified backgrounds and dimensions
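The pre-processing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the function name `preprocess`, the 128-pixel target size, nearest-neighbour resampling, min-max scaling, and the black padding are all assumptions made for the example.

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 128) -> np.ndarray:
    """Return a size x size float image in [0, 1], keeping the aspect ratio.

    Hypothetical helper: target size and padding colour are assumptions.
    """
    h, w = image.shape
    scale = size / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))

    # Nearest-neighbour resize: pick one source row/column per target pixel.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows][:, cols].astype(np.float32)

    # Min-max normalisation to [0, 1]; a constant image maps to all zeros.
    lo, hi = resized.min(), resized.max()
    resized = (resized - lo) / (hi - lo) if hi > lo else np.zeros_like(resized)

    # Centre on a black square canvas so all images share one shape
    # and one background value.
    canvas = np.zeros((size, size), dtype=np.float32)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```

Padding instead of stretching is one way to reconcile "maintained aspect ratios" with "unified dimensions"; the original pipeline may have handled this differently.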
Data Generation
GAN (Generative Adversarial Network)
- A generator and a discriminator trained against each other in a minimax game
VAE (Variational Autoencoder)
- Encoder-decoder architecture that maps images to a latent space and back
Each model was trained separately on tumor-positive and tumor-negative images to keep the synthetic classes balanced
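For reference, the standard textbook objectives behind these two model families are given below. The write-up does not state the exact loss functions used in this project, so these are the usual formulations rather than a description of the actual training code. Here $G$ is the generator, $D$ the discriminator, $q_\phi$ the encoder, and $p_\theta$ the decoder.

```latex
% GAN: the discriminator maximises and the generator minimises the same value
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% VAE: maximise the evidence lower bound (reconstruction term minus KL term)
\mathcal{L}(\theta, \phi; x) =
  \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```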
Classification
Models
CNN Architecture: Feature extraction + classification layers
VAE for Classification: Used reconstruction error as a signal for anomaly (tumor) detection
Techniques
- Global Image Absolute Error Magnitude (GIAEM)
- DBSCAN: density-based spatial clustering
- Singular-example KMeans
- Global KMeans: clustered reconstruction errors across the dataset
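As a rough illustration of how reconstruction error can drive classification, the sketch below computes one error value per image and splits those values into two clusters. Several things here are assumptions, not details from the project: GIAEM is taken to be the mean absolute pixel error per image, the "global" clustering is a plain two-cluster KMeans on those scalars, the higher-error cluster is read as tumor-positive, and the function names are hypothetical.

```python
import numpy as np

def global_abs_error(images: np.ndarray, reconstructions: np.ndarray) -> np.ndarray:
    """One scalar per image: mean absolute pixel error (assumed GIAEM)."""
    diff = np.abs(images - reconstructions)
    return diff.reshape(len(images), -1).mean(axis=1)

def two_means_1d(values: np.ndarray, n_iter: int = 100):
    """Plain 2-cluster KMeans on scalars, initialised at the min and max."""
    centers = np.array([values.min(), values.max()], dtype=float)
    labels = np.zeros(len(values), dtype=int)
    for _ in range(n_iter):
        # Assign each value to its nearest centre (1 = upper centre).
        labels = (np.abs(values - centers[0]) > np.abs(values - centers[1])).astype(int)
        new_centers = centers.copy()
        for k in (0, 1):
            if np.any(labels == k):
                new_centers[k] = values[labels == k].mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def classify_by_error(images: np.ndarray, reconstructions: np.ndarray):
    """Label images 1 (assumed tumor) if they fall in the higher-error cluster."""
    errors = global_abs_error(images, reconstructions)
    # Cluster 1 starts at the maximum error and stays the higher-error cluster.
    labels, _ = two_means_1d(errors)
    return labels, errors
```

In practice `reconstructions` would come from the trained VAE decoder. Clustering errors over the whole dataset, rather than thresholding each image in isolation, makes the decision boundary depend on the overall error distribution.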
Experimental Results
Generative Task
| Model | Result |
|---|---|
| GAN | Struggled with realistic tumor generation, high noise |
| VAE | More realistic images, but difficulties in tumor regions |
Classification Task
| Model | Performance |
|---|---|
| CNN | High accuracy on real data, poor generalization to synthetic |
| VAE | Varying results; Global KMeans achieved best balance |
Performance Metrics
| Model | Train Data | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|---|
| CNN | Real | 96.67 | 95.42 | 97.99 | 96.69 |
| CNN | Synthetic | 58.16 | 57.53 | 56.94 | 57.24 |
| CNN | Mixed | 49.97 | 49.98 | 99.93 | 66.64 |
| VAE (GIAEM) | Real | 60.44 | 60.95 | 67.11 | 56.76 |
| VAE (GIAEM) | Synthetic | 80.08 | 68.26 | 66.22 | 67.09 |
| VAE (GIAEM) | Mixed | 81.24 | 69.99 | 64.11 | 65.95 |
| VAE (Global KMeans) | Real | 82.93 | 73.55 | 67.58 | 69.66 |
| VAE (Global KMeans) | Synthetic | 79.39 | 68.26 | 69.44 | 68.80 |
| VAE (Global KMeans) | Mixed | 78.80 | 39.88 | 49.27 | 44.08 |
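To make the four metric columns concrete, here is a small sketch of how they are typically computed for a binary task. It assumes tumor-positive is the positive class (label 1) and uses no averaging across classes; the write-up does not say which averaging scheme produced the reported numbers, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# A classifier that labels every scan positive on a balanced set:
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1] * 8
print(binary_metrics(y_true, y_pred))
```

On a balanced set, predicting positive for everything gives 50% accuracy, 50% precision, 100% recall, and an F1 of about 66.7%. The CNN / Mixed row (49.97, 49.98, 99.93, 66.64) is close to this pattern, which would be consistent with a model that predicts tumor-positive for almost every scan, although the write-up does not confirm that interpretation.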
Conclusions & Future Work
- The Br35H dataset posed difficulties due to its heterogeneous nature
- The VAE proved to be the more effective generative model
- The CNN excelled on real data but lacked generalization
- Future work should:
  - Explore more consistent datasets
  - Enhance GAN performance
  - Expand unsupervised techniques like DBSCAN and KMeans
References
CovidGAN: Data Augmentation Using Auxiliary Classifier GAN for Improved Covid-19 Detection, Abdul Waheed et al.
Synthetic Medical Images Using F&BGAN for Improved Lung Nodules Classification, Defang Zhao et al.
Combining Noise-to-Image and Image-to-Image GANs: Brain MR Image Augmentation for Tumor Detection
Infinite Brain MR Images: PGGAN-based Data Augmentation for Tumor Detection
Br35H: Brain Tumor Detection 2020
DBSCAN
KMeans
Team
Generation Team: Francesco D'Aprile, Sara Lazzaroni
Classification Team: Anthony Di Pietro, Tommaso Mattei
Sources
At the following Google Drive link, you can find the dataset (both real and synthetic) and the weights for the classification and generation models. https://drive.google.com/drive/folders/1JJT4MP_5GSH_CU1Blvm6GpfhzZvHZFlb
More Info
Check out the full paper in doc/ for more details!