Multimodal Precision for Skin Disease Detection
About the Project
Skin rashes and lesions are notoriously difficult to diagnose. Many conditions, including dermatitis, psoriasis, melanoma, and systemic lupus erythematosus (SLE), present with overlapping visual features that are hard to distinguish, even for experienced clinicians.
To improve diagnostic precision, we pursued a novel approach that pairs visual symptoms with molecular signals from the blood.
Rather than relying on images alone, we built a multimodal AI system that integrates skin photographs with blood-based transcriptomic measurements to predict whether a patient has a skin-related disease.
Proof-of-Concept Website: click here!
How We Built It
Image Understanding with Foundation Models
We leveraged a powerful pretrained dermatology vision-language model, MONET, built on a CLIP ViT-L/14 architecture. MONET was trained on over 105,000 dermatological images paired with medical text descriptions, enabling it to:
- Recognize dermatologic concepts with dermatologist-level accuracy
- Provide interpretable visual representations
- Maintain transparency throughout the AI pipeline
This provided a strong visual encoder capable of extracting meaningful disease-related features from skin images.
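As a sketch of this step, the snippet below extracts an image embedding with a CLIP ViT-L/14 encoder via Hugging Face transformers. The `openai/clip-vit-large-patch14` checkpoint is a stand-in for the actual MONET weights, and `lesion.jpg` is a placeholder path:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Generic CLIP ViT-L/14 as a stand-in; in our pipeline the MONET
# weights would be loaded here instead.
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
model.eval()

image = Image.open("lesion.jpg").convert("RGB")  # placeholder path
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    embedding = model.get_image_features(**inputs)  # shape: (1, 768)
```

These embeddings become the visual input to the fusion model described below.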
Transcriptomics Integration
To complement the image data, we incorporated blood gene expression profiles.
Rather than using the full transcriptome, which is high-dimensional and often noisy, we:
- Performed multicohort meta-analysis across multiple skin disease datasets
- Identified differentially expressed genes associated with disease
- Restricted the transcriptomic input to biologically relevant genes
Mathematically, instead of learning from:
\( X \in \mathbb{R}^{n \times G} \)
where \(G\) represents the full set of measured genes, we trained on:
\( X_{\text{filtered}} \in \mathbb{R}^{n \times g}, \quad g \ll G \)
This improved both model efficiency and biological signal quality.
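A minimal sketch of the filtering step, assuming the expression matrix lives in a pandas DataFrame with samples as rows and gene symbols as columns; the gene list here is illustrative, not our actual meta-analysis signature:

```python
import pandas as pd

# Placeholder signature; in practice these come from the multicohort
# differential-expression meta-analysis.
signature_genes = ["IFI27", "OAS1", "STAT1"]

X = pd.read_csv("expression.csv", index_col=0)  # n samples x G genes

# Keep only signature genes that were measured in this cohort.
present = [g for g in signature_genes if g in X.columns]
X_filtered = X[present]  # n x g, with g << G
print(X.shape, "->", X_filtered.shape)
```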
Multimodal Fusion
We integrated:
- Visual embeddings derived from MONET
- Gene expression features from blood transcriptomics
into a joint predictive model.
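A minimal PyTorch sketch of this late-fusion design; the dimensions and layer sizes are illustrative assumptions, not our tuned configuration:

```python
import torch
import torch.nn as nn

class VisionOmicsFusion(nn.Module):
    """Late fusion: project each modality, concatenate, classify.
    img_dim matches the CLIP ViT-L/14 embedding; gene_dim is the
    size of the filtered gene signature (both assumed here)."""
    def __init__(self, img_dim=768, gene_dim=50, hidden=256, n_classes=2):
        super().__init__()
        self.img_proj = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.gene_proj = nn.Sequential(nn.Linear(gene_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, img_emb, gene_expr):
        z = torch.cat([self.img_proj(img_emb), self.gene_proj(gene_expr)], dim=-1)
        return self.head(z)

logits = VisionOmicsFusion()(torch.randn(4, 768), torch.randn(4, 50))
```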
We then deployed the system through a user-friendly web interface that allows clinicians to upload images, input transcriptomic data, and receive disease predictions.
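The snippet below sketches that upload-and-predict flow with Gradio; our deployed site may use a different stack, and `predict` is a placeholder that would call the MONET encoder and fusion model:

```python
import gradio as gr

def predict(image, expression_csv):
    # Placeholder: encode the image, load and filter the expression
    # file, then run the fusion model. Scores below are dummy values.
    return {"skin disease": 0.7, "healthy": 0.3}

demo = gr.Interface(
    fn=predict,
    inputs=[gr.Image(type="pil", label="Skin photograph"),
            gr.File(label="Transcriptomic CSV")],
    outputs=gr.Label(label="Prediction"),
)
demo.launch()
```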
Challenges We Faced
Transcriptomic Data Quality
One of the primary challenges was identifying usable public gene expression datasets.
Many microarray datasets were:
- Poorly normalized
- Inconsistent across samples
- Not directly comparable across studies
Reprocessing raw data from scratch exceeded the time constraints of TreeHacks.
To address this, we prioritized datasets that were already properly processed and validated, enabling rapid integration while maintaining data quality.
Additionally, because our imaging and transcriptomic data were not paired at the patient level, we carefully designed the neural network architecture to effectively learn multimodal representations despite this limitation.
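One common pattern for training with unpaired modalities is modality masking, in which the features of a missing modality are zeroed so that paired and unpaired samples can pass through the same fusion network. The sketch below illustrates the idea, simplified relative to our actual architecture:

```python
import torch

def mask_missing(img_emb, gene_expr, has_img, has_gene):
    """Zero whichever modality a sample lacks, so paired and unpaired
    samples share one fusion network. has_img / has_gene are boolean
    tensors of shape (batch,)."""
    img_emb = img_emb * has_img.float().unsqueeze(-1)
    gene_expr = gene_expr * has_gene.float().unsqueeze(-1)
    return img_emb, gene_expr

# Example: sample 0 has only an image, sample 1 only expression data.
img, gene = torch.randn(2, 768), torch.randn(2, 50)
has_img, has_gene = torch.tensor([True, False]), torch.tensor([False, True])
img, gene = mask_missing(img, gene, has_img, has_gene)
```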
What We Learned
- Each team member learned a new data-handling approach during this project. Multimodal datasets are still novel in clinical applications, and mRNA-based gene expression data in particular is biologically rich yet undervalued in diagnostics.
- Multimodal models capture complementary biological signals more effectively than single-modality approaches
- Data preprocessing quality is critical to downstream model performance
- Unpaired data samples still hold predictive power, though each modality's features must then be characterized separately rather than jointly.
- Model architecture is critical for extracting meaningful topological features. We drew inspiration from the architecture of a recently described EHR-omics prediction model [Matarso, Nat Mach Intell, 2024], adapting it for vision-omics.
- Omics data remains highly valuable. We believe future diagnostics that integrate it will identify the correct disease condition more reliably.
Impact and Vision
Most existing skin disease classifiers focus primarily on melanoma or skin cancer and rely exclusively on images. Rare diseases and autoimmune conditions remain underrepresented and difficult to diagnose using single data modalities.
For many patients, this challenge translates into diagnostic timelines spanning several years.
Our integrative approach:
- Combines visual phenotypes with molecular biomarkers
- Improves diagnostic precision in ambiguous cases
- Provides a scalable framework for future multimodal medical AI systems
We envision this platform as a clinical decision-support tool that enables more accurate, faster diagnoses and advances precision medicine.
What’s Next for LUNA
Our prototype demonstrates the promise of multimodal learning, but substantial opportunities remain to expand both the dataset and modeling capabilities.
First, we will incorporate transcriptomic datasets that were excluded due to poor normalization. Since raw expression files are mandatory uploads in public repositories, we can retrieve and properly renormalize these data to ensure consistency across cohorts.
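For instance, quantile normalization is a standard way to make expression distributions comparable across samples; a minimal NumPy sketch (ignoring tie handling) follows:

```python
import numpy as np

def quantile_normalize(X):
    """Quantile-normalize a genes-by-samples matrix: every sample
    (column) is mapped onto a shared reference distribution, the mean
    of the sorted columns. Ties are handled arbitrarily in this sketch."""
    ranks = X.argsort(axis=0).argsort(axis=0)    # rank of each value per column
    reference = np.sort(X, axis=0).mean(axis=1)  # mean expression at each rank
    return reference[ranks]
```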
Second, we will expand beyond microarrays, which measure gene expression via fluorescent probes, by integrating RNA sequencing (RNA-seq) data. RNA-seq directly quantifies mRNA transcript abundance and is increasingly prevalent in public datasets, enabling improved biological resolution and statistical power.
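A common first step for harmonizing RNA-seq counts before integration is log-CPM normalization; a minimal sketch, with the normalization choice itself being an assumption rather than a fixed part of our pipeline:

```python
import numpy as np

def log_cpm(counts):
    """Convert a samples-by-genes matrix of raw RNA-seq counts to
    log2 counts-per-million, a common scale for cross-cohort work."""
    cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
    return np.log2(cpm + 1.0)

log_expr = log_cpm(np.random.poisson(5.0, size=(4, 1000)).astype(float))
```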
In parallel, we will continue curating larger and more diverse skin image datasets.
From a modeling perspective, we plan to conduct systematic hyperparameter tuning and explore architectural enhancements to improve multimodal fusion.
Finally, we aim to translate this platform into a formal research study and potential clinical product, rigorously evaluating whether integrating skin imaging with blood transcriptomics significantly outperforms single-modality diagnostic approaches.