Inspiration

Using ML Skills in Bio

What it does

Integrating multiplexed assay of variant effect (MAVE) data across experiments poses methodological challenges that limit clinical utility. We present a framework for analyzing cross-experiment consistency, implementing quality control protocols, and developing imputation strategies for sparse MAVE datasets. Using SPTAN1 variant data as a case study, we show that only 60.2% of mutations are highly consistent across experiments (consistency score ≥ 0.7), while experiment-specific biases range from -0.083 to 1.055 z-score units. Our analysis reveals fundamental limitations in current integration approaches and proposes generative models (Dirichlet and Boltzmann LDA) as promising alternatives to traditional imputation methods. The framework establishes the quality control metrics and validation protocols needed for reliable clinical interpretation of MAVE data.
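
To make the normalization and consistency check concrete, here is a minimal sketch of a per-experiment z-score baseline followed by a per-variant consistency score. The write-up does not define the consistency score, so the formula used here, 1 / (1 + standard deviation of a variant's z-scores across experiments), is an assumption chosen only for illustration. The experiment IDs, variant names, and raw scores are likewise hypothetical toy values, not SPTAN1 data. Only the 0.7 threshold comes from the text above.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical toy records: (experiment_id, variant, raw_score).
# These values are made up for illustration and are not SPTAN1 data.
records = [
    ("exp_a", "p.A10G", 1.0),
    ("exp_a", "p.L25P", 3.0),
    ("exp_a", "p.R40W", 5.0),
    ("exp_b", "p.A10G", 10.0),
    ("exp_b", "p.L25P", 30.0),
    ("exp_b", "p.R40W", 20.0),
]


def zscore_by_experiment(records):
    """Z-score each raw score against the mean and std of its own experiment."""
    by_exp = defaultdict(list)
    for exp, _, score in records:
        by_exp[exp].append(score)
    stats = {exp: (mean(vals), pstdev(vals)) for exp, vals in by_exp.items()}

    normalized = []
    for exp, variant, score in records:
        mu, sigma = stats[exp]
        # Guard against a constant-valued experiment (sigma == 0).
        z = (score - mu) / sigma if sigma > 0 else 0.0
        normalized.append((exp, variant, z))
    return normalized


def consistency_scores(z_records):
    """Assumed consistency metric: 1 / (1 + std of a variant's z-scores).

    Equals 1.0 when all experiments agree and shrinks toward 0 as they diverge.
    Variants seen in fewer than two experiments are skipped.
    """
    by_variant = defaultdict(list)
    for _, variant, z in z_records:
        by_variant[variant].append(z)
    return {
        variant: 1.0 / (1.0 + pstdev(zs))
        for variant, zs in by_variant.items()
        if len(zs) >= 2
    }


z_records = zscore_by_experiment(records)
scores = consistency_scores(z_records)

# Threshold of 0.7 taken from the write-up; everything else is illustrative.
high_consistency = sorted(v for v, s in scores.items() if s >= 0.7)
fraction_high = len(high_consistency) / len(scores)
print(high_consistency, round(fraction_high, 3))
```

In this toy example only the variant that ranks the same way in both experiments clears the threshold. Because the z-scores are computed per experiment, a metric like this measures agreement in relative position rather than in raw assay scale.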

How we built it

Cursor carried hard

Challenges we ran into

Our original proposal was much more ambitious. After looking through the data and weighing potential methods, we settled on a basic z-score baseline as a starting point for this normalization.

Accomplishments that we're proud of

What we learned

Plan better

What's next for Normalizing SPTAN1 Across Different Experiment

Use a different normalization method than z-scores, and consider generative models (Dirichlet/Boltzmann LDA).

Built With
