Morphologic AI

Gallery images: cell area results, revealing differences between compounds and dosages; our (AI-generated) logo; nuclear intensity, showing evidence of batch effects.

The human cell holds the key to unlocking new frontiers in human medicine. With advanced computer vision technologies, Morphologic AI is set to decipher its intricate workings, training algorithms on vast datasets to reveal patterns, with profound implications for drug discovery.

Morphologic AI is leveraging advanced computer vision technology to better understand cellular morphology changes due to disease, drug treatment, or transcriptional changes. Techbio company Recursion has demonstrated the use of ML embeddings from fluorescent cell images to predict multiple cellular features from light microscopy images without fluorescent annotation. While black-box deep learning embeddings from these images have been generated through Recursion's proprietary network, they are not easily interpretable, reducing their usefulness for the broader biotech field in attempting to understand how cellular morphology relates to the state of a cell.

Using Recursion’s publicly available HUVEC cell image datasets, we have segmented multi-cell images to extract individual cells and map key morphological features on a per-cell basis, providing a more biologically interpretable way to describe drug treatment or disease-related perturbations. Our mapped features can then be compared to Recursion’s embeddings to assess how well human-explainable features align with the system's unsupervised machine learning, and to determine if inclusion of additional morphological readouts could yield better insights. We also aim to explore whether combining these interpretable features with unsupervised embeddings enhances data interpretation.

We are using the morphological effects of 1500+ small molecules in the Recursion datasets to extract high-dimensional information on drug clustering. This will give drug engineering teams additional insight into the form-function relationship of current drugs, moving beyond the limitations of low-throughput assays and SMILES correlations in creating safer and more effective drugs.

Ultimately, we are creating a pipeline that inputs cell images with annotated features, extracts and embeds their morphological data, and applies these insights to broad applications across cell types, diseases, and drug treatments.

What it does

Our goal is to use cell segmentation and morphological readouts of multi-cellular organization within each image to provide a more biologically interpretable set of features describing each perturbation. The approach could potentially be extended to different cell types in the future, or ultimately to understanding how cells in a 2D tissue context change in response to these therapeutic interventions. This will be accomplished in a few steps:

Performing the segmentation and engineering the extraction of specific biological features that could tell us more about the underlying state of the cells themselves.
Mapping those features against the embeddings to determine how well these explainable features compare to the unsupervised embeddings. If the mapping is not concordant, we could potentially add the embeddings as an additional set of features, to see whether the combination of embeddings and supervised biological features better explains the differences in the treatment trajectory the embeddings undergo.
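One simple way to check whether explainable features and embeddings are concordant is to ask whether perturbations that sit close together in one space also sit close together in the other. The sketch below correlates the pairwise distances computed in each space, in the spirit of a Mantel test but without the permutation step. It is an illustration under stated assumptions, not the project's actual code: the function names, the toy vectors, and the choice of Euclidean distance and Pearson correlation are all hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(vectors):
    """Distances for every unordered pair, always in the same (i, j) order."""
    return [euclidean(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def concordance(features, embeddings):
    """Correlate pairwise distances in feature space and embedding space.

    Row i of `features` and row i of `embeddings` must describe the same
    perturbation. Values near 1 mean both spaces agree on which
    perturbations are similar.
    """
    if len(features) != len(embeddings):
        raise ValueError("both inputs need one row per perturbation")
    return pearson(pairwise_distances(features),
                   pairwise_distances(embeddings))


# Toy data (hypothetical): four perturbations, two interpretable features each.
features = [[0, 0], [1, 0], [0, 1], [5, 5]]
# Embeddings that preserve the same geometry, scaled by 2, in three dimensions.
aligned = [[0, 0, 0], [2, 0, 0], [0, 2, 0], [10, 10, 0]]
# The same embeddings assigned to the wrong perturbations.
shuffled = [[10, 10, 0], [0, 2, 0], [2, 0, 0], [0, 0, 0]]

score_aligned = concordance(features, aligned)
score_shuffled = concordance(features, shuffled)
```

Here `score_aligned` comes out at essentially 1, while `score_shuffled` is much lower. A low score on real data would be the cue to try the combined feature set described above.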

How we built it

We developed Python scripts that form a cellular analysis pipeline designed to extract rich, biologically meaningful data from high-content imaging of HUVEC cells.

Key features:
Multi-channel image processing for comprehensive cell analysis
Advanced segmentation of nuclei, cell bodies, mitochondria, and Golgi apparatus, enabling precise quantification of morphological changes
Extraction of a wide array of morphological and intensity-based features
Inter-organelle relationship analysis, including distances and overlap features
Parallel processing capabilities for high-throughput analysis
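To make the feature types concrete, here is a minimal, dependency-free sketch of how per-object area, mean intensity, centroid, and two inter-organelle readouts (centroid distance and overlap) can be computed from integer label masks. It is illustrative only and not the pipeline's code. It assumes that segmentation has already produced masks in which 0 is background and that a nucleus and a Golgi object belonging to the same cell share a label, which is a hypothetical convention. In practice an array library such as NumPy or scikit-image would replace the plain lists used here.

```python
import math
from collections import defaultdict


def region_features(labels, intensity):
    """Per-object area, mean intensity, and centroid from a 2D label mask.

    labels: 2D list of ints, 0 = background, k > 0 = object k.
    intensity: 2D list of numbers with the same shape as `labels`.
    """
    acc = defaultdict(lambda: {"area": 0, "sum_i": 0.0, "sum_r": 0, "sum_c": 0})
    for r, (label_row, value_row) in enumerate(zip(labels, intensity)):
        for c, (lab, val) in enumerate(zip(label_row, value_row)):
            if lab == 0:
                continue  # skip background pixels
            a = acc[lab]
            a["area"] += 1
            a["sum_i"] += val
            a["sum_r"] += r
            a["sum_c"] += c
    return {
        lab: {
            "area": a["area"],
            "mean_intensity": a["sum_i"] / a["area"],
            "centroid": (a["sum_r"] / a["area"], a["sum_c"] / a["area"]),
        }
        for lab, a in acc.items()
    }


def overlap_fraction(mask_a, mask_b, label):
    """Fraction of object `label` in mask_a that is non-background in mask_b."""
    total = covered = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a == label:
                total += 1
                if b != 0:
                    covered += 1
    return covered / total if total else 0.0


# Toy 4x5 example (hypothetical): one cell with a nucleus and a Golgi object.
nuclei_mask = [[0, 0, 0, 0, 0],
               [0, 1, 1, 0, 0],
               [0, 1, 1, 0, 0],
               [0, 0, 0, 0, 0]]
golgi_mask = [[0, 0, 0, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 0, 0, 0]]
dna_signal = [[0, 0, 0, 0, 0],
              [0, 10, 20, 0, 0],
              [0, 30, 40, 0, 0],
              [0, 0, 0, 0, 0]]
golgi_signal = [[0, 0, 0, 0, 0],
                [0, 0, 5, 7, 0],
                [0, 0, 9, 11, 0],
                [0, 0, 0, 0, 0]]

nuclei = region_features(nuclei_mask, dna_signal)
golgi = region_features(golgi_mask, golgi_signal)

# Inter-organelle readouts for cell 1.
nucleus_golgi_distance = math.dist(nuclei[1]["centroid"], golgi[1]["centroid"])
nucleus_golgi_overlap = overlap_fraction(nuclei_mask, golgi_mask, label=1)
```

Each object yields one row of named features, so every column in the resulting table has a direct biological reading, which is what makes these readouts easier to interpret than embedding dimensions.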

Challenges we ran into

Our project tackled significant big data challenges inherent in high-content cellular imaging.

We processed and analyzed terabytes of multi-channel microscopy data from the RxRx datasets, including RxRx19a, RxRx19b, and RxRx3.
The amount of data was dizzying! (It took us ~2 days just to load it onto AWS.)
Our team adapted CellPose into a batch cell segmenter to automatically process all this data.
A lot of wrangling with limits, and some side quests to hack together Jupyter servers, etc.
We also faced the challenge of integrating diverse data types, including image data, chemical structures (SMILES), and gene knockout information.

Special thanks to Berton from Recursion, who was super quick at responding to requests for clarification, including uploading a critical missing file that we could not regenerate due to memory limitations, which permitted us to work with the RxRx3 embeddings.

Accomplishments that we're proud of

We discovered that we could interrogate potential batch effects (changes to image features due not to biological variability but potentially to technical variability) that would have been buried in the "deep learning embeddings" and harder to parse out, or potentially confounded with other cofactors.
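As an illustration of why interpretable features make this kind of check straightforward, the sketch below compares the per-batch median of a single feature and flags batches that deviate strongly from the rest, using a robust z-score based on the median absolute deviation. It is a hypothetical example, not our analysis code. It assumes the records come from untreated control wells only, so that differences between batches cannot be attributed to treatment. The record layout, batch names, feature name, and threshold of 3 are all made up for the example.

```python
import statistics
from collections import defaultdict


def batch_medians(records, feature):
    """Median of `feature` per batch, from per-cell records (dicts)."""
    by_batch = defaultdict(list)
    for rec in records:
        by_batch[rec["batch"]].append(rec[feature])
    return {batch: statistics.median(vals) for batch, vals in by_batch.items()}


def flag_batches(medians, threshold=3.0):
    """Return batches whose median is an outlier relative to the others.

    Uses a robust z-score, |x - median| / (1.4826 * MAD), where MAD is the
    median absolute deviation of the per-batch medians.
    """
    values = list(medians.values())
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread to scale by, so nothing can be flagged
    scale = 1.4826 * mad
    return sorted(batch for batch, v in medians.items()
                  if abs(v - center) / scale > threshold)


# Toy per-cell nuclear intensities from control wells (hypothetical values).
control_cells = [
    {"batch": "batch_A", "nuclear_intensity": 100},
    {"batch": "batch_A", "nuclear_intensity": 102},
    {"batch": "batch_A", "nuclear_intensity": 98},
    {"batch": "batch_B", "nuclear_intensity": 101},
    {"batch": "batch_B", "nuclear_intensity": 99},
    {"batch": "batch_B", "nuclear_intensity": 103},
    {"batch": "batch_C", "nuclear_intensity": 97},
    {"batch": "batch_C", "nuclear_intensity": 100},
    {"batch": "batch_C", "nuclear_intensity": 104},
    {"batch": "batch_D", "nuclear_intensity": 150},
    {"batch": "batch_D", "nuclear_intensity": 155},
    {"batch": "batch_D", "nuclear_intensity": 148},
]

medians = batch_medians(control_cells, "nuclear_intensity")
suspect_batches = flag_batches(medians)
```

With these toy values only `batch_D` is flagged. A shift like this in control wells points to technical rather than biological variability.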