Ectopic expression of proteins can endow cells with new capabilities. For example, expressing opsins renders neurons photosensitive, which enables optical control of neural activity in optogenetics. Optogenetics, alongside related methods such as chemogenetics and sonogenetics, has transformed neuroscience and shows great promise for neuromodulatory therapies targeting neurological disorders. Ectopic proteins are introduced by packaging the gene together with a promoter into a viral capsid (e.g., AAV6-hSyn-Chrimson) and delivering it into nervous tissue or the bloodstream. A key advantage of genetically targeted techniques like optogenetics is their cell-type specificity; however, only a limited number of cell-type-specific promoters are currently available. Moreover, few computational tools exist that can predict in which cell types a given protein will be expressed.
In this work, we propose Protein Design with Cellular Context, a method that uses protein and DNA language models to predict protein expression across cell types in the mouse brain. From a dataset of mouse transcription factors (TFs), we generate TF embeddings with ESM3; in parallel, we create promoter embeddings (Gene+Promoter Embedding) with DNABERT from the promoter sequences of genes expressed in the mouse brain. A gated attention module then combines the TF and promoter embeddings and, drawing on gene expression data (RNA-seq from the Allen Brain Atlas), predicts whether a specific opsin protein will be expressed in each cell type of the mouse brain. The key contributions of this work include:
- Construction of TF, promoter, and RNA-seq datasets for cell types in the mouse brain.
- Incorporation of cell-type context into a classifier architecture to predict differential protein expression across every cell type in brain tissue for a given organism.
- A Modal endpoint for fine-tuning ESM models, enabling further refinement for specific applications.

We expect this tool to help in designing novel promoters and cell-type-specific gene therapies.
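The write-up does not spell out how the gated attention module fuses TF embeddings, the promoter embedding, and cell-type expression. The sketch below is one hypothetical way to do it, not the project's implementation. It assumes the ESM3 TF embeddings and the DNABERT promoter embedding have already been projected to a common size `d`, that each cell type is described by a non-negative TF expression vector derived from RNA-seq, and that weights are random (untrained). The class name `GatedAttentionClassifier` and all parameter names are illustrative. It uses a gated-attention scoring form (a `tanh` branch multiplied elementwise by a `sigmoid` gate) and adds log-expression to the scores so that each cell type attends mainly to the TFs it expresses.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class GatedAttentionClassifier:
    """Hypothetical fusion head scoring one promoter against TFs, per cell type.

    tf_emb:   (n_tf, d)       TF embeddings (assumed: ESM3 outputs projected to d)
    prom_emb: (d,)            promoter embedding (assumed: DNABERT output projected to d)
    tf_expr:  (n_cell, n_tf)  non-negative TF expression per cell type (assumed: from RNA-seq)
    """

    def __init__(self, d, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(2 * d)
        # Each TF/promoter pair is represented by concatenating the two embeddings (2d).
        self.V = rng.normal(0.0, scale, (hidden, 2 * d))  # tanh branch
        self.U = rng.normal(0.0, scale, (hidden, 2 * d))  # sigmoid gate branch
        self.w = rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden)  # attention score vector
        self.w_out = rng.normal(0.0, scale, 2 * d)  # per-cell-type output layer
        self.b_out = 0.0

    def attention(self, tf_emb, prom_emb, tf_expr, eps=1e-6):
        n_tf = tf_emb.shape[0]
        pairs = np.concatenate([tf_emb, np.tile(prom_emb, (n_tf, 1))], axis=1)  # (n_tf, 2d)
        # Gated attention score per pair: w^T (tanh(V h) * sigmoid(U h)).
        scores = (np.tanh(pairs @ self.V.T) * sigmoid(pairs @ self.U.T)) @ self.w  # (n_tf,)
        # Adding log-expression makes each weight proportional to expression * exp(score),
        # so TFs a cell type barely expresses receive almost no attention.
        logits = scores[None, :] + np.log(tf_expr + eps)  # (n_cell, n_tf)
        return softmax(logits, axis=1), pairs

    def predict_proba(self, tf_emb, prom_emb, tf_expr):
        alpha, pairs = self.attention(tf_emb, prom_emb, tf_expr)
        pooled = alpha @ pairs  # (n_cell, 2d): one context vector per cell type
        return sigmoid(pooled @ self.w_out + self.b_out)  # P(expressed) per cell type


# Toy usage with random stand-ins for real embeddings and RNA-seq values.
rng = np.random.default_rng(1)
n_tf, n_cell, d = 5, 3, 16
tf_emb = rng.normal(size=(n_tf, d))
prom_emb = rng.normal(size=d)
tf_expr = rng.gamma(1.0, 1.0, size=(n_cell, n_tf))
tf_expr[2] = 0.0
tf_expr[2, 0] = 5.0  # cell type 2 expresses only TF 0
model = GatedAttentionClassifier(d)
probs = model.predict_proba(tf_emb, prom_emb, tf_expr)
alpha, _ = model.attention(tf_emb, prom_emb, tf_expr)
print(probs.shape)  # (3,)
```

Folding expression into the attention logits, rather than concatenating it as a feature, is one simple way to give every cell type its own view of the same TF/promoter pairs; in practice the weights would be trained against per-cell-type expression labels, which this sketch does not cover.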
Built With
- dnabert
- esm
- instadeepai
- python