Lipid nanoparticles (LNPs) are promising vehicles for delivering a variety of drugs into the cytoplasm. They have been used to deliver a wide range of nucleic acids, most notably the mRNA in Pfizer’s and Moderna’s COVID-19 vaccines. Their main component, and the most important factor in delivery efficiency, is the ionizable lipid, whose pKa must fall within a narrow window, usually between 6.1 and 6.7. Each candidate lipid must be synthesized before it can be screened, a process that can take months and consume many expensive reagents.

Our project aims to make screening of ionizable lipids more efficient by building a model that predicts their pKa and delivery efficiency. Researchers would then only need to synthesize the lipids the model flags as hits, greatly reducing the time and resources spent on poor candidates. We compiled a database from patents and publications, then tried several structure encodings and supervised learning approaches, including kNN, random forest, and XGBoost. This produced a reasonably good model, given the limited size and noise of the dataset.

We also want the final product to be accessible to researchers without a coding background, so we built a user-friendly website where users can enter the SMILES string of an ionizable lipid and receive a predicted pKa. At the moment, the website's predictor is not yet fully connected to our model, but we hope to complete this in the future.
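The writeup does not say which structure encodings or hyperparameters were used, so the following is only a minimal, dependency-free sketch of one of the listed approaches, a k-nearest-neighbour (kNN) pKa regressor, under stated assumptions. It encodes each SMILES string as counts of overlapping character n-grams, a crude stand-in for chemistry-aware fingerprints, compares molecules with a count-based Tanimoto similarity, and predicts pKa as the similarity-weighted mean of the k most similar training lipids. The function names (`smiles_ngrams`, `tanimoto`, `knn_predict_pka`, `is_hit`), the n-gram size, k, and the weighting scheme are all hypothetical choices. The `is_hit` filter applies the 6.1 to 6.7 pKa window described above, and the training pairs in the demo are placeholder values, not measurements.

```python
from collections import Counter
from typing import List, Tuple

# Assumed acceptable pKa window for ionizable lipids (from the text above).
PKA_WINDOW = (6.1, 6.7)


def smiles_ngrams(smiles: str, n: int = 3) -> Counter:
    """Hypothetical encoding: counts of overlapping character n-grams.

    '^' and '$' mark the string ends. This is NOT chemistry-aware; a real
    pipeline would more likely use molecular fingerprints.
    """
    padded = f"^{smiles}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def tanimoto(a: Counter, b: Counter) -> float:
    """Count-based (min/max) Tanimoto similarity between two n-gram profiles."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0


def knn_predict_pka(query: str, train: List[Tuple[str, float]], k: int = 3) -> float:
    """Similarity-weighted kNN regression of pKa from (SMILES, pKa) pairs."""
    if not train:
        raise ValueError("training set is empty")
    q = smiles_ngrams(query)
    # Score every training lipid and keep the k most similar ones.
    scored = sorted(
        ((tanimoto(q, smiles_ngrams(s)), pka) for s, pka in train),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        # No overlap at all: fall back to an unweighted mean of the neighbours.
        return sum(pka for _, pka in scored) / len(scored)
    return sum(sim * pka for sim, pka in scored) / total


def is_hit(pka: float, window: Tuple[float, float] = PKA_WINDOW) -> bool:
    """Flag a candidate whose predicted pKa falls inside the target window."""
    lo, hi = window
    return lo <= pka <= hi


if __name__ == "__main__":
    # Placeholder (SMILES, pKa) pairs for illustration only; not real data.
    toy_train = [("CCCN(C)C", 6.4), ("CCCCN(C)C", 6.6), ("OCCN", 9.0)]
    pred = knn_predict_pka("CCCCCN(C)C", toy_train, k=2)
    print(round(pred, 2), is_hit(pred))
```

Only candidates for which `is_hit` returns `True` would go on to synthesis. A random forest or XGBoost model would replace the kNN step with a learned regressor over the same kind of feature vectors, and any real comparison between them would need a held-out test set drawn from the curated database.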