Drug_Discovery-COVID-19
Small Molecule Development using AI
In my opinion, AI's biggest application is healthcare. Computational drug design is one area of healthcare where AI can have one of its biggest impacts.
Any given drug can be represented as a SMILES string. SMILES stands for Simplified Molecular-Input Line-Entry System; it is just a way to write a compound's structure as text. My goal for the project was to find the best drug candidate to act on the novel coronavirus protease.
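Because SMILES is plain text, it can be split into tokens like any other string. Here is a minimal sketch of that idea using only the Python standard library. The regular expression and the function name `tokenize_smiles` are my own illustration, not code from the project notebooks, and the pattern covers only the common SMILES symbols.

```python
import re

# Illustrative pattern (an assumption, not taken from the notebooks):
# bracket atoms, two-letter halogens, organic-subset atoms, aromatic atoms,
# two-digit ring closures, single-digit ring closures, bonds and branches.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[()=#\-+/\\:.~@*$]"
)

def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; raise if any character is left over."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in SMILES: {smiles!r}")
    return tokens

examples = {
    "ethanol": "CCO",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}
for name, smi in examples.items():
    print(name, tokenize_smiles(smi))
```

Each token is an atom, a bond, a branch parenthesis or a ring-closure label, which is what makes it possible to treat molecule design as a text problem.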
The Protein Data Bank (PDB) released the structure of the novel coronavirus protease under the ID 6LU7.
When you're testing how well a drug can act on a virus, one thing you want to look for is the binding affinity of that drug.
In biochemistry terms, a "drug" is called a ligand, and the viral protein it targets (here, the protease) is called a macromolecule. So, in simple terms, binding affinity describes how strongly a ligand binds to the macromolecule. Computationally predicting how a ligand binds to a macromolecule is called docking.
Approach
Since this project was run in Google Colab, you don't need to install anything locally. All you need is a computer, an active internet connection, and lots of patience, dedication and perseverance. Just open a Colab notebook and follow the steps in COVID-19.ipynb. You should start by uninstalling TensorFlow 2.x and installing TensorFlow 1.x, since GPT-2 works very smoothly with TF 1.x. The notebook then uses the pre-trained GPT-2 model to generate SMILES strings as text.
Copy the generated SMILES to a file on your local system. Then go to the Smiles_Testing.ipynb notebook and visualize them. Honestly speaking, more than half of the generated SMILES will have errors in them. That is why I suggest first copying them to a .txt file and then copying only the final, error-free SMILES to a .csv file. I used DataWarrior mainly to convert the .csv file to an .sdf file for testing in PyRx.
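The clean-up step described above (raw text first, then a .csv of the surviving SMILES) can be roughly sketched as follows. This is only a cheap syntactic screen that I am assuming for illustration: it checks balanced brackets and paired ring-closure labels, not chemical validity, so visual inspection in Smiles_Testing.ipynb is still needed. The function names and the `SMILES` column header are hypothetical.

```python
import csv
import io

def looks_well_formed(smiles):
    """Cheap syntactic screen: balanced brackets and paired ring closures.

    This is NOT a chemical validity check; it only catches common
    truncation errors in generated text.
    """
    if not smiles or any(ch.isspace() for ch in smiles):
        return False
    depth = 0            # current nesting depth of branch parentheses
    in_bracket = False   # True while inside a [...] bracket atom
    open_rings = set()   # ring-closure labels seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                return False
        elif ch == "[":
            in_bracket = True
        elif ch == "]":
            return False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or not label.isdigit():
                return False
            open_rings ^= {label}   # toggle: open on first sight, close on second
            i += 2
        elif ch.isdigit():
            open_rings ^= {ch}
        i += 1
    return depth == 0 and not in_bracket and not open_rings

def write_smiles_csv(candidates, handle):
    """Write the candidates that pass the screen to a one-column CSV."""
    writer = csv.writer(handle)
    writer.writerow(["SMILES"])  # hypothetical header name
    kept = [s.strip() for s in candidates if looks_well_formed(s.strip())]
    writer.writerows([s] for s in kept)
    return kept

# Example strings: two well-formed, three deliberately truncated or broken.
generated = ["CCO", "CC(=O)Oc1ccccc1C(=O)O", "CC(C", "c1ccccc", "C[N+(C)C"]
buffer = io.StringIO()  # stands in for an open .csv file
kept = write_smiles_csv(generated, buffer)
print(kept)
```

To write a real file, pass `open("smiles.csv", "w", newline="")` instead of the in-memory buffer; the file name is just an example.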
After that, I used the .sdf file for docking against 6LU7. The docking results were then converted to a .csv file. You can find the results in Final_Results.csv.
To learn how to perform docking in PyRx, watch this tutorial.
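Once the docking results are in a .csv file, a short script can rank the candidates by binding affinity. The sketch below assumes the scores are reported the way AutoDock Vina reports them, in kcal/mol, where a more negative value means stronger predicted binding. The column names and the sample rows are placeholders invented for illustration; they are not the contents of Final_Results.csv.

```python
import csv
import io

# Placeholder data standing in for a results file. The real column names
# and values in Final_Results.csv may differ.
SAMPLE = """Ligand,Binding Affinity
ligand_a,-6.1
ligand_b,-7.4
ligand_c,-5.2
"""

def rank_by_affinity(handle, name_col="Ligand", score_col="Binding Affinity"):
    """Return (name, score) pairs sorted from most to least negative score."""
    rows = csv.DictReader(handle)
    scored = [(row[name_col], float(row[score_col])) for row in rows]
    return sorted(scored, key=lambda pair: pair[1])

for name, score in rank_by_affinity(io.StringIO(SAMPLE)):
    print(f"{name}: {score:.1f} kcal/mol")
```

If the results file lists several poses per ligand, you would keep only the best score for each ligand before ranking.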
What's next?
These drug candidates should be reviewed by an expert in the field to verify their synthetic feasibility. They should then go through clinical trials before being used to treat patients.
Challenges
I failed about 6 times before I finally solved this problem. My failures were due to a lack of clarity about where to get a good dataset, which technique to use to discover new drugs that could potentially cure something like COVID-19, and how to interpret the results I got. After multiple unsuccessful attempts at solving this problem from scratch on my local computer, I realized that I could use a pre-trained model and treat this as a text-generation problem. The results I obtained were fairly conclusive, and the best I could get for the training time and computational power I had available on Google Colab.
Learn More
Find the repository
Built With
- artificial-intelligence
- git
- natural-language-processing
- python