One of the most defining aspects of myself is my love for music and playing piano. I love listening to and playing music, but I really wanted to hear a piece that I could call my own. I dreamt about writing a song that made it onto the radio, or even onto a small podcast channel. Over the summer, I wanted to tap into my creativity and compose music, but since I'm terrible at music composition, the computer geek side of me thought, "what if I let a computer make music?" So I spent an entire week gathering a couple thousand piano pieces by my favorite composers (Chopin, Beethoven, Bach, Mozart, Brahms, Liszt, Schubert, Handel, Haydn, Tchaikovsky, Debussy, and the list goes on) from a small German site I had spent 20 euros on, and I trained a transformer neural network to generate new music. I had just learned about this type of model from the 2017 research paper Attention Is All You Need (arXiv:1706.03762), and I thought I would try to implement it. All it did was look at a sequence of notes and chords and predict which note or chord should come next. Doing this thousands of times, I could generate my own song in the style of all the composers I loved. I used nearly the same technique to generate new drug-like compounds that could tackle coronavirus, but it ended up being a lot more complicated than a music model.
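That core loop — predict the next token and append it, thousands of times — can be sketched as follows. The `predict_next` function here is a hypothetical stand-in for the trained transformer (a real model would return learned probabilities rather than a uniform random pick):

```python
import random

def predict_next(sequence, vocabulary):
    """Stand-in for the trained transformer: given the sequence
    so far, choose the next note/chord. A real model would score
    candidates with attention; here we sample uniformly so the
    loop is runnable."""
    return random.choice(vocabulary)

def generate(seed, vocabulary, length):
    """Autoregressive generation: repeatedly predict the next
    note/chord and append it to the growing sequence."""
    sequence = list(seed)
    for _ in range(length):
        sequence.append(predict_next(sequence, vocabulary))
    return sequence

song = generate(["C4", "E4"], ["C4", "D4", "E4", "G4"], 16)
```

The same loop works unchanged whether the tokens are piano notes or characters of a molecular string; only the trained model behind `predict_next` differs.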
What it does
PharmaceuticAI uses two machine learning models to generate promising inhibitors for a target protein, aiming to drastically speed up and reduce the cost of the drug discovery process.
How I built it
Model #1: an LSTM model that generates new compounds and augments existing ones.
Model #2: a 1D CNN model that predicts the binding affinity of a drug compound to a target protein.
Iterative process: Model #1 generates and augments compounds, Model #2 scores them by predicted affinity, the best 20 are kept, and a new iteration starts from those.
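The iterative process above can be sketched as a generate-score-select loop. The `generate_candidates` and `predict_affinity` functions below are hypothetical stand-ins for the LSTM generator (Model #1) and the 1D CNN affinity model (Model #2); the mutation scheme and scoring are illustrative only:

```python
import random

def generate_candidates(seeds, n_new=100):
    """Stand-in for the LSTM generator (Model #1): produce new
    candidates by mutating single characters of the seed
    compound strings."""
    alphabet = "CNOSP()=#123"
    out = list(seeds)
    for _ in range(n_new):
        base = random.choice(seeds)
        i = random.randrange(len(base))
        out.append(base[:i] + random.choice(alphabet) + base[i + 1:])
    return out

def predict_affinity(compound):
    """Stand-in for the 1D CNN scorer (Model #2): a real model
    would predict binding affinity to the target protein."""
    return random.random()

def search(seeds, iterations=5, keep=20):
    """Generate candidates, score them, keep the top `keep`,
    and repeat — each round seeds the next."""
    pool = seeds
    for _ in range(iterations):
        candidates = generate_candidates(pool)
        candidates.sort(key=predict_affinity, reverse=True)
        pool = candidates[:keep]
    return pool

best = search(["CCO", "CCN", "CCC"])
```

Keeping only the top 20 each round focuses the generator on the most promising region of chemical space while the random mutations keep exploring around it.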
Challenges I ran into
Time constraint: I had to train two completely different models on very large and very different datasets in 24 hours. I also had to build generation methods and an iterative process that could use both models to find the best drug candidates. LSTM models are also very slow to train, so I used NVIDIA's CUDA and switched my model to a CuDNNLSTM layer, which makes much better use of the GPU.
Accomplishments that I'm proud of
The LSTM model generates valid, drug-like compounds most of the time.
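Measuring that validity rate means parsing each generated string. A real check would use a cheminformatics toolkit such as RDKit; as a simplified stand-in, the sketch below only verifies that parentheses and ring-closure digits in a SMILES-like string are balanced:

```python
def looks_valid(smiles):
    """Crude SMILES sanity check (a stand-in for a real parser):
    parentheses must balance and each ring-closure digit must
    appear an even number of times."""
    depth = 0
    ring_counts = {}
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing a branch that was never opened
                return False
        elif ch.isdigit():
            ring_counts[ch] = ring_counts.get(ch, 0) + 1
    return depth == 0 and all(n % 2 == 0 for n in ring_counts.values())

samples = ["CCO", "C1CC1", "C1CC", "C(C"]
validity_rate = sum(looks_valid(s) for s in samples) / len(samples)
```

A real validity check also has to confirm atom valences and aromaticity, which is exactly what makes generating mostly-valid compounds non-trivial.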
What I learned
I learned a lot about the pharmaceutical industry. It's insane to think about how the tiny medicine pills that some of us take on a daily basis actually have decades and billions of dollars of development history behind them.
What's next for PharmaceuticAI
Binding affinity is an important quality for a drug, but there are other objectives that scientists have to balance when creating the best drug. Selectivity (the drug should bind only to the target and not to off-targets) and toxicity (the drug should not be toxic — cyanide binds strongly to plenty of proteins, but it's a poison, not a medicine) should also be taken into account. The affinity model's accuracy could also be improved; I could try using 3D structural interaction data instead of 1D sequences.
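One simple way to balance these objectives would be a weighted composite score that the iterative process optimizes instead of raw affinity. The weights and the normalization below are illustrative assumptions, not part of the current project:

```python
def composite_score(affinity, selectivity, toxicity,
                    w_aff=0.5, w_sel=0.3, w_tox=0.2):
    """Hypothetical multi-objective score: reward predicted
    affinity and selectivity, penalize predicted toxicity.
    All three inputs are assumed normalized to [0, 1]; the
    weights are illustrative and would need tuning."""
    return w_aff * affinity + w_sel * selectivity - w_tox * toxicity

# A potent but toxic compound scores worse than a slightly
# weaker, cleaner one under this weighting.
risky = composite_score(affinity=0.9, selectivity=0.8, toxicity=0.9)
clean = composite_score(affinity=0.8, selectivity=0.8, toxicity=0.1)
```

Selecting the top 20 compounds by this score rather than by affinity alone would let the same iterative loop trade potency against safety.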