DEEP LEARNING FOR QUANTUM CHEMISTRY

Poster

Introduction

Quantum chemistry is a branch of science concerned with understanding the behavior of electrons in atoms and molecules. Knowing the behavior of these particles with great accuracy is the bedrock of modern chemistry, and a necessary ingredient for many applications, including catalysis, drug design, and drug discovery. The physics of these quantum systems was formulated almost a century ago and can be described by a simple differential equation known as the Schrödinger equation. For relatively small systems such as the hydrogen atom, this equation can be solved analytically. However, for larger systems, this problem is analytically intractable due to the exponential scaling of the degrees of freedom of the electrons with system size.

Over the years, scientists have developed very sophisticated computational methods to tackle the electronic structure problem for many particles. Despite relentless efforts, obtaining a solution to the so-called many-body Schrödinger equation remains a computational challenge. Even with modern supercomputers, ab initio electronic structure calculations routinely take days, weeks, or even months to complete. Not only does this prevent research from being carried out in a timely manner, but it is also very costly in terms of energy use and the associated CO2 emissions.

To deal with the setbacks associated with conventional quantum chemistry methods, researchers have been trying to find faster and cheaper ways to make theoretical predictions. Recently, neural networks have shown significant potential to achieve this. Thanks to the data accumulated over the years from computational chemistry calculations, it is now possible to train machine learning models and make predictions on real chemical systems.

Convolutional Neural Networks (CNNs) are a powerful tool for deep learning tasks such as computer vision. In these architectures, a discrete filter is sufficient because training is usually done on grid-based data (or at the very least, data that can be represented on a grid). However, atoms in quantum systems can sit at arbitrary positions rather than on a grid, so discrete filters are not suitable for such problems. Schütt and co-workers recently developed SchNet: a CNN architecture that uses continuous filters [1]. To our knowledge, this is currently the state of the art when it comes to solving quantum chemistry-related problems.

In this work, our goal is to gain insight into how SchNet works, understand how and why continuous filters work, and finally, make a few modifications to the model and see how they affect its performance. SchNet is a novel CNN approach. We have found very little discussion of this model in the literature, although it is widely used in chemistry. Most discussions of SchNet simply cite the original paper and move on to applying the model. We hope, in part, to remedy that through this work.

Our motivation for embarking on this project stems from our experience with quantum chemistry computations. Often, we have to wait a long time for our jobs to finish executing. The techniques learned in this course have inspired us to explore a deep learning approach to accelerate quantum chemistry computations. We believe that this experience will be essential for our own future research, which involves the application of neural networks in quantum chemistry.

Data

In this project, we use the QM9 dataset [2], which is widely used to benchmark the prediction of properties of molecules in equilibrium. This dataset consists of 133,885 organic molecules with up to 9 heavy atoms of types C (carbon), O (oxygen), N (nitrogen), and F (fluorine). All the geometries in this dataset are in equilibrium, meaning that the forces acting on each atom are equal to zero. The dataset contains the atomic geometries and the charges of the atoms in each molecule. We use these to train our model and predict the energy. QM9 is convenient to work with because all the geometries are in equilibrium, so we only need to predict the energies without explicitly computing the forces.
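In symbols (the notation here is ours, not taken from the dataset paper), the equilibrium condition says that the gradient of the total energy E with respect to every atomic position r_i vanishes:

```latex
\mathbf{F}_i \;=\; -\frac{\partial E(\mathbf{r}_1, \dots, \mathbf{r}_N)}{\partial \mathbf{r}_i} \;=\; \mathbf{0},
\qquad i = 1, \dots, N
```

Because the forces vanish for every QM9 geometry by construction, they carry no extra training signal, and the energy is the only target left.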

Model Architecture

SchNet was introduced in 2017 by Schutt et al. [1]. The authors have provided open-source code for their architecture, which will be our starting point as we attempt to modify their implementation. The architecture of the original SchNet model is presented in the authors' original work [1].

Input layer The model takes atomic positions and atomic charges as inputs. These inputs are represented in the neural network using a tuple of features.

Embedding layer In the embedding layer, each atom is initialized with an embedding determined by its atomic charge (i.e., its atom type). The embeddings are initialized randomly and optimized during training. In SchNet, each embedding is a 64-dimensional feature vector.

Interaction blocks The interaction blocks are responsible for learning and updating the atomic representations of the model based on the input data. Each interaction block is made up of 1) atom-wise layers with 64 features, which contain the weights and biases and are where the recombination of feature maps is done; 2) 3 sets of 64 continuous-filter convolution layers (this is where we attempt to make our modifications); and 3) an activation layer with a shifted softplus (ssp) activation. Additionally, each interaction block contains a residual connection similar to ResNet models [3].
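As a rough illustration of the last two ingredients, the sketch below implements the shifted softplus, ssp(x) = ln(0.5 e^x + 0.5), and a residual update in plain Python. The helper names and the toy feature vector are ours for illustration. This is not SchNetPack code, and the "block output" is only a stand-in for what a real interaction block would compute.

```python
import math


def shifted_softplus(x: float) -> float:
    """ssp(x) = ln(0.5 * exp(x) + 0.5) = softplus(x) - ln(2), so ssp(0) = 0."""
    # Numerically stable softplus: max(x, 0) + log1p(exp(-|x|)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x))) - math.log(2.0)


def residual_update(x, v):
    """Residual connection: add the block output v to the block input x."""
    return [xi + vi for xi, vi in zip(x, v)]


# Toy atomic representation (illustrative values only).
features = [0.5, -1.0, 2.0]
# Stand-in for the output of an interaction block.
block_out = [shifted_softplus(f) for f in features]
updated = residual_update(features, block_out)
```

The shift by ln(2) makes the activation pass through the origin, while the residual form lets each block learn a correction to the incoming representation rather than a full replacement.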

Continuous filter convolutional layers This is where the continuous filters are generated. First, rotational invariance is obtained by computing the distances between atoms from the input data. You can think of these interatomic distances as strides for the filter to operate on. The main idea of a continuous filter rests on the interatomic distance d_ij = ||r_i - r_j|| between atoms i and j. The power of the continuous filter lies in the fact that these "strides" are not evenly spaced (i.e., atoms are not organized or confined like pixels on a grid) and can therefore carry more meaningful information than a discrete filter. Skeptical? Indeed, we were too. This is why, in our work, we modify the above representation and compare the results to the original architecture. We note that, in addition to the above equation, the authors of SchNet include radial basis functions (rbf) in order to minimize the correlation between the generated filters. In our work, we try to modify this too and make comparisons with the original SchNet. We leave the rest of the cfconv block, which contains 2 dense layers of width 64 and 2 ssp activations, unchanged.
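To make the idea concrete, here is a minimal plain-Python sketch of a continuous-filter convolution. It assumes Gaussian radial basis functions of the form exp(-gamma (d - mu_k)^2), which is how we read the SchNet paper [1], and it replaces the filter-generating network with a single linear layer whose weights are made up. Summing over neighbours j different from i is also an assumption of this sketch. None of the names below come from SchNetPack.

```python
import math


def rbf_expand(d, centers, gamma):
    """Expand a scalar distance d into Gaussian radial basis features."""
    return [math.exp(-gamma * (d - mu) ** 2) for mu in centers]


def filter_from_distance(d, centers, gamma, weights):
    """Toy filter generator: one linear layer on the RBF features.

    weights[f][k] maps RBF feature k to filter channel f (hypothetical values).
    """
    e = rbf_expand(d, centers, gamma)
    return [sum(w * ek for w, ek in zip(row, e)) for row in weights]


def cfconv(positions, features, centers, gamma, weights):
    """x_i' = sum over j != i of x_j * W(d_ij), element-wise over channels."""
    out = []
    for i, r_i in enumerate(positions):
        acc = [0.0] * len(features[i])
        for j, r_j in enumerate(positions):
            if i == j:
                continue
            d_ij = math.dist(r_i, r_j)  # interatomic distance
            w = filter_from_distance(d_ij, centers, gamma, weights)
            acc = [a + x * wf for a, x, wf in zip(acc, features[j], w)]
        out.append(acc)
    return out


# Toy three-atom geometry and 2-channel atom features (illustrative values).
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
features = [[1.0, 0.5], [0.2, 0.1], [0.2, 0.1]]
centers = [0.0, 0.5, 1.0, 1.5]  # RBF centers mu_k
gamma = 10.0
weights = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]

new_features = cfconv(positions, features, centers, gamma, weights)
```

Since the filter depends on the geometry only through d_ij, it can be evaluated at any distance, not just at fixed grid offsets. This is the sense in which the filter is continuous.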

Sum Pooling The outputs from the interaction blocks are then passed through atom-wise layers, an ssp activation, and finally a pooling layer. Once the model has been trained, we can use it to predict the energy.
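The readout can be sketched as follows, assuming a single atom-wise linear layer with made-up weights in place of the real stack of layers. The same weights are applied to every atom, and the per-atom contributions are summed to give the molecular energy. The function names are hypothetical.

```python
def atomwise_linear(x, weights, bias):
    """Apply one shared linear map to a single atom's feature vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict_energy(atom_features, weights, bias):
    """Sum pooling: add up the per-atom energy contributions."""
    per_atom = [atomwise_linear(x, weights, bias) for x in atom_features]
    return sum(per_atom)


# Illustrative features for a three-atom molecule and hypothetical weights.
atom_features = [[1.0, 2.0], [0.5, 0.5], [0.5, 0.5]]
weights = [0.5, -0.25]
bias = 0.0

energy = predict_energy(atom_features, weights, bias)
```

Because the weights are shared and the pooling is a sum, the prediction does not depend on the order in which the atoms are listed, and the same model can handle molecules of different sizes.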

Methodology

The original architecture of SchNet [1] can be downloaded and installed from the authors' GitHub page: https://github.com/atomistic-machine-learning/schnetpack. Note that the version we use for this project (the latest version) is written in PyTorch. The most important part of SchNet is the convolution in the interaction block: cfconv. We therefore introduce our modifications in this block. In the original SchNet model, the filters in cfconv are designed to be rotationally invariant by using the interatomic distances (i.e., the distances between atoms) as input for the filter network. We attempt two modifications to this block: 1) instead of taking the interatomic distances between the atoms, we simply pass the input unaltered, and everything else stays the same; 2) we replace the rbfs with discrete convolution layers. We explore these modified versions and report the results.
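The effect of the first modification on rotational invariance can be checked with a small experiment. The sketch below assumes that "passing the input unaltered" means feeding raw displacement vectors r_i - r_j to the filter network. That reading, the toy geometry, and the helper names are ours. It rotates a molecule about the z-axis and measures how much the distances and the displacement vectors change.

```python
import math


def rotate_z(p, angle):
    """Rotate a 3D point about the z-axis by the given angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)


def pair_distances(positions):
    """All interatomic distances d_ij (the original filter input)."""
    return [math.dist(ri, rj) for ri in positions for rj in positions]


def displacements(positions):
    """All raw displacement vectors r_i - r_j (assumed modified input)."""
    return [tuple(a - b for a, b in zip(ri, rj))
            for ri in positions for rj in positions]


# Toy three-atom geometry (illustrative values).
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
rotated = [rotate_z(p, math.pi / 3) for p in positions]

# Largest change in any distance after rotation (zero up to rounding).
dist_change = max(abs(a - b) for a, b in
                  zip(pair_distances(positions), pair_distances(rotated)))

# Largest change in any displacement component after rotation (not small).
disp_change = max(abs(a - b)
                  for u, v in zip(displacements(positions), displacements(rotated))
                  for a, b in zip(u, v))
```

A filter fed with distances therefore sees the same input for every orientation of the molecule, whereas a filter fed with raw displacements would have to learn the invariance from data.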

Preprocessing

We start by downloading the dataset using the QM9 class in SchNet. The data is then split into training, validation, and test data using a helper function spk.train_test_split provided in SchNet. The split data is then stored in the split.npz file. Finally, we load the data into the model using the AtomsLoader class, also provided in SchNet helper scripts.
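For readers without SchNetPack installed, the split step can be pictured with the library-independent stand-in below. It is not SchNetPack's implementation, it does not write a split.npz file, and the training and validation sizes are placeholders rather than the values we used.

```python
import random


def split_indices(n_total, n_train, n_val, seed=0):
    """Shuffle dataset indices and cut them into train/validation/test parts."""
    if n_train + n_val > n_total:
        raise ValueError("train and validation sizes exceed the dataset size")
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)  # fixed seed gives a reproducible split
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]      # everything left over
    return train, val, test


# 133,885 is the QM9 size quoted above; 1000 and 500 are placeholder sizes.
train_idx, val_idx, test_idx = split_indices(133885, 1000, 500)
```

Storing the resulting indices, as the split file does, keeps the comparison fair: the original and the modified models can then be trained and evaluated on exactly the same molecules.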

Training

We take the atomic positions and atomic charges from the QM9 dataset as inputs, and the energies as labels. SchNetPack provides some statistics about the data, e.g., the mean and standard deviation of the target. These give us some idea of what the target property looks like, which is especially helpful in avoiding non-physical initialization of some parameters in the model. The Trainer class is used to train the model. Although this class comes with SchNetPack, we still need to define the loss function. In this case, we use the mean squared error of the energy. Here, we also set hyperparameters such as the learning rate, the number of epochs, and so on. The output of the training is stored in a log file with the mean absolute errors per epoch, including the training time.
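The two quantities mentioned above, target statistics and the mean squared error, can be sketched in a few lines of plain Python. The energies are invented placeholders rather than QM9 values, and the function name is ours.

```python
import statistics


def mse_loss(pred, target):
    """Mean squared error between predicted and reference energies."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Placeholder energies (not QM9 data).
energies = [-40.4, -56.5, -76.4, -79.8]

# Dataset statistics of the target property.
mean = statistics.fmean(energies)
std = statistics.pstdev(energies)

# Standardized targets have zero mean and unit variance.
standardized = [(e - mean) / std for e in energies]

# A model that always predicts the mean has an MSE equal to the variance.
baseline_loss = mse_loss([mean] * len(energies), energies)
```

Starting the output scale and offset from these statistics means the untrained model already predicts energies of a physically sensible magnitude, so training only has to learn the deviations from the mean.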

Metrics

As a metric for success, we compare the mean absolute errors between the original SchNet model and our modified SchNet.
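For reference, the metric can be computed as in the sketch below. The numbers are invented purely to exercise the function and are not results from our models. The variable names are placeholders.

```python
def mean_absolute_error(pred, target):
    """Average absolute deviation between predictions and reference values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


# Invented reference energies and predictions from two hypothetical models.
reference = [-40.0, -56.0, -76.0]
model_a = [-40.1, -55.9, -76.2]
model_b = [-41.0, -55.0, -77.5]

mae_a = mean_absolute_error(model_a, reference)
mae_b = mean_absolute_error(model_b, reference)

# The model with the lower MAE is the more accurate one on this test set.
better = "model_a" if mae_a < mae_b else "model_b"
```

Unlike the squared error used for training, the MAE is expressed in the same units as the energy, which makes it easier to compare against the accuracy that chemists care about.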

Results

See final writeup for results and other details.

Ethics

As we implement these deep learning techniques, it is important to be mindful of the potential social implications of these algorithms. Most of the predictions made in quantum chemistry applications are used by experimentalists to inform their lab techniques, e.g., in drug design and pharmaceuticals. The long-term consequences of entrusting a computer to guide how drugs are designed are still uncharted territory, as this science has not yet been fully developed. We recommend that, even as such applications are developed, they should not be a substitute for the guidance and intuition of a human chemist. The goal of these algorithms is to guide humanity, not replace it.

Challenges

The first challenge we ran into was that the original SchNet code uses PyTorch instead of TensorFlow. Neither of us had used PyTorch before, and knowing that it was infeasible to recode an architecture as large as SchNet in TensorFlow, we had to adapt quickly and learn the PyTorch library in a very short period of time. Secondly, we had a lot of difficulty understanding how to actually use SchNet once we had it set up on our local machines. The greatest challenge was how to use database files. The SchNet GitHub repository did not thoroughly explain how to pass the database file on the command line when running the model. This led to several installations and uninstallations of the architecture, which ended up being more time-consuming than initially anticipated.

Reflections

Overall, we can say that our project is a success because, for one, we have achieved the base goal, which was to compare the performance of two kinds of filters. In the project, we have seen that the modified models give the results we expected. By comparing the results, we have gained a better understanding of how SchNet works and why it is better than other models, including our modified versions of SchNet. Even while keeping the architecture mostly unchanged and altering only the continuous filter part, we saw a significant drop in accuracy in the modified models. We can confidently say that continuous filters are suitable for quantum chemistry problems.

On the other hand, we did not achieve our goal of introducing our modified models in a Graph Neural Network platform that compares the performance of various types of convolutional models. This platform, known as MatDeepLearn (https://www.nature.com/articles/s41524-021-00554-0, https://github.com/vxfung/MatDeepLearn) compares the performance of other state-of-the-art networks in quantum chemistry. We found this idea much harder to implement due to the complexity of MatDeepLearn.

We would like to explore ideas such as these further and even develop original models in quantum chemistry from scratch. For instance, if time permitted, it would have been interesting to see a totally different deep learning approach, such as attention and transformers, applied to problems in quantum chemistry. If there's anything we have learned from this project, it is this: deep learning is a versatile computational technique and its full potential has not yet been explored in other fields. Who knew that CNNs, an approach reserved mainly for image processing, can learn chemical space and make accurate predictions? This is only the beginning of what can be done with deep learning! However, due to time constraints, this is where we rest our case for now.

Division of labor

XJ and TK contributed equally to the success of this project. XJ did most of the code modifications to SchNet while TK did most of the writing and poster preparation.

References

[1] Kristof T Schütt, Pieter-Jan Kindermans, Huziel E Sauceda, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Müller. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. arXiv preprint arXiv:1706.08566, 2017.

[2] Raghunathan Ramakrishnan, Pavlo O Dral, Matthias Rupp, and O Anatole Von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific data, 1(1):1–7, 2014.

[3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.