Predicting hydration layers on surfaces using deep learning

Characterisation of the nanoscale interface formed between minerals and water is essential to understanding natural processes, such as biomineralisation, and to developing new technologies whose function is dominated by the mineral-water interface. Atomic force microscopy offers the potential to characterise solid-liquid interfaces at high resolution, with several experimental and theoretical studies achieving molecular-scale resolution by linking measurements directly to the water density at the surface. However, the theoretical techniques used to interpret such results are computationally intensive, and development of the approach has been limited by interpretation challenges. In this work, we develop a deep learning architecture that learns the solid-liquid interface of calcium carbonate polymorphs, allowing rapid prediction of density profiles with reasonable accuracy.

Fig. S1 Schematic of the process of taking training data from the calcite database. The calcite slab, with its unit cell marked, is shown from the top, together with the box of cropped density; consecutive box locations are chosen so that the surface region is shared between them. Calcium, oxygen, and carbon atoms are represented by blue, red, and brown spheres, respectively. (The atomic structure is imaged using VESTA⁸.)

The densities were collected on a grid with a voxel size of 0.2 Å. We smeared the densities with a Gaussian to extend the effect of the point-sized atoms (the C_surface, O_surface, and Ca_surface densities) and to reduce the statistical-sampling noise in the water density (the O_water density) from the trajectory. Next, we split this dataset into volumes of 10 × 10 × 20 Å³, such that each block contained the top two layers of the surface and the interfacial O_water density, as seen in Fig. S1. We chose this volume size because it minimises the GPU memory required during training while still covering the effect of defects on the hydration-layer density. This results in a pre-processed dataset of 26,784 volumes, consisting of training, validation, and test sets of 19,280, 4,824, and 2,680 volumes, respectively.
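The voxelisation and Gaussian smearing step can be sketched as follows. This is a minimal illustration, not the paper's code: the 0.2 Å voxel size comes from the text, while the smearing width `sigma` and the helper name `smear_density` are assumptions, since the paper does not state them.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smear_density(positions, box, voxel=0.2, sigma=0.5):
    """Histogram point atoms onto a voxel grid and smear with a Gaussian.

    `positions` is an (N, 3) array of coordinates in Angstrom and `box`
    the box lengths in Angstrom. The 0.2 A voxel size is from the text;
    the smearing width `sigma` (in Angstrom) is an assumed value.
    """
    nbins = [int(round(length / voxel)) for length in box]
    grid, _ = np.histogramdd(positions, bins=nbins,
                             range=[(0.0, length) for length in box])
    # gaussian_filter expects sigma in voxels, so convert from Angstrom.
    return gaussian_filter(grid, sigma=sigma / voxel)

# Example: two atoms in one 10 x 10 x 20 A^3 training block.
rho = smear_density(np.array([[5.0, 5.0, 2.0], [2.0, 8.0, 3.0]]),
                    box=(10.0, 10.0, 20.0))
print(rho.shape)  # (50, 50, 100)
```

The same routine would be applied per species (C, O, Ca, and the trajectory-averaged O_water) to build the input and target channels of each block.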
Similar to the calcite case, we generated a dataset of 1,024 cases comprising defects on the surface of aragonite. Combining it with the pre-processed calcite data, we obtained a final dataset of 54,336 structures, consisting of training, validation, and test sets of 39,960, 9,240, and 5,136 structures, respectively. We trained the ML model on this combined dataset to assess overfitting and to determine its generality.
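A plain random partition consistent with the split sizes above can be sketched as follows; the actual splitting procedure and random seed used in the paper are not stated, so both are assumptions here.

```python
import random

def split_dataset(n_total, n_train, n_val, n_test, seed=42):
    """Shuffle dataset indices and partition them into train/val/test.

    A sketch of the random split implied by the text; the paper's
    actual procedure and seed are not given.
    """
    assert n_train + n_val + n_test == n_total
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

# Calcite pre-processed volumes: 26,784 = 19,280 + 4,824 + 2,680.
train, val, test = split_dataset(26784, 19280, 4824, 2680)
print(len(train), len(val), len(test))  # 19280 4824 2680
```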

Machine learning
The U-Net comprises 3D convolutional neural networks chained in an encoder-decoder arrangement⁹. Our network has three pooling scales down to the encoded latent space, plus skip connections, which connect the layers in the encoder to the corresponding layers in the decoder, as seen in Fig. S2. All the 3D convolution layers, denoted as φ(x_l, w), with x_l as the input at layer l and w as the weights, are followed by a leaky ReLU activation layer¹⁰, α(φ(x_l, w)), to induce non-linearity in the model (cf. Equation 1):

    x_{l+1} = α(φ(x_l, w))    (1)

The skip connections, red lines in Fig. S2, concatenate the output of the layer prior to pooling, during encoding, with the output of the layer up-scaled to the same scale, during decoding:

    x_n = α(φ(x_cat_{n-1}, w))    (2)

where x_cat_{n-1} is the concatenation of the output after up-scaling and the output prior to pooling. This gives the input greater significance over the deeper layers in the decoder, which mitigates the loss of the semantics from the input. Equation 3 shows the gradient of the loss function L with respect to the weights w_l of the layer prior to pooling, obtained using back-propagation¹¹:

    ∂L/∂w_l = (∂L/∂x_skip + ∂L/∂x_pool) α'_l ∂φ(x_l, w_l)/∂w_l    (3)

where α'_l is the derivative of the activation layer with input x_l, and ∂L/∂x_skip and ∂L/∂x_pool are the gradients arriving through the skip connection and through the pooling path, respectively. Back-propagation uses the chain rule of differentiation to calculate this gradient in terms of the gradient at the next layer, which is in turn calculated from the gradients at the subsequent layers, up to the loss value; these gradients are used to update the weights during training. Because the gradient at the layer prior to pooling is a summation of the gradient at the layer after the skip connection and the gradient at the layer after the pooling, the shallower encoding layers receive a significant gradient from the deeper layers during training. Thus, less training is required to learn the semantics, owing to the skip connections.
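The encoder-decoder structure with concatenating skip connections can be sketched in PyTorch as follows. This is an illustrative sketch, not the paper's network: channel widths, the LeakyReLU slope, and the grid size are assumptions, and it uses two pooling scales instead of the paper's three, to stay compact.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    # A 3D convolution followed by a leaky ReLU, as described in the text.
    return nn.Sequential(nn.Conv3d(c_in, c_out, 3, padding=1),
                         nn.LeakyReLU(0.1))

class TinyUNet3D(nn.Module):
    """Minimal 3D U-Net sketch with concatenating skip connections."""
    def __init__(self, c_in=3, c_out=1, w=8):
        super().__init__()
        self.enc1 = block(c_in, w)
        self.enc2 = block(w, 2 * w)
        self.pool = nn.MaxPool3d(2)
        self.bottom = block(2 * w, 4 * w)
        self.up2 = nn.ConvTranspose3d(4 * w, 2 * w, 2, stride=2)
        self.dec2 = block(4 * w, 2 * w)   # input: up-scaled output + skip
        self.up1 = nn.ConvTranspose3d(2 * w, w, 2, stride=2)
        self.dec1 = block(2 * w, w)
        self.head = nn.Conv3d(w, c_out, 1)

    def forward(self, x):
        s1 = self.enc1(x)                 # skip at full resolution
        s2 = self.enc2(self.pool(s1))     # skip at half resolution
        b = self.bottom(self.pool(s2))
        # Skip connections: concatenate encoder output with up-scaled output.
        d2 = self.dec2(torch.cat([self.up2(b), s2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), s1], dim=1))
        return self.head(d1)

# Inputs could be the smeared C/O/Ca surface densities; the output is the
# predicted O_water density on the same grid (small grid used for the demo).
net = TinyUNet3D()
y = net(torch.zeros(1, 3, 16, 16, 32))
print(y.shape)  # torch.Size([1, 1, 16, 16, 32])
```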
We applied a soft self-attention mechanism¹² in the attention variant of the U-Net. The output of the lowest encoded latent-space scale (up-scaled to the scale of the skip value) is used as the key, which allows for non-local semantic learning in the attention mechanism. The skip value is also used as the query; hence, the mechanism is called self-attention. The linear transformations applied to the query and the key are voided of spatial information by using 1 × 1 × 1 kernel convolutions, thus keeping the network small¹². We derived the attention value using a sigmoid activation, and this attention is multiplied element-wise with the skip connection, given as:

    x̂_skip = σ(f(q, k)) ⊙ x_skip    (4)

where q is the query (the skip value), k is the key, f is the learned 1 × 1 × 1 convolutional transformation, σ is the sigmoid, and ⊙ denotes element-wise multiplication.

During training, the weights are updated using the ADADELTA scheme¹³. The mean absolute error (MAE), i.e. the L1 loss, is used as the loss function to calculate the gradients for the backward propagation. It is calculated between the predicted and the simulated water densities, Θ(x) and y respectively, given as:

    L = (1/N) Σ_i |Θ(x)_i − y_i|

where N is the number of voxels. We trained these models on an NVIDIA® Tesla® V100 GPU, with a 6-core CPU for loading data from storage.

Fig. S3 The errors during the training of the U-Net and the attention U-Net over the calcite database. A step scheduler is used to control the learning rate; hence, the error drops sharply at certain epochs.

Fig. S4 Prediction of hydration layers, over a surface with a Ca²⁺ vacancy, using the U-Net. (a) Comparison of 2D slices of the simulated and predicted water density at z heights corresponding to the peaks in the simulated data; the density (ρ) is scaled by the bulk water density (ρ_o) in the 2D slices. (b) The mean water density in the 2D xz plane. (c) The 1D water density along the z direction. In the 2D data, the atoms C, O, and Ca are represented as circles of brown, blue, and red colour, respectively.
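The attention gating on a skip connection, together with an ADADELTA/L1 training step, can be sketched in PyTorch as follows. This is an illustrative sketch under assumptions: the channel widths, the intermediate ReLU inside the gate, and the dummy data are not from the paper, which specifies only the 1 × 1 × 1 convolutions, the sigmoid attention value, the element-wise multiplication, the ADADELTA optimiser, and the L1 loss.

```python
import torch
import torch.nn as nn

class AttentionGate3D(nn.Module):
    """Soft self-attention on a skip connection, sketched after the text.

    1x1x1 convolutions transform the query (the skip value) and the key
    (the up-scaled lowest-scale output); a sigmoid yields the attention
    map, which multiplies the skip connection element-wise.
    """
    def __init__(self, c_skip, c_key, c_mid=8):
        super().__init__()
        self.wq = nn.Conv3d(c_skip, c_mid, 1)   # 1x1x1: no spatial mixing
        self.wk = nn.Conv3d(c_key, c_mid, 1)
        self.psi = nn.Conv3d(c_mid, 1, 1)

    def forward(self, skip, key):
        # Intermediate ReLU is an assumption; the text specifies the sigmoid.
        a = torch.sigmoid(self.psi(torch.relu(self.wq(skip) + self.wk(key))))
        return skip * a                          # element-wise gating

gate = AttentionGate3D(c_skip=8, c_key=16)
skip = torch.randn(1, 8, 8, 8, 16)     # skip value (query)
key = torch.randn(1, 16, 8, 8, 16)     # lowest-scale output, up-scaled (key)
gated = gate(skip, key)

# One ADADELTA step with the L1 (MAE) loss on a dummy target,
# mirroring the optimiser and loss named in the text.
opt = torch.optim.Adadelta(gate.parameters())
loss = nn.L1Loss()(gated, torch.zeros_like(gated))
loss.backward()
opt.step()
print(gated.shape)  # torch.Size([1, 8, 8, 8, 16])
```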