Amer Marwan El-Samman,*a Incé Amina Husain,a Mai Huynh,a Stefano De Castro,a Brooke Morton a and Stijn De Baerdemacker ab
aUniversity of New Brunswick, Department of Chemistry, 30 Dineen Dr, Fredericton, Canada. E-mail: aelsamma@unb.ca
bUniversity of New Brunswick, Department of Mathematics and Statistics, 30 Dineen Dr, Fredericton, Canada. E-mail: stijn.debaerdemacker@unb.ca
First published on 15th February 2024
Graph neural nets, such as SchNet [Schütt et al., J. Chem. Phys., 2018, 148, 241722] and AIMNet [Zubatyuk et al., Sci. Adv., 2019, 5, 8], provide accurate predictions of chemical quantities without invoking any direct physical or chemical principles. These methods learn a hidden statistical representation of molecular systems in an end-to-end fashion, from xyz coordinates to molecular properties with many hidden layers in between. This naturally leads to the interpretability question: what underlying chemical model determines the algorithm's accurate decision-making? By analyzing the hidden-layer activations of QM9-trained graph neural networks, also known as “embedding vectors”, with dimension reduction, linear discriminant analysis, and Euclidean-distance measures, we shed light on an interpretation. The result is a quantifiable geometry of these models' decision-making that identifies chemical moieties and occupies a low-dimensional space of ∼5 important parameters out of the fully-trained 128-dimensional embedding. The geometry of the embedding space organizes these moieties with sharp linear boundaries that can classify each chemical environment with <5 × 10−4 error. The Euclidean distance between embedding vectors provides a versatile molecular similarity measure, comparable to other popular hand-crafted representations such as the Smooth Overlap of Atomic Positions (SOAP). We also show that the embedding vectors can be used to extract observables related to chemical environments, such as pKa and NMR shifts. While not presenting a fully comprehensive theory of interpretability, this work is in line with the recent push for explainable AI (XAI) and gives insight into the depth of modern statistical representations of chemistry, such as graph neural nets, in this rapidly evolving technology.
We are specifically interested in providing interpretability to graph neural nets (GNN) that bypass the computation of chemistry's electronic properties. Such neural nets hold promise in statistically learning the solutions (or approximate solutions) of the costly Schrödinger equation, a feat that would tremendously speed up the exploration of chemical space.43–45
In this work, we focus on this new and rapidly progressing area of graph modelling of chemical data, using GNNs such as SchNet. By analyzing the graph neural network's activations in response to molecular input, we find that the activations (called “embeddings”) fit sharply within what is commonly understood as chemical environments or moieties. Furthermore, we show that one can associate a Euclidean-distance measure to the hidden atomistic neural net activations, allowing for a straightforward molecular similarity measure in terms of Euclidean distances. This Euclidean-distance-preserving space of embeddings can be analyzed with Linear Discriminant Analysis (LDA) to show clear-cut boundaries between different chemical moieties.
Early indications of this concept have been observed by Zubatyuk et al.47 and Smith et al.48 while validating the feature vectors of their AIMNet neural net and ANI-1x model, respectively, with t-distributed stochastic neighbor embedding (t-SNE). In both architectures, the learned feature vectors of their neural net naturally clustered into distinct regions representing distinct chemical environments found in the QM9 and ANI datasets. However, due to the distortions of the non-linear t-SNE projection used in their studies,49 it is not possible to further analyze and quantify this space as a representation of chemistry on its own with its own useful characteristics. More recently, Lederer et al. showed that a type-assignment matrix50 and adjacency matrix can be used to arrive at an unsupervised learning objective to assign atoms to their chemical moieties.51
In our work, we take a different approach. In the interest of revealing the hidden contents of a GNN model itself, we do not design a machine learning model or do any additional training. Instead, we analyze a pretrained GNN model of chemistry with Euclidean-distance-preserving techniques to show a representation of chemistry that already contains structural integrity in terms of chemical moieties, without the need for more machine learning. A simple Linear Discriminant Analysis (LDA) model shows that a GNN model already contains a representation that draws boundaries between chemical moieties with high resolution (classification error of 3 × 10−4). In addition, the internal structure of the model can be analyzed with Euclidean-distance measurements, which act as a similarity measure between these moieties. We also provide preliminary indications of how this representation can be used for transfer learning towards other local chemical properties such as pKa and NMR.
The approach we take in this work is a global one.52–55 We seek to understand the elementary decision-making and the variables that underlie the system's predictions as a whole. Whereas local explanations, such as feature-attribution methods, saliency maps, deep visualization and others,56–65 can render case-by-case explanations, they are not aimed at providing an appreciable understanding that encapsulates the whole black-box model in one interpretable model. In many instances, decision trees are seen as global interpretable models since they can be thought of as performing a set of elementary decisions.66 The purpose of this work is to seek an interpretation on that level for GNNs: what is the system of decision-making that the model undergoes for a prediction, and what are the variables at play that affect decisions (and predictions)? In that respect, we follow the subtle difference in definitions for interpretability and explainability from Roscher et al.,67 in which the former refers to a mapping of an internally learned abstract feature to a human-defined concept, such as moieties or functional groups in chemistry, whereas the latter employs features from the interpretable domain to explain the decision-making of the model for specific examples.
Behler and Parrinello would solve these problems, proposing a neural network architectural design that is more compatible with molecular systems.71–73 Based on an atom-centered approach to predictions, Behler–Parrinello neural nets partitioned each molecule into atoms. These atomistic neural nets contributed to a total potential energy by pooling all the predicted atomwise contributions. To include rotationally-invariant interatomic interactions between each atomwise partition, Behler and Parrinello used manually-crafted symmetry functions of the interatomic distances as input. In other words, the neural net was not end-to-end (it did not make predictions directly from xyz coordinates) but assumed an initial representation of interacting molecular systems via the symmetry functions.
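As an illustration of such hand-crafted inputs, the sketch below evaluates a Behler–Parrinello-type radial symmetry function for a single central atom. The parameter values (η, R_s, and the cutoff radius) and the example distances are hypothetical; they are not taken from any model discussed in this work.

```python
import numpy as np

def cutoff(r, r_c):
    """Cosine cutoff function: smoothly decays to zero at the cutoff radius r_c."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def radial_symmetry_function(r_ij, eta=1.0, r_s=0.0, r_c=6.0):
    """G2_i = sum_j exp(-eta * (r_ij - r_s)^2) * f_c(r_ij), summed over neighbours j."""
    return np.sum(np.exp(-eta * (r_ij - r_s) ** 2) * cutoff(r_ij, r_c))

# Example: distances (in Angstrom) from a central atom to its neighbours.
distances = np.array([1.1, 1.5, 2.4, 5.8])
print(radial_symmetry_function(distances))
```

Because the function depends only on interatomic distances, the resulting input is invariant to rotations and translations of the molecule, which is the point of the hand-crafted representation.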
SchNet's architecture,19–22 like many other MPNNs,74–76 is similar to Behler–Parrinello neural nets (see Fig. 1), particularly in that it is partitioned atomwise. However, MPNNs can have a completely end-to-end architecture and are thus not restricted to hand-crafted assumptions about the interatomic interactions. Instead, MPNNs derive their own representation of interacting atoms from the molecular graph. The messages between atomistic “nodes” can themselves be parameterized as functions of the interatomic distances and thus “fitted” to make an accurate prediction. After these interactions, a final representation for each atom in the molecule is stored in the so-called “embedding vector”. The embedding vector is the neural-network representation of an atom-in-a-molecule. This internal representation is finally used to make a prediction of that atom's contribution to the total property by running it through a standard feedforward neural net. The atomwise properties are then summed to give a total molecular property. Throughout the process, the internal representation for each atom, and for the entire molecule, remains a hidden feature of the algorithm, leaving the precise nature of the chemical model unknown. We seek to shed light on the type of chemical model that GNNs build using their embedding vectors, which allows them to achieve their accurate predictions.
Each atom i with nuclear charge $Z_i$ is assigned an embedding vector $x_i^l \in \mathbb{R}^D$ that is dependent on the layer l (see Fig. 1 for a visual summary of the network's architecture). The dimension D of the embedding vector determines the information storage capacity of the embedding vectors, and should therefore be chosen sufficiently large by the user. During a feedforward pass through the network, the embedding vectors are updated after each individual layer l:

$$x_i^{l+1} = x_i^l + v_i^l, \qquad (1)$$
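A minimal PyTorch-style sketch of this residual update is given below. The interaction block is a generic placeholder module (it ignores geometry), not SchNet's actual continuous-filter convolution, and the layer sizes and atom list are illustrative only.

```python
import torch
import torch.nn as nn

class ToyInteraction(nn.Module):
    """Placeholder interaction block: produces an update v_i^l for every atom."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.mlp(x)

dim, n_layers = 128, 6
embedding = nn.Embedding(100, dim)                       # one starting vector per element (charge Z)
layers = nn.ModuleList([ToyInteraction(dim) for _ in range(n_layers)])

Z = torch.tensor([6, 6, 8, 1, 1, 1, 1, 1, 1])            # e.g. ethanol, listed by nuclear charge
x = embedding(Z)                                         # x_i^0, shape (n_atoms, dim)
for layer in layers:
    x = x + layer(x)                                     # eqn (1): x_i^{l+1} = x_i^l + v_i^l
# x now plays the role of the final embedding vectors analyzed in this work.
```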
The QM9 dataset consists of ∼134 000 small-sized organic molecules (∼5 Å to 10 Å in size) with optimized geometries at the B3LYP/6-31G(d,p) level of theory. The network was trained on the total electronic energy at 0 K, although QM9 includes other associated properties such as dipole moment, enthalpy, etc. The algorithm used for training had six interaction layers and an embedding vector dimension D = 128. Other relevant parameters in the network are 128 convolutional filters, 50 Gaussians, and an interaction cutoff distance of 50 Å. The first 100 000 molecules of QM9 were used as training data points, the next 10 000 as validation data points, and the rest was left for testing. The Gaussians were used for an initial expansion of the interatomic distances to provide a flexible starting representation for the model. The cutoff distance was purposely chosen to be very large (relative to QM9 molecules) so that all atoms were included in the interaction, giving the model freedom to form a global representation rather than forcing it into a local one.78 Although such a large cutoff is generally expensive and can lead to an overfitted model, a balance was achieved between high accuracy and generalizability on QM9, as shown by the loss plot in Fig. 8 in the Appendix section, which displays a final validation MAE of 0.020 eV. The generalizability of the SchNet model was verified explicitly by running an additional experiment in which training was performed on a QM9 data subset from which all alcohols were systematically removed, after which the errors on test sets with and without alcohols were comparable. Various embedding sizes, such as 30 and 60 along with 128, gave similar results for the analysis. We present the analysis of the 128-dimensional embedding model.
Using the 10 000-molecule test set extracted from the QM9 database, the model was then evaluated and the updated embedding vectors were extracted for all layers. No additional SchNet training was performed on these molecules. The fully-updated embeddings $x_i^{\text{final}}$ (as highlighted in orange in Fig. 1) were then analyzed using dimension-reduction techniques, linear discriminant analysis, and Euclidean distance measures. More specifically, the extracted embedding vectors were parsed by element type (e.g. all oxygen embedding vectors are isolated), and the analysis was performed on the set of all embedding vectors of a certain atom type across all 10 000 molecules (one molecule can contribute multiple embedding vectors). The analysis in the Results section is focused on the oxygen-type embedding vectors, but the methodology can be applied to any chosen element (see Appendix Table 5 for the LDA analysis on all element types). The analysis of $v_i^l$ and $x_i^l$ for the intermediate layers (l ≠ final) can also be found in the Appendix section, Table 4, and produces very similar results to $x_i^{\text{final}}$. The trained model and generated embedding vectors are freely available via a Dataverse Repository.1 The extracted embedding vectors in the dataset were also labelled with integers representing the various chemical environments found in QM9. For this, we manually surveyed the functional groups of QM9 and automated their labelling using the adjacency matrix extracted from the .mol files. It is important to note that for all datasets used in this study, the geometries are either optimized at the ground-state DFT level or experimentally determined (see Section 3.3), and therefore exclude transition-state, bond-breaking, or explicitly charged molecules (with the exception of a few ammoniums in the QM9 database).
To demonstrate the generality of this method to other GNN architectures, we also extracted embedding vectors from a pretrained AIMNet ensemble model47 on the same 10 000 QM9 test molecules to compare the analysis with that of SchNet. This AIMNet ensemble model was trained on ANI-1x data (which includes molecular energies, atomic forces, and more) computed at the ωB97x/def2-TZVPP level of theory. AIMNet's embeddings (and GNN architecture) are built considerably differently from SchNet's. First, AIMNet uses symmetry functions as input, which have both angular and radial parts.23,72 The symmetry functions are used as the features that describe the local environment around each atom. The radial and angular features are embedded via an outer product on an atomic feature vector (AFV) space. In short, a trainable layer combines the flattened radial and angular tensors and learns a constant-sized embedding from them. We extracted this embedding representation for the QM9 dataset using the pretrained AIMNet ensemble model.
[Eqn (2)]
[Eqn (3)]
The LDA model was fitted on the first 7500 molecules of the QM9 test set (molecules indexed 110 000–117 500, not involved in SchNet training); the LDA model was then evaluated on the remaining 2500-molecule QM9 test subset (molecules indexed 117 500–120 000 in QM9).
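As a sketch of this procedure (the file names, label encoding, and split fraction are illustrative), the classification step can be reproduced with scikit-learn:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# X: (n_oxygen_atoms, 128) final-layer embedding vectors; y: integer moiety labels.
X = np.load("oxygen_embeddings.npy")        # hypothetical file
y = np.load("oxygen_moiety_labels.npy")     # hypothetical file

n_fit = int(0.75 * len(X))                  # fit on the first portion of the test set
lda = LinearDiscriminantAnalysis()
lda.fit(X[:n_fit], y[:n_fit])

error = 1.0 - lda.score(X[n_fit:], y[n_fit:])   # held-out classification error
print(f"LDA classification error: {error:.1e}")
```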
Fig. 2 (a) t-SNE of oxygen-type embedding vectors of QM9-trained SchNet with labelling as defined in (b).
Despite the caveats related to long-range distortions, one can already observe an intuitive organization between moieties in the t-SNE. For instance, all carbonyls (carboxylic acids, ketones, aldehydes, amides, and more) are grouped together in the bottom right; all primary, secondary, and tertiary alcohols in the center; and carbonates, carbamides, and carbamates on the top of the figure. It is evident that the pretrained SchNet model already contains enough information to distinguish chemical moieties.
To obtain a Euclidean-distance-faithful representation, we turn to a linear PCA projection, which introduces minimal distortion. Fig. 3 shows the 2D PCA projection of the oxygen-type embedding vectors of the QM9 test set, labelled with the same chemical environments key shown in Fig. 2b.
Fig. 3 (a) PCA of oxygen-type embedding vectors of QM9-trained SchNet, labelled according to Fig. 2b. The focus is on straight-chain alcohols. It can be seen even in the 2D projection that the distances between the alcohols' embeddings converge as the chain gets longer. The Euclidean distance does indeed decrease for each successively larger straight-chain alcohol in the full PCA space: the consecutive distances between methanol, ethanol, propanol, butanol, and pentanol are 3.98, 2.12, 1.45, and 0.98, respectively. (b) PCA of oxygen-type embedding vectors of the pretrained AIMNet neural net evaluated on the same QM9 dataset, labelled according to Fig. 2b.
The projection reveals that the organization of the various chemical environments in the global embedding space is consistent with a notion of molecular similarity. For example, the projection consistently shows that carbonyls are “closer” to alcohols in a Euclidean sense than they are to hydroxylamines. This is a chemically intuitive result that cannot be faithfully reproduced under the distortions of the t-SNE projection. Moreover, Fig. 3a illustrates how embedding representations become increasingly closer to one another as the molecular environment of the associated atom becomes more similar. To illustrate this, we analyzed Euclidean distances between the oxygen embedding vectors of several primary straight-chain alcohols. As detailed in the figure, we indeed find a converging distance between the embedding vectors of the oxygens as the chain gets longer. Thus, all straight-chain or straight-chain-like alcohol groups are grouped in the same region of embedding space within the primary alcohol cluster. Although Fig. 3a only provides a 2D projection of the principal components, the figure provides an approximate representation of how the full Euclidean distances, computed in the full (D = 128) embedding space, relate the individual embedding vectors to one another.
This suggests that only a few dimensions of the full 128-dimensional embedding space are truly relevant to capture the chemical identity of the associated atom. Indeed, the PCA eigenvalue spectrum, presented in Fig. 4, reveals that only a few significant eigenvalues (∼5–6) are required to account for 75% of the variance in the data. This is a remarkably low number compared with the dimension of the original embedding space, hinting at the possibility of determining low-dimensional chemical heuristics or rules for explaining the attributes of each of these individual dimensions.
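Both the 2D projection and the eigenvalue-spectrum check can be sketched as follows, assuming the oxygen embeddings are stored in an array `X` of shape (n_atoms, 128); the file name is a placeholder.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.load("oxygen_embeddings.npy")          # hypothetical file of final-layer embeddings

pca = PCA(n_components=X.shape[1])
coords = pca.fit_transform(X)                 # coords[:, :2] gives a 2D projection as in Fig. 3

# Number of principal components needed to capture 75% of the variance (~5-6 in this work).
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(np.searchsorted(cumulative, 0.75) + 1)
```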
For comparison, we also extracted AIMNet's QM9 embedding vectors and analyzed them with a PCA projection. The result is shown in Fig. 3b. AIMNet's embeddings are more compact than SchNet's, as AIMNet was trained on a more diverse dataset (ANI-1x) obtained through active learning.96 This may explain why AIMNet's representation of QM9 is more compact: it must leave space for a wider representation. However, the relative positions of the various classes are strikingly similar, even though AIMNet is a different GNN architecture and involves a significantly different embedding process, briefly described in Section 2.4. This points to the notion of a weak universality in GNNs, as recently introduced by Chughtai et al.,97 in which universal underlying principles are shared by different GNNs, albeit in slightly different ways.
The existence of a Euclidean distance measure opens the opportunity to quantify the “closeness” of atoms from different moieties. In Fig. 5a, we chose a random molecule from the QM9 dataset (prop-2-yn-1-ol) and evaluated embedding vector distances to all other molecules in the dataset. Clearly, the closest molecules to prop-2-yn-1-ol are molecules of the same class with similar structural motifs: primary alcohols with an α-alkyne group. Fig. 5b shows a close-up of distances to all other primary alcohols. As distance increases, the similarity diminishes, but in a very gradual way: first linear-like moieties with α-alkynes appear, and then structures in which the alkyne gradually moves to more distant parts of the molecule as embedding distance increases. This trend continues until one reaches molecules that do not contain any alkyne and no longer resemble the reference primary alcohol.
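This kind of similarity search amounts to sorting Euclidean distances in the full 128-dimensional embedding space. A minimal sketch follows; the array file and reference index are illustrative placeholders.

```python
import numpy as np

X = np.load("oxygen_embeddings.npy")            # hypothetical file, shape (n_oxygens, 128)
ref = 0                                         # index of the reference oxygen (e.g. prop-2-yn-1-ol)

distances = np.linalg.norm(X - X[ref], axis=1)  # Euclidean distance to every other oxygen
nearest = np.argsort(distances)[1:11]           # ten most similar chemical environments
print(nearest, distances[nearest])
```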
Fig. 5 Euclidean distance analysis of chosen cases from QM9. (a) Overall distance distribution between the embedding vector of the reference oxygen of prop-2-yn-1-ol and the rest of the oxygen embeddings in the dataset. The color key follows Fig. 2b. (b) Close-up of the nearest embeddings to prop-2-yn-1-ol, which are also primary alcohols and additionally have similar structural motifs. (c) Same analysis on 3-oxopentanenitrile and (d) on the tri-ringed structure shown.
The same analysis can be done on, e.g., 3-oxopentanenitrile (shown in Fig. 5c), where the presence of an alkyne is important for similarity, but specifically one at the α position to the ketone. Triple bonds that are right next to the ketone (i.e. not α) appear in more distant embedding vectors, after all the α-triple-bonded ketones in the dataset. Lastly, we show an example of a tri-ringed structure (Fig. 5d), which shows a gradual change in ring elements and ring structure with increasing distance between the oxygens' embedding vectors.
We performed LDA on the oxygen-type embedding vectors and obtained a minimum classification error of 3 × 10−4 over all classes defined in Table 1 (see Table 1 for errors on individual classes). The confusion matrix of predicted vs. true labels is dominantly diagonal. The excellent performance strongly supports the existence of distinct regions in high-dimensional space that are highly associated with chemical environments in the trained embeddings. To show the significance of this, we performed the same classification task on the smooth overlap of atomic positions (SOAP)98 representations, a popular method for molecular similarity measures, for the same set of oxygen atoms; the results are shown in Table 1. We performed two separate tests (SOAP1 & SOAP2) with two different sets of hyperparameters, the former consistent with a long cut-off range like the SchNet training (12 Å, beyond the sizes of the molecules in the QM9 dataset, with 6 radial basis functions and 6 spherical harmonics), whereas the latter imposed a more local cut-off (6 Å, with 6 radial basis functions and 4 spherical harmonics). The classification results for SOAP1 and SOAP2 are given in Table 1. Whereas SOAP1 appears to struggle with distilling the local character of functional groups, imposing locality in SOAP2 produces perfect classification results. However, because the LDA on embedding vectors scales only linearly in the number of data points, its memory requirements are significantly lighter than for SOAP, for which only 1000 molecules could be used in this test without requiring a large amount of memory (>2 GB). Therefore it is fair to state that embedding vectors are at least on par with coordinate-based features such as SOAP at capturing chemical environments in a compact representation, the catch being that all geometric considerations have already been encoded during the SchNet pretraining stage.
As a first test, we related the pKa values of certain atoms with their embedding vectors. More precisely, we addressed the assumption that moieties whose embedding vectors are close in Euclidean distance should also have comparable pKa values. We employed a portion of the IUPAC pKa database,99 consisting of 600 clean and accurate pKa data points, and plotted the difference in pKa values between all possible pairs against the embedding distance between the pairs (see Fig. 6b). The triangular shape of the distribution confirms that chemical environments that are close in embedding space necessarily have pKa values that are also close. In Fig. 6a, we narrow in on the distribution by selecting a random carbamide oxygen found on 2H-1,2,4-triazine-3,5-dione from the database and plotting only relative distances with respect to this oxygen. From this figure, we can see how certain groups such as carbamides are closer in pKa to amides than they are to aldehydes, and that they are significantly different from hydroxylamines. This is basic chemistry knowledge that a trained chemist may have; however, the associated embedding space stemming from a learned representation allows for a qualitative and quantitative organization of this chemical intuition.
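The triangular distributions of Fig. 6b can be generated by looping over all pairs, as in the sketch below; the arrays `emb` (embedding vectors of the protic oxygens) and `pka` (matching pKa values) and their file names are assumptions for illustration.

```python
import numpy as np
from itertools import combinations

emb = np.load("pka_oxygen_embeddings.npy")   # hypothetical file, shape (n_sites, 128)
pka = np.load("pka_values.npy")              # hypothetical file, shape (n_sites,)

pairs = list(combinations(range(len(pka)), 2))
emb_dist = np.array([np.linalg.norm(emb[i] - emb[j]) for i, j in pairs])
pka_diff = np.array([abs(pka[i] - pka[j]) for i, j in pairs])
# Plotting pka_diff against emb_dist reproduces the triangular shape: pairs that are
# close in embedding space never differ much in pKa, while distant pairs may or may not.
```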
Fig. 6 (a) pKa difference versus embedding vector distance from a reference carbamide oxygen found on 2H-1,2,4-triazine-3,5-dione to all other oxygens in the IUPAC pKa database. (b) pKa difference vs. embedding distance between all pairs of oxygen atoms in the IUPAC pKa database. The black labels are oxygens of the same class, the grey labels are oxygens of another class. (c) Truth versus linear regression predictions of oxygen pKa values of protic sites from 128-dimensional embedding vectors; the R2 value of the fit is 0.91. (d) Predicted vs. experimental pKa values for molecules affected by inductive and resonance effects; experimental data taken from ref. 101, pKa data taken from an IUPAC high-confidence pKa molecular database.102 All oxygen labels are taken from Fig. 2b.
A second test was performed with 13C nuclear magnetic resonance (NMR) data, extracted from the NMRShiftDB2 model for a selection of 200 QM9 molecules.100 This model uses the Hierarchically Ordered Spherical Environment (HOSE) molecular descriptor to describe atomic neighborhoods. NMR shifts are particularly interesting to consider for our purpose as they are considered a sensitive fingerprint of atomic environments. Again, the differences in NMR shift between all possible carbon pairs have been plotted against the associated C-embedding distance between the pairs in Fig. 7b. Fig. 7a shows a selection of Fig. 7b in which only relative distances with respect to an ethane carbon are considered. The NMR shift difference can be taken as a proxy for molecular similarity, and we find that groups that are close in embedding space have similar NMR shifts.
Fig. 7 (a) 13C-NMR shift difference vs. embedding distance for carbon atoms with respect to a reference ethane carbon in the NMR dataset. (b) 13C-NMR shift difference vs. embedding distance between all pairs. Black labels are carbons of the same class, grey labels are carbons of another class. (c) Truth values versus linear regression predictions for 13C-NMR shifts from 128-dimensional embedding vectors; the R2 value of the fit is 0.95. (d) Carbon moiety labels. NMR values were obtained using the NMRShiftDB2 model100 applied to the first 200 QM9 molecules.
Finally, we explore the potential to use embedding vectors for transfer learning. We considered a simple linear regression model to predict pKa values and 13C-NMR shifts from the pretrained embedding vectors of SchNet. As can be anticipated from our discussion, a linear regression model is relatively successful in predicting both observables from embedding space, giving testing errors of 1.48 pKa units and 23.3 ppm for pKa and 13C-NMR, respectively (see Fig. 6c and 7c). It can be seen from the errors in Table 4 and the determination coefficient of the linear fit (R2 = 0.91) that some degree of induction and resonance effects on the pKa can be captured by the embeddings. Given the small sizes of the datasets used, these results provide a promising starting point for understanding transfer learning in GNNs, which we aim to address in future studies and applications.
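A sketch of such a transfer-learning test with an ordinary least-squares model is given below; the file names, train/test split, and random seed are illustrative assumptions, not the exact protocol used here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

emb = np.load("pka_oxygen_embeddings.npy")   # hypothetical: (n_sites, 128) frozen SchNet embeddings
pka = np.load("pka_values.npy")              # hypothetical: matching pKa targets

X_tr, X_te, y_tr, y_te = train_test_split(emb, pka, test_size=0.2, random_state=0)
reg = LinearRegression().fit(X_tr, y_tr)     # the only trained component; embeddings stay frozen
print("test MAE:", mean_absolute_error(y_te, reg.predict(X_te)))
print("R^2:", reg.score(X_te, y_te))
```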
An important consideration in machine learning applications is data volume. More often than not, the amount of training data from computational or experimental studies is limited, potentially hampering the interpretability of the embedding vector representation. To this end, we investigated the effect of dataset size on the interpretability of the embedding vectors. We trained the same SchNet architecture as described in Section 2.3, but on datasets of decreasing size (see Table 2). For the training sets, random molecules were chosen from the full QM9 data, with 10% additional data points for validation purposes. As expected, we notice a sharp increase in validation error (MAE) when reducing the dataset from 100 000 molecules to 1000. In order to compare the classification capabilities, we extracted the embedding vectors of the same 10 000 molecules employed in Section 3.1 and ran the LDA classification task on them. The LDA accuracy drops with smaller training sets, similar to the MAE, but plateaus at around 97% on the functional group classification. These numbers suggest that the model still succeeds in categorizing the data into chemical moieties for smaller training sets, but lacks the capability to reduce the uncertainty and refine the feature space for regression tasks. To further quantify this observation, we computed the ratio ρ of the average radial width of all functional group clusters to the average distance between the individual clusters, finding that the average size of the clusters grows relative to the average distance between the clusters as the training set shrinks (see Table 2).
| Size | MAE (eV) | LDA mean error | ρ |
|---|---|---|---|
| 50 | 1.23 | 0.025 | 1.02 |
| 100 | 0.88 | 0.029 | 1.08 |
| 500 | 0.64 | 0.024 | 0.95 |
| 1000 | 0.78 | 0.024 | 0.77 |
| 100 000 | 0.02 | 3.0 × 10−4 | 0.45 |
One important open question arising from this study is to further pinpoint the algebraic properties and geometry of the embedding space, as well as to identify the chemical role of each of the significant PCA dimensions. Another question that has been left untouched is the role of the underlying computational data. The SchNet GNN has been trained on electronic energies computed at the density functional theory level. How much of the fundamental quantum mechanical ingredients that go into the computed energy data, in casu the densities or the B3LYP functional in the DFT computation, has been implicitly identified by the GNN and absorbed into the embedding vector representation? Can similar conclusions be drawn when pretraining on different chemical training data, such as enthalpies or dipole moments? These questions will be addressed in future studies.
The mean validation MAE over the ten training trials (validated on the molecules indexed 100 000–110 000 in QM9) is 0.023 eV with a standard deviation of 0.004 eV. Fig. 8 shows the training of model 1 (MAE = 0.020 eV).
| Trial | MAE (eV) |
|---|---|
| 1 | 0.020 |
| 2 | 0.017 |
| 3 | 0.023 |
| 4 | 0.029 |
| 5 | 0.019 |
| 6 | 0.025 |
| 7 | 0.023 |
| 8 | 0.025 |
| 9 | 0.021 |
| 10 | 0.027 |
Fig. 8 Training loss and validation loss of the SchNet model with a 128-dimensional atom basis, 128 convolutional filters, 50 Gaussians, and a cutoff of 50 Å.
The embedding vectors of the intermediate interaction layers can be analyzed in a similar fashion. Table 4 below shows the mean absolute value of the interaction updates to the embedding vector, the mean Euclidean distance between embedding vectors of successive layers, the mean Euclidean distance between embeddings of different classes, and lastly, the mean LDA error on chemical classes using the intermediate layers.
| Layer | 〈|v^l|〉 | 〈Dist(x^l − x^{l−1})〉 | 〈Dist(x_fg^l − x_fg′^l)〉 | LDA mean error |
|---|---|---|---|---|
| x^0 | 0.25 | — | 0.00 | 9 × 10−1 |
| x^1 | 0.30 | 3.56 | 2.64 | 2 × 10−3 |
| x^2 | 0.28 | 4.29 | 3.90 | 5 × 10−4 |
| x^3 | 0.26 | 3.88 | 5.13 | 8 × 10−4 |
| x^4 | 0.28 | 3.64 | 5.74 | 2 × 10−3 |
| x^5 | 0.25 | 4.02 | 6.32 | 5 × 10−4 |
It is interesting to note that while the magnitude of the interaction updates remains roughly constant across the layers (as does the average distance between embeddings of successive layers), the average distance between different moieties continues to increase over the interaction layers (Fig. 9).
Fig. 9 PCA plots of oxygen-type embeddings of each successive interaction layer l ≠ final (see Fig. 3 for the final layer l = final): (a) layer 0, containing the initialized embedding, which is the same for all oxygens; (b–f) layers 1 to 5.
| Element | LDA mean error |
|---|---|
| H | 5 × 10−3 |
| C | <1 × 10−5 |
| N | <1 × 10−4 |
| O | 3 × 10−4 |
| F | <1 × 10−3 |
This journal is © The Royal Society of Chemistry 2024