Will Gerrard,a Lars A. Bratholm,a Martin J. Packer,b Adrian J. Mulholland,a David R. Glowacki*a and Craig P. Butts*a
a University of Bristol, Bristol, UK. E-mail: craig.butts@bristol.ac.uk; glowacki@bristol.ac.uk
b Chemistry, R&D Oncology, AstraZeneca, Cambridge CB4 0QA, UK
First published on 20th November 2019
The IMPRESSION (Intelligent Machine PREdiction of Shift and Scalar information Of Nuclei) machine learning system provides an efficient and accurate method for the prediction of NMR parameters from 3-dimensional molecular structures. Here we demonstrate that machine learning predictions of NMR parameters, trained on quantum chemical computed values, can be as accurate as, but computationally much more efficient (tens of milliseconds per molecular structure) than, quantum chemical calculations (hours/days per molecular structure) starting from the same 3-dimensional structure. Training the machine learning system on quantum chemical predictions, rather than experimental data, circumvents the need for the existence of large, structurally diverse, error-free experimental databases and makes IMPRESSION applicable to solving 3-dimensional problems such as molecular conformation and stereoisomerism.
Fast empirical predictions of chemical shifts for 2-dimensional chemical structures have been used for decades, with the additivity rules exemplified by Pretsch1 and HOSE-code2 variants forming the basis of many analyses. However, because these methods are based on 2-dimensional structures, they cannot readily deal with 3-dimensional conformational or stereochemical analysis. Some modifications to treat 3-dimensional structures have been made, e.g. flat-but-stereochemically-aware HOSE codes3 or single conformer models of experimental systems,4–6 but the improvements in 3-dimensional accuracy are limited, as conformation and flexibility must be accounted for completely to achieve maximum accuracy. Multiple-bond 1H–1H coupling constants are more directly linked to 3-dimensional structure; however, generically applicable Karplus-style empirical relationships, such as the widely used equation reported by Haasnoot et al.,7 suffer from lower accuracy when confronted with complex chemical functionality, while equations designed for specific sub-structures, e.g. carbohydrates,8 are not applicable to the whole of chemical space. Finally, many NMR parameters, for example 1-bond 1H–13C scalar coupling constants, 1JCH, which are sensitive to both chemical connectivity and 3-dimensional structure, are rarely used in isotropic studies precisely because there are no general fast predictive methods for 1JCH.
For all of these reasons, the accurate prediction of NMR parameters in modern 3-dimensional structure determinations relies increasingly on the use of quantum chemical calculations, typically based on Density Functional Theory (DFT).9–12 Optimal DFT methods can be accurate to within 1–2%, e.g. 1JCH predicted to within <4 Hz of experiment13–15 (on values that range from roughly 100–250 Hz) and <0.2/<2 ppm16,17 (on ranges of ∼10/∼200 ppm) for 1H and 13C chemical shifts respectively. The substantial downside of DFT is the significant computation time required when using methods that can provide sufficient accuracy in NMR predictions. Accurate DFT-based predictions of chemical shifts and scalar couplings typically take hours to days of CPU time for a single rigid molecule of even relatively low (∼500) molecular mass. The largest proportion of this CPU time is occupied by the NMR computations, especially when computing scalar coupling constants. Naturally, in cases where multiple conformers or isomers must be considered (and thus predictions for multiple structures are required) this becomes days to months of computation for a single study.
Machine learning methods offer a solution to the time-demands of DFT NMR predictions, achieving them in seconds rather than hours or days. Such machines, trained on experimental data, for 1H and 13C chemical shifts based on 2-dimensional structures are well-established.18–21 These systems are trained on hundreds of thousands of validated experimental chemical shifts arising from tens of thousands of chemical structures. Training such machines for prediction of scalar couplings is more challenging because accurate and validated experimental databases do not exist on this scale (e.g. 1JCH values) and they can be critically dependent on 3-dimensional structure (e.g. 3JHH/CH values). On the other hand, a machine could be trained using large datasets of DFT-computed NMR parameters, such as chemical shifts and scalar couplings, derived from 3-dimensional structures. Such large DFT-derived datasets can be generated systematically with minimal effort and are not limited to offering accuracy only for structures that are similar to previously experimentally determined molecules. With a large enough training database, such a machine would be expected to approach the accuracy of DFT calculation of NMR parameters for 3-dimensional structure analysis, but with several orders of magnitude reduction in time for the NMR predictions. This approach was recently reported for solid-state chemical shift predictions by Paruzzo et al. (SHIFTML,22) where the computational demand of DFT calculations on extended lattices is high and comparable to that needed for multi-conformer calculations on solution-state systems.
In this paper we describe the development of our first generation of solution-state NMR prediction machines – IMPRESSION (Intelligent Machine PREdiction of Shift and Scalar information Of Nuclei), trained on DFT-predicted values rather than relying on scarce or error-prone experimental data. We have chosen to demonstrate the versatility of machine learning of NMR parameters using both 1H and 13C chemical shifts and 1JCH couplings. We include scalar couplings in addition to chemical shift, as the former are less amenable to machine learning based on experimental data, and 1JCH precisely because it has been demonstrated to be valuable for elucidating both 2-dimensional connectivity and 3-dimensional structure5,23 but requires DFT to predict/interpret for most cases. Providing a fast and accurate predictive tool for 1JCH will be especially valuable and could encourage wider acceptance of this and other accessible NMR parameters in structure determinations. We demonstrate that IMPRESSION can predict all these NMR parameters for organic molecules, including 3-dimensional discrimination, with up to DFT accuracy but several orders of magnitude faster and can be applied to experimental data with comparable outcomes to DFT.
IMPRESSION uses a Kernel Ridge Regression37 (KRR) framework to learn the 1JCH scalar couplings and 13C and 1H chemical shifts of molecular structures. KRR was successfully used by Paruzzo et al. to develop SHIFTML.22 Neural networks have also been used to predict chemical shifts in small molecules from experimental data,6,38,39 however we found no clear advantages in using feed forward neural networks in this work as the accuracy was comparable to KRR for the datasets used, with the kernel methods being much faster to train with the given training set size. In order to encode the similarity between chemical environments of each molecular structure we tested three approaches previously described – Coulomb matrices,40 aSLATM,41 and FCHL42 all available from the QML python package.43 We refer the reader to Section S1.1 in the ESI† and the respective papers describing each representation for more details. All of these kernel similarity measures compare atomic environments, so in the case of 1JCH, we used the product of the separately calculated kernel similarities for the 1H and 13C nuclei as this performed better than either atomic environment alone. The KRR procedure is further described in the ESI (Section S1.1†).
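The product-kernel construction described above can be illustrated with a minimal sketch. It is not the IMPRESSION implementation: a plain Gaussian kernel on random feature vectors stands in for the FCHL similarity provided by QML, and the function names, the toy data, `sigma` and `lam` are all assumptions chosen only for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian similarity between every row of A and every row of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def pair_kernel(H_a, C_a, H_b, C_b, sigma):
    """Similarity of C-H pairs: product of the separate 1H and 13C kernels."""
    return gaussian_kernel(H_a, H_b, sigma) * gaussian_kernel(C_a, C_b, sigma)

def krr_fit(K_train, y, lam):
    """Kernel ridge regression weights: solve (K + lam * I) alpha = y."""
    return np.linalg.solve(K_train + lam * np.eye(len(y)), y)

def krr_predict(K_query_train, alpha):
    """Predictions are kernel-weighted sums of the regression weights."""
    return K_query_train @ alpha

# Toy stand-ins (assumptions): 40 C-H pairs, each atom described by 6 numbers,
# with a made-up "coupling" that depends on both atomic environments.
rng = np.random.default_rng(0)
H_env = rng.normal(size=(40, 6))
C_env = rng.normal(size=(40, 6))
J = 125.0 + 10.0 * np.tanh(H_env[:, 0] + C_env[:, 0])

K = pair_kernel(H_env, C_env, H_env, C_env, sigma=1.0)
alpha = krr_fit(K, J, lam=1e-8)
J_fit = krr_predict(K, alpha)
```

Multiplying the two atomic kernels means a pair is only considered similar to a training pair when both its 1H and its 13C environments are similar, which is one plausible reading of why the product outperforms either environment alone.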
Both aSLATM and FCHL were found to outperform Coulomb matrices (Fig. 1), which is expected as Coulomb matrices only include 2-body interactions, while aSLATM and FCHL both include three-body interactions as well. As FCHL provided the best performance for all three parameters and was substantially more computationally efficient than aSLATM, it was used in the final development of the full IMPRESSION machine.
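To make the 2-body nature of the Coulomb matrix concrete, the sketch below builds the standard molecular Coulomb matrix, in which each off-diagonal element depends only on one pair of atoms. The atom-centred variant available in QML may differ in ordering and cut-off details, and the function name and the toy geometry are assumptions made here for illustration.

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Standard molecular Coulomb matrix.

    Z: nuclear charges, shape (n,)
    R: Cartesian coordinates in atomic units, shape (n, 3)
    Off-diagonal: Z_i * Z_j / |R_i - R_j|   (a pure 2-body term)
    Diagonal:     0.5 * Z_i ** 2.4
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder to avoid dividing by zero
    M = np.outer(Z, Z) / dist
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M

# Toy example (assumption): two hydrogen atoms placed 2 bohr apart.
M = coulomb_matrix([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
```

No element of this matrix involves an angle between three atoms, so any angular information has to be inferred from the collection of pairwise distances, whereas aSLATM and FCHL encode three-body terms explicitly.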
Fig. 2 IMPRESSION machine learning predictions compared to DFT computed NMR parameters for δ1H (left), δ13C (centre) and 1JCH couplings (right) without variance filtering.
Notably, however, a very small number of predictions for the test set were much less reliable. For example, 186 (∼2.3%) of the δ1H values had errors >1 ppm between IMPRESSION and DFT, with a maximum error (MaxE) of 11.22 ppm. Similar outcomes were observed for the other parameters with 187 δ13C values (∼2.5%) with errors >10 ppm (MaxE = 63.33 ppm) and 14 (∼0.2%) of the 7788 predicted 1JCH values having errors of >10 Hz (MaxE = 24.63 Hz). Diagrams of the structures containing the five most significant outliers for each NMR parameter are shown in Fig. S19–S21 in the ESI.† Examination of the chemical environments of the most significant outliers shows that they arise from unusual functional groups such as those containing sp-hybridised atoms, or unusual 3-dimensional environments such as atoms near pi-systems of aromatic rings. These outliers suggest that, as desired, the machine learning system is indeed very sensitive to the 3-dimensional relationships of the atoms in the structure. However, this same sensitivity also makes IMPRESSION less accurate for chemical environments which are not very similar to environments across the 882 molecular structures used to train IMPRESSION.
Crucially, we are able to a priori identify poorly described environments using the same variance-based approach used to generate the training set. By assessing the variance in the prediction of a given NMR parameter across a 5-fold cross-validation, we can quantify our confidence in each individual prediction since environments which are poorly described by the chemical structures in the training set will have high variance in this cross-validation. There is indeed a clear correlation of variance against prediction error for the independent test set (Fig. 3). The tables in Fig. 3 suggest that the bulk of the environments are predicted very accurately, and that the high variance environments are the dominant source of the large outliers.
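One way to read the cross-validation variance described above is sketched here: five models are each trained with a different fifth of the training set held out, and the spread of their predictions for a query environment is taken as the confidence measure. This is a sketch under that assumption, not the IMPRESSION code; the Gaussian kernel, the 1-dimensional toy data and every parameter value are illustrative.

```python
import numpy as np

def kernel(A, B, sigma=0.2):
    """Gaussian similarity between every row of A and every row of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fold_prediction_variance(X_train, y_train, X_query, n_folds=5, lam=1e-4, seed=0):
    """Mean and variance of query predictions over models that each omit one fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y_train)), n_folds)
    preds = []
    for held_out in folds:
        keep = np.setdiff1d(np.arange(len(y_train)), held_out)
        K = kernel(X_train[keep], X_train[keep])
        alpha = np.linalg.solve(K + lam * np.eye(len(keep)), y_train[keep])
        preds.append(kernel(X_query, X_train[keep]) @ alpha)
    preds = np.stack(preds)  # shape (n_folds, n_query)
    return preds.mean(axis=0), preds.var(axis=0)

# Toy data (assumption): 40 well-sampled environments plus one isolated one at x = 3.
X_train = np.concatenate([np.linspace(0.0, 1.0, 40), [3.0]])[:, None]
y_train = np.concatenate([np.sin(2.0 * np.pi * X_train[:40, 0]), [5.0]])
X_query = np.array([[0.5], [3.0]])  # well described vs. poorly described

mean_pred, var_pred = fold_prediction_variance(X_train, y_train, X_query)
```

In this toy case the query at x = 3 is supported by a single training point, so the one model that omits it disagrees strongly with the other four and the variance is large, while the query at x = 0.5 is predicted almost identically by all five models.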
In principle, removing IMPRESSION-predicted values which show high variances in cross-validation should provide a “pre-prediction variance filter” that will substantially improve the quality of, and thus the confidence in, IMPRESSION predictions. Selecting an appropriate variance cut-off for each NMR parameter is then simply a balance between desired prediction quality and the number of predictions which will be excluded by that cut-off. Reports of DFT accuracy with respect to experiment for 1H and 13C chemical shift predictions vary significantly, but are typically in the range of 0.2–0.4 ppm/2–4 ppm, with the best reported accuracies down to <0.2/<2 ppm (ref. 16 and 17) in optimal cases. Similarly, Buevich et al. recently highlighted that current best-in-class DFT methods predict 1JCH experimental values with accuracies of 2–4 Hz, when presenting an optimised workflow for calculating 1JCH values which achieved an RMSE of 1.61 Hz.
We therefore identified variance cut-offs for IMPRESSION predictions that provide a good compromise between accuracy and excluded values for the test set, which were found to be 1 Hz for 1JCH, 0.1 ppm for δ1H and 5 ppm for δ13C. Applying these pre-prediction variance filter values improves the fits between IMPRESSION and DFT to levels that are comparable with literature reports for MAE/RMSE of DFT vs. experiment (MaxE is rarely reported for large experimental validations, but the reader can find comparators from our experimental validations described below in Section 2.3). For δ1H the 0.1 ppm filter excludes 5 environments (<0.1%) and improves the fit to MAE = 0.23 ppm, RMSE = 0.32 ppm, MaxE = 2.16 ppm. For δ13C a 5 ppm filter provided a good fit (MAE = 2.17 ppm, RMSE = 3.25 ppm, MaxE = 37.87 ppm) while excluding 538 (∼7.2%) of the environments. For 1JCH a 1 Hz filter improved the fit to MAE = 0.81 Hz, RMSE = 1.17 Hz, MaxE = 13.37 Hz while discarding only 207 (<3%) of the environments.
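The trade-off between accuracy and excluded values can be sketched as a simple filter followed by the three error statistics quoted above. The helper names, the choice to retain values whose variance is less than or equal to the cut-off, and the four toy 1JCH values are assumptions for illustration; they are not data from this work.

```python
import numpy as np

def error_stats(pred, ref):
    """Mean absolute error, root-mean-square error and maximum absolute error."""
    err = np.abs(np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float))
    return {"MAE": err.mean(), "RMSE": np.sqrt((err ** 2).mean()), "MaxE": err.max()}

def apply_variance_filter(pred, ref, variance, cutoff):
    """Keep predictions whose cross-validation variance is within the cut-off.

    Returns the error statistics of the retained values and the number removed.
    """
    pred, ref = np.asarray(pred, dtype=float), np.asarray(ref, dtype=float)
    keep = np.asarray(variance, dtype=float) <= cutoff
    return error_stats(pred[keep], ref[keep]), int((~keep).sum())

# Toy 1JCH values in Hz (assumption): the third prediction has high variance.
pred = [125.0, 140.5, 160.0, 131.0]
ref = [124.0, 141.5, 150.0, 130.0]
variance = [0.2, 0.4, 3.0, 0.1]

unfiltered = error_stats(pred, ref)
filtered, n_removed = apply_variance_filter(pred, ref, variance, cutoff=1.0)
```

Here the single high-variance prediction is also the only large error, so removing it brings MAE, RMSE and MaxE all down to 1 Hz at the cost of one excluded value. A looser cut-off would retain more values at the risk of keeping such outliers.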
As highlighted by the learning curves, further improvement to the machine predictions of DFT NMR results can be made by increasing the size of the DFT-derived training dataset by around an order of magnitude. However, at this stage variance-filtered IMPRESSION compares well enough with respect to DFT that it was taken forward. It should also be noted at this point that IMPRESSION only accelerates NMR prediction; it does not accelerate the 3D structure generation by DFT (which can still take hours/days). This overall time, i.e. 3D structure generation + NMR prediction, could be reduced further by using 3D structures derived from molecular mechanics rather than DFT. While not the key focus here, the use of molecular mechanics structures as inputs to a re-trained IMPRESSION machine was explored. While practical, this resulted in a ∼30–50% increase in the average prediction errors for δ1H and 1JCH, presumably arising from a mismatch between the detail of molecular mechanics geometries and those used to calculate the DFT NMR parameters (see Section S2 in the ESI for details†). Interestingly, δ13C predictions were relatively insensitive to this change, perhaps reflecting better description of carbon environments by molecular mechanics forcefields. This is an exciting avenue to explore further, but to focus the discussion here on the ability of IMPRESSION to reproduce DFT NMR predictions, the subsequent experimental comparisons are based on the IMPRESSION machine trained on the same DFT-geometries used for the DFT NMR predictions.
IMPRESSION took only 60 CPU seconds to predict the full set of 612 1JCH values but with some substantial outliers (MAE = 4.52 Hz, RMSE = 10.49 Hz, MaxE = 120.3 Hz). Applying the 1 Hz variance filter gave MAE = 2.01 Hz, RMSE = 2.69 Hz, MaxE = 10.01 Hz (removing 143 values), which was essentially identical accuracy to that obtained from the DFT method for these same filtered environments: MAE = 1.83 Hz, RMSE = 2.60 Hz, MaxE = 14.63 Hz. An overlay of the error distributions for DFT and the 1 Hz variance-filtered IMPRESSION vs. the experimental values (Fig. 4) demonstrates the comparability between machine learning and DFT for 1JCH predictions. This represents excellent performance of the machine for reproducing experimental data in just a few seconds, with quality for the majority of environments as good as the best MAEs (1.5–4 Hz) described by Buevich et al. as typical for DFT methods, with <25% of the values being tagged as unreliable by the variance filter. Of course, if a slight loss in prediction quality is acceptable for a given study, then more predicted values could be retained by using a slightly looser variance filter.
Similar accuracy could be obtained for IMPRESSION predictions of 734 1H chemical shifts for 36 structures reported by Smith and Goodman44 in their DP4 dataset (again, single conformers were used for both DFT and IMPRESSION predictions). With the 0.1 ppm variance filter, which in this case removed no environments, IMPRESSION predictions gave MAE = 0.29 ppm, RMSE = 0.38 ppm, MaxE = 1.59 ppm, essentially the same outcome as the ωb97xd/6-311g(d,p) DFT method on the same single conformer structures (MAE = 0.28 ppm, RMSE = 0.37 ppm, MaxE = 1.62 ppm, see Fig. 4 for an overlay of errors). The IMPRESSION predictions for δ13C using the 5 ppm variance filter identified during training and testing of the machine compared slightly less well to the DP4 experimental dataset (MAE = 3.44 ppm, RMSE = 4.30 ppm, MaxE = 13.06 ppm, removing 24 environments) than DFT (MAE = 2.78 ppm, RMSE = 3.48 ppm, MaxE = 14.33 ppm). A tighter 1 ppm variance filter for the δ13C predictions was examined, but gave only a slight improvement in prediction quality (MAE = 3.20 ppm, RMSE = 4.00 ppm, MaxE = 13.03 ppm) while removing 120 out of the 458 carbon environments.
At every stage in this study we found that the IMPRESSION δ13C predictions have a wider distribution of errors than the other NMR parameters when compared to the quality of the DFT from which they are trained. This is unsurprising given that the structural environments of 13C nuclei in molecules are inherently more complex than 1H given the higher valency and thus more complex bonding environments and geometries, so in future development, larger training datasets focussed on optimising δ13C predictions will be beneficial.
Combining IMPRESSION predictions for 1JCH with 1H and 13C chemical shifts also provides correct identification of the naturally occurring structure, but IMPRESSION and DFT now both see structure 2 as the next best candidate (Fig. 5, right). This is due to the experimental δ1H values having better agreement with the predictions for diastereomer 2 than 1a for DFT and also IMPRESSION. While this is obviously problematic for structure elucidation purposes, it clearly arises because of a deficiency in the DFT prediction of 1H chemical shifts, which is then faithfully reproduced by IMPRESSION. For the individual MAE values across all three parameters see ESI Section S5.†
Similarly, we found that IMPRESSION predictions can be used to correctly assign the diastereotopic protons in strychnine. IMPRESSION and DFT predictions of 1JCH for the diastereotopic protons in strychnine were consistently in line with each other (details can be found in Section S4 of the ESI†) and for the three methylene groups where there is a significant difference (≫2 Hz) in experimental 1JCH values both methods correctly assign these protons (Fig. S16†).
Finally, we validated IMPRESSION chemical shift predictions for natural product structures. We conducted DFT and IMPRESSION predictions on structures from a recent report which suggested structural reassignments for oxirane-containing natural products on the basis of DU8+ DFT calculations.46 To avoid complications with incorrect DFT prediction of conformer energies leading to poor population averaging of NMR parameters from the constituent conformers, we limited the validation to ‘rigid’ structures in the report that contained only one dominant conformer after conformational searching. Pleasingly, while our results did not always agree with the DU8+ analysis, IMPRESSION was just as effective as our underlying ωb97xd/6-311g(d,p) DFT method in discriminating each original and revised chemical structure (see Section S3 in the ESI for more details†). Once again this confirms that IMPRESSION is capable of making predictions that are of comparable quality to its underlying DFT method, ωb97xd/6-311g(d,p), and thus any improvements in the DFT method used to train IMPRESSION will be subsequently expressed in the quality of IMPRESSION predictions.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c9sc03854j
This journal is © The Royal Society of Chemistry 2020