Open Access Article
Ruslan Kotlyarov*, Alexander Howarth and Jonathan M. Goodman*
Yusuf Hamied Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, UK. E-mail: jmg11@cam.ac.uk
First published on 5th March 2026
The evaluation and assignment of candidate structures to NMR spectra can be facilitated by the DP4 method, which assumes that one of the candidate structures is correct, and the DP5 method, which calculates the probability of a correct assignment for each candidate individually. Both of these methods require DFT calculations and thus a significant amount of computer resources. In this paper we present DP5q, a new version of DP5, which uses a graph convolutional neural network and quantile regression to replace the DFT-based algorithm. This dramatically increases the speed of the calculation at the cost of a modest decrease in accuracy. We demonstrate the efficacy of this rapid calculation both on a test set of thousands of molecules and also on cases selected for the difficulty of assigning the structure.
Structure confirmation is relatively easy when an experimental spectrum of the pure substance has already been recorded under the same conditions, such as temperature, solvent, and acidity. However, when a new compound is made, there are no reference spectra to compare against.
The assignment of candidate structures to NMR spectra can be challenging, particularly when multiple structures have similar spectra. This has led to many misassignments that were later corrected.1–23 The problem is becoming more acute with increased automation, miniaturisation, and parallelisation of chemical synthesis: the amount of data generated within a single experimental campaign can exceed individual chemists' capacity to process it. A fast, automated way to confirm a structure from a spectrum would address this growing challenge.
Our previous studies in computer-aided structure elucidation led to the CP3 (ref. 24) and DP4 (ref. 25) methods, which select the best-matching structure from a user-provided list of candidates; DP4-AI,26–28 which automates the process; and, most recently, DP5 (ref. 29), which determines the confidence with which a single structure may be assigned to a single spectrum. All of these methods require DFT calculations, and these calculations are the rate-determining step of the process. In this work, we present a new version of DP5 analysis, DP5q, which removes the need for DFT calculations while maintaining a high level of accuracy. This dramatically improves the throughput of the DP5 process.
Estimation of uncertainty for chemical problems is well precedented and is used in several models that predict NMR spectra.34,35 IMPRESSION,34 developed by Gerrard et al., featurises molecular geometries using FCHL representations36 and uses kernel ridge regression to predict both the shift value and its variance. Jonas and Kuhn,35 on the other hand, used a graph convolutional neural network to generate learned features and predicted the shift and its uncertainty using two separate output heads. Both approaches assume normally distributed predictions.
Fig. 1 Quantile regression as an uncertainty estimation method is easy to incorporate into existing deep learning model architectures for chemical shift prediction.
Quantile regression is easy to integrate into the architecture of CASCADE,38 a graph neural network: instead of a single output, the final layer of the model can be reconfigured to produce one output for each specified quantile. For example, we can configure and train the model to predict the median, the first quartile, and the seventh decile of the probability distribution for a chemical shift. Because each additional quantile requires very few trainable parameters compared with the total number of parameters in the model, we predict chemical shifts for every percentile from the 1st to the 99th; the 0th and 100th percentiles are omitted because they would simply correspond to the minimum and maximum shifts observed in the training dataset. This choice balances the need to interpolate the cumulative distribution function accurately for analysis against the number of trainable parameters in the model.
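As a rough illustration of why the extra outputs are cheap, the sketch below builds the 99 percentile levels and counts the parameters of a dense output head. It assumes a plain linear layer on top of a per-atom embedding; the hidden width of 256 is a hypothetical value rather than CASCADE's actual configuration, and `quantile_head` is an illustrative helper.

```python
def quantile_levels(n: int = 99) -> list[float]:
    """Evenly spaced quantile levels k/(n+1); n = 99 gives 0.01, 0.02, ..., 0.99."""
    return [k / (n + 1) for k in range(1, n + 1)]


def head_parameters(hidden: int, n_outputs: int) -> int:
    """Trainable parameters (weights + biases) of a dense layer hidden -> n_outputs."""
    return hidden * n_outputs + n_outputs


def quantile_head(embedding: list[float], weights: list[list[float]],
                  biases: list[float]) -> list[float]:
    """Forward pass of a linear head: one predicted shift per quantile level."""
    return [sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(weights, biases)]


HIDDEN = 256  # hypothetical embedding width, for illustration only
extra = head_parameters(HIDDEN, 99) - head_parameters(HIDDEN, 1)
print(extra)  # 98 extra quantiles x 257 parameters each = 25186
```

Each extra quantile adds only `HIDDEN + 1` parameters, independent of the size of the message-passing layers beneath the head.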
DP5 analysis29 goes further and constructs a dataset with tens of thousands of FCHL representations36 of atomic environments and their associated DFT errors. For an observed atomic environment, a probability distribution is constructed based on similar atomic environments to generate the probability of observing such an error. These atomic probabilities are then combined into the absolute molecular probability of correct structure assignment.
Our new neural network-based approach removes the need for DFT calculations altogether. Instead, the model constructs the probability distribution for chemical shifts directly. Because the speed of quantile regression-based DP5q analysis no longer depends on the size of the reference dataset, we can scale up the training data arbitrarily without increasing the cost of the analysis.
We train the model with a modified Huber loss, $\mathcal{H}_\delta$ (eqn (1)), evaluated for each quantile prediction ŷτ and weighted by the quantiles τ specified at the point of model creation (eqn (2)). In the limit of a small δ parameter (here set to 1 × 10⁻⁴ for good numerical convergence during training), the modified Huber loss reduces to the mean absolute error, and our loss for a single quantile τ becomes indistinguishable from the pinball loss.
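$$\mathcal{H}_\delta(u) = \begin{cases} \dfrac{u^2}{2\delta}, & |u| \le \delta \\[4pt] |u| - \dfrac{\delta}{2}, & |u| > \delta \end{cases} \qquad u = y - \hat{y}_\tau \tag{1}$$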
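$$\mathcal{L}_\tau(y, \hat{y}_\tau) = \left| \tau - \mathbb{1}\left(y - \hat{y}_\tau < 0\right) \right| \, \mathcal{H}_\delta\left(y - \hat{y}_\tau\right) \tag{2}$$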
An example of the proposed loss function for a single quantile τ is shown in Fig. 2. For all quantiles, a perfect prediction incurs no loss. The parameter τ, however, weights the loss differently for over- and under-predictions. For example, if τ is set to 0.75, the loss for under-prediction is three times larger than the loss for over-prediction. In contrast, if τ is set to 0.5, over- and under-prediction are penalised equally, which guides the model to predict the median of the distribution.
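The asymmetric weighting can be checked numerically. The sketch below assumes that the modified Huber loss is the standard Huber loss divided by δ (so it equals u²/(2δ) for |u| ≤ δ and |u| − δ/2 otherwise, tending to |u| as δ → 0), and that the quantile weight is τ for under-predictions and 1 − τ for over-predictions. Both forms are our reading of the text, not a copy of the published equations.

```python
def modified_huber(u: float, delta: float = 1e-4) -> float:
    """Huber loss divided by delta (assumed form); approaches |u| as delta -> 0."""
    a = abs(u)
    if a <= delta:
        return u * u / (2.0 * delta)
    return a - delta / 2.0


def quantile_huber(y: float, y_hat: float, tau: float, delta: float = 1e-4) -> float:
    """Pinball-style weighting of the modified Huber loss for one quantile tau."""
    u = y - y_hat  # u > 0: the model under-predicts the true shift
    weight = tau if u >= 0 else 1.0 - tau
    return weight * modified_huber(u, delta)


under = quantile_huber(y=10.0, y_hat=9.0, tau=0.75)  # under-prediction by 1 ppm
over = quantile_huber(y=10.0, y_hat=11.0, tau=0.75)  # over-prediction by 1 ppm
print(under / over)  # about 3: under-prediction costs three times more at tau = 0.75
```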
$$\mathcal{L}_{\text{quantile}} = \sum_{k=1}^{n} \mathcal{L}_{\tau_k}\left(y, \hat{y}_{\tau_k}\right) \tag{3}$$
By minimising the sum of the losses $\mathcal{L}_\tau$ for each quantile τ (eqn (3)), we hope that the predictions for all quantiles will both reflect the probability distribution accurately and approach the true value y, thus preventing the model from overestimating its uncertainty. In this study, we have used n = 99 quantiles, corresponding to the percentiles from 1 to 99, i.e., τ = (0.01, 0.02, …, 0.99).
We can enforce the non-decreasing order of the predicted quantiles by adding a second loss term, $\mathcal{L}_{\text{mono}}$ (eqn (4)), which penalises higher quantiles having lower values, i.e., it preserves ranking order and monotonicity. With a larger number of prediction quantiles, it should be easier to maintain the desirable effects of the proposed loss function. Here, ε is a small positive number, set to 1 × 10⁻⁶, which ensures a positive difference between consecutive quantiles.
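$$\mathcal{L}_{\text{mono}} = \sum_{k=1}^{n-1} \max\left(0,\; \hat{y}_{\tau_k} - \hat{y}_{\tau_{k+1}} + \varepsilon\right) \tag{4}$$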
We therefore adopted the sum of the two components as the final loss function for the model (eqn (5)).
$$\mathcal{L} = \mathcal{L}_{\text{quantile}} + \mathcal{L}_{\text{mono}} \tag{5}$$
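A compact sketch of the combined objective for a single atom follows. For brevity it uses the plain pinball loss, which the text identifies as the small-δ limit of the quantile Huber term, and it assumes the ordering term is a hinge penalty max(0, ŷτk − ŷτk+1 + ε) summed over consecutive quantiles; the published forms may differ in detail.

```python
def pinball(y: float, y_hat: float, tau: float) -> float:
    """Pinball loss, the small-delta limit of the quantile Huber loss."""
    u = y - y_hat
    return tau * u if u >= 0 else (tau - 1.0) * u


def ordering_penalty(preds: list[float], eps: float = 1e-6) -> float:
    """Hinge penalty whenever a higher quantile is not at least eps above the one below."""
    return sum(max(0.0, lower - upper + eps) for lower, upper in zip(preds, preds[1:]))


def total_loss(y: float, preds: list[float], taus: list[float], eps: float = 1e-6) -> float:
    """Sum of per-quantile losses plus the monotonicity penalty."""
    quantile_term = sum(pinball(y, p, t) for p, t in zip(preds, taus))
    return quantile_term + ordering_penalty(preds, eps)


taus = [0.25, 0.5, 0.75]
print(total_loss(2.0, [1.0, 2.0, 3.0], taus))  # 0.5: well ordered, no penalty
print(total_loss(2.0, [3.0, 2.0, 1.0], taus))  # about 3.5: crossed quantiles are penalised
```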
For each curated molecule, a 3D conformer was generated using ETKDG, as implemented in RDKit,41 and its geometry was optimised with the MMFF force field. Shielding tensors were then calculated using DFT (mPW1PW91/6-311G(d)), in line with our previous studies,28 without further geometry optimisation, and converted into chemical shifts using reference shielding tensor values for tetramethylsilane. After the calculations, any molecule with a large difference between predicted and observed shifts (50 ppm) was deemed to reflect an erroneous interpretation or assignment of the spectrum and was discarded, leaving 22 349 records for 22 248 unique molecules and their associated experimental chemical shifts.
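The post-processing steps can be sketched as follows. Chemical shifts are obtained as δ = σTMS − σ; the TMS reference shielding below is a hypothetical placeholder, not the value for mPW1PW91/6-311G(d), and we assume the 50 ppm criterion is applied to the largest per-atom deviation in a molecule.

```python
SIGMA_TMS_13C = 186.0  # hypothetical 13C reference shielding (ppm), placeholder only


def shieldings_to_shifts(shieldings: list[float], sigma_ref: float = SIGMA_TMS_13C) -> list[float]:
    """Convert isotropic shieldings to chemical shifts relative to TMS."""
    return [sigma_ref - sigma for sigma in shieldings]


def keep_record(predicted: list[float], observed: list[float], threshold: float = 50.0) -> bool:
    """Keep a molecule only if no predicted shift is threshold ppm or more from experiment."""
    return max(abs(p - o) for p, o in zip(predicted, observed)) < threshold


predicted = shieldings_to_shifts([56.0, 166.0])  # [130.0, 20.0]
print(keep_record(predicted, [128.5, 21.0]))  # True: deviations of 1.5 and 1.0 ppm
print(keep_record(predicted, [60.0, 21.0]))   # False: a 70 ppm deviation
```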
000 training steps using the Adam optimiser. Early stopping after 10 epochs without improvement in validation loss was used to prevent overfitting.
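Early stopping with a patience of 10 epochs amounts to a simple scan over the validation losses. This is a generic sketch of the technique rather than the training code used in this work; `early_stopping` is a hypothetical helper.

```python
def early_stopping(val_losses: list[float], patience: int = 10) -> tuple[int, int]:
    """Return (epoch at which training stops, epoch with the best validation loss)."""
    best_loss, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch  # patience exhausted: stop here
    return len(val_losses) - 1, best_epoch  # ran out of epochs first


losses = [5.0, 4.0, 3.0] + [3.0] * 12
print(early_stopping(losses))  # (12, 2): ten epochs without improvement after epoch 2
```

The weights saved at the best epoch would normally be restored once training stops.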
[Eqn (6)]
Fig. 4 Calculation of the error score for the subsequent DP5 analysis using the experimental shift value and the predicted probability distribution.
The DP5q score for atom i is defined in eqn (7) (blue area in Fig. 4), and the molecular DP5q score for n atoms is the geometric mean of the atomic scores (eqn (8)). A high DP5q score therefore means that the observed shift is consistent with the chemical environment in the molecule.
$$\text{DP5q}_i = 1 - P_i \tag{7}$$
$$\text{DP5q} = \left( \prod_{i=1}^{n} \text{DP5q}_i \right)^{1/n} \tag{8}$$
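The scoring step can be sketched as below. The text does not spell out how Pi is obtained from the predicted distribution, so this sketch assumes Pi = |2F(δexp) − 1|, where F is a piecewise-linear cumulative distribution function through the 99 predicted percentiles, taken as 0 or 1 outside their range. Under that assumption an experimental shift at the predicted median scores 1 and one in the far tails scores close to 0. The helper names are hypothetical.

```python
import bisect
import math


def cdf_at(shift: float, preds: list[float], taus: list[float]) -> float:
    """Piecewise-linear CDF through (predicted quantile, tau) points; preds sorted ascending."""
    if shift < preds[0]:
        return 0.0  # assumption: below the 1st percentile
    if shift >= preds[-1]:
        return 1.0 if shift > preds[-1] else taus[-1]
    j = bisect.bisect_right(preds, shift)  # preds[j - 1] <= shift < preds[j]
    x0, x1, t0, t1 = preds[j - 1], preds[j], taus[j - 1], taus[j]
    return t0 + (t1 - t0) * (shift - x0) / (x1 - x0)


def dp5q_atom(shift: float, preds: list[float], taus: list[float]) -> float:
    """DP5q_i = 1 - P_i, with the two-sided P_i = |2F - 1| assumed."""
    p_i = abs(2.0 * cdf_at(shift, preds, taus) - 1.0)
    return 1.0 - p_i


def dp5q_molecule(atom_scores: list[float]) -> float:
    """Geometric mean of the atomic DP5q scores."""
    return math.prod(atom_scores) ** (1.0 / len(atom_scores))


taus = [k / 100 for k in range(1, 100)]
preds = [100.0 + (k - 50) * 0.1 for k in range(1, 100)]  # toy distribution, median 100 ppm
print(dp5q_atom(100.0, preds, taus))  # 1.0: shift at the predicted median
print(dp5q_atom(150.0, preds, taus))  # 0.0: shift outside the predicted range
print(dp5q_molecule([1.0, 0.25]))     # 0.5
```

Because the molecular score is a geometric mean, a single atom with a score of zero drives the whole molecule to zero.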
In combinatorial studies,29 we obtain ‘incorrect’ data without generating new molecules by taking advantage of existing databases. Such a database is a set of molecules with interpreted NMR spectra. Since each molecule matches exactly one spectrum and vice versa, every molecule but one in the dataset is an incorrect interpretation of a given spectrum. A dataset of size N therefore provides N ‘correct’ spectrum–structure pairs and at most N² − N ‘incorrect’ spectrum–structure pairs. For each pair, we compute the mean absolute error and the DP5q score using eqn (8).
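Generating the combinatorial ‘incorrect’ pairs amounts to taking every ordered pair of distinct records, which gives N² − N pairs for N records, optionally restricted to proposals that share the molecular formula of the spectrum's true structure. The record layout with `id` and `formula` keys is hypothetical.

```python
from itertools import permutations


def combinatorial_pairs(records: list[dict], same_formula_only: bool = False):
    """Return (correct, incorrect) lists of (spectrum_id, structure_id) pairs."""
    correct = [(r["id"], r["id"]) for r in records]
    incorrect = [
        (spectrum["id"], structure["id"])
        for spectrum, structure in permutations(records, 2)  # ordered, distinct records
        if not same_formula_only or spectrum["formula"] == structure["formula"]
    ]
    return correct, incorrect


records = [
    {"id": "m1", "formula": "C6H6O"},
    {"id": "m2", "formula": "C6H6O"},
    {"id": "m3", "formula": "C7H8"},
    {"id": "m4", "formula": "C7H8"},
]
correct, incorrect = combinatorial_pairs(records)
print(len(correct), len(incorrect))  # 4 12  (N = 4, N**2 - N = 12)
print(len(combinatorial_pairs(records, same_formula_only=True)[1]))  # 4
```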
Within each group, we then assigned spectra to right and wrong structures and checked whether the scores for correct structures were visibly different from those for incorrect structures, since the DP5q score is only useful if it discriminates between right and wrong proposals.
We took the test set of 5000 molecules that the NMR prediction model had not seen (the Exp5k dataset from ref. 38) and assigned each spectrum to every appropriate structure using the algorithm developed by Lewis et al.44 This gave 5000 correct assignments and 5614 incorrect assignments in which the incorrect proposal had the same molecular formula as the correct structure.
The distributions of mean absolute errors and DP5q scores for these correct and incorrect assignments are shown in Fig. 6. While the distributions of mean absolute errors for correct and incorrect proposals overlap significantly, the distributions of DP5q scores are much more distinct, with scores for incorrect proposals concentrated around zero. This demonstrates that the DP5q score is a useful way to select correct proposals.
The two summary metrics can be combined to assess confidence in the structure interpretation (Fig. 7). We group neighbouring data points into bins and calculate the ratio of correct interpretations to the total number of interpretations within each bin. This ratio may be interpreted as the probability that a spectrum has been interpreted correctly, given the combination of mean absolute error and DP5q score.
Here, we clearly observe a steep decline in the DP5q score with increasing mean absolute error. This is to be expected, as correct interpretations of the spectra must match the data closely. Beyond a mean absolute error of 4 ppm, the DP5q score barely varies with increasing error. This is also expected, as structures with such high errors are almost certainly incorrect.
The graph also highlights a clear separation between correct and incorrect proposals. Correct proposals tend to cluster at higher DP5q scores and lower mean absolute errors, while incorrect proposals are more dispersed and skewed towards lower DP5q scores. This separation demonstrates the robustness of the DP5q metric in distinguishing correct structures from incorrect ones.
Combining the two metrics also helps us isolate high-confidence proposals more carefully. For example, proposals with low DP5q scores and low mean absolute errors are more likely to be correct than proposals with the same low DP5q scores and high mean absolute errors.
Additionally, the steep decline in DP5q scores for small increases in mean absolute error indicates that the method is highly sensitive to small deviations between predicted and experimental spectra. This sensitivity is advantageous for identifying subtle differences between closely related structures, making the DP5q approach particularly useful for challenging cases of structure elucidation.
In a typical structure revision case, a structure was proposed based on multiple methods, including several NMR experiments, mass spectrometry, UV-vis spectroscopy, and infrared spectroscopy. However, subsequent syntheses of the proposed structure gave spectra that did not match those of the authentic sample. New structures were then proposed and confirmed by de novo chemical synthesis or by comparison with spectra of known natural products.
We took 24 case studies and conducted DP5q analysis using single conformers optimised at the force-field level. In 22 cases, the score for the correct proposal was higher than the score for the incorrect proposals (Fig. 9). This is comparable to a simple comparison of mean absolute errors, in which the revised structure had a lower error than the original structure in 23 cases. We note that this is a very challenging test set: in each case, someone has already analysed the spectra in detail and published a conclusion that was later found to be incorrect. In more routine cases, where the method must distinguish automatically between large numbers of diverse structures, the results are likely to be even more reliable.
Fig. 9 DFT-free DP5q for structure reassignment case studies. In 22 cases out of 24, the correct structure had a higher score than the initial proposal.
This is a key result: we achieve DFT-level performance at neural-network cost. What previously took CPU months is now achievable in seconds. The only example that the model could not tackle was S13, which therefore merits a closer look (Fig. 10).
Fig. 10 For S13, the initial and revised proposals have drastically different molecular weights and formulae.
Here, we see that the originally proposed structure was a symmetric binaphthyl, while the revised structure lacks any symmetry elements. Furthermore, the structures have different numbers of distinct NMR environments, 11 and 13, respectively. Both proposals, however, have four non-labile proton environments and four non-quaternary carbon environments. If the spectra have a low signal-to-noise ratio, the other, less intense signals may be indistinguishable from the baseline. This shows that our approach depends on the correct interpretation of the NMR spectra: if signals are lost in the noise, an accurate analysis may not be possible. The molecular masses of the two proposals are also quite different. This highlights the importance of using several structure determination techniques together, so that spurious proposals based on a single modality are rejected by an orthogonal method and good proposals are corroborated. Addressing this issue is a topic of ongoing research.
Using the ETKDGv3 algorithm within RDKit as a quick conformational search tool improves the accuracy of our analysis, with higher scores predicted for the correct compound in all 24 cases (Fig. 11). A simple comparison of mean absolute errors, on the other hand, returned the correct result in 23 cases out of 24. This highlights the importance of a representative conformer ensemble for accurate prediction of NMR shifts, even when DFT spectrum calculations have been eliminated from the workflow.
The combination of DP5q scores and mean absolute errors (MAE) of the carbon spectra (Fig. 12) shows that the two are linked to some extent, as higher scores tend to correspond to lower errors. The distribution of correct and incorrect proposals resembles the one observed in the combinatorial studies (Fig. 7). There is a small overlap between the DP5q scores of correct and incorrect proposals, but the correct proposals still tend to have lower errors and higher DP5q scores. This is a good result and shows that the DP5q score is effective even for structurally complex molecules, such as natural products.
Fig. 13 Molecules used in the DP4-AI test set. AT3, TS3A, TS4 and NL1A had no corresponding carbon spectra and were, therefore, excluded from analysis.
We consider three scenarios with varying degrees of sophistication. In the first scenario, we use a single conformer, optimised at the force-field level. This is the fastest way to run DP5q analysis. In the second scenario, we use multiple conformers generated using a low-mode conformational search, with geometries optimised at the DFT level and energies recalculated using single-point DFT calculations. This is the most sophisticated way to run DP5q analysis, and its previous use for stereochemistry determination28 has shown it to be the most accurate. In the third scenario, we use multiple conformers generated using the ETKDGv3 algorithm,41 as provided with RDKit, and further optimised at the force-field level of theory.
To visualise the resulting DP5q scores, we divide them by the number of possible diastereomers. For example, if a molecule has 8 diastereomers, the plotted DP5q scores are divided by 8. This lets us show both the relative magnitudes and the absolute values of the DP5q probabilities.
Here, stereochemistry was determined correctly for 18 examples out of 42 (Fig. 15), with a 0.13% chance of obtaining a result this good at random. This is an encouraging improvement, consistent with the trend that more sophisticated treatment improves overall accuracy.
This approach may take a few minutes instead of a fraction of a second, and it determines the correct stereochemistry in 25 out of 42 cases, based on 1D ¹³C NMR data only, using no ¹H NMR or multidimensional spectroscopy (Fig. 16). This result has a 2.9 × 10⁻⁶% probability of occurring by chance. The method outperforms purpose-developed DP4 analysis at a similar level of theory,29 which identified 19 out of 42 cases correctly; the DFT-based DP5 approach, which found the correct diastereomer in 16 cases out of 42;29 and the CASCADE mean absolute error comparison, in which the lowest-error diastereomer was the correct one in 21 cases out of 42. We therefore recommend this approach for the comparison of stereoisomers and other very similar structures.
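One way to estimate how likely a given number of correct assignments is by chance is a Poisson binomial tail. If a random guess picks each of the di possible diastereomers of molecule i with equal probability, it is correct with probability 1/di, and the chance of at least k correct answers across the test set follows by dynamic programming over molecules. This is an illustrative assumption: the text does not state how the quoted probabilities were computed, and the diastereomer counts below are made up.

```python
def prob_at_least(k: int, probs: list[float]) -> float:
    """P(at least k successes) for independent trials with success probabilities probs."""
    dist = [1.0]  # dist[j] = probability of exactly j successes so far
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)  # this molecule guessed wrongly
            nxt[j + 1] += q * p      # this molecule guessed correctly
        dist = nxt
    return sum(dist[k:])


diastereomer_counts = [2, 4, 8, 4, 2]  # hypothetical example, not the DP4-AI test set
probs = [1.0 / d for d in diastereomer_counts]
print(prob_at_least(3, probs))  # chance of 3 or more correct by random guessing
```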
DP5q analysis performs well on the very complex structure revision case studies. Including a quick conformational search results in the correct proposal being scored higher in all 24 cases. DP5q analysis is also helpful for determining the stereochemistry of a diverse range of molecules, with the correct diastereomer selected in 25 cases out of 42, which exceeds the performance of DP4 analysis (15 examples correct) at a similar level of theory.28 Such performance makes DFT-free DP5q ideal for high-throughput NMR data curation. To aid such workflows, we recommend using ETKDGv3 conformer generation for best results and a DP5q score threshold of 0.2 to reliably discard incorrect structure proposals.
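For data curation, the recommended threshold translates into a simple filter. The sketch below assumes that proposals scoring below 0.2 are discarded and that the remainder are ranked by DP5q score; `triage` and the example scores are hypothetical.

```python
def triage(scores: dict[str, float], threshold: float = 0.2) -> list[str]:
    """Drop proposals with DP5q below threshold; rank the remainder, best first."""
    kept = {name: s for name, s in scores.items() if s >= threshold}
    return sorted(kept, key=kept.get, reverse=True)


print(triage({"proposal_A": 0.91, "proposal_B": 0.05, "proposal_C": 0.34}))
# ['proposal_A', 'proposal_C']
```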
This work sets a strong foundation for incorporating more sophisticated spectroscopic methods, including J-value analysis, the interpretation of 2D NMR spectra, and the analysis of mixtures. In addition, DP5q can be used as a scoring function for the many molecular optimisation methods available, potentially enabling automated structure revision.
The NMRdb dataset used in this study is available at https://github.com/ruslankotl/DP5.
Supplementary information (SI): data on the acceleration of DP5q analysis over DP5, a TMAP plot showing how predicted uncertainty varies with the chemical environment, and detailed outputs of DP5 analyses. See DOI: https://doi.org/10.1039/d5sc06988b.
This journal is © The Royal Society of Chemistry 2026