Reference-free NOE NMR analysis†

Nuclear Overhauser Effect (NOE) methods in NMR are an important tool for 3D structural analysis of small molecules. Quantitative NOE methods conventionally rely on reference distances, known distances that have to be spectrally separated and are not always available. Here we present a new method for evaluation and 3D structure selection that does not require a reference distance, instead utilizing structures optimized by molecular mechanics, enabling NOE evaluation even on molecules without suitable reference groups.


Introduction
The nuclear Overhauser effect or enhancement (NOE) is one of the most extensively and intensively used NMR parameters to determine molecular 3D structure. In contrast to chemical shift and J-coupling, which are effects of the electron shell, NOE is a relaxation process caused by a direct dipole-dipole interaction between nuclei.
In 1953, Overhauser predicted that microwave irradiation on electron transitions in certain metals would transfer magnetization from electrons to nuclei, enhancing nuclear spin polarization a thousandfold. 1 This was experimentally demonstrated in the same year by Carver and Slichter, 2,3 and the concept opened up the field of dynamic nuclear polarization (DNP). 2,3 An analogous enhancement between two nuclei, the nuclear Overhauser effect, was first observed by Solomon in 1955 and was explained as a so-called cross relaxation between nuclear spins in proximity. 4,5 From there, it took ten years for NOE to become appreciated as an analytical tool applied qualitatively to assignment and structure selection problems. [6][7][8] The quantitative interpretation of NOE intensities for measuring internuclear distances followed swiftly. [9][10][11] While initially frequency-selective 1D experiments had to be performed for each proton of interest, the two-dimensional NOE spectroscopy (NOESY) experiment provided a way to acquire complete NOE data in a single acquisition run. 12,13 The last gap in NOE applicability was medium-sized molecules, which have inherently weak NOE if molecular tumbling rates are close to the Larmor frequency. This gap was closed by the rotating frame Overhauser spectroscopy (ROESY) experiment (the creative name "cross-relaxation appropriate for minimolecules emulated by locked spins", CAMELSPIN, given by its inventors did not catch on). 14,15
Treating a two-proton system quantitatively, the integrated cross-peak volume in a 2D NOESY has an $r^{-6}$ dependence on the spatial separation between the nuclei. 9,10 In systems featuring more spins, relayed transfer can be observed. 16 This aspect of NOE, called spin diffusion, dominates relaxation in large molecules or at high Larmor frequencies, where it limits experimental parameters and complicates evaluation. 13,17
While spin diffusion is usually negligible in small molecules, NOE was used strictly in a semi-quantitative way for a long time due to other relaxation effects. This changed when Hu and Krishnamurthy revisited an old technique of peak normalization that they called "peak amplitude normalization for improved cross-relaxation correction" (PANIC). [18][19][20][21] It was elaborated extensively by Butts's research group, opening a new chapter of accurate and precise NOE-derived distances. 22,23 Integrated NOESY peak volumes after PANIC are
$$\frac{a_{AB}}{a_{AA}} = \beta \, r_{AB}^{-6}$$
where $a_{AB}$ corresponds to the integrated cross-peak volume, $a_{AA}$ to the diagonal volume, $r_{AB}$ is the internuclear distance, and $\beta$ is a proportionality factor. [18][19][20] The parameters making up $\beta$ are known and extensively discussed in the literature. 11,13 In essence, the prediction of $\beta$ is complicated because it depends on the local correlation time $\tau_C$, which in turn depends on global and local dynamics. As such, gross approximations are typically used.
Commonly, geminal protons in CH₂ groups or vicinal protons on double bonds or aromatic systems are used as the internal reference, since their distance varies little with conformational changes and typical distances are tabulated. 24 Looking at prominent recent works on NOE, the concept of an internal reference for absolute NOE distance measurement was applied with great success by the small molecule community. 23,25,26 For molecules without a suitable internal distance reference, or with overlapping reference signals of insufficient chemical shift separation, such analysis is still deemed impossible and not attempted.
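The conventional, reference-based calibration can be sketched in a few lines. This is a minimal sketch that assumes the two-spin relation in which the normalized intensity is proportional to r^-6 with the same proportionality factor for the reference pair and the pair of interest. The function name and all numerical values are illustrative assumptions, not data from this work.

```python
def distance_from_reference(eta_ab, eta_ref, r_ref):
    """Distance r_AB from normalized NOE intensities and one reference pair.

    Assumes eta = beta * r**-6 with a shared beta, which gives
    r_AB = r_ref * (eta_ref / eta_AB) ** (1/6).
    """
    if eta_ab <= 0 or eta_ref <= 0:
        raise ValueError("normalized intensities must be positive")
    return r_ref * (eta_ref / eta_ab) ** (1.0 / 6.0)


# Hypothetical numbers for illustration only.
r_ref = 1.78    # Angstrom, assumed geminal CH2 reference distance
eta_ref = 0.20  # assumed normalized intensity of the reference pair
eta_ab = 0.02   # assumed normalized intensity of the pair of interest

r_ab = distance_from_reference(eta_ab, eta_ref, r_ref)
print(f"r_AB = {r_ab:.2f} Angstrom")
```

Because of the sixth root, a tenfold drop in intensity corresponds to only a modest increase in distance, which is why the choice of reference distance acts as a global scaling factor.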
To improve the accuracy of the reference distance, the structure in question is nowadays often generated in silico and optimized by molecular mechanics or DFT. In this process, it became apparent that in many cases the reference atoms at close distance show different relaxation behavior than the other, more distant, atom pairs in the rest of the molecule. An elegant approach to improve the clarity of structure selection is therefore to adjust the reference distance to better match the whole molecule. 27,28 While the concept of such a reference was never abandoned, we think this approach holds the key to NOE analysis without that reference, as will be shown below.

Results and discussion
We propose that the proportionality factor $\beta$ can be determined on the fly from a simple fitting procedure using the distances from the (in silico generated) molecular structure and the experimental NOE data. The fit minimizes a quality criterion such as the sum of squared errors over the whole molecule, without the need for an internal reference.
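The fitting idea can be illustrated with a minimal sketch. It assumes that the quality criterion is the sum of squared errors in intensity space, i.e. the sum over all pairs of (intensity - beta * r^-6)^2, for which the optimal proportionality factor has a closed form. The exact error function used in this work may differ; the function names and the synthetic data below are hypothetical.

```python
def fit_beta(intensities, distances):
    """Least-squares proportionality factor for intensity = beta * r**-6.

    Minimizing sum((eta_i - beta * x_i)**2) with x_i = r_i**-6 gives
    beta = sum(eta_i * x_i) / sum(x_i**2).
    """
    x = [r ** -6 for r in distances]
    numerator = sum(eta * xi for eta, xi in zip(intensities, x))
    denominator = sum(xi * xi for xi in x)
    return numerator / denominator


def sum_squared_errors(intensities, distances, beta):
    """Residual sum of squares of the fitted model for one trial structure."""
    return sum((eta - beta * r ** -6) ** 2
               for eta, r in zip(intensities, distances))


# Synthetic example: intensities generated from a known beta (assumed value).
true_distances = [2.2, 2.5, 3.0, 3.6]              # Angstrom, hypothetical
intensities = [50.0 * r ** -6 for r in true_distances]

beta = fit_beta(intensities, true_distances)
sse_correct = sum_squared_errors(intensities, true_distances, beta)

# A trial structure with the same distances assigned to the wrong pairs.
wrong_distances = [3.0, 2.2, 3.6, 2.5]
beta_wrong = fit_beta(intensities, wrong_distances)
sse_wrong = sum_squared_errors(intensities, wrong_distances, beta_wrong)

print(beta, sse_correct, sse_wrong)
```

Each trial structure receives its own fitted factor, so no data point has to be singled out as a reference; structures are then compared by the residual error that remains after the fit.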
Scheme 1 Strychnine (1), the correct configuration is labelled RSSRRS in this work. Conicasterol F (2), the correct configuration is the 8,14-α-epoxide as shown. Artemisinin (3), the correct configuration is labelled RSRSRR. Taxol (4) configuration selection was focused on the taxane skeleton, not on the highly flexible side chain.
To showcase the new method, a 2D-NOESY experiment with 500 ms mixing time was acquired on strychnine (1) in CDCl₃. 29 A total of 21 integrated intensities were obtained and normalized with respect to their diagonal signal intensity. 21,30 Among the peaks are 4 intra-CH₂ signals that could conventionally be used as reference and 17 cross-peaks between more distant protons that hold structural information. Molecular mechanics-generated structures (using MMFFs) of strychnine and 12 energetically feasible diastereoisomers were used for comparison and structure selection (see the ESI † for more details). Each isomer is labeled using the stereochemical descriptors R/S for each chiral carbon in the order C-7, C-8, C-12, C-13, C-14 and C-16. For example, the correct absolute configuration of strychnine is 7R,8S,12S,13R,14R,16S, shown in Scheme 1, hence it is labeled as RSSRRS. The same labelling scheme is used for all molecules in this study.
The concept of reference-free NOE fitting is displayed in Fig. 1. No data point is explicitly defined as reference; instead, for each trial structure, all data are combined and the error is minimized according to eqn (3). The resulting curve (dashed line) represents a model that allows calculation of distances from intensities if desired, like a conventional reference would. For clarity and space reasons, only the fits to (A, B) the correct configuration of strychnine, RSSRRS, (C, D) its epimer at C-12, RSRRRS, and (E, F) isomer RSRSSR are displayed, corresponding to best, second-best, and poor fitting quality, respectively. Fig. S1 of the ESI † shows the fitting for all diastereoisomers of the ensemble.
The data displayed in Fig. 1 show that the distances between geminal protons, i.e. intra-CH₂ distances, hardly change with configuration and conformation, which is why they are conventionally used as references. Their integrated intensities are about one order of magnitude higher than all other intensities and show only small variation. All other NOE intensities and distances vary strongly and this is where the structural discriminatory power lies.
Due to their large intensity, intra-CH₂ data dominate this fit, but depending on molecule and spectral quality, they may not be available. To mimic a molecule without any usable CH₂ intensities, these data points were excluded in the next step and the reference-free fit was performed on this reduced dataset (filled black points in Fig. 1). The resulting curves (solid lines) are more spread for the different structures, but the discriminatory power is retained, as will be shown below. Fig. 1 also shows that for correct (but also incorrect) structures, the curves obtained from the fitting procedure lie within the expected range of conventional references. This enables absolute interpretation of NOE integrals, i.e. distance measurement, for all kinds of molecules, even those without appropriate reference proton pairs.
Different from strychnine, the next three examples contain methyl groups, which undergo internal rotations. These motions require averaging of proton-proton distances, depending on rotation rates compared to molecular tumbling. Even in small molecules, methyl group rotation is fast compared to molecular tumbling and distances should be averaged as $\langle r^{-3} \rangle$ instead of $\langle r^{-6} \rangle$, as nicely described in the seminal article by Koning, Boelens, and Kaptein in 1992. 31
Going from the reference-free fitting procedure to structure selection, Fig. 2 shows the result of trial structures evaluated using different scoring functions, RMSD or $\chi^2$ and an $R^2$-like score. The fourth configuration, RSSRRS, corresponds to the correct structure, as confirmed by all scoring functions. While RMSD identifies the correct structure (Fig. 2A), it is not always the best measure of model performance. A good way to convey agreement between data and model is $R^2$, the so-called coefficient of determination. It reflects how large the spread of the data is and how well the data follow the model. Data that poorly follow the model can show negative $R^2$ values for non-linear functions, as in the current method. Since intra-CH₂ distances have almost no discriminatory power but would dominate $R^2$, a different score, $R^2_{\mathrm{red}}$, is defined as $R^2$ excluding the data points corresponding to intra-CH₂ pairs (which still contribute to the fit). As can be seen in Fig. 2B, as with RMSD, the parameter $R^2_{\mathrm{red}}$ selects the correct structure (highest value). As a bonus, the difference between correct and incorrect structures is visually well pronounced.
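The two scores can be sketched as follows. This is a minimal sketch that assumes the textbook definition of the coefficient of determination, one minus the ratio of the residual to the total sum of squares, and assumes that the reduced score simply evaluates this expression on the points that are not intra-CH2 pairs. The function names and numbers are hypothetical, and the exact definition used in this work may differ in detail.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    The value is 1 for a perfect model and can become negative when the
    model describes the data worse than their mean does.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def r_squared_reduced(observed, predicted, is_geminal):
    """R^2 evaluated only on points that are not intra-CH2 (geminal) pairs.

    The predictions may still come from a fit that used all points.
    """
    keep = [i for i, geminal in enumerate(is_geminal) if not geminal]
    return r_squared([observed[i] for i in keep],
                     [predicted[i] for i in keep])


# Hypothetical normalized intensities; the first point is an intra-CH2 pair.
observed = [0.200, 0.020, 0.010, 0.004]
is_geminal = [True, False, False, False]

good_model = [0.200, 0.020, 0.010, 0.004]   # reproduces the data exactly
poor_model = [0.200, 0.004, 0.020, 0.010]   # right only for the CH2 pair

print(r_squared_reduced(observed, good_model, is_geminal))
print(r_squared(observed, poor_model))
print(r_squared_reduced(observed, poor_model, is_geminal))
```

In this toy case the poor model still reaches a high score when the intense intra-CH2 point is included, while the reduced score turns negative, which illustrates why the dominant geminal points are left out of the score.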
The selection can also be performed for the case without intra-CH₂ data, as mentioned before, corresponding to filled circles and solid lines in Fig. 1. Fig. 2C shows the results of this truly reference-free selection; since intra-CH₂ data is excluded completely, $R^2_{\mathrm{red}} = R^2$. The results are almost as clear as in the previous case and again, the correct structure is selected. Inclusion of intra-CH₂ data can be advantageous in some cases, but as intra-CH₂ NOEs often deviate from the behavior of all other proton interactions, the last scoring method, $R^2$ without intra-CH₂ data points, is the most applicable method and will be used in the following examples.
The method presented here requires prior determination of constitution and assignment of all signals. Accordingly, it is shown here for complete assignment of all protons, as is available for strychnine. 30 Of course, full assignment is not always available; for example, assignment of diastereotopic CH₂ groups often requires knowledge of relative configuration and conformation.
To show viability even without such knowledge, a permutation approach can be utilized to obtain the full assignment alongside the structure selection. 32,33 Strychnine contains six CH₂ groups, but the C17-methylene shows insufficient proton chemical shift separation to be used, leaving five groups and therefore 2⁵ = 32 ways to assign geminal proton pairs. Each trial structure was therefore evaluated for all 32 combinations and the individual best assignments were used. For the correct structure, these scores are unchanged, confirming the assignment. Incorrect structures can reach better scores by alternative assignment, lowering the separation between the correct and the best incorrect structure. However, as the unfilled bars in Fig. 2 show, the correct structure is still selected by all scoring functions.
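The permutation over geminal assignments can be sketched as a brute-force loop. This is a minimal sketch, assuming that each diastereotopic pair is either kept or swapped and that a trial structure is scored by the residual error of a least-squares fit of intensity = beta * r^-6. The proton labels, distances, and data layout are hypothetical and chosen only for illustration.

```python
from itertools import product


def fit_sse(pairs):
    """Residual sum of squares after fitting beta to (intensity, distance) pairs."""
    x = [r ** -6 for _, r in pairs]
    y = [eta for eta, _ in pairs]
    beta = sum(a * b for a, b in zip(y, x)) / sum(b * b for b in x)
    return sum((a - beta * b) ** 2 for a, b in zip(y, x))


def best_assignment(noes, distances, geminal_pairs):
    """Try every keep/swap combination of the geminal pairs for one structure.

    noes:          {(label1, label2): normalized intensity}
    distances:     {frozenset({label1, label2}): distance in the structure}
    geminal_pairs: list of (labelA, labelB) with ambiguous assignment
    Returns (lowest residual, tuple of swap flags).
    """
    best = None
    for swaps in product((False, True), repeat=len(geminal_pairs)):
        relabel = {}
        for (a, b), swap in zip(geminal_pairs, swaps):
            if swap:
                relabel[a], relabel[b] = b, a
        pairs = [
            (eta, distances[frozenset((relabel.get(p, p), relabel.get(q, q)))])
            for (p, q), eta in noes.items()
        ]
        sse = fit_sse(pairs)
        if best is None or sse < best[0]:
            best = (sse, swaps)
    return best


# Hypothetical toy system with one ambiguous CH2 group (H11a/H11b).
distances = {
    frozenset(("H8", "H11a")): 2.3,
    frozenset(("H8", "H11b")): 3.4,
    frozenset(("H8", "H12")): 2.8,
}
noes = {pair: distances[frozenset(pair)] ** -6
        for pair in (("H8", "H11a"), ("H8", "H11b"), ("H8", "H12"))}

sse, swaps = best_assignment(noes, distances, [("H11a", "H11b")])
print(sse, swaps)
```

With five ambiguous CH2 groups the loop visits 2^5 = 32 combinations per trial structure, which remains cheap because each evaluation is a closed-form fit.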
The reference-free method was then applied to conicasterol F (2), a marine natural product, to reproduce the determination of relative configuration without hydrogen atoms attached to the carbon atoms in question. 34 In the original study, the molecule was determined by 2D-NMR methods to be either the 8,14-α-epoxide or the 8,14-β-epoxide isomer. Quantitative 1D-ROE data was interpreted using geminal vinylic protons as reference, and a subset of key distances that differ significantly between the two isomers was selected. The fit between ROE-derived distances and DFT-calculated structures identified the 8,14-α form as the correct configuration of conicasterol F, which was confirmed by ¹³C chemical shift predictions.
Using the ROE data and coordinates provided in the study, the reference-free analysis results are essentially the same for the refined subset of ROE intensities as well as for the unrefined full dataset, as can be seen in Fig. 3. The configuration selection is reproduced effortlessly and the 8,14-α-epoxide is selected.
Next we applied the method to the important anti-malaria drug artemisinin (3). 35,36 Artemisinin has 7 stereogenic centers and therefore 64 configurations or 32 pairs of enantiomers. The correct absolute configuration of artemisinin is 1S,4R,5S,6R,7S,10R,11R (see Scheme 1) and represented here as SRSRSRR. It is more flexible than strychnine (1) and several configurations have more than one conformation at energies relevant at room temperature. The generation of stereoisomers was followed by a conformational search with 3 kcal mol⁻¹ cutoff energy and removal of redundant conformations (see the ESI † for more details). A total of 12 integrated NOE intensities, including 2 intra-CH₂ integrals, were obtained from a 2D-NOESY experiment and normalized with respect to their diagonal signal intensity. ¹H signal assignment was taken from literature. 37 All conformations then underwent NOE analysis excluding the two intra-CH₂ data points, and the highest scoring conformation was used to represent its configuration in the selection.
The resulting scores for all configurations of 3 are shown in Fig. 4. The correct structure of artemisinin is selected with $R^2 = 0.84$, with a second-best fit for the epimer at C-4 with $R^2 = 0.24$; all other configurations show negative scores. The fit for the correct structure is also shown in Fig. 4. The curve shows that in this case, the proportionality constant derived from the fitting procedure lies well within the calibration achieved by using one of the two conventional reference signals. This means that distance measurements based on the reference-free method are as accurate as distance measurements based on a conventional reference.
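The step of letting the highest scoring conformation represent its configuration can be sketched as a small ranking routine. This is a minimal sketch; the function name and the data layout are assumptions. Only the best scores 0.84 and 0.24 are taken from the text, while all other per-conformer values are invented placeholders.

```python
def rank_configurations(conformer_scores):
    """Rank configurations by the best score among their conformers.

    conformer_scores: {configuration label: [score of each conformer]}
    Returns a list of (label, best score), highest score first.
    """
    best_per_configuration = {
        label: max(scores) for label, scores in conformer_scores.items()
    }
    return sorted(best_per_configuration.items(),
                  key=lambda item: item[1], reverse=True)


# Placeholder per-conformer scores; only 0.84 and 0.24 appear in the text.
conformer_scores = {
    "correct configuration": [0.84, 0.31],
    "C-4 epimer": [0.24, -0.10, 0.05],
    "other configuration": [-0.45, -1.20],
}

ranking = rank_configurations(conformer_scores)
for label, score in ranking:
    print(f"{label}: {score:.2f}")
```

Taking the maximum is the simplest possible choice and treats every conformer within the energy cutoff as equally admissible; population-weighted averaging would be an alternative when populations are known.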
Paclitaxel (4), an important chemotherapy medication sold under the brand name Taxol among others and historically referred to by that name, was chosen as a fourth example. [38][39][40] Taxol has a total of 11 stereocenters, corresponding to 1024 pairs of enantiomers, and most configurations have a large conformational space of more than 10 conformations, due to the high flexibility of the side chain at C-13. Analyzing all possible structures, as well as performing a conformational analysis, is beyond the scope of this publication. Instead, a set of 8 diastereoisomers derived from the correct configuration were generated by inverting key stereocenters of the taxane skeleton. This was followed by a conformational search with 3 kcal mol⁻¹ cutoff energy and removal of redundant conformations (see ESI † for more details). ¹H signal assignment was taken from literature. 41 For this example, the application of the reference-free NOE method was focused on the selection of configuration of the taxane skeleton, leaving the NOEs involving the hydrogens of the side chain out of the fitting procedure. Using a total of 53 NOE enhancements, all conformations underwent the reference-free NOE analysis to be scored, and the highest conformer score was used to represent each configuration. The result of the selection is shown in Fig. 5, and again, the correct configuration is selected.
Fig. 3 Relative configuration determination of 2, conicasterol F with 8,14-α configuration (left) and 8,14-β isomer (right) in linear (top) and logarithmic plot (bottom). 1D-ROE data and coordinates were taken from a reference. 34 The authors analysed the two calculated structures and identified a subset of distances that change between the isomers to improve their selection (● reduced data). In the reference-free selection both the reduced (●) and the full subset (● and ○) show essentially the same fit results and score. Conventional and reference-free analysis select 8,14-α as the correct structure.
This journal is © The Royal Society of Chemistry 2020. Chem. Sci., 2020, 11, 9930-9936

Conclusions
NOE is a very interesting and unique method for 3D analysis of molecules. In spite of well-known problems concerning conformational exchange and internal motions like methyl group rotation, and the possibility of additional, indirect, transfer pathways (spin diffusion), the simple two-spin approximation, assuming intensity proportional to $r^{-6}$, has been used successfully for a long time for small molecules.
The internal reference distance that is almost always used for calibration, however, is nothing but a global scaling factor of distances for the whole molecule and might not even match the distances it is supposed to measure. When computer-generated structures are available, a fit can be performed instead to provide a calibration for distance measurement. For structure selection within a limited pool of configurations or conformations, this demonstrably does not limit the discriminatory power. Of course, the method requires a suitable pool of trial structures and can provide false positive results if the correct structure is not included.
Since this is a fitting procedure similar to the way RDCs and RCSAs are analyzed, it is aimed at structure selection and not at a de novo determination of 3D structure from molecular constitution. If the correct configuration is known, however, the method is also a powerful tool to calculate absolute distances even in the absence of a calibration reference. The analysis is not limited to NOESY spectra and can similarly be applied to 1D-NOE data or rotating frame Overhauser experiments (ROE/ROESY).
For a rigid molecule like strychnine, a single conformation is sufficient for selection.
Multi-conformational averaging can be a straightforward extension if the populations are known, for example from energy levels and Boltzmann distribution. 42 Similarly, conformation analysis can be achieved by varying populations to match experimental data. 43 We expect a single fit for all conformations, corresponding to a single correlation time $\tau_C$, to be sufficiently accurate for configuration selection as long as the exchange rate is faster than the relaxation rates. 24
The method shows a new perspective of NOE analysis and structure selection that is reference-free and therefore can be applied to all molecules, even those without any appropriate geminal or vicinal aromatic (or cis double bond) proton pairs. For these molecules, NOE data can now be used for computer-assisted configuration selection in a multi NMR parameter protocol alongside residual dipolar couplings (RDCs), residual chemical shift anisotropies (RCSAs), chemical shifts, and J-couplings. 37,44,45 It is an excellent choice to be applied in fields such as peptide and carbohydrate research, organic synthesis, natural products, and medicinal chemistry.
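The multi-conformer extension mentioned above can be sketched as follows. This is a minimal sketch that assumes populations follow a Boltzmann distribution over relative conformer energies and that the effective distance is obtained from the population-weighted average of r^-6 (or r^-3, selectable through the `power` argument). The function names, energies, and distances are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal mol^-1 K^-1


def boltzmann_populations(energies_kcal, temperature=298.15):
    """Fractional populations from conformer energies in kcal/mol."""
    e_min = min(energies_kcal)
    weights = [math.exp(-(e - e_min) / (R_KCAL * temperature))
               for e in energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


def averaged_distance(distances, populations, power=6):
    """Effective distance from the population-weighted mean of r**-power."""
    mean = sum(p * r ** -power for p, r in zip(populations, distances))
    return mean ** (-1.0 / power)


# Hypothetical two-conformer example.
energies = [0.0, 1.0]    # kcal/mol relative energies (assumed)
distances = [2.4, 3.8]   # Angstrom, same proton pair in each conformer

populations = boltzmann_populations(energies)
r_eff = averaged_distance(distances, populations)
print(populations, r_eff)
```

Because short distances dominate the r^-6 average, the effective distance lies much closer to the shorter contact than the arithmetic mean would suggest, even when that conformer is not the only one populated.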

Conflicts of interest
There are no conflicts to declare.