Benchmark fragment-based 1 H , 13 C , 15 N and 17 O chemical shift predictions in molecular crystals †

The performance of fragment-based ab initio 1H, 13C, 15N and 17O chemical shift predictions is assessed against experimental NMR chemical shift data in four benchmark sets of molecular crystals. Employing a variety of commonly used density functionals (PBE0, B3LYP, TPSSh, OPBE, PBE, TPSS), we explore the relative performance of cluster, two-body fragment, and combined cluster/fragment models. The hybrid density functionals (PBE0, B3LYP and TPSSh) generally out-perform their generalized gradient approximation (GGA)-based counterparts. 1H, 13C, 15N, and 17O isotropic chemical shifts can be predicted with root-mean-square errors of 0.3, 1.5, 4.2, and 9.8 ppm, respectively, using a computationally inexpensive electrostatically embedded two-body PBE0 fragment model. Oxygen chemical shieldings prove particularly sensitive to local many-body effects, and using a combined cluster/fragment model instead of the simple two-body fragment model decreases the root-mean-square errors to 7.6 ppm. These fragmentbased model errors compare favorably with GIPAW PBE ones of 0.4, 2.2, 5.4, and 7.2 ppm for the same 1H, 13C, 15N, and 17O test sets. Using these benchmark calculations, a set of recommended linear regression parameters for mapping between calculated chemical shieldings and observed chemical shifts are provided and their robustness assessed using statistical crossvalidation. We demonstrate the utility of these approaches and the reported scaling parameters on applications to 9-tertbutyl anthracene, several histidine co-crystals, benzoic acid and the Cnitrosoarene SnCl2(CH3)2(NODMA)2.


Introduction
Hydrogen, carbon, nitrogen and oxygen-containing functional groups play a central role in the structures, chemical reactivity, and solubility of biological and pharmaceutical compounds. 1 Advances in instrumentation and methodology over recent decades have made nuclear magnetic resonance (NMR) spectroscopy a particularly potent tool for investigating structural features associated with 1 H, 13 C, 15 N, and 17 O nuclei. However, even with modern multi-dimensional NMR experiments, structure elucidation can prove challenging due to the complexity of the spectra and the subtle effects of chemical environment on the chemical shifts.
Computational tools play an increasingly prominent role in NMR spectral assignment and structure elucidation. Early ab initio chemical shielding prediction began with small cluster models, often employing simple charge-embedding schemes to mimic the crystal environment. 2,3 The inherent limitations of such models resulting from their approximate treatment of the crystal lattice limited their widespread application. On the other hand, periodic density functional theory (DFT) is wellsuited for modeling chemical shieldings in extended crystal systems, and the plane wave DFT-based gauge-including projector augmented wave (GIPAW) method [4][5][6] has now become the method of choice for chemical shift prediction for molecular crystals.
The success of the GIPAW technique has contributed significantly to the rapidly expanding field of NMR crystallography, which combines solid state NMR, diffraction methods and ab initio chemical shielding predictions to solve crystal structures. [7][8][9][10][11] However, despite the widespread success of plane wave DFT methods, they suffer from two main limitations. First, plane wave calculations are limited to GGA-type density functionals in practice. Hybrid density functionals can offer improved accuracy for NMR chemical shift prediction, [12][13][14] but they typically require at least an order of magnitude more computational effort to evaluate in a plane wave basis compared to a GGA functional. In contrast, the cost premium for hybrid density functionals in Gaussian basis sets is typically less than a factor of two.
The second limitation lies in the mapping of absolute shifts obtained from calculations to empirically determined chemical shifts. This mapping is generally performed using a simple linear regression model relating experimental and chemical shifts. However, the linear regression parameters are specific to a given functional/basis set combination. 15 Regression models obtained from plane wave/pseudopotential GIPAW calculations in periodic crystals are not transferable to chemical shieldings computed from all-electron models in non-periodic systems, such as an enzyme active site.
By decomposing the molecular crystal into a series of interacting molecules, fragment methods provide an accurate, lowcost alternative to plane wave techniques for computing a variety of chemical properties, including NMR chemical shieldings. 12,[16][17][18][19][20] Fragment methods pave the way for routine use of hybrid density functionals or perhaps even a high-accuracy wave function-based correlation treatment of magnetic properties. Further, fragment methods allow the same density functionals, basis sets and empirically derived scaling parameters to be applied across systems ranging from molecular crystals to molecules in solution or even biomolecules.
We have recently shown that both GIPAW and an electrostatically embedded two-body fragment model reproduce the experimental 13 C isotropic shifts in a set of 25 organic molecular crystals to within a root mean square error of 2.1-2.2 ppm when using the PBE functional. 13 However, using a hybrid density functional like PBE0 21 or B3LYP 22 in the fragment model instead decreases the error by a third to 1.4 ppm. 13 The same study also found good performance for the principal components of the shielding tensors. Furthermore, the chemical shielding scaling parameters obtained here have been successfully applied to the NMR characterization of the 2-aminophenol quininoid intermediate in the mechanism of tryptophan synthase. 23 Building upon the success of fragment-based methods in the context of 13 C chemical shift predictions, 13 the present work explores the application of fragment, cluster, and combined cluster/fragment approaches to predicting the isotropic chemical shifts of 1 H, 15 N and 17 O nuclei, and it compares the performance of these models to the widely used GIPAW approach. For the sake of consistency, we also slightly revise the earlier 13 C chemical shift test results 13 here using the identical geometry optimization protocol and sets of density functionals across all four nuclei. These four nuclei were chosen because of their ubiquity in organic and biological systems and their importance in NMR studies of these systems. Their widespread use in NMR also means that relatively large sets of experimental shifts could be drawn from studies found in the literature. Because reliable shielding tensor data is harder to find for these nuclei, we focus only on isotropic shifts in this work.
Previous work has shown that the many-body expansion converges more slowly for 15 N and 17 O compared with 1 H and 13 C. 12 It is particularly important, therefore, to assess the viability of fragment methods for these nuclei. To do so, we have compiled three new benchmark sets of molecular crystals consisting of 80 1 H, 51 15 N, and 28 17 O experimentally measured isotropic chemical shifts. These benchmark sets augment our previously developed 13 C set consisting of 169 shifts. These new sets include diverse chemical shifts that span the ranges observed in most common biological and pharmaceutical species. 24,25 The use of molecular crystals with well-defined and largely static structures helps mitigate the influence of confounding variables such as solvation or conformational dynamics. It enables more direct assessment of fragment-based methods and produces linear regression parameters that properly account for an explicit chemical environment. Because common benchmark data sets are extremely useful for validating and comparing models developed by different researchers, complete optimized crystal structures and tabulated chemical shifts for these test sets are provided as ESI. † Developing a robust set of chemical shielding scaling parameters for 1 H, 15 N, and 17 O nuclei is challenging. 15 N and 17 O exhibit broad chemical versatility, and the shifts of all three nuclei types can be affected by proton dynamics, particularly in hydrogen bonding situations. Nitrogen, for example, bonds with most other elements, has oxidation numbers ranging from À3 to +5, coordination numbers from 1 to 6, and bond orders up to 3. 24 Such diversity manifests in a chemical shift range spanning B1100 ppm. An even greater diversity in chemical shifts is observed for oxygen, with the chemical shifts for water and dioxygen separated by nearly B2000 ppm. 25 However, in the context of organic molecular crystals and biologically relevant applications, these ranges span B400 and B1000 ppm for 15 N and 17 O, respectively. 26 Proton isotropic shifts are generally limited to a more modest B20 ppm range. On the other hand, hydrogen is much more susceptible to dynamics such as methyl group rotation or rapid proton exchange between hydrogenbonded partners or even nuclear quantum effects. In all cases, care must be taken to adequately represent the desired range of chemical shieldings and to avoid errors introduced in the calculations by neglecting the dynamical averaging occurring in the NMR experiments.
Despite efforts to include a wide variety of chemical environments in each molecular crystal test set, biases resulting from the composition of the test set could impact both the reported accuracy of a given method and the general transferability of the scaling parameters. To assess the degree to which the composition of the training and validation sets impact performance and transferability, a series of statistical cross-validation numerical experiments were performed. In agreement with cross-validation results obtained previously for 13 C chemical shifts, 13 the analysis here reveals relatively little impact of training/validation set composition on both overall accuracy or scaling parameters.
Using these benchmark sets, we examine the impact of electrostatic embedding, two-body fragment and cluster cut-off distance, and the choice of density functional on the chemical shift predictions. Specifically, we compare the hybrid density functionals PBE0 and B3LYP, the GGA-based functionals PBE 27 and OPBE, 28 as well as the meta-GGA functional TPSS and its hybrid variant TPSSh. 29 The meta-GGA and hybrid meta-GGA functionals performed well for 15 N chemical shift tensors in a recent study using symmetry-adapted cluster calculations. 14 Although this work primarily focuses on the fragment-based approaches, we have also benchmarked GIPAW across the same test sets for comparison. Despite its widespread use for chemical shift prediction, extensive GIPAW benchmarks for nuclei like hydrogen and oxygen are hard to find. The test sets developed here provide scaling parameters for mapping GIPAW chemical shieldings for these four nuclei to chemical shifts, which may prove useful in other applications.
Through these benchmarks, we demonstrate that with the PBE functional, the fragment-type approaches perform competitively with plane wave GIPAW, especially for hydrogen, carbon, and nitrogen. As we found in our previous study of 13 C chemical shifts, 13 however, hybrid density functionals provide statistically improved accuracy over GGAs. In particular, fragment PBE0 calculations out-perform GIPAW PBE ones in terms of the root-mean square (rms) errors and the maximal errors for 1 H, 13 C, and 15 N. Oxygen chemical shifts, which are more sensitive to many-body effects, prove more difficult to model with the fragment approach. Nevertheless, the 17 O cluster/ fragment PBE0 results are almost as accurate as the GIPAW PBE ones, with the differences in root-mean-square error between the two functionals being comparable to typical experimental uncertainties in 17 O chemical shifts. Given the high computational expense associated with using hybrid functionals in plane wave GIPAW calculations, these results provide further support for using fragment-based approaches instead of GIPAW methods for NMR chemical shielding predictions in organic molecular crystals.
Finally, we provide a collection of illustrative examples demonstrating the utility of fragment-based methods coupled with the chemical shielding scaling parameters derived from these training sets. First, we help assign the previously unpublished 1 H/ 13 C heteronuclear correlation spectrum for the 9-tert-butyl anthracene ester (9-TBAE) molecular crystal. Second, the prediction of both 1 H and 15 N isotropic shieldings for a particularly challenging set of histidine co-crystals is assessed. Third, we briefly examine issues of proton exchange dynamics using 17 O chemical shielding predictions applied to crystalline benzoic acid. Finally, we assess the accuracy of predicted 17 O chemical shieldings for a challenging organometallic C-nitrosoarene whose experimental chemical shift 30 lies far outside the range of oxygen chemical shifts included in the test set.

Many-body expansion for ab initio shielding tensor calculations
The chemical shielding tensor describes the screening of the external magnetic field experienced at the nucleus by the surrounding electron density. This change in local magnetic field at the nucleus results directly from interaction between the external magnetic field and the local electron density. Formally, the NMR chemical shielding tensor element r ab for atom A is defined as the second derivative of the electronic energy with respect to the a-th component of the external magnetic field B a and the b-th component of the nuclear magnetic moment on atom A, m A b : As discussed previously, 12 differentiating the many-body decomposition of the energy allows one to express the total chemical shielding tensors of atom A on molecule i in a crystal as a sum of one-body, two-body, and higher-order terms, where r A i is the shielding tensor for atom A on the isolated monomer i, D 2 r A ij is the two-body contribution to the shielding tensor arising from the interaction of monomer i with monomer j, and D 3 r A ijk is the three-body contribution to the shielding tensor and is defined as Note that because atom A lies on monomer i and not monomer j or k, terms like r A j and r A k in eqn (3) and (4) are actually zero.

Fragment and cluster methods
For each atom on each symmetrically unique molecule in the unit cell, the summations in eqn (2) are carried out over all unique sets of fragments (dimers, trimers, etc.). Evaluation of the one-and two-body contributions can be done inexpensively, but the cost of the three-body and higher terms grows rapidly, giving rise to a significantly larger computational burden if contributions beyond two-body are included. However, assessment of the many-body expansion for 13 C chemical shielding tensors has demonstrated that the contributions from threebody and higher terms are small relative to the inherent errors from density functional theory, especially if electrostatic embedding models are employed to mimic the crystal lattice. 12,13 Accordingly, a two-body fragment model approximates the shielding tensor as where r A,emb.
i and D 2 r A,emb.
ij are the one and two-body contributions with each monomer and dimer calculation carried out in an electrostatic embedding environment which will be discussed below.
A two-body fragment-based calculation in a molecular crystal is carried out by defining a cut off radius R c around the asymmetric unit, as illustrated in Fig. 1. The chemical shielding tensor for each atom in the asymmetric unit is then approximated by calculating the two-body contributions for all dimers involving that monomer in the asymmetric unit and any other monomer lying within the defined cut off according to eqn (5). For example, using the labeling and cut-off defined in Fig. 1, the total chemical shielding tensor for atom A on monomer i is given by: þD 2 r A;emb: By focusing these many-body expansions on molecules in the asymmetric unit, the fragment approach readily exploits space group symmetry to achieve additional computational savings. Nevertheless, it still computes chemical shielding tensors for every symmetrically unique atom in the crystal. Further computational savings can be achieved by exploiting locally dense basis sets, which use smaller basis sets on all atoms outside the molecules in the asymmetric unit, as described in Section 3.2. If a two-body fragment method proves insufficient, a clusterbased calculation can be constructed by including all monomers within the defined cut off radius R c in a single supermolecular chemical shielding calculation. This effectively amounts to summing the many-body expansion through all orders for the subset of molecules lying closest to the molecule of interest. Additional chemical shielding contributions from more distant molecules in the crystal lattice can then be approximated in a pairwise (twobody) fashion. Specifically, one computes the two-body contributions between the molecule of interest and any other molecule outside the initial cluster but inside a second two-body fragment cut off (orange region in Fig. 1). Cluster-based calculations improve the treatment of local many-body effects, albeit with a substantial increase in computational cost since DFT methods typically scale O(N 3 ) with system size for moderately sized systems.
Electrostatic embedding provides a secondary strategy for approximating both long-range and many-body effects which can be beneficial for both cluster and fragment-based calculations. In the present work, we use a simple electrostatic embedding model based upon point charges obtained from Gaussian distributed multipole analysis (GDMA). [31][32][33] GDMA point charges are positioned at each atomic center for every monomer within a user-defined charge-embedding radius of molecule i in the central unit cell. The presence of these point charges mimics many-body effects by polarizing the monomer and dimer fragment calculations. Long-range electrostatic effects are captured by extending the charge embedding cut off well into the surrounding lattice. Previous work has demonstrated that a 30 Å chargeembedding cut off ensures convergence in the calculated shieldings, 12 and that cut off is used here as well.

Computational efficiency
For a typical small-molecule organic crystal, the computational efficiency of GIPAW PBE and fragment 2-body PBE are similar. Our software implementation automatically fragments the crystal, generating the necessary jobs which can then be run with an external electronic structure package, such as Gaussian. The software then combines the results of those fragment jobs into the final set of shielding tensors. For a crystal with a single molecule in the asymmetric unit (Z 0 = 1) and a 6 Å two-body cut off, the fragmentation procedure generates one monomer calculation and typically B20-25 dimer calculations.
Perfect computational efficiency comparisons between GIPAW and the fragment approach are difficult due to differences in the basis sets, grids, etc. Nevertheless, for a typical small-molecule organic crystal with a handful of molecules in the unit cell, twobody fragment PBE and GIPAW PBE require similar amounts of total CPU time. Several computational features of the fragment method are notable, however. First, the cost premium for using the hybrid functionals in the fragment approach is typically only B50%, making them routinely applicable. In contrast, hybrid functionals are rarely employed in GIPAW due to their high computational expense.
Second, due to the local nature of the two-body interactions and exploitation of space group symmetry, the cost of the fragment approach depends on the number of molecules in the asymmetric unit, rather than the total number of molecules in the unit cell. The time to compute chemical shielding tensors for a four-molecule unit cell crystal and an eightmolecule cell polymorph will be about the same, as long as Z 0 = 1 for both. Doubling the number of molecules in the asymmetric unit from Z 0 = 1 to Z 0 = 2 will double the cost of the shielding tensor calculation.
Third, the fragment approach achieves high parallel efficiency. Each of the two dozen or so fragment jobs can be run simultaneously with minimal internode communication. Each individual fragment job can be run in parallel across 1-2 dozen processor cores. Together, these two tiers of parallelization enable the chemical shielding tensor calculation to be run very efficiently on hundreds of processor cores. If one has several hundred processor cores available, chemical shifts on a Z 0 = 1 crystal structure of a 70-80 atom molecule can be obtained within a few hours, irrespective of the size of the unit cell.
The cluster/fragment approach is substantially more expensive than the two-body fragment, since it requires computing chemical shifts for a large 10-15 molecule cluster. Still, such calculations are very feasible for clusters containing hundreds of atoms or more due to efficient and/or linear-scaling chemical shift algorithms. [34][35][36] Fig. 1 Schematic illustrating the application of fragment and clusterbased methods to molecular crystal systems. R c denotes the cut off distance for a given model. Each cell is labeled for ease of reference.
were chosen based on the availability of both high-resolution room temperature NMR chemical shift data in the literature with unambiguous spectral assignment as well as high-quality X-ray or neutron diffraction crystal structures. Structural data for each compound in the study was obtained either from the literature or from the Cambridge Structure Database (CSD) maintained by the Cambridge Crystallographic Data Center.
The hydrogen, carbon, nitrogen and oxygen benchmark sets consist of 13, 25, 24 and 15 crystal structures, respectively. In the interest of providing a uniform set of scaling parameters obtained using a single computational procedure outlined below, we update our predicted 13 C shifts here for the carbon test set, though the differences in the 13 C predicted shifts here and in our earlier work 13 are trivial (less than 0.1 ppm difference in root-mean-square error). Each of the species in the hydrogen, nitrogen, and oxygen sets are depicted in the ESI, † along with their common names and CSD reference codes. The carbon set species were shown previously. 13 Each set was chosen to include a variety of intermolecular interactions which are representative of common interactions in pharmaceutical and biological compounds. Experimental chemical shifts for 1 H and 13 C are reported relative to neat TMS under magic angle spinning (MAS) conditions, 37 the 15 N shifts are given relative to external solid NH 4 Cl also under MAS, and 17 O shifts are relative to liquid water. Conversions to other chemical shift scales can be accomplished as described by ref. 37; for expediency, several of the most common conversions are given in the ESI. † The CSD reference codes and experimental references for NMR data for the three test sets are:
The use of fixed room-temperature experimental lattice parameters effectively compensates for the appreciable increases in volume that occur between the minimum electronic energy structure and the finite temperature structure. 59,60 Note that the use of fixed experimental lattice parameters reduces sensitivity to the specific dispersion correction. The root-mean-square errors in the predicted 13 C chemical shifts obtained here using PBE-D2 geometries differ by less than 0.1 ppm from those obtained previously for the same crystals using PBE-TS geometries. 13 Molecular crystal fragmentation through two-body was carried out using our hybrid many-body interaction (HMBI) code. 16,61,62 Individual fragment shielding tensor calculations were performed using Gaussian09 63 with the B3LYP, PBE0, PBE, OPBE, TPSS, and TPSSh density functionals. Calculations were performed using a locally dense basis set and the GIAO approximation. Unless otherwise stated, a 6 Å fragment cut off R c was used for both fragment and cluster/fragment models, and a 4 Å cluster was used in cluster/fragment calculations. All calculations used a 6-311+G(2d,p) basis [64][65][66][67] for all atoms on the central monomer of interest, a 6-311G(d,p) basis for all neighboring atoms out to 4 Å, and a 6-31G 68,69 basis for all atoms beyond 4 Å. This locally dense basis set 70,71 combination has proved effective in previous studies 12,72 and is simply referred to here as our ''mixed basis''. The ATZP basis 73 was used on the tin atom for the nitrosoarene in Section 7.4.
As described in previous work, 12 a large DFT integration grid consisting of 150 radial and 974 Lebedev angular points was used to approach rotational invariance and mitigate the introduction of noise from fragment contributions given by symmetrically equivalent molecules with different orientations. After constructing the raw shielding tensors via the fragment or cluster/fragment approach, the tensors are symmetrized and diagonalized to compute the principal components. Isotropic shieldings are reported as the average of these diagonal values. Note that if one is only interested in predicting the isotropic shifts, one can use the raw isotropic shieldings for each fragment instead of the full tensors. However, the tensor approach used here is more general.
Distributed multipoles computed with the GDMA package 33,74 were used to construct the embedding environment. The GDMA charges were calculated using the same functional and 6-311+G(2d,p) basis set as the chemical shielding calculation. The GDMA point charges were then placed on all molecules lying within 30 Å of any atom in the asymmetric unit cell, as described in previous work. 12,13 Gauge-including projector augmented wave (GIPAW) chemical shielding calculations were performed at the same optimized PBE-D2 (Quantum Espresso) geometries without further relaxation. Calculations were performed using CASTEP 75 with the PBE functional, ultrasoft pseudopotentials generated on-the-fly and an 850 eV plane wave basis set cut off. Electronic k-points were sampled on a Monkhorst-Pack grid to give a maximum separation between k-points of 0.05 Å À1 . The basis set cut off and k-point density were chosen based on chemical shift convergence tests (see ESI †). The basis set plane wave cut off energy was chosen to converge absolute chemical shifts to within 0.4 ppm and, more importantly, relative chemical shifts to 0.05 ppm or better. Chemical shielding was found to be fairly insensitive to the density of the k-point grid. Full space group symmetry was used in all GIPAW calculations.

Chemical shift linear regression and statistical cross-validation
The experimentally observed chemical shift d i represents the difference between the absolute chemical shielding s i of nucleus i and the absolute shielding s ref of a reference compound. Thus, comparison of predicted shifts with experimental NMR spectra requires mapping between the computed absolute shieldings s i and the experimentally referenced chemical shift d i . Numerous techniques exist for performing this mapping. 15 Here we adopt a linear regression approach which addresses the shift referencing and helps correct for systematic errors in the calculations.
In the absence of any systematic error, A would take a value of À1, and B would simply be the absolute shielding of the reference compound s ref . However, obtaining these parameters via a linear least-squares fit between the calculated and experimental data for each of the test sets provides scaling parameters for each type of nucleus which partially mitigate systematic errors present in the calculations. Further, assuming these regression parameters are fitted to a sufficiently broad and representative test set, they can be used to scale predicted shifts obtained for compounds not included in the test set and can even be applied in the context of non-crystalline systems. Fig. 2 illustrates the application of this approach using the hydrogen, carbon, nitrogen and oxygen test sets using PBE0 fragment two-body calculation.  13 C, (c) 15 N and (d) 17 O benchmark sets. Predicted shifts were obtained from two-body fragment calculations at the PBE0/mixed basis level with a 6 Å two-body cut off distance.
In an effort to assess the robustness of the linear regression parameters, exhaustive N-choose-5 cross-validation was performed for each test. The cross-validation procedure consists of partitioning the N crystal structures into a training set with N À 5 training structures and 5 validation structures. In this way, 1287, 53 130, 42 504 and 3003 different partitionings were formed for the hydrogen, carbon, nitrogen and oxygen test sets, respectively. Linear regression and error analysis was then performed for each partitioning to obtain distributions of errors and linear regression parameters.

Experimental methods
Most experimental chemical shifts considered here were obtained from the literature. However, we found relatively few examples of high-quality, small-molecule crystals with primary amine nitrogen chemical shifts in the literature. Therefore, new experimental 15 N NMR chemical shifts for many amino acids (structures 8 and 16 through 24 in the nitrogen test set) are reported here. The experimental 1 H and 13 C chemical shifts for 9-tert-butyl anthracene (Section 7.1) are also reported here for the first time.

9-tert-Butyl anthracene ester
Two-dimensional 1 H, 13 C heteronuclear-correlation (HETCOR) experiments 76 were performed at 14.1 T (600.01 MHz 1 H, 150.87 MHz 13 C) on a Bruker AV600 spectrometer equipped with a triple resonance 1.3 mm MAS probe with a sample spinning rate of 50 kHz (AE2 Hz). Less than 2 mg of microcrystalline sample were packed into each rotor. For these experiments, cross polarization (CP) was established using a 2 ms contact time with nutation frequencies of 125 kHz for 1 H and 75 kHz for 13 C; high power 1 H decoupling during 13 C acquisition was implemented using XiX (125 kHz, 2.85 t r ). 77 Chemical shifts were indirectly referenced to neat TMS using an external sample of adamantane in which the 1 H resonance was set to 1.87 ppm and the down-field 13  These crystal samples were used directly from the supplier without recrystallization. The crystal structures of the samples were confirmed via powder X-ray diffraction.

Results and discussion
We begin by first examining the performance of fragment and cluster/fragment models for each of the four nuclei, assessing both the impact of electrostatic embedding as well as two-body cut-off distance. Second, we compare the predicted shifts for fragment, cluster and cluster/fragment approaches across six different commonly used density functionals. Third, the accuracy of the various fragment-type approaches relative to the widely used GIPAW is assessed. Fourth, we examine the statistical robustness of the linear regression parameters using statistical cross-validation. Finally, we apply fragment methods along with our scaling parameters to four chemically interesting problems which involve species not included in the benchmark test sets.

Performance of fragment and cluster/fragment models
To ensure that the chemical shift predictions are well-converged with respect to the fragment contributions included, we first examine the impact of the two-body cut off distance for both fragment and cluster/fragment models. Fig. 3 illustrates the root-mean-square errors in the isotropic chemical shifts relative to experiment for (a) hydrogen, (c) carbon, (e) nitrogen and (g) oxygen as a function of two-body cut-off distance. Linear regression models of the form presented in eqn (8) were applied separately to each model/cut-off combination using the experimental isotropic shifts for the respective test sets (see ESI † for details). All calculations were performed using the locally dense basis set defined previously in Section 3.2.
All four nuclei demonstrate a dramatic reduction in rms errors once sufficient two-body terms are included to capture all hydrogen-bonding partners (2-3 Å two-body cut off). As noted previously for 13 C, 13 reasonable convergence is achieved for both hydrogen and nitrogen once all nearest-neighbor molecules are included in the fragment calculation using a 4 Å two-body cut-off.
As a result of the inherently local character of the chemical shielding, two-body contributions decrease rapidly with increasing distance between the two molecules. Extending the two-body cut off beyond 6 Å has a very small impact on the predicted 1 H, 13 C, and 15 N isotropic shifts (Fig. 3a, c and e). Isotropic shifts for 17 O display a slightly greater dependence on long-range two-body contributions (Fig. 3g) and could perhaps benefit slightly from an 8 Å two-body cut off. Nevertheless, for the sake of consistency with calculations on other nuclei, we adopt a 6 Å cut off throughout. Fig. 3a and b show that both electrostatic embedding and clusterbased calculations provide minimal improvements in the predicted 1 H isotropic shifts compared to a simple, un-embedded fragment two-body model. Carbon benefits slightly more from electrostatic embedding, 13 while nitrogen and oxygen demonstrate a much greater dependence on it. Fragment-based calculations for 15 N and 17 O without point charge embedding (not shown) proved unreliable, with absolute errors almost uniformly exceeding 20 ppm. These results indicate a greater sensitivity of 15 N and 17 O nuclei to both long-range electrostatics as well as local many-body effects when compared with 1 H and 13 C. For 15 N, electrostatic embedding is largely sufficient for capturing local many-body effects, as evidenced by the relatively uniform behavior of the fragment and cluster/fragment approaches both in terms of overall rms error (Fig. 3e) as well as the error distributions (Fig. 3f).
In contrast, explicit treatment of local many-body contributions using a small cluster becomes much more important for 17 O chemical shieldings. Both cluster and combined cluster/ fragment methods improve the accuracy of predicted isotropic 17 O shifts relative to the two-body fragment model. Cluster/ fragment calculations using a 4 Å cluster uniformly improve the overall rms errors relative to two-body fragment calculations by B2 ppm (Fig. 3g). These results highlighting the impact of local many-body effects on 17 O shieldings agree with earlier work indicating larger three-body contributions for 17 O compared with 1 H, 15 N and 13 C. 12 Of course, measuring oxygen chemical shifts experimentally is also challenging, due to the line broadening resulting from the quadrupolar interaction. Experimental uncertainties range B0.5-5 ppm 26,51-55 and likely contribute to the errors seen for oxygen here.
The error distributions presented as box plots in Fig. 3b, d, f and h demonstrate similar trends to those observed for the rms errors. Hydrogen, carbon, and nitrogen show comparable performance between fragment and cluster/fragment methods. On the other hand, oxygen clearly shows a noticeably broadened error distribution for the two-body fragment methods compared with cluster-based approaches. Further, individual chemical shifts converge at the same rate with respect to the two-body cut off as the rms errors presented in Fig. 3 show (see Fig. S6 in ESI †). We therefore conclude that in the presence of electrostatic embedding, a two-body fragment approach with a 6 Å cut off is the method of choice for calculating 1 H, 13 C, and 15 N shifts. On the other hand, high accuracy 17 O calculations benefit from the use of a central 4 Å cluster and perhaps a longer 8 Å twobody cut off.

Relative performance of DFT functionals
We now assess the performance of various commonly used density functionals for NMR chemical shift prediction in molecular crystals using these models. Specifically, we benchmark two GGA functionals (PBE, OPBE), two hybrid density functionals (PBE0, B3LYP) and a meta-GGA (TPSS) and its hybrid variant (TPSSh). TPSS and TPSSh performed well in a recent molecular crystal study. 14 Using the fragment, cluster, and combined cluster/fragment models, rms errors for each of the 1 H, 13 C, 15 N, and 17 O chemical shifts on their respective molecular crystal test sets are given in Fig. 4. Table 1 summarizes the rms errors along with linear regression parameters for each nucleus and density functional included in the present work. Results for GIPAW PBE are also included in the table and figures and will be discussed in Section 5.3. Fig. 4 reveals several insightful trends. First, the three hybrid functionals demonstrate improved rms errors relative to the GGA-type and meta-GGA functionals across all functionals and atom types. The PBE0 and B3LYP hybrid functionals clearly give rise to the lowest rms errors for 13 C, 15 N and 17 O, notably outperforming the GGA, meta-GGA and meta-hybrid functionals. For 1 H, the differences in rms error across the different functionals are a trivial few hundredths of a ppm. Second, the trends regarding the relative performance of the different computational models reported for PBE0 in the previous section hold across all functionals for 1 H, 13 C and 15 N nuclei. Root-mean-square errors for the GGA-type functionals differ by only a couple tenths of a ppm across fragment, cluster and cluster/fragment calculations. For oxygen, on the other hand, the rms errors for the two-body fragment model are up to several ppm larger than those for the cluster or cluster/fragment models, reiterating the importance of capturing many-body effects when predicting 17 O chemical shifts.
Third, it was recently reported that the meta-GGA density functional TPSS demonstrated improved agreement with experiment relative to GGA functionals using a GIAO/symmetry-adapted cluster model in predicting the principal components of the chemical shielding tensor for 13 C, 15 N, 19 F and 31 P nuclei. 14 Here, no such behavior is observed for the isotropic 13 C and 17 O chemical shifts. For nitrogen, the meta-GGA TPSS does perform slightly better than PBE or OPBE, but the differences are B0.1 ppm or less between TPSS and OPBE. For 1 H, all six functionals are essentially indistinguishable from one another in terms of accuracy.
Holmes et al. 14 also observed that PBE0 performed worse than the GGA and meta-GGA functionals they tested for nitrogen, while it and B3LYP perform the best here. The reasons for the discrepancy between that study and the current one are unclear, but the studies do differ in a few notable ways. First, ref. 14 focuses on principal components of the chemical shielding tensors, while the work here is based on isotropic chemical shifts. On the one hand, shielding tensor principal components contain more detailed information about the local environment that is averaged out in the isotropic shift, making them a more sensitive probe for the structure and more challenging for models to predict. On the other hand, measuring tensor components is inherently less precise than extracting isotropic chemical shifts, which could increase the experimental errors in the tensor data used in the benchmarking. Second, no van der Waals dispersion correction was employed when performing all-atom crystal structure optimizations in ref. 14. Dispersion is critical in molecular crystals, [80][81][82][83] though freezing the lattice parameters at their experimental values partially mitigates problems arising from its neglect. 84 Third, the clusters used in ref. 14 typically contain 13-17 molecules, which are similar in size to the 4 Å clusters used here, and did not employ charge-embedding. As seen in Fig. 3, the effects of charge embedding and longer-range interactions beyond 4 Å are modest but non-zero. While further investigation is needed to sort out these details, hybrid functionals stand out here as the functionals of choice for NMR chemical shift calculations for all nuclei under consideration.
Fourth, the slopes in the regression lines of the bestperforming hybrid functionals mostly deviate from the ideal À1 by a few percent. Such deviations are fairly typical 15 and they help compensate for various systematic errors, including the omission of dynamics, vibrational averaging, etc. [85][86][87][88][89] Interestingly, the deviations from À1 in the hydrogen slopes are much larger at 8-9%. This might reflect the particular importance of dynamics and nuclear quantum effects for hydrogen. 90 Finally, we examine the degree to which similarities in the predicted chemical shifts among various computational models and density functionals extend to individual atoms. To provide visual context for the scatter plot, the shaded region in these plots indicates plus or minus the rms error for the method on the y-axis. Fig. 5 plots the correlations between a collection of top-performing model/density functional combinations. Previous results for 13 C isotropic chemical shifts demonstrate relatively uniform performance at the individual atom level across both functional class and fragment model, 13 and comparable results are seen here in Fig. 5c and d. Fig. 5a illustrates the correlations in the errors for the isotropic 1 H shifts relative to experiment for PBE0 fragment and cluster/fragment calculations. Fig. 5b plots the similar comparison between fragment-based PBE0 and B3LYP calculations. The corresponding linear fits demonstrate both high correlation coefficients as well as slopes near unity, indicating that fragment and cluster/ fragment models using either PBE0 or B3LYP predict very similar 1 H shieldings. Similar results are observed for 15 N and 17 O when comparing across hybrid functionals (see Fig. 5f and h). On the other hand, both 15 N and 17 O show slightly reduced correlation for fragment and cluster/fragment errors ( Fig. 5e and g). In the case of 17 O these results are not surprising given the reduction in rms errors observed for cluster-based approaches. However, the differences in the predicted 15 N isotropic shifts between fragment and cluster/fragment methods is more surprising given the uniform performance in terms of rms errors.

Comparison of fragment and GIPAW approaches
Given the widespread use of plane wave GIPAW (particularly with the PBE functional) for chemical shift prediction in molecular solids, 6 it is interesting to compare the accuracy of fragmentbased approaches against GIPAW ones. The right column in Fig. 3 plots the error distributions for GIPAW PBE across the four test sets. Fig. 4 plots the rms errors for GIPAW PBE on each test set against the various fragment models and functionals described in the Section 5.2, and Table 1 summarizes the resulting linear regression parameters and rms errors.
Consider first how GIPAW and fragment approaches compare when using the same PBE functional. As shown in Table 1, GIPAW PBE predicts the chemical shifts with rms errors of 0.43 ppm for  Interestingly, GIPAW PBE (RMSE 0.43 ppm) performs notably worse for 1 H than the two-body fragment model in the same functional (RMSE 0.34 ppm). Though this difference is fairly small in absolute terms, the larger GIPAW errors stand out notably from the highly consistent results obtained for the fragment-type methods with different functionals in Fig. 4. The reasons underlying the larger GIPAW 1 H errors are unclear. Of course, the experimental uncertainties for 1 H are often B0.1 ppm, so the difference between the fragment and GIPAW approaches may not be statistically significant.
Only for oxygen does GIPAW PBE (RMSE 7.2 ppm) significantly out-perform the two-body fragment PBE model (RMSE 11.6 ppm). As discussed above, many-body effects are particularly important for oxygen, and these are only approximated via point charge embedding in the two-body fragment model. Switching to the combined cluster/fragment PBE model improves the oxygen chemical shifts substantially, reducing the rms error down to 8.8 ppm. Still, even this smaller rms error remains B25% larger than the GIPAW one.
As discussed in Section 5.2, a key advantage of the fragment approaches is that they enable routine use of hybrid functionals for chemical shift prediction with only modest additional computational effort. For the nuclei other than hydrogen, hybrid functionals appreciably improve the quality of the chemical shift predictions. In 13 C, two-body PBE0 or B3LYP rms errors of 1.5 ppm are about a third smaller than the GIPAW PBE ones (2.2 ppm). For 15 N, the two-body fragment PBE0 and B3LYP errors are 1.2 ppm (B20%) smaller than those from GIPAW PBE. For 17 O, switching to a hybrid functional reduces the rms error to 7.5-7.6 ppm, eliminating most of the difference in the errors between the cluster/fragment and plane wave methods. Indeed, the B0.3 ppm difference in statistical errors between cluster/fragment PBE0 and GIPAW PBE is comparable to or smaller than the magnitude of the uncertainties in the experimental oxygen chemical shifts.
How these rms errors translate into the overall error distributions can be seen in the box plots in Fig. 3. For 1 H, 13 C, and 15 N, two-body fragment PBE0 exhibits both more small errors (as indicated by the smaller box sizes, which delineate the middle 50% of the data) and smaller-magnitude maximum errors (as indicated by the box whiskers). For oxygen, the GIPAW PBE and cluster/fragment PBE0 have a similarly sized boxes (i.e. many of the errors are comparable), but the worst errors have larger magnitude with the cluster/fragment approach.
It is also interesting to compare correlations between individual shifts predicted with GIPAW and the fragment approaches. Several such plots are shown in Fig. 6. To provide visual context for the scatter plot, the shaded region in these plots indicates plus or minus the GIPAW PBE rms error. For 13 C, individual shifts Table 1 Linear regression parameters and rms errors for the 1 H, 13 C, 15 N and 17 O test sets using two-body fragment, cluster/fragment (with a 4 Å cluster), and GIPAW calculations. All cluster and fragment calculations employ the mixed basis, electrostatic embedding and a two-body fragment cut off of 6 Å. We recommend these scaling parameters be used for general applications predicted with GIPAW and two-body fragment PBE are very similar, as demonstrated by the excellent correlation in Fig. 6c. Virtually all of the variation between the two models is within the magnitude of the rms errors for the models (shaded region). On the other hand, individual two-body fragment PBE0 13 C shifts are somewhat different (and statistically better) than the GIPAW PBE ones (Fig. 6d). A moderate fraction of these differences are larger than the magnitude of the rms errors for GIPAW. For 1 H, GIPAW PBE correlates relatively poorly with either two-body fragment PBE or PBE0 (Fig. 6a-b), as might be expected from the overall larger rms errors seen with GIPAW. On the other hand, differences in the individual predicted shifts between the fragment and plane wave models are largely comparable in magnitude to the GIPAW rms error. For 15 N, the correlation between two-body PBE and GIPAW is moderate (Fig. 6e), despite their statistically similar errors. In other words, the two models do predict moderately different shifts for individual atoms, but the variations in individual shifts between the two models generally appear to be comparable to or less than the overall rms errors of B5.5 ppm. Just as for carbon, the fragment PBE0 shifts correlate less well with GIPAW shifts than the PBE ones do (Fig. 6f). Again, as with carbon, PBE0 shifts differ from PBE ones for individual atoms, and those differences lead to smaller statistical errors across the test sets. Finally, for oxygen, the correlations with GIPAW PBE are reasonable for both cluster/fragment PBE and PBE0 (Fig. 6g-h), despite the differences in the rms errors for the different models.
Taken together, these results indicate that individual chemical shifts predicted by the different models are generally similar. For the most part, the variations seen for individual chemical shifts among the different models are on par with the statistical errors relative to experiment observed in these test sets.
Finally, the errors seen here compare well with those found in other solid-state benchmarks. Previous DFT cluster and/or GIPAW benchmark studies in molecular crystals found errors of B0.3 ppm for hydrogen 7 and B1.5-2 ppm for carbon. 7,14,91 Fewer statistics on large sets of molecular crystals are available for 15 N and 17 O nuclei. Holmes et al. 14 report rms errors of B15 ppm for principal components of 15 N shielding tensors, compared to errors of less than B5 ppm for isotropic shifts here. Isotropic shifts are generally much easier to predict correctly, and the ratio of B3 between the errors in the principal components and isotropic shift is consistent with the analogous ratios observed in 13 C benchmarks. 13 17 O isotropic shifts spanning a broad range of organic and inorganic environments. 56 Our results are also compatible with smaller-scale GIPAW studies on nitrogen 92-94 and oxygen nuclei. 92,93,[95][96][97] See the review article by Bonhomme et al. 6 for additional GIPAW examples.
In summary, the results presented in these sections clearly demonstrate excellent performance for fragment-based NMR chemical shielding calculations for 1 H, 13 C, 15 N and 17 O isotropic chemical shieldings. Hybrid functionals like B3LYP and PBE0 consistently out-perform GGA functionals for all four nuclei. Statistically, the use of hybrid functionals allows fragment-based approaches to nearly match ( 17 O) or improve upon ( 1 H, 13 C, and 15 N) the quality of isotropic chemical shifts obtained with GIPAW PBE. Coupled with the general utility of Gaussian basis set wave function-based ab initio shielding calculations, the present work provides compelling support for more widespread use of fragment-based methods.
6 Statistical cross-validation of the regression models When training a model against a set of benchmark results, as is done here for the linear regressions that convert absolute chemical shieldings to observed chemical shifts (eqn (8)), there is always the danger that the fit parameters will not transfer well to systems that were not part of the training set. Accordingly, it is important to validate the performance of the models on species not included in the training set.
Instead of arbitrarily partitioning the test sets collected here into distinct training and validation subsets, we assess the robustness of the linear regression parameters obtained here using exhaustive N-choose-k cross validation trials for each nucleus and density functional. Here, N is the number of crystals in the benchmark set, and k is the number of crystals in the validation subset. The cross-validation procedure then considers all possible ways of partitioning the N crystals into N À k training crystals and k validation crystals. For example, the oxygen set contains N = 15 crystal structures. Setting k = 5 results in 3003 possible combinations in which the regression model is trained on 10 of the crystal structures and then validated on the remaining 5 structures. Comparing the statistical performance of the model over all 3,003 combinations to the original value obtained by fitting to the entire benchmark set provides a measure of the robustness of the linear regression models. This same approach was applied previously to our carbon test set. 13 That earlier work found that k values between 5 and 9 gave very similar cross-validation results in the 25 crystal 13 C test set. Table 2 summarizes the mean rms errors and linear regression parameters obtained across the thousands of fits. Standard deviations in the mean linear regression parameters are also reported. Comparison of these values with those obtained from the original fits against the entire benchmark sets (listed in Table 1) provides a measure of robustness of the regression models and chemical shift error estimates.
The mean cross-validation rms error for 13 C and 15 N nuclei across all functionals is only B1-5% larger than the corresponding value obtained by fitting against the entire set. Larger increases in the rms error are observed for 1 H and 17 O nuclei, ranging from B6-8% and B9-15%, respectively. These differences likely result from a combination of test set size, inherent challenges in predicting chemical shieldings for certain nuclei, as well as uncertainties in the experimental data. Note too that the overall cross-validation behaviors of the fragment and GIPAW approaches are qualitatively similar.
Comparing the mean linear regression parameters obtained via cross-validation (Table 2) with those reported in Table 1 also reveals close agreement in the regression parameters. The mean slopes differ only in the third decimal place, while the intercepts generally agree to within a few hundredths of a ppm. The standard deviations in the cross validation parameters are roughly one order of magnitude larger. For 13 C, 15 N and 17 O, these uncertainties in the regression model parameters have minimal effect on the predicted chemical shifts. Proportionally, the uncertainties in the 1 H scaling parameters are much larger, however, the relatively uniform rms errors for each partitioning indicate a degree of compensation between the slope and intercept.
Overall, these tests demonstrate the robustness of the regression model parameters obtained in the benchmarks here. The reported scaling factors in Table 1 should prove useful in future molecular crystal studies or in other cases such as biological systems where the explicit treatment of the environment is necessary for accurate chemical shift prediction. The next section discusses four illustrative applications which apply these scaling parameters to systems which were not included in the benchmark sets.

Applications
The purpose of developing the regression models in the previous sections is to enable chemical shift prediction in new systems which are not included in the test sets. The following sections provide example applications of the regression models for each of the four nuclei studied here. In assessing the quality of the predictions, one should consider the distributions of errors observed in the benchmark test sets. As shown in Fig. S5 in the ESI, † the error distributions are somewhat Gaussian, particularly for the larger test sets. Notably, 65-75% of the errors fall within one standard deviation of the mean (zero error), and 92-97% of the errors fall within two standard deviations. These percentages compare favorably with the 68% and 95% probabilities expected for data lying within one and two standard deviations in an ideal normal distribution.
Accordingly, as a rule of thumb, we consider any predicted shift that lies within twice the root-mean-square error of the experimental value to be in reasonable agreement. For example, based on two-body fragment (or cluster/fragment for oxygen) data from Table 1, this means that individual PBE0 shift errors of 0.68 ppm for 1 H, 3.0 ppm for 13 C, 8.4 ppm for 15 N, and 15.1 ppm for 17 O should be considered acceptable. Larger errors will sometimes occur, of course, but relatively rarely (B5% of the time). When considering larger sets of predicted shifts, one might similarly hope that the rms errors would be similar in magnitude to the test set rms errors. One could attempt to make this latter argument more rigorous using a w 2 test, for instance, but we do not do so here. Table 2 Cross-validation analysis for isotropic shifts. Data reported for 1 H, 13 C and 15 N nuclei was obtained using the charge-embedded two-body fragment method (R 2b = 6 Å, R emb = 30 Å). Cluster/fragment results are reported for 17 O (R 2b = 6 Å, R emb = 30 Å, 4 Å cluster). Uncertainties in the slope and deviation represent the standard deviations in the fit parameters observed over all cross-validation fits

9-tert-Butyl anthracene ester
The 9-tert-butyl anthracene ester (9-TBAE) has been the subject of recent experimental interest in light of its unique photochemical properties. Previous work demonstrated a photodimerizationinduced expansion of the molecular crystal nanorods of up to 15%. 98,99 Although initial evidence characterized the photodimerized product as a metastable intermediate, the mechanism for the expansion remains unclear. In the present work we present tentative 1 H and 13 C assignments in the heteronuclear correlation (HETCOR) spectra for the 9-TBAE monomer (Fig. 7).
A previous X-ray diffraction study of the 9-TBAE crystal structure revealed two distinct configurations of the t-butyl side chain with a 69%/31% occupancy ratio at low temperature, which disappears above 100 K. 99 Hydrogen-only plane wave geometry optimizations with fixed lattice parameters were performed on each configuration (see Section 3.2 for further details). Fragment-based NMR chemical shielding calculations using the PBE0 functional and mixed basis described above with a 6 Å two-body cut off were performed on both optimized crystal structures. The predicted chemical shieldings were converted to chemical shifts using the test-set-derived PBE0 scaling parameters presented in Table 1. The predicted chemical shifts for each configuration were then weighted according to the site occupancy to obtain final predicted chemical shifts for assigning the spectrum. The methyl carbon and its hydrogen shifts were also averaged under the assumption of fast dynamics.
The upper panel of Fig. 7 illustrates the HETCOR spectrum with the experimental peaks labeled in red and the corresponding predicted peaks in blue. While unambiguous assignment of C Me , C 1 , C 2 and H Me is possible upon inspection of Fig. 7, the aromatic region requires more careful analysis. The lower panel in Fig. 7 shows an expansion of the aromatic region of the HETCOR spectrum. The experimental 13 C shifts are indicated as dashed red lines. Predicted cross-peaks to carbon are illustrated using green triangles for directly bound hydrogen atoms and purple for nearest-neighbor hydrogens. The predicted cross-peaks agree well with the experimental HETCOR spectrum, with the largest 13 C error approximately B2.5 ppm.
Tables 3 and 4 present tentative spectral assignments based on the fragment NMR calculations. Most carbon shifts in the aromatic region can be plausibly assigned based on the sequential ordering of the predicted carbon shifts. Two pairs of carbon atoms, C 8 /C 10 and C 4 /C 9 , are difficult to resolve unambiguously due to their very similar experimental shifts, but our predicted shieldings provide candidate assignments. C 3 has no directly bound hydrogen and should exhibit weak cross-peaks to neighboring protons, so it was assigned to the weak peak at 129.22 ppm. C 4 and C 9 exhibit similarly weak cross-peaks, but were assigned to the larger peak at 127.70 ppm due to their predicted overlapping resonances. Based on these assignments, two-body fragment PBE predicts an overall 13 C rms error of only 1.57 ppm. GIPAW PBE calculations lead to the same assignments, albeit with a larger 13 C rms error of 2.17 ppm relative to experiment.
Given the degree of overlap in the proton resonance for the aromatic region, spectral assignment of the 1 H shifts proves more difficult. Table 4 lists the prominent, experimentally observed cross peaks in the HETCOR spectrum. Combining the carbon assignments and the assumption that the resonances will be dominated by contributions involving either directly  bonded or nearest-neighbor hydrogen atoms, we propose the possible assignments listed in the table. Based on the assignments for C 8 and C 10 , for instance, the two cross peaks at 6.9 ppm for 1 H likely correspond to H 8 /C 8 and H 10 /C 10 correlations, perhaps with smaller contributions from dipolar couplings to nearestneighbor hydrogens (e.g. H 7 /C 8 , H 10 /C 8 , and H 8 /C 10 ). As shown in Fig. 7 and Table 4, the predicted two-body fragment PBE0 1 H shifts for these assignments are generally within a few tenths of a ppm of the experimental values. Performing the analysis with 1 H GIPAW PBE shifts instead of two-body PBE0 leads to the same qualitative cross-peak assignments. For the hydrogen shifts associated with the methyl, C 6 , and C 7 atoms, GIPAW PBE performs comparably well or up to a few tenths of a ppm better than two-body PBE0. On the other hand, GIPAW PBE underestimates the hydrogen shifts by B0.5-1 ppm for cross-peaks associated with C 3 , C 4 , C 5 , C 8 , and C 10 . Overall, the 1 H and 13 C isotropic chemical shifts predicted with either the two-body fragment PBE0 model or GIPAW PBE can help assign the HETCOR spectrum of 9-TBAE. However, the fragment PBE0 predictions provide moderately better agreement with experiment.

15 N Chemical shift predictions in histidine co-crystals
Hydrogen bonding between imidazole and carboxylate moieties occurs frequently in biological systems. Given that both functional groups have similar pK a values, enhanced proton mobility is often observed. 100 In an effort to characterize these ubiquitous interactions better, a recent study used solid-state NMR spectroscopy to probe the magnetic properties of a collection of histidinecontaining molecular co-crystals. 100 Here, we investigate the ability of our fragment-based chemical shift predictions to discriminate among the two imidazole ring nitrogens and between the different crystal environments found in four such co-crystals: L-histidine perchlorate (H1), L-histidine monohydrochloride monohydrate (H2), L-histidine hydrogen oxalate (H3), and L-histidine hydrogen oxalate co-crystals (H4). The hydrogen bonding and CSD reference codes for these crystals are shown in Fig. 8.
Each co-crystal was subjected to an all-atom geometry optimization using fixed lattice parameters, as described in Section 3.2. Both GIPAW PBE and fragment PBE0 chemical shielding calculations were performed on each of the optimized structures. The raw shieldings were scaled according to eqn (8) using the test-set derived scaling parameters reported in Table 1.
A comparison of the experimental and predicted 15 N isotropic shifts for each of the histidine co-crystals is given in Fig. 9. The predicted 15 N isotropic shifts systematically overshoot the experimental values (Fig. 9) by between 3-9 ppm for fragment PBE0 and 2-10 ppm for GIPAW PBE. The fragment PBE0 rms error of 6.9 ppm is larger than the nitrogen test set rms error of 4.2 ppm (Table 1), but it still lies well within the expected error distribution. In fact, close examination of the predicted shifts for comparable sp 2 hybridized nitrogen atoms hydrogen-bonded to a carboxylate group in the nitrogen benchmark set (e.g. see structures 3, 5, 6, 7, and 9 in the ESI †) reveals a similarly large rms error. In other words, these particular nitrogen environments prove slightly harder to model than other Table 4 Possible cross-peak assignments of the 9-TBAE HETCOR spectrum, based on the carbon assignments from Table 3, along with the corresponding predicted 1 H chemical shifts. All shifts in ppm. Listed atom pairs correspond to hydrogens either directly bonded to the carbon or on nearest-neighbor carbons ones. For comparison, cluster/fragment PBE0 predicts the shifts with an rms error of 6.1 ppm, while GIPAW PBE gives an rms error of 6.6 ppm.
As a side note, the systematic nature of the over-estimation of the chemical shifts with all three methods means that one could reduce the rms errors by directly fitting the predicted shieldings to the experimental shifts via eqn (8). For example, doing so with the two-body PBE0 data reduces the error to 2.7 ppm. Of course that approach lacks the transferability of the scaling models developed here.
The differences between the d and e N chemical shifts in a given co-crystal range from 3-13 ppm. Except for L-histidine perchlorate (H1), the predicted chemical shifts correctly order the d and e nitrogen shifts, which would be important when using these calculations to assign the histidine nitrogen features. H1 exhibits the smallest difference between the two nitrogen shifts (3.4 ppm), and both the fragment PBE0 and the GIPAW PBE results incorrectly order those shifts.
A more stringent test comes from considering the ordering of the chemical shifts across all four co-crystals. Fragment PBE0 orders most of the shifts correctly, except for those in H1 and the e nitrogen in H4. The d nitrogens in H2, H3, and H4, all of which exhibit short N-O distances, are over-estimated by 8-9 ppm, while the e nitrogen in H4 has the smallest error of 3 ppm, leading to the incorrect ordering. GIPAW does better for H4 e , correctly predicting that it occurs further downfield from H2 e and H3 d . Overall, despite imperfect reproduction of the qualitative trends, the predicted chemical shifts derived from the nitrogen scaling parameters provide a useful tool for investigating how chemical environment impacts these 15 N chemical shifts.

Benzoic acid
In the crystalline state, benzoic acid molecules form symmetric carboxylic acid dimers (Fig. 10). There are two possible configurations for the hydrogens in the carboxylic acid dimer, and these configurations inhabit slightly different environments in the crystal, 101 as shown in Fig. 10. In configuration A, the protonated oxygen is closer to the meta hydrogen in the  neighboring co-planar dimer, while in configuration B it is closer to the ortho hydrogen. Given four oxygens per dimer and two unique dimer configurations, one might expect up to eight oxygen chemical shifts. However, the oxygens diagonal to one another within a given dimer are equivalent by symmetry, which reduces the number of potential shifts to four. At room temperature, fast proton exchange further dynamically averages over configurations A and B. In the end one expects to observe two unique 17 O shifts. The experimental NMR spectrum shows a single, broad 17 O peak at 230 ppm under magic angle spinning conditions. 53 The carboxyl carbon appears at 173-174 ppm. 102,103 Such dynamical effects can be modeled by computing the chemical shifts for the two different possible proton positions and Boltzmann averaging over them. This has been done for the carboxylic acid dimers in aspirin and salicyclic acid, 104 for instance. We perform similar analysis here. The chemical shift tensors were averaged over configurations A and B according to: s O2 = P A s A C-OH + P B s B CQO (10) where P A and P B are Boltzmann weights for the two configurations, To test the performance of the scaling models derived here, configuration A was generated by optimizing the experimental benzoic acid crystal structure (CSD code BENZAC02) using periodic PBE-D2 (all atom, fixed lattice parameters). Configuration B was generated by transferring the hydrogens to the opposite oxygens and re-optimizing. The resulting energies predict that configuration B is more stable by 1.13 kJ mol À1 , so P A = 0.387 and P B = 0.613 at 298 K. Fragment, cluster/ fragment, and GIPAW NMR chemical shielding calculations were then performed using each density functional. For each model, the final isotropic chemical shifts were obtained by diagonalizing the tensors obtained from eqn (9)-(11), averaging the principal components to obtain isotropic shieldings, and converting the shieldings to shifts according to the linear regression parameters reported in Table 1. Fig. 10 shows the two unique 17 O chemical shifts predicted for each of the two configurations at the cluster/fragment PBE0 level. The shifts resulting from room-temperature Boltzmann-averaging according to eqn (9)-(12) are summarized in Table 5 for all six functionals using the fragment and cluster/fragment models. As expected from previous results 13 and the discussion in Sections 5.1 and 5.2, the carboxyl carbon shift predicted with each functional varies by only B0.1-0.3 ppm between the fragment and cluster/fragment methods. The fragment approach shifts differ from the GIPAW PBE value by 1-1.5 ppm, but they still lie within 1-2 ppm of the experimental 13 C shift. Similarly good agreement is seen for OPBE, PBE0, and B3LYP.
Analysis of the oxygen atoms is slightly more difficult, since the experimental study did not resolve the two distinct shifts. Nevertheless, the predicted two-body fragment PBE0 shifts (213 and 231 ppm) and the cluster/fragment PBE0 shifts (214 and 237 ppm) are reasonably consistent with the broad experimental peak assigned to 230 ppm. They are also within a couple ppm of the GIPAW PBE predictions of 212 and 238 ppm. Similar shifts are obtained with fragment-based approaches for most of the other functionals, too. In summary, the fragment approaches predict these carboxylic acid atom shifts in good agreement with both GIPAW and experiment.

C-nitrosoarene complex
C-nitroso compounds represent an important class of organic compounds with chemical, biological and pharmaceutical relevance. 30,105 Of particular interest for the present work is the wide span of chemical shieldings nitrosoarene-metal complexes display. For instance, p-NMe 2 C 6 H 4 15 NO, p-[ 15 N]-nitroso-N,Ndimethylaniline (NODMA) has one of the largest 15 N chemical shift anisotropies known. 106,107 In the present work, we examine the 17 O chemical shielding for SnCl 2 (CH 3 ) 2 (NODMA) 2 , hereafter referred to as compound 2 and depicted in Fig. 11. Strong tinoxygen interactions give rise to an experimental 17 O isotropic chemical shift of 717 ppm relative to liquid water. 30 The largest chemical shift included in the 17 O test set is approximately 350 ppm, therefore the chemical shift for compound 2 lies well outside the range of shifts which comprise the test set. This test case allows us to simultaneously assess the accuracy of fragment methods in the context of organometallic molecular crystals and the accuracy of the scaling parameters when applied to chemical environments significantly different from those included in the test set. Compound 2 exhibits static disorder about the oxygennitrogen bond. The two dominant configurations were observed with occupancy rations of about 3 : 1 in the X-ray crystallography. We performed all-atom optimizations with fixed lattice parameters on each of the three possible crystal structures for compound 2 (CSD code BISVII01) as outlined in Section 3.2. The two minor configurations had stabilities of +2.5 and +9.9 kJ mol À1 relative to the major one, and the corresponding room temperature Boltzmann factors suggest populations of approximately 72%, 26%, and 1%, in good agreement with the 3 : 1 ratio cited experimentally for the two most important structures. The original experimental study of this nitrosoarene ascribes the oxygen shift of 717 ppm to the dominant configuration, 30 so we focus on that structure here. Though oxygen chemical shift calculations benefit appreciably from explicit treatment of many-body effects via the cluster/fragment approach, pairwiseonly fragment calculations with a 6 Å two-body cut off and electrostatic embedding were performed instead due to the large size of this system. Table 6 reports the absolute and scaled isotropic shifts and the error in the predicted shifts relative to the experiment.
First, we observe that hybrid functionals reproduce the 17 O isotropic shieldings to within B30 ppm, while GGA functionals exhibit errors nearly double that. Interestingly, the TPSS and TPSSh notably out-perform the other functionals here, with errors of only 25 and 17 ppm, respectively. The B30 ppm errors for the hybrid and B20 ppm errors for the meta-GGA/hybrid functionals are large but are plausibly within the tails of the error distributions one expects for oxygen with the two-body fragment method based on the test set results (cf. Fig. 3h). On the other hand, GIPAW PBE dramatically overestimates this shift by 64 ppm, which is an order of magnitude larger than the GIPAW oxygen test set rms errors.
Second, the errors reported in Table 6 reflect contributions from both the fragment DFT calculations and the extrapolation of the scaling parameters well-outside the range for which they were fitted. Although the statistical cross-validation studies demonstrate low sensitivity of the regression parameters to the choice of fitting set, those studies included only shifts in the B50-350 ppm range. If, hypothetically, one were to expand the test set to include compound 2, comparing the resulting linear regression parameters with those from the initial test set would provide further insights into the robustness of the regression parameters. Fig. 12 illustrates the differences in the linear regression parameters for the 17 O test set upon including compound 2. The magnitude of the differences in regression parameters in Fig. 12 directly correlate with the magnitude of the errors given in Table 6. Accordingly, both the TPSS and TPSSh density functionals demonstrate the smallest variations in the regression parameters upon including compound 2. With these new regression parameters for the cluster/fragment model, individual predicted shifts in the oxygen test set would vary by an rms B2 ppm. However, the overall rms errors versus experiment would increase by only a few tenths of a ppm.
These results raise the question of whether compound 2 should be included in the test set. Its inclusion would improve the prediction for its oxygen chemical shift (Table 7) and would extend the range of oxygen chemical shifts upon which the Table 6 Predicted isotropic 17 O chemical shifts (in ppm) for compound 2 using a two-body fragment method and scaling parameters reported in  regression parameters are included. That could improve the realm of applicability of the regression parameters. On the other hand, due to its extreme 717 ppm chemical shift, this single data point has an outsized effect on the regression line through the more typical B50-350 ppm range of oxygen shifts. For that reason, we excluded this shift from the test set. Even without including this particular shift in the test set, the B20-30 ppm errors obtained for this oxygen with the fragment approach remain tolerable, demonstrating the broad range of applicability of the test-set-derived parameters. Of course, if one is interested in chemical shifts outside the more typical range, one might wish to include compound 2 and perhaps other similar species in constructing the regression model.

Conclusions
In conclusion, a series of benchmark calculations assessing the performance of fragment, cluster and combined cluster/ fragment models for predicting 1 H, 13 C, 15 N, and 17 O isotropic chemical shifts in molecular crystals using a variety of density functionals have been carried out. Test sets have been assembled for each nucleus which enable one to validate chemical shift predictions against experiment. The following key conclusions can be drawn from this work. Fragment, cluster, and combined cluster/fragment methods using a 6 Å two-body cut off and a 4 Å cluster size demonstrate comparable performance for 1 H, 13 C, and 15 N isotropic shift prediction. However, local many-body effects result in improved accuracy (B2 ppm reduction in rms error) for predicted 17 O isotropic shifts using a cluster-based approach. Accordingly, the two-body fragment approach can be used for most applications, but a cluster/fragment approach should be used for oxygen when higher accuracy is needed.
Hybrid functionals out-perform GGA-type functionals for 1 H, 13 C, 15 N, and 17 O nuclei, regardless of the model used.
Among the functionals tested, the hybrid functionals PBE0 and B3LYP stand out as the functionals of choice for modeling NMR chemical shifts. Whether the improved performance of hybrid functionals is seen for other NMR active nuclei will be explored in future work. The benchmarks here do not reproduce earlier findings which found particularly good performance for the TPSS and TPSSh meta-GGA/hybrid functionals. Those functionals do perform particularly well for the unusual oxygen chemical shift in nitrosoarene compound 2, though it is unclear how broadly one should generalize from that single data point.
The fragment-based approaches exhibit accuracy that is highly competitive with GIPAW. Two-body fragment PBE0 and B3LYP out-perform GIPAW PBE on the 1 H, 13 C, and 15 N test sets by 20-30%. For 17 O, where many-body effects are particularly important, the cluster/fragment PBE0 and B3LYP models produce rms errors that are about half a ppm (8%) worse than those from GIPAW PBE.
Linear regression parameters mapping absolute chemical shieldings to observable chemical shifts have been provided for six different density functionals (OPBE, PBE, TPSS, PBE0, B3LYP, and TPSSh) using both fragment and cluster/fragment methods and the locally dense 6-311+G(2d,p)/6-311G(d,p)/ 6-31G basis combination. Mean scaling parameters obtained via statistical cross-validation (Table 2) are in excellent agreement with those obtained directly from the test set (Table 1), with only small variations in the regression parameters for any of the nuclei and density functionals included in the analysis. For general applications, we recommend the use of the linear regression parameters presented in Table 1.
The applicability of these regression parameters for each nucleus to systems not included in the test set was demonstrated on several systems, including assignment of the 1 H and 13 C HETCOR spectrum of 9-TBAE, investigation of the nitrogen chemical shifts in histidine co-crystals, analysis of the carboxylic acid group shifts in benzoic acid, and prediction of the unusually far downfield oxygen chemical shift of oxygen bound to a tin atom in a C-nitrosoarene.
These models provide practical, high-accuracy alternatives to existing plane wave methods in organic molecular crystals. In the future, it will be interesting to explore the performance of these models more widely. Fragment methods generally perform well in systems with sufficiently large band gaps, [108][109][110] and so one might expect the chemical shift prediction methods developed here to be effective in a variety of non-metallic systems. On the other hand, some nuclei (like 17 O) are more sensitive to many-body effects, so the impact of the truncating the many-body expansion should be tested on other nuclei.