Benchmark quantum-chemical calculations on a complete set of rotameric families of the DNA sugar–phosphate backbone and their comparison with modern density functional theory

The DNA sugar–phosphate backbone has a substantial influence on the DNA structural dynamics. Structural biology and bioinformatics studies revealed that the DNA backbone in experimental structures samples a wide range of distinct conformational substates, known as rotameric DNA backbone conformational families. Their correct description is essential for methods used to model nucleic acids and is known to be the Achilles heel of force field computations. In this study we report the benchmark database of MP2 calculations extrapolated to the complete basis set of atomic orbitals with aug-cc-pVTZ and aug-cc-pVQZ basis sets, MP2(T,Q), augmented by D CCSD(T)/aug-cc-pVDZ corrections. The calculations are performed in the gas phase as well as using a COSMO solvent model. This study includes a complete set of 18 established and biochemically most important families of DNA backbone conformations and several other salient conformations that we identified in experimental structures. We utilize an electronically suﬃciently complete DNA sugar–phosphate–sugar ( SPS ) backbone model system truncated to prevent undesired intramolecular interactions. The calculations are then compared with other QM methods. The BLYP and TPSS functionals supplemented with Grimme’s D3(BJ) dispersion term provide the best tradeoﬀ between computational demands and accuracy and can be recommended for preliminary conformational searches as well as calculations on large model systems. Among the tested methods, the best agreement with the benchmark database has been obtained for the double-hybrid DSD-BLYP functional in combination with a quadruple-z basis set, which is, however, computationally very demanding. The new hybrid density functionals PW6B95-D3 and MPW1B95-D3 yield outstanding results and even slightly outperform the computationally more demanding PWPB95 double-hybrid functional. B3LYP-D3 is somewhat less accurate compared to the other hybrids. Extrapolated MP2(D,T) calculations are not as accurate as the less demanding DFT-D3 methods. Preliminary force field tests using several charge sets reveal an almost order of magnitude larger deviations from the reference QM data compared to modern DFT-D3, underlining the challenges facing force field simulations of nucleic acids. As expected, inclusion of the solvent environment approximated by a continuum approach has a large impact on the relative stabilities of diﬀerent backbone substates and is important when comparing the QM data with structural bioinformatics and other experimental data.


Introduction
Nucleic acids, essential biomacromolecules of cellular life, have numerous functions many of which are still not fully appreciated or are completely unknown.While the primary and the most crucial role of a 2 0 -deoxyribonucleic acid (DNA) is to preserve, protect and transfer inherited genetic information, the pool of biological functions for which ribonucleic acid (RNA) is accountable is much larger and by far still not exhaustively explored.The independent monomeric units of nucleic acids are called nucleotides and consist of three chemically diverse compounds: an aromatic nucleobase of monocyclic pyrimidine (C, T, U) or a polycyclic purine (A, G) frame, a five-membered sugar moiety, and a negatively charged phosphate group, which imparts a nucleotide with acidic properties.While a nucleobase is linked to C1 0 via a glycosidic bond, the phosphate group is covalently attached to the sugar's O5 0 through the bridging phosphodiester linkage.The principal difference between DNA and RNA is that DNA nucleotides incorporate 2 0 -deoxyribose sugar, while RNA incorporate ribose.The nucleotides are merged together by esterification reaction between a C3 0 hydroxyl group of one nucleotide and a phosphate group of another to form a directional linear chain.Due to the intrinsic asymmetry of nucleotides a polynucleotide thread has two distinguishable terminals labeled 5 0 and 3 0 .Labeling of nucleotides within a sequence goes from 5 0 to 3 0 , i.e. the first nucleotide is at the 5 0 end, while the last one at the 3 0 end.A polynucleotide chain is thus formed of two elements, a sequence-independent and chemically monotonous sugarphosphate backbone and an ordered succession of nucleobases attached to sugar rings, which determines specific structural and interaction properties of a given molecule.The major part of the striking structural diversity of RNA is due to the H-bond capability of the C2 0 hydroxyl group of a ribose.Also the DNA show considerable structural variability, ranging from local variability of B-DNA to non-canonical structures and higherorder structures in chromatin. 1The structure and dynamics of nucleic acids result from a delicate balance of numerous energy contributions.Even though it has been often assumed that the sugar-phosphate backbone plays a rather passive role and the main role was ascribed to nucleobases, 2,3 it is now obvious that its intrinsic backbone conformational preferences belong to the most important factors in structuring nucleic acids. 4onsidering the high number of consecutive single bonds in the sugar-phosphate backbone allowing a substantial freedom for dihedral rotations, systematic exploration of the backbone conformational space is difficult.Nevertheless, to fully comprehend the nucleic acids structure, conformational behavior of the sugar-phosphate backbone needs to be understood.
The conformational space of a phosphodiester DNA sugarphosphate backbone is defined by six torsion angles a, b, g, d, e, and z (Fig. 1). 1 The a torsion is defined as O3 0 (n À 1)-P-O5 0 -C5 0 , b as P-O5 0 -C5 0 -C4 0 , g as O5 0 -C5 0 -C4 0 -C3 0 , d as C5 0 -C4 0 -C3 0 -O3 0 , e as C4 0 -C3 0 -O3 0 -P(n + 1), and z as C3 0 -O3 0 -P(n + 1)-O5 0 (n + 1), where (n + 1) denotes atoms of the subsequent nucleotide.It is customary to describe the backbone torsional angles of B601 as gauche+ (g+), of B3001 as gaucheÀ (gÀ), and of B1801 as trans (t).The DNA conformation is further affected by puckering of the 2 0 -deoxyribose sugar ring.This is described by five internal torsions t 0 -t 4 , which can be replaced by only two internal degrees of freedom called the pseudorotation phase angle (P) and the pucker amplitude (t max ). 1,5The sugar conformations are commonly expressed using suffixes X 0 -endo and X 0 -exo, where X 0 is one of the sugar ring embedded atoms.The former indicates that the given atom is displaced on the same side of the ring as the C5 0 atom, while the latter means that the given atom is displaced on the opposite side.While C2 0 -endo (P B 1621) is typical for canonical B DNA structure, the C3 0 -endo (P B 181) pucker is characteristic of the A DNA double helix.The inherent flexibility of the five-membered ring is an important aspect in understanding the structural properties of furanose sugar moieties.
DNA double helix exists in three main conformations: B DNA, A DNA and Z DNA. 1 The sugar pucker in B DNA is typically in the C2 0 -endo conformation, though with broad distribution.The B form backbone has two major conformers: BI and BII. 6hese two arrangements are related to the variability of the e and z torsion angles, which pass from (B1801, B3001) in BI to (B3001, B1801) in BII.DNA can adopt other structures, such as single-stranded hairpins, 7 triple helices, 8 three-and four-way junctions, 9,10 four-stranded G-quadruplexes (G-DNA), 11 Fig. 1 A dinucleotide unit consists of two nucleic acids residues i and i + 1, and is defined from phosphate to phosphate.Its conformation is therefore described by 12 torsion backbone angles (a to z + 1).A suite unit spans also two nucleotides, but it goes only from sugar to sugar, consisting thus of the following seven torsion angles: d, e, z, a + 1, b + 1, g + 1, d + 1. 88 Orientation of the nucleobase attached to C1 0 is defined by the w torsion angle (left).The SPS (sugar-phosphate-sugar) model system used in the present work (center) is derived from the suite unit by cutting off the nucleobases.The T3PS model system (right) suggested by Mackerell in ref. 17.This journal is c the Owner Societies 2013 Phys.Chem.Chem.Phys., 2013, 15, 7295--7310 7297 i-DNA quadruplexes, 12 or parallel helices. 13Well studied and biochemically relevant are mainly G-DNA molecules which often show unusual but recurrent backbone conformations, some of which are considered in our study.
6][27][28][29] The CHARMM force field 30 dihedral parameters of individual furanose conformations were recently developed 31 by fitting to over 1700 quantum mechanical conformational energies.The last degree of freedom of a nucleotide is represented by the rotation around a glycosidic bond and is characterized by the torsion angle w.There are two populated substates of w torsion named (high) anti (w B 2501 in DNA) and syn (w B 601).
Since the anionic and highly flexible sugar-phosphate backbone is a rather challenging task for QM techniques, careful selection of methods and sets of basis functions is imperative to obtain reliable results.Accurate calculations can, for small systems, be acquired using high-level reference (benchmark) methods like coupled clusters (CC). 324][35][36][37][38][39][40][41][42][43][44][45][46][47] However, extensive characterization of the complex potential energy surface of the nucleic acids backbone and studies of larger fragments of nucleic acids require the use of more efficient QM methods.The present-day density functionals supplemented with empirical dispersion corrections [48][49][50] often yield outstanding results for a fraction of time and thus appear to be a promising way to study the nucleic acids backbone.
In our previous study 14 we introduced the so-called SPSOM and SPM model compounds to study electronic structure and energetics of selected DNA backbone a/g conformational substates.Subsequently we reported benchmark QM computations 24 followed by testing of other methods on 22 individual DNA backbone conformations, covering three distinct conformational space regions (families), namely the canonical B DNA region with a/g B gÀ/g+, a native g+/t substate occurring in the stem-loop junction of a parallel stranded human telomeric sequence G-DNA and an essentially pathological g+/t conformation, which has been populated in explicit solvent simulations with older versions of the Cornell et al. force field. 26Note that the two a/g g+/t substates differ in combination of the remaining backbone dihedrals and thus represent distinct conformational families.An essentially similar approach was adopted also in the work of Mackerell 17 where three canonical A, BI, and BII conformations of DNA were studied using a so-called T3PS model system.It should be noted that two-sugar SPSOM and T3PS models are similar but not identical to the SPS model we use in the present work.
The common attribute of the above-mentioned studies is QM analysis of backbone conformations pertaining to only several families or Taylor expansion-like torsional scans around canonical values.However, while the relative energies ordering of conformers within a given family is usually well reproduced by the majority of methods, energetical comparison between different families is more difficult.Therefore, in this study, we provide reference QM data for essentially all known DNA backbone families.These include the complete set of the 18 most significant DNA backbone families identified by structural bioinformatics 51 as well as several additional experimental geometries we confidently found in specific noncanonical DNA forms which were not classified in the preceding bioinformatics study.Our conformation set is highly diverse and samples all naturally populated conformational domains.The calculations are done in the gas phase as well as using a COSMO continuum solvent model.The calculations are compared with a set of modern DFT functionals, while we also comment on preliminary testing of the MM force field.

Geometrical optimizations
The structures derived as explained below have been partially optimized using the local meta-GGA functional TPSS 52 and by applying the new D3 London dispersion correction with Becke-Johnson damping, 49,53 abbreviated as TPSS-D3.To preserve the experimentally determined conformations, constraints on six consecutive backbone torsion angles (d, e, z, a + 1, b + 1, g + 1, d + 1) have been imposed (Fig. 1, center and Table 1).Note that without such constraints, it would be impossible to keep the desired rotameric family. 4,14,24The conformations were optimized separately in the gas phase and using the COSMO solvent model 54,55

(see below).
A large all-electron Gaussian basis set def2-TZVPPD 56 has been used in the optimizations.This basis set is based on the Karlsruhe segmented contracted basis set of a triple-z valence quality (def2-TZVPP) 57,58 and is augmented with a small number of moderately diffuse functions (D) important for weakly bound electrons in negatively charged molecules.0][61][62][63][64] Due to the anionic character of the SPS model system (charge = À1) converged wave functions were validated by manual inspection.All calculations employ the resolution of the identity (RI) approximation. 65,66The numerical quadrature multiple grid m4 for the integration of the exchange-correlation contribution has been used.The convergence threshold of the SCF density matrix has been tightened to 10 À7 a.u. as compared to the default value of 10 À6 a.u.Taking advantage of the efficient and robust optimization algorithms of the Gaussian03 67 software package and the superior scalability of the Turbomole 6.3 code, 68 we have developed a scheme whereby the electronic energy gradients calculated by Turbomole are passed to Gaussian to execute the energetically downhill geometry alteration.The modified coordinates are then passed back to Turbomole and serve as a new input structure for the subsequent optimization cycle.This iterative procedure repeats until imposed convergence criteria are met.
The COSMO continuum solvation model 54,55 was used for a dielectric constant of 78.5 corresponding to the water environment.The atomic van der Waals radii for the molecular cavity construction in COSMO were taken as their defaults in Turbomole 6.3 (i.e., in Å for H: 1.3, C: 2.0, O: 1.72, P: 2.11).

Single point calculations
Energies of the optimized backbone conformers were calculated using several QM methods and sets of basis functions, both in the gas phase and continuum COSMO solvent 54,55 (with the exception of DCCSD(T) corrections, which were calculated in the gas phase only).
MP2/CBS calculations we carried out using the Halkier et al. extrapolation scheme 69,70 with augmented correlation consistent Dunning's basis sets 71,72 (aug-cc-pVXZ, where X = D, T or Q) to obtain MP2 energies at the complete basis set limit (CBS).Even though extrapolation to CBS effectively eliminates both intramolecular basis set superposition (BSSE) and incompleteness (BSIE) errors to a large extent, some residual BSSE/BSIE depreciating the results are likely to remain. 42,73The Hartree-Fock (HF) and the MP2 correlation components are evaluated separately by solving the following equations: where X is the cardinal angular momentum quantum number of the respective basis set (X = 2 for aug-cc-pVDZ, X = 3 for aug-cc-pVTZ, and X = 4 for aug-cc-pVQZ) and d, A and B are parameters.While d = 1.43 and 1.54 for (D,T) and (T,Q) extrapolations, respectively, was taken from the literature, 70 A and B are system-dependent coefficients which are to be determined via solution of the given linear equations.E Corr CBS and E HF CBS are the correlation and HF components of the total electronic energy at the basis set limit, respectively.Analogously E Corr CBS and E HF CBS are corresponding terms obtained using the aug-cc-pVXZ set of basis functions.
In our preceding study we suggested that the DCCSD(T) correction is not needed for calculations of different conformers of the DNA backbone. 24However, as in the present paper we substantially extend the spectrum of the calculated DNA backbone conformers, we augmented the Halkier et al.MP2/ CBS extrapolation scheme (dubbed as MP2(D,T) or MP2(T,Q) herein) by DCCSD(T) correction calculated in a smaller basis set.We refer this composite method as CBS(T) to indicate that the CBS extrapolation has been carried out only at the MP2 level.The most accurate approach in our study is the MP2(T,Q) + DCCSD(T)/aug-cc-pVDZ level, which we refer to as ''Higher Level CBS(T)'' abbreviated as CBS(T) HL .This is the level we selected to be the reference against which the remaining techniques were benchmarked.We have also carried out DCCSD(T) calculations with a smaller 6-31+G(d) basis set, and thus CBS(T) LL abbreviation in our paper stands for the ''Lower Level CBS(T)'' method of MP2(D,T) + DCCSD(T)/6-31+G(d) quality, which has been used (and labeled CBS(T)) in our preceding study. 24Since solvation does not significantly affect the magnitude of DCCSD(T) corrections CBS(T)/COSMO energies were evaluated using MP2/CBS calculated in COSMO and gas phase DCCSD(T) corrections.The MP2 and CCSD(T) calculations were done using Turbomole 6.3 68 and Molpro 2012.1. 74wo GGAs, two meta-GGA, three hybrid, and two doublehybrid functionals were tested in the present study.The GGA level of theory comprises BLYP, 75,76 and the reparameterized variant of PBE, 77 revPBE. 78From the meta-GGA methods the original TPSS 52 along with its recalibrated version oTPSS 79 were assessed.The group of investigated global hybrids, which take a portion of HF exchange into account, consists of B3LYP 76,80 and two thermochemistry oriented functionals, PW6B95 81 and MPW1B95. 82ble 1 Standard conformations (families) of repeating units of DNA.Data were analyzed at the ''suite'' level, i.e. from d to d + 1 torsions plus both glycosidic torsions w and w + 1.Average values of the individual torsions for each class are given, as well as symbolic names of the classes adopted in our study (''Label''), conformational class tags used in ref. 51 (''#''), the number of individual dinucleotide units of each class in the database (''N'') and short detailed annotations (''Description'').Our numerical ordering of the symbolic names (C1-C18) follows the conformational frequency of occurrence (N).All the bioinformatics data are taken from ref. 51

Label
The tested double-hybrids are PWPB95 35 and DSD-BLYP. 83The former one developed by Grimme's group 35 contains a mixture of reparameterized Perdew-Wang and Fock (50%) exchange, Becke95 correlation, and 26.9% of spin-opposite scaled perturbative correlation (SOS-PT2).The latter suggested by Kozuch et al. 83 is basically identical to the B2PLYP double hybrid functional of Grimme, 84 but adds a spin-component-scaled perturbative contribution (SCS-PT2).Both SOS-and SCS-PT2 corrections were calculated on the converged Khon-Sham molecular orbitals without frozen core approximation.With the exception of B3LYP and TPSS, which were considered primarily due to their widespread usage, the above functionals were selected based on their outstanding score at the respective Jacob's ladder rung in the thorough benchmark study of Goerigk and Grimme. 34ith the exception of double-hybrid calculations where a large quadruple-z aug-cc-pVQZ basis set was used for both SCF and PT2 runs, all single point computations were carried out with the def2-TZVPPD set of basis functions.To accelerate SCF calculations of (meta-)GGA density functionals, RI approximation has been employed and corresponding auxiliary basis functions for Coulomb-fitting were used. 65,66While in (meta-)GGA and double-hybrid single point calculations only the numerical quadrature m4 and m5 grid has been utilized, respectively, for hybrid functionals both m4 and m5 grids were tested to appraise the effect of a denser integration grid on the relative energies and computational time.SCF convergence criteria have been tightened to 10 À7 a.u., as in the geometry minimization (vide supra).All remaining computational parameters were kept at their defaults.The DFT calculations were either carried out with the modified version of Turbomole 5.9 or with the original version of Turbomole 6.3. 68

Molecular mechanics calculations
The molecular mechanics (MM) energies were calculated using the non-polarizable Cornell et al. force field 85 including the parmbsc0 torsional potential revision. 26The energies were evaluated for four different sets of partial charges (see below) both in the gas phase and solvent environment approximated by the Poisson-Boltzmann model (MM-PBSA).Prior to evaluation of the MM energies, TPSS-D3 optimized model systems were relaxed using the respective force field except for the fixed backbone torsions, i.e. the force field energies were derived using the force field geometries.The relaxation was carried out to the default tolerances using the steepest descent technique for the first 250 iterations, followed by the conjugate gradient method.No cutoff was applied.To keep the backbone dihedrals at the values given in Table 1 and to obtain geometries equivalent to those minimized by TPSS-D3, tight restraints with a penalty function of 3000 kcal mol À1 have been imposed.
Since our SPS model system (see below) does not belong to standard residues for which the AMBER library contains precomputed partial charges we derived four sets of charges as follows: (1) We adopted the pre-computed AMBER partial charges and based on chemical intuition manually adjusted the nonstandard ones (e.g. at the artificial 5 0 backbone termination, etc.) considering the symmetries in order to keep the total charge À1.This partial charges set is labeled the ''AMBER'' set.
(2) We derived the restrained electrostatic potential (RESP) charges 86 fitted to the HF/6-31G(d) potential of the gas phase (for MM calculations) and COSMO (for MM-PBSA calculations) TPSS-D3 optimized BI DNA SPS conformation using the default settings, i.e. two-stage procedure with the restrains of 0.0005 and 0.001.This approach is (in the gas phase) basically equivalent to the one utilized for deriving the original AMBER charges, but the calculation is performed for the SPS model.This partial charges set is labeled the ''RESP'' set.
(3) We derived the restrained electrostatic potential (RESP) charges 86 fitted to the HF/6-31G(d) potential of the gas phase TPSS-D3 optimized BI DNA SPS conformation using the modified grid procedure, namely we used an extended grid spacing with 10 layers having a grid density of 17 points per Å 2 and ten times stronger force constants of restraints compared to AMBER defaults, i.e. 0.005 and 0.01 at the first and the second stage of the fit, respectively.This partial charges set is labeled the ''Mod.RESP'' set.Only gas phase charges were derived.
(4) We derived the restrained multimolecular (multiconformational) electrostatic potential (RESP) charges 86 fitted to a set of HF/6-31G(d) potentials of the 18 gas phase TPSS-D3 optimized DNA SPS backbone conformations using the above-mentioned modified procedure.This multimolecular partial charges set is labeled the ''Mod.RESP-MM'' set.The purpose of the multimolecular fit was to reduce bias due to the choice of just a one single geometry in the fit.Only gas phase charges were derived.
All charges are available in ESI † (Table S2).All force field calculations were performed with the AMBER 12.0 suite of programs. 87The MM-PBSA calculations were carried out using MM-PBSA script; Delphi v4 was used for the numerical solution of the Poisson-Boltzmann equation.

DNA backbone model system
There are two main issues of model system selection that need to be addressed when making QM computations on a DNA backbone.First, a truncation is inevitable.Even though a larger model system might give the impression to better mimic the real DNA backbone there are many obstacles associated with it.Clearly, a too large model system might preclude application of accurate and computationally demanding QM methods.However, there is yet another problem well documented in our earlier studies, 14,24 namely, larger model systems have so complex potential energy surfaces (PES) that it is often becoming virtually impossible to get them conformationally under control.Due to the lack of native environment, e.g. a solvent, a nearby strand, etc., large model systems are prone to adopt non-native conformations with non-native intramolecular interactions.Moreover, the anionic nature of the phosphate group would complicate computations where two or more phosphates are present.The second issue is whether the model system should include nucleobase/nucleobases.Although retaining of nucleobases (or substituting them for simpler analogues) makes the model system electronically more complete, inherent backbone conformational preferences become partially obscured by the This journal is c the Owner Societies 2013 nucleobase interactions. 24For all the reasons noted above there is no perfect model for QM studies of the nucleic acids backbone and some compromise needs to be made.The model should be as small as possible to prevent spurious interactions to obtain an unbiased picture of the PES and to be analyzable by high-level QM methods, but sufficiently large to represent the actual physico-chemical nature of the DNA backbone.
In our preceding studies we have used a model abbreviated as SPSOM, i.e., Sugar-Phosphate-Sugar capped on both ends with methoxy (-O-Methyl) groups. 14,24However, in the present study we decided to utilize a slightly reduced model system labeled the SPS model (Fig. 1, center).The core of the SPSOM model, i.e. the sugar-phosphate-sugar backbone tract, is retained.However, to prevent spurious energy-biasing C2 0 HÁ Á ÁO5 0 interactions discussed in ref. 14 and 24 we replaced the original 5 0 terminal methoxy group with a hydrogen atom.Since we are interested in the intervening backbone torsions, such substitution does not have a significant impact on the results.Note that neither of these termini can be considered as biochemically fully relevant.Similar modification has been done at the 3 0 end.While the SPSOM model terminates with a methyl group linked to O3 0 , the SPS model has a hydroxyl group at its 3 0 end.We regard both 5 0 /3 0 terminal modifications as insignificant for relative energy evaluations while they enable us to carry out unbiased analysis of the PES corresponding to a given backbone conformational family.As we stated elsewhere, 14,24 further reduction of the model would break the hyperconjugation network along the phosphodiester linkage and alter the electronic structure.We thus regard the SPS model system to be simple enough to be treatable by high-level techniques and still sufficiently complete to mimic the real DNA sugar-phosphate backbone at the same time.There is another efficient two-sugar DNA backbone model system similar to SPS suggested by Mackerell 17 and called T3PS (Tetrahydrofuran with 3 0 -Phosphate with a capping Sugar).In T3PS the 5 0 -methyl group and the 3 0 -hydroxyl group are substituted for hydrogens (Fig. 1, right).Although this further reduction has only a marginal impact on the rather complex electronic topology we decided to retain C5 0 and O3 0 atoms since their positions define d torsion we fix during minimization.We nevertheless consider both SPS and T3PS models as interchangeable for the purpose of benchmark studies and we do not expect any substantial differences between SPS and T3PS models from an energetical point of view.The T3PS model was successfully used, e.g. for extensive one-dimensional PES scans of backbone torsions excluding d coupled to the sugar pucker and subsequent correlation with crystallographic probability distributions. 17The scans were done using the MP2 method with the cc-pVTZ basis set and unlike the approach adopted in the present study the systematic conformational exploration focused on canonical regions and their vicinities.Both studies nevertheless complement each other, for example, our work provides accuracy assessment of the earlier calculations.Our study is the first QM study directly taking into consideration multi-dimensional correlation among different backbone dihedrals while the preceding study provides scans across the potential energy surface.

Derivation of representative DNA backbone conformations
In the present work we utilize the concept of ''suite'' (Fig. 1). 88A suite is a structural subset of a dinucleotide going from sugar to sugar, containing thus only seven backbone torsion angles (d, e, z, a + 1, b + 1, g + 1, d + 1, see Fig. 1).A suite is a physically meaningful description of a backbone-repeating unit since it is centered around the phosphorus atom.Although the current work is based on the suite structural unit including only the backbone torsion angles, the original DNA bioinformatics analysis considered also the two glycosidic angles w and w + 1 that were shown to play an important role in classifying the local conformations of DNA. 51ost of the structures analyzed in our study come from the preceding bioinformatics analysis of the local conformation space of DNA. 51The clustering analysis 51 is based on the data consisting of dinucleotide units from 415 duplex crystal structures resolved with the resolution of 1.9 Å or better augmented by additional 58 structures with unusual topologies (G-quadruplexes, i-motifs, three-and four-way junctions, etc.).The final analysis considered a set of 4571 individual dinucleotides that were classified into 18 distinct families 51 (Table 1).
Each conformational class is thus characterized by a vector consisting of average values of its backbone and glycosidic torsions (Table 1).For each class, an exemplar dinucleotide with values of suite's backbone torsion angles as close as possible to the average values was identified (ESI, † Table S1), for details see ref. 51.Subsequently, these 18 exemplar dinucleotides have been manually pruned to obtain our SPS (sugar-phosphate-sugar) model systems and all the backbone torsions (i.e.d to d + 1) were then adjusted exactly to the values given in Table 1.The coordinates of each system were then used as the input for consecutive constrained geometry optimizations.

Additional conformations
While the bioinformatics cluster analysis is well suited to identify frequent backbone conformations, it is susceptible to miss some rare, yet biologically significant rotamers.We thus extended our set of 18 average conformations (Table 1) by six auxiliary structures, which have specific non-canonical combinations of backbone dihedrals (Table 2).Three conformations are from antiparallel oxytricha nova telomeric G-DNA 89 (PDB ID: 3NZ7) as we analyzed elsewhere 90 and three structures were found in an intercalated four-stranded DNA 91 (PDB ID: 191D).Regarding the G-DNA structure, we added the SPS model of an outer tetrad dinucleotide (T8-G9, chain B) with a(g+)/g(t) dihedral combination (A1 conformation) and of two inner tetrad dinucleotides (G2-G3, chain B and G10-G11, chain A), which have a(g+)/g(gÀ) (A2) and a(gÀ)/g(gÀ) (A3), respectively.Moreover, the latter a/g combination of A3 has also an unusually high b of B2601.All of these combinations are also tightly coupled with specific glycosidic torsions and pucker values.To reproduce the experimentally observed G-DNA backbone conformations by simulation force fields is not easy. 90n the case of i-DNA, we derived SPS out of chain B successive dinucleotides, namely C5-C6 (A4), C6-C7 (A5), and C7-T8 (A6).
Note that none of the six auxiliary conformations can be assigned to any of the 18 established DNA backbone families (Table 1) despite being likely essential to non-canonical DNA structures formation.The gas phase and solvent stabilities of the auxiliary systems were assessed only at the MP2(T,Q) level and we did not use them for benchmarking of the DFT-D methods, as we believe the 18 standard families provide sufficient data for this purpose.Adding these geometries into the database would be straightforward.However, since force field calculations are more prone to fail for non-canonical backbone conformations, we computed MM and MM-PBSA energies of A1-A6 structures using the force field (vide supra) and validated them against the QM results.Note that when tuning the force field through the dihedral terms, we use formally intramolecular terms to effectively include incorrect or missing intermolecular contributions, which may limit transferability of the torsion potentials for different backbone families.

Results and discussion
The single point calculations were done on geometries optimized using the TPSS-D3/Def2-TZVPPD method either in the gas phase or in the continuum COSMO solvent model.Since the most flexible torsional degrees of freedom of the SPS model were frozen at their initial values (see Methods and Table 1), there are only marginal structural differences between the experimental and the minimized geometries.The optimized structures were manually inspected for the presence of spurious interactions originating from artificial termini of the SPS model.No such unnatural contacts have been observed so the calculated energies reflect purely intrinsic backbone rotameric preferences and native interactions.The set of optimized SPS geometries is available in the ESI † and can be used for benchmark purposes.
As anticipated both variants of the kinetic energy densitydependent group perform slightly better than pure GGAs as compared to the reference CBS(T) HL level.The rather outstanding results of the TPSS functional with MAD being only 0.17 kcal mol À1 for both gas phase and COSMO environments (Tables 3 and 4), respectively, might be partially attributed to the fact that the geometries were minimized at the same level of theory (TPSS-D3/def2-TZVPPD), i.e. the single point calculations were done on the true minima of the TPSS-D3 PES.While the empirically revised variant oTPSS yields negligibly worse results compared to the original TPSS functional in the gas phase with MAD of 0.21 kcal mol À1 , it slightly improves on TPSS for Table 3 Gas phase relative energies (kcal mol À1 ) compared to reference CBS(T) HL data (in bold) with the canonical BI DNA conformation (structure C1, in bold) taken as the reference.For hybrid functionals values in parentheses refer to the sparse integration grid (m4).Statistical descriptors VAR, |MAX.DEV|, and MAD refer to variance (kcal 2 mol À2 ), the absolute value of maximum deviation (kcal mol À1 ), and mean absolute deviation (kcal mol À1 ) compared to the CBS(T) HL relative energies, respectively.Values in square brackets in the ''Label'' column correspond to the number of identified conformations, i.e. refer to the ''N'' column in Table 1.The conformations are ordered according to their gas phase stability in the benchmark computations 0.00 0.00 0.00 0.00 0.00 (0.00) 0.00 (0.00) 0.00 (0.00) 0.00 0.00 0.00 0.00 0.00 0.00 COSMO calculations with MAD as low as 0.13 kcal mol À1 (Tables 3 and 4).Note that our finding is slightly in contrary to the benchmark study of Goerigk and Grimme, 34 where oTPSS clearly outperforms TPSS in the gas phase.It is therefore evident that the electronic structure of the sugar-phosphate backbone represents a good test for density functionals.Still, both methods exhibit very good performance with a maximum deviation of B0.4-0.5 kcal mol À1 .As far as the GGA functionals are concerned, BLYP provides energies closer to the reference data than the revPBE functional with the maximum deviation below 1 kcal mol À1 for both gas phase and solvent (Tables 3 and 4).Even though accuracy of the tested (meta-)GGAs is inferior to more elaborated hybrid and double-hybrid functionals (see below), their simpler exchangecorrelation evaluation and the possibility of taking advantage of the RI approximation render them computationally noticeably more feasible.For that reason (o)TPSS and BLYP functionals with the def2-TZVPPD basis set and accompanied with the D3(BJ) dispersion correction represent a viable compromise between accuracy and computational demand.These methods can be recommended especially for preliminary conformational searches and calculations on much larger DNA model systems including several nucleotides.

Hybrid functionals
Another group of a higher Jacob's ladder rung incorporating a portion of an exact HF exchange interaction consists of three tested hybrid functionals: B3LYP, PW6B95 and MPW1B95.The PW6B95 and MPW1B95 functionals supplemented with the D3 term yield energies in tight accordance with the CBS(T) HL benchmark.Both functionals have maximum and mean absolute deviations as low as B0.5 and B0.2 kcal mol À1 (Tables 3 and 4), respectively, which renders them to be the methods of choice for calculations of electronic energies of the sugar-phosphate backbone.The performance of the B3LYP functional is somewhat inferior, with the maximum deviation being B0.7 kcal mol À1 for both environments.The essential role of the dispersion correction is illustrated for the B3LYP functional (Table 5).While the maximum and mean absolute deviation for dispersion-corrected B3LYP in the gas phase is equal to 0.74 and 0.28 kcal mol À1 , respectively, pure (i.e., without the dispersion correction) B3LYP deviates markedly from the reference CBS(T) HL energies with respective deviations being as high as 2.55 and 1.20 kcal mol À1 .The noticeable improvement upon D3 correction holds also for the COSMO environment.
While B3LYP appears to be insensitive to the size of the integration grid (difference between m5 and m4 grids being lower than 0.02 kcal mol À1 ) in line with preceding studies, 92 PW6B95 and MPW1B95 show slightly stronger grid size dependence.The relative energies calculated using the sparser m4 integration grid (Tables 3 and 4, values in parentheses) differ from the reference ones (m5) by less than 0.10 kcal mol À1 for both PW6B95 and MPW1B95.Taking into account that the Table 4 COSMO relative energies (kcal mol À1 ) compared to reference CBS(T) HL data (in bold) with the canonical BI DNA conformation (structure C1, in bold) taken as the reference.For hybrid functionals values in parentheses refer to the sparse integration grid (m4).Statistical descriptors VAR, |MAX.DEV|, and MAD refer to variance (kcal 2 mol À2 ), the absolute value of maximum deviation (kcal mol À1 ), and mean absolute deviation (kcal mol À1 ) compared to the CBS(T) HL relative energies, respectively.Values in square brackets in the ''Label'' column correspond to the number of identified conformations, i.e. refer to the ''N'' column in Table 1  acceleration of a computation when a numerical quadrature grid is pruned from 590 (m5) to 434 (m4) spherical grid points is no more than B2% of the wall computational time, we encourage to use the denser m5 integration grid for all hybrid functionals.Even though from the current results the m5 grid might seem to be unnecessary, the magnitude of the numerical error is conformation-dependent and thus it cannot be guaranteed to be insignificant over the whole potential energy surface.

Double hybrid functionals, MP2/CBS and CBS(T) LL
The last set of the most accurate and computationally challenging methods embraces PWPB95 and DSD-BLYP double hybrid functionals, extrapolated MP2(D,T) and MP2(T,Q) methods and composite CBS(T) LL .Both double hybrids show a great correlation with the reference CBS(T) HL gas phase results, with the maximum deviations of 0.56 and 0.20 kcal mol À1 (Tables 3 and 4) for PWPB95 and DSD-BLYP, respectively.The mean absolute deviations of 0.21 and 0.07 kcal mol À1 rank them among the most accurate methods within the current study.Especially the DSD-BLYP functional shows an astonishing agreement with the CBS(T) HL and is the clear winner of this benchmark.Even though high computational costs prohibit its practical and widespread usage for systems of similar size as the SPS model and larger, D3 correction term complemented DSD-BLYP with the large quadruple-z basis set is the method of choice when highly accurate energies are demanded.Contrary to that PWPB95 can be superseded by considerably faster and equally accurate (o)TPSS meta-GGAs or PW6B95/MPW1B95 hybrids and thus its application seems to be rather ineffective for sugarphosphate backbone model systems.
Although the MP2(D,T) performs rather well as it reaches chemical accuracy with maximum deviation of B0.7 kcal mol À1 (Tables 3 and 4), it yields energies slightly inferior to the majority of DFT-D3 methods and is approximately of B3LYP-D3 quality.Note, however, that conclusions of our foregoing study 24 about the quality of MP2(D,T) are not in contradiction with the present findings.The difference between the preceding and present study reflects the tremendous improvement in DFT methodology, as the most advanced functionals and D3 dispersion correction were not used in the earlier study.When replacing MP2(D,T) by MP2(T,Q), both the maximum and mean absolute deviations of gas phase energies decrease approximately by a half to 0.43 kcal mol À1 and 0.18 kcal mol À1 , respectively.As can be evidenced from both gas phase and COSMO results (Tables 3 and 4), performance of MP2(T,Q) is comparable to the CBS(T) LL and thus enlargement of the one-electron space by considering the quadruple-z basis set effectively compensates for missing higher order excitations covered by the DCCSD(T) term.

CCSD(T) corrections
The DCCSD(T) calculations were carried out using the 6-31+G(d) and aug-cc-pVDZ basis sets (Table 6).There are two main conclusions that follow from the evaluated corrections.Firstly, the DCCSD(T) corrections are, to a good approximation, conformation-independent, in line with our earlier preliminary study. 24The average relative values of the DCCSD(T) term, i.e.DDCCSD(T), using the 6-31+G(d) and aug-cc-pVDZ basis sets are equal to 0.16 kcal mol À1 and 0.18 kcal mol À1 (Table 6), respectively.Note, that unlike the previous study 24 the current diverse benchmark set effectively covers the whole presently known biologically sampled DNA backbone conformational space, so that significance of the present results is markedly higher.Secondly, we show that DCCSD(T) energies are rather invariant to the number of basis functions, at least when going from 6-31+G(d) to aug-cc-pVDZ with the average difference below B0.05 kcal mol À1 (Table 6).Thus, when DCCSD(T) corrections are required, the 6-31+G(d) basis set seems to be sufficient.It obviously cannot be ruled out that a larger basis set could still affect the results, but we consider this unlikely due to the evident insensitivity of the sugar-phosphate backbone energetics to the higher-order correlation correction, contrasting studies of molecular clusters. 37he assessed performance of the tested methods is visualized in Fig. 2 and 3 for gas phase and COSMO results, respectively.As expected the quality of the density functionals as measured against CBS(T) HL energies follows in general the Jacob's ladder scheme, i.e.GGA o meta-GGA o hybrids o double-hybrids.The B3LYP and PWPB95 slightly stick out as their performance corresponds rather to the respective lower-lying ladder rung.More specifically, while B3LYP and BLYP functionals are of similar quality, computational demands of the latter functional are significantly lower.The same holds for the PWPB95, which is substantially more demanding than comparably accurate PW6B95 and MPW1B95 hybrid functionals.Even though both B3LYP and PWPB95 yield reliable results with maximum deviation below 1 kcal mol À1 , computational inefficiency argues against their usage for comparison of the sugar-phosphate backbone conformational energies.We thus recommend to use BLYP and (o)TPSS functionals supplemented with D3(BJ) dispersion correction for fast exploratory calculations of the sugarphosphate backbone model.When accurate relative energies are needed, e.g. for force field fitting purposes, we encourage applying PW6B95 and MPW1B95 hybrids supplied with the D3 term instead of the inferior MP2(D,T) level.In case higher precision is needed DSD-BLYP surpasses extrapolated MP2(T,Q) and CBS(T) LL data as it Table 6 Relative DCCSD(T) gas phase corrections (kcal mol À1 ) for 6-31+G(d) (BS1) and aug-cc-pVDZ (BS2) basis sets, C1 structure taken as the reference.The DD row denotes the absolute value of the difference between the first two lines, i.e. the basis set dependence of the DCCSD(T) corrections provides closer agreement with the CBS(T) HL at comparable time (vide infra) and hardware requirements.Similar trends can be observed also for COSMO calculations.

Calculations time demands
For DFT-D3 and MP2 methods a rough comparison of computational time demands is given in Table 7. Timing of the different methods differs by almost three orders of magnitude.CCSD(T) requirements are not included as the calculations were carried out on different computational architecture and thus unambiguous comparison cannot be done.The efficiency of the more accurate DSD-BLYP surpasses MP2(T,Q) and similarly the new D3-corrected hybrid functionals PW6B96 and MPW1B95 are clearly more efficient than MP2(D,T).
Table 7 Computational time ratios of selected DFT-D3 and MP2 methods as compared to the least demanding revPBE.Values for hybrid functionals correspond to the denser m5 integration grid.Basis sets labeled DZ, TZ1, TZ2, and QZ refer sequentially to aug-cc-pVDZ, def2-TZVPPD, aug-cc-pVTZ, and aug-cc-pVQZ.

Molecular mechanics calculations
Molecular mechanics calculations were carried out both in the gas phase and solvent environment approximated by the Poisson-Boltzmann model using four different partial charges sets labeled AMBER, RESP, Mod.RESP, and Mod.RESP-MM (Table 8).The AMBER charge set was obtained by truncation and simple neutralization from the original force field.It is the worst set in the gas phase with maximum and mean absolute deviations being 8.76 kcal mol À1 and 3.62 kcal mol À1 .However, it surprisingly outperforms the remaining charge sets for MM-PBSA calculations with the respective deviations from the benchmark being 4.66 kcal mol À1 and 1.35 kcal mol À1 .Even though rough agreement with CBS(T) HL for solvent was anticipated, the deviation of the gas phase energies is astonishing.The other three sets directly acquired by the standard and tightened RESP procedure for the SPS model (vide supra) provide comparable agreement with the CBS(T) HL reference data with the maximum and mean absolute deviations B4 kcal mol À1 and B1.5 kcal mol À1 .Considering CBS(T) HL /COSMO energies, which span a rather narrow range of B6 kcal mol À1 width, it is obvious that MM calculations with neither charge set satisfactorily reproduce the fine energy differences between distinct DNA backbone rotamers.Although it cannot be ruled out that the differences can be in future reduced by further tuning of the force field torsional parameters, 25,[27][28][29] we suggest that a substantial part of the discrepancy is due to inherently neglected conformational polarization effects, which are known to be more pronounced in anionic systems.That would mean the uncertainty is inherent to the fixed-charge force field models.As emphasized elsewhere, the capability of refinements on pairadditive force fields with constant point atomic charges via tuning of the formally intramolecular torsional parameters is certainly not unlimited. 93Considering that polar medium partially screens out electrostatic interactions, it might be anticipated that gas phase MM relative energies would be more sensitive to partial charges adjustments compared to MM-PBSA calculations.However, the present results give evidence that also MM-PBSA energies are highly responsive to charge variations.[29] Relative energies distribution The distribution of the gas phase and COSMO CBS(T) HL relative energies is plotted in Fig. 4. Note that energy distribution calculated in COSMO solvent is appreciably narrower than in the gas phase, which likely reflects the real conformational preferences.
As explained in detail elsewhere, 4 the individual backbone geometries seen in the X-ray structures are affected by various kinds of data and refinement errors.Thus, many individual observed backbone conformations might be inaccurate or even entirely incorrect.This is the reason why the structures need to be processed by structural bioinformatics, which through statistical methods identifies real conformational families.Many individual backbone geometries then occur as outliers, which cannot be assigned to the established families.Some of them may still represent real structures missed by the bioinformatics due to their infrequent occurrence.On the other side, it cannot be ruled out that some families based on clusters with small numbers of the individual occurrences may be misleading.Definitely, some biologically relevant backbone conformations are missing in the current C1-C18 set, e.g.conformations emerging in structures difficult to crystalize or those, whose frequency of occurrence is too low to be statistically significant.
Table 8 Relative MM and reference CBS(T) HL energies (kcal mol À1 ) for the gas phase and the solvent environment.''Solvent'' stands for MM-PBSA and COSMO for MM and QM calculations, respectively.Statistical descriptors VAR, |MAX.DEV|, and MAD refer to variance (kcal 2 mol À2 ), the absolute value of maximum deviation (kcal mol À1 ), and mean absolute deviation (kcal mol À1 ) compared to the CBS(T) HL  Some examples are the additional auxiliary conformations A1-A6 discussed below.Nevertheless, the selected 18 conformations derived as described above are biologically significant and represent a wide range of real backbone substates of DNA.
Since the database analysis was based on structures with 1.9 Å and better resolution, we do not anticipate that it contains any irrelevant or markedly unstable backbone substates.Nevertheless, one of the initial goals of our study was to try to identify potentially incorrect geometries, based on their unfavorable energies.However, we found out that such a task is rather complex and we decided to postpone it to some future studies.The reason is that the energy pattern in our dataset is pretty complex and the energies do not include the overall contexts in which the backbone substates occur.Therefore, it is impossible to establish any straightforward criteria that would unambiguously tag potentially incorrect geometries.Biomolecular structures should in general sample geometries on a fine energetical scale of several kcal mol À1 .Thus, the existence of high-energy biologically significant DNA backbone conformations is not likely.Therefore, rather narrow distribution of conformational energies is expected, which is exactly what is observed when the backbone model systems are put in the continuum solvent.While the most unstable conformation in the gas phase (C17) is B9.8 kcal mol À1 above canonical BI DNA, in the solvent environment it is only B3.2 kcal mol À1 less stable than the reference.Note also the asymmetric relationship between structure and energy.While the existence of uncommon backbone substates of comparable intrinsic stability to the BI DNA conformation cannot be ruled out, it is much less likely to have a frequently occurring conformation to be intrinsically markedly unstable.Whereas all commonly populated substates (the number of identified dinucleotide steps N > 100, Table 1) are below B3.5 kcal mol À1 in COSMO solvent, the BI conformation with a + 1/g + 1 crankshaft motion (C9, N = 158) is more than twice less stable than canonical BI DNA in the gas phase (7.4 kcal mol À1 , Table 3).This substate is identified by structural bioinformatics without any doubt, so it is a very real structure.Based on the gas phase calculations only, i.e. without the solvent correction, the C9 conformation might be incorrectly regarded as too unstable to be adopted in DNA.Despite the fact that the average difference between the gas phase and COSMO CBS(T) HL relative energies is equal to B2 kcal mol À1 in absolute value, in the case of C17, C12, and C9 conformers the difference is 6.6, 5.5, and 4.5 kcal mol À1 at the CBS(T) HL level of theory, respectively.Also conformational energy ordering is dramatically changed upon inclusion of the continuum solvent (Tables 3 and 4).It is thus evident that inclusion of solvation effects, at least in an approximate continuum fashion, is essential when predictions about stability and conformational preferences of nucleic acids are made.Gas phase data cannot be used to discriminate unstable geometries of the backbone.In COSMO solvent, none of the geometries is unstable enough to be identified as suspicious.
It should be noted that in our dataset each rotameric family is represented just by a single geometry.It is possible that when investigating the PES around the calculated structures, more stable structures could be found.This is more likely to happen for rotameric families, where the bioinformatics clustering has been based on a smaller number of the individual occurrences.Therefore, from the biochemical point of view the energy ordering in the present study should be taken only as approximate.In addition, stability of the given backbone rotamer inside a particular DNA context may be decisively affected by its compatibility with the overall DNA architecture and interactions with DNA parts not included in the computations.Still, on average, more populated clusters tend also to be more stable intrinsically.

Auxiliary conformations
Since the 18 members set of DNA conformations apparently lacks statistically insignificant, yet biologically highly relevant backbone conformations, we also considered six auxiliary systems (A1-A6) we have identified in high resolution experimental structures and which are sufficiently dissimilar to any of the 18 identified conformers, although we did not calculate the CCSD(T) corrections.The MP2(T,Q) and MM energies evaluated in the gas phase and solvent medium are listed in Table 9.Note the rather big average difference between gas phase and solvent MP2(T,Q) relative energies amounting to B3.3 kcal mol À1 (Table 9).This finding underlines the effect of the solvent environment, which reshuffles the predicted intrinsic conformational preferences.
The MP2(T,Q) results show that highly non-canonical backbone substates are energetically quite feasible.It is well known that non-canonical backbone conformations may be particularly difficult for force fields. 90,94This is in line with our MM calculations (Table 9), where the mean absolute deviations for the four partial charge sets are slightly below 3.0 kcal mol À1 , while for the most frequent DNA conformations (C1-C9 with N > 100, see Table 1) it is below 1.0 kcal mol À1 .Just as for the main set of backbone conformers, the ''truncated'' AMBER charges give worse results in the gas phase, but outperform sets of charges derived directly for our SPS model system in continuum solvent calculations.Note that the multimolecular RESP does not improve the agreement (Table 9).The MM-PBSA destabilization of the A1, A4, and A5 conformations may be partially due to the g torsion, which is close to the trans region and is intentionally penalized by the parmbsc0 force field, in order to prevent B-DNA degradation. 26It has been shown that the g(t) penalty of parmbsc0 may be excessive in some other contexts. 95We have to conclude that the simple point charge model has difficulties in describing the DNA backbone in a balanced manner, which do not seem to be easily resolvable.Note that when tuning the force field through the dihedral terms, we use formally intramolecular terms to effectively include incorrect or missing intermolecular contributions, which may limit transferability of the torsion potentials for different backbone families.
Note that while the 18 conformational families (structures C1-C18) are represented by exemplar geometries that are close to the respective cluster centers, 51 the additional auxiliary geometries were just taken from some of the available X-ray structures without any bioinformatics processing.That is why we do not mix the two datasets.From the biochemical point of view, however, the A1-A6 geometries are by no means less significant.They were missed by the bioinformatics analysis only because of an insufficient number of i-DNA and antiparallel G-DNA X-ray structures with sufficient resolution in the database.The performance of density functionals with respect to MP2(T,Q) for the A1-A6 set is very similar to data for the C1-C18 data set (data not shown), so we suggest that the C1-C18 database is representative enough for benchmarking other computational methods.

Conclusions
The DNA sugar-phosphate backbone poses a challenging task even for present-day quantum chemistry methods.The ability to predict DNA backbone conformational preferences and the knowledge of backbone energetics are important for comprehension of the DNA structure and dynamics.It is also crucial for empirical force field calibration, as the sugar-phosphate backbone description represents one of the main obstacles in molecular modeling of nucleic acids.To gain insight into the intrinsic energetics of the DNA backbone and to address the performance of modern dispersion corrected DFT, SPS (sugarphosphate-sugar, Fig. 1) models of 22 diverse backbone conformations sampling the complete presently known naturally populated DNA torsional space were studied both in the gas phase and continuum solvent.While 18 DNA backbone conformational families are taken from the bioinformatics literature, 51 6 additional conformations are suggested by us based on the inspection of X-ray structures of noncanonical DNA.The main results can be summarized as follows.
We provide a benchmark database of accurate structureenergy data for the DNA backbone, which can be used for assessment of other theoretical methods.The most accurate data are based on MP2/CBS extrapolation with aug-cc-pVTZ and aug-cc-pVQZ basis sets (MP2(T,Q)) supplemented by CCSD(T) correction with the aug-cc-pVDZ basis set (known in the literature as CBS(T) or estimated CCSD(T)/CBS level).
We propose a new simplified SPS (sugar-phosphate-sugar) model system that remediates spurious interactions biasing calculations with the previously used SPSOM (sugar-phosphatesugar capped with methoxy groups at the 3 0 and 5 0 ends) model, 24 while still being sufficiently electronically complete.The SPS model is conceptually similar to the T3PS model system used by some other groups 17 and discussed above (see Fig. 1).
The latest dispersion-corrected DFT methods are capable of achieving semi-quantitative or even quantitative accuracy for the conformational preference of the DNA backbone.The BLYP, TPSS, and oTPSS functionals augmented with the Grimme's D3 dispersion correction damped by the Becke-Johnson (BJ) method yield very good results close to the reference data.These methods could be recommended for preliminary calculations or geometry optimizations of the sugar-phosphate models and for computations on larger model systems.PW6B95 and MPW1B95 D3-corrected hybrids can be used to obtain even a higher accuracy, as these functionals provide already better results than MP2(D,T) CBS computation.The B3LYP-D3 method provides energies slightly inferior to the other tested methods.B3LYP shows clear integration grid size independence.Even though the PW6B95 and MPW1B95 results Table 9 Relative MP2(T,Q) and MM energies (kcal mol À1 ) of the auxiliary (A) conformations for the gas phase and the solvent environment compared to BI DNA (C1, in bold).''Solvent'' stands for MM-PBSA and COSMO for MM and MP2(T,Q) calculations, respectively.Statistical descriptors VAR, |MAX.DEV|, and MAD refer to variance (kcal 2 mol À2 ), the absolute value of maximum deviation (kcal mol À1 ), and mean absolute deviation (kcal mol À1 ) compared to the MP2(T,Q) energies, respectively are approximately invariant to integration grid density we recommend to generally use the m5 grid for (double-)hybrid functionals.The expensive PWPB95 double-hybrid functional provides energies of comparable quality with PW6B95 and MPW1B96 hybrids and thus appears to be rather inefficient for backbone energies evaluations.Contrary to that the DSD-BLYP double-hybrid yields the most accurate energies as compared to the reference CBS(T) HL data, however, high computational requirements make its common usage less attractive.Proof of an approximate basis set independence of DCCSD(T) corrections when going from 6-31+G(d) to aug-cc-pVDZ is given.We demonstrate that force field calculations with constant point atomic charges are much less accurate than modern DFT-D3 methods and struggle to reproduce QM relative energies, especially when highly non-canonical backbone conformations are taken into account.In fact, the maximum deviations of MM relative energies are comparable to the energy range spanned by the different biochemically relevant backbone conformations.It confirms that the description of the backbone is a major limitation of MM modeling of nucleic acids.We tested several charge distributions but we were so far unable to substantially reduce the spread of the MM deviations from the reference data.

Gas phase
Finally we stress the importance of inclusion of the solvent environment, which modifies energetical ordering of backbone conformers and reduces the energy difference between the different rotamers.Obviously, both gas phase and condensed phase computations are valid, the latter, however, should be preferred if any suggestions regarding nucleic acids are made based on QM computations.

Table 2
This journal is c the Owner Societies 2013 Phys.Chem.Chem.Phys., 2013, 15, 7295--7310 7301 . The conformations are ordered according to their COSMO stability in the benchmark computations

Table 5
Performance of the B3LYP functional integrated using the m5 grid with and without D3(BJ) dispersion correction for the gas phase and the COSMO environment.Statistical descriptors VAR, |MAX.DEV|, and MAD refer to variance (kcal 2 mol À2 ), the absolute value of maximum deviation (kcal mol À1 ), and mean absolute deviation (kcal mol À1 ) compared to the CBS(T) HL This journal is c the Owner Societies 2013 Phys.Chem.Chem.Phys., 2013, 15, 7295--7310 7303 energies, respectively c the Owner Societies 2013 Solvent System AMBER RESP Mod.RESP Mod.RESP-MM MP2(T,Q) System AMBER RESP Mod.RESP Mod.RESP-MM MP2(T,Q)