Vincenzo
Barone
Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy. E-mail: vincenzo.barone@sns.it
First published on 29th November 2023
Computation of accurate geometrical structures and spectroscopic properties of large flexible molecules in the gas-phase is tackled at an affordable cost using a general exploration/exploitation strategy. The most distinctive feature of the approach is the careful selection of different quantum chemical models for energies, geometries and vibrational frequencies with the aim of maximizing the accuracy of the overall description while retaining a reasonable cost for all the steps. In particular, a composite wave-function method is used for energies, whereas a double-hybrid functional (with the addition of core–valence correlation) is employed for geometries and harmonic frequencies and a cheaper hybrid functional for anharmonic contributions. A thorough benchmark based on a wide range of prototypical molecular bricks of life shows that the proposed strategy is close to the accuracy of state-of-the-art composite wave-function methods, and is applicable to much larger systems. A freely available web-utility post-processes the geometries optimized by standard electronic structure codes paving the way toward the accurate yet not prohibitively expensive study of medium- to large-sized molecules by experimentally-oriented researchers.
Recently, rotational spectroscopy has been extended to the investigation of solid thermolabile molecules (like most molecular bricks of life) thanks to the introduction of the laser ablation (LA) technique.12–17 Furthermore, supersonic-jet expansion has resulted in great simplification of rotational spectra due to the cooling of molecules produced in the LA step to low rotational temperatures. Finally, chirp-pulse microwave (MW) spectrometers18,19 combined with fast-mixing nozzles permit the investigation of non-covalent interactions in large systems.20–23
Unfortunately, direct interpretation of the spectroscopic signals in structural and dynamic terms is seldom straightforward, as for prototypical molecular bricks of life (see Fig. 1). In this respect, molecular simulations can play an invaluable role, provided that they are able to couple accuracy and feasibility.5,24,25 One of the most effective strategies to reach this goal is based on an integrated computational approach that employs quantum chemical (QC) models of increasing accuracy in the different steps of an exploration/exploitation workflow. The main steps of this strategy26–29 can be summarized as follows:
(1) Unsupervised perception of the molecular system with the aim of disentangling hard and soft degrees of freedom and, possibly, identifying the most suitable fragmentation patterns.30
(2) Exploration of the PES governed by soft degrees of freedom using a fast semi-empirical method,31 guided by knowledge-based and evolutionary algorithms26,32 and followed by the refinement of the structures of the most stable minima29 and, possibly, by the analysis of relaxation paths between pairs of adjacent minima.28
(3) Determination of accurate geometries and force fields for the most stable structures not involved in fast relaxation processes.33,34
(4) Evaluation of accurate electronic energies and properties for the final panel of low-energy minima.35,36
(5) Computation of relative populations and spectroscopic parameters under the experimental conditions of interest employing the quantities obtained in steps 3 and 4.37–41
In step 1, the Proxima software30 for molecular perception is used to produce automatically a full list of stereo-isomers (and/or tautomers) and to identify soft degrees of freedom. In this step, special attention is paid to pseudo-rotation coordinates for describing the puckering of rings not involving π-electron conjugation42,43 and the combinations of dihedral angles around each bond not belonging to cycles and with a bond order lower than a pre-defined threshold.
The main aspects of all the other steps will be analyzed in the following sections, taking into account the latest achievements of contemporary computational chemistry, and, in particular, the availability of reduced-cost QC methods for the accurate evaluation of electronic energies for large molecules.44–49 Unfortunately, equilibrium geometries and vibrational frequencies can benefit only marginally from these developments, which have not yet been extended to analytical energy derivatives. However, the intrinsic errors affecting the standard methods employed in the current spectroscopic studies (typically second-order Møller–Plesset perturbation theory (MP2)50 or hybrid density functionals)51–54 hamper any a priori prediction of the spectroscopic outcome. Therefore, the current practice is to resort to a posteriori interpretations in terms of the agreement between experimental and computed spectroscopic parameters, irrespective of the computed stability of the selected species.55 Improved results can be obtained by combining different QC methods and correcting the DFT geometrical parameters with bond-specific scaling factors56 derived from a large database of accurate molecular structures,57,58 or with reference to suitable fragments, whose accurate geometries are already known.59 While both approaches have been combined with remarkable success in the so-called Nano-LEGO model,57,58,60 the use of a large number of parameters remains quite unsatisfactory and, above all, suitable fragments are not always available.35,61 Based on these premises, it is the purpose of this paper to show that unbiased prediction and interpretation of high-resolution spectra can be obtained by means of the Pisa Composite Scheme (PCS),33–36 which consistently improves the accuracy of current approaches for molecules containing up to a few dozen atoms, while retaining a black-box nature and the use of reasonable computational resources.
Alternative strategies involving universal machine learning (ML) potentials to search for low-energy structures have been proposed63 and are possibly competitive with the evolutionary algorithm sketched above, but a comparison of different exploration strategies is beyond the scope of the present paper. In any case, at the end of the whole exploration, low-energy structures within a pre-defined energy range are selected by discarding too similar structures and then performing single point energy evaluations at the B3LYP/6-31+G* level,64 also including Grimme's D3BJ dispersion corrections.65 This computational model (hereafter referred to as B3/SVP) is used only for the selection of an initial panel of structures to be next refined at higher levels and has been chosen because it couples a remarkable computational efficiency with the prediction of reasonable anharmonic contributions (vide infra).
In the next step, structures lying within a smaller energy range are optimized at the same level and the surviving ones define the panel of candidates for the final structural refinement, which is performed employing the revDSD-PBEP86-D3BJ double-hybrid functional66 (hereafter rDSD), for which effective analytical gradients and Hessians are available.67 Several studies have shown that this functional provides excellent geometrical structures,57 dipole moments,68 spectroscopic parameters,69 non-covalent intermolecular interactions,70,71 and conformational landscapes.72–74 In the present paper, the rDSD functional is employed in conjunction with a modified cc-pVTZ-F1275 basis set in which d functions on first-row atoms are removed and the two f functions on second- and third-row atoms are replaced by a single f function taken from the cc-pVTZ basis set.76 The resulting basis set (referred to as 3F12− in the following) has dimensions comparable with those of the jun-cc-pVTZ basis set employed systematically in previous studies,77 but it delivers results much closer to those of augmented quadruple-zeta basis sets,33 which are, in turn, close to the complete basis set (CBS) limit.78
The integrated strategy sketched above minimizes the number of expensive geometry optimizations with a negligible reduction in accuracy in the final results. The different energy thresholds depend on the system and the spectroscopic technique of interest. In general terms, a conservative limit for the relative stability of detectable conformers is around 900 cm−1 (which corresponds to a relative population of about 1% at room temperature, where kT/hc = 207 cm−1).28,79 As a consequence, the typical thresholds for the acceptance of semi-empirical structures, B3/SVP geometry optimizations and final rDSD/3F12− refinement are 2500, 1500 and 1000 cm−1, respectively.
As mentioned in the introduction, relaxation toward more stable structures can take place whenever the energy barriers ruling this process are sufficiently low, with a typical threshold being about 400 cm−1.80–82 With the aim of unraveling those fast relaxations, the paths connecting adjacent minima are analyzed systematically at the rDSD/3F12− level and the structures of the most significant transition states (TSs) estimated in this way are refined by full geometry optimizations at the same level.
Finally, dipole moments, quadrupolar coupling constants, and other one-electron properties are computed with good accuracy at the rDSD/3F12− level, whereas electronic energies are refined by the composite wave-function method described in the next subsection. Noted is that, while rDSD/3F12− geometries are sufficient for semi-quantitative purposes, the models developed for obtaining the improved structures needed for computing accurate rotational constants will be analyzed after the discussion of relative stabilities.
E = EV2 + ΔEV + ΔECV2 | (1) |
ΔECV2 = E(MP2(ae)/3wC) − E(MP2(fc)/3wC) | (2) |
Explicitly correlated (F12) approaches86,87 are employed in the new version of the Pisa composite scheme (PCS/F12) introduced in the present paper to evaluate EV2 and ΔEV in conjunction with the cc-pVnZ-F12 family of basis sets75 (hereafter nF12). Thanks to the balanced optimization of these basis sets for Hartree–Fock (HF) and MP2-F1286 computations and to the inclusion of the complementary auxiliary basis set (CABS) in the HF step,88,89 the CBS limit of the whole MP2-F12 energy (including the HF contribution) can be estimated accurately using the two-point extrapolation proposed by Helgaker:90
![]() | (3) |
![]() | (4) |
ΔECC = E(CCSD(F12*)(T+)) − E(MP2-F12) | (5) |
The choice of the computational level for the different terms included in the new PCS/F12 model has been based on the requirement of reasonable resources for each of them with specific reference to molecules containing up to about 20 atoms. In particular, the cost of CCSD(F12*)(T+)/2F12+ and MP2-F12/4F12 computations is negligible with respect to their CCSD(F12*)(T+)/3F12+ counterparts, which are, in turn, fully feasible on a standard workstation for molecules as large as guanine.35 For larger systems, containing up to about 50 atoms, single point CCSD(F12*)(T+)/3F12+ energy evaluations can be routinely performed thanks to efficient parallelization and implementation of frozen natural orbital (FNO), natural auxiliary functions (NAF) and/or related reduced-scaling techniques in several computer codes.44–49 The results of several benchmark studies have shown that FNO (and related) approximations produce results very close to those obtained from full computations, while reducing the computer time by about five times.36
All the PCS/F12 computations have been performed with the MRCC98 code.
The relative stability of low-energy minima is not related directly to the electronic energy differences (ΔE), but, rather, to the corresponding relative enthalpies at 0 K or free energies (ΔG°) at a temperature depending on the experimental conditions. In this connection, quantitative comparison with experiments requires going beyond the current practice of using scaled harmonic frequencies99–101 by incorporating explicit anharmonic contributions. To this end, equilibrium rotational constants (Beq), harmonic frequencies (ωi) and Coriolis couplings (ζij,τ) are computed at the rDSD/3F12− level,67 whereas cubic (fijk) and semi-diagonal quartic (fiijj) force constants are obtained from finite differences of analytical B3/SVP Hessians.37 Then anharmonic zero point energies (ZPEs) are estimated in the framework of second order vibrational perturbation theory (VPT2) by means of an analytical and resonance-free expression:102,103
![]() | (6) |
r = rV2 + ΔrV + ΔrCV2 | (7) |
ΔrCV2 = r(MP2(ae)/3wC) − r(MP2(fc)/3wC) | (8) |
rPCS/F12 = r(CCSD(F12b)(T)/2F12+) + ΔrCV2 | (9) |
An accurate reduced-cost version of the PCS model (referred to as PCS/DFT) has been recently introduced,33 in which the ΔrCV2 contribution is retained, but the coupled cluster model is replaced by the much cheaper rDSD double hybrid functional66 and the 3F12 basis set is employed in both the rV2 and ΔrV terms:
rPCS/DFT = r(rDSD/3F12) + ΔrCV2 | (10) |
Pij = exp{(rcovi + rcovj − rij)/0.3} | (11) |
![]() | (12) |
ΔrBij = ΔrCVBij + ΔrVBij | (13) |
![]() | (14) |
In the case of the rDSD/3F12− model, ΔrVB correction is needed only for counterbalancing slight errors for delocalization (Δdel) and hyperconjugation (Δhyp):
ΔrVB = Δrdel + Δrhyp | (15) |
![]() | (16) |
Δrhypij = −khyp(Pij − 1)2[δ(i,C)δ(j,F) + δ(i,F)δ(j,C)] | (17) |
Indeed, a generic rotational constant is inversely proportional to the corresponding principal inertia moment, which, in turn, only depends on molecular geometry and isotopic masses. However, direct structural determinations from experimental rotational constants are hampered by the limited amount of data with respect to the number of geometrical parameters and by the appropriate account of vibrational effects. The first limitation can be overcome by considering for a given molecule different isotopic species. However, determination of the couplings between rotations and vibrations needed to go from vibrationally averaged to equilibrium rotational constants is practically impossible for polyatomic molecules. Therefore, in the top-down approach the so called semi-experimental (SE) equilibrium geometry112 is obtained by a least-squares fit of the experimental rotational constants for the vibrational ground state (B0) of different isotopologues corrected by computed electronic (ΔBel) and vibrational (ΔBvib) contributions:113
B0τ = Beqτ + ΔBelτ + ΔBvibτ | (18) |
![]() | (19) |
![]() | (20) |
Whenever the lack of experimental information is too extensive, one can resort to the bottom-up approach, which consists in verifying the computed equilibrium geometry by means of a comparison between calculated and experimental rotational constants. In general terms, the optimal level of accuracy associated with predicted rotational constants should be close to 0.1% (1 MHz for a constant of 1 GHz), which roughly corresponds to errors smaller than 0.001 Å for typical bond lengths and 0.002 radians (0.1 degrees) for typical valence angles.115 This target accuracy can be surely obtained by expensive composite schemes incorporating high excitation orders in the correlation treatment.116 Nevertheless, the PCS/F12 model described above and its ‘cheap’ predecessors77,84 are able to draw closer to this high accuracy limit.15,36 However, for the quite large flexible molecules of interest in the present context, even this computational level becomes too expensive, and cheaper alternatives are needed. At the same time, the magnitude of vibrational corrections is typically between 0.1% and 1.0% that of the corresponding equilibrium rotational constant. Since the accuracy of ΔBvib contributions provided by global hybrid density functionals in conjunction with medium-size basis sets (e.g. the B3/SVP model previously defined) falls well within 5%,59 the ensuing average error of 0.05% on the rotational constants is more than acceptable. As we will see, equilibrium rotational constants obtained by the low-cost PCS/bond variant described in the previous section in conjunction with B3/SVP vibrational corrections draw closer to the goal accuracy of 0.1% on ground vibrational state rotational constants.
As is well known, atomization energies represent the most demanding thermochemical quantities and, actually, the so-called chemical accuracy (4 kJ mol−1) is considered a satisfactory target. The first validation of PCS/F12 has been performed with respect to the W4/11 set of highly accurate atomization energies.117 Concerning valence contributions, the mean unsigned error (MUE) and root mean square deviation (RMSD) are below 1 kJ mol−1 (0.7 and 0.9, respectively) and the maximum error (MAX) is below 4.0 kJ mol−1. Since the average value of the contribution of higher excitations (CCSDTQ5–CCSD(T)) is 1.2 kJ mol−1, the accuracy of PCS/F12 results is well within the intrinsic error bar of composite methods including up to CCSD(T) contributions. The situation is more involved for the CV contribution, due to the intrinsic error (around 3 kJ mol−1) related to its evaluation at the MP2 level. While further work is needed to devise a more accurate approximation not becoming the rate determining step of the whole PCS/F12 model, already the present version provides reliable results for reaction energies including tautomerization equilibria.36
The accuracy of the PCS/F12 model for conformational equilibria has been assessed with reference to the conformational landscape of the prototypical glycine amino acid (see Table 1), which shows eight different energy minima.118
A comparison with the state-of-the-art results of ref. 119 (including up to CCSDT(Q) contributions) confirms the accuracy of the PCS/F12 model, with the MAX and MUE between the two models being 17 and 8 cm−1 (0.2 and 0.1 kJ mol−1), respectively. This quantitative agreement is not unexpected since conformational equilibria are quite well described using low-level QC methods: as a matter of fact, rDSD/3F12− relative energies are already usually sufficient for semi-quantitative analyses of this kind of problems.118
Next, the accuracy of PCS/bond molecular structures is examined for the panel of medium-sized semi-rigid molecules shown in Fig. 2, whose computed and experimental rotational constants are collected in Table 2. When needed, the difference between rDSD/3F12− and PCS/bond equilibrium rotational constants will be labeled ΔBB. It is quite apparent that the rDSD/3F12− model already produces respectable results, but systematically overestimates the experimental ground state rotational constants. Among the different corrections to the rDSD/3F12− equilibrium rotational constants, ΔBvib plays the leading role, reducing systematically the computed values by up to 1%. This effect is partially counterbalanced by the ΔBB contribution, which, however, never exceeds 0.3% of the corresponding Beq value. Finally, the ΔBel contribution is smaller than the target accuracy of the proposed computational approach for all the molecules considered in the present study except H2CS. As such, in the following this contribution will be neglected.
Molecule | Axis | B eq (rDSD) | ΔBB (ΔBel) | ΔBvib | B 0 (PCS)a | B 0 (exp.)b |
---|---|---|---|---|---|---|
a Not including ΔBel except for H2CS. b The experimental ground state rotational constants have been rounded to one decimal place. | ||||||
CH2F2 | a | 49![]() |
285.8(−2.1) | −514.8 | 49![]() |
49![]() |
b | 10![]() |
67.7(−0.2) | −61.0 | 10![]() |
10![]() |
|
c | 9251.0 | 60.4(−0.2) | −69.4 | 9242.0 | 9249.8 | |
H2CS | a | 293![]() |
595.1(−843.4) | −1958.7 | 291![]() |
291![]() |
b | 17![]() |
21.0(−1.3) | −74.0 | 17![]() |
17![]() |
|
c | 16![]() |
20.6(−0.2) | −108.2 | 16![]() |
16![]() |
|
Pyrrole | a | 9174.2 | 27.9(−0.5) | −73.7 | 9128.4 | 9130.6 |
b | 9042.5 | 24.8(−0.3) | −69.8 | 8997.5 | 9001.3 | |
c | 4554.0 | 13.1(0.2) | −36.7 | 4530.4 | 4532.1 | |
Furan | a | 9493.0 | 32.6(−0.5) | −78.7 | 9446.9 | 9447.1 |
b | 9287.1 | 25.8(−0.5) | −65.4 | 9247.5 | 9246.7 | |
c | 4694.5 | 14.5(0.1) | −38.3 | 4670.7 | 4670.8 | |
Thiophene | a | 8077.0 | 27.3(0.4) | −59.5 | 8044.8 | 8041.8 |
b | 5434.1 | 18.1(0.2) | −29.6 | 5422.6 | 5418.1 | |
c | 3248.5 | 10.9(−0.1) | −21.7 | 3237.7 | 3235.8 | |
Pyridine | a | 6067.4 | 15.7(−0.3) | −44.7 | 6038.4 | 6039.3 |
b | 5832.9 | 17.2(−0.4) | −37.1 | 5813.0 | 5804.9 | |
c | 2973.6 | 8.6(0.1) | −21.3 | 2960.9 | 2959.2 | |
2F-Pyridine | a | 5901.6 | 16.4(−0.3) | −43.8 | 5874.2 | 5870.9 |
b | 2704.9 | 8.1(−0.1) | −15.8 | 2697.2 | 2700.0 | |
c | 1854.8 | 5.4(0.0) | −12.1 | 1848.1 | 1849.2 | |
3F-Pyridine | a | 5860.8 | 16.8(−0.3) | −42.6 | 5835.0 | 5829.7 |
b | 2642.1 | 8.3(−0.1) | −14.8 | 2635.6 | 2637.5 | |
c | 1821.1 | 5.6(0.0) | −11.5 | 1815.2 | 1815.7 | |
3-Furonitrile | a | 9340.4 | 30.3 | −75.1 | 9295.6 | 9296.5 |
b | 1939.9 | 6.3 | −6.1 | 1940.1 | 1940.3 | |
c | 1606.3 | 5.2 | −7.0 | 1604.5 | 1604.6 | |
Azulene | a | 2856.4 | 8.9 | −20.1 | 2843.2 | 2842.0 |
b | 1259.2 | 3.5 | −7.7 | 1255.0 | 1254.8 | |
c | 873.9 | 2.6 | −5.4 | 871.1 | 870.7 | |
Quinoline | a | 3159.0 | 9.5 | −22.4 | 3146.1 | 3145.4 |
b | 1275.8 | 3.7 | −7.9 | 1271.6 | 1271.6 | |
c | 908.8 | 2.6 | −5.7 | 905.7 | 905.7 | |
MAX% | 0.801 | 0.140 | ||||
MUE% | 0.397 | 0.047 |
In summary, the PCS/bond results always fulfill the target accuracy of 0.1%, with this confirming that a double-hybrid functional in conjunction with a sufficiently large basis set does not require a huge number of bond-specific corrections and, even more importantly, any correction to valence and dihedral angles.
An intuitive picture of the performance of the model is offered through graphical representation of the results based on normal distributions defined by:
![]() | (21) |
The rotational constants delivered by the PCS/F12 and PCS/bond variants have been compared for the two most stable conformers of glycine (see Fig. 4), together with the two low-energy forms of cycloserine, and the three tautomers of creatinine (see Fig. 5), for which experimental rotational constants are available and PCS/F12 computations are feasible. The results collected in Table 3 show that the quality of the PCS/bond results is even marginally better than that of their PCS/F12 counterparts, except for cycloserine, where PCS/F12 rotational constants are exceptionally accurate.
![]() | ||
Fig. 4 Molecular structure of the glycine and aminoisobutyric acid (Aib) conformers detected in the rotational spectra. |
![]() | ||
Fig. 5 Molecular structures of the cycloserine isomers and creatinine tautomers detected in the rotational spectra. |
Species | Parameter |
B
0exp.![]() |
ΔBvibb | PCS/F12 | PCS/bond |
---|---|---|---|---|---|
a The experimental ground state rotational constants (taken from ref. 120, 55 and 121 for glycine, cycloserine and creatinine, respectively) have been rounded to one decimal place. b B3/SVP and (in parenthesis) rDSD/3F12−. | |||||
Glycine I (ttt) | B a | 10![]() |
−77.1 | 10![]() |
10![]() |
(−76.2) | |||||
B b | 3876.2 | −29.5 | 3871.6 | 3869.3 | |
(−31.9) | |||||
B c | 2912.4 | −22.7 | 2920.3 | 2907.6 | |
(−22.7) | |||||
Glycine II (ccc) | B a | 10![]() |
−44.0 | 10![]() |
10![]() |
(−60.4) | |||||
B b | 4071.5 | −39.5 | 4056.1 | 4059.3 | |
(−37.1) | |||||
B c | 3007.5 | −37.4 | 2993.2 | 2998.8 | |
(−34.8) | |||||
MAX% | 0.48 | 0.30 | |||
MUE% | 0.27 | 0.20 | |||
Cycloserine I | B a | 3685.0 | −19.0 | 3683.1 | 3678.0 |
B b | 3160.5 | −28.3 | 3158.7 | 3163.4 | |
B c | 1815.5 | −15.6 | 1815.0 | 1814.1 | |
Cycloserine II | B a | 3649.0 | −38.6 | 3647.5 | 3642.6 |
B b | 3168.6 | −22.4 | 3167.8 | 3171.0 | |
B c | 1993.7 | −19.6 | 1992.7 | 1988.7 | |
MAX% | 0.06 | 0.25 | |||
MUE% | 0.05 | 0.15 | |||
Creatinine ZI | B a | 3842.4 | −38.5 | 3820.2 | 3825.6 |
B b | 1825.4 | −11.5 | 1819.3 | 1824.3 | |
B c | 1261.0 | −10.4 | 1259.6 | 1259.5 | |
Creatinine EI | B a | 3890.2 | −41.1 | 3879.1 | 3881.4 |
B b | 1810.2 | −11.9 | 1804.7 | 1810.4 | |
B c | 1258.1 | −11.2 | 1256.9 | 1256.9 | |
Creatinine A1 | B a | 3840.9 | −29.3 | 3816.5 | 3828.0 |
B b | 1831.5 | −14.0 | 1826.6 | 1830.8 | |
B c | 1266.9 | −11.9 | 1268.1 | 1266.3 | |
MAX% | 0.64 | 0.44 | |||
MUE% | 0.30 | 0.15 |
A comparison between the B3/SVP and the more accurate (but much more expensive) rDSD/3F12− vibrational corrections for glycine confirms the reliability of the cheaper approach except in the case of glycine II: as a matter of fact the very flat potential energy surface around this conformer (see the section on amino acids) enforces more stringent constraints on the level of anharmonic contributions. However, systematic benchmarks59,118 have shown that this situation is quite unusual for the molecules of interest in the present study.
The relative stability of different forms of cycloserine is very similar at the rDSD/3F12− and PCS/F12 level (195 and 205 cm−1, respectively) and the same applies to the relative stability of the EI and A1 forms of creatinine with respect to their most stable ZI counterpart (28 and 680 cm−1 at the RDSD/3F12− level to be compared with 20 and 645 cm−1 at the PCS/F12 level). While rDSD/3F12− electronic energies are never used in the final evaluation of the relative populations of low energy minima, the above results give further support to their use in the last steps of PES explorations.
A direct comparison between experimental and computed geometrical parameters would allow an unbiased analysis of the performance of the PCS/bond model, but, unfortunately, accurate experimental structures are not available for the quite large molecules, which are the main targets of the present study. However, the very recent determination of the semi-experimental equilibrium structure of 3-furonitrile122 (see Fig. 2) permits a detailed comparison for a heteroaromatic molecule containing a quasi-linear substituent. The results collected in Table 4 show that the accuracy of all the geometrical parameters is in the expected range and rivals that of the much heavier ‘cheap’ composite wave-function method,71,77 which is not applicable to larger molecules. Actually, the comparison is biased toward the ‘cheap’ results, since the semi-experimental equilibrium geometry was determined freezing bond lengths and valence angles involving hydrogen atoms at their ‘cheap’ values. Indeed, the maximum and mean relative unsigned errors (MAX% and MUE%, respectively) with respect to semi-experimental rotational constants suggest that in this specific case the PCS/bond model might be even more accurate than its ‘cheap’ counterpart.
Param. | Exp.a | Cheapb | rDSD/3F12−![]() |
PCS/bondc |
---|---|---|---|---|
a From ref. 122 with CH bond lengths, together with C3C4H6, C4C7H8, O9C5H10, and C3C9H10 valence angles fixed at the ‘cheap’ computed values and ΔBvib computed in this work at the B3/SVP level. b For the definition and expected accuracy of the 'cheap’ model see ref. 60 and 71. c This work. | ||||
N1–C2 | 1.1581 | 1.1570 | 1.1619 | 1.1594 |
C2–C3 | 1.4210 | 1.4213 | 1.4219 | 1.4196 |
C3–C4 | 1.4370 | 1.4363 | 1.4384 | 1.4361 |
C3–C5 | 1.3610 | 1.3602 | 1.3646 | 1.3619 |
C4–C7 | 1.3516 | 1.3509 | 1.3548 | 1.3533 |
C7–O9 | 1.3628 | 1.3618 | 1.3650 | 1.3633 |
C4–H6 | 1.0741 | 1.0741 | 1.0771 | 1.0759 |
C7–H8 | 1.0732 | 1.0732 | 1.0761 | 1.0749 |
C5–H10 | 1.0737 | 1.0737 | 1.0765 | 1.0753 |
![]() |
179.42 | 179.36 | 179.32 | 179.32 |
![]() |
127.40 | 127.44 | 127.40 | 127.40 |
![]() |
126.20 | 126.22 | 126.18 | 126.18 |
![]() |
105.47 | 105.46 | 105.55 | 105.55 |
![]() |
110.81 | 110.79 | 110.70 | 110.70 |
![]() |
127.08 | 127.08 | 127.40 | 127.40 |
![]() |
133.35 | 133.35 | 133.42 | 133.42 |
![]() |
117.11 | 117.11 | 117.13 | 117.13 |
![]() |
132.66 | 132.66 | 132.78 | 132.78 |
B eqa | 9371.6 | 9381.4 | 9340.4 | 9370.7 |
B eqb | 1946.4 | 1948.3 | 1939.9 | 1946.2 |
B eqc | 1611.6 | 1613.2 | 1606.3 | 1611.5 |
MAX% | 0.11 | 0.36 | 0.01 | |
MUE% | 0.10 | 0.34 | 0.01 |
Several molecules of current biological and medicinal interest are quite flexible, with this feature increasing the difficulty of predicting accurate geometrical parameters and vibrational corrections to rotational constants. In order to illustrate this point, the three medium-sized drugs shown in Fig. 6 (isoamil-acetate, aspirine and vitamin C) have been investigated. Inspection of Table 5 shows that the trends are the same as those discussed for semi-rigid molecules, but the errors increase significantly. However, also in this case the PCS/bond results represent a significant improvement over the underlying rDSD/3F12− results, which, in turn, are already much more accurate than the current standards in MW studies of biomolecule building blocks without any significant increase in the required computational resources.123–126
Axis | ΔBvib![]() |
B 0 (exp.)b | B 0 (rDSD) | B 0 (PCS) |
---|---|---|---|---|
a At the B3/SVP level. b The experimental ground state rotational constants (taken from ref. 123–125 for isoamyl-acetate, aspirine and vitamin C, respectively) have been rounded to one decimal place. | ||||
Isoamyl-acetate | ||||
a | −28.9 | 3280.9 | 3273.3 | 3287.4 |
b | −8.6 | 713.0 | 706.8 | 709.1 |
c | −8.0 | 690.1 | 685.8 | 688.0 |
Aspirine | ||||
a | −7.6 | 1156.1 | 1152.1 | 1155.4 |
b | −4.7 | 762.6 | 758.0 | 760.3 |
c | −3.3 | 509.0 | 506.9 | 507.9 |
Vitamin C (I) | ||||
a | −14.4 | 1562.5 | 1553.9 | 1559.9 |
b | −5.7 | 715.2 | 711.8 | 714.0 |
c | −4.3 | 524.3 | 521.6 | 523.3 |
Vitamin C (II) | ||||
a | −21.1 | 1472.7 | 1463.6 | 1469.4 |
b | −3.5 | 678.1 | 673.9 | 675.9 |
c | −4.0 | 575.0 | 571.6 | 573.1 |
Vitamin C (III) | ||||
a | −8.8 | 1455.7 | 1448.8 | 1454.0 |
b | −8.4 | 760.5 | 754.1 | 756.4 |
c | −5.9 | 569.5 | 570.6 | 572.4 |
MAX% | 0.87 | 0.56 | ||
MUE% | 0.53 | 0.28 |
The number of possible tautomers (NT) of a given species is NT = NS!/[NH!(NS − NH)!], where NS is the number of tautomeric sites and NH is the number of labile protons. For nucleobases, the so called “canonical” (keto and amino) forms predominate over their “minor” enol and imino counterparts under physiological conditions. In the case of uracil, 2-thiouracil, thymine and adenine (see Fig. 7) the “canonical” tautomer is significantly more stable than all the “minor” ones also in the gas phase, and the results collected in Table 6 show that the experimental rotational constants are reproduced very well by PCS/bond computations (MAX% = 0.17 and MUE% = 0.05).
Axis | B eq (rDSD) | ΔBB | ΔBvib | B 0 (PCS) | B 0 (exp.)a |
---|---|---|---|---|---|
a The experimental ground state rotational constants (taken from ref. 128, 129, 130, and 131 for uracil, 2-thiouracil, adenine, and thymine, respectively) have been rounded to one decimal place. | |||||
Uracil | |||||
a | 3899.7 | 13.3 | −27.7 | 3885.4 | 3883.9 |
b | 2025.8 | 7.4 | −10.6 | 2022.6 | 2023.7 |
c | 1333.2 | 4.8 | −7.4 | 1330.6 | 1330.9 |
2-Thiouracil | |||||
a | 3568.3 | 11.8 | −22.9 | 3556.8 | 3555.1 |
b | 1316.3 | 5.0 | −6.4 | 1314.2 | 1315.0 |
c | 961.6 | 3.3 | −4.9 | 960.0 | 960.0 |
Adenine | |||||
a | 2380.9 | 8.7 | −13.7 | 2375.9 | 2371.9 |
b | 1576.2 | 5.4 | −8.7 | 1572.9 | 1573.4 |
c | 948.5 | 3.4 | −5.8 | 946.1 | 946.3 |
Thymine | |||||
a | 3212.2 | 10.8 | −23.4 | 3199.6 | 3201.2 |
b | 1407.2 | 5.2 | −7.7 | 1404.7 | 1404.8 |
c | 984.5 | 3.5 | −5.6 | 982.4 | 983.2 |
MAX% | 0.41 | 0.17 | |||
MUE% | 0.23 | 0.05 |
The situation is more involved for cytosine and guanine.127 In particular, cytosine has two endo (N1 and N3) and two exo (O7 and N8) tautomeric sites and two labile protons, so that NS = 4, NH = 2 and NT = 6. These tautomers can be classified as keto-amino (KA and KA1), enol-amino (EA), keto-imino (KI), and enol-imino (EI and EI1).
Furthermore, each enol and imino group shows, together with the most stable form, an additional rotamer (labeled with a c subscript). The focus of the discussion will be on the five most stable species (namely KA, EA, EAc, KI, and KIc) shown in Fig. 8, which are the only ones having non negligible populations in the gas phase according to all the available theoretical and experimental studies.51,61,132
The relative electronic energies of those tautomers and rotamers with respect to the KA species computed at the PCS/F12 level are 250, 288, 542 and 1148 cm−1, respectively, whereas the relative are 245, 243, 643, and 1205 cm−1, respectively. Irrespective of the employed basis set, the relative stability of the EA species with respect to its KA counterpart is underestimated by about 150 cm−1 at the rDSD level and overestimated by about 350 cm−1 at the MP2 level. On the other hand, the relative stability of different rotamers and the energy difference between the KI and KA species are well reproduced by both rDSD and MP2 computations. Finally, the main effect of ZPE is to increase the relative stability of the KA species with respect to all the other tautomers (and rotamers).
Inspection of Table 7 confirms the remarkable accuracy of PCS/bond equilibrium rotational constants and the need to include vibrational corrections (at least at the B3/SVP level) for appropriate comparison with experimental data. For instance, the experimental PCS/bonds) differences between the rotational constants of EA and EAc rotamers are ΔBa = −62.3 (−61.2) MHz, ΔBb = 17.3 (16.7) MHz, and ΔBc = 0.4 (0.3) MHz, with the error being one order of magnitude smaller than that obtained at the MP2 level.51 This finding is indeed remarkable for assignment purposes since quadrupole coupling constants cannot help in this connection in view of their very similar value in EA and EAc for all the 14N nuclei.51
Species | Axis | B eq (rDSD) | ΔBB | ΔBvib | B 0 (PCS) | B 0 (exp.)a |
---|---|---|---|---|---|---|
a The experimental ground state rotational constants taken from ref. 51 have been rounded to one decimal place. | ||||||
EA | a | 3967.9 | 11.7 | −27.6 | 3952.0 | 3951.8 |
b | 2013.9 | 5.9 | −11.0 | 2008.8 | 2009.0 | |
c | 1336.5 | 4.0 | −8.6 | 1331.9 | 1332.5 | |
EAc | a | 3902.8 | 11.9 | −23.9 | 3890.8 | 3889.5 |
b | 2032.2 | 5.9 | −12.6 | 2025.5 | 2026.3 | |
c | 1337.1 | 3.9 | −8.8 | 1332.2 | 1332.9 | |
KA | a | 3888.8 | 12.7 | −27.4 | 3874.1 | 3871.5 |
b | 2028.8 | 6.1 | −9.4 | 2025.5 | 2025.0 | |
c | 1333.7 | 4.1 | −7.9 | 1329.9 | 1330.3 | |
KI | a | 3865.7 | 11.9 | −29.4 | 3848.2 | 3848.2 |
b | 2030.2 | 6.2 | −11.0 | 2025.4 | 2026.3 | |
c | 1331.1 | 4.1 | −7.7 | 1327.5 | 1328.0 | |
KIc | a | 3879.6 | 12.1 | −28.4 | 3863.3 | 3861.3 |
b | 2015.2 | 6.2 | −11.4 | 2010.0 | 2011.4 | |
c | 1326.3 | 4.1 | −7.7 | 1322.7 | 1323.2 | |
MAX% | 0.47 | 0.07 | ||||
MUE% | 0.30 | 0.04 |
Guanine has 4 endo (N1, N3, N7, N9) and 2 exo (OC and NH2) tautomeric sites and 3 labile protons, so that NS = 6, NH = 3, and NT = 20. Among those tautomers, there are 10 amino and 10 imino species, but the imino species will not be considered in the following since they are much less stable than their amino counterparts. Two keto-amino and one enol-amino (EA) tautomers are possible for each of the two non-equivalent structures of the imidazole ring (N7H and N9H), with both trans and cis (labeled with a c subscript) conformations of the hydroxyl group corresponding to energy minima for enol-amino tautomers. The different species will be identified by the same two letter code employed for cytosine followed by one (for EA forms) or two (for the KA forms) numbers indicating the positions of the other two acidic hydrogen atoms.
All the methods agree in confirming that only four species (KA17 (I), KA19 (II), EA9 (III) and EAc 9 (III′)) should have non-negligible populations in the gas phase.133 The relative electronic energies of these species obtained at the PCS/F12 level (0, 234, 276 and 368 cm−1) are virtually identical (maximum difference of 8 cm−1) to their W1–F12 counterparts.134 The main effect of ZPEs is to increase the relative stability of all the other species with respect to the KA17 (I) rotamer by about 80 cm−1. However, scaled harmonic134 and anharmonic (present work) relative ZPEs show non negligible differences (maximum 60 and average 40 cm−1), with these results confirming the importance of refined vibrational contributions for obtaining accurate thermochemical data. In any case, the computed trends (larger population of KA tautomers with respect to EA tautomers) are in full agreement with their experimental estimates.133,135
The computed rotational constants of the four most stable forms of guanine are compared in Table 8 to their experimental counterparts. Already the rDSD/3F12− results are quite accurate, and correction of bond lengths by the PCS/bond approach further improves the accuracy, which becomes fully quantitative when vibrational corrections are also included. As a matter of fact, the final PCS/bond MUE% and MAX% (0.03% and 0.04%) are comparable with those delivered by the most sophisticated (and much more expensive) wave-function composite methods for small semi-rigid molecules.115,116
Species | Axis | B eq (rDSD) | ΔBB | ΔBvib | B 0 (PCS) | B 0 (exp.)a |
---|---|---|---|---|---|---|
a The experimental ground state rotational constants taken from ref. 133 have been rounded to one decimal place. | ||||||
KA17 (I) | a | 1927.4 | 6.7 | −11.6 | 1922.5 | 1922.2 |
b | 1124.6 | 4.2 | −6.7 | 1122.1 | 1121.7 | |
c | 710.8 | 2.5 | −4.2 | 709.1 | 709.0 | |
KA19 (II) | a | 1927.7 | 6.7 | −12.0 | 1922.4 | 1922.3 |
b | 1119.3 | 4.1 | −6.5 | 1116.9 | 1116.7 | |
c | 708.6 | 2.6 | −4.2 | 707.0 | 706.7 | |
EA9 (III) | a | 1923.4 | 6.6 | −13.3 | 1916.7 | 1916.1 |
b | 1134.8 | 4.2 | −6.0 | 1132.9 | 1132.4 | |
c | 713.9 | 2.6 | −4.2 | 712.3 | 712.2 | |
EAc9 (III′) | a | 1931.2 | 6.6 | −13.8 | 1924.0 | 1923.5 |
b | 1138.6 | 4.2 | −6.1 | 1136.7 | 1136.0 | |
c | 716.5 | 2.6 | −4.3 | 714.8 | 714.7 | |
MAX% | 0.40 | 0.04 | ||||
MUE% | 0.27 | 0.03 |
The ‘soft’ dihedral angles governing the conformational landscape of α-amino acids belong either to the backbone (ϕ′ = LP–N–Cα–C′ and ψ = N–Cα–C′–O(H) dihedral angles) or to the side-chain (χ1 = N–Cα–Cβ–Xγ, χ2 = Cα–Cβ–Cγ–Yδ, etc.) where X, Y are generic substituents and LP is the nitrogen lone-pair perpendicular to the plane defined by the two amine hydrogens and the Cα atom. Only planar (or nearly planar) conformations are always allowed for the carboxy moiety (ω = Cα–C′–O–H ≈ 0° or 180°), with ω ≈ 0° being preferred, unless the hydroxyl hydrogen is involved in strong hydrogen bonds with other electronegative atoms. The c, g, s, t labels are used to indicate the cis, gauche, skew, and trans conformations determined by the above dihedral angles in the following order: ϕ, ψ, χ1, χ2, … In the case of proline, the ψ and ω dihedral angles retain the same definitions, whereas the puckering of the pyrrolidine ring can be described by two pseudorotation coordinates, the puckering amplitude (α) and the phase angle (τ).42,43
A systematic exploration of the conformational landscape of α-amino acids by the integrated strategy sketched in Section 2.1 provides several low-energy structures, most of which are stabilized by hydrogen bonds of type I (bifurcated NH2⋯OC, ϕ′ ≈ 180°, ψ ≈ 180°, ω ≈ 180°, ttt) and II (N⋯HO, ϕ′ ≈ 0°, ψ ≈ 0°, ω ≈ 0°, ccc).16 Low-energy conformers of type III (ϕ′ ≈ 180°, ψ ≈ 0°, ω ≈ 180°, tct) have also been found, but they usually relax to more stable I conformers overcoming the very small energy barriers governing rotation around the ψ dihedral angle.
In the following, one very rigid (cyclopropylglycine, Ac3c) and one extremely flexible (proline, Pro) amino acid will be analyzed in some detail, together with 4-fluorothreonine (4FT), which well illustrates the need (and power) for integrated computational strategies.
In the case of cyclopropylglycine (see Fig. 9) the conjugation between the three-membered ring and the carbonyl substituent strongly hinders any rotation around the ψ dihedral angle outside planar conformations. As a consequence, conformers I, II and III have Cs symmetry and lie in quite deep energy wells making Ac3c an ideal target for obtaining accurate structural parameters and vibrational corrections to rotational constants. In fact, the error bar of the PCS/bond rotational constants reported in Table 9 is much lower than the difference between the rotational constants of the various conformers, with this allowing a full a priori assignment of experimental spectra.
Conformer | Param. | B eq (rDSD) | ΔBB | ΔBvib | B 0 (PCS) | B 0 (exp.)a |
---|---|---|---|---|---|---|
a The experimental rotational constants for Ac3c and Aib are taken from ref. 136 and 137, respectively, and have been rounded to one decimal place. | ||||||
Ac3c | ||||||
I(ttt) | a | 3993.1 | 15.3 | −33.3 | 3975.1 | 3973.8 |
b | 2664.9 | 9.7 | −22.9 | 2651.7 | 2656.3 | |
c | 1843.5 | 6.9 | −14.3 | 1836.1 | 1838.4 | |
II(ccc) | ΔE | 269 | 246 | |||
![]() |
325 | 313 | ||||
a | 3965.6 | 14.6 | −26.4 | 3953.8 | 3951.8 | |
b | 2691.1 | 10.0 | −27.6 | 2673.5 | 2678.6 | |
c | 1850.9 | 6.9 | −15.0 | 1842.8 | 1844.9 | |
III(tct) | ΔE | 133 | 138 | |||
![]() |
150 | 155 | ||||
a | 4043.4 | 14.9 | −31.3 | 4027.0 | 4026.9 | |
b | 2628.5 | 10.3 | −24.1 | 2614.7 | 2618.6 | |
c | 1836.1 | 6.8 | −14.1 | 1828.8 | 1831.1 | |
MAX% | 0.49 | 0.39 | ||||
MUE% | 0.37 | 0.14 | ||||
Aib | ||||||
I(ttt) | a | 3401.9 | 12.7 | −32.1 | 3382.5 | 3383.8 |
b | 2382.4 | 9.0 | −23.0 | 2368.4 | 2372.0 | |
c | 2019.3 | 7.6 | −13.7 | 2013.2 | 2015.3 | |
II(ccc) | ΔE | −113 | −118 | |||
![]() |
−3 | −8 | ||||
a | 3350.2 | 12.5 | −32.9 | 3329.8 | 3330.2 | |
b | 2442.4 | 9.2 | −25.8 | 2425.8 | 2415.4 | |
c | 2032.0 | 7.8 | −15.0 | 2024.8 | 2020.4 | |
III′(tgt) | ΔE | 264 | 269 | |||
![]() |
291 | 296 | ||||
a | 3402.2 | 12.6 | −31.8 | 3383.0 | 3385.2 | |
b | 2403.6 | 9.0 | −24.1 | 2388.5 | 2390.7 | |
c | 2004.2 | 7.6 | −13.6 | 1998.2 | 2000.7 | |
MAX% | 1.12 | 0.43 | ||||
MUE% | 0.52 | 0.14 |
In the case of aminoisubutyric acid (Aib, see Fig. 4) the presence of two methyl groups at Cα should again hinder the relaxation of conformer III to conformer I, and this expectation is confirmed by QC computations. However, the lack of conjugative contributions results in a larger degree of freedom to the ψ dihedral angle around the fully planar arrangement. As a result, conformer I retains Cs symmetry, but this is not the case for conformers II and III, in which ψ becomes −15° and −56°, respectively. Actually, the latter structure is better indicated as III′ since the bifurcated hydrogen bridge of the reference structure becomes a single hydrogen bond. On the other hand, the very small barrier connecting the two equivalent non-planar conformers of type II leads to a much larger error of the computed rotational constants due to the difficulty in describing vibrational corrections for large amplitude motions.
The results collected in Table 9 confirm the accuracy of rDSD/3F12− relative energies and show the expected reversal of the relative stability of conformers II and III, with inclusion of anharmonic ZPEs leading to an essentially equal stability of conformers I and II in Aib.
Replacement of both methyl groups with hydrogen atoms leads to glycine (see Fig. 4), which has a much larger conformational freedom due to the negligible steric hindrance of the substituents at Cα. As a result, conformer III has not been detected in MW studies (due to its easy relaxation to the most stable I counterpart) and conformer II loses the Cs symmetry again. As already anticipated, the other consequence of the increased flexibility is a worsening of the agreement between computed and experimental rotational constants, partially due to the reduced accuracy of vibrational corrections computed in the framework of the VPT2 model (cf.Tables 3 and 9). However, even under those circumstances, PCS/bond computations are largely sufficient to guide and interpret the experimental results.
The next prototypical system is proline (see Fig. 9), whose conformational PES shows several low-energy conformers of type I, II, and III. However, all the species of type III are too unstable to be detected in MW studies, whereas four structures (two of type I and two of type II) are found in the range of 600 cm−1. All those species show envelope structures of the pyrrolidine ring with either exo- or endo-like placements of the carboxy moiety (referred to as E+ and E−, respectively) and have been detected in the MW spectra.138,139 PCS/F12 computations show that the species of type II are significantly more stable than their counterparts of type I, with ZPE contributions slightly reducing the difference (see Table 10). As expected, the presence of the large-amplitude puckering of the pyrrolidine ring increases the errors of the computed rotational constants with respect to those obtained for Ac3c, Aib, and, even Gly (cf.Tables 3, 9 and 10). In particular, the Ba rotational constants are significantly underestimated (in some cases by more than 1%), whereas the Bb and Bc rotational constants are overestimated by comparable amounts. Furthermore, this is the first molecule, whose rDSD/3F12− equilibrium rotational constants and their PCS/bond ground state counterparts show comparable errors with respect to the experiment. However, even in those circumstances, the agreement between computed and experimental results remains sufficiently good to permit the unequivocal assignment of the detected species, which is further confirmed by quadrupolar coupling constants.140
Conformer | Param. | B eq (rDSD) | ΔBB | ΔBvib | B 0 (PCS) | B 0 (exp.)a |
---|---|---|---|---|---|---|
a The experimental rotational constants are taken from ref. 139 and have been rounded to one and two decimal places, respectively. | ||||||
IIE− | a | 3718.4 | 15.1 | −46.9 | 3686.6 | 3673.9 |
b | 1681.8 | 5.6 | −7.1 | 1680.3 | 1688.4 | |
c | 1404.8 | 4.6 | −6.0 | 1403.4 | 1407.4 | |
IIE+ | ΔE | 215 | 200 | |||
![]() |
210 | 195 | ||||
a | 3992.8 | 16.1 | −46.6 | 3962.3 | 3923.6 | |
b | 1592.7 | 5.3 | −13.6 | 1584.4 | 1605.9 | |
c | 1271.7 | 4.3 | −9.7 | 1266.3 | 1279.8 | |
IE− | ΔE | 607 | 580 | |||
![]() |
462 | 435 | ||||
a | 3913.3 | 16.0 | −44.3 | 3885.0 | 3857.2 | |
b | 1577.1 | 5.2 | −16.5 | 1565.8 | 1590.5 | |
c | 1372.2 | 4.5 | −14.7 | 1362.0 | 1377.5 | |
IE+ | ΔE | 591 | 561 | |||
![]() |
474 | 444 | ||||
a | 4032.3 | 16.7 | −44.6 | 4004.4 | 4004.0 | |
b | 1568.5 | 5.2 | −13.6 | 1560.1 | 1567.3 | |
c | 1284.3 | 4.2 | −12.9 | 1275.6 | 1281.5 | |
MAX% | 1.76 | 1.55 | ||||
MUE% | 0.72 | 0.74 |
Fluorinated α-amino-acids have been attracting increasing attention because their introduction in specific domains can be used for tuning the stability, folding and biological activity of proteins.141 In this framework 4FT plays a central role since it is the only fluorinated amino acid found in nature.142 The identification of the most stable conformers of 4FT, potentially observable in rotational spectroscopy experiments, started from a knowledge-based step in which 21 conformers were generated by fluorination of the terminal methyl group in the 7 conformers detected for threonine (Thr).118,143 Next, additional low-energy conformers were detected using the IM-EA algorithm sketched in Section 2.1. The final panel of 12 conformers lying in the range of 750 cm−1 was submitted to full geometry optimizations at the B3/SVP and next rDSD/3F12− levels. Refinement of the relative electronic energies at the PCS/F12 level does not provide any significant change at least concerning the general trends, whereas inclusion of ZPE (leading to relative enthalpies at 0 K) induces significant differences in the stability order predicted by electronic energies, with a general destabilization of type II conformers.
In summary, 12 conformers should be detectable in the rotational spectra (see Fig. 10). Characterization of such a large number of conformers in rotational spectra is not an easy task, but the very accurate rotational constants obtained at the PCS/bond level provide an invaluable aid to the assignment of the very congested spectrum, with the results summarized in Table 11 pointing once again to a good agreement between theory and experiment.
Conformer | Param. | B eq (rDSD) | ΔBB | ΔBvib | B 0 (PCS) | B 0 (exp.)a |
---|---|---|---|---|---|---|
a The experimental rotational constants taken from ref. 144 have been rounded to one decimal place. | ||||||
I′gg−g− | a | 2498.2 | 14.0 | −21.3 | 2490.9 | 2486.9 |
b | 1034.5 | 4.6 | −9.3 | 1029.8 | 1034.1 | |
c | 994.8 | 4.2 | −8.5 | 990.5 | 993.0 | |
IItg−g− | a | 1912.2 | 12.2 | −13.5 | 1910.9 | 1906.1 |
b | 1433.4 | 4.8 | −14.0 | 1424.2 | 1434.0 | |
c | 1121.8 | 5.2 | −11.6 | 1115.4 | 1120.4 | |
IIggg− | a | 2611.7 | 14.0 | −24.6 | 2601.1 | 2597.9 |
b | 1074.4 | 4.7 | −11.0 | 1068.1 | 1065.8 | |
c | 948.4 | 4.1 | −9.8 | 942.7 | 950.2 | |
I′gg−g | a | 2831.0 | 14.0 | −26.1 | 2818.9 | 2822.2 |
b | 1047.8 | 4.5 | −11.0 | 1041.3 | 1045.5 | |
c | 949.9 | 4.3 | −7.6 | 946.6 | 948.9 | |
Igtt | a | 2977.5 | 13.3 | −27.2 | 2963.6 | 2966.8 |
b | 960.0 | 5.1 | −6.6 | 958.5 | 959.1 | |
c | 863.2 | 4.6 | −6.0 | 861.8 | 861.5 | |
IIg−tt | a | 2370.0 | 10.9 | −21.4 | 2359.5 | 2372.2 |
b | 1144.0 | 6.3 | −9.7 | 1140.6 | 1139.8 | |
c | 850.6 | 4.3 | −7.8 | 847.1 | 846.1 | |
IIggg | a | 2903.8 | 14.1 | −28.2 | 2889.7 | 2888.2 |
b | 1075.9 | 4.5 | −11.9 | 1068.5 | 1071.5 | |
c | 918.0 | 4.1 | −6.4 | 915.7 | 919.1 | |
IIg−g−g | a | 2310.1 | 12.2 | −20.1 | 2302.2 | 2307.5 |
b | 1246.5 | 6.0 | −11.7 | 1240.8 | 1242.2 | |
c | 924.0 | 4.0 | −9.0 | 919.0 | 920.7 | |
IIgtt | a | 3046.6 | 12.5 | −29.8 | 3029.3 | 3036.7 |
b | 937.1 | 4.9 | −7.3 | 934.7 | 935.3 | |
c | 845.4 | 4.4 | −6.3 | 843.5 | 843.7 | |
III′ggg− | a | 2523.3 | 14.3 | −20.1 | 2517.5 | 2522.3 |
b | 1056.4 | 4.4 | −11.2 | 1049.6 | 1051.4 | |
c | 932.1 | 4.3 | −7.8 | 928.6 | 931.8 | |
Ig−gg− | a | 1849.1 | 10.1 | −17.4 | 1841.8 | 1846.7 |
b | 1552.7 | 5.2 | −10.5 | 1547.4 | 1551.6 | |
c | 997.9 | 4.2 | −9.0 | 993.1 | 995.1 | |
Ig−gg | a | 2077.1 | 11.1 | −15.7 | 2072.5 | 2086.4 |
b | 1364.4 | 5.3 | −14.0 | 1355.7 | 1355.8 | |
c | 963.6 | 3.7 | −9.4 | 957.9 | 956.6 | |
MAX% | 0.81 | 0.79 | ||||
MUE% | 0.29 | 0.25 |
Systematic studies of other α-amino acids based on the nano-LEGO57,58,118 or PCS strategies confirmed that the backbone of species containing simple non-polar side-chains (e.g., alanine or valine) shows the same low-energy conformers of type I and II found for glycine, albeit with the reduced symmetry related to the presence of a chiral Cα atom.145,146 On the other hand, polar side-chains (as found, e.g., in serine, cysteine, threonine, aspartic acid, or asparagine) give access to backbone-(side-chain) hydrogen-bonds, with this strongly increasing the number of low-energy conformers.118 Additional interactions between backbone polar hydrogen atoms and side-chain π-systems are possible for amino acids containing aromatic moieties (like, e.g., phenylalanine and tyrosine).147 This diversified landscape is further enriched in amino acids showing specific features, like tautomerism (histidine) or heteroaromatic structures with non-equivalent rings (tryptophan).140
In order to focus the attention on the backbone, in the following only the glycine and alanine dipeptide analogues are explicitly considered, whose soft degrees of freedom are the ϕ (C′NCαC′) and ψ (NCαC′N) dihedral angles. In general terms, the conformational flexibility of the dipeptide analogues is smaller than that of the corresponding amino acids, with only the C5 (ϕ ≈ 180°, ψ ≈ 180°) and C7 (|ϕ| ≈ 90°, |ψ| ≈ 60°, with ϕ and ψ of opposite sign) conformers being populated in the gas phase (see Fig. 11). For chiral residues like alanine, two different situations, namely Ceq7 and Cax7 (with eq and ax standing for equatorial and axial, respectively), are possible; however, only the first one is experimentally accessible. Both C5 and C7 conformers of glycine15 and alanine (Ceq7)150 dipeptide analogues have been characterized accurately by rotational spectroscopy with the agreement between PCS/bond and experimental rotational constants being in the expected range for flexible molecules (see Table 12).
Axis | ΔBviba | B 0(exp.)b | B 0(rDSD/3F12−) | B 0(PCS/bond) |
---|---|---|---|---|
a At the B3/SVP level. b The experimental ground state rotational constants (taken from ref. 15, 150, 151 and 152 for glycine dipeptide analogue, alanine dipeptide analogue, uridine, and testosterone, respectively) have been rounded to one decimal place. | ||||
GlyDA C7 | ||||
a | −23.9 | 4421.3 | 4367.7 | 4385.6 |
b | −18.2 | 1214.2 | 1205.6 | 1209.0 |
c | −13.5 | 1081.3 | 1073.1 | 1075.9 |
GlyDA C5 | ||||
a | −42.3 | 5268.9 | 5205.0 | 5226.2 |
b | −7.4 | 1012.0 | 1003.7 | 1007.0 |
c | −5.1 | 857.2 | 851.6 | 854.5 |
MAX% | 1.21 | 0.81 | ||
MUE% | 0.89 | 0.56 | ||
AlaDA C7eq | ||||
a | −0.9 | 2598.8 | 2578.5 | 2588.8 |
b | −13.3 | 1125.7 | 1117.1 | 1120.6 |
c | −11.4 | 914.5 | 908.4 | 911.3 |
AlaDA C5 | ||||
a | −8.4 | 3100.3 | 3073.1 | 3085.1 |
b | −7.2 | 955.1 | 946.0 | 949.2 |
c | −7.2 | 812.7 | 809.3 | 812.2 |
MAX% | 0.95 | 0.62 | ||
MUE% | 0.74 | 0.39 | ||
Uridine | ||||
a | −9.1 | 886.0 | 880.7 | 883.9 |
b | −2.9 | 335.6 | 334.3 | 335.1 |
c | −2.0 | 270.1 | 268.7 | 270.2 |
MAX% | 0.60 | 0.24 | ||
MUE% | 0.50 | 0.14 | ||
Testosterone | ||||
a | −8.1 | 785.3 | 780.7 | 783.8 |
b | −1.6 | 168.7 | 168.0 | 168.6 |
c | −1.4 | 153.8 | 153.2 | 153.8 |
MAX% | 0.58 | 0.19 | ||
MUE% | 0.46 | 0.08 |
Dipeptide analogues containing more complex side chains have been investigated experimentally, but they do not add any new feature to the general trends since the detected conformers reduce to only the Ceq7 structure because of the increased backbone rigidity (e.g. proline dipeptide analogue153) or the formation of additional hydrogen bridges between the backbone and the side chain (e.g. serine dipeptide analogue154).
Accurate computational studies of nucleosides and hormones have not yet been performed, but the preliminary results for the uridine nucleoside and the testosterone hormone (see Fig. 11) reported in Table 12 suggest that the agreement between PCS/bond and experiment is on par with that obtained for other flexible systems. In the case of uridine, the PCS/bond rotational constants are an order of magnitude more accurate (MAX% and MUE% reduced from 2.4% to 0.24% and 1.9% to 0.15%, respectively) than their MP2 counterparts employed in the original experimental study,151 while requiring a comparable computational effort. This improvement permits the unbiased assignment of the detected conformer without any need for additional spectroscopic information (e.g., quadrupole couplings) or chemical intuition. As a matter of fact, the rotational constants of the most stable conformer perfectly match the experimental rotational constants (and the same applies to quadrupole couplings). Even lower MAX% (0.19%) and MUE% (0.08%) are obtained for testosterone, which is the largest molecule (49 atoms) considered in the present paper. While there are no ambiguities in the assignment of the rotational spectrum of this molecule,152 the situation could be different for other hormones (e.g., β-estradiol155) involving soft degrees of freedom, which are at present under investigation in our laboratory.
The last family of compounds considered in the present study is that of neurotransmitters, which are chemical compounds released by neurons at the synapse, where they bind to a receptor in order to transmit a signal to the target cell. Tryptamine (Try) and serotonin (5-hydroxytryptamine, Srt) have been chosen for illustrative purposes due to their average dimension and high flexibility (see Fig. 12). The structure of both compounds is closely related to that of the tryptophan amino acid, which has been recently studied in detail.140 As a consequence, the same notation employed for amino acids can be extended to the three dihedral angles (ϕ′, χ1, χ2) ruling the conformational behaviour of the considered neurotransmitters. Of course, ψ and ω dihedral angles are now lacking, whereas the additional dihedral angle involving the hydroxyl group of serotonin is frozen in syn or anti orientations, with the former one being preferred for all the low-energy conformers.156
In analogy with tryptophan, the preferred conformation of the χ2 dihedral angle (governing the position of the indole ring) is close to 90° (broadly referred to as g) and the more stable structures are characterized by the interaction of one amine hydrogen with the π-system of the phenyl (χ1 ≈ 60°) or pyrrole (χ1 ≈ −60°) ring of indole. These interactions are possible for two orientations of the amine lone pair corresponding to ϕ′ ≈ 60° or 180°. The ggg and gg−g conformers have comparable stability in serotonin (ΔE = 11 cm−1 at the rDSD/3F12− level), whereas only the gg−g conformer has been detected in the rotational spectrum of tryptamine.157 On the other hand, gg−g and tg−g conformers have been detected for both neurotransmitters, with the gg−g conformer being slightly more stable according to rDSD/3F12− computations (128 and 120 cm−1 for Try and Ser, respectively).
The rotational constants of the conformers detected in MW spectra are collected in Table 13. The remarkable agreement between experimental and PCS/bond values confirm that double-hybrid functionals also deliver extremely accurate structural parameters for flexible systems, but unbiased comparison with experiment requires the inclusion of vibrational corrections and (to a lower extent) core–valence correlation contributions.
Axis | ΔBvib![]() |
B 0 (exp.)b | B 0 (rDSD/3F12−) | B 0 (PCS/bond) |
---|---|---|---|---|
a At the B3/SVP level. b The experimental rotational constants (taken from ref. 157 and 156 for tryptamine, and serotonin, respectively) have been rounded to one decimal place. | ||||
Tryptamine I (gg−g) | ||||
a | −11.9 | 1730.2 | 1722.2 | 1728.2 |
b | −4.9 | 681.9 | 680.0 | 681.7 |
c | −3.8 | 551.5 | 549.9 | 551.3 |
Tryptamine II (tg−g) | ||||
a | −9.1 | 1709.4 | 1704.7 | 1710.7 |
b | −5.5 | 681.9 | 678.7 | 680.4 |
c | −3.8 | 550.8 | 548.7 | 550.1 |
MAX% | 0.47 | 0.22 | ||
MUE% | 0.36 | 0.10 | ||
Serotonin I (ggg) | ||||
a | −5.7 | 1163.2 | 1157.8 | 1161.5 |
b | −7.1 | 650.6 | 648.9 | 650.7 |
c | −3.2 | 450.1 | 448.5 | 449.7 |
Serotonin II (gg−g) | ||||
a | −5.6 | 1286.5 | 1283.9 | 1288.2 |
b | −5.2 | 571.8 | 568.8 | 570.3 |
c | −3.7 | 435.6 | 433.6 | 434.7 |
Serotonin III (tg−g) | ||||
a | −5.8 | 1267.0 | 1263.8 | 1268.0 |
b | −3.5 | 574.6 | 573.1 | 574.6 |
c | −3.3 | 436.0 | 434.1 | 435.2 |
MAX% | 0.52 | 0.26 | ||
MUE% | 0.36 | 0.12 |
As already mentioned, most molecules of biological interest are highly flexible and the search for stable structures on the corresponding rugged potential energy surfaces has taken great advantage of the development of multi-level QC methods driven by ML algorithms.27 However, the underlying problem of large amplitude modes limits the accuracy of any perturbative approach to vibrational corrections. The challenges related to this issue depend on the strength of the couplings of the large amplitude modes among themselves and with the other small amplitude modes. Whenever these couplings are small enough (e.g., isolated torsions or ring deformations), the large amplitude modes can be removed from the perturbative treatment and their contribution taken into account by different one-dimensional discrete variable representations.158 The details and successful applications of this procedure can be found, for example, in ref. 159–161. In this connection, the development of an effective VPT2 engine based on generalized internal coordinates40 is particularly promising since the inter-mode couplings are much reduced with respect to the traditional implementation in Cartesian coordinates. However, the general case of strongly coupled large amplitude modes calls for new ideas and implementations.
Even taking these remarks into consideration, the reduced computational cost and the user-friendly implementation of the new model pave the way toward widespread application, which could also be used by non-specialists in the assignment and interpretation of experimental results.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3cp05169b |
This journal is © the Owner Societies 2024 |