The Houk-List transition states for organocatalytic mechanisms revisited

The ten year old Houk – List model for rationalising the origin of stereoselectivity in the organocatalysed intermolecular aldol addition is revisited, using a variety of computational techniques that have been introduced or improved since the original study. Even for such a relatively small system, the role of dispersion interactions is shown to be crucial, along with the use of basis sets where the superposition errors are low. An NCI (non-covalent interactions) analysis of the transition states is able to identify the noncovalent interactions that in ﬂ uence the selectivity of the reaction, con ﬁ rming the role of the electrostatic NCH d + / O d (cid:1) interactions. Simple visual inspection of the NCI surfaces is shown to be a useful tool for the design of alternative reactants. Alternative mechanisms, such as proton-relays involving a water molecule or the Hajos – Parrish alternative, are shown to be higher in energy and for which computed kinetic isotope e ﬀ ects are incompatible with experiment. The Amsterdam manifesto, which espouses the principle that scienti ﬁ c data should be citable, is followed here by using interactive data tables assembled via calls to the data DOI (digital-object-identi ﬁ ers) for calculations held on a digital data repository and themselves assigned a DOI.


Introduction
The promotion of a wide range of organic synthetic reactions using new generations of so-called organocatalysts (metal free systems) is highly topical.So too is the increasing adoption of computational modelling to chart the most probable mechanistic pathway and to establish the factors responsible for reactivity, selectivity, and particularly stereoselectivity. 1 In this regard, one report 2 in 2003 of computational investigation of the proline-mediated asymmetric intermolecular aldol reactions (Scheme 1) has achieved the rare distinction of having the key transition state model being named aer the two principal authors; Houk and List.This computational model has informally become known as the gold standard for such investigations; since it exemplied the procedures for comparing computational modelling with experimental results for organocatalysed reactions and outlined a methodology for predicting the stereoselectivity of new organocatalysed reactions.
A one-proline mechanism based on enamine activation, the Houk-List model involves the transition state for stereogenic carbon-carbon bond formation.This is the rate-determining step 2 of the intermolecular aldol catalytic cycle in which the catalytically active enamine attacks an electrophile.The carboxylic acid group of proline plays a central role in the model, directing the electrophile to the Re face of the enamine.The enamine can be either anti or syn, relative to the acid, and the electrophile can offer two prochiral faces for attack, Re or Si, resulting in four different stereochemical outcomes (Scheme 2).
Informed by extensive computational studies, [2][3][4][5] Houk and Bahmanyar suggested that the energy differences between these transition states, and so the origin and degree of stereoselectivity displayed by the reaction, depends on two critical structural elements: the relative degree to which each transition state can adopt a planar enamine, and the degree of electrostatic stabilisation provided to the forming alkoxide.A planar enamine allows for the greatest possible nucleophilicity of the terminal olen while also reducing the geometric distortion experienced by the forming iminium group.The proton transfer from the carboxylic acid to the forming alkoxide was suggested as providing the majority of the electrostatic stabilisation and is key to the Houk-List model. 2,6Smaller, yet important, stabilising contributions also result from NCH d+ /O dÀ interactions from the pyrrolidine ring. 7ery good agreement between calculated and experimental stereoisomer ratios was observed, with the (S,R)-isomer most favoured for both R ¼ Ph and i Pr (Scheme 2).The challenge of computational methods then is to establish a level of reliability which can be used to predict the outcome of further reaction variations.Any success of such models can then be used to build condence for diversication to new reactions and mechanisms.
][12] However, despite the apparently excellent comparison made between computational and experimental selectivities, for which this research became widely known, two signicant issues have come to light regarding the details of the proline-cyclohexanone-benzaldehyde system.Houk has since disclosed that calculations carried out subsequent to publication, in which a greater volume of conformational space was investigated, revealed that structures of the transition states with lower energies exist; 1,13 potentially altering the predicted selectivity of the reaction.In addition, repetition of the experimental results by List found product selectivities to vary, with the reaction found to be extremely sensitive to both water content and to temperature.In the present article, we set out to establish whether the success of the Houk-List model as a concept, and the reportedly excellent agreement between computational and experimental results, is just coincidence or whether the faith in it is indeed justied.

Advances in computational methods
The ten years that have elapsed since the original report have seen many incremental, and sometimes dramatic, improvements to computational methods used.In the present article, we re-evaluate the Houk-List model by taking advantages of these; the gold standard itself must evolve.A more accurate computational model in turn allows the experimental ndings to be subjected to improved scrutiny, which can potentially in itself result in suggestions for further experimental tests.
Because fully substituted reacting systems can oen have >20 atoms and the organocatalysts themselves are potentially large molecules, the level of theory adopted to construct the Houk-List model in 2003 was (as always) a pragmatic compromise between the quality of the theory and the requirement to be able to compute the model in a reasonable time (a four day batch run is a typical resource available to most).The level originally selected by Houk and co-workers is commonly iden-tied by the abbreviation B3LYP/6-31G*.Geometries, activation enthalpies, and free energies (DG 298 ) based on calculating the normal vibrational modes at this level were obtained for a gas phase model.To these energies (and at these geometries) various further corrections could be added using the previously computed geometries, including an estimation of the differential solvation energy computed for a polarizable continuum solvation model.This compound procedure was then applied to an exploration of the conformational space available to the system.Because this has many degrees of freedom for even quite simple reactions, only the most likely conformations were explored.So how can this basic approach be improved upon in 2014?We have focused on six aspects set out below: 1.The B3LYP density functional as used has, over the last twenty years, proven to be a very effective one for the study of reaction mechanisms, and it has been subjected to extensive testing and scrutiny during that period.5][16] As a result, one particular problem was identied with this functional; it captures only short range dynamic correlation energy and fails to capture longer range correlations.This term can be identied in part as the dispersion interactions between non-bonded atoms (also referred to as the non-covalent interactions or NCI).It is now recognised that the most effective way to correct a DFT method for dispersion terms is to add empirical corrections to the nuclear-nuclear repulsion terms, and three or more generations of methods have been developed over the last decade to achieve this.Grimme 17 in particular has pioneered this approach and we have adopted his third generation procedure, known as D3, to correct the B3LYP method for these energy terms.Most modern functionals also incorporate such terms, and the uB97XD method, which incorporates a version of Grimme's earlier D2 correction, was developed ve years ago 18 to specically include both dispersion and the ability to reproduce reaction barriers.
2. The basis set quality is another feature that can be steadily improved, as computers have increased in speed, aspiring of course to a practical complete-basis-set limit (CBS).In 2003, the limit using conventional computational resources to the total number of basis functions that could be used for a modelling study was about 650 functions. 19For a 6-31G* basis set as used by Houk (the modern notation would be 6-31G(d), indicating only d-polarisation functions on non-hydrogen atoms) that would normally mean a limit of around 100 atoms, or perhaps up to about 120 if only the reacting core was specied at this level with sterically large substituents dened at a much smaller basis set level, say STO-3G. 19At this size, using a two or four parallel processing computer of the period, the second derivatives of the energy with respect to geometry (the Hessian, required to locate and characterise transition states) could be computed in about 170-200 hours of elapsed time.This would naturally limit the degree of conformational exploration that could be undertaken.The technological advances over a decade, the more common availability of 12/16-processor systems with much larger memory (improved from $4 Gbyte to $94 Gbytes) and faster algorithms for analytically computing the Hessian have reduced the elapsed time for such a calculation by a factor of about 100.Consequently, a much better level of basis set can now be deployed on molecules of such size.In the present study, we have used the TZVP basis, 20 a triple-z quality set with polarisation functions on both hydrogen as well as nonhydrogen atoms.This basis was selected since the Grimme-D3 dispersion corrections have been specically parametrised for such basis sets.This now gives an opportunity to evaluate the errors due to basis-set-superposition that might have been incurred using the 6-31G* basis.For the system R ¼ Ph (Scheme 1), this results in 556 basis functions for the calculation, compared with 376 for the 6-31G* basis and further allows selected intrinsic reaction coordinates (IRC) to be computed in a reasonable time.The IRC procedure requires the DFT-based Hessian to be evaluated 20-50 times, each taking about 1 hour using the TZVP basis, in order to map a full IRC.This requirement imposes a current practical limit of $800 basis functions for such a study.It is worth noting that this size of basis precludes the use of other Hamiltonians based on perturbative expansions such as MP2, an alternative procedure for evaluating dynamic correlation.Even on a very large memory 64-processor system, it was not possible to locate a single transition state for the TZVP basis in any reasonable time using this method.
3. It did prove feasible using the TZVP basis to perform a more systematic exploration of conformation space (Scheme 2), and even to apply the larger QZVP basis (1944 basis functions for R ¼ Ph) to selected points.
4. The implementation of continuum solvation models has also improved considerably since 2003. 21Previously, optimisation of molecular geometries with such models was unreliable, since the rst and second derivatives of the energy in a solvent continuum oen resulted in small inhomogeneities due to the way in which the solvent cavity was constructed.York and Karplus 21 revolutionised this with the introduction of an algorithm for obtaining a smoothed solvent cavity and since about 2010 it has been possible to fully optimise geometries within such a model. 22This means that a more self-consistent solvation approach can be undertaken, whereby the normal mode frequencies required to correct for thermal and entropy terms can be computed for geometries obtained using the solvation model at the relaxed solvation geometry, rather than simply being an ad hoc correction applied to a gas phase model.The reaction can develop charge separation during its course (to form a zwitterion), and (continuum model) solvation of chargeseparated species can in turn have a signicant impact upon their geometries and consequent properties.
5. The Houk-List model recognised the effect of solvent by correction for solvation free energies, but it did not also include the effects of discrete additional molecules in the model.For example, it is possible in any reaction that produces water as a product that at least one explicit water molecule could participate in the mechanism, either actively by acting as a proton relay or more passively by forming key additional hydrogen bonds.Indeed, mechanisms involving as many as four discrete waters acting as consecutive proton relays have been investigated.One consequence is that the overall free energy of the reaction can decrease.Of course, such a super-molecule can itself be treated by a continuum solvent model layered on top of the explicit model.Here we report an exploration of the role of water as both proton relay and as hydrogen bonding solvent.
6.In 2003, optimised geometrical coordinates, but no other derived properties (the full computed wavefunction, an IRC pathway, etc.) were inserted into paginated PDF documents as text and submitted as the ESI.† Re-using this data by extraction from the PDF document is non-trivial, as we ourselves experienced in re-using the original Houk-List data.The introduction 23 of open digital chemical data repositories in 2005 has revolutionised this aspect of data curation and citation. 24All the results in the present study are presented in the form of interactive tables, themselves assigned a citable persistent digital-object-identier (DOI), and in which the DOI assigned to individual calculations is used to retrieve and visually present the calculation log le from the appropriate digital repository.

Results and discussion
The Houk-List base model The investigation began by locating the Houk-List transition states for the four stereochemical outcomes presented in Scheme 2. The original study located seven transition states for R ¼ Ph and eight for R ¼ i Pr.Two transition states differing in the conformation of the cyclohexene ring were located for each stereoisomer, apart from (S,R) anti R ¼ Ph, where only one conformation was located.In the present study, we considered four conformations for each stereochemical outcome, resulting in 16 transition states for both R ¼ Ph and R ¼ i Pr.This expanded conformational analysis includes "chair" or "twist-boat" conformations for the cyclohexene ring and puckering of the proline ring away from or towards the proton transfer (Scheme 3).
A repeat of the Houk-List calculations for R ¼ Ph is presented in Table 1; calculated at the same theoretical level as previously employed 2 but including the additional conformations described above.As well as the relative DFT energy (DE), populations are shown based on the free energy (DG 298 ), the activation enthalpy (DH 298 ) and free energy of solvation; each of which were considered in the original study.Mindful that it is not practical to replicate the study by using the same versions of the computer codes that were available ten years earlier in 2003, a congruence of 0.01 kcal mol À1 between the energy differences (DDH 298 ) previously reported and those calculated using the modern versions of the codes 25 assures us that the basic methodology is highly reproducible.
We nd that the three additional conformations for the observed (S,R) anti product (conformations 1, 2 and 4) are all lower in free energy than the single conformation previously reported for that diastereomer (conformation 3).
The original study predicted high enantioselectivity, but low diastereoselectivity, for the reaction with benzaldehyde (based on DH 298 ).This was in good agreement with the $1 : 1 diastereomeric ratio that was experimentally determined by List.However, inclusion of these new low energy transition states changes the predicted diastereoselectivity of the reaction so that it is highly selective towards the (S,R) anti product and is no longer in good agreement with experiment.

Basis set superposition error
Having addressed the apparently incomplete exploration of conformational space, the next priority was to explore the effects of incorporating the changes in methodologies noted in the introduction.A recent article argued the importance of accounting for BSSE (basis set superposition error), as well as including dispersion corrections, for thermochemical calculations using the B3LYP/6-31G(d) method (the same level of theory used in the Houk-List study). 26Compared to metal or protein based catalysts, typical organocatalytic systems consist of relatively few light main group elements.Therefore, the effects of BSSE are oen assumed to cancel or be of a magnitude small enough to routinely ignore when calculating energies of organocatalytic transition states with similar connectivity.To Scheme 3 The four considered conformational possibilities for the asymmetric aldol condensation.test this assumption, we have evaluated the difference in BSSE between the anti and syn transition state geometries using the Boys-Bernadi counterpoise method for four basis sets: the original Houk-List 6-31G(d), the commonly used rather larger 6-311G(d,p), the triple-z-quality TZVP and the quadruple-zquality QZVP.The calculations were performed in Orca (V.2.9.1) 27 using benzaldehyde and the cyclohexanone-proline enamine as fragments, with overall geometries fully optimised at each respective level of theory.These results (Table 2) clearly show that errors due to BSSE differences between stereoisomers can be of the order of 1 kcal mol À1 at the original 6-31G(d) level selected in 2003.These relative errors are still relatively large ($0.8 kcal mol À1 ) for the larger 6-311G(d,p) basis and only become insignicant for the TZVP basis ($0.2 kcal mol À1 ).For the best QZVP basis set, even the absolute BSSE error has become insignicant ($0.2 kcal mol À1 ) although the general use of this basis is precluded because of computational cost.From these results, we suggest that the relatively compact TZVP triple-z-quality basis is the minimum appropriate for delity in stereochemical prediction, resulting in optimisations fast enough to allow exploration of multiple conformations when needed.

The effects of the dispersion correction
To examine any effects a dispersion correction might have on energy and structure, the 16 transition states from the expanded conformational analysis (Scheme 3) were re-located for the benzaldehyde (R ¼ Ph) and isobutyraldehyde (R ¼ i Pr) systems.The geometries were optimised with the larger triple-z-quality TZVP basis set with inclusion of the CPCM solvation model, with and without DFT + D3 correction 17 as applied to the B3LYP hybrid functional.Thus B3LYP+D3/TZVP/SCRF-CPCM¼DMSO is now dened as our base standard.
The results of these calculations are summarized in Fig. 1 and Tables 3 (R ¼ Ph) and 4 (R ¼ i Pr).The most obvious geometrical effect is on non-bonded distances within the transition states, which can alter by up to 0.33 Å (Fig. 1).Smaller more subtle effects on the geometry of the reaction centre (Scheme 1) are induced.The creation of two stereogenic centres via the formation of a new C-C bond is accompanied by a key proton transfer from the carboxyl group to the oxygen of the carbonyl substrate.At the new base-level of theory, inclusion of the D3-correction for the transition state geometries for R ¼ Ph and R ¼ i Pr (Fig. 2) changes the optimised forming C-C length from 2.251 to 2.273 Å, the proton transfer geometry from 1.242/ 1.166 to 1.247/1.162Å, and the planarity of the enamine from 174.9/174.8 to 177.2/176.1 (R ¼ Ph).The corresponding changes for R ¼ i Pr are 2.159 to 2.115 for the forming C-C bond, 1.154/1.262to 1.151/1.269Å for the proton transfer, and 175.5/ 173.7 to 178.5/176.1 for the planarity of the enamine.Larger variation is induced by a change of functional (to uB97XD 18 ); the latter values are similar to that obtained using MP2/6-31G(d,p)/SCRF¼DMSO.We note that this smaller 6-31G(d,p) basis set is in fact the best basis for which a practical MP2 calculation can be run on the largest resource available to us, a 64-processor 88 Gbyte memory system.An intrinsic reaction coordinate for R ¼ Ph and R ¼ i Pr computed at B3LYP+D3/TZVP/SCRF¼DMSO (Fig. 3) shows C-C bond formation is synchronous in both cases with proton transfer.There is no sign of what Cremer 29 has referred to as a hidden intermediate, the frustrated formation of a minimum sandwiched between C-C bond formation and proton transfer.
Energies resulting from the more complete conformational exploration are set out in Table 3 for R ¼ Ph and in Table 4 for R ¼ i Pr.These results include the effects of the D3-dispersion correction on the computed relative energies, the total superposition error for this correction ranging from À36 to À39 kcal mol À1 .Both the magnitude, and in particular the variation ($3 kcal mol À1 ), may come as a surprise, since the accepted wisdom tends to the belief that such corrections are only signicant for much larger systems.
For both the R ¼ i Pr and R ¼ Ph systems, inclusion of D3 corrections results in an increase in the planarity of the enamine.Examining the enamine geometries for R ¼ Ph, two clear trends appear; the enamine nitrogen is signicantly (by 5 to 10 ) less planar in the enantiomer, (R,R) and (R,S), conformations than compared to the (S,R) and (S,S) conformations.In addition, enamine planarity is loosely correlated to relative stability; the lowest energy structure has the most planar enamine and vice versa.The lowest energy structure, (S,R) conformation 2, has an almost totally planar enamine (177.2 ) whereas the highest energy structure, (R,S) conformation 3, is highly pyramidalised (167.0 ).
The results for R ¼ Ph fully conrm the earlier conclusion that the (S,R) stereoisomer is the dominant species formed; the nearest alternative stereoisomer is conformation 2 of the (S,S) form, which emerges as 4.59 kcal mol À1 higher in free energy using B3LYP/TZVP/SCRF¼DMSO without dispersion correction, but the difference is reduced to 2.97 kcal mol À1 at this basis if the D3 correction is included in the procedure.Overall, we see that inclusion of D3 corrections for this system tends to decrease the predicted enantioselectivity for this model, down to 98.6% ee from effectively 100%, without dispersion.
We also recalculated the transition states for the most important pair (R ¼ Ph, conformation 2, (S,R), (S,S)) at the very large QZVP basis, resulting in an increase in the number of basis functions from 556 to 1944.The time taken for the DFT calculation scales as $N 3 -N 4 (N ¼ number of basis functions), an increase of $100 fold, making the use of such a large basis an impractical approach for general mechanistic exploration.DDG ‡ 298 increases slightly from 2.97 to 3.27 kcal mol À1 (Table 3), a result that might be assumed to be close to the complete basis  set (CBS) limit.This result is nevertheless still at odds with the reported apparent experimental diasteromeric ratio of between 1 : 1 and 4 : 1 (DDG ‡ 298 < 0.8 kcal mol À1 ) but as we note in the introduction, the experimental value may be sensitive to both water content and temperature.Similar conclusions can be drawn for R ¼ i Pr (Table 4).
The maximum difference in energy between the lowest energy conformer of the (S,R) form and the highest energy conformer of the least stable diastereomer (R,S) is 7.88 kcal mol À1 (B3LYP+D3/TZVP), 7.72 kcal mol À1 (uB97XD/TZVP, which includes an implicit D2-dispersion correction) and 8.11 kcal mol À1 (MP2/6-31G(d,p); it proved impractical to use the larger TZVP basis for this calculation), a result which is gratifyingly not overly sensitive to the method used.

Non-covalent interactions
In general terms, stereoselectivity is controlled by a combination of stereoelectronically-induced bond orientations in the transition state, but inuenced by non-bonded weak interactions within the framework.Although the balance of these latter interactions can be difficult to compute from total energies alone, it has been recently shown 30 that the reduced   density gradient (RDG) can be used to reveal the interactions that are present in the system (see ref. 28 for more details on the method).This index enables the identication and characterization of weak interactions of various strengths as chemically intuitive RDG isosurfaces that reveal both stabilizing (hydrogen-bonding interactions in blue, van der Waals interactions in green) and destabilizing interactions (steric clashes in red, and weaker ones to yellow).This approach can be much more effective at revealing the non-bonded interactions present and also provide a more realistic analysis than pure topological indices such as QTAIM. 31Thus it represents a semi-quantitative and highly visual manner in which to analyse what is oen merely described in a non-quantitative (and oen intuitive) manner merely as transition state steric clashes and hydrogen bond/ electrostatic attractions.
All NCI gures can be found in the interactive versions of Tables 2 and 3.In order to highlight certain features, three representative examples from the R ¼ Ph reaction have been chosen.Fig. 4 dissects the diastereoisomer with the highest energy (ent-anti), the diastereoisomer with the lowest energy (anti) and the diastereoisomer with the highest dispersion correction (syn).The cutoff (r ¼ 0.1 a.u.) has been chosen to isolate the purely non covalent interactions along with the C-C formation.At rst glance, both the anti and the syn conformers show a greater RDG surface, which conrms the role of noncovalent interactions in stabilizing the diastereoisomeric transition states.For a more detailed analysis, the most relevant interactions have been highlighted in Fig. 4. The deep blue feature corresponds in all three cases to the forming C-C bond.It corresponds to a mid-range interaction, in-between covalent and non-covalent.Along with the C-C formation, two purely non-covalent regions appear: 1.The region of around the heteroatoms (C]O/N) shows stabilizing features in all the conformers.This feature is most important in the anti and syn conformers, whereas it is much weaker in the ent-anti and ent-syn ones.This 3D view coincides with previous approaches, which locate the relevance in the NCH d+ /O dÀ interaction.However, along with the electrostatic interactions, green dispersive interactions appear elongating the NCI feature (see for example Fig. 4b) which highlight the importance of the planarity of this region (highlighted by the distances in Table 3) and which cannot be merely explained by electrostatics.
2. An extra region appears in the syn conformer (Fig. 3c), a green surface between the proline and the R ¼ Ph group.This interaction can be identied as tilted T-shape interaction or as a p-facial hydrogen bond (a gure for such interactions is provided in the ESI † for reference).It is important to note that this new interaction, which stabilizes the syn conformers, had not been identied before by mere geometric inspection.However, its presence enables us to explain the fact that the syn conformers are the ones with the greatest dispersion correction (Table 3).
Thus, a combination of NCH d+ /O dÀ electrostatics and dispersion (either in the NCH d+ /O dÀ or T-shape/p-facial Hbond in the ring region) determine the outcome of the reaction, with only anti and syn as observable diastereoisomers (see the ESI † for a more detailed analysis).It should be noted that this balance of electrostatic and dispersive interactions highlights, once again, the necessity to include dispersion effects in the calculations, else the correct energetics and geometries would not be obtained. 32,33

Kinetic isotope effects
The utility of having a computed force constant model for the transition state is that it can yield kinetic isotope effects for the reaction. 34For example, Meyer, Houk, and co-workers experimentally and computationally investigated 13 C KIEs for the Hajos-Parish-Eder-Sauer-Wiechert reaction, suggesting that the rate determining step of the proline-mediated intramolecular aldol reaction occurs prior to C-C bond formation. 35n contrast, for the proline-mediated intermolecular aldol reaction between acetone and o-chlorobenzaldehyde, Armstrong, Blackmond, and co-workers reported a normal isotope effect of k H /k D ¼ 1.93-2.26when using acetone-d 6 . 36Previous kinetic investigations of the reaction had led the researchers to  3).The gradient isosurfaces (s ¼ 0.5 a.u.) are colored on a BGR scale according to the sign (l 2 )r over the range À0.03 to 0.03 a.u.
expect an inverse isotope effect.The observation of a small, normal isotope effect implied that the nal KIE was the result of a balance between normal and inverse isotope effects resulting respectively from synchronous proton transfer and C-C bond formation involving a change of C-hybridisation from sp 2 .This implies that such isotope effects are indeed a sensitive measure of how realistic a computed transition state is.Computational modelling at the original Houk-List level of B3LYP/6-31G(d)/ SCRF¼DMSO level of theory had predicted a KIE of 1.97, thus offering further support for this hypothesis.This evidence also allowed the reasonable inference that nucleophilic addition of the enamine to the electrophile was indeed the rate determining step of the catalytic cycle by kinetic investigations, a process aided by a more or less synchronous proton transfer to the carbonyl oxygen from the carboxyl proton.
We felt it important to establish that the various corrections described above retain this behaviour.We report calculated KIEs (Table 5) for the proline-mediated intermolecular aldol reaction between cyclohexanone and benzaldehyde and also the acetone and m-chloro-benzaldehyde system for which previously measured values had been reported. 36The calculations have been performed using the lowest energy transition states, as established in the previous section.The predicted isotope effects again reect the synchronous nature of the reaction revealed by the IRC for R ¼ Ph, exhibiting normal deuterium and carbon effects and a small inverse oxygen effect (R ¼ Ph).The calculated isotope effects for the enamine formed from proline and acetone reacting with m-chloro-benzaldehyde in acetone-d 6 as solvent are more complex, since deuterium is incorporated into three distinct locations.Thus, the observed effect is a composite of a primary hydrogen KIE originating from the proton transfer and an inverse one originating from the isotope attached at the carbon forming the C-C bond.The agreement between experiment and theory is if anything even more congruent.

Passive explicit solvation
Patil and Sunoj have computationally investigated the role of explicit solvent molecules in the modeling of the prolinemediated conjugate addition reaction in polar protic solvents, at the mPW1PW91/6-31G(d) and B3LYP/6-31G(d) levels of theory. 10The researchers proposed, that only via the explicit inclusion of two methanol molecules in a cooperative hydrogenbonding network between the carboxylic acid proton and the oxygen atoms of a nitro group, could the correct stereochemical outcome of the reaction be predicted.This study however is based on the use of total calculated energies for transition states, rather than the more correct thermal and entropy-corrected relative free energies for the reaction prole.In addition to molecules of solvent, additives such as triethylamine 9 and DBU 37 have been shown to play an important role in determining the selectivity of proline-mediated reactions through their explicit inclusion in transition state calculations.
Here, we investigate not the role of additive but that of water in the Houk-List transition state.The rst step of the catalytic cycle, which involves the formation of an enamine from the carbonyl compound and proline, liberates one water molecule.Potentially, this water could then participate in the subsequent reaction.Here, a single water molecule is introduced into the Houk-List model (R ¼ Ph) as a passive solvent for one of the three oxygens via a hydrogen bond, augmenting the continuum solvation already applied.The effect this has on the relative energies of the anti and syn diastereomeric transition states is shown in Table 6.Addition of one explicit water molecule to the model was computed to induce only a small change in the computed free energy difference between (S,R) and (S,S) stereoselectivity, changing it from 2.97 to 3.15 kcal mol À1 .Models which include more water molecules were not studied, not least because the stochastic complexity increases rapidly.

Proton relay mechanisms
Beyond a passive role, it is possible that a single molecule of water could participate directly in the Houk-List mechanism, acting as an active proton transfer relay between the proline carboxylic acid group and the incoming aldehyde (R ¼ Ph) (Table 7).This results in the expansion of the ring size of the cyclic model by two atoms, thus allowing some stereochemical models previously excluded on the basis of strain to be included.Proton relay mechanisms in organocatalysis have precedent.One of the initial mechanistic proposals for the intermolecular aldol reaction, the HPESW reaction, involved a second molecule of proline acting as a proton transfer agent during C-C bond formation, based on the apparent observation of a non-linear effect. 38 Patil and Sunoj have also investigated the role of a proton-relay mechanism in hemiacetal-, iminium, and enamine formation for a model system mimicking the rst step in the catalytic cycle of proline catalysis, at the mPW1PW91/6-31G(d) level of theory. 40It was found that protic additives, such as methanol, allow for signicantly lower activation energies of these pathways.The reported energies did not include the effects of entropy, which would be expected to result in signicantly higher free energies of activation compared to those based on total energies alone.Here we present computed free energy  OH and the three C-H groups on the cyclohexanone-derived enamine.e Reactant DOI: pn7.f Values for proton relay mechanism involving an additional water molecule.g d 6 on OH and the ve C-H groups on the acetone-derived enamine.h Reactant DOI: pn8.i Reactant DOI: ppb, transition state DOI: pn9 for the system reported in ref. 34.differences for the proton relay mechanism, precisely isomeric with the passive mechanism described in the previous section.
Four features are noteworthy.1.The (S,S) diastereomer is now predicted as slightly lower in free energy than the observed (S,R) products.
2. The lowest free-energy for a proton-relay transition state is nevertheless 9.1 kcal mol À1 higher than that of the passively solvated isomer.Because proton transfer reactions in particular can depend on the nature of the density functional used, we also evaluated this energy difference at the uB97XD/TZVP level.The free energy difference of 6.4 kcal mol À1 is smaller than using B3LYP+D3, but still clearly excludes the involvement of water in the proton-relay.This model would also predict the incorrect stereochemical outcome; it may not always be so, of course, for other mechanistic variations.
3. The intrinsic reaction coordinate (B3LYP+D3/TZVP/ SCRF¼DMSO, Fig. 5) indicates that the presence of the proton relay delays C-C bond formation; its length at the transition state is now a much shorter 1.920 Å (compared to 2.273 Å without such intervention).Only aer this transition state is passed does the proton transfer commence.At a value of the IRC indicated with an arrow (Fig. 5b), C-C bond formation is largely complete (1.676 Å) and a so-called hidden intermediate is revealed in which a strong symmetric hydrogen bond between the proton relay and the carbonyl group manifests.The signicance of detecting such intermediates is that the electronic inuences at this geometry could be potentially targeted in order to induce the system to form a real intermediate; the relative timing of e.g.C-C bond formation and proton-relays can therefore be seen as a function of structure and substituents.This feature of a potential energy surface is relatively insensitive to e.g. the density functional procedure employed.Shown in Fig. 5c is the same reaction charted using the uB97XD functional; the hidden intermediate occurs at the position in the IRC, but its prole is stronger (the gradient norm approaches zero more closely than with the B3LYP functional).
4. The kinetic isotope effects computed for this mechanism (Table 5) are very different from the one involving no intervention by water.
We therefore consider it unlikely that an explicit water molecule is actively involved in the rate-determining step for this reaction.

Hajos-Parrish mechanism
This mechanistic variation proposed by Hajos and Parrish 41 for the intramolecular aldol reaction involves an initial proton  transfer from the carboxyl group to the nitrogen of the enamine to form a zwitterion.This species can then react to form a C-C bond directly via a 6-membered ring transition state, accompanied by proton transfer from N to the carbonyl oxygen (Fig. 6a).The computed reaction IRC again reveals an entirely synchronous process, with C-C bond formation coincident with proton transfer.The free energy however (Table 3) is 20.5 kcal mol À1 higher than the isomeric Houk-List model.
Interestingly, augmentation of the Hajos and Parrish variation with a proton-relay via a water molecule results in the reaction proceeding via the Seebach-Eschenmoser model. 42ere, the carboxylic acid is not acting as a directing group, as in the Houk-List model, but rather participates directly in the addition step as an enamine carboxylate, which is the proposed key reactive intermediate.In this mechanism, approach of the electrophile occurs on the opposite face of the pyrrolidine ring to the exocyclic carboxylate group, the free carboxylate then aids   carbon-carbon bond formation via E2 elimination to the syn enamine, forming the exo-oxazolidinone.
E2 elimination and carbon-carbon bond formation are almost synchronous and these transition states are followed by two hidden intermediates (Fig. 6d and e), transfer of the proton from water to the newly formed alkoxide, accompanied by the nal proton transfer from the nitrogen to the hydroxyl.However, this mechanism increases the overall activation energy even further (40.4kcal mol À1 ).
As with the proton-relay variation, we can conclude that the Hajos-Parrish variation, via the Houk-List or Seebach-Eschenmoser model, is also not viable for this reaction.

Substituted variations
For R ¼ Ph, there is a 2.27 kcal mol À1 difference in D3 dispersion stabilisation between the lowest energy (S,R) and (S,S) transition states (Table 3), the latter being the greater.This is reected in a signicant reduction in DDG ‡ 298 between (S,R) and (S,S) from +4.59 kcal mol À1 (B3YLP/TZVP/SCRF ¼ DMSO) to +2.97 kcal mol À1 (B3LYP-D3/TZP/SCRF¼DMSO).Even a simple visual inspection of the NCI surfaces for these transition states (Table 3) reveals two regions of interaction that are present in the (S,S) transition state, where the phenyl ring lies across the cyclohexene ring, but which are absent from the (S,R) transition state, where the phenyl ring protrudes into space (Fig. 7).The rst is a p-facial interaction between the phenyl ring and an axial proton from the "chair" conformation of the cyclohexene ring, coloured as green (weakly stabilising).The second is a hydrogen bonding interaction between the ortho-proton of the phenyl ring and the oxygen of the carboxylic acid, coloured as light blue (moderately stabilising).
These interactions are likely to contribute to the increased D3-dispersion stabilisation associated with the (S,S) transition state, as D3 correction has been shown to improve the description of weak hydrogen bonds. 43Three models (Fig. 8) were devised to take advantage of these observations.
(a) 4-Piperidinone-benzaldehyde-proline.This model aims to maximise the p-facial non-covalent interaction, observed in the syn transition state for R ¼ Ph, between the proton at the 4-position of the cyclohexanone-derived enamine and the face of the aryl ring of the aldehyde.
(b) Cyclohexanone-pyrrole-2-carboxaldehyde-proline.This model aims to maximise the weak hydrogen bonding interaction, observed in the syn transition state for R ¼ Ph, between the ortho-proton of the phenyl ring and the oxygen of the carboxylic acid of the bound catalyst.
(c) Acetone-pyrrole-2-carboxaldehyde-proline.This model aims to maximise the hydrogen bonding interaction, observed in the syn transition state for R ¼ Ph, between the ortho-proton of the phenyl ring and the oxygen of the carboxylic acid of the bound catalyst and uses acetone, instead of cyclohexanone, as a less sterically demanding environment.
The striking feature of all three variations is the difference in the predicted outcome that addition of a dispersion correction has (Table 8).By model (c), with reduced steric congestion, the predicted ratio of the R/S diastereomers has inverted, but only when the dispersion correction is included.This result provides Fig. 7 NCI surface of the B3LYP+D3/TZVP/SCRF¼DMSO computed transition state for (a) (S,R) R ¼ Ph (conformation 2) (Table 3) and (b) (S,S) R ¼ Ph (conformation 2) (Table 3).a compelling reason for including such corrections.It also suggests that an inspection of the computed NCI surfaces can give rapid visual clues for designing variations on the basic system.We suggest here that such procedures should be routinely included in protocols for mechanistic exploration.

Computational procedures
The Gaussian 09 program, revision D.01 was used for all calculations excepting the BSSE evaluation, where ORCA 2.9.1 was employed, and thermal corrections were applied based on computed normal vibrational frequencies.Solvation models were using the polarizable conductor calculation model (CPCM) using smoothed reaction cavities, the geometries fully optimised, and vibrational frequencies at this geometry used to correct for thermal energies and entropy.Kinetic isotope effects were computed from the difference in thermally corrected free energies obtained from the calculated Hessian (force constant) matrix and appropriate atomic masses for the isotopes.Intrinsic reaction coordinate (IRC) evaluations were obtained using the following keyword in the Gaussian program: irc-(calcfc,recalc¼10,maxpoints¼200,maxcycle¼40,tight,cartesian, stepsize¼7,lqa) (the outcome of such calculations can be very sensitive to the imposed parameters).All calculations were routinely archived in both of two digital repositories (DSpace-SPECTRa and Figshare) and assigned a digital-object-identier (DOI).Some DOIs were contracted to a shorter form using the resource http://shortDOI.org/NCI (non-covalent-interactions) were computed using the methodology previously described. 30,31or NCI gures in tables, a Gaussian cube of the computed electron density at ne resolution was computed and converted to an NCI analysis using the resource at DOI: n5b, which is based on using the Jmol applet.The output of this process comprises a coordinate le (.xyz) and a compressed isosurface (.jvxl), which are visualised using Jmol or JSmol embedded in the interactive version of the table.For Fig. 4, data were obtained with the NCIPLOT program. 44A density cutoff of r ¼ 0.1 a.u. was applied and the pictures were created for an isosurface value of s ¼ 0.5 and colored in the [À0.03,0.03]a.u.sign (l 2 )r range.Integrations described in the ESI † were carried out with an in-house program for cubic grids with a 0.1 a.u.increments along each side and re-scaled with respect to the total cube volume.Each data table or data gure is assigned a DOI in the Figshare repository (see footnotes), all retrievable as e.g.qd8.A 3D printable full-colour model of the lowest energy transition state for R ¼ Ph can be obtained at http:// shpws.me/pstF.

Conclusions
Computational modelling of the stereochemical outcome of catalysed solution-phase reactions has a stringent criterion for accuracy, dened in a sense by the equation DDG ‡ ¼ ÀRT ln K.The predicted relative transition state energy, DDG ‡ , has to be sufficiently accurate to result in reliable values for K, the ratio of the concentrations of two stereoisomers.In practice, this means achieving a computed accuracy for DDG ‡ of <1 kcal mol À1 or less for values of DDG ‡ of between 2 and 5 kcal mol À1 .The lower limit here is typical of moderately stereoselective reactions and the latter is towards the top end of the most highly stereoselective reactions known.To do so will require a reasonable exploration of conformational space, and mechanistic variations such as implicit passive solvation, and active proton-relay interventions, and this all to be achieved using methodology which can be implemented with reasonable timeturnover on modern computational resources.We have tested an updated methodology to that used in establishing the original Houk-List model for the intermolecular aldol condensation catalysed by proline.A triple-z quality basis set is required to remove signicant basis set superposition errors associated with the smaller 6-31G(d) basis set, coupled with a dispersion energy correction to the B3LYP functional.Alternatively, more modern functionals which already include such terms can be used.The solvation treatment now possible is both more accurate and more self-consistent, since thermal corrections to the free energy are evaluated from frequencies determined within the solvation model.It is possible to explore more conformational space for the basic reaction, and to augment the mechanism with either passive or explicit intervention of small molecules such as water.
Whilst routinely achieving accuracies in DDG ‡ of <1 kcal mol À1 is perhaps not entirely achievable at present, it surely will be in just a few years' time, as the density functionals and other methodologies continue to improve.Here progress may be hindered by the relative lack of accurate experimental data determined under standard kinetic conditions.Achieving a stochastic exploration of the mechanism using molecular dynamics is perhaps rather further off for these relatively complex mechanisms.

Fig. 4
Fig. 4 NCI analysis of several conformers of the computed transition state: (a) the lowest energy isomer, (b) the highest energy isomer (c) the highest D3 correction isomer (Table3).The gradient isosurfaces (s ¼ 0.5 a.u.) are colored on a BGR scale according to the sign (l 2 )r over the range À0.03 to 0.03 a.u.

Fig. 5
Fig. 5 The computed intrinsic reaction coordinate for the proton-relay mechanism for R ¼ Ph @B3LYP+D3/TZVP/SCRF¼DMSO, showing (a) the relative energy and (b) the gradient norm.The red arrow indicates the location of the hidden intermediate.The calculation is archived at DOI: n44.(c) The IRC recomputed at @uB97XD/TZVP/SCRF¼DMSO showing the gradient norm and the more prominent hidden intermediate.The calculation is archived at DOI: n43.

Fig. 6
Fig. 6 Hajos-Parrish mechanism, R ¼ Ph (see Table 2) showing (a) the computed B3LYP+D3/TZVP/SCRF¼DMSO geometry, (b) the IRC energy profile and (c) the IRC gradient norm, DOI: n5d (d) the IRC energy profile for the Hajos-Parrish mechanism, R ¼ Ph with inclusion of one water molecule acting as a proton relay and (e) the IRC gradient norm for the proton-relay mechanism, with hidden intermediates indicated with an arrow, DOI: n5n.
Fig. 6 Hajos-Parrish mechanism, R ¼ Ph (see Table 2) showing (a) the computed B3LYP+D3/TZVP/SCRF¼DMSO geometry, (b) the IRC energy profile and (c) the IRC gradient norm, DOI: n5d (d) the IRC energy profile for the Hajos-Parrish mechanism, R ¼ Ph with inclusion of one water molecule acting as a proton relay and (e) the IRC gradient norm for the proton-relay mechanism, with hidden intermediates indicated with an arrow, DOI: n5n.

Table 2
Basis set superposition errors calculated for Houk-List transition state, R a B ¼ benzaldehyde, CP ¼ cyclohexanone-proline derived enamine.Energies in Hartree.bSuperposition error, E (SE) d kcal mol À1.An interactive version of this table is archived at DOI: qd7.

Table 4
Calculated transition state properties for R ¼ i Pr (Scheme 2) 17kcal mol À1 .bPersistent(persistent-object-identiers) for digital repository entry.cGrimme'sD3 dispersion correction,17in kcal mol À1 .An interactive version of this table is archived at DOI: qcd.

Table 5
Calculated a kinetic isotope effects at 298 B3LYP+D3/TZVP/SCRF¼DMSO, derived from the thermally corrected free energy DG 298 based on harmonic uncorrected frequencies, with appropriate isotope masses.b Reactant DOI: pn6.c d 1 on OH. d d 4 on a

Table 6
Passive explicit solvation of the Houk-List transition state, R ¼ Ph a Model comprising one additional water molecule, hydrogen bonding in various locations.b kcal mol À1 .c Persistent identier for digital repository entry.d As dened in Scheme 3.An interactive version of this table is archived at DOI: qcs. a

Table 7
Proton relay mechanism, R ¼ Ph a

Table 8
Houk-List Transition state analogues 4-Piperidinone-benzaldehyde-proline a Alternative pyramidalisation of the secondary amine denoted by prime ( 0 ).b kcal mol À1 .c Persistent identier for digital repository entry.An interactive version of this table is archived at DOI: qc4.