Fernanda Duarte, Beat Anton Amrein and Shina Caroline Lynn Kamerlin*
Uppsala University, Science for Life Laboratory (SciLifeLab), Cell and Molecular Biology, Uppsala, Sweden. E-mail: fernanda.duarte@icm.uu.se; beat.amrein@icm.uu.se; kamerlin@icm.uu.se
First published on 2nd May 2013
In recent years, it has become increasingly clear that promiscuity plays a key role in the evolution of new enzyme function. This finding has helped to elucidate fundamental aspects of molecular evolution. While there has been extensive experimental work on enzyme promiscuity, computational modeling of the chemical details of such promiscuity has traditionally fallen behind the advances in experimental studies, not least due to the nearly prohibitive computational cost involved in examining multiple substrates with multiple potential mechanisms and binding modes in atomic detail with a reasonable degree of accuracy. However, recent advances in both computational methodologies and power have allowed us to reach a stage in the field where we can start to overcome this problem, and molecular simulations can now provide accurate and efficient descriptions of complex biological systems with substantially less computational cost. This has led to significant advances in our understanding of enzyme function and evolution in a broader sense. Here, we will discuss currently available computational approaches that can allow us to probe the underlying molecular basis for enzyme specificity and selectivity, discussing the inherent strengths and weaknesses of each approach. As a case study, we will discuss recent computational work on different members of the alkaline phosphatase superfamily (AP) using a range of different approaches, showing the complementary insights they have provided. We have selected this particular superfamily, as it poses a number of significant challenges for theory, ranging from the complexity of the actual reaction mechanisms involved to the reliable modeling of the catalytic metal centers, as well as the very large system sizes. We will demonstrate that, through current advances in methodologies, computational tools can provide significant insight into the molecular basis for catalytic promiscuity, and, therefore, in turn, the mechanisms of protein functional evolution.
In 1976, Jensen7 and later O'Brien and Herschlag8 posited that enzyme promiscuity, i.e. the ability of many enzymes to catalyze the turnover of multiple substrates, plays a key role in the evolution of new function. The past two and a half decades have seen substantial progress in both experimental and theoretical studies6,8–26 that aim to rationalize the origin of such promiscuity, as well as illustrate it's applicability in enzyme design. However, addressing the precise origins of enzyme multifunctionality (and therefore by extension it's role in protein evolution) is far from trivial. This is due to the sheer complexity of the problem, which spans from the need to be able to, on the one hand, not just understand the topology of relevant fitness landscapes27,28 and how this would be perturbed by mutations, but also understand the precise evolutionary role of, for instance, protein–protein interactions29 and protein conformational diversity,30,31 as well as the fine details of the chemical step in enzyme catalysis (which is a topic of significant debate, as can be seen from the discussion in e.g., ref. 32 and 33 and references cited therein).
The advent of techniques such as error-prone PCR34 has played an important role in laboratory evolution, allowing protein engineers to artificially mimic the process of natural Darwinian evolution in vitro, in order to iteratively refine proteins for desired properties35 such as a specific function or better thermostability. Such approaches also provide valuable insight into how actual proteins evolve.36 That is, through artificially mimicking the process of natural evolution, it is possible to better understand the constraints that determine and limit the evolution of function, as well as constructing putative evolutionary trajectories between modern and ancestral or progenitor-like enzymes (see discussion in ref. 36). Similarly, there have been impressive advances using bioinformatics and machine-learning based approaches in order to predict promiscuous activities,37,38 reconstruct protein evolutionary trajectories,28,39 and resurrect ancestral proteins.40,41 However, computationally addressing this problem at the chemical level poses a significant challenge, due to the tremendous computational cost involved in examining not just native but also promiscuous activities involving multiple substrates with many potential binding modes (that can change upon mutations), as well as the large-scale effect of mutations. As a result of these combined advances in both experimental and theoretical approaches, there has been an explosion of interest in studies of catalytic promiscuity in the literature (Fig. 1).
![]() | ||
Fig. 1 Illustrating the exploding popularity of studies on catalytic promiscuity in the literature. This plot highlights the number of citations to an article with the words “moonlighting” or “promiscuity” in the title, in the period spanning the years 1976–2012. Citation data obtained from Web of Knowledge (http://www.isiknowledge.com). |
In the present perspective, we will expand on this idea, and outline the fact that computational power has, in fact, reached a stage where it is finally possible to examine enzymatic catalytic activity for multiple substrates and potential mechanisms, as well as the effect of large numbers of mutations on each of these substrates and mechanisms at the atomic level. This will finally allow us to understand the precise molecular basis for observed multi-functionality in catalytically promiscuous enzymes, and, through the insights this provides, aid us in the artificial engineering of new enzyme functionality. Such computational studies can then also be extended to studying and predicting evolutionary trajectories, as well as rationalizing and guiding laboratory evolution studies. If this is done in a systematic way through an enzyme superfamily, it will allow for the creation of a “roadmap” for the structural and electrostatic contributions to functional evolution within that superfamily.
In the present work, we will begin by outlining the role of catalytic promiscuity in protein evolution. Following from this, we will provide a brief overview of recent advances in relevant computational approaches, comparing the inherent strengths and weaknesses of each of them. Specifically, we will demonstrate that, while individual approaches may have their own specific traps and pitfalls, when selected carefully and in combination, computational tools can be extremely powerful in rationalizing chemical effects in complex biological systems. To illustrate this point, we will present as a case study computational work on different members of the alkaline phosphatase (AP) superfamily by both ourselves and other workers in the field, showing the complementary insights theory can provide, which could not be obtained by experiment alone (although experimental data are critical for providing actual physical observables). The AP superfamily has been a topic of significant research interest in recent years, since its members are not only highly promiscuous, but also, selectivity and specificity patterns within this superfamily are particularly well-defined.14 That is, there is a wealth of both kinetic and structural data available in the literature due to a large body of experimental work on these systems.14,42–60 Finally, to conclude, we will discuss future perspectives in the field, in line with the increasing role of computational approaches in rationalizing protein evolution.
![]() | ||
Fig. 2 Schematic overview of the classification of different kinds of promiscuity, as presented in the main text. |
Obviously, none of the classifications listed above is absolute, and both the manifestations of promiscuity as well as the level at which it occurs are complementary aspects of the same phenomenon. However, we have raised these examples here in order to introduce the reader to the semantic complexity of the field. During the last few decades, a number of detailed reviews have discussed various aspects of the phenomenon of promiscuity, including mechanistic issues,15 evolutionary aspects,11,64 and its role in protein design.10,63,65 For the purposes of the present work, our focus will specifically be on catalytic promiscuity. Here, we will focus on a slightly different aspect of the field, namely recent advances in computational methodologies that can probe the underlying basis for catalytic promiscuity at the atomic level, as well as the important role they can play in understanding protein functional evolution.
As already discussed, catalytic promiscuity has been suggested to play an important role in the evolution of new enzymes through divergent evolution.8 Jensen's original hypothesis7 suggested that primitive enzymes displayed low activities and very broad specificities. Over time, evolutionary pressure caused them to divergently evolve in order to acquire higher specificities and activities (Fig. 3). However, and as is clear from ongoing experimental studies today (e.g.ref. 2, 8, 11–16, 22, 23, 58, 62 and 81–83), some of these enzymes appear to have retained varying levels of the promiscuous activities of their generalist progenitors.15 Therefore, as outlined in Fig. 3, one could use this principle and perform “retroevolution” back towards a generalist progenitor or progenitor-like enzyme, and use this as a trampoline for re-specialization towards new functionality.11 This approach has recently been discussed by Tawfik and coworkers.2,15 Using in vitro evolution they have demonstrated that the evolution of a new function can be driven by mutations that have little effect on the native function, but large effects on the promiscuous functions.15
![]() | ||
Fig. 3 Schematic representation of Jensen's hypothesis for the evolution of enzyme function7 (A). According to this hypothesis primitive enzymes, which displayed low activities and broad specificities (denoted by lowercase a, b, c, d), have, once submitted to evolutionary pressure, divergently evolved in order to acquire higher specificities and (sometimes completely new) activities (denoted by upper case letters, e.g. B, D, E). However, they have retained low levels of their original promiscuous activities. This can in turn be exploited in artificial enzyme design (B). That is, direct switches of specificity, e.g., from A to E are rare. However, in the case of a promiscuous enzyme, one could perform “retroevolution” back towards a generalist enzyme, and use this as a trampoline for re-specialization towards new functionality. This figure is adapted from ref. 15. |
As we will illustrate in this Perspective, computational approaches provide a unique opportunity for reaching a better understanding of the origins of promiscuity. For example, at the molecular level, structure-based methods, docking approaches and mechanistic analysis can be used in order to reach a greater understanding of the features controlling enzyme catalysis and determining specificity patterns, the possible mechanisms involved, and the prediction of suitable starting points for experimental evolution.84,85 At the superfamily level, data analysis86 and sequence-based methods can be used for the study of evolutionary relationships within large protein families.37,87
In the present perspective, we will discuss the recent work of both our group and others in the field to model promiscuity in highly multifunctional enzymes. We will demonstrate that computational power has reached a stage where theory can play a substantial role not only in rationalizing experimental observables, but also in playing an active role in predicting evolutionary trajectories. This, by extension, will also ultimately play an important role in artificial enzyme design.
![]() | ||
Fig. 4 The increasing performance of (super)computers in Flops (Floating-point operations per second) (orange), Flops per core (red), and number of cores (blue) from the 1960s to the present day. Note, that Flops as performance criteria only help to have a reference between different computers, and also, that the here presented supercomputers are only a representative subset for illustration purposes. The data was collected from ref. 88 and from www.top500.org. |
Additionally, describing the surrounding environment using implicit solvent (typically) saves substantial computational time. However, although there are many advantages to such models, several limitations are also present in this approach. For example, the assumption that chemical changes involved in the reaction are confined to a relatively small region of the system can in many cases be an oversimplification, particularly as long-range electrostatic interactions play an important role in enzyme catalysis.103,104 This issue was observed in the (otherwise elegant) study of the catalytic reaction of the Ras-GAP complex105 (to name one example), where, due to incomplete electrostatic (and thus pKa) treatments in a limited enzyme model, an incorrect residue was suggested as a general base in the reaction. We would also like to refer the reader to the discussion about the relative advantages and challenges of cluster models (which allow accurate local energy minimization in a small region), and QM/MM studies, which provide an improved description of the coupling to the protein, but only allow for limited sampling, see e.g.ref. 106 and 107. Furthermore, neither conformational sampling (required in order to obtain meaningful convergent results that are not dependent on the precise starting structure used108) nor entropy effects (which are usually neglected because it is difficult to predict them in the harmonic approximation109) are currently included in this approach. Finally, the choice of reacting subsystem can substantially affect the outcome of the calculations.110,111 Despite these challenges, when used with care and with detailed chemical knowledge of the system under study, cluster models can provide useful insights and detailed information of the fundamental chemistry as recently discussed by Ramos and coworkers.100 Particularly, cluster models provide a fast effective way to perform initial tests of the viability of different mechanistic options.
QM/MM approaches (in their different implementations) have become one of the most popular approaches for the study of enzymatic reaction, as they have the advantage of improving the description of the enzyme environment and its contribution to the catalytic process (compared to QM-only approaches using a limited description of the system of interest). However, QM/MM approaches have also been demonstrated to have several limitations. One of the main limitations of these is the large computational cost required for the repeated evaluation of the energies and forces in the QM region, which, by extension, results in limited configurational sampling during the simulation. This is particularly challenging in cases where the system involves a rugged multidimensional landscape,117 as, without proper conformational sampling, one ends up trapped in local minima and different starting conformations can give completely different results (see also discussion in ref. 108). Important advances to resolve this problem have been achieved by means of specialized approaches, such as using a classical potential as a reference for the QM/MM calculations,118–121 or through other strategies, such as the QM/MM free-energy perturbation (FEP) scheme combined with optimized chain-of-replicas95,113 or QM/MM interpolated correction methodologies.122
Among the wide variety of approaches available to study enzymes, the one that we choose to use in the majority of our work is the empirical valence bond (EVB) approach of Warshel and coworkers.115,123 As the name suggests, this is a QM/MM approach based on valence bond (bond description) rather than molecular orbital (atomic description) theory. Its major advantages are that it is, on the one hand, fast enough to perform the extensive conformational sampling required to obtain convergent free energies, while, at the same time, it carries enough chemical information to be able to describe bond making/breaking processes in a physically meaningful way.115,123 Finally, inherent to the philosophy of the EVB approach is the use of the energy gap reaction coordinate.123,124 The power of this reaction coordinate comes from the fact that, rather than being a geometric coordinate, it is simply the energy gap between different diabatic (valence bond) states involved in the reaction process, and, as such, allows one to take into account the entire multidimensional nature of the relevant process as well as environmental reorganization without the need to apply external restraints.125,126 This choice of reaction coordinate also allows for much faster convergence in free energy calculations, compared to other currently popular approaches.127
In addition to long established approaches such as the EVB, there have been several interesting developments in this area, which we would like to summarize here. For example, transition path sampling128 (which is a Monte-Carlo based rare event sampling approach) has been successfully combined with QM/MM calculations in order to study a range of systems, including human purine nucleotide dephosphorylase129 and chorismate mutase.130 QM/MM calculations can also be combined with energy minimization across approximate reaction coordinates to obtain the potential energy surface, in an “adiabatic mapping” approach, that has been successfully applied to a range of enzymatic systems.131–134 Another alternative that has been successfully used to estimate the free energy profiles of enzymatic reactions19,135–137 is the combination of QM/MM calculations and molecular dynamics simulations, through the application of umbrella sampling and the weighted histogram analysis method (WHAM).138,139 A final recent development we would like to present in order to conclude this section is the combined quantum mechanical/discrete molecular dynamics (QM/DMD) approach of Alexandrova and coworkers.140 This approach has been specifically developed for the study of metalloenzymes, and combined the accuracy of QM approaches with extensive sampling of the surroundings using DMD, which has promise to substantially increase the simulation time available to ab inito dynamics of metalloenzymes.
To conclude this section we will refer to the pure use of classical approaches, such as molecular dynamics, in the study of biological systems. These techniques have been one of the most important computational techniques in the study of complex systems, providing important insight into protein mechanics,141 structural-dynamics of proteins,112,142 and features involved in the binding of substrates,143 to name just a few examples. However, as such approaches describe atoms and bonds in a more simplified way,144 they cannot be used to explore reaction mechanisms, which requires the making and breaking of chemical bonds. As will be seen in the coming sections, thanks to increasing computational power, QM-only and hybrid QM/MM approaches have allowed us to overcome this limitation, investigate the mechanisms of even complex enzyme-catalyzed reactions, and obtain important information about the fundamental chemistry involved in these processes. In addition to this, the use of approaches such as the linear response approximation as well as a novel screening approaches based either on the analysis of electrostatic group contribution or the more rigorous linear response approximation (LRA/β) approach145,146 allows us to identify and assign the specific contribution of individual residues to the chemical step and transition state stabilization.147,148 This, in turn, provides a molecular view of enzyme catalysis that can be used for driving artificial protein evolution and artificial enzyme design.
Tying in with this, the specificity and promiscuity of the individual members of this superfamily is well-defined,14 with members showing not just extensive promiscuity, but also cross-promiscuity, in that the native reaction of one member of this superfamily is often a promiscuous reaction in another (Fig. 5). Therefore, by carefully mapping the structural and electrostatic features linked to selectivity across this superfamily, one can potentially obtain significant insight into the factors dictating differences in functional evolution between superfamily members. The second reason this superfamily is particularly interesting to us as a model system is the inherent challenges in studying the specific reactions involved, which will be discussed in greater detail in Section 4.2.1.
![]() | ||
Fig. 5 Members of the alkaline phosphatase (AP) superfamily have a tendency towards “cross-promiscuity”, where the native substrate for one enzyme is a promiscuous substrate for another. This figure illustrates the native and promiscuous activities of four different members of the alkaline phosphatase superfamily, specifically alkaline phosphatase (AP), arylsulfatases (PS), nucleotide pyrophosphatase/phosphodiesterase (NPP) and a phosphonate monoester hydrolases (PMH). The substrate shown within each circle represents the native substrate for the enzyme, while the colored lines indicate the relevant promiscuous activities. Additionally, PMHs have been shown to also hydrolyse phosphotriesters and sulfonate monoesters, activities not observed in other members of the superfamily. This figure is adapted from ref. 22. |
![]() | ||
Fig. 6 A comparison of the active site architectures of a number of catalytically promiscuous members of the AP superfamily. The upper half illustrates the bimetallic enzymes, (A) alkaline phosphatase (AP) and (B) nucleotide pyrophosphatase/phosphodiesterase (NPP). The lower half illustrates the active sites of (C) Pseudomonas aeruginosa arylsulfatase (AS) and (D) phosphonate monoester hydrolase (PMH). The structures were generated from the PDB files 1ED957 (A), 2GSN160 (B), 1HDH55 (C) and 2VQR161 (D), respectively. |
A highly related member of this superfamily is the nucleotide pyrophosphatase/phosphodiesterase (NPP),47 which preferentially hydrolyzes phosphate diesters. The enzyme has low sequence identity (8%) with AP,47 however it possesses a strongly similar active site. For example, both enzymes contain a bimetallic zinc center, six conserved metal ligands (three aspartic acids and three histidines), and a threonine positioned in a manner analogous to that of a serine residue in AP (see Fig. 6(B)), which makes it difficult to understand the different specificity (primary phosphodiesterase activity and secondary phosphomonoesterase and sulfatase activities) compared to AP (see e.g.ref. 16 as an example of work that aims to address this challenging issue).
An overview of the active site of PAS is presented in Fig. 6(C), for comparison to other members of the superfamily such as AP and NPP. As can be seen from this figure, while there are a number of conserved features in the different active sites, there are also a number of significant differences between them. Most notable of these is the fact that the PAS active site is now mononuclear comprising a single Ca2+ cation rather than a dinuclear transition metal center,161 as well as the presence of the unusual formylglycine nucleophile common to all sulfatases.161,167 That is, a quirk that is common to all sulfatases is the fact that, as a nucleophile, they utilize either a cysteine168 or serine169 that is post-translationally modified to give an aldehyde and then hydrated to give a geminal diol (steps I to II of Fig. 7, which shows an overview of the catalytic mechanism of this enzyme). What is particularly remarkable about this enzyme is the comparatively low discrimination it shows for its different substrates,12,13 which extends to the fact that its promiscuous diesterase activity can almost compete with its native sulfatase activity (for the small model compounds used in the experimental studies).13 The proposed mechanism for the native sulfatase activity of PAS involves the attack of a water molecule on an aldehyde to form the corresponding geminal diol, followed by a nucleophilic attack on the sulfate with concomitant leaving group departure, and the subsequent hemiacetal cleavage to regenerate the geminal diol (Fig. 7).13,161 As illustrated in Fig. 7, an important part of the catalytic mechanism involves the initial deprotonation of the resulting geminal diol (FGly51), two possible candidates have been proposed to act as bases, and on the basis of the crystal structure the nearby metal-coordinated aspartate (Asp317) was proposed.161 More recently, in a revised mechanism, we have proposed that it is one of the histidines that acts as a base in the native reaction (but not in the promiscuous reactions).20,21
![]() | ||
Fig. 7 Our proposed revised mechanisms21 for (A) sulfate monoester hydrolysis and (B) phosphate ester hydrolysis by Pseudomonas aeruginosa arylsulfatase. In the case of the sulfatase activity, we propose that the sulfuryl group transfer proceeds through a histidine-as-base (His115) mechanism to activate the geminal diol that acts as a nucleophile. In the case of the phosphatase activity, we propose instead that the substrate itself can act as a base to deprotonate the nucleophile. Note that while we have only illustrated the case of a phosphate monoester (B), we also obtained similar results to this for phosphate diesters.21 This figure is modified from ref. 21. |
Additionally, other phosphatases from outside the AP superfamily also share many of the active site features found in AP superfamily, suggesting these features may be general for the capacity often observed in enzymes that catalyze phosphoryl transfer.22 Some examples of this include protein phosphatase-1 (PP1),172 a native phosphate monoesterase which also catalyzes phosphonate monoester hydrolysis; glycerophosphodiesterase (GpdQ),173 a diesterase that also catalyzes a series of phosphonate monoesters which are the hydrolysis products of the highly toxic organophosphonate nerve agents, sarin, soman, GF, VX, and rVX;174 and phosphotriesterase (PTE),175 which in addition to its native activity also catalyzes phosphodiesters and phosphonates, including organophosphate pesticides and military nerve agents. Note that, similarly to AP/NPP, each of these enzymes contain two metal ions in their active sites, although again the identity of these metal ions is varied depending on the enzyme, and includes: Zn2+ and Co2+ ions in GpdQ, two Zn2+ ions in PTE (although these metal ions can be replaced with Co2+, Ni2+, Mn2+, or Cd2+ with full retention of catalytic activity175), and two Mn2+ ions in PP-1 (although these ions could also correspond to Fe2+, and/or Co2+).176
Despite the ubiquitous role of metals in proteins, and in particular their potential for the development of new enzymatic functions, many challenges remain in the modeling of such systems, which include among other aspects the lack of parameters (or even protocols) in the current force fields and technical problems associated with the stability of such systems184,185 (although this is a non-trivial problem for quantum-chemical approaches to address as well185,186). Currently, a number of solutions have been suggested to model metal atoms and their interaction with the protein environment. The three most common approaches are the use of a hard sphere model,187 a covalent bond approach188,189 and a dummy-model approach.185,190–193 The simplest approach is the non-bonded or hard sphere model, in which the metal ligand interactions are simply described through electrostatic and van del Waals parameters. This approach has been highly successful for describing alkali and alkaline-earth ions, but can prove to be challenging for systems having either multinuclear centers with closely located metal ions at the active site185 or for the correct treatment of transition metals.187,190 On the other side, covalent or bounded approaches include defined covalent bonds between the metal and ligands, and, while overall useful, such a model will be highly system-dependent and therefore difficult to transfer to other systems.194 Additionally, the use of explicit (or partial) covalent bonds precludes the study of the effects of ligand exchange around the metal.
An alternative to both these sets of problems is the use of the dummy model approach185,190 (Fig. 8). In this approach, the metal center is described by a set of cationic dummy atoms placed around the metal nucleus, encouraging a specific coordination geometry on the metal center (note, however, that as this is a non-bonded model, the dummy model retains the flexibility to change ligand coordination, as was seen for e.g.ref. 195). Models for divalent Mn,190 Mg185 and Zn195,196 have been reported, which show a stable coordination sphere without the need of any additional constraint or restrains. A particular advantage of this model is the fact that, by delocalizing charge away from the metal center, this in turn reduces the repulsion between two metal centers, and makes it easier to maintain correct crystallographic geometries without the need for artificial constraints (see e.g.ref. 185, 189). Additionally, these models have been able to reproduce experimental data for catalytic effects of metal substitution with high accuracy.190 Following from this, Section 5 will discuss recent studies that illustrate the challenges involved in the correct treatment of metal centers.
![]() | ||
Fig. 8 (A) Schematic representation of the dummy model. Shown here is a system with octahedral coordination, however, in principle, the model can be parameterized for any coordination sphere by adjusting the relevant positions and the number of dummy atoms. (B) Representative active site of a phosphonate monoester hydrolase (PDB ID 2VQR55), where the active site metal has been replaced by an octahedral dummy model to represent the catalytic Mn2+ ion. The central atom and the dummy atoms are shown in grey and white, respectively, and the surrounding ligands have been highlighted to show the metal coordination. |
An alternative for modeling of phosphorous/sulfur containing molecules is the use of semi-empirical methods such as the AM1/d114 (AM1 formulation with d-orbital extension) method or the empirical valence bond approach of Warshel and coworkers206 (which is a reactive forcefield and therefore not dependent on the orbital description). The AM1/d implementation has been specially parameterized to a combination of high-level DFT calculations and experimental data, with a particular focus on H, O and P atoms. The main advantage of this implementation is that it simultaneously allows for greater conformational sampling along the reaction coordinate than would be viable using a higher level QM approach, while at the same time providing a better description of the solvation effects and of the central phosphorus atom than that currently typically provided by other conventional semi-empirical approaches. Additionally, the empirical valence bond approach, has been rigorously parameterized to reproduce experimental data, and has provided reliable quantitative results when modeling phosphoryl group transfer reactions, as has been seen for numerous systems (see e.g.ref. 20, 21, 190 and 207–209 as well as systems discussed in ref. 103 and references cited therein).
![]() | ||
Fig. 9 Generalized potential pathways for phosphate monoester hydrolysis, using the illustrative example of hydroxide attack on a phosphate monoester monoanion (we have chosen to show hydroxide rather than water as the nucleophile here to avoid any controversy with regard to proton positions at the transition state). Shown here are stepwise (A) dissociative, (B) associative, and (C) concerted mechanisms. Note that, while we have only shown inline pathways in this figure (nucleophile attacks from the opposite face as the departing leaving group), all pathways can also potentially proceed through corresponding non-inline mechanisms (nucleophile attacks from the same face as the departing leaving group with pseudo-rotation around the phosphorus center). Additionally, the concerted mechanisms can be associative or dissociative in nature, depending on the relative degrees of bond formation and cleavage at the transition state. |
A key feature to come out of these studies pertains to the nature of the transition state of the enzyme catalyzed reaction, which, in all cases, appears to be quite dissociative. Additionally, in the cases where the background reaction was also studied, the enzymatic transition state appears to be substantially more dissociative than its solution counterpart.16–18 In the case of phosphate monoester hydrolysis,17 a dissociative transition state would apparently be in line with the traditional interpretation of the experimentally observed linear free energy relationship (LFER) for the hydrolysis of this class of substrate in aqueous solution (see ref. 214 and references cited therein, although note that this interpretation is controversial,213 as discussed below). It would also appear to agree with arguments that electrostatic interactions with positively charged groups in the AP active site do not tighten the transition state compared to the corresponding reaction in aqueous solution,215 a conclusion that was again drawn based on the fact that similar Brønsted coefficients are observed when comparing LFER for the hydrolysis of phosphate monoester. The challenge with these empirical conclusions, however, is that not only is the qualitative interpretation of LFER exceedingly complex, particularly in the case of enzyme catalyzed reactions,209,213 but also both associative and dissociative transition states can give rise to similar LFER.210 Additionally, in the case of the spontaneous hydrolysis of phosphate monoesters, we have demonstrated that an associative pathway is as viable as a dissociative one.212,216 In fact, the preferred pathway appears to rather be dependent on the nature of the leaving group,209 with the system preferring an associative mechanism with basic leaving groups, that becomes gradually more dissociative as the leaving group becomes more acidic.
Now in this particular case, the nucleophiles for the reactions catalyzed by AP and NPP are an ionized serine and threonine, respectively, and therefore one would expect a looser transition state, due to charge–charge repulsion between the incoming nucleophile and the charged substrate (this effect appears to be particularly pronounced in the case of the alkaline hydrolysis of dianionic phosphate monoesters217,218). However, in the enzymatic reaction, this negative charge repulsion is being shielded by not just the catalytic metal centers, but, in the case of AP, also a nearby positively charged arginine.161 It has been argued that in NPP18 and AP,16,17 this is possible because the active site stabilizes the charge distribution of the dissociative transition state. However, one would expect so much positive charge in the presence of a reaction involving charged species to, if anything, tighten the transition state (TS), as it reduces the charge repulsion between the nucleophile and the substrate allowing them to come closer together at the TS. Such a tightening of the transition state has been theoretically observed in similar enzymes,20,21,208,209 as well as both experimentally and theoretically in model systems.219,220 From our work, it appears that a single metal ion is sufficient to render the transition state substantially more associative.219,221 We would also like to point the readers to another recent computational study of phosphodiester hydrolysis by both APP and NPP,19 which employed a specialized implementation of density functional theory222 specially parameterized for phosphate hydrolysis223 (SCC-DFTBPR), found significant tightening of the transition state for both enzymes. Specifically, the transition state for the hydrolysis of methyl-p-nitrophenyl phosphate was found to go from P–O distances of 2.43 and 2.23 to the nucleophile and leaving group, respectively, to ∼2.0 and 1.8–1.9 Å for the same two distances in the enzyme active sites.19 Similarly, another recent QM/MM study of phosphate monoester hydrolysis by the human placental alkaline phosphatase (PLAP) found an associative pathway proceeding through a phosphorane intermediate.224
To try to understand the discrepancy between these studies, it is useful to examine the structures for the dissociative transition states and intermediates provided in ref. 16–18. That is, a striking feature of these studies is the geometry changes of the Zn2+ sites during the process, in one case reaching the unexpectedly long Zn–Zn distance of as high as 7 Å in the transition state,17,18 as compared to 4.1 Å in the crystal structures.56 This is surprising in light of the fact that Zn2+ cations are known for having particularly tight coordination.225,226 This large distance has been commented on other groups than us,19 and, in particular, a recent study combined EXAFS and X-ray crystallography to demonstrate that the binuclear Zn2+ motif remains fairly stable in both AP and NPP during the course of the chemical reaction step.54 Our interest in the very large metal separation observed, however, comes from a methodological point of view, as we routinely work with metalloenzymes in our group. That is, correct modeling of metal centers, regardless of the level of theory used, is extremely challenging, and this problem is only aggravated when transition metals are included in the system.194 Additionally, a known problem when modeling multinuclear metal centers is that excessive repulsion between the metal centers can cause the metal ions to “fly away” from each other,185,192 as appears to be observed in ref. 16–18. Similarly, particularly in classical models, maintaining correct coordination during the course of the simulation poses it's own challenges.227
A number of solutions have been used to address this issue, none of which are completely satisfactory, however, all of which mitigate the problem to some extent. For example, in cases where the role of metal ions is purely structural, correct coordination can be maintained by using either full or partial bonds to the surrounding ligands,189,228 although such a model does not allow for ligand exchange.189 Alternately, some workers try to address this issue by using a non-bonded model in which medium-to-strong constraints are placed on the metal center and possibly also the surrounding ligands, in order to keep them in place during the simulation.229 Yet another alternative which sidesteps some of these problems is the dummy model185,190 presented in Section 4. In our experience of working with metalloenzymes, metal ions moving dramatically during the course of a simulation are usually the result of incorrect electrostatic treatments, which was also commented on in ref. 19.
In any case, the interesting issue here is the fact that this unusual behavior of metal ions appears to be dependent on the size of the QM region used. That is, in an AM1/d study of phosphate monoester hydrolysis by AP, three different QM models were used,17 which have been highlighted progressively using different colours in Fig. 10. In the first two models, either the Zn2+ cations were not included in the QM region at all, or only the Zn2+ cations (without the surrounding ligands) were included in the QM region. In both these cases, the binuclear zinc center was stable during the simulation, giving distances that were also in good agreement with higher-level DFT calculations. However, in the third case, the authors used a larger QM region, that included two of the Zn2+ metals as well as the surrounding residues, at which point this large repulsion between the metal centers was introduced. What is noteworthy here is that this increase in distance was not caused by the two metal centers being pushed away from each other, but rather, Zn1 apparently remained relatively stable, whereas Zn2 was pushed away from Zn1 (for numbering, see Fig. 10). This is unusual, because if this is the case, then Zn2 is being pushed directly towards the third metal center (Mg2+), which should not happen due to large charge–charge repulsion (the distance between Zn2 and the third magnesium ion is 4.7 Å in the relevant crystal structure used for this study17). Additionally, as can be seen from Fig. 6, Zn2 and the active site Mg are bridged together by the carboxylate sidechain of Asp51. It is possible that, if only the two Zn2+ and coordinating residues, but not the Mg2+ are included in the QM region, this could create potential problems. However, this discussion is specific to AP, and the authors observed a similar effect in NPP,16,18 and also in the bacterial phosphotriesterase, PTE.155
![]() | ||
Fig. 10 Definition of the three different QM regions used by López-Canut and coworkers17 in their QM/MM modeling of phosphate monoester hydrolysis by alkaline phosphatase. QM1 includes only the reacting system (in red). QM2 adds the zinc atoms (in green). QM3 incorporates the coordination shells of these two atoms and also Arg166 and Lys328 (in blue). This figure is adapted from ref. 17. |
Therefore, this raises a number of key questions: (1) is this inter-metal separation indeed real, or a simulation artifact due to improper treatment of the metal centers by the approach used? This is important to establish, as the dissociative transition states proposed in ref. 16–18 are dependent on this large inter-metal separation, which does not appear to be supported by experimental work.54 Tying in with this (2) considering that this large separation only occurs upon increasing the size of the QM region to include the metal centers and surrounding residues,17 what would happen if the QM region were extended even further to include the third metal center in AP or an even larger QM region for the other systems examined? That is, although it could be tempting to argue that the large internuclear separation is simply a problem with the treatment of the metal centers themselves, this large internuclear separation only seemed to appear once a very large QM region was included. Here, as long as the treatment was limited to just the reacting atoms and the dinuclear metal center, the system appeared to remain reasonably stable. Additionally, while transition metals are in general challenging to model, part of the problems should be mitigated by the d-orbital description included in the AM1/d approach. Therefore, it appears that substantially more validation (either by testing an even larger QM region or comparison to other approaches,19 or ideally both) is required to provide a definitive answer in either direction, however, we believe that these important works16–18 simultaneously provide an elegant example of both the power of computational approaches and the insight they can provide, as well as the significant challenges that still remain in the field.
In order to establish the molecular basis for the observed promiscuity, we started by exploring the fundamental reactions catalyzed by this enzyme in the absence of a catalyst. Specifically, we performed a detailed theoretical comparison of the hydrolyses of the p-nitrophenyl phosphate and sulfate monoesters216 (Fig. 11), which are prototype reactions for each class of compound respectively.232,233 These reactions have been observed to have similar rate constants234,235 as well as similar experimentally observed isotope effects,235,236 and have therefore been considered to have virtually identical transition states.233 The only anomaly is the large difference in experimentally measured activation entropies, which is +3.5 e.u. for phosphate monoester hydrolysis and −18.5 e.u. for sulfate monoester hydrolysis.233 To address this issue,216 we performed detailed DFT calculations on both reactions, generating the relevant free energy landscapes for the hydrolysis in each case, and using these to obtain the transition states highlighted in Fig. 11. In doing so, we were able to not only reproduce the virtually identical activation energies and isotope effects (within reasonable deviation from experiment), but also reproduce the large discrepancy observed in the activation entropies. Despite this, there were a number of significant differences in the transition states. That is, while the hydrolysis of the phosphate monoester was found to proceed via a compact, associative transition state, with proton transfer to the substrate216 in analogy to a large number of other systems,208,209,212,237–239 the hydrolysis of the corresponding sulfate monoester was found to be far more dissociative, with no deprotonation of the nucleophile at the transition state (note that sulfate monoesters are substantially more acidic than their phosphate counterparts). In addition to the difference in the geometries and protonation patterns of the transition states involved, we also found that the hydrolysis of the highly charged p-nitrophenyl phosphate dianion is solvent destabilized, whereas that of the corresponding phosphate monoanion is slightly solvent stabilized.216 While these differences are apparently trivial in aqueous solution, giving rise to very similar experimental observables, they are substantial in the fine-tuned environment of an enzyme active site, giving rise to the question of just how such diverse reactions can be catalyzed by the same enzyme in the first place.
![]() | ||
Fig. 11 Comparing transition state structures for water attack on (A) p-nitrophenyl phosphate and (B) p-nitrophenyl sulfate. In both cases, the system was examined by generating 2-D energy surfaces. In the case of the phosphate, it was then possible to obtain an unconstrained transition state through direct transition state optimization of the approximate structure from the surface. This was not possible for the corresponding sulfate, so only the approximate transition state is shown here. Note the difference in the proton position, with the hydrolysis of p-nitrophenyl phosphate proceeding with protonation of the phosphate at the transition state, whereas no proton transfer has occurred in the corresponding reaction of p-nitrophenyl sulfate. All distances are in Å. This figure is based on the coordinates provided in the Supporting Information of ref. 216. |
To address this issue, we performed detailed EVB studies20,21 comparing the hydrolysis of p-nitrophenyl sulfate, ethyl-p-nitrophenyl phosphate, bis-p-nitrophenyl phosphate, and p-nitrophenyl phosphate by PAS. As the EVB approach requires the user to define the relevant diabatic states, we explored multiple potential mechanisms, in order to eliminate energetically unfavorable alternatives. As the selectivity is likely to be determined during the group transfer, we focused only on Step II to III of Fig. 7, and did not, at this stage, model the subsequent hemiacetal cleavage. In the case of the phosphate monoester and diesters, we found that the preferred mechanism is a substrate-as-base mechanism, in which the nucleophile is activated by the substrate itself, in analogy to a number of other systems including Ras GTPase208,209,237 and EF-Tu,240,241 to name a few examples.
As mentioned above and observed in ref. 216, as sulfate esters are far more acidic, such a mechanism is not available for the native reaction as the sulfate will be a terrible proton acceptor. Here, the preferred mechanism appears instead to be one in which a nearby (more basic) histidine is utilized as a general base to activate the nucleophile.21 We also explored the suggestion that a metal-coordinated Asp adjacent to the nucleophile can act as a base,157,161 but found this mechanism to have very unfavorable energetics, perhaps unsurprisingly in light of the low pKa of a metal-bound aspartate, as well as the fact that this residue plays a clear structural role and protonating it will be detrimental to the stability of the metal coordination. Additionally, we found no need for acid catalysis to protonate the departing leaving group, however, in this particular case, one is dealing with a very good leaving group (p-nitrophenol) and this may not be the case for more basic leaving groups such as e.g. phenol (to name one example). From the results of this work,20,21,216 we proposed the revised mechanism shown in Fig. 7(B). Fig. 12 presents a comparison of representative transition states for the different substrates we have studied, and Table 1 presents a comparison of the corresponding distances in enzyme and in aqueous solution, based on data originally presented in ref. 21. We believe that the switch from a substrate-as-base mechanism to needing a general base when moving between the phosphatase and sulfatase activities is significant from an evolutionary perspective, as it puts additional pressure on the enzyme to acquire a general base, which is not necessary for the phosphatase activity. This suggests that the promiscuous activities observed in PAS are merely chemically “simplified” versions of the native sulfatase activity, and also provide one possible reason for the fact that sulfate esters are far less commonly used by Nature than their phosphate counterparts (see also discussion in ref. 213). However, our work20,21 led to a number of other unexpected observations. The first is with respect to the nature of the transition state for the sulfuryl transfer reaction catalyzed by PAS. That is, as can be seen in Table 1, while the presence of even a single metal ion tightens all transition states with respect to their solution counterparts, this effect is now most pronounced in the case of the sulfate ester which becomes even more associative in the enzyme than any of the phosphate substrates. Additionally, even when we tried to force a more dissociative mechanism for the sulfate ester in aqueous solution by adjusting the relevant forcefield parameters, once moving back to the enzyme (using the same parameter set as for the background reaction), the enzyme tried to substantially tighten the transition state, giving a reaction that was slightly less energetically favorable than the fully associative transition state shown in Fig. 12(A) and Table 1.21 In contrast to this, the substrate towards which PAS shows the next most proficient activity is the bis-p-nitrophenyl phosphate diester, which, in the PAS active site, now has the least associative transition state of all substrates studied in our work. When combined with the fact that PAS shows significantly greater proficiency towards monoanionic than dianionic substrates,13 this strongly points towards a role for electrostatics in driving the promiscuity. This was verified by overlaying the electrostatic contribution of different active site residues to the calculated activation barrier for the hydrolysis of each substrate (Fig. 13), which shows almost perfect qualitative overlap, with quantitative differences reflecting the differing demands of catalyzing such chemically diverse substrates. This strongly supports our belief that the molecular basis for the promiscuity is purely electrostatic, leading us to argue for chemically-driven protein evolution in this superfamily.21
![]() | ||
Fig. 12 A comparison of the representative transition state structures for the reactions of Pseudomonas aeruginosa arylsulfatase (PAS) complexed with (A) p-nitrophenyl phosphate, (B) ethyl-p-nitrophenyl phosphate, (C) bis-p-nitrophenyl phosphate and (D) p-nitrophenyl sulfate. Labeling for key active site residues can be found in Fig. 6C, and P(S)–O distances to the leaving group and nucleophile oxygens highlighted here are averages over ten trajectories. This figure is based on data presented in Table 1 and ref. 21. |
Substrate | Water | Enzyme | Difference | |||
---|---|---|---|---|---|---|
P(S)–Onuc | P(S)–Olg | P(S)–Onuc | P(S)–Olg | P(S)–Onuc | P(S)–Olg | |
a All distances are in Å and are averages over 10 MD trajectories (500![]() | ||||||
(1a) | 2.257 | 2.001 | 2.150 | 1.979 | −0.107 | −0.022 |
(1b) | 2.504 | 2.328 | 2.348 | 2.199 | −0.156 | −0.129 |
(2) | 2.466 | 2.356 | 2.320 | 2.307 | −0.146 | −0.049 |
(3) | 2.470 | 2.349 | 2.401 | 2.280 | −0.069 | −0.069 |
(4) | 2.443 | 2.272 | 2.350 | 2.234 | −0.093 | −0.038 |
![]() | ||
Fig. 13 Overlay of the electrostatic group contributions to the calculated activation barrier of Pseudomonas aeruginosa arylsulfatase (PAS) for each substrate calculated using the LRA approach (original data presented in Table S3 of ref. 21). This figure is adapted from ref. 21. |
As an aside, we would like to point out that, at a similar time to the publication of our work, an elegant experimental study made a strong case in favor of cooperating active site residues with multiple roles as well as catalytic backups in the case of the promiscuous serum paraoxonase PON1.242 At least qualitatively, this would support our hypothesis, as it reaches similar conclusions on an independent system using different techniques. Therefore, we believe that the examples illustrated here, both in the case of PAS but also in the case of AP and NPP demonstrate that there are a wide variety of computational tools available in order to quantitatively dissect the molecular basis of catalytic promiscuity, and, particularly when different approaches are combined for validation purposes, computational power has reached a stage where we are potentially able to not only rationalize but also guide protein evolution in silico.
This journal is © the Owner Societies 2013 |