Cross-species analysis of the glycolytic pathway by comparison of molecular interaction fields

Matthias Stein *a, Razif R. Gabdoulline ab and Rebecca C. Wade *a
aEML Research gGmbH, Molecular and Cellular Modelling, Schloss-Wolfsbrunnenweg 33, 69118 Heidelberg, Germany. E-mail: matthias.stein@eml-r.villa-bosch.de; rebecca.wade@eml-r.villa-bosch.de; Fax: +49-6221-533-298; Tel: +49-6221-533-247
bBIOMS (Center for Modeling and Simulation in the Biosciences), University of Heidelberg, Im Neuenheimer Feld 368, D-69120 Heidelberg, Germany

Received 24th June 2009 , Accepted 27th July 2009

First published on 7th September 2009


Abstract

The electrostatic potential of an enzyme is a key determinant of its substrate interactions and catalytic turnover. Here we invoke comparative analysis of protein electrostatic potentials, along with sequence and structural analysis, to classify and characterize all the enzymes in an entire pathway across a set of different organisms. The electrostatic potentials of the enzymes from the glycolytic pathway of 11 eukaryotes were analyzed by qPIPSA (quantitative protein interaction property similarity analysis). The comparison allows the functional assignment of neuron-specific isoforms of triosephosphate isomerase from zebrafish, the identification of unusual protein surface interaction properties of the mosquito glucose-6-phosphate isomerase and the functional annotation of ATP-dependent phosphofructokinasesand cofactor-dependent phosphoglycerate mutases from plants. We here show that plants possess two parallel pathways to convert glucose. One is similar to glycolysis in humans, the other is specialized to let plants adapt to their environmental conditions. We use differences in electrostatic potentials to estimate kinetic parameters for the triosephosphate isomerases from nine species for which published parameters are not available. Along the core glycolytic pathway, phosphoglycerate mutase displays the most conserved electrostatic potential. The largest cross-species variations are found for glucose-6-phosphate isomerase, enolase and fructose-1,6-bisphosphate aldolase. The extent of conservation of electrostatic potentials along the pathway is consistent with the absence of a single rate-limiting step in glycolysis.


Introduction

In systems biology studies, one follows the emergence of biological function from macromolecular interactions.1,2 Proteins and enzymes are not considered as isolated components but as arranged in modules or molecular networks with interdependencies between them. With the availability of a large number of genome sequences, the detailed investigation of all proteins in a network or a pathway has become possible.

In mathematical modelling of systems biology networks, the molecular details of enzyme–substrate and proteinprotein interactions are not considered. In molecular systems biology, however, the intrinsic molecular interactions and enzymatic reaction mechanisms of each enzyme involved in the systems biology network are addressed.3,4 Thus it is important to derive relationships between enzyme activity–kinetics and molecular interactions between substrate molecules and enzymes or other critical constituents for each component of the biochemical network. This task is one of the challenges of modern enzymology.5 In a protein structure-based molecular systems biology approach, the challenge is to relate differences in protein structures to differences in enzyme kinetics and activity. This will provide the basis for computing the physicochemical properties that determine the enzyme activity.

Among other structure-based descriptors, the molecular electrostatic potential is the most appropriate property to study enzymatic function and to apply for estimating enzymatic kinetic parameters since, in general, it is considered to be the most important contributor to enzyme catalytic functionality, see for example ref. 6–10. Livesay et al. have shown that the conservation of the electrostatic potential is a key determinant of the conservation of function in enzyme families and superfamilies.11,12

We here present results from a structure-based molecular systems biology application of the PIPSA (protein interaction property similarity analysis) method13,14 in which the differences in the molecular interaction fields of proteins are calculated and analyzed. The quantitative application of PIPSA (qPIPSA) has been successful in correlating enzymatic kinetic parameters with differences in the electrostatic potential of individual enzymes such as glucokinase,15triosephosphate isomerase, acetylcholinesterase and aldolase.16 qPIPSA is here applied to the modelling and analysis of all the enzymes of a biological metabolic pathway. We study the glycolytic pathway as a well-characterized example of an enzymatic pathway that is conserved across a range of organisms and in which at least one protein structure has been solved by X-ray crystallography for every step of the pathway.

Glycolysis is one of the best conserved metabolic pathways. It involves the degradation of glucose to pyruvate, releasing two moles of ATP per cycle. The classical and most general route for glucose degradation is termed the Embden–Meyerhoff pathway.17 A variety of modified pathways for sugar degradation and some alternative routes and bypasses exist in micro-organisms (for example, the Entner–Dourodoff pathway in prokaryotes), in particular hyperthermophilic archaea, for a review see ref. 18 and 19. The chemical reactions considered in this study are given in Fig. 1.


Schematic representation of the aerobic Embden–Meyerhof glycolytic pathway as considered in this study. It consists of ten steps of enzyme-catalyzed chemical reactions which convert d-glucose to pyruvate. The ten enzymes compared are hexokinase/glucokinase (HK/GK, EC 2.7.1.1/2.7.1.2, reaction 1), glucose-6-phosphate isomerase (PGI, EC 5.3.1.9, reaction 2), phosphofructokinase (PFK, EC 2.7.1.11, reaction 3), fructose bisphosphate aldolase (FBA, EC 4.1.2.13, reaction 4), triosephosphate isomerase (TIM, EC 5.3.1.1, reaction 5), glyceraldehyde-3-phosphate dehydrogenase (GAPDH, EC 1.2.1.12, reaction 6), phosphoglycerate kinase (PGK, EC 2.7.2.3, reaction 7), phosphoglycerate mutase (PGM, EC 5.4.2.1, reaction 8), phosphopyruvate hydratase (enolase) (EC 4.2.1.11, reaction 9) and pyruvate kinase (PK, EC 2.7.1.40, reaction 10).
Fig. 1 Schematic representation of the aerobic Embden–Meyerhof glycolytic pathway as considered in this study. It consists of ten steps of enzyme-catalyzed chemical reactions which convert D-glucose to pyruvate. The ten enzymes compared are hexokinase/glucokinase (HK/GK, EC 2.7.1.1/2.7.1.2, reaction 1), glucose-6-phosphate isomerase (PGI, EC 5.3.1.9, reaction 2), phosphofructokinase (PFK, EC 2.7.1.11, reaction 3), fructose bisphosphate aldolase (FBA, EC 4.1.2.13, reaction 4), triosephosphate isomerase (TIM, EC 5.3.1.1, reaction 5), glyceraldehyde-3-phosphate dehydrogenase (GAPDH, EC 1.2.1.12, reaction 6), phosphoglycerate kinase (PGK, EC 2.7.2.3, reaction 7), phosphoglycerate mutase (PGM, EC 5.4.2.1, reaction 8), phosphopyruvate hydratase (enolase) (EC 4.2.1.11, reaction 9) and pyruvate kinase (PK, EC 2.7.1.40, reaction 10).

Previous comparative analyses of glycolytic pathways have been performed at the gene level18,20–22 or using a graph-topological approach against the NCBI taxonomy23 but have not addressed species-to-species variation at the protein structural level. The analysis on the protein structural level is particularly challenging since accurate protein structural models are needed for this analysis.16 Due to recent advances in structure determination and modelling, it is possible to investigate and compare all enzymes in an entire pathway using a combination of bioinformatics tools, protein structural modelling and comparison of 3D interaction properties. The comparison of 3D interaction properties provides complementary information to sequence and structure analysis and can, thereby, lead to the identification of novel protein properties. This approach has been applied to individual protein families13,24,25 and is here applied to a whole pathway.

We aimed at a comparison of glycolytic enzymes from Homo sapiens with those from other, closely related species. We thus selected a set of 10 additional eukaryotic model organisms for which protein sequences are obtainable for every enzymatic step of glycolysis and for which the enzyme structures are sufficiently homologous. The comparison of all electrostatic potentials along the metabolic pathway reveals a strict conservation of electrostatic potential around the catalytic sites. An inter-organism comparison was carried out for each chemical step of the glycolytic pathway. This analysis reveals the extent of conservation or variation of the electrostatic potentials of the ten enzymes along the pathway across different species, allows the discrimination of tissue-specific isoforms, identifies functional misannotations for a group of phosphofructokinases, assists the functional annotation of previously unidentified ATP-dependent phosphoglycerate mutases in plants and gives estimates of the values of kinetic parameters of triosephosphate isomerases from species for which published parameters are not available.

Results and discussion

Choice of model organisms and sequences

A set of model eukaryotic organisms were selected that show mutual similarity in primary sequence and protein secondary structure of the enzymes at all ten steps of the classical Embden–Meyerhof pathway of glycolysis (see Experimental section). In the case of isozymes and when possible, the enzyme expressed in the liver was chosen.

Comparison of conservation of protein sequence and electrostatic potential

Fig. 2 shows the relation between the conservation of amino acid residues (the percentage of conserved residues) versus the conservation of the electrostatic potential (measured by the Hodgkin similarity index) computed pairwise for the complete dataset.
Comparison of pairwise sequence similarity and electrostatic potential similarity. For all pairs (excluding self-comparisons, 550 in total) of each of the 10 enzymes studied, pairwise sequence identity is plotted against the Hodgkin similarity index of the electrostatic potentials in a skin around the protein surface. For this dataset, a high sequence identity is necessary for a high Hodgkin index (>0.8) but there is little correlation with sequence identity at lower values of the Hodgkin index.
Fig. 2 Comparison of pairwise sequence similarity and electrostatic potential similarity. For all pairs (excluding self-comparisons, 550 in total) of each of the 10 enzymes studied, pairwise sequence identity is plotted against the Hodgkin similarity index of the electrostatic potentials in a skin around the protein surface. For this dataset, a high sequence identity is necessary for a high Hodgkin index (>0.8) but there is little correlation with sequence identity at lower values of the Hodgkin index.

There is significant correlation in the region of high amino acid sequence identity in which the electrostatic potential is well conserved. However, below a Hodgkin similarity index of about 0.8 for the electrostatic potential, there is little relation between sequence identity and electrostatic potential conservation. This shows that analysis of the electrostatic potentials provides complementary information to a sequence analysis; this information is directly related to the catalytic function of the active site and is also relevant to other functions, like binding, and the adaptation of the protein to its environment (low/high pH, etc).

Discrimination of tissue-specific enzyme isoforms by electrostatic properties

Differences in the electrostatic potential at or near the surface of isofunctional enzymes may imply different binding partners or an adaptation to a tissue-specific environmental condition such as pH. For triosephosphate isomerase (TIM), an iterative PSI-BLAST search of related protein sequences from zebrafish (Danio rerio) retrieved 5 sequences that form two groups of TIM sequences. The detection of which of these 5 sequences is the most functionally similar to the human enzyme is difficult on the basis of sequence conservation (see Fig. 3, left). Q90XG0_DANRE and Q7T315_DANRE show a sequence identity of 83% with the human, generally expressed enzyme (TPIS_HUMAN), while Q90XF9_DANRE, Q1MTI4_DANRE and Q7ZWB0_DANRE show a sequence identity of 80% to the human enzyme.
Comparison of triosephosphate isomerase (TIMs), focusing on the five isoforms in zebrafish. (Top, left): phylogenetic tree based on amino acid sequence similarities of TIMs. Five different amino acid sequences from zebrafish (D. rerio, DANRE) are close in sequence and located between sequences from insects (A. gambiae, ANOGA; D. melanogaster, DROME) and mammalian species (mouse, rat, orang-utan, and human). Amino acid sequence identities of zebrafish TIMs relative to H. sapiens are given. (Top, right): epogram, tree-like visualization of the distance matrix of electrostatic potentials of TIMs. The five sequences from D. rerio fall into two categories: Q90XG0_DANRE and Q7T315_DANRE are closer to TIMs from chicken, African claw frog and mammalian species, while Q90XF9_DANRE, Q1MTI4_DANRE and Q7ZWB0_DANRE show electrostatic potentials remote from mammalian species and closer to the plant enzymes from rice and mouse-ear cress (see text for discussion). (Bottom): visualization of the electrostatic potentials at ±0.6 kcal mol−1 e−1 of the isoforms of TIMs from D. rerio. The subcluster Q90XF9_DANRE, Q1MTI4_DANRE and Q7ZWB0_DANRE shows a significantly more negatively electrostatic potential than the other two isoforms. This can be assumed to be an adaptation of these neuron-specific TIMs to environmental conditions in neuron cells (see text for details).
Fig. 3 Comparison of triosephosphate isomerase (TIMs), focusing on the five isoforms in zebrafish. (Top, left): phylogenetic tree based on amino acid sequence similarities of TIMs. Five different amino acid sequences from zebrafish (D. rerio, DANRE) are close in sequence and located between sequences from insects (A. gambiae, ANOGA; D. melanogaster, DROME) and mammalian species (mouse, rat, orang-utan, and human). Amino acid sequence identities of zebrafish TIMs relative to H. sapiens are given. (Top, right): epogram, tree-like visualization of the distance matrix of electrostatic potentials of TIMs. The five sequences from D. rerio fall into two categories: Q90XG0_DANRE and Q7T315_DANRE are closer to TIMs from chicken, African claw frog and mammalian species, while Q90XF9_DANRE, Q1MTI4_DANRE and Q7ZWB0_DANRE show electrostatic potentials remote from mammalian species and closer to the plant enzymes from rice and mouse-ear cress (see text for discussion). (Bottom): visualization of the electrostatic potentials at ±0.6 kcal mol−1 e−1 of the isoforms of TIMs from D. rerio. The subcluster Q90XF9_DANRE, Q1MTI4_DANRE and Q7ZWB0_DANRE shows a significantly more negatively electrostatic potential than the other two isoforms. This can be assumed to be an adaptation of these neuron-specific TIMs to environmental conditions in neuron cells (see text for details).

The relationship between the calculated electrostatic potentials can be visualized in a tree-like diagram (“epogram”). We generated such a diagram for the TIMs (see Fig. 3, right). Proteins from human, orang-utan, mouse and rat exhibit a very similar electrostatic potential in the protein surface skin, whereas TIMs from mouse-ear cress (Arabidopsis thaliana) and rice (Oryza sativa subsp. japonica) show a distinct more negative electrostatic potential (see ESI ). The three member group of TIMs from zebrafish (Q90XF9_DANRE, Q1MTI4_DANRE, Q7ZWB0_DANRE) shows an electrostatic potential which is close to those from mouse-ear cress and rice but distinct from the other zebrafish and mammalian enzymes. The two member group (Q90XG0_DANRE and Q7T315_DANRE), however, shows an electrostatic potential closer to TIMs from chicken, African claw frog and mammalian species. The discrimination between the two groups of zebrafish TIMs is much more pronounced at the level of the electrostatic potential than at the sequence level.

The low similarity (0.52) of the electrostatic potential of the zebrafish TIM, Q90XF9_DANRE (and the 2 other zebrafish sequences in that group), to mammalian enzymes and the higher similarity to mouse-ear cress and rice support an assignment of Q90XF9_DANRE to a negatively charged TIM of type A (TPISA) which is found specifically in neurons in eyes and brain.26 Such a negatively charged, neuron-specific TIM has also been found in teleost fish.27

qPIPSA of the related sequence Q90XG0_DANRE shows higher similarity to enzymes from rat (0.57), mouse (0.60), human (0.65) and orang-utan (0.65) and supports an assignment of Q90XG0_DANRE and Q7T315_DANRE to a TIM of type B (TPISB), which is generally expressed in all tissues.26,27 It can currently only be assumed that the increase of negative charges at or near the protein surface represents an adaptation to the molecular environment in neuron cells. Merritt and Quattro also speculate on a contribution of these negatively charged proteins to the resting midpoint potential of −70 mV inside neuron cells.27 Most of the sequence differences between the two TIM types are at residues that are surface exposed or near the surface. Similarly to the plant TIMs, these differences result in a more negative electrostatic potential in the vicinity of the active site and consequently, the kinetic parameters (kcat/Km and Km) are estimated to be similar to those of the plant enzymes (See Section ‘Estimation of enzyme kinetic parameters’).

Identification of an enzyme with an exceptional electrostatic potential but a conserved active site

Glucose-6-phosphate isomerase (PGI) catalyzes the isomerization of glucose-6-phosphate to fructose-6-phosphate (step 2). Fig. 4 shows electrostatic potential isosurfaces of the PGIs of the eleven model species contoured ±0.6 kcal mol−1 e−1.
Electrostatic potential isosurfaces of the glucose-6-phosphate isomerases from the eleven model organisms shown at levels of ±0.6 kcal mol−1 e−1. They are ordered by the Hodgkin similarity index of their electrostatic potential (entire skin) relative to human (1.00): orang-utan (0.95), mouse (0.84), rat (0.78), zebrafish (0.69), chicken (0.67), claw frog (0.60), rice (0.34), mouse-ear cress (0.27), fruit fly (0.23), and mosquito (0.07) (see text for details).
Fig. 4 Electrostatic potential isosurfaces of the glucose-6-phosphate isomerases from the eleven model organisms shown at levels of ±0.6 kcal mol−1 e−1. They are ordered by the Hodgkin similarity index of their electrostatic potential (entire skin) relative to human (1.00): orang-utan (0.95), mouse (0.84), rat (0.78), zebrafish (0.69), chicken (0.67), claw frog (0.60), rice (0.34), mouse-ear cress (0.27), fruit fly (0.23), and mosquito (0.07) (see text for details).

The proteins are ordered according to their Hodgkin similarity indices relative to H. sapiens PGI: orang-utan (0.95), mouse (0.84), rat (0.78), African claw frog (0.69), chicken (0.67) and zebrafish (0.60). The PGIs from rice, mouse-ear cress and fruit fly show a lower level of similarity to H. sapiens (0.34–0.23) and mosquito (Anopheles gambiae) has the lowest similarity (0.07) and an electrostatic potential that is clearly distinct from the other species. The PGI enzyme from mosquito (A. gambiae) shows a largely negative electrostatic potential at or near the protein surface whereas that of the other enzymes is largely positive. These results are robust to the choice of a different but equally suitable template structure for protein modelling (see ESI ). The PGI from mosquito is an example of a protein with a rather high sequence identity (68%) to the human PGI but with an absence of correlation in the electrostatic potential when comparing it over the protein surface.

The electrostatic potential at the active site, however, shows a more conserved value (0.87) in agreement with the conserved enzymatic function and points to an adaptation of the PGI from mosquito to an environmental condition.

Residues at or near the active site are generally much less prone to mutation than those on the protein surface. The conservation of enzymatic function is reflected in the reduced distance of the electrostatic potential of PGIs near the active site. The distance between potentials around the substrate is significantly smaller compared to the distance between potentials on the skin of the entire protein surface (see ESI ). This points to a conservation of enzymatic mechanism and possibly also similar enzyme kinetics. The PGI from A. gambiae serves here as an example of an enzyme with a high degree of sequence conservation (68% to human) but with very different interaction properties, i.e. considering the whole protein, there is no correlation of the electrostatic potentials (0.07) even though the electrostatic potential in the active site region is well conserved (0.87). This information is not directly available from a sequence-based analysis but immediately recognizable from our analysis.

The PGI enzyme is a multifunctional protein. Besides its involvement in glycolysis and gluconeogenesis in the cytosol, PGI is also known, among others, to act as a neuroleukin and autocrine motility factor in the extracellular space (‘moonlighting function’).28 The conservation of the electrostatic potential around the active site and the variation of the electrostatic potential over the protein surface suggest a conservation of the catalytic mechanism and recognition of binding partners via the protein surface. Recent results show that PGI enzymatic turnover is not critical for its moonlighting function.29

Electrostatic potential comparison aids functional annotation of phosphofructokinases in plants

Phosphofructokinase (PFK, step 3 of glycolysis) enzymes show a large cross-species variation in their electrostatic potential when compared for the entire protein. PFKs catalyze the interconversion of fructose-6-phosphate and fructose-1,6-bisphosphate. PFKs have a complex evolutionary history and different phosphor-donors can be used.30ATP-dependent phosphofructokinases (PFK, EC 2.7.1.11) are present in eukaryotes and use ATP as a source of phosphate whereas pyrophosphate(PPi)-dependent phosphofructokinases (PFP, EC 2.7.1.90) are present in plants and fungi and use inorganic phosphate.

ATP-dependent PFKs have only very recently been identified in plants. One was isolated and sequenced in spinach31 and Mustroph et al. discovered seven ATP-dependent phosphofructokinases, as well as four PPi-dependent PFPs, in A. thaliana.32 The plant ATP-dependent PFKs are closer in sequence to the group I ATP-dependent PFKs found in animals and bacteria than to the plant PPi-dependent PFPs. In the present study, we selected representative amino acid sequences of putative ATP-dependent PFKs from A. thaliana and Oryza sativa (Q9M0F9_ARATH and Q5SNH5_ORYSA).

The multiple sequence alignment showed an unusual fingerprint of ATPvs. PPi binding residues. The residues at positions 104 and 124 were shown to determine the ATP/PPi substrate specificity,33 see Fig. 5A.


Novel signature of ATP-dependent PFKs in plants at the amino acid sequence level and by comparison of electrostatic potentials. A: ATP-dependent PFKs in plants. Residues Gly/Asp104 and Gly/Lys124 were shown to determine the ATP/PPi substrate specificity, respectively.33 The sequences from Q9M0F9_ARATH and Q5SNH5_ORYSA show a novel signature of Gly104 and Lys124 residues and can now be annotated as ATP-dependent PFKs in plants (see text for details). B: Matrix visualization of an all-pairwise comparison of the electrostatic potentials of the entire protein surface (left) and the substrate binding site (right) of phosphofructokinases.
Fig. 5 Novel signature of ATP-dependent PFKs in plants at the amino acid sequence level and by comparison of electrostatic potentials. A: ATP-dependent PFKs in plants. Residues Gly/Asp104 and Gly/Lys124 were shown to determine the ATP/PPi substrate specificity, respectively.33 The sequences from Q9M0F9_ARATH and Q5SNH5_ORYSA show a novel signature of Gly104 and Lys124 residues and can now be annotated as ATP-dependent PFKs in plants (see text for details). B: Matrix visualization of an all-pairwise comparison of the electrostatic potentials of the entire protein surface (left) and the substrate binding site (right) of phosphofructokinases.

Gly104 is essential for the use of ATP.33 An Asp in position 104 in PFPs prevents ATP from binding by steric hindrance and possibly a charge effect.34 A Gly in position 124 is characteristic of ATP-dependent PFKs since it allows the steric incorporation of the α-phosphate of ATP. When this position is occupied by the sterically more constricting lysine (found in all PFPs), only the recognition of PPi is feasible. A third combination (Gly104 and Lys124) has been observed in amino acid sequence comparisons by ref. 30 and 35 and termed cluster `X′, for putative atypical PFKs using ATP. This pattern is also present in the Q9M0F9_ARATH and Q5SNH5_ORYSA sequences and supports their assignment to ATP-utilizing enzymes, even though they are currently given as PPi-dependent PFKs in the Interpro database.

Q9M0F9_ARATH and Q5SNH5_ORYSA display an overall low sequence identity to the human ATP-dependent PFK of only 27% and 26%, respectively. In addition, Q9M0F9_ARATH and Q5SNH5_ORYSA are both lacking the ProSite pattern PS00433 which is characteristic for ATP-dependent phosphofructokinases. Thus, on the basis of sequence comparison, it is difficult to make a reliable functional annotation. The electrostatic potentials around the protein surface show no correlation with the human enzyme overall (Hodgkin indices of 0.12 and 0.06) (Fig. 5B, left). The conservation of the electrostatic potential around the catalytic site, however (compare Fig. 5B, right), confirmed the close relation of the two plant sequences to human ATP-dependent PFKs (Hodgkin indices 0.67 and 0.55). This suggests the functional annotation of these plant sequences as ATP-dependent plant PFKs with a novel F6P binding signature. The catalytic residue Asp127, which acts as a proton acceptor during catalysis, is strictly conserved among all sequences. The difference in electrostatic potential around this residue between mammalian and plant enzymes indicates higher Km values for fructose-6-phosphate in plants. Such higher values (compared to the human liver enzyme with a Km of 0.24 mM36) were recently reported for bacterial PPi- (F6PKm value 0.4 mM) and ATP-dependent PFKs (F6PKm value 6–10 mM) from the bacterium Amycolatopsis methanolica but have not been reported for plant enzymes yet.37

The misannotation of protein sequences from other higher plant phosphofructokinases according to their cosubstrate specificity (PPi vs.ATP) was also recently noted by ref. 31. This example illustrates how the functional characterization of a large number of enzymes can be aided by a quantitative comparison of the electrostatic potentials which is not available from a sequence comparison alone.

Identification of a novel class of cofactor-dependent phosphoglycerate mutases in plants

Phosphoglycerate mutases (step 8 in glycolysis, PGM) show the lowest cross-species variation in electrostatic potential (see Fig. 9) of all the enzymes in the glycolytic pathway. PGMs convert 3-phosphoglycerate into 2-phosphoglycerate and come in two evolutionarily unrelated forms. The cofactor 2,3-bisphosphoglycerate-independent PGMs (iPGMs) are monomers of ∼60 kDa size and are found in plants, whereas the cofactor-dependent PGMs (dPGMs) are smaller in size (∼27 kDa), oligomeric, and found in vertebrates (including human), certain invertebrates, some bacteria and fungi. The plant iPGMs are binuclear metalloenzymes requiring Mn2+ for catalytic activity.38 Until recently, it was usually accepted that plants only express iPGM enzymes.

The conservation of the electrostatic potential around the active site in the dataset analyzed here (see Fig. 6) is thus surprising since two plant enzymes from O. sativa and A. thaliana were also included in the investigation. On the basis of iterative BLAST searches, multiple sequence alignment and comparison of the electrostatic potentials, the preliminarily annotated sequences from O. sativa (Q6Z8J0_ORYSJ, annotated as “putative phosphoglycerate mutase”) and A. thaliana (Q9LM13_ARATH, not functionally annotated in SwissProt) were chosen for the analysis. These plant sequences are significantly shorter than typical plant iPGM sequences (∼330 amino acid residues) and show a sequence identity of ∼43% to PGAM1 sequences from mammalian species, like human and mouse. The electrostatic potentials around the protein surfaces showed overall no correlation with the human enzyme (Hodgkin SIs 0.08 and 0.09). The conservation of the electrostatic potential in the region around His8, which is transiently phosphorylated during catalysis, is high (Hodgkin SIs 0.93 for rice and A. thaliana), and suggests the annotation of these two sequences as dPGMs (see Fig. 6). Cofactor-independent PGMs (iPGMs) in plants lack this conserved histidine residue.


Conservation of the electrostatic potential around the active site of phosphoglycerate mutases (PGMs). The strict conservation of the electrostatic potential around His8 and the conservation of amino acid residues binding the substrate 3-phosphoglyceric acid make an assignment of Q6Z8J0_ORYSJ and Q9LM13_ARATH as cofactor 2,3-bisphosphoglyceric acid dependent (dPGMs) from plants very likely.
Fig. 6 Conservation of the electrostatic potential around the active site of phosphoglycerate mutases (PGMs). The strict conservation of the electrostatic potential around His8 and the conservation of amino acid residues binding the substrate 3-phosphoglyceric acid make an assignment of Q6Z8J0_ORYSJ and Q9LM13_ARATH as cofactor 2,3-bisphosphoglyceric acid dependent (dPGMs) from plants very likely.

In addition to the conservation of the electrostatic potential, detailed inspection of the multiple sequence alignment showed the conservation of residues known to interact with the substrate 3-phosphoglycerate. The only differences in substrate binding residues were the non-critical Thr/Phe20 and Arg/Gln59 mutations in the A. thaliana and O. sativa sequences. The high conservation of the electrostatic potential around the 3PGA substrate binding site points to a conservation of enzymatic mechanism for the enzymes investigated. Until now, only 2,3-bisphosphoglycerate-independent PGMs have been found in plants.39 This is the first functional annotation of a dPGM in plants and is based on an intermediate amino acid sequence identity and on a very high conservation of the three-dimensional electrostatic potential around the active site. Apart from the Q9LM13_ARATH sequence used here as a representative, the A. thalianagenome reveals two additional nearly identical copies (Q8L832 and Q8LBD7).

Estimation of enzyme kinetic parameters

Variations of the electrostatic potential at the active site can be related to cross-species variations in the kcat/Km values of triosephosphate isomerases (TIMs).40 Gabdoulline et al. showed that differences in electrostatic potentials near the “hinge” loop region of TIMs were related to variations in Km values for various species and differences in the electrostatic potential near the active site relate to kcat/Km values.16 Gabdoulline et al. derived a correlation between the differences in electrostatic potentials and the differences in Km and kcat/Km values from a set of 12 different species. For the kcat/Km parameter, it was found that an increase of kcat/Km of 1 ln unit is related to a 1.59 kcal mol−1 e−1 increase of average electrostatic potentials in a region around L230. For Km values, an increase of electrostatic potentials by 0.85 kcal mol−1 e−1 in the region around W168 resulted in a 1 ln unit decrease in Km values. Here, we have used the correlations from ref. 16 and the experimental data from human41 and chicken42 TIMs to make two independent estimates of the kinetic parameters of the TIMs from the other species studied here, for which no experimental values are available.

For Km values, predictions are based on a quantitative comparison of the electrostatic potential differences relative to enzymes from human and chicken, respectively. A sphere of radius 10 Å around the CZ atom of tryptophan 168, of the N-terminal hinge region of flexible loop 6, was chosen as the comparison region. The computed Km values all lie in a narrow range between 0.4 and 1 mM (see Fig. 7). The differences of 0.04–0.08 mM come from the two independent estimates based on the experimental values from human and chicken, respectively. Only limited comparison with experimental data is possible: the computed similarity of the Km value of TPI from rat to that from chicken (0.47 mM) and human (0.49 mM) is in agreement with experiments in which a Km value of 0.4 mM for rat was determined.43 The larger computed Km value of 0.9–1 mM for Drosophila melanogaster can be understood from the substitution of a conserved Lys174 by a Gln in D. melanogaster. All other species maintain the positively charged lysine residue in the C-terminal region of the loop 6 that opens and closes upon substrate binding.44 The replacement of a positively charged lysine residue by a neutral glutamine is expected to reduce the binding affinity for the negatively charged substrate significantly.45,46 Slightly higher Km values were also computed for TIMs from plants Oryza and A. thaliana. Both plant sequences display a substitution in the C-terminal region of loop 6 of threonine 175 by a valine residue. The threonine provides an additional polar hydroxyl group and may assist the substrate binding.47 The replacement of this polar threonine by a neutral valine may weaken the substrate TIM interaction and lead to higher Km values.


Use of the calculated average differences in electrostatic potentials to estimate enzymatic kinetic parameters for triosephosphate isomerases using the correlation factors derived from a larger training set.16 Two independent computations were made based on the experimental values for H. sapiens41 (□) and G. gallus (chicken)42triosephosphate isomerases (■). The experimental data on which the computations are based are marked as such. (Top): computed Km values based on electrostatic potential differences around the CZ atom of residue Trp168. (Bottom): computed kcat/Km values based on electrostatic potential differences around the oxygen atom of L230.
Fig. 7 Use of the calculated average differences in electrostatic potentials to estimate enzymatic kinetic parameters for triosephosphate isomerases using the correlation factors derived from a larger training set.16 Two independent computations were made based on the experimental values for H. sapiens41 (□) and G. gallus (chicken)42triosephosphate isomerases (■). The experimental data on which the computations are based are marked as such. (Top): computed Km values based on electrostatic potential differences around the CZ atom of residue Trp168. (Bottom): computed kcat/Km values based on electrostatic potential differences around the oxygen atom of L230.

Predictions of kcat/Km values for the TIMs were based on a comparison of the electrostatic potential around the active site. Wade et al. showed that species-dependent differences in kcat/Km values were caused by differences in the electrostatic potential around the active site.40 Gabdoulline et al. used the differences in electrostatic potential around the oxygen atom of residue L230 for a correlation with species-dependent kcat/Km differences.16 All computed values of kcat/Km lie in a narrow range between 0.7 and 1.1 × 107 M−1 s−1 (see Fig. 7, bottom). The lowest kcat/Km values were computed for TIMs from D. melanogaster and A. thaliana. Mouse, rat and orang-utan were predicted to possess indistinguishable kcat/Km values of about 0.9–1.0 × 107 M−1 s−1. The differences of about 0.1 × 107 M−1 s−1 in Fig. 7 (bottom) come from the two independent estimates based on experimental values from human and chicken, respectively.

We can conclude here that the differences in electrostatic potentials can be used to predict enzymatic kinetic parameters for species for which no values have been reported in the literature.16 For TIMs, the electrostatic potential differences in loop 6 lead to cross-species differences in substrate Km values of a factor of 2.5 while the electrostatic potential around the active site is well conserved and kcat/Km values only differ slightly. This information is relevant for the systems biology kinetic modelling of glycolysis in various organisms.

Analysis of a complete metabolic pathway by comparison of molecular interaction fields

Clustering of enzymes according to the conservation of electrostatic potential around the active site

Fig. 8 shows the clustering of species according to the all-pairwise comparison of electrostatic potentials around the active site for all enzymes of the glycolytic pathway.
Clustering of the eleven model species based on the overall all-pairwise comparison of the electrostatic potential differences around the active site for all 10 glycolytic enzymes. The electrostatic potential distance matrix is represented as a tree-like diagram (epogram). See text for details.
Fig. 8 Clustering of the eleven model species based on the overall all-pairwise comparison of the electrostatic potential differences around the active site for all 10 glycolytic enzymes. The electrostatic potential distance matrix is represented as a tree-like diagram (epogram). See text for details.

The analysis of metabolic pathways by clustering according to the electrostatic potentials of the molecular constituents provides a new approach for the functional classification of biochemical networks. The observed clustering of the glycolytic enzymes according to their active site electrostatic potentials largely follows the taxonomic annotation by NCBI based on ribosomal 16S RNA sequences20 and the phylogeny of glycolytic enzymes based on nucleotide and protein sequences.21

Previous comparisons of enzymes in the glycolytic pathway have focussed on discussing differences between the enzymes and pathways from bacteria, eukaryotes and hypothermophilic archaea, see ref. 18 and references therein. The pathways have been classified on the gene level but no comparison on the protein molecular levels has been made.

Heymans and Singh23 and Clemente et al.48 observed a subclustering of glycolytic mammalian enzymes (human, mouse, rat) distinct from other eukaryotes (such as fruit fly, mouse-ear cress, yeast) based on a graph-topological analysis of metabolic pathways and on the gene ontology, respectively. This clear separation of mammalian and non-mammalian eukaryotic species can also be observed in the comparison of electrostatic potentials. The ordering of eukaryotic species according to their electrostatic potentials follows the full organism sequence comparison observed by Zdobnov et al.49

Electrostatic potentials along the glycolytic pathway—largest variations

In general, the electrostatic potential is very well conserved for all the species under investigation. The glycolytic enzymes for each step share a similar secondary structure, have the same EC number and thus supposedly follow the same enzymatic mechanism.

For the model organisms, the all-pairwise comparison of their electrostatic potentials can reveal relevant details of the cross-species conservation of enzymatic function and, in some cases, the enzyme kinetics. The all vs. all comparison of the Hodgkin similarity indices for the ten glycolytic reactions for the model organisms is shown in Fig. 9A. When comparing the electrostatic potential in a skin around the protein surfaces, the largest cross-species variation is found for phosphofructokinases (step 3) and phosphoglycerate mutases (step 8). Glyceraldehyde-3-phosphate dehydrogenase (step 6) and enolase (step 9) are the glycolytic steps with most conserved electrostatic potential of the protein skin.


All-pairwise comparison of the electrostatic potentials of the glycolytic enzymes from the 11 model organisms. For each protein, the region for comparison of the calculated electrostatic potential is a skin with a thickness of 3 Å extending from the probe accessible surface around the protein computed with probe of radius 2 Å. A: Hodgkin index computed for entire skin. B: Hodgkin index computed for region of skin with intersecting spheres of radius 15 Å around the ligand binding site. C: Absolute differences of the electrostatic potentials (in kcal mol−1 e−1) in the region of the skin within intersecting spheres of radius 15 Å around the active sites. The electrostatic potentials in the active site region are well conserved across the glycolytic pathway, though Fig. 9B and C show more variation between the species in the first half of the chemical reactions. D: Averaged electrostatic potentials in kcal mol−1 e−1 for the 11 model species in a region of 15 Å around the active sites.
Fig. 9 All-pairwise comparison of the electrostatic potentials of the glycolytic enzymes from the 11 model organisms. For each protein, the region for comparison of the calculated electrostatic potential is a skin with a thickness of 3 Å extending from the probe accessible surface around the protein computed with probe of radius 2 Å. A: Hodgkin index computed for entire skin. B: Hodgkin index computed for region of skin with intersecting spheres of radius 15 Å around the ligand binding site. C: Absolute differences of the electrostatic potentials (in kcal mol−1 e−1) in the region of the skin within intersecting spheres of radius 15 Å around the active sites. The electrostatic potentials in the active site region are well conserved across the glycolytic pathway, though Fig. 9B and C show more variation between the species in the first half of the chemical reactions. D: Averaged electrostatic potentials in kcal mol−1 e−1 for the 11 model species in a region of 15 Å around the active sites.

When focussing on the electrostatic potential around the enzymes’ active sites, the cross-species variation is significantly reduced (see Fig. 9B). This is in agreement with the concept of conserved enzyme active sites that perform the same chemical reaction. The largest cross-species variations around the enzymes’ active sites are found for glucokinase/hexokinase (step 1), phosphofructokinase (step 3) and triosephosphate isomerase (step 5). Differences in the electrostatic potential around the active site of step 1 enzymes can be associated with differences in substrate binding affinities between isozymes from vertebrate hexokinases and lower eukaryotes discussed in ref. 15. The differences in electrostatic potentials for phosphofructokinases can be rationalized by the adaptation of enzymes from plants (rice and mouse-ear cress) to use ATP as a phosphate source (see above). For triosephosphate isomerases, differences in the electrostatic potential around the active site lead to differences in kcat/Km values for different species (see above).

The corresponding electrostatic potential differences averaged over the number of grid points in the intersecting region (see Fig. 9C) are mostly small between the chosen model organisms in this study. Significant differences can be seen for step no. 2 (glucose-6-phosphate isomerase, PGI), step no. 4 (fructose bisphosphate aldolase, FBA) and step no. 9 (enolase, ENO).

For PGI, the maximum differences in electrostatic potentials are found for enzymes from mouse-ear cress and fruit fly which display electrostatic potentials that are more negative by about 1 kcal mol−1 e−1 than all other species. This is in agreement with experimental characterizations of ligand binding affinities of PGIs. For mammalian enzymes from rabbit, mouse and human, Km values of 0.12 mM for fructose-6-phosphate are reported.50,51 For the only plant for which data are available, spinach, 3 times higher Km values were measured.52 Charles and Lee report a 2.5-fold higher Km value for PGI from fruit fly compared to the enzyme from mouse.53

For FBA (step 4), the large differences in electrostatic potential come from pairwise comparisons of the enzyme from mouse-ear cress (ALF_ARATH) with all other model organisms. The enzyme from mouse-ear cress shows an electrostatic potential consistently more negative by about 1 kcal mol−1 e−1 around the active site than all other species in this study. Detailed amino acid sequence and protein structure investigations revealed that ALF_ARATH possesses a negatively charged residue (aspartate) near the active site at position 162 whereas the aldolases from all the other species in this investigation show a neutral (asparagine) residue. It remains to be investigated whether this N/D substitution is a sequencing artefact or has physiological implications.

The large electrostatic potential differences for enolases (step 09) have to be put in relation to the magnitude of the electrostatic potential around the active site. Enolases, step 09, show the most negative electrostatic potentials of all glycolytic steps (see Fig. 9D). They catalyze the dehydration of 2-phospho-D-glycerate (2-PGA) to P-enolpyruvate. The large negative electrostatic potential is required for the initial abstraction of the C2 protons from 2-PGA which has a pKa > 30.54 The enzyme accomplishes this task by providing a general-base catalytic site of lysine, glutamate and aspartate residues around the substrate. Two additional Mg2+ cations assist substrate coordination and stabilization of the enolate intermediate. We performed an analysis of the electrostatic potential around the active site in the absence and the presence of the two metallic cofactors. The electrostatic potentials are consistently shifted upward by about 1.1 kcal mol−1 e−1 upon inclusion of the Mg2+ ions. The electrostatic potential differences between the species, however, remain nearly constant (data not shown).

Conservation of electrostatic potentials along the pathway

Schwartz and Kanehisa have shown that the metabolic glycolytic network in Saccharomyces cerevisiae from Teusink et al.55 is robust and operational over a large range of random variations of enzyme concentrations and kinetic parameters.56 Reactions have large control coefficients indicating an almost complete adaptation of the enzyme to changes in metabolite concentrations. For the glycolytic network in yeast, the largest control coefficients were observed for PFK (our step3), GAPDH (step 6) and PGM (step 8). These are among the enzymes in our study with the least variation of electrostatic potentials around the active site (see Fig. 9C). This is an indication of an almost perfect design of a catalytic pocket of an enzyme which thus hardly varies between different species.

In biochemical network theory, it is commonly believed that one or several enzymatic steps are ‘rate-limiting’, implying that they are more important than the other steps for the control of the network. The eukaryotic organisms in this study all follow the classical Embden–Meyerhof pathway and thus regulation and control of glycolysis should be conserved for all the organisms.18 Usually, the slowest chemical reaction in a network is rate-limiting.57 What is usually meant by ‘rate-limiting’ in metabolic pathways is that the rate of this reaction is not limited by substrate availability but only by the availability and activity of the enzyme. This reaction is therefore said to be ‘enzyme-limited’ and because its rate limits the rate of the whole reaction sequence, the step is also sometimes referred to as the ‘rate-limiting’ step in the pathway.58 In general, these ‘enzyme-limited’ reactions are very exergonic and therefore irreversible under physiological conditions. The three glycolytic reactions with the largest and most negative free energy values (glucokinase (step 1), phosphofructokinase (step 3) and pyruvate kinase (step 10)) are considered irreversible and thus to be major points of control. In early work, some of them are also classified as ‘rate-limiting’59,60 although the annotation as ‘enzyme-limiting’ seems more appropriate. From our study, none of these three enzymes exhibits a strictly conserved electrostatic potential and these enzymes thus do not appear to be responsible for glycolytic flux control. This may indicate the stability of fluxes to variations in the kinetic parameters for these enzymes.

Fell and Thomas have given theoretical evidence that, in fact, multi-site modulation through action of a number of enzymes is more effective for the physiological control of metabolism.61,62 This has also been shown in recent experiments on glycolysis in Trypanosoma brucei,63S. cerevisiae55 and Lactococcus lactis.64 The results in our investigation point in the same direction. We see high conservation of the electrostatic potential in each step of glycolysis, and this makes control by only one enzyme very unlikely. Rather, the glycolytic flux through the network may be determined by a complex interplay of several, if not all, enzymes.

The organization of glycolysis in plants vs. non-plants

The electrostatic potential differences (Fig. 8) are largest for enzymes from plants (O. sativa and A. thaliana), as compared to the other species. In particular, wherever phosphorylationdephosphorylation occurs, the plant enzymes can be clearly distinguished by their electrostatic potentials from the non-plant enzymes (see ESI part 3). In plants, several alternative enzymes to those in the classical glycolytic pathway in animals and human exist: glucose-specific glucokinases (GLK), PPi-dependent phosphofructokinases (PFP), class II aldolases (FBA) and cofactor-independent PGMs (iPGM).65 We have shown that plants possess additional ATP-dependent phosphofructokinases, class I aldolases and cofactor-dependent phosphoglycerate mutases (see above). Although plants use sucrose and starch as the principal substrates for glycolysis, in the cytosol they metabolize the intermediates via the classical glycolytic pathway. In mammalian liver, primary control of glycolytic flux to pyruvate is believed to be mediated by positive feedback of fructose-1,6-bisphosphate on pyruvate kinase (control from “top-down”). In plants, on the other hand, control is exerted from “bottom-up” by the negative feedback of various cell-specific effectors on pyruvate kinase and of phosphoenolpyruvate on the ATP-dependent PFK.66 In mammals, isozymes are often expressed in a tissue-specific manner with distinct kinetic or regulatory properties. In plants, relatively little is known about tissue- or developmental-specific isozymes . One can only speculate that the presence of two pathways with the use of ATP and inorganic pyrophosphate, respectively, as phosphate sources is an adaptation to different environmental and nutritional limitations.66 From our analysis, we observe that for the ATP-dependent phosphorylation reactions in plants, the electrostatic potential near the active site is different from the other species considered in this study and thus, probably, the reactions occur with different enzymatic kinetic parameters. The active sites of the pyrophosphate-dependent phosphorylation enzymes show even larger differences in electrostatic potentials (data not shown).

Experimental

Choice of proteins and species for this study and protein sequence alignment

The choice of organism was based on the requirements that for each of the 10 glycolytic enzymes there was at least one protein sequence and a suitable protein structural template available. The corresponding secondary structure was checked to be similar to that of the homologous human enzyme according to the HSSP database.67Proteinamino acid sequences were taken from the UniProtKB/Swiss-Prot database.68,69 The 11 model species were H. sapiens, Rattus norvegicus, Mus musculus, Pongo pygmaeus, Gallus gallus, Xenopus laevis, D. rerio (Brachydanio rerio), D. melanogaster, A. gambiae, A. thaliana and O.sativa subsp. japonica.

Alternative enzymes have been identified for at least three steps in the eukaryotic glycolytic pathway.

(i) The first step of glycolysis, the conversion of glucose to glucose-6-phosphate, can be performed by sugar-specific glucokinases found in prokaryotes and lower eukaryotes or by one of the non-specific isozymes of hexokinase found in vertebrates (named hexokinases I–IV; for an overview see ref. 70). Here, we put the focus on a comparison of the liver-specific isozyme hexokinase IV in mammals which is characterized by its low affinity and cooperativity towards glucose and is responsible for about 95% of the hexokinase activity in hepatocytes.

(ii) The fourth reaction may be catalyzed by two unrelated enzymes: class I and class II fructose bisphosphate aldolases (FBA). Class I FBA is only found in eukaryotes (and is studied here), whereas the manganese-dependent class II enzyme can be found in prokaryotes as well. The organisms only containing a metallic cofactor-dependent class II aldolase, i.e. yeast, were thus not included in this study.

(iii) The conversion of 3-phosphoglycerate to 2-phosphoglycerate (step 8) may be performed by either the 2,3-bisphosphoglycerate cofactor-dependent phosphoglycerate mutase (dPMG) in eukaryotes (studied here) or by the cofactor-independent (iPGMs) in prokaryotes and eukaryotes. The two enzymes display different structural and kinetic properties and sometimes coexist in the same organism, for a review see ref. 71.

A list of the chosen protein sequences and protein structural templates can be found in the ESI , Section 1.

S. cerevisiae (baker’s yeast) was not included since it only has a class II fructose-1,6-bisphosphate aldolase which requires a divalent manganese cofactor. The class I aldolases of animals and higher plants, which were considered in this study, use covalent catalysis through a Schiff-base intermediate. Although class I and II aldolases catalyze the same reaction, they do not share common active site residues, are structurally dissimilar and seem to have evolved independently, see for example ref. 72. The class I aldolases from Escherichia coli form a distinct structural subclass with enzymes from other bacteria73 and are believed to be involved only in gluconeogenesis whereas a class II aldolase from E. coli is active in glycolysis.74

For each of the ten glycolytic steps, one protein structural template was chosen from the Protein Databank (PDB)75 (see ESI ). For each enzyme, an initial alignment of the protein sequences of the eleven model organisms to the chosen PDB template was taken from the HSSP database.76 The HSSP alignment was converted into a FASTA formatted multiple sequence alignment. The protein sequences were checked for completeness and consistency, and corrected and updated when necessary. The multiple sequence alignment was visualized and manually corrected with Jalview.77 Additional multiple sequence alignments were performed using ClustalW78 and TCOFFEE.79

Protein modelling procedures

Protein homology modelling was performed using Modeller8v.1.80 The quality of the models was evaluated with WHATCHECK81 to discard bad models. Polar hydrogen atoms were added to the best ranked homology models with WHATIF.81 Details of the modelling procedure used can be found in ref. 16. In nearly all cases, only protein atoms were considered. For enolases, the two catalytically critical Mg2+ ions were included.

Computation of protein electrostatic potentials

The University of Houston Brownian Dynamics (UHBD) program82,83 was used for the calculation of the electrostatic potentials. For the solvent, a dielectric constant of 78.0 was chosen. The protein dielectric was set to 4.0. The dielectric boundary of the protein was defined by the molecular surface of a probe of 1.4 Å radius. The ionic strength was assigned as 50 mM at room temperature. The grid dimensions were set to 150 × 150 × 150 Å3 and a grid spacing of 1.0 Å was used. The Mg cations in enolases were treated as charges of 2+ each with radii of 1.7 Å.

Comparison of protein electrostatic potentials

Comparisons were done as described in ref. 16 and carried out using the PIPSA software,13,14 available from http://projects.villa-bosch.de/mcm/software/pipsa. The comparison of the electrostatic potentials was done in an intersection of regions, termed “skins”. We defined a skin of thickness 3 Å extending from the probe accessible surface around the protein computed with a probe of radius 2 Å.16 Comparisons were performed for complete skins and for regions defined by spheres centred on the enzyme active sites defined by the geometric centre of the co-crystallized ligand, when present in the PDB template structure, or, alternatively, the SwissProt annotation of ACT_SITE residues.

The similarities of the molecular interaction fields of two superimposed proteins, a and b, were calculated using the Hodgkin similarity index.84,85

ugraphic, filename = b912398a-t1.gif
The sum is over all grid points in the overlapping skins of the two proteins, and Φa and Φb are the values of the potentials at each point.

The average difference in electrostatic potentials between two proteins a and b is defined as the sum of differences of electrostatic potentials Φa and Φb at all points in the region of comparison, divided by the number of points in the region of comparison, R:

ugraphic, filename = b912398a-t2.gif

Conclusion

The three-dimensional arrangement of amino acid residues is critical for the catalysis of enzymatic reactions. The comparison of 3D molecular interaction fields for a large number of proteins reveals features relevant to protein function. An efficient and largely automated procedure for protein sequence retrieval, multiple sequence alignment, comparative modelling, calculation and comparison of molecular interaction fields has been employed here to allow large-scale comparison of proteins from the same family as constituents of a biochemical pathway. This PIPSA procedure has been implemented in the webPIPSA webserver86 and in the SYCAMORE systems biology workbench environment.87

Here, we have applied the PIPSA approach to discriminate between tissue-specific isoforms of enzymes with significantly different surface interaction properties, and used it for functional annotation. Misannotations were identified by clustering the enzymes according to their similarities in electrostatic potential. The conservation of the electrostatic potential near the active site provides a fingerprint for the conservation of enzymatic function and, for example, cofactor dependence. Comparison of electrostatic potentials was also used to estimate enzymatic kinetic parameters.

According to the observed conservation of electrostatic potentials along the glycolytic pathway for the enzymes of the eleven model organisms, the regulation of control of glycolysis in eukaryotes is unlikely to be due to only one or a few enzymes. The differences in electrostatic potentials are small for the organisms considered here and make glycolysis a very conserved biochemical energetic pathway. Nevertheless, glycolytic enzymes do not only perform their metabolic function. The electrostatic properties will also influence the multifaceted roles of glycolytic enzymes in transcriptional regulation, stimulation of cell motility and the regulation of apoptosis by protein binding. These additional functions have recently stimulated new interest in the investigation of these enzymes with areas of application ranging from anti-cancer therapies to the development of anti-trypanosomal drugs.88,89

The multi-species investigation of glycolytic enzymes described here can be extended to larger sets of organisms and to other metabolic or signalling pathways. As more protein structural and kinetic data become available, this approach will become increasingly valuable for relating the level of atomic-detail protein structures and the level of biochemical pathways.

Acknowledgements

We thank Dr Stefan Richter for implementing the matrix visualization of distance matrices. Financial support from the German Federal Ministry for Research (BMBF) ‘HepatoSys’ project (grant nos. 0313076 and 0313078C), the Klaus Tschira Foundation and the Heidelberg Center for Modelling and Simulation in the Biosciences (BIOMS) is gratefully acknowledged.

References

  1. H. Kitano, Science, 2002, 295, 1662–1664 CrossRef CAS.
  2. C. M. Henry, Chem. Eng. News, 2003, 81, 45–55.
  3. P. Aloy and R. B. Russell, FEBS Lett., 2005, 579, 1854–1858 CrossRef CAS.
  4. P. Aloy and R. B. Russell, Nat. Rev. Mol. Cell Biol., 2006, 7, 188–197 CrossRef CAS.
  5. C. Kettner and M. G. Hicks, Curr. Enzyme Inhib., 2005, 1, 171–181 Search PubMed.
  6. A. Warshel, Proc. Natl. Acad. Sci. U. S. A., 1978, 75, 5250–5254 CAS.
  7. T. Selzer and G. Schreiber, J. Mol. Biol., 1999, 287, 409–419 CrossRef CAS.
  8. H.-X. Zhou, K.-Y. Wong and M. Vijayakumar, Proc. Natl. Acad. Sci. U. S. A., 1997, 94, 12372–12377 CrossRef CAS.
  9. G. Schreiber, Curr. Opin. Struct. Biol., 2002, 12, 41–47 CrossRef CAS.
  10. Y. Shechter, L. Preciado-Patt, G. Schreiber and M. Fridkin, Proc. Natl. Acad. Sci. U. S. A., 2001, 98, 1212–1217 CrossRef CAS.
  11. D. R. Livesay, P. Jambeck, A. Rojnuckarin and S. Subramaniam, Biochemistry, 2003, 42, 3464–3473 CrossRef CAS.
  12. D. R. Livesay and D. La, Protein Sci., 2005, 14, 1158–1170 CrossRef CAS.
  13. N. Blomberg, R. R. Gabdoulline, M. Nilges and R. C. Wade, Proteins: Struct., Funct., Genet., 1999, 37, 379–387 CrossRef CAS.
  14. R. C. Wade, R. R. Gabdoulline and F. De Rienzo, Int. J. Quantum Chem., 2001, 83, 122–127 CrossRef CAS.
  15. M. Stein, R. R. Gabdoulline and R. C. Wade, in Experimental Standard Conditions of Enzyme Characterizations, ed. M. G. Hicks and C. Kettner, Logos Verlag, Berlin, 2007, pp. 237–253 Search PubMed.
  16. R. R. Gabdoulline, M. Stein and R. C. Wade, BMC Bioinf., 2007, 8, 373 CrossRef.
  17. E. V. Koonin and M. Y. Galperin, Sequence-evolution-function, Kluwer Academic Publisher, Dordrecht, 2002 Search PubMed.
  18. C. H. Verhees, S. W. Kengen, J. E. Tuininga, G. J. Schut, M. W. Adams, W. M. DeVos and J. van der Oost, Biochem. J., 2003, 375, 231–246 CrossRef CAS.
  19. B. Siebers and P. Schönheit, Curr. Opin. Microbiol., 2005, 8, 695–705 CrossRef CAS.
  20. T. Dandekar, S. Schuster, B. Snel, M. Huynen and P. Bork, Biochem. J., 1999, 343, 115–124 CrossRef CAS.
  21. B. Canback, S. G. E. Andersson and C. G. Kurland, Proc. Natl. Acad. Sci. U. S. A., 2002, 99, 6097–6102 CrossRef CAS.
  22. D. Steinke, S. Hoegg, H. Brinkmann and A. Meyer, BMC Biol., 2006, 4, 16 CrossRef.
  23. M. Heymans and A. K. Singh, Bioinformatics, 2003, 19(suppl. 1), i138–i146 CrossRef.
  24. F. De Rienzo, R. R. Gabdoulline, M. C. Menziani and R. C. Wade, Protein Sci., 2000, 9, 1439–1454 CrossRef CAS.
  25. P. J. Winn, T. L. Religa, J. D. Battey, A. Banerjee and R. C. Wade, Structure (London), 2004, 12, 1563–1574 Search PubMed.
  26. Z. Yang, Curr. Opin. Genet. Dev., 2002, 12, 688–694 CrossRef CAS.
  27. T. J. S. Merritt and J. M. Quattro, Genetics, 2001, 159, 689–697 CAS.
  28. C. J. Jeffery, Trends Biochem. Sci., 1999, 24, 8–11 CrossRef CAS.
  29. M. Amraei and I. R. Nabi, FEBS Lett., 2002, 525, 151–155 CrossRef CAS.
  30. E. Bapteste, D. Moreira and H. Philippe, Gene, 2003, 318, 185–191 CrossRef CAS.
  31. C. Winkler, B. Delvos, W. Martin and K. Henze, FEBS J., 2007, 274, 429–438 CrossRef CAS.
  32. A. Mustroph, U. Sonnewald and S. Biemelt, FEBS Lett., 2007, 581, 2401–2410 CrossRef CAS.
  33. A. S. Chi and R. G. Kemp, J. Biol. Chem., 2000, 275, 35677–35679 CrossRef CAS.
  34. S. A. Moore, R. S. Ronimus, R. S. Roberson and H. W. Morgan, Structure (Cambridge, MA, U. S.), 2002, 10, 659–671 Search PubMed.
  35. M. Müller, J. A. Lee, P. Gordon, T. Gaasterland and C. W. Sensen, J. Bacteriol., 2001, 183, 6714–6716 CrossRef CAS.
  36. S. Vora, C. Seaman, S. Durham and S. Piomelli, Proc. Natl. Acad. Sci. U. S. A., 1980, 77, 62–66 CrossRef CAS.
  37. A. M. C. R. Alves, G. J. W. Euverink, H. Santos and L. Dijkhuizen, J. Bacteriol., 2001, 183, 7231–7240 CrossRef CAS.
  38. M. J. Jedrzejas, Prog. Biophys. Mol. Biol., 2000, 73, 263–287 CrossRef CAS.
  39. J. Carreras, J. Mezquita, J. Bosch, R. Bartrons and G. Pons, Comp. Biochem. Physiol., Part B: Biochem. Mol. Biol., 1982, 71, 591–597 CrossRef CAS.
  40. R. C. Wade, R. R. Gabdoulline and B. A. Luty, Proteins: Struct., Funct., Bioinform., 1998, 31, 406–416 Search PubMed.
  41. V. Mainfroid, P. Terpstra, M. Beauregard, J.-M. Frere, S. C. Mande, W. G. J. Hol, J. A. Martial and K. Goraj, J. Mol. Biol., 1996, 257, 441–456 CrossRef CAS.
  42. D. Straus, R. Raines, E. Kawashima, J. R. Knowles and W. Gilber, Proc. Natl. Acad. Sci. U. S. A., 1985, 82, 2272–2276 CrossRef CAS.
  43. J. J. Aragon, J. E. Feliu, R. A. Frenkel and A. Sols, Proc. Natl. Acad. Sci. U. S. A., 1980, 77, 6324–6328 CrossRef CAS.
  44. D. G. Joseph, A. Petsko and M. Karplus, Science, 1990, 249, 1425–1428 CAS.
  45. E. Hasson, I.-N. Wang, L.-W. Zeng, M. Kreitman and W. F. Eanes, Mol. Biol. Evol., 1998, 15, 756–769 CAS.
  46. W. F. Eanes, Annu. Rev. Ecol. Syst., 1999, 30, 301–326 Search PubMed.
  47. I. Kursula, M. Salin, J. Sun, B. V. Norledge, A. M. Haapalainen, N. S. Sampson and R. K. Wieringa, Protein Eng., Des. Sel., 2004, 17, 375–382 CrossRef CAS.
  48. J. C. Clemente, K. Satou and G. Valiente, Genome Inf., 2005, 16, 45–55 Search PubMed.
  49. E. M. Zdobnov, C. von Mehring, I. Letunic and P. Bork, FEBS Lett., 2005, 579, 3355–3361 CrossRef CAS.
  50. K. H. Röhm and F. Schneider, FEBS Lett., 1973, 33, 89–92 CrossRef CAS.
  51. E. A. Noltmann, in The Enzymes, ed. P. D. Boyer, Academic Press, New York, 3rd edn, 1972, vol. 6, pp. 271–354 Search PubMed.
  52. C. Schnarrenberger and A. Oeser, Eur. J. Biochem., 1974, 45, 77–82 CrossRef CAS.
  53. D. Charles and C.-Y. Lee, Mol. Cell. Biochem., 1980, 29, 11–21 CAS.
  54. G. H. Reed, R. P. Poyner, T. M. Larsen, J. E. Wedekind and I. Rayment, Curr. Opin. Struct. Biol., 1996, 6, 736–743 CrossRef CAS.
  55. B. Teusink, J. Passarge, C. A. Reijenga, E. Esgalhado, C. C. van der Weijden, M. Schepper, M. C. Walsh, B. M. Bakker, K. van Damm, H. V. Westerhoff and J. L. Snoep, Eur. J. Biochem., 2000, 267, 5313–5329 CrossRef CAS.
  56. J.-M. Schwartz and M. Kanehisa, BMC Bioinf., 2006, 7, 186 CrossRef.
  57. N. V. Torres, F. Mateo, E. Melendez-Hevia and H. Kacser, Biochem. J., 1986, 234, 169–174 CAS.
  58. D. L. Nelson and M. M. Cox, Lehninger principles of biochemistry, W. H. Freeman, New York, 2004 Search PubMed.
  59. D. Granner and S. Pilkis, J. Biol. Chem., 1990, 265, 10173–10176 CAS.
  60. S. J. Pilkis and D. K. Granner, Annu. Rev. Physiol., 1992, 54, 885–909 CrossRef CAS.
  61. D. A. Fell and S. Thomas, Biochem. J., 1995, 311, 35–39 CAS.
  62. D. Fell, Understanding the control of metabolism, Portland Press Ltd., London, 1997 Search PubMed.
  63. B. M. Bakker, P. A. M. Michels, F. R. Opperdoes and H. V. Westerhoff, J. Biol. Chem., 1999, 274, 14551–14559 CrossRef CAS.
  64. B. J. Koebmann, H. W. Andersen, C. Solem and P. R. Jensen, Antonie van Leeuwenhoek, 2002, 82, 237–248 CrossRef CAS.
  65. N. A. Liapounova, V. Hampl, P. M. K. Gordon, C. W. Sensen, L. Gedamu and J. B. Dacks, Eukaryotic Cell, 2006, 5, 2138–2146 CrossRef CAS.
  66. W. C. Plaxton, Annu. Rev. Plant Physiol. Plant Mol. Biol., 1996, 47, 185–214 CrossRef CAS.
  67. C. Dodge, R. Schneider and C. Sander, Nucleic Acids Res., 1998, 26, 313–315 CrossRef CAS.
  68. A. Bairoch, B. Boeckmann, S. Ferro and E. Gasteiger, Briefings Bioinf., 2004, 5, 39–55 Search PubMed.
  69. A. Bairoch, R. Apweiler, C. H. Wu, W. C. Barker, B. Boeckmann, S. Ferro, E. Gasteiger, H. Huang, R. Lopez, M. Magrane, M. J. Martin, D. A. Natale, C. O’Donovan, N. Redaschi and L. S. Yeh, Nucleic Acids Res., 2005, 33, D154–159 CAS.
  70. M. L. Cardenas, in Glucokinase and Glycemic Diseases, ed. F. M. Matschinsky and M. A. Magnuson, Karger, Basel, 2003, pp. 31–41 Search PubMed.
  71. L. A. Fothergill-Gilmore and H. C. Watson, Adv. Enzymol. Relat. Areas Mol. Biol., 1989, 62, 227–313 Search PubMed.
  72. N. S. Blom, S. Tetreault, R. Coulombe and J. Sygusch, Nat. Struct. Biol., 1996, 3, 856–862 CrossRef CAS.
  73. S. A. Baldwin and R. N. Perham, Biochem. J., 1978, 169, 643–652 CAS.
  74. M. D. Scamuffa and R. M. Caprioli, Biochim. Biophys. Acta, 1980, 614, 583–590 CAS.
  75. H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov and P. E. Bourne, Nucleic Acids Res., 2000, 28, 235–242 CrossRef CAS.
  76. C. Sander and R. Schneider, Proteins: Struct., Funct., Genet., 1991, 9, 56–68 CrossRef CAS.
  77. M. Clamp, J. Cuff, S. M. Searle and G. J. Barton, Bioinformatics, 2004, 20, 426–427 CrossRef CAS.
  78. J. D. Thompson, D. G. Higgins and T. J. Gibson, Nucleic Acids Res., 1994, 22, 4673–4680 CrossRef CAS.
  79. C. Notredame, D. Higgins and J. Heringa, J. Mol. Biol., 2000, 302, 205–217 CrossRef CAS.
  80. A. Sali and T. L. Blundell, J. Mol. Biol., 1993, 234, 779–815 CrossRef CAS.
  81. G. Vriend, J. Mol. Graphics, 1990, 8, 52–56 CrossRef CAS.
  82. M. E. Davis, J. D. Madura, B. A. Luty and J. A. McCammon, Comput. Phys. Commun., 1991, 62, 187–197 CrossRef CAS.
  83. J. D. Madura, J. M. Briggs, R. C. Wade, M. E. Davis, B. A. Luty, A. Ilin, J. Antosiewicz, M. K. Gilson, B. Bagheri, L. R. Scott and J. A. McCammon, Comput. Phys. Commun., 1995, 91, 57–95 CrossRef CAS.
  84. E. E. Hodgkin and W. G. Richards, Int. J. Quantum Chem., 1987, 32(s14), 105–110 CrossRef.
  85. A. C. Good, E. E. Hodgkin and W. G. Richards, J. Comput.-Aided Mol. Des., 1992, 6, 513–520 CAS.
  86. S. Richter, A. Wenzel, M. Stein, R. R. Gabdoulline and R. C. Wade, Nucleic Acids Res., 2008, 36, W276–W280 CrossRef CAS.
  87. A. Weidemann, S. Richter, M. Stein, S. Sahle, R. Gauges, R. Gabdoulline, I. Surovtsova, N. Semmelrock, B. Besson, I. Rojas, R. Wade and U. Kummer, Bioinformatics, 2008, 24, 1463–1464 CrossRef CAS.
  88. J.-W. Kim and C. V. Dang, Trends Biochem. Sci., 2005, 30, 142–150 CrossRef CAS.
  89. C. Verlinde, Drug Resist. Updates, 2001, 4, 50–65 CrossRef CAS.

Footnotes

Electronic supplementary information (ESI) available: Choice of protein sequences and structural templates; evaluation of the choice of PDB template on the similarity of electrostatic potentials; visualisation of distance matrices of electrostatic potentials; discussion of means to compare electrostatic potentials. See DOI: 10.1039/b912398a
Current address: BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany.

This journal is © The Royal Society of Chemistry 2009
Click here to see how this site uses Cookies. View our privacy policy here.