Andrew D. White, Ann K. Nowinski, Wenjun Huang, Andrew J. Keefe, Fang Sun and Shaoyi Jiang*
University of Washington, Box 351750, Seattle, USA. E-mail: sjiang@u.washington.edu; Fax: +1 206 543 3778; Tel: +1 206 616 6509
First published on 10th September 2012
The interactions which govern chemical processes may be broadly categorized into specific interactions (high activity for a certain target molecule) and nonspecific interactions (low activity for all targets). Despite their ubiquity in biology and chemistry, nonspecific interactions are generally overlooked and a fundamental understanding of them is lacking. Molecular chaperones are large protein complexes which have evolved to resist nonspecific interactions. Their interior surface resists binding to thousands of types of misfolded proteins. Proteins found in the cytoplasm, a crowded environment with many spurious binding targets, are another example. These proteins have evolved high selectivity and stability despite nonspecific interactions. Using structural bioinformatics, we have studied the interiors of molecular chaperones from five species and examined the surface chemistry of 1162 proteins, categorized by whether they are present in the cytoplasm or extracellular space. A better understanding of how nature resists nonspecific interactions is key for the chemistry of materials, surfaces, and particles which must remain stable in complex environments. The abundance of amino acids, their interactions, their hydration, and sequence patterns were compared in these two systems, molecular chaperones and protein surfaces. Striking similarities were found and trends were identified as the system environments became harsher. Peptide-based mimics were synthesized to test the conclusions. This, in turn, has led to the design of new stealth compounds and a deeper understanding of nonspecific interactions.
In this work, we turn to nature and use proteins and molecular chaperones as a guide towards understanding nonspecific interactions. Proteins resist nonspecific adsorption in order to be stable in complex environments such as the cytoplasm of a cell, which contains thousands of protein types.8 The cytoplasm is a crowded environment and provides many spurious binding targets for proteins; yet proteins have evolved high selectivity and stability through resisting nonspecific interactions. The non-interacting property of proteins is often put into practice in biomaterials and biosensors research, where the protein albumin is used as a blocking agent to prevent nonspecific adsorption of non-target proteins onto surfaces.9 Another example of nature resisting nonspecific interactions is found in molecular chaperones, which guide proteins from a misfolded or unfolded conformation back into a native conformation.10 The defective (substrate) proteins fold while enclosed inside a cavity of the molecular chaperone. The chemistry of this cavity is unique among biological surfaces in that it contacts not only thousands of proteins, but many conformations of each protein.11 Yet chaperone proteins do not irreversibly bind to these proteins. The cavity is sometimes described as a “non-stick” surface.12 Thus molecular chaperones provide a second system which has this non-interacting property.
By examining many proteins from both these systems, it is possible to separate nonspecific effects from the many specific functions of proteins. We use two types of bioinformatics methods for this. The first studies the sequence and abundance of amino acids in the proteins, similar to the molecular formula of a molecule. The second examines the structure and interactions among the amino acids in a protein and solvent, similar to the 3D structure of a molecule. Through these two methods and two systems, it is possible to understand the way these proteins avoid nonspecific interactions. We analyzed a database of protein surfaces and molecular chaperone cavity surfaces using these two techniques. The questions to be answered are: which amino acids are most common, how often do they interact, do they interact with water more than with other amino acids, and do they prefer to interact with protein surfaces or interiors? Next, the modeling conclusions were used to design peptide-based materials which should resist nonspecific interactions. Finally, these peptides were synthesized to test their resistance to nonspecific interactions with proteins. These peptides do create surfaces which resist nonspecific interactions and compare well to other low-fouling peptides which have been reported in the literature.13–16
Statistical analysis was performed using the R statistical package.23 SQLShare was used for managing data.24 The PDBs and data used for each dataset are available from: http://sqlshare.escience.washington.edu. The human, cytoplasm, and extracellular datasets are available as SQL data tables under the ‘h2’, ‘cph2’, and ‘eh2’ tags, respectively. X_1, X_2, and X_3 (where X is the dataset) contain the protein information, residue information, and atomic information, respectively. X_c contains the surface contacts (only available for ‘h2’).
The identification of interior residues in molecular chaperones consists of three steps: (1) identify surface residues, (2) tabulate heavy atoms from the surface residues which are occluded, (3) identify which residues have more than h heavy atoms occluded. Once a residue is identified as a surface residue, it may be either an interior or an exterior surface residue. To be an interior surface residue, h heavy atoms (non-hydrogen atoms), or more, in the residue must not be occluded by atoms from other residues. See the ESI† for the calculation of occlusion and Table S1† for h and other parameters used in the interior identification.
The peptides were synthesized using the AAPPTec Titan 357 automated synthesizer by a solid-phase technique, starting from a polystyrene Rink amide AM resin (0.58 mmol g−1 loading capacity). Coupling was performed using amino acid monomer, HBTU, HOBt, and DIPEA prepared in DMF in a molar ratio of 1.1:1:1:2 in four times excess of the loading capacity of the resin. Deprotection of Fmoc groups was achieved using 20% piperidine in DMF. N-terminal acetylation was achieved with a solution of pyridine (5%), acetic anhydride (5%) and DMF (90%) (v/v/v). Random peptide sequences were created using the mix and split capability of the AAPPTec Titan 357. The cleavage of the final product was performed with a TFA (75%), DCM (15%), DMB (4%), water (2%), TIS (2%), and EDT (2%) (v/v/v/v/v/v) cleavage cocktail. For known sequences, the peptide purity was evaluated by preparative reverse phase high pressure liquid chromatography (RP-HPLC) and the peptides were purified as needed. The purity of the glycine peptide sequences was 92% and that of the asparagine peptide was 97%. Peptides were analyzed by matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF-MS) (see Fig. S5†).
A laboratory SPR sensor developed at the Institute of Photonics and Electronics, Prague, Czech Republic was used26 as described previously27 to evaluate protein adsorption. Gold chips covered with self-assembling peptides were rinsed with Millipore water, dried by filtered air, and mounted to the device. The temperature controller was set to 25 ± 0.01 °C. Protein adsorption was measured by flowing PBS buffer at 40 μL min−1 for 10 min, 1 mg mL−1 protein solutions of fibrinogen and lysozyme for 10 min, and PBS buffer again for 10 min. The wavelength shift between the baselines before protein injection and after buffer rinse was used to quantify the total amount of protein adsorbed. A reference channel containing solely PBS buffer was run for each chip and its baseline drift was subtracted from the final wavelength change. A 1 nm wavelength shift from 750 nm corresponds to 17 ng cm−2 of adsorbed protein.28 The detection limit for the SPR sensor is 0.2 ng cm−2.27 For statistics reported in this edge article, each chip corresponds to one data point for calculating standard deviations.
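The conversion from wavelength shift to adsorbed mass described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis script: the function names are hypothetical, and only the calibration factor (17 ng cm−2 per nm) and the detection limit (0.2 ng cm−2) are taken from the text.

```python
from statistics import mean, stdev

NG_PER_CM2_PER_NM = 17.0        # from the text: 1 nm shift near 750 nm ~ 17 ng cm^-2
DETECTION_LIMIT_NG_CM2 = 0.2    # from the text

def adsorbed_mass(baseline_before_nm, baseline_after_nm, reference_drift_nm=0.0):
    """Adsorbed protein (ng cm^-2) from the SPR baselines of one channel.

    The shift is the baseline after the buffer rinse minus the baseline
    before protein injection, corrected by the drift of the PBS-only
    reference channel on the same chip.
    """
    shift_nm = (baseline_after_nm - baseline_before_nm) - reference_drift_nm
    return shift_nm * NG_PER_CM2_PER_NM

def summarize_chips(masses_ng_cm2):
    """Mean and sample standard deviation, treating each chip as one data point."""
    if len(masses_ng_cm2) < 2:
        raise ValueError("need at least two chips to estimate a standard deviation")
    return mean(masses_ng_cm2), stdev(masses_ng_cm2)

def above_detection_limit(mass_ng_cm2):
    """True if the value exceeds the stated sensor detection limit."""
    return mass_ng_cm2 > DETECTION_LIMIT_NG_CM2
```

For instance, a corrected shift of 0.25 nm would correspond to roughly 4.25 ng cm−2 under this calibration.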
Fig. 1 The fraction of amino acids on the surfaces of proteins found in three different locations: human proteins (N = 1162), human cytoplasmic proteins (N = 221), and human extracellular proteins (N = 34). The y-axis is the median of the fractions of each amino acid over the entire dataset. The figure shows the large fraction of charged residues on protein surfaces, in particular E and K. The error bars are standard errors.
The protein dataset is further divided into cytoplasmic and extracellular proteins. The extracellular environment, the space between cells, is generally not as crowded8 as the cytoplasm, and we expect nonspecific binding to interfere less there than in the cytoplasm. As seen from our results, the largest difference between cytoplasmic and extracellular proteins is the fraction of charged amino acids: 43.8% ± 0.3% for the cytoplasm and 37.5% ± 0.7% for the extracellular. The extracellular dataset is lower in charged amino acids, but higher in polar hydrophilic amino acids (27.3% ± 0.2% vs. 23.7% ± 0.6%), among which serine (S) and threonine (T) in particular are more abundant than in the cytoplasm. Polar hydrophilic amino acids are S, T, asparagine (N), and glutamine (Q). These results generally follow the trends seen by Andrade et al.,30 who examined a similar though smaller dataset. In the cytoplasm, a crowded environment prone to protein aggregation,8 the proteins have more E and K. Thus, K and E play an important role in these nonspecific interactions. Evidence that E and K are located in nonspecific regions of protein surfaces (unrelated to function) can be found in work from Jimenez as well, who analyzed protein surfaces broken into regions related to protein function and regions unrelated to function.31 He showed an increase of 36% of charged amino acids in regions unrelated to function relative to regions related to function. See Fig. S1–S3† for a more detailed analysis of the error and sensitivity of these results.
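As a rough illustration of how such class fractions could be tabulated, the Python sketch below computes the charged and polar hydrophilic fractions for one protein's surface residues and takes the median over a dataset. The polar hydrophilic set (S, T, N, Q) is as defined in the text; the charged set (D, E, K, R) is an assumption, since the text does not list it explicitly, and the function names are hypothetical.

```python
from collections import Counter
from statistics import median

CHARGED = frozenset("DEKR")            # assumption: not enumerated in the text
POLAR_HYDROPHILIC = frozenset("STNQ")  # as defined in the text

def class_fractions(surface_residues):
    """Fractions of charged and polar hydrophilic residues on one surface.

    `surface_residues` is a string of one-letter codes for the residues
    already identified as surface residues.
    """
    counts = Counter(surface_residues)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no surface residues given")
    return {
        "charged": sum(counts[a] for a in CHARGED) / total,
        "polar_hydrophilic": sum(counts[a] for a in POLAR_HYDROPHILIC) / total,
    }

def median_fraction(surfaces, key):
    """Median of one class fraction over a collection of protein surfaces."""
    return median(class_fractions(s)[key] for s in surfaces)
```

Summarizing per-protein fractions with a median mirrors the description in the Fig. 1 caption; whether the percentages quoted above were obtained the same way is not stated here.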
A similar abundance analysis of the interiors of molecular chaperones is shown in Fig. 2. The analysis here, though, is for the interior of a single chaperone protein complex. All the chaperones are in the closed (cis) conformation, during which encapsulated substrate proteins are folding. Again E and K are the most common amino acids in the interior cavities of this collection of chaperone proteins. The fraction of E and K is much larger than for the previous dataset, exceeding 20% on these large proteins. The fraction of charged amino acids, in general, is much higher as well for the molecular chaperones. A dataset containing 528 Escherichia coli (E. coli) protein surfaces was constructed to ensure the results seen on GroEL–GroES are not simply general to E. coli proteins. This dataset also makes it possible to test whether the fraction of charged amino acids on the chaperone interiors is significant. The fraction of charged amino acids on the surface of E. coli GroEL–GroES is 56%, which is a higher fraction than 98% of all of the E. coli proteins considered (shown in blue in Fig. 2). This large fraction of charged amino acids has been noted before.10,11 However, here we can see exactly which amino acids are more common than expected (E) and how significant it is. Fig. 2 also shows how the interior cavity surface changes between the mesophilic GroEL–GroES and thermophilic GroEL–GroES (optimal growth temperature of 65 °C). Protein folding is typically more difficult at higher temperatures due to the increasing importance of entropy as temperature increases, and thus the thermophilic GroEL–GroES represents a more challenged chaperone. The fraction of charged amino acids is increased to 70% for the thermophilic GroEL–GroES, with most of the increase coming from E and K. Molecular chaperones, where nature requires strong stealth against many protein types, appear to use charged amino acids to accomplish this. E and K are the most utilized charged amino acids in both systems considered.
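The comparison of GroEL–GroES against the E. coli reference set amounts to an empirical percentile. A minimal Python sketch is given below; the function name is hypothetical and the reference values would be the charged fractions of the 528 E. coli protein surfaces, which are not reproduced here.

```python
def fraction_below(reference_values, value):
    """Fraction of reference values strictly below `value` (empirical percentile).

    With the charged fractions of the reference proteins as input and the
    chaperone's charged fraction as `value`, a result near 0.98 corresponds
    to "higher than 98% of the proteins considered".
    """
    reference_values = list(reference_values)
    if not reference_values:
        raise ValueError("reference set is empty")
    below = sum(1 for v in reference_values if v < value)
    return below / len(reference_values)
```

Using a strict inequality is a design choice of this sketch; ties would need a stated convention in a real analysis.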
Fig. 2 (A) shows three views of the location of interior residues (red) for E. coli GroEL. (B) The fraction of each amino acid on the interiors of 5 molecular chaperones. The median fraction of each amino acid type on the surface of a collection of 528 E. coli proteins is shown in blue for comparison to GroEL–GroES, which is also an E. coli protein. GroEL–GroES is significantly different. All the chaperone structures used for these calculations were the cis or “closed” forms. The error bars come from a 95% confidence interval from quantiling. The fractions of E and K are the most different relative to E. coli proteins. The thermophilic GroEL–GroES mutant has a very high fraction of charged residues, 70%.
The large fraction of K and E suggests that they may be part of a general sequence pattern on protein surfaces; this was found not to be the case. The data considered here are the amino acid pair frequencies in sequence space. The most frequently occurring pairs for the human protein dataset are shown in Fig. 3a. The observed pair frequency is plotted in red, and the blue shows the distribution predicted if the pairs followed random chance, with a multinomial distribution as the background model. The multinomial model gives the number of pairs that would occur by chance given only the amino acid surface fractions. Interestingly, within the uncertainty, few pairs occur next to each other more or less often than the multinomial model predicts. The results show that, on the surface at least, there are no global sequence patterns. There is one exception to this trend in Fig. 3a: the glycine/serine pair. As shown in Fig. S4,† that pair is most commonly found as a type-II turn, resulting in its increased frequency over the multinomial model. This demonstrates that the methodology can discover surface motifs; in this case solvated type-II turns. The large fractions of K and E, however, are not correlated with a frequently occurring sequence motif and are instead nearly randomly distributed.
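The background model can be illustrated with a short Python sketch. It assumes that pairs are unordered (G–S and S–G counted together) and that each input string is a stretch of sequence-adjacent surface residues; neither detail is specified above, and the function names are hypothetical. Under a multinomial model with surface fractions f, an unordered pair {a, b} has probability 2·f_a·f_b when a ≠ b and f_a² when a = b.

```python
from collections import Counter
from math import sqrt

def observed_pairs(surface_stretches):
    """Count unordered pairs of sequence-adjacent residues."""
    counts = Counter()
    for seq in surface_stretches:
        for a, b in zip(seq, seq[1:]):
            counts[tuple(sorted((a, b)))] += 1
    return counts

def expected_pairs(fractions, n_pairs):
    """Expected count and standard deviation of each unordered pair.

    `fractions` maps one-letter codes to surface fractions (summing to 1);
    `n_pairs` is the total number of adjacent pairs observed. Each pair type
    is treated as one multinomial category, so its count has mean n*p and
    standard deviation sqrt(n*p*(1-p)).
    """
    result = {}
    letters = sorted(fractions)
    for i, a in enumerate(letters):
        for b in letters[i:]:
            p = fractions[a] ** 2 if a == b else 2 * fractions[a] * fractions[b]
            result[(a, b)] = (n_pairs * p, sqrt(n_pairs * p * (1 - p)))
    return result
```

A pair whose observed count lies several standard deviations from its expected count would be a candidate motif, in the way the glycine/serine pair stands out in Fig. 3a.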
Fig. 3 These plots show statistics based on protein sequences (a) and structures (b–d). (a) shows the observed number of pairs of amino acids on the surfaces of proteins. The blue bars are the expected numbers of pairs if the sequence were random. The order from left to right is from most frequently observed to least. Only the G–S pair is considered to be significantly more common than expected. (b) shows the proportion of amino acids interacting from the human dataset. An amino acid is considered as ‘interacting’ if it is in contact with any other amino acid. We see that amino acids with small side-chains (S, P, A), E, and K have the lowest proportions of interactions. (c) shows the preference of each amino acid for protein surfaces or protein interiors. See text for details. D, E, and N have the highest preference for protein surfaces relative to protein interiors. (d) shows the preference of amino acids for water relative to interacting with another amino acid. E and K have the highest water per contact. The green and purple colors are to guide the eye.
Fig. 3b shows how often an amino acid is interacting with any other amino acid, which characterizes their nonspecific interactions. These data are normalized by the number of the amino acids, so that each bar is comparable. These data are only for amino acids observed on the surface. S, K and E have the lowest proportion interacting among the charged and polar hydrophilic amino acids. Alanine and proline are lower due in part to the small size of their side-chains. K and E have large side-chains but still rarely interact. R has the highest proportion of interactions among the hydrophilic amino acids, perhaps explaining why it is so much less often observed on protein surfaces.
In addition to the number of interactions described above, it is important to discover which amino acids each residue interacts with. For nonspecific interactions, reversible interactions are preferred to irreversible ones. For example, aggregation is often an irreversible process which we expect proteins to disfavor. Therefore we plot the strength of interactions of each amino acid with surface and buried amino acids on an average protein. Generally, more favorable interactions with the buried residues of a protein destabilize the protein, possibly leading to unfolding and aggregation. The average protein is a hypothetical protein which has a surface and buried residue distribution given by Fig. 1. The strength of interactions was calculated using quasi-chemical theory,32 which is described in the ESI.† The results are plotted in Fig. 3c, where the red bars indicate the strength of interaction between an amino acid and the average protein's surface. The blue bars show the strength of interaction between an amino acid and the average interior of proteins. Amide, charged, and alcohol amino acids have the highest preference for protein surfaces. Combined with our previous results, we see that E and K have fewer interactions (Fig. 3b) and also favor interactions with protein surface amino acids, not interior amino acids (Fig. 3c). Nature seems to prefer these amino acids as well in locations where resisting nonspecific interactions is necessary for function (i.e., the cytoplasm and molecular chaperones).
It is well established that hydration is the key to resisting nonspecific interactions.33,34 Thus, we also analyzed the crystallographic proximity of waters to amino acids. The number of waters per contact is shown in Fig. 3d. The number of contacts was chosen as the normalization to eliminate the size effects of amino acids: those amino acids with more atoms tend to have more waters near them. Therefore, Fig. 3d should be thought of as the preference of an amino acid to interact with water relative to interacting with another amino acid. Again, we see E and K resisting other amino acids and preferring water. The amides and alcohols follow the trend as well. Overall, Fig. 3 shows E and K are randomly distributed and rarely interact. If E and K do interact, they prefer water to amino acids. If E and K do interact with amino acids, they prefer to interact with those found on the surface of proteins instead of the interior. Regardless of the abundances of E and K found, these structural results strongly indicate E and K resist nonspecific interactions.
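The normalization behind Fig. 3d is a simple ratio, sketched below in Python. The function name and the input layout (per-amino-acid totals of nearby crystallographic waters and of amino acid contacts) are assumptions made for illustration; the distance cutoffs used to define "near" and "contact" are not given here and are left to the caller.

```python
def waters_per_contact(water_counts, contact_counts):
    """Waters per amino acid contact for each amino acid type.

    `water_counts[a]` is the total number of crystallographic waters found
    near residues of type `a`; `contact_counts[a]` is the total number of
    contacts those residues make with other amino acids. Dividing by
    contacts, not by residue count, removes the size effect: larger
    residues have both more waters and more contacts.
    """
    ratios = {}
    for aa, n_water in water_counts.items():
        n_contact = contact_counts.get(aa, 0)
        if n_contact == 0:
            continue  # ratio undefined without any contacts
        ratios[aa] = n_water / n_contact
    return ratios
```

Sorting the resulting dictionary by value gives the ordering plotted in a figure of this kind.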
Experimental studies of protein adsorption on a peptide surface containing random K and E motifs were performed. The chosen sequence is Ac-[EK]7PPPPC-Am, where the square brackets indicate seven random E and K. Ac and Am indicate acetylation and amidation, respectively. The proline repeat and cysteine provide a stable anchor for self-assembling peptides on gold surfaces.16 The peptides were self-assembled onto a gold surface, and adsorbed fibrinogen and lysozyme were measured via SPR. The results are shown in Fig. 4. Untreated gold and poly-glycine were used as controls. The EK surface performance is comparable to the ultra-low protein fouling standard of <5 ng cm−2 of fibrinogen.35 Additionally, an Ac-[N]7PPPPC-Am peptide was synthesized because N had the best modeling results among the uncharged amino acids. It also had results below the ultra-low fouling threshold. See Fig. S5† for all SPR sensograms and Fig. S6† for peptide characterization.
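To make the notation Ac-[EK]7PPPPC-Am concrete, the Python sketch below enumerates every sequence it could denote, assuming each of the seven bracketed positions is independently E or K. That reading, and the function name, are assumptions for illustration; the relative abundance of each sequence in a mix-and-split synthesis is not addressed.

```python
from itertools import product

def ek_library(n_random=7, letters="EK", anchor="PPPPC"):
    """All sequences matching Ac-[EK]n-anchor-Am, as plain strings."""
    return [
        "Ac-" + "".join(combo) + anchor + "-Am"
        for combo in product(letters, repeat=n_random)
    ]
```

With two choices at each of seven positions, the library contains 2^7 = 128 distinct sequences, so the surface is a mixture of peptides and not a single defined sequence.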
Fig. 4 Protein adsorption results as determined by SPR. Ac-[EK]7PPPPC-Am (EK), Ac-GGGGGGGPPPPC-Am (G), and Ac-CPPPPNNNNNNN-Am (N) sequences were self-assembled onto gold and protein solution was flowed over. Bound protein after buffer wash is shown in the bars. Untreated gold and poly-glycine are shown for reference. EK and N show similar stealth performance.
Footnote
† Electronic supplementary information (ESI) available: Additional information on the sensitivity to the surface cutoff, details on the method of identifying interior surface residues, the glycine–serine sequence pair, the definition of interaction energy, a description of the calculation of error bars for each figure, and SPR sensograms. See DOI: 10.1039/c2sc21135a
This journal is © The Royal Society of Chemistry 2012