Yue
Hu‡
a,
Federica
Rigoldi‡
a,
Hui
Sun
a,
Alfonso
Gautieri
*b and
Benedetto
Marelli
*a
aCivil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, USA. E-mail: alfonso.gautieri@polimi.it; bmarelli@mit.edu
bBiomolecular Engineering Lab, Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano, Italy
First published on 31st July 2023
We used coarse-grain molecular dynamics simulations to screen all possible histidine-bearing tetrapeptide sequences, finding novel peptide sequences with pH-tunable assembly properties. These tetrapeptides could be used for various biological applications, such as triggered delivery of bioactive molecules.
As an alternative route, in silico tools can be used to predict oligopeptides’ properties, accelerating their design into biomaterials. Machine learning (ML) algorithms such as TorchMD, convolutional neural networks, and deep neural networks are also used to quickly model and predict peptides’ folding, energies, and reaction pathways, but the limited accuracy and completeness of high-quality training data still limit the resolution of predictive results of ML when compared to molecular dynamics (MD) simulations, which in turn are extremely intensive.12–17 To combine the benefits and limit the individual weaknesses of MD and ML, hybrid ML-MD approaches are now pursued, with the goal of accelerating simulations and improving the understanding of complex biomolecular systems (e.g. flexible molecular force fields, where ML tools are used to accelerate the simulation process).18–20 However, these tools are still at their infancy and their applications have to be fully explored.
Here, we developed a new computational design protocol to discover oligopeptides that self-assemble in nanostructured biomaterials by combining an unbiased AA selection (i.e. agnostic to chemical features) with coarse grain (CG) MARTINI forcefield (highly parameterized for natural amino acids), which yields a speed-up of 2–3 orders of magnitude compared to atomistic forcefield. We focused on tetrapeptides as n = 4 represents a wide but approachable sequences space (204 possible unique sequences) to screen and test computational unbiased methods, while possessing an amphiphilic form and the proven ability to self-assemble into nanofibrillar structures.21 This method allowed us to simulate the assembly of all the possible tetrapeptides without bias, resulting in the screening of appropriate side chain combinations to embed responsiveness to environmental stimuli, such as pH. As an example, the imidazole group of histidine has a pKa of 6.0, which allows controlling the AA protonated or deprotonated states across physiological pH. pH-induced control of imidazole's electrostatic interactions can be coupled with hydrophobic dissipative forces of aromatic amino acids, to provide pH control over tetrapeptide self-assembly behavior. This pH-triggered assembly of tetrapeptides can be used to engineer new biomaterials for drug delivery that assemble/disassemble in response to pH changes, nanofibrillar matrices for separation of large biomolecules like proteins, antimicrobial surfaces, and scaffolds to support cell growth. The protocol starts by generating the CG models of tetrapeptides that result from the permutation of 16 natural amino acids in the four positions, excluding C, M, W, and P. These four AA are excluded as they rarely appear in known naturally occurring oligopeptides that self-assemble. To provide pH-responsiveness for the transition between soluble to assembled peptides, we impose that each sequence must present at least one histidine residue to exploit the change in protonation state close to physiological pH. In total, ∼12000 tetrapeptides with different sequences are screened. Each peptide is modelled twice, either with protonated or deprotonated histidine residue(s), leading to a total of ∼24
000 different systems. For each peptide sequence, we model the aggregation propensity following a protocol described in previous works.22,23 Briefly, for each peptide sequence, we generate a periodic cubic box of 10 nm by side containing a random distribution of 60 peptides. The box is then solvated with MARTINI water beads, and counterions (Na+ and Cl−) are inserted to neutralize the system (Fig. 1). The final model consists of ∼7000 beads. For each peptide sequence, we generate three replicas of the box, leading to ∼72
000 different models. The molecular systems are simulated with GROMACS 2021.4 package following protocols used in previous studies.24–26 Analyses of the simulations are performed using GROMACS tools and in-house tcl-scripts in VMD.27,28
After the MD simulations (refer to ESI†), we ranked the peptides based on their aggregation potential, using a score similar to the one proposed by Frederix et al. with some modifications (Fig. 2).23 The scoring system aims to identify peptides that are prone to aggregate in large fibril-like clusters. The peptide score (total score – S) is obtained from:
S = w1 × C + w2 × AP + w3 × SF + w4 × H | (1) |
![]() | ||
Fig. 3 (A) Total score (S) used to select peptides candidates. FFHH and FHYH were chosen from the top of the ranking, negative controls were picked from the middle rank (0.45 < S < 0.55, red cross). (B) Chemical structure of three tetrapeptides considered. (C) Initial (t = 0 ns) and final (t = 50 ns) different conformations for fibrillar-like peptides (upper panel) and beads-like (Video S2, ESI†) tetrapeptides. All the simulations start from a random distribution of the 60 homologues tetrapeptides in the water box, after 50 ns the promising sequences (e.g. HHFF, Video S1, ESI†) assume a 1D fibrillar conformation, while the negative controls (e.g. DHHR, Video S2, ESI†) are homogeneously distributed in the water box or have spherical conformation. |
Each term of the scoring function is normalized from 0 to 1 over its distribution using min–max normalization formula; the overall score S ranges from 0 to 1, where a score near 1 indicates a sequence that is likely to form large fibril-like aggregate and presents a suitable solubility. The C term represents the size of the largest peptide cluster observed at the end of the simulation (in terms of the number of beads). The size of the cluster is obtained through the clustsize tool of Gromacs. The larger the clusters formed the higher the propensity of the peptides to aggregate. The AP term is a measure of the aggregation propensity:
![]() | (2) |
![]() | (3) |
![]() | (4) |
A high S-score with either the protonated or deprotonated form (indicating the propensity to form fibril-like aggregates, Fig. 3(C), Fig. S1 and Supporting Data File 1, ESI†) and a concurrent large value for the difference between the S-score in the protonated and deprotonated form (ΔS = |Sdeprotonated − Sprotonated|) were used as criteria to select the peptides to test experimentally. The latter condition favors peptides that upon the change of the protonation state can transition from assembled to disassembled states.
The computational screening provides the basis for a selection of the most promising peptides in term of self-assembling propensity (Fig. 3(B)). We first selected the peptide with the highest overall score at neutral pH (i.e., with deprotonated histidine residues). The best sequence is HHFF with an overall score of 0.846 (where C = 1.000, AP = 0.830, SF = 0.965, H = 0.503). This peptide has the largest cluster among all peptide sequences, a high aggregation propensity score, a shape factor close to 1 (indicating the propensity to form fibril-like aggregates) and is mildly hydrophobic. Generally, HHFF is expected to self-assemble in neutral-basic pH forming fibril-like aggregates.
A second peptide was chosen based on the highest difference between the score of the deprotonated form and the score of the protonated form. In this way, we aimed to identify a peptide that can self-assemble at basic pH, with deprotonated histidine residues and, at the same time, that is able to disaggregate at acidic pH, when histidine residues are protonated. The candidate peptide sequence for this class is FHYH, which showed a ΔS = 0.476, due to a Sdeprotonated = 0.826 (indicating a high tendency to self-assemble) and a Sprotonated = 0.350 (indicating a low tendency to self-assemble).
Finally, we chose sequences from the center of our ranking as negative controls, avoiding the bottom range of our ranking due to too severe and well-known hydrophilic behavior. All selected sequences, in the deprotonated form, present the highest score as defined in the work of Frederix et al.23 Their score only considers the aggregation propensity (AP) and the solubility of the sequence (H). Based on this scoring system, the selected candidate peptide sequences are DHHR, and VDKH. To validate our modeling, tetrapeptide DHHR, VDHK, FHYH, and HHFF with N-terminal acetylation and C-terminal amidation (purity >99%) were synthesized and tested in both protonated and deprotonated forms in phosphate buffer solutions. We found that DHHR and VDHK don’t show any gelation or precipitation at pH 5–9 (Fig. 4(A) and Fig. S3A, ESI†). DHHR generated micro-flakes (Fig. 4(D)) while VDHK generated a smooth film-like structure during the slow water evaporation process (Fig. S3B, ESI†). At an acidic pH of 5, FHYH and HHFF don’t assemble (Fig. 4(A)) and keep dispersing in solution with fiber-like structures when dried (Fig. 4(E) and (F), respectively). FHYH and HHFF, however, assemble at pH 7 and 9 (Fig. 4(B) and (C)), which corroborates our computational results. FHYH assembly generates transparent gel at a neutral or basic pH, which adheres to the bottom of the vial when turned upside down, whereas HHFF assembly results in the formation of nanofibrillar white precipitates in the vials that do not support the formation of a hydrogel. (Fig. 4(A)). ATR-FTIR was used to determine the secondary structure of assembled tetrapeptides by analyzing their spectra in the 1750–650 cm−1 region (Fig. S4A, ESI†). The Amide I resonance peaks at 1705–1595 cm−1 for DHHR, FHYH, and HHFF are depicted in Fig. S4B–D (ESI†), respectively. The broad Amide I resonance of DHHR and VDHK indicates a disordered structural conformation of the tetrapeptide (Fig. S3C and S4B, ESI†). These data corroborate with the lack of assembly in hydrogels (Fig. 4(A)) together with the formation of sparse nanoaggregates during solvent casting (Fig. 4(D)) and provide an experimental validation that a low (<0.6) S-score can predict a low tendency to assemble in ordered structures. Amide I peaks of FHYH and HHFF (Fig. S4C and D, ESI†), were centered on the 1635–1630 cm−1 region and attributed to β-sheet structures, indicating a disorder to order transition during assembly.29,30 Additionally, when pH increases to neutral or basic values and tetrapeptides become deprotonated, the contribution of ordered conformations in the Amide I peak becomes more prominent, denoting that higher-order assembly accrues with an increase in pH. These data corroborate the formation of hydrogels and SEM analysis, validating the use of a high S score (>0.8) to predict formation of ordered, assembled structures.
In conclusion, by computationally screening and ranking all possible combinations of histidine-bearing tetrapeptides, we successfully identified peptides with pH-tunable gelation properties. Amino acids can change their protonation state based on local effects during self-assembly, as described by Tang et al.31 Considering intermediate protonation scenarios or implementing a constant pH molecular dynamics method, as described by the Tuttle group,32 could further improve the accuracy of the molecular models and could provide valuable insights into the behavior of the identified systems of interest. However, these approaches would require a significant increment in the computational cost, which would impair high-throughput in silico screening of different peptide sequences. In future works we can envision a first large-scale screening with fixed protonation, followed by a second-more accurate-filter with constant pH MD to select self-assembling peptide sequences. Tunable biomaterials with a pH-switch can find applications in releasing bioactive cargos molecules at specific tissue targets such as inflammatory sites, mineralizing bone, and tumors. Additionally, hydrogels that assemble in response to pH can be easily deployed in remote tissues by injection.
Conceptualization, YH, FR, AG, BM; methodology, YH, FR, AG; formal analysis, YH, FR; investigation, YH, FR; resources, AG and BM; data curation, YH, FR; writing, review and editing, YH, FR, AG, BM; supervision, AG, BM; funding acquisition, AG, BM. All authors have read and agreed to the published version of the manuscript.
BM acknowledges NSF (CMMI-1752172), ONR (N000142112402 and N000141912317), and MIT-IBM Watson AI Lab. AG acknowledges the CINECA award under the ISCRA initiative (grant code IsB26_W2EB and IsCa8_ReFPOX).
Footnotes |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3cc02412a |
‡ These authors contributed equally. |
This journal is © The Royal Society of Chemistry 2023 |