Daria de Raffeleᵃᵇ and Ioana M. Ilie*ᵃᵇ
ᵃ University of Amsterdam, van 't Hoff Institute for Molecular Sciences, Science Park 904, P.O. Box 94157, 1090 GD Amsterdam, The Netherlands. E-mail: i.m.ilie@uva.nl
ᵇ Amsterdam Center for Multiscale Modeling (ACMM), University of Amsterdam, P.O. Box 94157, 1090 GD Amsterdam, The Netherlands
First published on 7th December 2023
Existing therapies for neurodegenerative diseases such as Parkinson's and Alzheimer's disease address only the symptoms and do not prevent disease onset. Common therapeutic agents, such as small molecules and antibodies, struggle with insufficient selectivity, stability and bioavailability, leading to poor performance in clinical trials. Peptide-based therapeutics are emerging as promising candidates, with successful applications in cardiovascular diseases and cancer owing to their high bioavailability, good efficacy and specificity. In particular, cyclic peptides have long in vivo stability while maintaining robust, antibody-like binding affinity. However, the de novo design of cyclic peptides is challenging because of the lack of long-lived druggable pockets on the target polypeptide, the absence of exhaustive conformational distributions of the target and/or the binder, unknown binding sites, methodological limitations, associated constraints (failed trials, time, money) and the vast combinatorial sequence space. Hence, efficient alignment and cooperation between disciplines, and synergies between experiments and simulations complemented by techniques such as machine learning, can significantly speed up the development of therapeutic cyclic peptides for neurodegenerative diseases. We review the latest advances in cyclic peptide design against amyloidogenic targets from a computational perspective, in light of the potential of machine learning to optimize the design process. We discuss the difficulties encountered when designing novel peptide-based inhibitors and propose new strategies that combine experiments, simulations and machine learning to design cyclic peptides that inhibit the toxic propagation of amyloidogenic polypeptides. Importantly, these strategies extend beyond the design of cyclic peptides and can serve as a template for the de novo generation of (bio)materials with programmable properties.
Various approaches have been developed to interfere with the accumulation process by stabilizing or eliminating specific monomeric or aggregated forms of the responsible polypeptides.8,9 They rely primarily on the design of small molecules or antibodies that bind to monomeric or aggregated protein species, thereby making the substrate unavailable for conversion10 and/or sterically interfering with the aggregation process.9,11–13 Traditional small-molecule drugs and protein-based therapeutics have made important contributions, yet their limitations in selectivity, stability and bioavailability,14,15 as well as their repeated failures in clinical trials,16,17 have inspired the search for alternative therapeutic approaches. Among these, peptides, and particularly cyclic peptides, are attracting considerable attention owing to their unique structural properties and diverse biological activities.18,19 Their cyclic nature confers enhanced stability and resistance to proteolytic degradation while maintaining robust binding to the target.20 Cyclic peptides have proven to be excellent candidates for cancer therapy,21 organ transplantation22 and inhibition of amyloid aggregation.23,24 Their size and functional properties ensure that the contact area is large enough to provide high selectivity,25 their ability to form salt bridges and hydrogen bonds can lead to strong binding affinities,26 and cyclization increases their proteolytic stability.27
Amyloid-forming polypeptides such as amyloid-β (Aβ42), α-synuclein (α-syn) and amylin (hIAPP) are intrinsically disordered, independently of their size or sequence. The cellular prion protein (PrPC) consists of a membrane-anchored, ordered globular domain composed of three α-helices and a two-stranded antiparallel β-sheet, preceded by an unstructured, flexible tail of about 100 residues. Despite its well-defined secondary structure in the monomeric form, the cellular prion protein lacks a specific binding site accessible to potential small-molecule inhibitors.8,28,29 Owing to their properties, cyclic peptides can selectively intervene in the folding and aggregation process, bind to targets lacking an easily accessible druggable pocket30 as well as to heterogeneous and dynamic species,31 regulate the conformational stability of the target polypeptide, and potentially halt or slow down disease onset or progression. Furthermore, the stability and permeability of cyclic peptides can enable them to cross the blood–brain barrier,32 a crucial requirement for effective therapies against neurodegenerative diseases.
Advances in peptide synthesis, combinatorial chemistry and computational tools allow the de novo design and tuning of the structural elements, target specificity, binding affinity, solubility, cell permeability and proteolytic stability of natural and synthetic cyclic peptides. De novo design of cyclic peptides often relies on protein engineering strategies, such as rational design and directed evolution, which aid in the discovery and/or improvement of peptides for drug-related applications.33 Over the past years, computer simulations and machine learning have enabled the exploration of a vast chemical space, accelerating the design and optimization of lead peptidomimetic candidates.34,35 Combined with directed evolution, they provide an initial in silico screening step that scans full combinatorial libraries; so far, this step has mainly proposed small molecules for in vitro testing.36 While most of these models are trained on experimental data, machine learning combined with molecular dynamics simulations has more recently been used to propose and optimize chemical compounds and to reduce the number that must later be tested experimentally.37 In contrast to small-molecule and protein optimization, the use of machine learning for de novo peptide design is still in its early stages,38–40 and its potential has been demonstrated mainly in non-therapeutic applications.39
In this paper, we provide an overview of recent advances in the use of cyclic peptides as therapeutic or imaging agents for neurodegenerative diseases, focusing on the amyloid-β peptide, α-synuclein, amylin and the cellular prion protein. We emphasize the importance of the synergy between computer simulations and experiments in light of the latest developments in machine learning for cyclic peptide design and optimization. Additionally, we propose a recipe that capitalizes on the predictive power of computer simulations and AI for the development of cyclic peptide-based therapeutics.
Fig. 1 (a) Ferulic aldehyde (MW 194 Da) inhibits Aβ42 multimerization. (b) Cartoon representation of the aducanumab antibody (PDB ID: 6CO3;41 MW 146 kDa), with the heavy chain in cyan and the light chain in light blue. The area enclosed by the red circle marks the binding interface between the antibody and the N-terminus of the Aβ42 peptide. (c) The naturally occurring cyclotide kalata B1 (MW 2.92 kDa), which inhibits fibril growth of the hexapeptide derived from residues 306–311 of tau.42
Antibodies are Y-shaped proteins with a much larger molecular weight than small molecules (about 150 kDa), which can recognize and bind to protein targets with high specificity and modulate their toxic behavior.73 In particular, monoclonal antibodies have been designed for both therapeutic and diagnostic applications. They bind to amyloidogenic polypeptides and/or their aggregates to stabilize a desired conformation and make the substrate unavailable for conversion10,43 and/or to sterically interfere with the aggregation process.11–13 For instance, the DesAbs single-domain antibodies targeting Aβ42 epitopes74 interact with the monomeric peptide and bind with high affinity to oligomeric species, but not to fibrillar structures; they can inhibit secondary nucleation12 and suppress Aβ42-mediated toxicity in C. elegans.74 Aducanumab and lecanemab, approved anti-Aβ42 agents, are monoclonal antibodies that are effective in patients in the early stages of AD because they reduce amyloid deposits in the brain.43 Aducanumab (Fig. 1(b)) binds to N-terminal residues 3 to 7 and can discriminate between monomers and aggregated species,44 while lecanemab binds to soluble Aβ42 protofibrils.75 Cinpanemab and prasinezumab, two monoclonal antibodies directed against α-synuclein aggregates, failed in clinical trials because they showed no positive effect on disease progression.16,17 The POM family of antibodies (POMologues) has been developed to recognize a variety of epitopes along the sequence of the cellular prion protein10 and to modulate its toxic effects.76 Nevertheless, no drug against prion diseases is currently in clinical trials. Despite the recent success in the Alzheimer's field with aducanumab43 and the potential of gantenerumab,77 antibodies have limitations as therapeutics, including stability and immunogenicity,14,15 which can impact clinical efficacy.
Because of their physicochemical properties, cyclic peptides present a series of advantages over their linear precursors, small molecules and biological therapeutic agents such as antibodies. First, the rigidity obtained through cyclization provides increased stability, higher resistance to proteolysis27 and enhanced cell permeability compared with linear peptides.20,87,88 Second, their size and functional properties ensure that the contact area is large enough to provide high selectivity, and their ability to form salt bridges and hydrogen bonds can lead to strong binding affinities.26 Hence, owing to their larger surface and consequently larger number of hydrogen-bonding partners, they can maintain robust, antibody-like, high-affinity binding to (undruggable) interfaces.20,79,89 Third, cyclic peptides have good in vivo stability, which contributes to enhanced retention and circulation, particularly if they are rich in non-canonical amino acids.20
Cyclic peptide-based therapeutics also face a series of challenges. Orally administered cyclic peptide drugs suffer from poor oral bioavailability,78 because cyclic peptides remain insufficiently resistant to proteolytic degradation in the gastrointestinal tract.27 Nevertheless, alternative routes of administration, such as subcutaneous or intravenous injection, overcome these difficulties and aid the efficient delivery of the peptide drug to the target.84 Another obstacle is preventing off-target interactions, a challenge often tackled by selectively replacing natural amino acids in the sequence with non-natural ones.84
In the amyloid field, cyclic peptide development has grown over the past decade.30,78,80 Typical strategies involve designing peptides rich in aromatic moieties, hydrophobic amino acids or D-amino acids (which resist proteases, as these are stereoselective for L-amino acids) that either disrupt the aggregation process (β-breakers) or bind to monomeric and oligomeric species, competing with the responsible polypeptide to hinder its aggregation and/or toxic transformation.90 For instance, the RD2D3 D-peptide (H-ptlhthnrrrrrrprtrlhthrnr-NH2†), designed to modulate the binding of PrPC to Aβ42 oligomers, interferes with Aβ42–PrPC heteroassembly in a concentration-dependent manner.91 Its cyclic successor shows better in vitro potency and pharmacokinetic properties92 and could potentially alter Aβ42 aggregation. The bicyclic DesBP peptide (RAACKLGIKACTSVYHACGGKRR) was rationally designed to bind monomeric Aβ42 at residues 31–36 and 38–4224,93 and was shown to alter the morphology of Aβ42 aggregates in a dose-dependent manner. In particular, higher peptide concentrations led to increased aggregate disorder and reduced cytotoxicity.93 Similarly, the BD1 cyclic peptide (O-ySGLIKWTTALLRTYC-NH2) was shown to inhibit α-synuclein fibril formation in vitro.94 The D,L-α-cyclic peptide CP-2 (IJwHsK‡) prevents α-syn aggregation into toxic oligomers through an “off-pathway” mechanism.95 In particular, it interacts with the N-terminus and the non-amyloid-β component (NAC) region of α-syn, altering the protein's membrane interaction properties and fibril morphology and thereby preventing toxic membrane disruption.
The macrocyclic inhibitory peptides (MCIPs) were designed to bind to amyloids by mimicking human IAPP (hIAPP) interaction surfaces while retaining only minimal hIAPP-derived self-/cross-recognition elements.96 Inhibitor selectivity was tuned by chirality, which led to nanomolar binding affinities for hIAPP and for both amyloid-β40 and amyloid-β42, high proteolytic stability in human plasma and the ability to cross the human blood–brain barrier.96 Disulfide-rich macrocyclic peptides are also versatile scaffolds for the development of stable biochemical tools. Two examples are SFTI-1 (GRCTKSIPPICFPD, disulfide connectivity: Cys3–Cys11), a cyclic peptide that inhibits trypsin, and the kB1 cyclotide (GLPVCGETCVGGTCNTPGCTCSWPVCTRN, disulfide connectivity: Cys5–Cys19, Cys9–Cys21 and Cys14–Cys26), both of which have an inherent ability to inhibit fibril growth of the tau-derived hexapeptide Ac-VQIVYK-NH2 (AcPHF6).42 In particular, kB1 is a stronger inhibitor of tau fibrillization than SFTI-1, reflecting more effective binding to and/or disruption of AcPHF6 fibrils. Recently, tau-mimetic peptides (β-bracelets) have been designed from the high-resolution structure of the tau fibril fold by extracting β-strand sequences linked by β-arcs.97 The newly generated peptides self-assemble into parallel β-sheet fibrils and can serve as templates for the design of soluble inhibitors of tau seeding.
For the cellular prion protein, no therapeutic cyclic peptides have been developed so far, despite its well-defined secondary structure in the soluble form. Potential causes are the lack of druggable pockets or of a stable, unique binding region in the globular domain, and the intrinsically disordered nature of the tail. However, the existence of monoclonal antibodies that bind PrPC in the nanomolar regime indicates that putative interaction sites are available.10 We hypothesise that available high-resolution structures of PrPC in complex with monoclonal antibodies may serve as starting points for the rational design of cyclic peptides that could stabilize the soluble isoform of the protein and thereby prevent its toxic transformation. Alternatively, by tweaking the environmental conditions through mild solvent alteration, e.g. by replacing water with D2O98 or by adding organic compounds,99,100 one can subtly alter the conformational landscape of the protein to reveal new (allosteric) druggable pockets without disturbing its secondary structure. We refer the interested reader to a series of reviews on peptide-based strategies to interfere with protein misfolding and aggregation,101,102 and to reviews on the therapeutic potential of cyclic90 and bicyclic peptides.103 Studies older than 10 years on anti-amylin cyclic peptides and peptide-based inhibitors have been reviewed elsewhere.104
Over the past years, rational approaches to de novo peptide design have gained momentum. Rational design relies on a detailed understanding of the amino acid sequence, high-resolution structure, function and interaction mechanisms of the protein.33 It involves identifying and mutating key residues associated with protein stability to improve targeted physical and catalytic properties.33,110–112 Because it relies on human intervention, rational design often offers an informed and efficient means to narrow down the amino acid search space, resulting in a smaller and more manageable pool of effective peptides. De novo rational cyclic peptide design requires (a) high-resolution three-dimensional structures and biochemical/biophysical information on the target protein, and/or (b) detailed information on the ligand properties (e.g. hydrogen bonding to the target, hydrophobicity, cyclization chemistry, presence of natural and non-natural residues) and conformations (i.e. the designed peptide may adopt different conformations in the bound and unbound states).113 Recent advances in cryo-electron microscopy (cryo-EM) have enabled the determination of high-resolution three-dimensional (3D) structures of new amyloidogenic aggregates and their monomeric precursors.114 Combined with a comprehensive understanding of molecular interactions and structure–function relationships, these 3D structures could enable the rational design of (cyclic) peptides tailored to amyloidogenic targets. In fact, rational design has been used successfully to generate the DesAbs antibodies targeting amyloid-β12 or specific α-synuclein and hIAPP epitopes.115
Starting from the high-resolution structures of tau, α-synuclein and amyloid-β fibrils, miniproteins of 35 to 48 residues were successfully designed to bind to the fibril tips and inhibit aggregation in vitro and in vivo.116 First, a library of peptide-based inhibitors was created using Rosetta. Subsequently, Rosetta's MotifGraft protocol117 was used to dock the inhibitors onto the fibrils, and the resulting complexes were energy minimized. The top-ranking inhibitors, i.e. the best binders, were subjected to molecular dynamics simulations to assess the stability of the complexes. Lastly, Rosetta's ab initio structure prediction algorithm118 was employed for the final screening. Inhibitors with the most favorable predicted energies and the smallest root mean squared deviations (RMSD) from the original design were selected for experimental validation.
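The final selection step above ranks designs partly by the RMSD between the predicted structure and the original design. As a minimal sketch, assuming both structures are available as matched (N, 3) coordinate arrays in Å (e.g. backbone atoms listed in the same order), the RMSD after optimal superposition can be computed with the Kabsch algorithm in NumPy. The function name `kabsch_rmsd` is illustrative and is not part of Rosetta.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    P and Q must list the same atoms in the same order, e.g. backbone atoms
    of a predicted structure and of the original design (illustrative input).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition
    H = P_c.T @ Q_c
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (no mirroring)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Rotate P onto Q and compute the root mean squared deviation
    diff = P_c @ R.T - Q_c
    return float(np.sqrt(np.sum(diff ** 2) / len(P)))
```

A rigid rotation plus translation of the same coordinates gives an RMSD of essentially zero, whereas a mirror image does not, because the determinant check restricts the fit to proper rotations.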
From a computational perspective, virtual screening allows millions of compounds to be screened rapidly prior to experimental testing, thereby reducing costs and saving time. Virtual screening of cyclic peptides is limited by the availability of three-dimensional structures of the targets, by the absence of druggable pockets and by the lack of structural information on the designed cyclic peptide. To overcome some of these limitations, different computational techniques have been combined with machine learning to predict protein structures and their complexes. Notable examples include HADDOCK (High Ambiguity Driven protein–protein Docking),119–121 RoseTTAFold122 and AlphaFold2.123,124 HADDOCK uses biochemical and biophysical interaction data, such as nuclear magnetic resonance titration experiments or mutagenesis data, to drive the protein–protein docking process.119 Recent developments include the generation of cyclic peptide conformations and their docking to the protein target, using knowledge of the binding site on the protein to drive the modeling.125 AlphaFold2 is a deep-learning algorithm whose neural network architecture is inspired by the physical and geometric aspects of protein structures.126 It exploits evolutionary conservation through the analysis of multiple sequence alignments built from evolutionarily related proteins, together with the 3D coordinates of a few homologous structures known as templates. Similarly, RoseTTAFold uses multiple sequence alignments and a set of initial templates to accurately predict folded structures122 and protein–protein complexes.40,122 These technological advances contribute significantly to the computational prediction of protein structures.
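To illustrate how interaction data can drive docking: in HADDOCK, experimental information is typically encoded as ambiguous interaction restraints, in which the distance between a restrained residue and the partner's interface is an r⁻⁶-summed effective distance over all atom pairs. The sketch below computes that effective distance and applies a simplified flat-bottom harmonic penalty. The 2 Å upper bound, the harmonic form and the function names are assumptions for illustration; the actual HADDOCK/CNS restraint energy differs in detail.

```python
import numpy as np


def effective_distance(residue_atoms: np.ndarray, partner_atoms: np.ndarray) -> float:
    """r^-6-summed effective distance (Å) between all atoms of one restrained
    residue, shape (M, 3), and all atoms of the partner's interface, shape (K, 3)."""
    # Pairwise atom-atom distances, shape (M, K)
    diff = residue_atoms[:, None, :] - partner_atoms[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    # Sum r^-6 over all pairs, then convert back to a distance
    return float(np.sum(r ** -6.0) ** (-1.0 / 6.0))


def flat_bottom_penalty(d_eff: float, upper: float = 2.0, k: float = 1.0) -> float:
    """Simplified restraint energy: zero up to `upper`, harmonic beyond it."""
    return 0.0 if d_eff <= upper else k * (d_eff - upper) ** 2
```

Because of the r⁻⁶ weighting, the effective distance is never larger than the shortest atom–atom distance, so such a restraint is satisfied as soon as any single atom pair comes close enough.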
The intrinsic disorder of amyloidogenic polypeptides implies that the target protein lacks a stable structure and that its native state is better described by a diverse conformational ensemble rich in disordered structures.2,127,128 AlphaFold2 fails to predict such ensembles and often yields unrealistic structures that do not accurately capture the states in the ensemble127–129 (Fig. 2). This lack of realistic, physically accurate structural ensembles hampers the design of any type of inhibitor and represents a limitation of these novel deep-learning techniques.
Recently, extensive molecular dynamics simulations at full atomistic resolution (Table 1) have been used to identify transient monomeric Aβ42 conformations with characteristics of fibrillar structures.133 States of monomeric, dimeric, oligomeric and fibrillar amyloidogenic polypeptides have been thoroughly characterized and reviewed elsewhere.2,128,130 The identified pool of structures could potentially be used for small-molecule or cyclic peptide docking and design. Ideally, access to a well-organized, reliable and consistently maintained database of molecular dynamics trajectories of amyloidogenic polypeptides would avoid the repeated generation of similar trajectories and enable faster and more consistent progress in amyloid-related drug discovery. An example of such a publicly available database is the Molecular Dynamics Data Bank (The European Repository for Biosimulation Data).
For small-molecule docking, snapshots from molecular dynamics simulations of the Aβ42 monomer,134 dimer50 or multimers52,135 have been clustered to generate representative ensembles for docking, whose predictions can be validated experimentally.62 For example, curcumin and a set of curcumin derivatives were docked onto Aβ42 multimeric conformations generated with molecular dynamics simulations.50,52 The results revealed that the small molecules interact with high probability with the amyloidogenic driving domains 16KLVFF20 and 29GAIIG33 of Aβ42 and disrupt their secondary structure in the hexameric52 and dimeric50 arrangements. Interestingly, Silybin A (Sil A) and Silybin B (Sil B), two diastereoisomers of silibinin, were shown to have different interaction preferences for Aβ40 and distinct biological responses.51 Sil A, which binds the aromatic residues F19 and F20, slowed down aggregation, whereas Sil B, which interacts primarily with the C-terminus of the polypeptide, fully abolished amyloid aggregation. Compelling evidence suggests that Silybin B is also a powerful inhibitor of the toxic self-assembly of hIAPP.53 Simulations and experiments revealed that frequent interactions of Sil B with the S20–S29 sequence induce disorder in the amyloidogenic core and attenuate hIAPP toxicity and aggregation propensity.53 Myricetin, another polyphenolic flavonoid, was shown to bind hIAPP at the amyloidogenic core and its C-terminus, preventing aggregation and distorting the fibrils.136 The differential binding score (DIBS) was introduced to determine the binding preferences of ligands to an ensemble of IDP conformations by comparison against random-coil ensembles of the same protein extracted from MD simulations.137 The method was validated on epigallocatechin-3-gallate (EGCG) binding to the unstructured N-terminus of the tumor suppressor p53, with results that compared favorably to experiment.
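The clustering of MD snapshots described above is often performed on a pairwise RMSD matrix. As one possible choice (the cited studies may have used other algorithms or cutoffs), a Daura-style clustering can be sketched in NumPy: the snapshot with the most neighbours within a cutoff becomes a cluster centre, it and its neighbours are removed from the pool, and the procedure repeats. The cluster centres can then serve as representative conformations for docking. The RMSD matrix, the cutoff and the function name are assumed inputs for illustration.

```python
import numpy as np


def daura_clustering(rmsd: np.ndarray, cutoff: float) -> list[tuple[int, list[int]]]:
    """Cluster MD snapshots from a symmetric pairwise RMSD matrix.

    Repeatedly picks the unassigned snapshot with the most unassigned
    neighbours within `cutoff` as a centre and assigns it and those
    neighbours to a new cluster. Returns (centre, members) pairs in order
    of non-increasing cluster size.
    """
    remaining = np.ones(len(rmsd), dtype=bool)
    clusters = []
    while remaining.any():
        # Neighbour relation restricted to snapshots not yet assigned
        neighbours = (rmsd <= cutoff) & remaining[None, :] & remaining[:, None]
        counts = neighbours.sum(axis=1)
        centre = int(np.argmax(counts))
        members = np.flatnonzero(neighbours[centre]).tolist()
        clusters.append((centre, members))
        # Remove the new cluster from the pool
        remaining[members] = False
    return clusters
```

In practice the RMSD matrix would be computed from superposed trajectory frames, and the cutoff controls how many representative structures are obtained.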
The predictive ability of simulations has been demonstrated in a translational study, in which atomistic simulations were used to design new polythiophene derivatives against prion aggregation, prior to in vivo testing.54 The compounds subsequently showed substantial prophylactic and therapeutic potency in prion-infected mice. Hence, simulations are powerful tools to generate conformational ensembles of the target polypeptide, which can act as scaffolds for the docking and design of molecules to target specific amyloid-forming regions.
The effects of antibodies on the structural and dynamic properties of amyloidogenic polypeptides have also provided valuable insight into their modulating properties. Specifically, molecular dynamics simulations of aducanumab in complex with Aβ42 revealed that the antibody binds to monomeric, oligomeric and fibrillar species, with the binding site at the N-terminus (residues 2–7) preserved across all systems.41 Additionally, the results showed that the monomer unfolds and hydrophobically collapses on the antibody's surface, while in the complexes with aggregated species the β-sheet structure of the peptide is conserved.41 All-atom simulations of PrPC in complex with the neurotoxic POM1 and the innocuous POM6 antibodies revealed that the two antibodies, despite targeting similar epitopes, differently modulate the intrinsic flexibility of the protein28 and its orientation with respect to the cellular membrane.29 The information extracted from simulations of amyloidogenic polypeptides in complex with antibodies could serve as a starting point for optimizing and designing agents (e.g. antibodies, peptides) that bind with higher affinity to selected species, or for the rational design of cyclic peptides that modulate the target's conformational landscape, enabling access to new binding sites.
Aside from the structure and conformational landscape of the target, the conformations of the designed cyclic peptide in the target-bound and unbound states play an important role. Designed cyclic peptides often adopt different conformations in solution than in the target-bound state. To design an efficient peptide-based inhibitor, one needs to understand the conformational transitions of the cyclic peptide between these states. While some peptide–protein complex structures are available, obtaining high-resolution structures of cyclic peptides in solution is hampered by their low core-to-surface ratio, the absence of specific couplings (e.g. NH–Hα) and their diverse conformations in solution.131 Molecular dynamics simulations have therefore been used successfully to predict energetically relevant conformational ensembles of cyclic peptides in solution, which compare favorably to available experimental data (e.g. NMR chemical shifts).138 We refer the interested reader to a comprehensive review of computational methods to characterize the behavior of cyclic peptides in solution131 and underline the synergy between experimental and computational work.
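Comparing a simulated ensemble with NMR data usually means averaging a per-frame observable over the ensemble before comparing it with experiment, because the measured chemical shift is itself an average over conformations. A minimal sketch, assuming per-frame chemical shifts have already been back-calculated with a shift predictor (not shown) and stored as a frames × atoms array; optional frame weights (e.g. from reweighting) are normalized before averaging. The function names are illustrative.

```python
import numpy as np


def ensemble_average(per_frame, weights=None) -> np.ndarray:
    """Weighted average of a per-frame observable, shape (n_frames, n_atoms)."""
    per_frame = np.asarray(per_frame, dtype=float)
    if weights is None:
        # Unbiased simulation: every frame counts equally
        weights = np.ones(len(per_frame))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ per_frame


def rmse(predicted, experimental) -> float:
    """Root mean squared error between predicted and measured values."""
    predicted = np.asarray(predicted, dtype=float)
    experimental = np.asarray(experimental, dtype=float)
    return float(np.sqrt(np.mean((predicted - experimental) ** 2)))
```

Averaging before computing the error matters: a single "best" frame can match experiment poorly even when the ensemble as a whole agrees well.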
Regarding cyclic peptide design, molecular dynamics simulations go beyond experimental resolution and can provide insight into the structural interactions between the peptide and the target at atomistic detail. For instance, macrocyclic peptides found in plants (cyclotides) have been shown experimentally to inhibit the aggregation of tau and amyloid-β42 fibrils.139 One such cyclotide was subsequently docked onto 3D structures of Aβ42 fibrils and subjected to molecular dynamics simulations.140 The results rationalized the experimental observations, revealing that the Cter-M cyclotide from C. ternatea (GLPTCGETCTLGTCYVPDCSCSWPICMKN) binds the Aβ42 fibril via hydrogen bonding, hydrophobic, electrostatic and π–π interactions, thereby inhibiting aggregation.140 In particular, the peptide disrupts intermolecular hydrogen bonds and salt bridges in the Aβ42 fibril, which are crucial for its structural integrity. These effects occur within the first 50 ns of the simulations, with disruptions of the fibril secondary structure at residues 2–7 and 38–41 resulting in the loss of extended β-sheet conformations. Importantly, in the absence of the peptide the Aβ42 fibril maintains stable extended β-sheet conformations throughout the simulation.
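The loss of intermolecular hydrogen bonds described above is typically quantified as an occupancy, i.e. the fraction of frames in which a given donor–acceptor pair satisfies a geometric criterion. A sketch, assuming coordinates for one donor, hydrogen and acceptor per frame and commonly used but here assumed cutoffs (donor–acceptor distance ≤ 3.5 Å, D–H···A angle ≥ 150°); comparing occupancies from trajectories with and without the cyclotide would highlight the disrupted contacts. The function names are illustrative.

```python
import numpy as np


def hbond_present(donor, hydrogen, acceptor, max_da=3.5, min_angle=150.0):
    """Per-frame geometric H-bond test.

    donor, hydrogen, acceptor: arrays of shape (n_frames, 3) in Å.
    Returns a boolean array of shape (n_frames,).
    """
    donor, hydrogen, acceptor = (
        np.asarray(a, dtype=float) for a in (donor, hydrogen, acceptor)
    )
    # Donor-acceptor distance
    d_da = np.linalg.norm(acceptor - donor, axis=-1)
    # D-H...A angle measured at the hydrogen
    v_hd = donor - hydrogen
    v_ha = acceptor - hydrogen
    cos_angle = np.sum(v_hd * v_ha, axis=-1) / (
        np.linalg.norm(v_hd, axis=-1) * np.linalg.norm(v_ha, axis=-1)
    )
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return (d_da <= max_da) & (angle >= min_angle)


def occupancy(mask) -> float:
    """Fraction of frames in which the H-bond is present."""
    return float(np.mean(mask))
```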
Other approaches rely on available high-resolution structures of protein complexes to identify linear interface motifs with appropriate distances between residues to facilitate subsequent cyclization.141 In particular, backbone motifs of epitopes within protein–protein interfaces were identified and compared against available cyclic peptide databases to pinpoint promising candidates with the desired structural features.141 The generated cyclic peptide–protein complexes were then refined with molecular dynamics simulations in explicit solvent to determine the binder with the highest target affinity. To validate the method, initial tests were conducted on the complex formed by bovine pancreatic trypsin inhibitor (BPTI) and the protease trypsin. The method identified a cyclic peptide that resembled the BPTI backbone at the interface, in agreement with experimentally known structures.
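The first step of this approach, finding linear interface segments whose ends are close enough to be joined, can be sketched as a scan over contiguous windows of Cα coordinates. The segment lengths, the 6 Å end-to-end cutoff and the function name below are hypothetical values chosen for illustration, not those of the cited method.

```python
import numpy as np


def cyclizable_segments(ca_coords, min_len=5, max_len=12, max_end_dist=6.0):
    """Return (start, end, distance) for contiguous segments whose terminal
    C-alpha atoms lie within `max_end_dist` Å (hypothetical criterion)."""
    ca = np.asarray(ca_coords, dtype=float)
    hits = []
    for start in range(len(ca)):
        for length in range(min_len, max_len + 1):
            end = start + length - 1
            if end >= len(ca):
                break  # longer windows would run past the chain end
            d = float(np.linalg.norm(ca[end] - ca[start]))
            if d <= max_end_dist:
                hits.append((start, end, d))
    return hits
```

The surviving segments would then be compared against cyclic peptide backbones and refined, as described above.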
Despite extensive simulations, challenges remain when exploring the conformational space of IDPs both in the presence and in the absence of modulators.142 Convergence is an issue because of the rugged free energy landscapes of these polypeptides, their size (which at times requires large simulation boxes) and/or kinetic traps. Some of these difficulties can be overcome with enhanced sampling techniques, implicit solvents and/or coarse-grained models, which, together with advances in computing power, enable access to longer time and length scales. Alternative approaches include reducing the size of the system by simulating fragments of the polypeptide of interest and using statistical mechanics tools to derive the conformational free energies of the full IDP.143 Current force fields tend to over- or underestimate IDP properties compared with experimental quantities. IDP-tailored choices include the all-atom additive force fields Charmm36m144 and Amber ff14IDPSFF,145 which have been fine-tuned to reach experimental agreement and improve the conformational sampling of intrinsically disordered proteins.146 More recently, machine learning has been integrated into the development and improvement of force fields,147,148 and novel techniques are emerging for IDP-specific force fields. An example is Charmm-NN, which uses atom-type-based neural networks to calculate energies and forces149 and is subject to further improvement. The challenges associated with IDP simulations and their reconciliation with experimental data have been reviewed in detail in ref. 2 and 142. On the methodological side, determining the binding free energy of the cyclic peptide to the target also requires special attention.
For instance, free energy perturbation calculations, a popular method for small molecules, can provide relative binding free energies and mechanistic detail while preserving the flexibility of the complex.150 Nevertheless, convergence remains an issue. Alternatively, umbrella sampling, a technique that has provided valuable insight into the thermodynamics of monomer attachment to amyloid fibrils,143,151,152 would be a suitable choice for determining the binding free energy of a peptide to its target.
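Free energy perturbation rests on the Zwanzig relation, ΔF = F_B − F_A = −k_BT ln⟨exp[−(U_B − U_A)/k_BT]⟩_A, where the average runs over configurations sampled in state A. A minimal NumPy sketch of this estimator follows, written in log-sum-exp form for numerical stability; the energy differences are assumed inputs in the same units as kT, and the function name is illustrative. In practice the transformation is split into several intermediate λ windows whose ΔF values are summed, because the exponential average converges poorly when the two states overlap little, which is one reason convergence remains an issue.

```python
import numpy as np


def zwanzig_delta_f(delta_u, kT: float) -> float:
    """Free energy difference F_B - F_A by exponential averaging.

    delta_u: U_B - U_A evaluated on configurations sampled from state A,
    in the same energy units as kT.
    """
    x = -np.asarray(delta_u, dtype=float) / kT
    # log <exp(x)> via the log-sum-exp trick to avoid overflow/underflow
    x_max = np.max(x)
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return float(-kT * log_mean)
```

The estimator is dominated by rare samples with low ΔU, so a few unlucky configurations can shift the result noticeably.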
Table 1 Overview of the simulation studies discussed in the text

| Ref. | Peptide | Agent | Force field | Solvent | Method | Samplingᵃ |
|---|---|---|---|---|---|---|
| Barz et al.133 | Aβ42 monomer | — | Charmm36m | TIP3P | H-REMD | 40.8 μs |
| Jakubowski et al.52 | Aβ42 fibril | 94 small molecules | Charmm36m | TIP3P | MD | 10.4 μs |
| Dehabadi et al.50 | Aβ42 dimer | Ferulic aldehyde | Charmm36m | TIP3P | MD | 2.6 μs |
| Dehabadi et al.50 | Aβ42 dimer | Vanillin | Charmm36m | TIP3P | MD | 2.6 μs |
| Sciacca et al.51 | Aβ40 monomer | SilA, SilB | Charmm36 | TIP3P | MD | 3 μs |
| Garcia-Vinuales et al.53 | hIAPP monomer | — | Charmm36m | TIP3P | MD | 6 μs |
| Garcia-Vinuales et al.53 | hIAPP monomer | SilA | Charmm36m | TIP3P | MD | 6.5 μsᵇ |
| Garcia-Vinuales et al.53 | hIAPP monomer | SilB | Charmm36m | TIP3P | MD | 6.5 μsᵇ |
| Dubey et al.136 | hIAPP fibril | Myricetin | Amber99sb | TIP3P | MD | 1.05 μs |
| Chen & Krishnan137 | p53-NTD | EGCG | OPLS-AA 2005 | TIP4P-D | MD | 500 ns |
| Frost & Zacharias41 | Aβ2–7 | AduFab | Charmm22* | TIP3P | MD | 500 ns |
| Frost & Zacharias41 | Aβ42 | AduFab | Charmm22* | TIP3P | MD | 1 μs |
| Frost & Zacharias41 | Aβ42 dimer | AduFab | Charmm22* | TIP3P | MD | 781 ns |
| Frost & Zacharias41 | Aβ42 hexamer | AduFab | Charmm22* | TIP3P | MD | 1 μs |
| Frost & Zacharias41 | Aβ42 fibril | AduFab | Charmm22* | TIP3P | MD | 254 ns |
| Ilie & Caflisch28 | PrPC | — | Charmm36m | TIP3P | MD | 5 μs |
| Ilie & Caflisch28 | PrPC | POM1 | Charmm36m | TIP3P | MD | 5 μs |
| Ilie & Caflisch28 | PrPC | POM6 | Charmm36m | TIP3P | MD | 5 μs |
| Ilie et al.29 | PrPC | — | Charmm36 | ABSINTH | MC/MD | 4.8 μs MD + 240M MC |
| Ilie et al.29 | PrPC | POM1 | Charmm36 | ABSINTH | MC/MD | 4.8 μs MD + 240M MC |
| Ilie et al.29 | PrPC | POM6 | Charmm36 | ABSINTH | MC/MD | 4.8 μs MD + 240M MC |
| Kalmankar et al.140 | Aβ42 monomer | Cter-M cyclotide | OPLS3e | TIP4P | RESPA | 900 ns |
| Kalmankar et al.140 | Aβ17–42 pentamer | Cter-M cyclotide | OPLS3e | TIP4P | RESPA | 900 ns |
| Kalmankar et al.140 | Aβ11–42 fibril | Cter-M cyclotide | OPLS3e | TIP4P | RESPA | 900 ns |
| Kalmankar et al.140 | Aβ42 double fibril | Cter-M cyclotide | OPLS3e | TIP4P | RESPA | 900 ns |

ᵃ Cumulative sampling over all replicas. ᵇ Two sets of simulations at different concentrations. Abbreviations: MD, molecular dynamics; H-REMD, Hamiltonian replica exchange molecular dynamics; MC/MD, hybrid Monte Carlo/molecular dynamics; RESPA, reversible multiple time scale molecular dynamics.
Des3PI (design of peptides targeting protein–protein interactions) is a novel computational fragment-based approach for designing cyclic peptides with high target specificity.157 The algorithm docks an amino acid library onto the targeted protein surface. It then connects residues with favorable binding affinities to generate novel cyclic peptide sequences and structures. We envision that the method would reach its full potential when combined with quantitative conformational distributions from molecular dynamics simulations to generate novel amyloid-binding cyclic peptides.
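A toy sketch can illustrate the fragment-based idea behind Des3PI: dock single residues, then link favorable ones into a ring. Everything here is an assumption for illustration and not the Des3PI algorithm. The site names, coordinates and docking scores are hypothetical, and the best-scoring residue is picked independently at each site. Sites are ordered by a greedy nearest-neighbour tour, and ring closure is judged against a hypothetical maximum gap of 7 Å between consecutive sites.

```python
import math

def best_residue_per_site(scores):
    """scores: {site: {amino_acid: docking_score}}, lower score = better (assumed)."""
    return {site: min(aa_scores, key=aa_scores.get) for site, aa_scores in scores.items()}

def nearest_neighbour_order(positions, start):
    """Greedy tour through all sites, always moving to the closest unvisited one."""
    order, remaining = [start], set(positions) - {start}
    while remaining:
        last = positions[order[-1]]
        # sorted() makes tie-breaking deterministic
        nxt = min(sorted(remaining), key=lambda s: math.dist(last, positions[s]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

def assemble_cyclic_peptide(scores, positions, max_gap=7.0):
    best = best_residue_per_site(scores)
    start = min(sorted(best), key=lambda s: scores[s][best[s]])  # anchor on the strongest hotspot
    order = nearest_neighbour_order(positions, start)
    n = len(order)
    # gaps between consecutive sites, including the head-to-tail closure
    gaps = [math.dist(positions[order[i]], positions[order[(i + 1) % n]]) for i in range(n)]
    sequence = "".join(best[s] for s in order)
    total_score = sum(scores[s][best[s]] for s in order)
    return sequence, total_score, all(g <= max_gap for g in gaps)

# Hypothetical docking results at four surface sites (coordinates in Å)
positions = {"s1": (0.0, 0.0, 0.0), "s2": (5.0, 0.0, 0.0), "s3": (5.0, 5.0, 0.0), "s4": (0.0, 5.0, 0.0)}
scores = {
    "s1": {"W": -6.0, "A": -2.0},
    "s2": {"R": -5.0, "G": -1.0},
    "s3": {"F": -7.0, "L": -4.0},
    "s4": {"K": -3.0, "E": -4.0},
}
sequence, total_score, closable = assemble_cyclic_peptide(scores, positions)
```

In this example the four sites form a 5 Å square, so the greedy tour closes into a ring reading FRWE. A real implementation would also have to model linkers, backbone geometry and the cooperativity between neighbouring residues.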
Among the computational methods employed in designing peptides, FoldX stands out as a powerful tool. It estimates the free energy contribution of each atom at protein interfaces from its position relative to its neighbours in the complex.158 It can thereby predict the impact of mutations on protein stability and optimize protein sequences for improved stability and desired functional properties. Using FoldX for exhaustive thermodynamic profiling, the tandem peptide CAP1 was designed to inhibit tau aggregation.159 Both in vitro and in vivo experiments confirmed the computational predictions. They showed that CAP1 binds tau aggregates with high specificity and affinity (EC50 = 145 ± 49 nM) and impedes their spread within cells. Additionally, CAP1 effectively hindered the ability of tau polymorphs obtained from the brains of Alzheimer's disease patients to initiate aggregation.
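The free energy bookkeeping behind mutation scans of a peptide–target interface can be written compactly. The binding free energy is ΔG_bind = G_complex − G_target − G_peptide, and the effect of a mutation is ΔΔG_bind = ΔG_bind(mutant) − ΔG_bind(wild type). The sketch below performs only this arithmetic on hypothetical energies, in kcal/mol, with negative ΔΔG taken to mean tighter binding. It does not call FoldX, whose output format and sign conventions should be checked in its own documentation.

```python
def binding_energy(g_complex, g_target, g_peptide):
    """Interaction free energy of a complex from the energies of the complex and its parts."""
    return g_complex - (g_target + g_peptide)

def ddg_binding(wild_type, mutant):
    """ddG_bind = dG_bind(mutant) - dG_bind(wild type); each argument is (G_complex, G_target, G_peptide)."""
    return binding_energy(*mutant) - binding_energy(*wild_type)

def rank_mutations(wild_type, mutants):
    """Return (label, ddG) pairs, most favourable (most negative ddG) first."""
    return sorted(((label, ddg_binding(wild_type, e)) for label, e in mutants.items()),
                  key=lambda item: item[1])

# Hypothetical energies (kcal/mol) for a peptide-target complex and two point mutants
wild_type = (-100.0, -60.0, -30.0)           # dG_bind = -10.0
mutants = {
    "K5R": (-103.0, -60.0, -30.5),           # dG_bind = -12.5 -> ddG = -2.5 (tighter)
    "W3A": (-96.0, -60.0, -29.0),            # dG_bind = -7.0  -> ddG = +3.0 (weaker)
}
ranking = rank_mutations(wild_type, mutants)
```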
In fact, machine learning models have been trained on molecular dynamics simulations of cyclic pentapeptides with diverse sequences and structural attributes to predict structural ensembles of new cyclic peptide sequences. This method is known as structural ensembles achieved by molecular dynamics and machine learning (StrEAMM).162 Alternative methods rely on comprehensive training datasets to predict and explore novel BBPs with improved properties.163 These datasets comprise sequences of blood–brain barrier penetrating linear peptides (BBPs) sourced from established databases and the scientific literature, alongside non-BBP peptides from UniProt. AbDiffuser introduced a diffusion model tailored to generate three-dimensional antibody structures and their corresponding sequences for biotechnological applications.164 Large protein families can be reliably mapped to a sequence ordinate using sequence alignment, and AbDiffuser is an equivariant diffusion model designed to take advantage of this property. The model adheres to physics-based constraints (e.g. bonds, torsional angles) and can accommodate different sequence lengths, thereby reducing memory complexity. AbDiffuser relies on the Aligned Protein Mixer (APMixer), a neural network operating within the SE(3)-equivariance framework. This framework ensures consistent behavior under rotations and translations in three-dimensional space. Validation of the predictions through in silico and in vitro work underlines the importance of computational and experimental synergies when designing molecules with tailored properties.
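The guarantee that SE(3) equivariance provides can be checked numerically. Rotating and translating the input coordinates must rotate and translate the output in the same way: f(xRᵀ + t) = f(x)Rᵀ + t, in a row-vector convention. The sketch below uses a generic coordinate update of the kind found in equivariant graph neural networks. In this update, atoms move along pairwise difference vectors weighted by functions of interatomic distances. It is an assumption for illustration and not the APMixer architecture. NumPy is assumed to be available.

```python
import numpy as np

def equivariant_update(x, weight=0.1, length_scale=10.0):
    """One EGNN-style coordinate update for an (N, 3) array of positions."""
    diff = x[:, None, :] - x[None, :, :]                  # pairwise difference vectors (N, N, 3)
    d2 = np.sum(diff ** 2, axis=-1, keepdims=True)        # squared distances: rotation/translation invariant
    phi = weight * np.exp(-d2 / length_scale)             # invariant pair weights
    return x + np.sum(diff * phi, axis=1)                 # move each atom along its difference vectors

def random_rotation(rng):
    """Random proper rotation matrix (det = +1) from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))                           # fix the sign ambiguity of QR
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]                                # turn a reflection into a rotation
    return q

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
R = random_rotation(rng)
t = rng.normal(size=3)
lhs = equivariant_update(x @ R.T + t)                     # transform, then update
rhs = equivariant_update(x) @ R.T + t                     # update, then transform
```

Because the update depends only on difference vectors and distances, `lhs` and `rhs` agree to numerical precision. Adding a fixed, direction-dependent displacement instead would break rotation equivariance.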
Within the landscape of neurodegenerative diseases, MobiDB is a resource that provides a comprehensive view of polypeptide disorder.165 This repository compiles experimental and computational data on intrinsically disordered proteins (IDPs) and regions (IDRs), including sequences, structures and functional annotations. It serves experimental scientists seeking detailed information about individual protein systems. It also serves bioinformaticians who need large, unified protein datasets for building statistical classifiers. More recently, MobiDB has integrated AlphaFold predictions sourced from AlphaFoldDB.166
A recent study highlighting the synergy between modern computational techniques and experiments developed a versatile method for designing proteins that target specific peptide sequences derived from armadillo proteins.167 Without relying on any known structure, the authors used Monte Carlo simulations to construct a hash table of bidentate side-chain–backbone interactions that ensure the stability of the desired protein–peptide interface. Rosetta was then used to design both the protein and peptide sequences while keeping the identified key residues unchanged. To enhance binding affinity and specificity, alanine scanning was performed and binding free energies were determined to select the most favorable binders. These binders were then validated experimentally (e.g. by X-ray crystallography, circular dichroism and biolayer interferometry). For IDPs, a similar approach may aid the initial generation of polypeptide–cyclic peptide complexes that can then be investigated and optimized via molecular dynamics simulations.
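A toy version can illustrate the role of a hash table in this kind of design. Interaction geometries sampled once (in the study, by Monte Carlo) are discretised into bins and stored. Candidate side chains compatible with a given backbone geometry can then be retrieved in constant time. Several elements below are hypothetical simplifications: the three-number geometry descriptor (one distance and two angles), the bin widths and the stored entries. Real implementations typically hash richer descriptors, such as relative backbone frames.

```python
import math
from collections import defaultdict

def geometry_key(distance, angle1, angle2, d_bin=0.5, a_bin=15.0):
    """Discretise a (hypothetical) distance/angle descriptor into a hashable bin."""
    return (math.floor(distance / d_bin), math.floor(angle1 / a_bin), math.floor(angle2 / a_bin))

class InteractionHash:
    """Maps binned backbone geometries to side chains that form bidentate contacts there."""

    def __init__(self):
        self.table = defaultdict(list)

    def add(self, distance, angle1, angle2, residue):
        self.table[geometry_key(distance, angle1, angle2)].append(residue)

    def query(self, distance, angle1, angle2):
        # .get avoids inserting empty bins into the defaultdict on lookup
        return list(self.table.get(geometry_key(distance, angle1, angle2), []))

# Populate with (hypothetical) favourable geometries found during sampling
hash_table = InteractionHash()
hash_table.add(3.0, 30.0, 100.0, "ASN")
hash_table.add(3.1, 35.0, 95.0, "GLN")
hash_table.add(4.2, 80.0, 20.0, "ARG")
```

During design, each backbone position on the target peptide is converted to the same descriptor and looked up. Only the returned side chains need to be placed and scored.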
An alternative approach, known as hallucination, inverts deep neural networks trained to predict native protein structures in order to design novel protein sequences and structures.168 Briefly, structure prediction networks encode learned representations of protein structure, including amino acid interactions and statistical relationships between residues. Hallucination exploits this information to create realistic protein backbones together with their corresponding amino acid sequences. First, random amino acid sequences are input into the trRosetta structure prediction network169 to predict distance maps. Then, Monte Carlo sampling in sequence space is used to refine the sequences and improve the predicted structures. This process generates a diverse array of proteins with varied sequences and structures. To test whether these hallucinations manifest physically, synthetic genes for 129 hallucinated proteins were expressed and the proteins purified. Among these, 27 proteins exhibited circular dichroism spectra consistent with the target structures. The resolved three-dimensional structures of three selected proteins matched the hallucinated models, underlining the potential of the method for de novo protein design.
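The Monte Carlo search in sequence space can be sketched as simulated annealing with the Metropolis criterion. Each step mutates one position and rescores the sequence. Worse sequences are accepted with a probability that shrinks as the temperature drops. In the original work the score is derived from trRosetta's predicted distance maps. Here `toy_score` is a hypothetical stand-in that rewards an alternating hydrophobic/polar pattern, so that the loop runs without any structure prediction network.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVW")

def toy_score(seq):
    """Hypothetical stand-in for a network-derived score: +1 per position matching an alternating pattern."""
    return sum(1.0 for i, aa in enumerate(seq) if (aa in HYDROPHOBIC) == (i % 2 == 0))

def hallucinate(length, score_fn, n_steps=5000, t_start=1.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current = score_fn(seq)
    best_seq, best = list(seq), current
    for step in range(n_steps):
        temp = t_start * (t_end / t_start) ** (step / (n_steps - 1))   # geometric cooling
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)                              # single-point mutation
        new = score_fn(seq)
        # Metropolis criterion (maximising the score)
        if new >= current or rng.random() < math.exp((new - current) / temp):
            current = new
            if current > best:
                best_seq, best = list(seq), current
        else:
            seq[pos] = old                                              # reject: undo mutation
    return "".join(best_seq), best
```

Swapping `toy_score` for a function that queries a structure prediction network would recover the structure of the published procedure, at a much higher cost per step.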
Chroma introduced a generative approach to design proteins with customized structures and functions.170 It combines a diffusion process that incorporates the conformational statistics of polymer ensembles (e.g. dihedral angles, bonds) with a neural architecture for molecular systems based on random graph neural networks. The model can be conditioned on external constraints (e.g. symmetries, substructures and natural language prompts) to generate proteins with specific properties. These include inter-residue distances, distinct structural domains and semantic properties guided by classifiers.
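Contrasting two kinds of Gaussian noise on a 2000-bead chain illustrates why polymer statistics matter in the noising process. With independent noise per bead, as in a standard coordinate diffusion model, a structure noised to a given overall size loses all chain connectivity. Chain-structured (random-walk) noise of the same size keeps consecutive beads about one bond length apart, so noised samples still resemble random-coil polymers. This is a simplified sketch under our own assumptions (an ideal random walk and a 3.8 Å Cα spacing). It is not Chroma's actual covariance model.

```python
import math
import random

def step_for_bond(mean_bond):
    """Per-axis Gaussian std whose 3D step length has the requested mean (chi distribution, 3 dof)."""
    return mean_bond / (2.0 * math.sqrt(2.0 / math.pi))

def random_walk_noise(n, step, rng):
    """Chain-structured noise: each bead is a Gaussian step away from the previous one."""
    coords = [[0.0, 0.0, 0.0]]
    for _ in range(n - 1):
        coords.append([c + rng.gauss(0.0, step) for c in coords[-1]])
    return coords

def iid_noise(n, sigma, rng):
    """Independent noise per bead: no memory of chain connectivity."""
    return [[rng.gauss(0.0, sigma) for _ in range(3)] for _ in range(n)]

def mean_bond_length(coords):
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:])) / (len(coords) - 1)

def radius_of_gyration(coords):
    n = len(coords)
    centre = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, centre) ** 2 for c in coords) / n)

rng = random.Random(0)
chain = random_walk_noise(2000, step_for_bond(3.8), rng)                # Cα-like 3.8 Å spacing
gas = iid_noise(2000, radius_of_gyration(chain) / math.sqrt(3.0), rng)  # same overall size
```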
A recent investigation combined advanced deep learning methods with a Rosetta-based approach to improve the accuracy and efficacy of designed proteins that bind specific target molecules.171 Designs were evaluated by the Cα root mean square deviation (RMSD) of the binder between the Rosetta-designed structures and structures predicted with AlphaFold2126 or RoseTTAFold.172 Deviations larger than 2.0 Å indicate potential design pitfalls. Complemented by pairwise confidence metrics from the structure predictors (such as the predicted aligned error), this criterion separates successful binders from those that do not perform well. The results show that using AlphaFold2 or RoseTTAFold as evaluation filters in the protein design process increases the design success rate 10-fold compared with Rosetta alone.
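The RMSD filter can be sketched in NumPy. The binder's Cα atoms from the predicted structure are superimposed onto the designed model with the Kabsch algorithm. The RMSD is then computed, and designs above the 2.0 Å threshold are flagged. The random coordinates below are placeholders for real structures, and the function names are our own. Pairwise confidence metrics would be applied as a separate filter.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)    # remove translation
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)                # SVD of the covariance matrix H = Pc^T Qc
    d = np.sign(np.linalg.det(Vt.T @ U.T))             # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T            # optimal proper rotation
    diff = Pc @ R.T - Qc
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def passes_rmsd_filter(predicted_ca, designed_ca, cutoff=2.0):
    """True if the predicted binder reproduces the designed one within `cutoff` Å."""
    return kabsch_rmsd(predicted_ca, designed_ca) <= cutoff

rng = np.random.default_rng(0)
designed = rng.normal(scale=5.0, size=(40, 3))              # placeholder binder Cα coordinates
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
same_fold = designed @ Rz.T + np.array([10.0, -3.0, 2.0])   # rigidly moved copy
misfolded = designed + rng.normal(scale=3.0, size=designed.shape)
```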
Other strategies integrate RoseTTAFold172 into denoising diffusion probabilistic models (DDPMs) to design novel proteins with specific structural or functional attributes.173 This effort gave rise to RFdiffusion,174 which uses RoseTTAFold as the denoising network within a generative diffusion model. Briefly, protein backbones are generated from scratch by initializing random residue frames, which RFdiffusion iteratively denoises into a refined prediction. Sequences for these structures are subsequently designed with the ProteinMPNN network.175 RFdiffusion predictions can be steered by incorporating additional information (e.g. partial sequence and fold data) and enhanced through pre-trained weights and the application of loss functions.
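The generative loop of a denoising diffusion model with a structure-predicting denoiser can be sketched on plain 3D coordinates. In RFdiffusion, RoseTTAFold plays the role of this denoiser. Noise is added in the forward process. At each reverse step, the network's prediction of the clean structure x̂₀ is plugged into the Gaussian posterior q(x_{t−1} | x_t, x̂₀). This is a simplified sketch under our own assumptions. It uses a linear β schedule and diffuses translations only, whereas RFdiffusion also diffuses residue orientations. A placeholder `denoiser` argument stands in for the trained network.

```python
import math
import random

def noise_schedule(T, beta_start=1e-4, beta_end=0.02):
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)            # cumulative product abar_t
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    s, n = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    return [[s * c + n * rng.gauss(0.0, 1.0) for c in atom] for atom in x0]

def reverse_sample(n_atoms, denoiser, T=100, seed=0):
    """Ancestral sampling with a denoiser that predicts the clean structure x0 from (x_t, t)."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = noise_schedule(T)
    x = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)]   # start from pure noise
    for t in range(T - 1, 0, -1):
        x0_hat = denoiser(x, t)
        ab, ab_prev = alpha_bars[t], alpha_bars[t - 1]
        # mean and std of the Gaussian posterior q(x_{t-1} | x_t, x0_hat)
        c0 = math.sqrt(ab_prev) * betas[t] / (1.0 - ab)
        ct = math.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab)
        sigma = math.sqrt(betas[t] * (1.0 - ab_prev) / (1.0 - ab))
        x = [[c0 * a + ct * b + sigma * rng.gauss(0.0, 1.0) for a, b in zip(r0, rt)]
             for r0, rt in zip(x0_hat, x)]
    return denoiser(x, 0)                  # final step: take the predicted clean structure

# Usage with a placeholder "perfect" denoiser that always returns a known structure
target = [[0.1, 0.2, 0.3], [1.0, 0.0, -1.0]]
sample = reverse_sample(len(target), lambda x, t: target)
```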
Novel methods for cyclic peptide generation and design are rapidly emerging and might prove useful in the amyloidogenic polypeptide landscape. For instance, RINGER is a macrocycle conformer generator. It is a diffusion-based transformer model that generates conformers of peptide macrocycles conditioned on their sequence.176 AlphaFold has recently been modified to predict the structures of macrocyclic peptides (AfCycDesign), and the predictions have been validated experimentally.177 On the coarse-grained side, CycloPep emerges as a powerful tool to generate cyclic peptides compatible with the MA(R/S)TINI force field.178
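For sequence-based structure predictors, a head-to-tail cyclic peptide differs from a linear one mainly in how sequence separation is measured, because residues 1 and N are neighbours. Below is a minimal sketch of a signed, wrap-around relative position matrix that could replace the linear one in a model's positional encoding. To our understanding this is the idea behind the AfCycDesign modification, but the function names and sign convention are our own assumptions.

```python
def cyclic_offset(i, j, n):
    """Signed shortest sequence separation from residue i to residue j on a ring of n residues."""
    d = (j - i) % n
    return d - n if d > n // 2 else d

def offset_matrix(n, cyclic=True):
    """Relative position matrix: linear (j - i) or wrap-around for head-to-tail cyclic peptides."""
    if cyclic:
        return [[cyclic_offset(i, j, n) for j in range(n)] for i in range(n)]
    return [[j - i for j in range(n)] for i in range(n)]

linear = offset_matrix(7, cyclic=False)
ring = offset_matrix(7)
# linear[0][6] == 6 (termini far apart); ring[0][6] == -1 (termini are bonded neighbours)
```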
Following the recipe introduced throughout this paper, at least four ingredients are required for the successful de novo design of peptides that bind amyloidogenic targets (Fig. 3). First, the target scaffold and, in particular, quantitative distributions of its conformations are necessary.127 High-resolution three-dimensional structures, where available, are excellent starting points. In their absence, deep learning-based methods such as AlphaFold2,126 RoseTTAFold122 or Chroma170 can accurately predict 3D models of protein structures, even under user-specified environmental conditions.179 For IDPs, structure prediction is more challenging because of their native disorder, which is characterised by a rugged free energy landscape.127 Fortunately, existing or predicted structures can serve as starting points for molecular dynamics simulations that yield quantitative conformational distributions. These simulations can be run at full atomistic resolution (if the system size allows) or at the coarse-grained level (for bigger targets or aggregates).2 In the latter case, different methods can be employed to reinstate atomistic detail.180,181 This makes it possible to extract the statistically relevant states of the target for use in subsequent steps.
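Turning simulation output into quantitative conformational distributions usually means estimating populations along a collective variable and converting them to a free energy profile, F(s) = −kT ln P(s) + const. Below is a minimal histogram-based sketch. It assumes unbiased, equilibrated sampling; biased enhanced sampling runs would first need reweighting. It also assumes kT in kJ/mol at about 300 K. The collective variable and sample values are hypothetical.

```python
import math
from collections import Counter

def free_energy_profile(samples, bin_width, kT=2.494):
    """Histogram a collective variable and return {bin_start: F} with the global minimum at 0.

    kT defaults to ~2.494 kJ/mol (about 300 K); empty bins are omitted.
    """
    counts = Counter(math.floor(s / bin_width) for s in samples)
    total = sum(counts.values())
    f = {b: -kT * math.log(c / total) for b, c in counts.items()}
    f_min = min(f.values())
    return {b * bin_width: value - f_min for b, value in sorted(f.items())}

# e.g. radius of gyration values (nm) from a hypothetical trajectory
rg = [1.0] * 30 + [2.0] * 10
profile = free_energy_profile(rg, bin_width=1.0)
```

Minima of such a profile (or of its multidimensional analogue) are natural candidates for the statistically relevant states carried forward to docking.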
Second, the conformations populated by the cyclic peptides in the target-bound and target-unbound states must be characterized quantitatively. The small size of the peptides (below 20 residues) and their cyclic nature often limit structure generation, or even complex prediction, via deep learning approaches such as AlphaFold-Multimer182 or via experimental techniques.131 If the initial peptide–protein complex is known, e.g. from crystal structures183 or from de novo design,184 one can isolate the peptide from the complex. Its conformational space in the unbound state can then be explored via (enhanced sampling) molecular dynamics simulations at full atomistic resolution.131 For a cyclic peptide with unknown bound and unbound conformations, a convenient approach is to build its structure residue by residue in a manner that obeys excluded volume.185 Its conformational space can then be sampled via Monte Carlo simulations and/or relaxed using (enhanced sampling) molecular dynamics simulations to obtain statistically meaningful conformations in solution. However, such an unbound ensemble contains no information on the conformations sampled by the peptide when bound to the target, which may represent a bottleneck when docking it to the target.
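A coarse-grained toy model can illustrate the build-then-sample strategy for an isolated cyclic peptide. One bead per residue is placed on a closed ring that respects excluded volume, followed by Metropolis Monte Carlo moves. All parameters are assumptions for illustration: a 3.8 Å Cα-like bond length, including the head-to-tail closure, and a 4.0 Å hard-core radius between non-bonded beads. A harmonic bond constant and kT are given in arbitrary but consistent energy units. This is not the procedure of ref. 185.

```python
import math
import random

def initial_ring(n, bond=3.8):
    """Planar regular polygon: every consecutive (and last-to-first) distance equals `bond`."""
    radius = bond / (2.0 * math.sin(math.pi / n))
    return [[radius * math.cos(2 * math.pi * i / n), radius * math.sin(2 * math.pi * i / n), 0.0]
            for i in range(n)]

def ring_energy(coords, bond=3.8, k_bond=10.0, r_excl=4.0):
    n = len(coords)
    # harmonic bonds, including the head-to-tail closure bond (n-1, 0)
    e = sum(k_bond * (math.dist(coords[i], coords[(i + 1) % n]) - bond) ** 2 for i in range(n))
    # hard-core excluded volume between non-bonded beads
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue                    # bonded through the ring closure
            if math.dist(coords[i], coords[j]) < r_excl:
                return math.inf
    return e

def sample_ring(n=8, n_steps=10000, step=0.3, kT=0.6, save_every=500, seed=0):
    rng = random.Random(seed)
    coords = initial_ring(n)
    energy = ring_energy(coords)
    snapshots = []
    for s in range(1, n_steps + 1):
        i = rng.randrange(n)
        old = coords[i]
        coords[i] = [c + rng.uniform(-step, step) for c in old]     # single-bead trial move
        new_energy = ring_energy(coords)
        if new_energy <= energy or rng.random() < math.exp(-(new_energy - energy) / kT):
            energy = new_energy                                     # accept (Metropolis)
        else:
            coords[i] = old                                         # reject
        if s % save_every == 0:
            snapshots.append([list(c) for c in coords])
    return snapshots
```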
Third, a key aspect is the structure of the complex. It helps to understand which types of interactions drive the assembly and which residues contribute most to peptide binding and complex stability. Experimentally, a series of crystal structures of cyclic peptide–protein complexes have been resolved,183 but none in complex with amyloidogenic targets. The thermodynamics and kinetics of peptide binding can be measured using methods such as surface plasmon resonance or isothermal titration calorimetry, but neither provides specific information on the binding epitope. Computationally, if representative 3D structures are known, the peptide can be rationally designed and/or docked onto the target. Enhanced sampling or deep learning techniques can then be employed to estimate its binding free energy.186 Alternatively, in the absence of known structures and/or for unstably bound complexes, long molecular dynamics simulations could potentially reveal new binding sites. This approach may be efficient if the amyloidogenic target has a well defined secondary structure as is the case for , or has druggable pockets. However, for polypeptides with a high degree of plasticity, this is a resource-intensive and potentially ineffective strategy that would only slow down peptide design. Machine learning can facilitate peptide design. Corroborated by simulations and/or experiments, it can aid in estimating binding affinities187 and in improving the peptide sequence for optimal binding to the target.171,188 Hence, if combined effectively, computer simulations and machine learning can considerably increase the efficiency of peptide design and optimization, and can therefore speed up drug development.
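Comparing computed binding free energies with surface plasmon resonance or isothermal titration calorimetry measurements requires converting between a dissociation constant and a standard binding free energy. The relevant relation is ΔG° = RT ln(K_d/c°), with c° = 1 M. Below is a small helper, assuming K_d in mol/L, T = 298.15 K and kcal/mol units.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def dg_from_kd(kd_molar, temperature=298.15, c_standard=1.0):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar / c_standard)

def kd_from_dg(dg_kcal, temperature=298.15, c_standard=1.0):
    """Inverse conversion: dissociation constant (mol/L) from a binding free energy (kcal/mol)."""
    return c_standard * math.exp(dg_kcal / (R_KCAL * temperature))

dg_nanomolar = dg_from_kd(1e-9)   # about -12.28 kcal/mol for a 1 nM binder
```

A tenfold tighter K_d corresponds to RT ln 10 ≈ 1.36 kcal/mol at room temperature. This is a useful yardstick for the accuracy a binding free energy method must reach to rank peptide variants.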
The fourth ingredient, prior to clinical advancement, is experimental in vitro and in vivo validation. Given the complementarity of computational and experimental work, an attractive approach would be to integrate the trio (simulations, machine learning and experiments) into a dynamic, iterative engine. For instance, molecular dynamics simulations and deep learning could first be used to predict and optimize protein and peptide conformations, stability and binding affinities. This would aid the selection of lead candidates prior to experimental validation. The results from the trio can then feed into feedback loops37,189,190 that support several goals, thereby speeding up the identification of potent and safe candidates:

- designing novel and improved peptide sequences;
- predicting cyclic peptide bioactivity;
- improving target selectivity;
- anticipating off-target effects.

Such an integration of methodologies can also guide the design and optimization of new experiments and computational work. Furthermore, this approach can significantly reduce the number of experiments required for validation and can increase the homogeneity across experimental data sets (e.g. environmental conditions).39
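The iterative engine described above can be sketched as a simple active-learning loop. A surrogate ranks untested peptide sequences, the top candidates are "measured", and the measurements feed back into the surrogate for the next round. Every component is a hypothetical placeholder. `prior` stands in for a simulation-based score, and `experiment` stands in for an assay returning a binding free energy (lower is better). The surrogate is a k-nearest-neighbour average over Hamming distance rather than a trained machine learning model.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def knn_surrogate(seq, labelled, prior, k=2):
    """Predict a binding score: mean of the k most similar measured sequences, else the prior."""
    if not labelled:
        return prior(seq)
    nearest = sorted(labelled, key=lambda s: (hamming(seq, s), s))[:k]
    return sum(labelled[s] for s in nearest) / len(nearest)

def design_loop(candidates, prior, experiment, n_rounds=3, batch=2):
    labelled = {}                                   # sequence -> measured score
    for _ in range(n_rounds):
        pool = [c for c in candidates if c not in labelled]
        if not pool:
            break
        ranked = sorted(pool, key=lambda c: knn_surrogate(c, labelled, prior))
        for seq in ranked[:batch]:
            labelled[seq] = experiment(seq)         # feedback from the (placeholder) assay
    best = min(labelled, key=labelled.get)
    return best, labelled

# Toy usage: in this invented assay, tryptophans improve binding
def prior(seq):
    return -0.5 * seq.count("W") + 0.1 * seq.count("G")

def experiment(seq):
    return -1.0 * seq.count("W")

candidates = ["AAAA", "AAAW", "AAWW", "AWWW", "WWWW", "GGGG"]
best, measured = design_loop(candidates, prior, experiment)
```

In practice the surrogate would be retrained on all measurements after each round, and candidate selection would balance predicted affinity against uncertainty rather than ranking greedily.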
The concepts and proposed strategies extend beyond drug design for therapeutic applications and could aid adjacent fields such as (bio)material design or controlled drug delivery.193 In essence, it all comes down to gathering and smartly processing information from diverse sources. The aim is to create a digital counterpart of a material that captures its composition, structure, responsiveness to external stimuli, etc.194 From this counterpart, design rules for programmable and adaptable materials can be derived.195
Footnotes
† Small letters indicate D-enantiomeric amino acids.
‡ J is the norleucine amino acid.
§ Small letters indicate D-enantiomeric amino acids.
This journal is © The Royal Society of Chemistry 2024