 Open Access Article
 Open Access Article
      
        
          
            Alexander 
            van Teijlingen
          
        
       a, 
      
        
          
            Daniel C. 
            Edwards
a, 
      
        
          
            Daniel C. 
            Edwards
          
        
       b, 
      
        
          
            Liao 
            Hu
          
        
      b, 
      
        
          
            Annamaria 
            Lilienkampf
b, 
      
        
          
            Liao 
            Hu
          
        
      b, 
      
        
          
            Annamaria 
            Lilienkampf
          
        
       b, 
      
        
          
            Scott L. 
            Cockroft
b, 
      
        
          
            Scott L. 
            Cockroft
          
        
       b and 
      
        
          
            Tell 
            Tuttle
b and 
      
        
          
            Tell 
            Tuttle
          
        
       *a
*a
      
a1Pure and Applied Chemistry, University of Strathclyde, 295 Cathedral Street, Glasgow, G1 1XL, UK. E-mail: tell.tuttle@strath.ac.uk
      
bEaStCHEM School of Chemistry, Joseph Black Building, University of Edinburgh, David Brewster Road, Edinburgh, EH9 3FJ, UK
    
First published on 30th May 2024
Membrane-disrupting and pore-forming peptides (PFPs) play a substantial role in bionanotechnology and can determine the life and death of cells. The control of chemical and ion transport through cell membranes is essential to maintaining concentration gradients. Likewise, the delivery of drugs and intracellular proteins aided by pore-forming agents is of interest in treating malfunctioning cells. Known PFPs tend to be up to 50 residues in length, which is commensurate with the thickness of a lipid bilayer. Accordingly, few short PFPs are known. Here we show that the discovery of PFPs can be accelerated via an active machine learning approach. The approach identified 71 potential PFPs from the 25.6 billion octapeptide sequence space; 13 sequences were tested experimentally, and all were found to have the predicted membrane-disrupting ability, with 1 forming highly stable pores. Experimental verification of the predicted pore-forming ability demonstrated that a range of short peptides can form pores in membranes, while the positioning and characteristics of residues that favour pore-forming behaviour were identified. This approach identified more ultrashort (8-residues, unmodified, non-cyclic) PFPs than previously known. We anticipate our findings and methodology will be useful in discovering new pore-forming and membrane-disrupting peptides for a range of applications from nanoreactors to therapeutics.
An archetypical example of a membrane protein nanopore is provided by alpha-hemolysin (α-HL), which is produced by the bacterium Staphylococcus aureus. α-HL is produced as a monomer (33.2 kDa) that self-assembles into a heptameric pore when inserted into a membrane.1 Insertion of α-HL into many cell membranes is profoundly damaging due to the swift passage of water, K+ ions, ATP, and from small molecules to those as large as 4 kDa through the newly formed channel.9 This transmembrane leakage results in osmotic swelling and cell death through rupture of the membrane. The broad natural scope of pore-forming biomolecules includes functions as cell receptors, molecular transporters, and ion channels for cell regulation.
Structurally, transmembrane channels possess certain characteristic features, which allow for their high stability as oligomers formed in membranes. Membranes are usually constructed of a lipid bilayer formed from phospholipids, sphingolipids, or glycolipids. Hence, nanopores require a variation of hydrophobic and hydrophilic surface functionality for interaction with the amphiphilic membrane constituents and insertion into the lipid layer.10
De novo or consensus design of membrane-spanning peptides has been employed, but these approaches often rely on knowledge of known PFPs.7,10–17 Our approach differs from those used previously as it does not use pre-existing databases of known PFPs. Rather, the active machine learning algorithm predicts the pore-forming ability of all octapeptides and then benchmarks the quality of these predictions using molecular dynamics simulations on the top ten predicted sequences. The complete sequence space of 25.6 billion octapeptides is explored in our approach via iterative feedback cycles of simulation and prediction.
PFPs are a diverse group of peptides that can create pores in the cell membrane of microorganisms, leading to their destruction. Naturally occurring pore-forming peptides (PFPs) are typically short (10 to 50 residues),18 cationic (+2 to +9),14,18 and amphiphilic.13 PFPs usually take one of four 3D structures: α-helical such as melittin19 (Apis mellifera) and temporin-SHf20 (a C-terminus modified octapeptide); β-stranded such as human α-defensin;15 αβ which contain both α and β regions and “other” such as the 13-residue peptide indolicidin, which is active against pathogenic bacteria, fungi and HIV.21 Typically, PFPs disrupt the membrane of a cell or viral envelope via an initial surface aggregation step that induces structural changes and/or assembly of the peptides. With increasing surface concentration, the peptides aggregate further and shift from parallel conformations to transmembrane conformations (barrel stave mechanism).22 Alternatively, peptides may assemble hydrophobically but leave hydrophilic regions exposed to solution such that they can sink through the bilayer (sinking raft mechanism).23 Toroidal pores differ from these two mechanisms as they are formed by the re-arrangement of hydrophilic phospholipid head groups into the centre of the pore. This re-arrangement is facilitated by attraction to polar or cationic peptide residues in the centre of the pore, this provides a mechanism by which less hydrophobic peptides such as magainin II24 and melittin22 can act as PFPs. Xu et al.25 showed that pores are stabilised by the reduced diffusion coefficient of phospholipid molecules associated with the pore complex. The fourth model of membrane-disruption is the carpet model. In this model, the peptides act as a detergent and destroy the bilayer by breaking it down into micelles and other phospholipid–peptide complexes, which typically requires a much higher concentration of peptides than the other models.26
Access to a greater array of robust channel-forming peptides has long been desired for both therapeutic and analytical purposes.4 Previous studies looking at designing PFPs have tended to focus on larger peptides that form pores via the assembly of a small number of monomers. For instance Vorobieva et al. used Rossetta protein structure predictions to design an octamer β-barrel that aggregates on the surface of the bilayer in an unfolded form that span the bilayer and folds into transmembrane β-barrel.27 Other existing approaches that utilize genetic algorithms and machine learning also rely on known PFPs as a starting point for their optimization approach.7,10–16,28–32 Recently, Woolfson and co-workers33 reported the de novo design of a water-soluble 30 amino acid peptide that formed membrane-spanning α-helical peptide barrels. Whilst transmembrane peptide channels have been designed, the successful designs tend to be at the longer end of the sequence length range14,33–37 (i.e., commensurate with the thickness of the bilayer) and accordingly require comprehensive synthetic and purification procedures.
Herein we provide a method for searching the octapeptide chemical space for peptides that change the bilayer morphology such as inducing curvature or perpendicular pressure. The extent of morphological change is assessed using the area per lipid (APL), these scores are then fed into an active learning cycle. The model is only trained on data that it itself selects. This cycle is then iterated until high-scoring peptides are identified by the active machine learning model (Fig. 1). This is achieved by using the extreme gradient boosting tree-based learning algorithm (XGB)38 to score each of the sequences. The area per lipid (APL) of the ten top-scoring peptides was then calculated using coarse grain molecular dynamics simulations (details of the selection procedure are provided in the ESI†). An active learning approach was employed in which the sequences selected for simulations were self-directed via an iterative feedback cycle (Fig. 1). Such an approach was chosen due to the computational expense of the simulations and the size of the chemical space (25.6 billion possible octapeptide sequences). Hence, the active learning approach helps to generate a dataset with more informative training points than the alternative of randomly or uniformly sampling of the vast sequence space, which likely contains a very large majority of non-pore forming peptides. In this work we specifically focus on discovering shorter peptides capable of assembling into membrane-spanning pores. To allow an unbiased search of this sequence space we did not place any constraints on the peptide sequence other than the length. We targeted octapeptides for this study since they would be easy to synthesize, but large enough to facilitate self-assembly into membrane-spanning superstructures. Meanwhile, the 25.6 billion octapeptide sequence space provides an excellent opportunity for demonstrating the ability of machine learning to accelerate sequence design.
|  | ||
| Fig. 1 Active machine learning for discovery of pore-forming octapeptides (A). A total sequence space of 208 octapeptides was generated. Full details of the active machine learning method are available in the ESI.† (B) 10 octapeptide sequences selected by the training model were subjected to three-stage molecular dynamics simulations to determine their ability to penetrate a phosphatidylcholine/1-palmitoyl-2-oleoyl-sn-glycero-3-phospho-L-serine (POPC/POPS) membrane contained within a water box. Details of the system setup and simulation details are available in the Methods section. (C) The pore-forming ability of each peptide was scored based on the average area per lipid in the final simulation frame across duplicate runs. (D) Area per lipid scores were fed back into the training model for the selected sequences. (E) The pore-forming ability of a selection of high scoring peptides after 7 iterative cycles were tested experimentally in planar lipid bilayers. | ||
![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) :
:![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 20 bilayer was used to further accelerate the binding of peptides to the surface of the bilayer by electrostatic interactions as has been used in previous computational works.22,39,40 Following this, the steering bias on the peptides was released and the system allowed to equilibrate so that any peptides that did not interact favourably with the surface can escape back into the surrounding aqueous medium. The equilibration was then followed by a constant pH molecular dynamics (CpHMD) simulation,41,42 which was necessary to account for the significant change in the dielectric environment that the peptide experiences when moving from the solvent to the interior of the bilayer. We also tested neutralized peptides with MD rather as opposed to CpHMD, however we found that method to be overly permissive in allowing peptides to enter the bilayer (peptide 3 formed a pore using that method while CpHMD and experimental studies confirmed it should not). At the completion of this simulation the area per lipid (APL) score for the lipid bilayer was calculated, compared and fed back into the active machine learning algorithm, which is retrained using the new data. The use of APL is based on the proposal that membrane thinning precedes pore formation.43 Hence, membrane thinning and the associated increase in APL should predict the pore-forming ability of a peptide. Other measurements such as bilayer surface area were tested and were equally indicative of pore-formation (ESI,† Fig. S17–S20). The train-predict-test loop was then repeated until the APL score converged. CGMD simulations were performed using the GROMACS44 software package and CpHMD simulations were performed using NAMD.45 The phospholipid bilayers were built using INSANE.46 All simulations were performed using the MARTINI (v2.1) forcefield47,48 with helical secondary structure for the peptides. Full details of the workflow, comparison of machine learning models and the simulations are available in the ESI.† APL values were calculated for each system as the multiple for the X and Y dimensions divided by the number of lipids per leaflet (η/2).
20 bilayer was used to further accelerate the binding of peptides to the surface of the bilayer by electrostatic interactions as has been used in previous computational works.22,39,40 Following this, the steering bias on the peptides was released and the system allowed to equilibrate so that any peptides that did not interact favourably with the surface can escape back into the surrounding aqueous medium. The equilibration was then followed by a constant pH molecular dynamics (CpHMD) simulation,41,42 which was necessary to account for the significant change in the dielectric environment that the peptide experiences when moving from the solvent to the interior of the bilayer. We also tested neutralized peptides with MD rather as opposed to CpHMD, however we found that method to be overly permissive in allowing peptides to enter the bilayer (peptide 3 formed a pore using that method while CpHMD and experimental studies confirmed it should not). At the completion of this simulation the area per lipid (APL) score for the lipid bilayer was calculated, compared and fed back into the active machine learning algorithm, which is retrained using the new data. The use of APL is based on the proposal that membrane thinning precedes pore formation.43 Hence, membrane thinning and the associated increase in APL should predict the pore-forming ability of a peptide. Other measurements such as bilayer surface area were tested and were equally indicative of pore-formation (ESI,† Fig. S17–S20). The train-predict-test loop was then repeated until the APL score converged. CGMD simulations were performed using the GROMACS44 software package and CpHMD simulations were performed using NAMD.45 The phospholipid bilayers were built using INSANE.46 All simulations were performed using the MARTINI (v2.1) forcefield47,48 with helical secondary structure for the peptides. Full details of the workflow, comparison of machine learning models and the simulations are available in the ESI.† APL values were calculated for each system as the multiple for the X and Y dimensions divided by the number of lipids per leaflet (η/2).![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) :
:![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 3
3![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) :
:![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 3
3![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) :
:![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 3
3![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) :
:![[thin space (1/6-em)]](https://www.rsc.org/images/entities/char_2009.gif) 1.5, 1 mL per 100 mg resin) were added to the resin (pre-swollen in DCM) and the mixture was shaken for 3 h at rt. The solution was filtered into ice-cold Et2O (50 mL) and the precipitate collected by centrifugation (15 min at ∼7200 rpm). The crude peptides were purified by semi-preparative RP-HPLC using an Agilent 1100 system (detection at 220 nm) with a Zorbax Eclipse XDB-C18 RP column (250 × 9.4 mm, 5 μm), with a flow rate of 2 mL min−1, eluting with a gradient of H2O and acetonitrile (from 5/95 to 95/5) over 20 min, followed by a 5-min isocratic elution. The lyophilized peptides were characterized by analytical RP-HPLC (Agilent 1100 modular HPLC system (detection at 220 nm) with a Phenomenex Kinetex® 5 μm XB-C18 100 Å column (5 cm × 4.6 mm) with a flow rate of 1 mL min−1, eluting with a gradient of H2O and acetonitrile (from 5/95 to 95/5) over 6 min, followed by a 3 min isocratic elution). The peptides 1 and 3–13 were characterized by matrix-assisted laser desorption/ionisation–time of flight mass spectrometry (Bruker UltrafleXtreme MALDI TOF/TOF mass spectrometer) using α-cyano-4-hydroxycinnamic acid as a matrix. The instrument was calibrated by the ‘nearest neighbour’ method, using Bruker peptide calibration standard II as reference masses. Peptide 2 was characterized using electrospray ionization mass spectrometry (ESI-MS) on an Agilent Technologies LC/MSD quadrupole 1100 series mass spectrometer.
1.5, 1 mL per 100 mg resin) were added to the resin (pre-swollen in DCM) and the mixture was shaken for 3 h at rt. The solution was filtered into ice-cold Et2O (50 mL) and the precipitate collected by centrifugation (15 min at ∼7200 rpm). The crude peptides were purified by semi-preparative RP-HPLC using an Agilent 1100 system (detection at 220 nm) with a Zorbax Eclipse XDB-C18 RP column (250 × 9.4 mm, 5 μm), with a flow rate of 2 mL min−1, eluting with a gradient of H2O and acetonitrile (from 5/95 to 95/5) over 20 min, followed by a 5-min isocratic elution. The lyophilized peptides were characterized by analytical RP-HPLC (Agilent 1100 modular HPLC system (detection at 220 nm) with a Phenomenex Kinetex® 5 μm XB-C18 100 Å column (5 cm × 4.6 mm) with a flow rate of 1 mL min−1, eluting with a gradient of H2O and acetonitrile (from 5/95 to 95/5) over 6 min, followed by a 3 min isocratic elution). The peptides 1 and 3–13 were characterized by matrix-assisted laser desorption/ionisation–time of flight mass spectrometry (Bruker UltrafleXtreme MALDI TOF/TOF mass spectrometer) using α-cyano-4-hydroxycinnamic acid as a matrix. The instrument was calibrated by the ‘nearest neighbour’ method, using Bruker peptide calibration standard II as reference masses. Peptide 2 was characterized using electrospray ionization mass spectrometry (ESI-MS) on an Agilent Technologies LC/MSD quadrupole 1100 series mass spectrometer.
      
      
        |  | ||
| Fig. 2 Characteristics and experimental validation of hit peptides. (A) Probability heatmap of amino acid residues, categorized by residue characteristic, and position in octapeptide of hit peptides selected by the active machine learning cycle (ESI,† Table S4). (B) Planar lipid bilayer ion current recordings of octapeptides 1–3 (positive and negative controls) and sequences selected by the active machine learning algorithm 4–10. Each peptide was added (10 μL of 50 μM, final concentration ∼0.8 μM) into a buffer-containing well (1 M KCl, 10 mM MOPS, pH 7.4) on the side of the POPC planar lipid bilayer containing the positive electrode. A potential difference +10 mV was applied, and the resulting ion currents shown were recorded (2 kHz lowpass Bessel filter). Snapshots of the final frame from the molecular dynamics’ simulations of peptide-membrane interactions; peptides are represented as blue beads with the bilayer in grey, and water beads are omitted for clarity. | ||
The selected peptides were obtained as lyophilized solids with >90% purity following solid-phase peptide synthesis and HPLC purification (ESI†). The ability of the peptides to form pores in 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) bilayers was assessed using ion currents recorded using a patch-clamp amplifier (ESI†).33,51,52
POPC lipid membranes exhibited disruption and leakage upon addition of 1 as determined by the observation of current flow through the membrane (0.8 μM peptide conc. in well, Fig. 2B). Subsequent addition of further aliquots of the positive control led to complete destruction of the membrane (peptide conc. in well ∼3 μM, ESI,† Fig. S4). Reassuringly, the lipid membrane remained experimentally unperturbed for over an hour at +10 mV following the addition of the negative control peptide 2, even at 10-times the concentration used for the positive control peptide (to a final conc. of 8 μM in the well added over 5 minutes). The second negative control peptide 3 that was predicted not to form channels by the MD simulations was also confirmed experimentally (Fig. 2B).
Peptides 4–10 were all experimentally observed to cause membrane-disruption as indicated by significant current flow upon the addition of peptide-containing solutions. In some cases, such as that of peptide 4 (VCVYWWRT), stable discrete current levels were observed, which supports the hypothesis of stable channel formation. However, the appearance of apparent discrete channels was generally observed prior to major membrane-disruption and eventual bursting of the bilayer even under relatively low applied potential difference (10–50 mV, see ESI,† Fig. S5). Whilst the ion traces indicate that the channels are transient and stochastic, it is remarkable that all selected sequences demonstrated disruptive potential against stable POPC bilayers.
Model A (see ESI,† Table S2 for models tested), repeatedly selected octapeptides with a YYYY motif, which despite producing relatively high APL scores, did not form pores in the simulations, except in one case. Models B and C yielded a far more varied selection of amino acids and motifs. Most of the short peptide sequences identified through our active machine learning protocol have a net charge of +1, which contradicts previous rules defined for longer peptides. However, since the aim is to identify short sequences, the overall positive charge per amino acid is still relatively high. For example, an octapeptide with a +1 charge has a charge per amino acid of +0.125. In comparison, melittin has five positively charged residues with a charge per amino acid of +0.19, LL-37 has a charge per amino acid of +0.16, and magainin-II has a charge per amino acid of +0.13. Based on these observations, a PFP should have a charge per amino acid of approximately +0.1 to +0.2.
Previous work has investigated specific sequences and demonstrated the different role that specific amino acids can have on the ability of peptides to form pores. For example, Cutrona, et al.53 demonstrated that arginine improves membrane translocation relative to lysine and MacCallum, et al.54 have provided residue-specific membrane interaction scales that report the relative affinity of peptide side chain mimicking small molecules for different portions of the bilayer. The analysis of the sequences obtained through the active machine learning protocol show that the four amino acids with the highest relative affinity for the bilayer core (I, V, L, & F) also contribute the most to area per lipid (Fig. 2A).55 Additionally, we observed that anionic residues (D/E), followed by polar uncharged residues (N, Q, S, T & H), have the least effect on APL due to their lower affinity for the bilayer.
The values for the amino acid positional probabilities within the octapeptide were calculated (Fig. 2A) over all systems studied, including both the random selection and active machine learning selections. These probability distributions suggested that the positioning of specific residues within the hit sequences may also be important. It is worth noting that a residue at a terminal position does not necessarily indicate its presence within the head group region of the bilayer, as multiple octapeptides are required to bridge the bilayer due to their short length. Our analysis suggests that cationic residues contribute more to pore-forming ability when positioned in the latter half of the octapeptide, closer to the C-terminus. However, it must be noted that these design rules are very generic and peptides such as peptide 3, which seem like perfect candidates based on these rules, may not be active. The specific order of the amino acids is also a very important characteristic, as shown in Fig. 3, two isomers of peptide 11 were tested and found not to produce stable pores.
Remarkably, peptide 11 was found to form discrete transmembrane channels, as indicated by the stepwise ∼4 pA increases in the current shown in Fig. 3A. Histogram analysis revealed the conductance of the three steps corresponded to 0.4, 1, and 1.7 nS (by comparison, α-hemolysin has a conductance of 1 nS). This suggests that either three different sizes of pores were inserted or that a single pore was changing its size. However, histogram analysis of the current change upon initial channel formation from an intact membrane had a normal distribution across a relatively narrow conductance range (Fig. 3C). This may point towards subsequent current increases being attributed to a single pore increasing in size during a sinking raft mechanism, as suggested by the MD simulation (Fig. 3D). The current/voltage response of these channels (Fig. 3B) was linear and symmetrical in the positive and negative regions (+10 to –10 mV), which indicates a lack of ion selectivity and a randomized orientation of the peptides in the bilayer. Since the octapeptides are too short (∼15 Å) to span the bilayer (∼25–30 Å) and were only added on one side of the bilayer, this indicates that the peptides must be able to diffuse between both leaves of the membrane. This diffusion requirement may also explain why largely apolar sequences with a charge per residue of +0.1 to +0.2 are favoured for channel formation. Continued exposure of the lipid bilayer to 1.6 μM peptide 11 led to rupture of the membrane within 5 minutes, which could not be reformed. To our knowledge this is the first observation of discrete single-channels formed from an 8-residue peptide. Similar behaviour was observed in 1 M KCl, NaCl and CsCl albeit with decreased channel stability (ESI,† Fig. S6). Channel formation was also observed in 1,2-diphytanoyl-sn-glycero-3-phosphocholine bilayers (DPhPC, ESI,† Fig. S7), which is frequently used in single-channel detection methods.51
Both peptide 11 and the positive control peptide 1 contain R in position 6, which is not particularly favoured in the probability heat map shown in Fig. 2A, yet channel-forming ability was confirmed in the experiments and MD simulations (Fig. 2B and 3). Hence, we synthesized two sequence-scrabbled variants of peptide 11 and performed retrospective MD modelling on them. The MD simulations correctly predicted that both the scrambled (12) and reversed (13) peptides possessed membrane-disrupting capacity but lacked the remarkable pore-forming ability of the parent peptide 11 to form discrete stepwise channels (Fig. 3E).
| Footnote | 
| † Electronic supplementary information (ESI) available: Peptide synthesis with RP-HPLC and MS characterisation, machine learning procedures, experimental details and additional computational and planar lipid bilayer experiment results. See DOI: https://doi.org/10.1039/d4cp01404a | 
| This journal is © the Owner Societies 2024 |