Mapping activity elements of protegrin antimicrobial peptides by HomoSAR

Maheshkumar R. Borkar (a), Raghuvir R. S. Pissurlenkar (b) and Evans C. Coutinho* (a)
(a) Department of Pharmaceutical Chemistry, Bombay College of Pharmacy, Kalina, Santacruz (E), Mumbai 400098, India. E-mail: evans@bcpindia.org; Fax: +91-22-26670905; Tel: +91-22-26670905
(b) Molecular Simulations Group, Department of Pharmaceutical Chemistry, Goa College of Pharmacy, Panaji, Goa 403001, India

Received 21st July 2015, Accepted 9th September 2015

First published on 9th September 2015


Abstract

Antimicrobial peptides (AMPs) are naturally occurring small peptides that form an innate part of the host's defense mechanism. They are active against both Gram-negative and Gram-positive bacteria, various viruses, fungi, and parasites. There is little consensus in the amino acid sequences of AMPs, but they do share some definite features, such as relative hydrophobicity and a positively charged amphipathic structure, that have been associated with their biological activity. Optimization of the activity and specificity of AMPs using large peptide libraries is a tedious and expensive route. In this venture, QSAR can be used to reveal the structural features that should be incorporated in the design of new AMPs. However, within the realm of QSAR, 3D-QSAR of peptides is an overwhelming task owing to the sheer number of conformational degrees of freedom of peptides. To overcome this, we propose the use of a validated 2D-QSAR technique named HomoSAR that is specifically designed for peptide QSAR. Based on homology principles and similarity techniques, it extracts the information needed from a set of peptides to elucidate the underlying structure–activity relationships. The present work is a comprehensive study on a dataset of protegrin antimicrobial peptides isolated from porcine leukocytes, which have a broad spectrum of activity against both Gram-positive and Gram-negative bacteria, as well as the fungus C. albicans and the HIV-1 virus. The HomoSAR models for antimicrobial activity against six different species highlight two major determinants of activity. Firstly, the optimal length of protegrins for exhibiting broad-spectrum antimicrobial activity against bacteria and fungi is 16 residues. Secondly, for antimicrobial activity against the yeast C. albicans, it is the electronic property that should be tuned to modulate activity; this is not a major attribute for either Gram-negative or Gram-positive bacteria.


Introduction

Almost all multicellular organisms produce endogenous peptides with antimicrobial activity (AMPs) to protect themselves from pathogenic microbes. Cathelicidins and defensins are two major groups of antimicrobial peptides found in humans. AMPs show broad spectrum antimicrobial activity against various microorganisms, including both Gram-positive and Gram-negative bacteria, various fungi, parasites and viruses. AMPs can be divided into several categories based on their amino acid composition, size and secondary structures; such as peptides with α-helices e.g. magainin, cecropin, and pexiganan; β-sheets stabilized by disulfide bridges e.g. α-, β-defensins and protegrin; peptides with extended structures e.g. indolicidin, Bac5 and Bac7 and finally the loop AMPs e.g. bactenecin.1–4

The general features of AMPs are as follows: they have short sequences of 12 to 50 amino acids and are amphipathic in nature, with large diversity in their sequences and structures. They carry at least two positive charges from arginine (Arg) and lysine (Lys) residues, which enable them to act directly on the negatively charged cell wall and phospholipid membranes of microorganisms. The result is an accumulation of the AMPs on the membrane surface, which causes displacement of native Ca2+ and Mg2+ ions. The hydrophobic portion of the peptide, in turn, interacts with hydrophobic components of the membrane and, together with the positive charges, is responsible for the disruption of the membrane, leading to bacterial death.3,5–7

Bacterial infections are amongst the most common causes of human disease and the third leading cause of death worldwide; together with the emergence of multidrug-resistant (MDR) forms, they pose a grave and growing threat to public health. AMPs are currently of great interest as promising alternatives to conventional antibiotics. Their unique mechanism of action lends a low proclivity for the development of resistance. Furthermore, AMPs have also been implicated in cancer and inflammatory disorders besides their prominent role in infectious diseases.8–10 Targeting the bacterial cell membrane is a promising approach to combat drug resistance, as it is hard for bacteria to develop resistance against membrane-targeting molecules.

Protegrins are small peptides, about 16 to 18 amino acid residues in length, isolated from porcine leukocytes.11 There are five known naturally occurring porcine protegrins, PG-1 to PG-5, and several derivatives of protegrins have been synthesized.12–15 The structure of protegrin-1 has been published and is available in the Protein Data Bank (PDB ID: 1PG1). It is composed of 18 amino acids (RGGRLCYCRRRFCVCVGR, numbered 1 to 18 from the N-terminus) with a high content of cysteine (Cys) and several positively charged arginine (Arg) residues. The NMR structure of PG-1 reveals a β-hairpin with two antiparallel β-strands connected by a turn and stabilized by two interstrand disulfide bonds (S–S) between the cysteine residues Cys6–Cys15 and Cys8–Cys13.16 Protegrins have a broad spectrum of activity against both Gram-positive and Gram-negative bacteria including E. coli, P. aeruginosa, and N. gonorrhoeae, the fungus C. albicans and the HIV-1 virus; this spectrum has attracted researchers to tap their full potential.11,12,17

Several experimental and computational studies by a number of research groups have attempted to explain the mechanism of action of the protegrins. These include molecular dynamics simulations of protegrins in micelles18,19 and in lipid bilayer membranes,20–24 thermodynamic calculations and potentials of mean force,25–27 molecular dynamics simulations of protegrin pores28–31 and models of conductance and biological effects of protegrin pores.30,32–34 All these have been reviewed by Bolintineanu and Kaznessis.15 Besides these computational studies, which deal with physical descriptions of the mechanism of action of protegrin, there are some reports that use QSAR to explain the biological activity of the peptides through correlation with their physicochemical properties. Ostberg and Kaznessis14 carried out a 3D-QSAR study on 55 protegrins and synthetic analogs active against six microbial species; the reported model correlating activity with physicochemical properties leaves much to be desired in terms of its statistics. Along the same lines, Langham et al.35 developed QSAR models to provide insight into the mechanism of the cytotoxic action of protegrin and its analogs, but the models do not shed any light on the structure–activity relationship. A third QSAR analysis, of the antimicrobial and haemolytic activities of porcine protegrin-1 (PG-1) mimetics (cyclic cationic peptides with a β-hairpin fold), was reported by Frecer.36 None of these studies has developed a unified model for activity or explained the features that distinguish activity against Gram-positive bacteria, Gram-negative bacteria and fungi. These unanswered questions suggest the need for a model that can better capture the basic structure–activity relationship of AMPs.
With this in focus, we report in this paper the use of HomoSAR to build models that delineate the features that distinguish Gram-positive from Gram-negative activity and bacterial from fungal activity.

We have previously reported some new 2D- and 3D-QSAR methodologies to help understand the structure–activity relationships of peptides; these include HomoSAR,37,38 Comparative Residue Interaction Analysis (CoRIA) and its variants39 and eQSAR.40 Of these methods, HomoSAR, a 2D-QSAR technique, is simple and straightforward and has been quite successful in establishing the structure–activity relationships underlying peptide datasets.37,38 The strength of this methodology is that the difficulty of obtaining a unique alignment of peptides in 3D space is reduced to the much simpler alignment in 2D space. The method is an integrated approach that adopts the principles of comparative protein modeling (homology modeling) in conjunction with the QSAR formalism to design and predict the activity of new peptide sequences, irrespective of their size and length. In this study, we have built several models that explain the activity of protegrin peptides and their analogs using the HomoSAR formalism. A schematic representation of the steps involved in HomoSAR is shown in Fig. 1.


Fig. 1 Schematic representation of the steps involved in HomoSAR.

Material and methods

Dataset

In the present work, HomoSAR models have been built for the dataset reported by Ostberg and Kaznessis.14 This dataset is a compilation of numerous experimental studies on natural and synthetic protegrins carried out by Lehrer and coworkers11,41–44 and other research groups.12,45,46 The activities, reported as MIC in μg ml−1, were converted to molar concentrations and the negative logarithm was then taken; in this transformation, a higher value implies greater activity of the peptide. The sequences of the protegrin peptides and their corresponding biological activities against six different species, namely Gram-negative bacteria (Escherichia coli, Pseudomonas aeruginosa, Neisseria gonorrhoeae F-62 and FA-19), a Gram-positive bacterium (Listeria monocytogenes) and a yeast (Candida albicans), are listed in the ESI in Tables S-1 to S-6.
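
As a rough illustration of the activity transformation described above, the following Python sketch converts an MIC in μg ml−1 to a molar concentration and takes the negative logarithm. A base-10 logarithm is assumed, and the function name and the example MIC and molecular weight are hypothetical values chosen for illustration; they are not taken from the dataset.

```python
import math


def mic_to_log_activity(mic_ug_per_ml, mol_weight_g_per_mol):
    """Return -log10 of the molar MIC (a higher value means a more active peptide)."""
    if mic_ug_per_ml <= 0 or mol_weight_g_per_mol <= 0:
        raise ValueError("MIC and molecular weight must be positive")
    grams_per_litre = mic_ug_per_ml * 1e-3  # 1 ug/ml is the same as 1 mg/l
    molar = grams_per_litre / mol_weight_g_per_mol  # mol/l
    return -math.log10(molar)


# Hypothetical example: an MIC of 2.0 ug/ml for a peptide of 2000 g/mol
example = mic_to_log_activity(2.0, 2000.0)
```

With these illustrative inputs the molar MIC is 1 μM, so the transformed activity is close to 6; halving the MIC raises the value, in line with the statement that higher values imply greater activity.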

HomoSAR methodology

HomoSAR is an integrated approach that combines the principles of homology modeling with QSAR to design and predict the activity of new peptide sequences. In homology modeling with multiple sequence alignment (MSA), one can only compare the similarity across peptides in a dataset at the residue (amino acid) level but cannot relate this similarity quantitatively to activity; QSAR, on the other hand, can establish this relationship. Hence an appropriate amalgamation of these two areas of molecular modeling, viz. comparative protein modeling and QSAR, provides a means to understand variations in peptide sequences with biological activity in a quantitative manner. The steps in HomoSAR are briefly discussed here; the complete protocol is described elsewhere.37
Step 1. Multiple Sequence Alignment (MSA): the protegrin peptides were first subjected to MSA to arrive at a consensus picture of the similarity. Alignment was based on the BLOSUM matrix which gave superior results over the GONNET and PAM matrices (results not reported). The MSA was executed over the ClustalW2 EBI web server.47
Step 2. Similarity indices: in this step, similarity indices are calculated for every amino acid in the peptide sequences against the corresponding amino acids of the reference peptide in the MSA. The reference peptide is usually the most active peptide in the dataset.

The similarity index (S) for an amino acid at the ‘ith’ position in peptide ‘B’ (query peptide) in relation to peptide ‘A’ (reference peptide) for a given physicochemical property (P) is given by the equation:

 
[equation image: c5ra14402g-t1.tif] (1)

SAB[P][i] is the similarity between peptides ‘A’ and ‘B’ at the ‘ith’ position in the peptide sequences based on the physicochemical property [P]; ‘PAi’ and ‘PBi’ are the physicochemical properties of the amino acids in peptides ‘A’ and ‘B’ at the ‘ith’ position, respectively, and the denominator is a normalizing factor. The similarity ranges from −1 to +1. In the QSAR table, the similarity indices calculated for a particular property [P] for all sequences in the training and test sets form the X-variables that are correlated with the biological activity (Y-variable). The influence of neighboring amino acids on the activity through 1–2 and 1–3 interactions is also accounted for and is computed as follows:

[equation image: c5ra14402g-t3.tif] (2)

[equation image: c5ra14402g-t4.tif] (3)

In addition, three other variables (SS, DS and TS) were also calculated:

[equation image: c5ra14402g-t5.tif] (4)

[equation image: c5ra14402g-t6.tif] (5)

and

[equation image: c5ra14402g-t7.tif] (6)
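
Since the exact form of eqn (1) is not reproduced in this text, the Python sketch below only illustrates the idea of a position-wise similarity index. It assumes a Hodgkin-type index, 2·PA·PB/(PA² + PB²), chosen here because it is normalized and ranges from −1 to +1; this functional form, the toy property values and all function names are assumptions made for illustration and should not be read as the HomoSAR definition (see ref. 37 for the actual protocol). The final sum over positions mirrors the kind of whole-peptide descriptor that the text describes as a sum over the entire peptide.

```python
def similarity_index(p_a, p_b):
    """Hodgkin-type similarity of two property values (assumed form, range -1 to +1)."""
    denom = p_a ** 2 + p_b ** 2
    if denom == 0.0:
        return 1.0  # both values are zero: treated as identical (an assumption)
    return 2.0 * p_a * p_b / denom


# Toy property scale, for illustration only (not values from the amino acid index database)
TOY_PROPERTY = {"R": 1.0, "K": 1.0, "D": -1.0, "L": 0.5, "G": 0.0}


def position_similarities(reference, query, prop):
    """Similarity of each aligned residue of the query to the reference peptide."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [similarity_index(prop[a], prop[b]) for a, b in zip(reference, query)]


# Hypothetical aligned fragments, gap-free for simplicity
sims = position_similarities("RGRL", "KGDL", TOY_PROPERTY)
ss = sum(sims)  # whole-peptide sum of the single-residue similarities
```

In this toy case a conservative Arg to Lys change keeps the index at +1, whereas Arg to Asp drives it to −1, which is the behaviour one would expect of any index that rewards similarity to the reference residue.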

Step 3. Regression analysis: the final step in building the HomoSAR model is the statistical correlation of the similarity indices with the biological activity. This is achieved through the genetic function approximation/partial least squares (G/PLS) method. The statistical models were generated using the program Cerius2 v4.11 (Accelrys Inc., USA) running in a Linux Enterprise WS 4.9 OS environment. The HomoSAR models were built with the following parameters: population size – 500, maximum number of generations – 25 000, generation gap – 0.8, crossover frequency – 0.5, smoothness value (d) – 1.0, and PLS components – 3 to 5, while keeping all other parameters at their default values. The scaling factor (value − x̄)/SD was used to scale/normalize the SS, DS and TS (eqn (4)–(6)) components of X, where ‘value’ is the descriptor value, x̄ is the mean of each descriptor and SD is the standard deviation.
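
The scaling of the SS, DS and TS descriptors by their mean and standard deviation can be sketched in Python as below. This is a minimal illustration under stated assumptions: the sample standard deviation is used (the text does not say whether the sample or population form was applied), and the function name and input values are hypothetical.

```python
import statistics


def scale_descriptor(values):
    """Centre a descriptor column on its mean and divide by its standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; assumption, see the text above
    if sd == 0:
        raise ValueError("descriptor is constant and cannot be scaled")
    return [(v - mean) / sd for v in values]


# Hypothetical descriptor column for three peptides
scaled = scale_descriptor([1.0, 2.0, 3.0])
```

After scaling, the column has zero mean and unit standard deviation, which puts summed descriptors of very different magnitudes on a comparable footing in the regression.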

Cross-validation by the leave-one-out (qLOO2), leave-group-out (qLGO2), bootstrap (rBS2) and least squares error (LSE) methods was implemented to assess the robustness of the models. The predictive power of the resultant models, expressed as rpred2, was measured by external validation on a test set. The possibility of chance correlation was checked by the randomization (or Y-scrambling) test. Additional statistical checks were carried out by computing further parameters for the HomoSAR models as proposed by Roy et al.48–51 and Todeschini et al.,52,53 viz. rm(LOO)2, rm(test)2, rm(overall)2, Rp2, r̄m(training)2, r̄m(test)2, Δrm(training)2, Δrm(test)2 and cRp2. These parameters indicate model robustness when their values are greater than or equal to 0.50 and when the Δrm(training)2 and Δrm(test)2 values are less than 0.20.
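
To make the leave-one-out statistic concrete, the following Python sketch computes qLOO2 as 1 − PRESS/SS_total. It is a simplified stand-in, not the procedure used in this work: a single-descriptor ordinary least squares fit replaces G/PLS, the deviations in SS_total are taken from the overall mean of the activities, and the data and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(x, y):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without sample i, then predict the left-out sample
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_total


# Hypothetical descriptor values and activities
x_toy = [0.0, 1.0, 2.0, 3.0, 4.0]
q2_exact = q2_loo(x_toy, [1.0, 3.0, 5.0, 7.0, 9.0])  # perfectly linear data
q2_noisy = q2_loo(x_toy, [1.0, 3.2, 4.9, 7.1, 9.0])  # slightly noisy data
```

For perfectly linear data every left-out point is predicted exactly, so PRESS vanishes and q2 equals 1; any scatter lowers the value, which is why qLOO2 is typically smaller than the fitted r2.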

In the current study, we initially calculated similarity indices based on 119 physicochemical properties, categorized as hydrophobic, steric and electronic, for the 20 natural amino acids and then attempted to correlate these properties with the activity. Based on the correlation coefficients and the correlation matrix, properties that were not orthogonal to each other were identified and eliminated. A pruned set of 36 descriptors [P], consisting of 12 electronic, 12 steric, 10 hydrophobic and 2 H-bond properties as listed in Table 1, was finally used to build the correlation equations. These descriptors are taken from the amino acid index database.54 The descriptions and accession IDs of the descriptors are given in Table 1. The five best HomoSAR models (out of 500 models) for the activity of the peptides against the six microbial species are given in Table 2.
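
The removal of non-orthogonal properties can be pictured with the short Python sketch below, which keeps a property only if its Pearson correlation with every property already kept is below a cut-off. The greedy order, the cut-off of 0.9, the function names and the toy values are all assumptions for illustration; the text does not state the criterion actually applied.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def prune_correlated(descriptors, threshold=0.9):
    """Greedily keep descriptors whose |r| with all kept descriptors is below threshold."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical property columns (one value per amino acid or peptide)
toy = {
    "prop_a": [1.0, 2.0, 3.0, 4.0],
    "prop_b": [2.0, 4.0, 6.0, 8.1],    # nearly collinear with prop_a
    "prop_c": [1.0, -1.0, 1.0, -1.0],  # weakly correlated with prop_a
}
kept = prune_correlated(toy)
```

Here prop_b is discarded because it carries almost the same information as prop_a, while prop_c survives; the same logic, applied to 119 properties, would leave a smaller set of nearly independent descriptors.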

Table 1 List of descriptors used in HomoSAR study and accession ID of amino acid descriptors
Sr. no. | Abbr. | Description | Accession ID
Electronic properties
1 | eαNH | Alpha-NH chemical shifts (Bundi–Wuthrich 1979) | BUNA790101
2 | eCTDC | A parameter of charge transfer donor capability | CHAM830108
3 | ePosC | Positive charge (Fauchere et al. 1988) | FAUJ880111
4 | eNegC | Negative charge (Fauchere et al. 1988) | FAUJ880112
5 | eNetC | Net charge (Klein et al. 1984) | KLEP840101
6 | ePol | Polarity (Zimmerman et al. 1968) | ZIMJ680103
7 | eIP | Isoelectric point (Zimmerman et al. 1968) | ZIMJ680104
8 | eαCH | Alpha-CH chemical shifts (Bundi–Wuthrich 1979) | BUNA790102
9 | eδH | NMR chemical shift of alpha-C-H (Fauchere et al. 1988) | FAUJ880107
10 | epKN | pK-N (Fasman 1976) | FASG760104
11 | epKC | pK-C (Fasman 1976) | FASG760105
12 | eppz3 | Principal property value z3 (Wold et al. 1987) | WOLS870103
Steric properties
13 | sFlexP | Flexibility parameter for two rigid neighbors (Karplus–Schulz 1985) | KARP850103
14 | sGSI | Graph shape index (Fauchere et al. 1988) | FAUJ880101
15 | sSTW | STERIMOL minimum width of the side-chain (Fauchere et al. 1988) | FAUJ880105
16 | sSCV | Side-chain volume (Krigbaum–Komoriya 1979) | KRIW790103
17 | sRGS | Radius of gyration of side-chain (Levitt 1976) | LEVM760105
18 | svdw1 | van der Waals parameter R0 (Levitt 1976) | LEVM760106
19 | svdw2 | van der Waals parameter epsilon (Levitt 1976) | LEVM760107
20 | sSCAT | Side-chain angle theta (AAR) (Levitt 1976) | LEVM760103
21 | sAFI | Average flexibility indices (Bhaskaran–Ponnuswamy 1988) | BHAR880101
22 | sFlex0 | Flexibility parameter for no rigid neighbors (Karplus–Schulz 1985) | KARP850101
23 | sFlex1 | Flexibility parameter for one rigid neighbor (Karplus–Schulz 1985) | KARP850102
24 | sRSCV | Residue side-chain volume (Zhou et al. 2006) |
Hydrophobic properties
25 | hFE | Free energy of solution in water, kcal mol−1 (Charton–Charton 1982) | CHAM820102
26 | hAHM | Atom-based hydrophobic moment (Eisenberg–McLachlan 1986) | EISD860102
27 | hHV | Hydrophilicity value (Hopp–Woods 1981) | HOPT810101
28 | hTEP | Optimized transfer energy parameter (Oobatake et al. 1985) | OOBM850103
29 | hSHαh | Surrounding hydrophobicity in alpha-helix | PONP800104
30 | hSHβs | Surrounding hydrophobicity in beta-sheet (Ponnuswamy et al. 1980) | PONP800105
31 | hFEC | Free energy change of a(Ri) to a(Rh) (Wertz–Scheraga 1978) | WERD780103
32 | hGEWpH9 | Unfolding Gibbs energy in water, pH 9.0 (Yutani et al. 1987) | YUTK870102
33 | hGEpH7 | Activation Gibbs energy of unfolding, pH 7.0 (Yutani et al. 1987) | YUTK870103
34 | hGEpH9 | Activation Gibbs energy of unfolding, pH 9.0 (Yutani et al. 1987) | YUTK870104
Hydrogen bond properties
35 | HbD | Number of hydrogen bond donors (Fauchere et al. 1988) | FAUJ880109
36 | HbA | Number of hydrogen bond acceptors (Fauchere et al. 1988) |


Table 2 HomoSAR models for activity against E. coli, P. aeruginosa, N. gonorrhoeae (F-62), N. gonorrhoeae (FA-19), L. monocytogenes, and C. albicans for the protegrin peptides
Sr. no. | Species | HomoSAR model
1-E | Escherichia coli | Log activity = 5.36 − 0.32SAB[sGSI][DS] + 0.49SAB[hTEP][5][6] + 0.46SAB[hHV][7][8][9] + 0.45SAB[sRGS][TS] − 0.45SAB[hFEC][6][7][8] + 0.30SAB[eCTDC][10] + 0.13SAB[ePol][10][11]
2-P | Pseudomonas aeruginosa | Log activity = 6.38 + 0.40SAB[hGEWpH9][TS] − 0.62SAB[eppz3][SS] + 0.66SAB[hFEC][4]
3-N | Neisseria gonorrhoeae F-62 | Log activity = 4.14 + 1.40SAB[hGEWpH9][8][9][10] + 1.63SAB[hFEC][1][2][3] − 0.90SAB[hFE][1][2]
4-N | Neisseria gonorrhoeae FA-19 | Log activity = 4.64 + 1.56SAB[sRSCV][TS] + 0.75SAB[hHV][12][13] − 1.44SAB[hHV][SS]
5-L | Listeria monocytogenes | Log activity = 5.60 + 0.23SAB[hTEP][1] + 0.41SAB[eppz3][10][11][12] + 0.76SAB[hSHαh][TS]
6-C | Candida albicans | Log activity = 3.90 + 0.44SAB[eppz3][11][12][13] − 0.20SAB[hFE][13][14] + 0.47SAB[ePol][14][15] + 0.48SAB[eppz3][15][16] + 0.88SAB[eIP][5]


Results and discussion

HomoSAR model for activity on Escherichia coli

 
Log activity = 5.36 − 0.32SAB[sGSI][DS] + 0.49SAB[hTEP][5][6] + 0.46SAB[hHV][7][8][9] + 0.45SAB[sRGS][TS] − 0.45SAB[hFEC][6][7][8] + 0.30SAB[eCTDC][10] + 0.13SAB[ePol][10][11] (7)
In the HomoSAR model of eqn (7) (1-E in Table 2), the term [hTEP][5][6], the optimized transfer energy parameter, appears with a positive coefficient, signifying that the hydrophobic character of the dipeptide segment spanning the 5th and 6th positions is important for activity. Similarity with the reference should therefore be maintained, and hydrophobic residues with positive hTEP values at positions 5 and 6 will improve activity. The term [hHV][7][8][9], the hydrophilicity value for the tripeptide segment covering the 7th, 8th and 9th positions of the sequence, also appears with a positive coefficient; to maintain similarity with the reference peptide and thus improve activity, these sites should be occupied by polar amino acids. The term [hFEC][6][7][8], the “free energy change of alpha (Ri) to alpha (Rh)” of the amino acid residues at the 6th, 7th and 8th positions, appears with a negative coefficient; this means that activity can be increased by making this triad dissimilar to that of the reference, which can be achieved by replacing the amino acids in the reference peptide with polar types. Together with the above results, this reflects the inside/outside preferences (amphipathic nature) of the amino acids in the polypeptide chain. Indeed, most protegrin peptides have polar residues such as Cys–Tyr–Cys at these positions (6th, 7th and 8th). Thus, for antibacterial activity against E. coli the residues at positions 5–9 should be amphipathic in nature. The term [sRGS][TS], the radius of gyration of the side-chain for tripeptide segments summed over the entire length of the peptide, is a steric attribute. It appears with a positive coefficient, indicating that the steric character of the tripeptide segments over the entire length of the peptide is a significant attribute for activity.
The terms [eCTDC][10] and [ePol][10][11] are, respectively, the charge transfer donor capability of the residue at position 10 and the polarity (the electric force due to the side chain acting on its immediate surroundings) of the residues at positions 10 and 11. They emerge with positive coefficients, indicating that the electronic property of the 10th residue and of the dipeptide segment spanning the 10th and 11th positions should be similar to the reference for good antibacterial activity. This criterion is met by placing charged amino acids at these positions. A look at the sequences of all active protegrin peptides and their analogs reveals that the positively charged arginine is often present at the 10th and 11th positions. The term [sGSI][DS] is a shape index (the steric influence of a group, encoding the three attributes of complexity, branching and symmetry as measured by graph theory) calculated for all dipeptide segments [DS] over the entire length of the peptide. This term appears in the model with a negative coefficient, implying that the steric property of dipeptide segments over the entire length of the sequence should be dissimilar to the reference for optimal activity. Taken together with the [sRGS][TS] result, it can be concluded that there should be an optimum in the steric attributes of the side chains of residues spanning the entire length of the peptide.
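
The way the signs of the coefficients in eqn (7) translate into predicted activity can be illustrated with the Python sketch below. The intercept and coefficients are those of eqn (7); the dictionary keys, the function name and the descriptor values passed in are hypothetical inputs for illustration, not similarity indices computed from real sequences.

```python
# Intercept and coefficients of the E. coli model, eqn (7)
ECOLI_INTERCEPT = 5.36
ECOLI_COEFFICIENTS = {
    "sGSI[DS]": -0.32,
    "hTEP[5][6]": 0.49,
    "hHV[7][8][9]": 0.46,
    "sRGS[TS]": 0.45,
    "hFEC[6][7][8]": -0.45,
    "eCTDC[10]": 0.30,
    "ePol[10][11]": 0.13,
}


def predict_log_activity(descriptors):
    """Evaluate eqn (7); descriptors that are not supplied are taken as 0."""
    unknown = set(descriptors) - set(ECOLI_COEFFICIENTS)
    if unknown:
        raise KeyError("unknown descriptor(s): " + ", ".join(sorted(unknown)))
    return ECOLI_INTERCEPT + sum(
        coef * descriptors.get(name, 0.0)
        for name, coef in ECOLI_COEFFICIENTS.items()
    )


baseline = predict_log_activity({})                      # all terms at 0
raised = predict_log_activity({"hTEP[5][6]": 1.0})       # positive coefficient
lowered = predict_log_activity({"hFEC[6][7][8]": 1.0})   # negative coefficient
```

A term with a positive coefficient raises the predicted activity as similarity to the reference increases, whereas for a term with a negative coefficient the prediction improves as the query becomes dissimilar to the reference.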

All 500 equations were examined for the descriptors that appear in the QSAR models and for the sign (positive or negative) of their coefficients. The result of this analysis is shown as a bar chart in Fig. 2. The term [eCTDC][10] appears with a positive coefficient in almost all the QSAR equations, indicating that residues with a favourable electronic property at this position of the sequence will contribute to improving the activity. Also, the term [hTEP][5][6] appears with a positive coefficient in more than 400 equations, signifying that the hydrophobic property of the dipeptide segment encompassing the 5th and 6th positions is important for the antibacterial activity.


Fig. 2 Frequency of appearance, in the HomoSAR models for activity against E. coli, of the physicochemical properties associated with different positions in the sequence.
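
The frequency analysis summarized in Fig. 2 amounts to counting, over a population of models, how often each term appears with a positive or a negative coefficient. A minimal Python sketch of such a tally is given below; the three toy models, their coefficients and the function name are hypothetical and stand in for the 500 equations, which are not reproduced here.

```python
from collections import Counter


def sign_frequencies(models):
    """Count how often each term appears with a positive or negative coefficient."""
    counts = Counter()
    for model in models:
        for term, coefficient in model.items():
            if coefficient > 0:
                counts[(term, "+")] += 1
            elif coefficient < 0:
                counts[(term, "-")] += 1
    return counts


# Hypothetical population of three models (term -> coefficient)
toy_models = [
    {"eCTDC[10]": 0.30, "hTEP[5][6]": 0.49, "sGSI[DS]": -0.32},
    {"eCTDC[10]": 0.25, "hFEC[6][7][8]": -0.40},
    {"eCTDC[10]": 0.28, "hTEP[5][6]": 0.51},
]
freq = sign_frequencies(toy_models)
```

A term that recurs with the same sign across most models, as [eCTDC][10] does in this toy population, is less likely to owe its presence to a single fortuitous equation.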

The HomoSAR models for activity against P. aeruginosa, N. gonorrhoeae (F-62), N. gonorrhoeae (FA-19), L. monocytogenes, and C. albicans for the protegrin peptides are shown in Table 2. Schematic representations of the contribution of each term in the HomoSAR models for antimicrobial activity against all the species are depicted in Fig. 3a–f and are discussed here.


Fig. 3 Schematic representation of contribution of each term in HomoSAR model for activity against (a) E. coli, (b) P. aeruginosa, (c) N. gonorrhoeae F-62, (d) N. gonorrhoeae FA-19, (e) L. monocytogenes and (f) C. albicans. S – steric property, E – electronic property, H – hydrophobic property and HB – hydrogen bond property.

From the schematic representations shown in Fig. 3b and e, it can be concluded that for good activity against P. aeruginosa and L. monocytogenes, the hydrophobic property calculated for tripeptide segments and summed over the entire length of the peptide is vital, as the terms [hGEWpH9][TS] (the Gibbs energy of unfolding in water at pH 9) and [hSHαh][TS] (the surrounding hydrophobicity of the residue in an alpha-helix) appear with positive coefficients. Thus, the hydrophobic character should be preserved as in the reference peptide. The term [hGEWpH9] also makes a positive contribution towards activity against the Gram-negative bacterium N. gonorrhoeae (F-62) for the tripeptide segment covering positions [8], [9] and [10].

For antimicrobial activity against the yeast C. albicans, the electronic property at position 5, in the tripeptide segment spanning the N-terminal amino acids [11], [12] and [13], and in the dipeptide segments [14][15] and [15][16] plays a remarkable role, as all these terms have positive coefficients in the model shown in Table 2. With the exception of the eppz3 property for residues 11 to 13, the other properties, namely eIP (isoelectric point) at residue 5, ePol at residues 14 and 15 and again eppz3 at positions 15 and 16, should be kept similar to the reference. The analysis leads to the conclusion that arginine is an ideal candidate at positions 5, 14 and 15 in the sequence. In this context, it is remarkable that the model for activity against the yeast C. albicans differs in character from the models for activity against Gram-negative and Gram-positive bacteria; in the latter cases the electronic property does not have a significant influence on the activity. The descriptor [hFE][13][14] is a hydrophobic term expressing the free energy of solution in water for the dipeptide segment at positions 13 and 14; its negative coefficient means that dissimilarity with the reference is essential for good activity. This requirement can be satisfied by placing a residue such as cysteine at position 13 (as seen in most sequences), which maintains the antiparallel β-sheet structure through a disulfide bond with the cysteine at position 8. However, the cysteine at position 13 must be followed in the sequence by either an arginine or, preferably, a proline that will maintain the β-hairpin bend. This result is in harmony with the results of Cho et al., who came to a similar conclusion by performing alanine substitutions of protegrin-1 peptides and their variants and testing them against C. albicans.55

In the activity models against N. gonorrhoeae (FA-19) and E. coli, the terms [sRSCV][TS] and [sRGS][TS] are the volume and the radius of gyration of the side-chain, respectively, both of which encode the steric property. These terms appear with positive coefficients, indicating that the steric attribute should be similar to the reference peptide for activity against these organisms.

To continue with P. aeruginosa, the activity model has the term [eppz3][SS], which is a sum of the electronic property over the entire peptide. This term appears with a negative coefficient, showing that this attribute should be made dissimilar to the reference peptide for this particular Gram-negative bacterial species. This can be achieved if some of the amino acids in the reference are replaced, particularly with cysteine.

In all models, the physicochemical parameters of the amino acid residues at positions 17 and 18 do not play a role in the antibacterial activity; this implies that the optimal length of protegrins for exhibiting broad-spectrum antimicrobial activity against bacteria and fungi is 16 amino acids, which corroborates the result published by Cho et al.55

The statistics of the best HomoSAR models generated for every species of micro-organism, in terms of the regression coefficient r2 and the internal (rBS2; qLOO2) and external (rpred2) correlation coefficients, are satisfactory. The Roy et al. and Todeschini et al. parameters rm(LOO)2, rm(test)2, rm(overall)2, Rp2 and cRp2 were also calculated and found to be in the acceptable range, being greater than 0.50; the Δrm(training)2 and Δrm(test)2 values are less than or close to 0.20. All these values denote that the derived models are robust and not a result of chance correlation. The statistics of the HomoSAR models generated for each species of micro-organism are given in Table 3; in Table 4, they are placed alongside the statistics of the QSAR models reported in the literature.

Table 3 Statistical data of HomoSAR models for activity against E. coli, P. aeruginosa, N. gonorrhoeae (F-62), N. gonorrhoeae (FA-19), L. monocytogenes, and C. albicans for the protegrin peptides (see footnote a)
Parameter | E. coli | P. aeruginosa | N. gonorrhoeae (F-62) | N. gonorrhoeae (FA-19) | L. monocytogenes | C. albicans
Total no. of peptides | 52 | 28 | 27 | 27 | 31 | 45
Training set | 40 | 20 | 21 | 20 | 20 | 34
Test set | 12 | 08 | 06 | 07 | 11 | 11
PLS | 05 | 03 | 04 | 03 | 03 | 03
Terms | 08 | 04 | 04 | 04 | 04 | 06
r2 | 0.85 | 0.80 | 0.81 | 0.80 | 0.83 | 0.85
rBS2 | 0.83 | 0.80 | 0.80 | 0.80 | 0.82 | 0.84
qLOO2 | 0.63 | 0.70 | 0.70 | 0.71 | 0.62 | 0.75
qLGO(5 fold)2 | 0.65 | 0.60 | 0.75 | 0.67 | 0.66 | 0.65
PRESS | 0.98 | 1.13 | 2.57 | 2.20 | 1.43 | 1.00
LSE | 0.02 | 0.03 | 0.08 | 0.07 | 0.03 | 0.01
rpred2 | 0.50 | 0.54 | 0.61 | 0.76 | 0.65 | 0.50
rrand2 | 0.45 | 0.46 | 0.46 | 0.41 | 0.45 | 0.45
F-Test | 25.90 | 21.33 | 24.16 | 21.33 | 26.04 | 31.73
rm(LOO)2 | 0.85 | 0.79 | 0.80 | 0.80 | 0.83 | 0.84
rm(test)2 | 0.44 | 0.52 | 0.68 | 0.72 | 0.65 | 0.53
rm(overall)2 | 0.48 | 0.47 | 0.57 | 0.59 | 0.48 | 0.49
Rp2 | 0.54 | 0.47 | 0.48 | 0.50 | 0.51 | 0.54
r̄m(training)2 | 0.72 | 0.71 | 0.72 | 0.71 | 0.75 | 0.77
Δrm(training)2 | 0.13 | 0.17 | 0.18 | 0.17 | 0.15 | 0.14
r̄m(test)2 | 0.24 | 0.32 | 0.75 | 0.63 | 0.52 | 0.44
Δrm(test)2 | 0.38 | 0.38 | 0.13 | 0.18 | 0.25 | 0.20
cRp2 | 0.58 | 0.52 | 0.53 | 0.56 | 0.56 | 0.58
a r2: regression coefficient; rBS2: bootstrap correlation coefficient; qLOO2: cross-validation by leave one out; qLGO(5 fold)2: cross-validation by leave group out; PRESS: predictive residual sum of squares; LSE: least square error; rpred2: predictive correlation coefficient of test set; rrand2: mean value of r2 after randomization at 99% confidence interval; Roy et al. validation parameters: rm(LOO)2, rm(test)2, rm(overall)2, Rp2, r̄m(training)2, r̄m(test)2, Δrm(training)2, Δrm(test)2; Todeschini et al. validation parameter: cRp2.


Table 4 Comparison between the statistics of the QSAR models for activity against E. coli, P. aeruginosa, N. gonorrhoeae F-62, N. gonorrhoeae FA-19, L. monocytogenes and C. albicans
Sr. no. | Species | HomoSAR (current work), n | HomoSAR (current work), r2 | QSAR (ref. 14), n | QSAR (ref. 14), r2
1 | E. coli | 52 | 0.85 | 55 | 0.68
2 | P. aeruginosa | 28 | 0.80 | 32 | 0.67
3 | N. gonorrhoeae (F-62) | 27 | 0.81 | 28 | 0.51
4 | N. gonorrhoeae (FA-19) | 27 | 0.80 | 27 | 0.48
5 | L. monocytogenes | 31 | 0.83 | 36 | 0.63
6 | C. albicans | 45 | 0.85 | 45 | 0.60


Conclusions

In this paper we have shown that HomoSAR is able to shed light on the relationship between the sequences of protegrin peptides and their activity against six specific micro-organisms. There are clear-cut requirements for amino acid attributes at specific positions in the sequence that determine effectiveness against Gram-positive or Gram-negative bacteria and fungi. This information can be gainfully used to tailor peptide sequences that will be effective against a specific species of micro-organism. HomoSAR is a simple and straightforward approach for extracting the necessary information at the positional and residue level of peptide sequences to design new peptide leads. The technique offers a perspective that helps one to optimize peptides and their analogs through an understanding of the relationship between activity and the nature of the amino acids at every position in the peptide sequence. The proposed QSAR models not only show good correlation but also have good predictability on the test set samples. In a nutshell, HomoSAR has been able to uncover the traits required for protegrin peptides and their analogs to bind to bacterial membranes, disrupt their integrity and so exert a lethal effect on the micro-organisms.

Acknowledgements

The authors are thankful for the computational and infrastructural facilities provided by the Department of Science and Technology, New Delhi through their FIST program (SR/FST/LSI-163/2003), the Department of Biotechnology, New Delhi (File No. BT/PR11810/BRB/10/690/2009) and the Council for Scientific and Industrial Research, New Delhi (File No. 01/2399/10/EMRII).

Notes and references

  1. M. Zasloff, Nature, 2002, 415, 389–395.
  2. R. E. Hancock and H.-G. Sahl, Nat. Biotechnol., 2006, 24, 1551–1557.
  3. L. T. Nguyen, E. F. Haney and H. J. Vogel, Trends Biotechnol., 2011, 29, 464–472.
  4. M. D. Seo, H. S. Won, J. H. Kim, T. Mishig-Ochir and B. J. Lee, Molecules, 2012, 17, 12276–12286.
  5. R. E. Hancock, Lancet, 1997, 349, 418–422.
  6. R. M. Epand and H. J. Vogel, BBA, Biochim. Biophys. Acta, Biomembr., 1999, 1462, 11–28.
  7. J.-P. S. Powers and R. E. Hancock, Peptides, 2003, 24, 1681–1691.
  8. J. S. Mader and D. W. Hoskin, Expert Opin. Invest. Drugs, 2006, 15, 933–946.
  9. J. H. White, J. Steroid Biochem., 2010, 121, 234–238.
  10. K. Yamasaki, A. di Nardo, A. Bardan, M. Murakami, T. Ohtake, A. Coda, R. A. Dorschner, C. Bonnart, P. Descargues and A. Hovnanian, Nat. Med., 2007, 13, 975–980.
  11. V. N. Kokryakov, S. S. Harwig, E. A. Panyutich, A. A. Shevchenko, G. M. Aleshina, O. V. Shamova, H. A. Korneva and R. I. Lehrer, FEBS Lett., 1993, 327, 231–236.
  12. J. Chen, T. J. Falla, H. Liu, M. A. Hurst, C. A. Fujii, D. A. Mosca, J. R. Embree, D. J. Loury, P. A. Radel and C. Cheng Chang, Pept. Sci., 2000, 55, 88–98.
  13. J. A. Robinson, S. C. Shankaramma, P. Jetter, U. Kienzl, R. A. Schwendener, J. W. Vrijbloed and D. Obrecht, Bioorg. Med. Chem., 2005, 13, 2055–2064.
  14. N. Ostberg and Y. Kaznessis, Peptides, 2005, 26, 197–206.
  15. D. S. Bolintineanu and Y. N. Kaznessis, Peptides, 2011, 32, 188–201.
  16. R. L. Fahrner, T. Dieckmann, S. S. Harwig, R. I. Lehrer, D. Eisenberg and J. Feigon, Chem. Biol., 1996, 3, 543–550.
  17. X.-D. Qu, S. Harwig, W. M. Shafer and R. I. Lehrer, Infect. Immun., 1997, 65, 636–639.
  18. A. Langham and Y. Kaznessis, Mol. Simul., 2006, 32, 193–201.
  19. D. Bolintineanu, A. Langham, H. Davis and Y. Kaznessis, Mol. Simul., 2007, 33, 809–819.
  20. S. Yamaguchi, T. Hong, A. Waring, R. I. Lehrer and M. Hong, Biochemistry, 2002, 41, 9852–9862.
  21. K. L. H. Lam, Y. Ishitsuka, Y. Cheng, K. Chien, A. J. Waring, R. I. Lehrer and K. Y. C. Lee, J. Phys. Chem. B, 2006, 110, 21282–21286.
  22. Y. Ishitsuka, D. S. Pham, A. J. Waring, R. I. Lehrer and K. Y. C. Lee, BBA, Biochim. Biophys. Acta, Biomembr., 2006, 1758, 1450–1460.
  23. H. Jang, B. Ma, T. B. Woolf and R. Nussinov, Biophys. J., 2006, 91, 2848–2859.
  24. H. Khandelia and Y. N. Kaznessis, BBA, Biochim. Biophys. Acta, Biomembr., 2007, 1768, 509–520.
  25. A. Sayyed-Ahmad and Y. N. Kaznessis, PLoS One, 2009, 4, e4799.
  26. J. Lee, S. Ham and W. Im, J. Comput. Chem., 2009, 30, 1334–1343.
  27. H. Rui and W. Im, J. Comput. Chem., 2010, 31, 2859–2867.
  28. H. Jang, B. Ma and R. Nussinov, BMC Struct. Biol., 2007, 7, 21.
  29. H. Jang, B. Ma, R. Lal and R. Nussinov, Biophys. J., 2008, 95, 4631–4642.
  30. A. A. Langham, A. S. Ahmad and Y. N. Kaznessis, J. Am. Chem. Soc., 2008, 130, 4338–4346.
  31. R. Capone, M. Mustata, H. Jang, F. T. Arce, R. Nussinov and R. Lal, Biophys. J., 2010, 98, 2644–2652.
  32. Y. Sokolov, T. Mirzabekov, D. W. Martin, R. I. Lehrer and B. L. Kagan, BBA, Biochim. Biophys. Acta, Biomembr., 1999, 1420, 23–29.
  33. D. S. Bolintineanu, A. Sayyed-Ahmad, H. T. Davis and Y. N. Kaznessis, PLoS Comput. Biol., 2009, 5, e1000277.
  34. D. Bolintineanu, E. Hazrati, H. T. Davis, R. I. Lehrer and Y. N. Kaznessis, Peptides, 2010, 31, 1–8.
  35. A. A. Langham, H. Khandelia, B. Schuster, A. J. Waring, R. I. Lehrer and Y. N. Kaznessis, Peptides, 2008, 29, 1085–1093.
  36. V. Frecer, Bioorg. Med. Chem., 2006, 14, 6065–6074.
  37. M. R. Borkar, R. R. Pissurlenkar and E. C. Coutinho, J. Comput. Chem., 2013, 34, 2635–2646.
  38. R. R. Pissurlenkar and E. C. Coutinho, Scholarly Res. Exch., 2008, 2008, 1–12.
  39. J. Verma, V. M. Khedkar, A. S. Prabhu, S. A. Khedkar, A. K. Malde and E. C. Coutinho, J. Comput.-Aided Mol. Des., 2008, 22, 91–104.
  40. R. R. Pissurlenkar, V. M. Khedkar, R. P. Iyer and E. C. Coutinho, J. Comput. Chem., 2011, 32, 2204–2218.
  41. D. A. Steinberg and R. I. Lehrer, in Antibacterial peptide protocols, Springer, 1997, pp. 169–186.
  42. K. T. Miyasaki and R. I. Lehrer, Int. J. Antimicrob. Agents, 1998, 9, 269–280.
  43. W. T. Heller, A. J. Waring, R. I. Lehrer, T. A. Harroun, T. M. Weiss, L. Yang and H. W. Huang, Biochemistry, 2000, 39, 139–145.
  44. B. Yasin, M. Pang, E. A. Wagar and R. I. Lehrer, Sex. Transm. Dis., 2002, 29, 514–519.
  45. J. P. Tam, C. Wu and J. L. Yang, Eur. J. Biochem., 2000, 267, 3289–3300.
  46. J. R. Lai, B. R. Huck, B. Weisblum and S. H. Gellman, Biochemistry, 2002, 41, 12835–12842.
  47. M. A. Larkin, G. Blackshields, N. Brown, R. Chenna, P. A. McGettigan, H. McWilliam, F. Valentin, I. M. Wallace, A. Wilm and R. Lopez, Bioinformatics, 2007, 23, 2947–2948. http://www.ebi.ac.uk/Tools/services/web_clustalw2/, accessed January 2014.
  48. P. Pratim Roy, S. Paul, I. Mitra and K. Roy, Molecules, 2009, 14, 1660–1701.
  49. I. Mitra, A. Saha and K. Roy, Mol. Simul., 2010, 36, 1067–1079.
  50. K. Roy, I. Mitra, S. Kar, P. K. Ojha, R. N. Das and H. Kabir, J. Chem. Inf. Model., 2012, 52, 396–408.
  51. K. Roy, P. Chakraborty, I. Mitra, P. K. Ojha, S. Kar and R. N. Das, J. Comput. Chem., 2013, 34, 1071–1082.
  52. R. Todeschini and V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, New York, 2000.
  53. V. Consonni, D. Ballabio and R. Todeschini, J. Chemom., 2010, 24, 194–201.
  54. S. Kawashima and M. Kanehisa, Nucleic Acids Res., 2000, 28, 374.
  55. Y. Cho, J. S. Turner, N.-N. Dinh and R. I. Lehrer, Infect. Immun., 1998, 66, 2486–2493.

Footnote

Electronic supplementary information (ESI) available. See DOI: 10.1039/c5ra14402g

This journal is © The Royal Society of Chemistry 2015