Maheshkumar R. Borkar (a),
Raghuvir R. S. Pissurlenkar (b) and
Evans C. Coutinho* (a)
(a) Department of Pharmaceutical Chemistry, Bombay College of Pharmacy, Kalina, Santacruz (E), Mumbai 400098, India. E-mail: evans@bcpindia.org; Fax: +91-22-26670905; Tel: +91-22-26670905
(b) Molecular Simulations Group, Department of Pharmaceutical Chemistry, Goa College of Pharmacy, Panaji, Goa 403001, India
First published on 9th September 2015
Antimicrobial peptides (AMPs) are naturally occurring small peptides that form an innate part of the host's defense mechanism. They are active against both Gram-negative and Gram-positive bacteria, various viruses, fungi, and parasites. There is little consensus in the amino acid sequences of AMPs, but they do share some common features, such as relative hydrophobicity and a positively charged amphipathic structure, that have been associated with biological activity. Optimizing the activity and specificity of AMPs using large peptide libraries is a tedious and expensive route. Here, QSAR can be used to reveal the structural features that should be incorporated in the design of new AMPs. Within the realm of QSAR, however, 3D-QSAR of peptides is an overwhelming task because of the sheer number of conformational degrees of freedom of peptides. To overcome this, we propose the use of a validated 2D-QSAR technique named HomoSAR that is specifically designed for peptide QSAR. Based on homology principles and similarity techniques, it can extract the information needed from a set of peptides to elucidate the underlying structure–activity relationships. The present work is a comprehensive study on a dataset of protegrin antimicrobial peptides, isolated from porcine leukocytes, that have a broad spectrum of activity against both Gram-positive and Gram-negative bacteria, as well as the fungus C. albicans and the HIV-1 virus. The HomoSAR models for antimicrobial activity against six different species highlight two major determinants of activity. Firstly, the optimal length of protegrins for exhibiting broad-spectrum antimicrobial activity against bacteria and fungi is 16 residues. Secondly, for antimicrobial activity against the yeast C. albicans, it is the electronic property that should be tuned to modulate activity; this property is not a major attribute for either Gram-negative or Gram-positive bacteria.
The general features of AMPs are as follows: they have short sequences of 12 to 50 amino acids and are amphipathic in nature, with large diversity in their sequences and structures. They carry at least two positive charges from arginine (Arg) and lysine (Lys) residues, which enable them to act directly on the negatively charged cell wall and phospholipid membranes of microorganisms. The result is an accumulation of the AMPs on the membrane surface, which displaces the native Ca²⁺ and Mg²⁺ ions. The hydrophobic portion of the peptide, in turn, interacts with the hydrophobic components of the membrane and, together with the positive charges, is responsible for the disruption of the membrane that leads to bacterial death.3,5–7
Bacterial infections are among the most common causes of human disease and the third leading cause of death worldwide; together with the emergence of multidrug-resistant (MDR) forms, they pose a grave and growing threat to public health. AMPs are currently of great interest as promising alternatives to conventional antibiotics. Their unique mechanism of action confers a low propensity for the development of resistance. Furthermore, AMPs have also been implicated in cancer and inflammatory disorders, besides their prominent role in infectious diseases.8–10 Targeting the bacterial cell membrane is a promising approach to combating drug resistance, because it is difficult for bacteria to develop resistance when the target is the membrane itself.
Protegrins are small peptides, about 16 to 18 amino acid residues in length, isolated from porcine leukocytes.11 There are five known naturally occurring porcine protegrins, PG-1 to PG-5, and several derivatives of protegrins have been synthesized.12–15 The structure of protegrin-1 has been published and is available in the Protein Data Bank under PDB ID 1PG1. It is composed of 18 amino acids (RGGRLCYCRRRFCVCVGR, numbered 1 to 18 from the N-terminus) with a high content of cysteine (Cys) and several positively charged arginine (Arg) residues. The NMR structure of PG-1 reveals a β-hairpin with two antiparallel β-strands connected by a turn and stabilized by two interstrand disulfide bonds (S–S) between the cysteine residues Cys6–Cys15 and Cys8–Cys13.16 Protegrins have a broad spectrum of activity against both Gram-positive and Gram-negative bacteria, including E. coli, P. aeruginosa and N. gonorrhoeae, as well as the fungus C. albicans and the HIV-1 virus; this spectrum has prompted researchers to try to tap their full potential.11,12,17
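The residue numbering used throughout the models can be checked directly against the PG-1 sequence. The short sketch below is only an illustration (the helper name `positions_of` is ours, not part of HomoSAR); it confirms the peptide length and the positions of the cysteine and arginine residues quoted above.

```python
# Protegrin-1 sequence as quoted in the text, one-letter codes, N- to C-terminus.
PG1 = "RGGRLCYCRRRFCVCVGR"


def positions_of(sequence: str, residue: str) -> list[int]:
    """Return the 1-based positions at which `residue` occurs in `sequence`."""
    return [i for i, aa in enumerate(sequence, start=1) if aa == residue]


cys_positions = positions_of(PG1, "C")
arg_positions = positions_of(PG1, "R")

print(len(PG1))        # 18
print(cys_positions)   # [6, 8, 13, 15]
print(arg_positions)   # [1, 4, 9, 10, 11, 18]
```

The four cysteines at positions 6, 8, 13 and 15 are the ones paired in the Cys6–Cys15 and Cys8–Cys13 disulfide bonds.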
Several experimental and computational studies by a number of research groups have attempted to explain the mechanism of action of the protegrins. These include molecular dynamics simulations of protegrins in micelles,18,19 and in lipid bilayer membranes,20–24 thermodynamic calculations and potentials of mean force,25–27 molecular dynamics simulations of protegrin pores28–31 and models of the conductance and biological effects of protegrin pores.30,32–34 All of these have been reviewed by Bolintineanu and Kaznessis.15 Besides these computational studies, which deal with physical descriptions of the mechanism of action of protegrin, there are some reports that use QSAR to explain the biological activity of the peptides by correlation with their physicochemical properties. Ostberg and Kaznessis14 carried out a 3D-QSAR study on 55 protegrins and synthetic analogs active against six microbial species; the reported model correlating activity with physicochemical properties leaves much to be desired in terms of its statistics. Along the same lines, Langham et al.35 developed QSAR models to provide insight into the mechanism of the cytotoxic action of protegrin and its analogs, but the models do not shed any light on the structure–activity relationship. A third QSAR analysis, of the antimicrobial and haemolytic activities of porcine protegrin-1 (PG-1) mimetics (cyclic cationic peptides with a β-hairpin fold), was reported by Frecer.36 None of these studies has developed a unified model for activity or explained the features that distinguish activity against Gram-positive bacteria, Gram-negative bacteria and fungi. These unanswered questions point to the need for a model that better captures the underlying structure–activity relationship of AMPs.
With this in focus, we report in this paper the use of HomoSAR to build models that delineate the features distinguishing Gram-positive from Gram-negative activity and bacterial from fungal activity.
We have previously reported some new 2D- and 3D-QSAR methodologies to help understand the structure–activity relationships of peptides; these include HomoSAR,37,38 Comparative Residue Interaction Analysis (CoRIA) and its variants39 and eQSAR.40 Of these, HomoSAR, a 2D-QSAR technique, is simple and straightforward and has been quite successful in establishing the structure–activity relationships underlying peptide datasets.37,38 The strength of this methodology is that the difficulty of obtaining a unique alignment of peptides in 3D space is reduced to the much simpler problem of alignment in 2D space. The method is an integrated approach that adopts the principles of comparative protein modeling (homology modeling) in conjunction with the QSAR formalism to design and predict the activity of new peptide sequences, irrespective of their size and length. In this study, we have built several models that explain the activity of protegrin peptides and their analogs using the HomoSAR formalism. A schematic representation of the steps involved in HomoSAR is shown in Fig. 1.
The similarity index (S) for an amino acid at the ‘ith’ position in peptide ‘B’ (query peptide) in relation to peptide ‘A’ (reference peptide) for a given physicochemical property (P) is given by the equation:
[Eqn (1)–(3): equation images missing from this copy]
In addition three other variables were also calculated
[Eqn (4)–(6): equation images missing from this copy]
Cross-validation by leave-one-out (q²LOO) and leave-group-out (q²LGO), bootstrapping (r²BS) and the least-squares error (LSE) were used to assess the robustness of the models. The predictive power of the resultant models, expressed as r²pred, was measured by external validation on a test set. The possibility of chance correlation was checked by the randomization (Y-scrambling) test. Additional statistical checks were made by computing further parameters for the HomoSAR models as proposed by Roy et al.48–51 and Todeschini et al.,52,53 viz. r²m(LOO), r²m(test), r²m(overall), R²p, two further parameters [symbols not recovered], Δr²m(training), Δr²m(test) and cR²p. These parameters indicate a robust model when their values are greater than or equal to 0.50 and when the Δr²m(training) and Δr²m(test) values are less than 0.20.
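To make the internal validation steps concrete, the sketch below computes a leave-one-out q² (as 1 − PRESS/SS_total) and the mean r² after Y-scrambling. It is a minimal illustration only: it uses a single-descriptor ordinary least-squares fit as a stand-in for the regression actually used to build the HomoSAR models, and the toy data, function names and number of scrambling trials are all assumptions.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """Coefficient of determination of the fitted line on its own data."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def q2_loo(x, y):
    """Leave-one-out q2 = 1 - PRESS / SS_total."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        # Refit without sample i, then predict the held-out sample.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_tot


def mean_r2_scrambled(x, y, n_trials=200, seed=0):
    """Mean r2 over fits to randomly permuted activities (Y-scrambling)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        y_perm = y[:]
        rng.shuffle(y_perm)
        total += r_squared(x, y_perm)
    return total / n_trials


# Hypothetical toy data: one descriptor and a noisy linear response.
toy_rng = random.Random(1)
x = [i / 10 for i in range(20)]
y = [5.0 + 0.8 * xi + toy_rng.gauss(0.0, 0.1) for xi in x]

r2 = r_squared(x, y)
q2 = q2_loo(x, y)
r2_rand = mean_r2_scrambled(x, y)
print(round(r2, 2), round(q2, 2), round(r2_rand, 2))
```

A model that is not a chance correlation should show a q² close to r² and a scrambled r² well below both, which is the pattern the randomization test looks for.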
In the current study, we initially calculated similarity indices based on 119 physicochemical properties, categorized as hydrophobic, steric and electronic, for the 20 natural amino acids and then attempted to correlate these properties with the activity. Based on the correlation coefficients and the correlation matrix, properties that were not orthogonal to each other were identified and eliminated. A pruned set of 36 descriptors [P], consisting of 12 electronic, 12 steric, 10 hydrophobic and 2 H-bond properties (as listed in Table 1), was finally used to build the correlation equations. These descriptors are taken from the amino acid index database.54 The descriptions and accession IDs from the amino acid index database are given in Table 1. The best HomoSAR models (out of 500 models) for the activity of the peptides against the six microbial species are given in Table 2.
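The pruning of non-orthogonal properties can be sketched as a simple greedy filter on pairwise Pearson correlations. This is an illustration under stated assumptions, not the procedure actually used: the 0.9 cut-off, the function names and the toy property values are all hypothetical, and real input would be the per-residue property values for the 20 amino acids.

```python
def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def prune_correlated(descriptors, threshold=0.9):
    """Keep a descriptor only if |r| with every already-kept one is below threshold."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical property tables (five residues only, for brevity).
toy_descriptors = {
    "prop_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "prop_b": [2.1, 3.9, 6.2, 8.0, 9.9],   # almost exactly 2 * prop_a
    "prop_c": [3.0, -1.0, 4.0, -2.0, 0.5],
}

selected = prune_correlated(toy_descriptors)
print(selected)  # ['prop_a', 'prop_c']
```

Because the filter is greedy, the surviving set depends on the order in which descriptors are visited; a different ordering could retain `prop_b` instead of `prop_a`.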
Sr. no. | Abbr. | Description | Accession ID |
---|---|---|---|
Electronic properties | |||
1 | eαNH | Alpha-NH chemical shifts (Bundi–Wuthrich 1979) | BUNA790101 |
2 | eCTDC | A parameter of charge transfer donor capability | CHAM830108 |
3 | ePosC | Positive charge (Fauchere et al. 1988) | FAUJ880111 |
4 | eNegC | Negative charge (Fauchere et al. 1988) | FAUJ880112 |
5 | eNetC | Net charge (Klein et al. 1984) | KLEP840101 |
6 | ePol | Polarity (Zimmerman et al. 1968) | ZIMJ680103 |
7 | eIP | Isoelectric point (Zimmerman et al. 1968) | ZIMJ680104 |
8 | eαCH | Alpha-CH chemical shifts (Bundi–Wuthrich 1979) | BUNA790102 |
9 | eδH | N.m.r. chemical shift of alpha-C-H (Fauchere et al. 1988) | FAUJ880107 |
10 | epKN | pK-N (Fasman 1976) | FASG760104 |
11 | epKC | pK-C (Fasman 1976) | FASG760105 |
12 | eppz3 | Principal property value z3 (Wold et al. 1987) | WOLS870103 |
Steric properties | |||
13 | sFlexP | Flexibility parameter for two rigid neighbors (Karplus–Schulz 1985) | KARP850103 |
14 | sGSI | Graph shape index (Fauchere et al. 1988) | FAUJ880101 |
15 | sSTW | STERIMOL minimum width of the side-chain (Fauchere et al. 1988) | FAUJ880105 |
16 | sSCV | Side-chain volume (Krigbaum–Komoriya 1979) | KRIW790103 |
17 | sRGS | Radius of gyration of side-chain (Levitt 1976) | LEVM760105 |
18 | svdw1 | van der Waals parameter R0 (Levitt 1976) | LEVM760106 |
19 | svdw2 | van der Waals parameter epsilon (Levitt 1976) | LEVM760107 |
20 | sSCAT | Side-chain angle theta (AAR) (Levitt 1976) | LEVM760103 |
21 | sAFI | Average flexibility indices (Bhaskaran–Ponnuswamy 1988) | BHAR880101 |
22 | sFlex0 | Flexibility parameter for no rigid neighbors (Karplus–Schulz 1985) | KARP850101 |
23 | sFlex1 | Flexibility parameter for one rigid neighbor (Karplus–Schulz 1985) | KARP850102 |
24 | sRSCV | Residue side-chain volume (Zhou et al. 2006) | — |
Hydrophobic properties | |||
25 | hFE | Free energy of solution in water, kcal mol⁻¹ (Charton–Charton 1982) | CHAM820102 |
26 | hAHM | Atom-based hydrophobic moment (Eisenberg–McLachlan 1986) | EISD860102 |
27 | hHV | Hydrophilicity value (Hopp–Woods 1981) | HOPT810101 |
28 | hTEP | Optimized transfer energy parameter (Oobatake et al. 1985) | OOBM850103 |
29 | hSHαh | Surrounding hydrophobicity in alpha-helix | PONP800104 |
30 | hSHβs | Surrounding hydrophobicity in beta-sheet (Ponnuswamy et al. 1980) | PONP800105 |
31 | hFEC | Free energy change of a(Ri) to a(Rh) (Wertz–Scheraga 1978) | WERD780103 |
32 | hGEWpH9 | Unfolding Gibbs energy in water, pH 9.0 (Yutani et al. 1987) | YUTK870102 |
33 | hGEpH7 | Activation Gibbs energy of unfolding, pH 7.0 (Yutani et al. 1987) | YUTK870103 |
34 | hGEpH9 | Activation Gibbs energy of unfolding, pH 9.0 (Yutani et al. 1987) | YUTK870104 |
Hydrogen bond property | |||
35 | HbD | Number of hydrogen bond donors (Fauchere et al. 1988) | FAUJ880109 |
36 | HbA | Number of hydrogen bond acceptors (Fauchere et al. 1988) | — |
Sr. no. | Species | HomoSAR models |
---|---|---|
1-E | Escherichia coli | Log activity = 5.36 − 0.32SAB[sGSI][DS] + 0.49SAB[hTEP][5][6] + 0.46SAB[hHV][7][8][9] + 0.45SAB[sRGS][TS] − 0.45SAB[hFEC][6][7][8] + 0.30SAB[eCTDC][10] + 0.13SAB[ePol][10][11] |
2-P | Pseudomonas aeruginosa | Log activity = 6.38 + 0.40SAB[hGEWpH9][TS] − 0.62SAB[eppz3][SS] + 0.66SAB[hFEC][4] |
3-N | Neisseria gonorrhoeae F-62 | Log activity = 4.14 + 1.40SAB[hGEWpH9][8][9][10] + 1.63SAB[hFEC][1][2][3] − 0.90SAB[hFE][1][2] |
4-N | Neisseria gonorrhoeae FA-19 | Log activity = 4.64 + 1.56SAB[sRSCV][TS] + 0.75SAB[hHV][12][13] − 1.44SAB[hHV][SS] |
5-L | Listeria monocytogenes | Log activity = 5.60 + 0.23SAB[hTEP][1] + 0.41SAB[eppz3][10][11][12] + 0.76SAB[hSHαh][TS] |
6-C | Candida albicans | Log activity = 3.90 + 0.44SAB[eppz3][11][12][13] − 0.20SAB[hFE][13][14] + 0.47SAB[ePol][14][15] + 0.48SAB[eppz3][15][16] + 0.88SAB[eIP][5] |
Log activity = 5.36 − 0.32SAB[sGSI][DS] + 0.49SAB[hTEP][5][6] + 0.46SAB[hHV][7][8][9] + 0.45SAB[sRGS][TS] − 0.45SAB[hFEC][6][7][8] + 0.30SAB[eCTDC][10] + 0.13SAB[ePol][10][11] | (7) |
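As a reading aid, eqn (7) is a linear combination of similarity-index terms, so the predicted log activity follows directly once the similarity values for a query peptide are known. The sketch below encodes the coefficients of eqn (7); the function name and the similarity values passed to it are hypothetical and serve only to show how the terms combine.

```python
# Intercept and coefficients of the E. coli model, copied from eqn (7).
INTERCEPT = 5.36
ECOLI_TERMS = {
    "[sGSI][DS]": -0.32,
    "[hTEP][5][6]": 0.49,
    "[hHV][7][8][9]": 0.46,
    "[sRGS][TS]": 0.45,
    "[hFEC][6][7][8]": -0.45,
    "[eCTDC][10]": 0.30,
    "[ePol][10][11]": 0.13,
}


def log_activity(similarity: dict) -> float:
    """Evaluate eqn (7) for a dict mapping each term to its similarity value."""
    missing = set(ECOLI_TERMS) - set(similarity)
    if missing:
        raise KeyError(f"missing similarity values for: {sorted(missing)}")
    return INTERCEPT + sum(c * similarity[t] for t, c in ECOLI_TERMS.items())


# Hypothetical inputs: every term set to 1.0, then every term set to 0.0.
all_ones = {term: 1.0 for term in ECOLI_TERMS}
all_zeros = {term: 0.0 for term in ECOLI_TERMS}

print(round(log_activity(all_ones), 2))   # 6.42
print(round(log_activity(all_zeros), 2))  # 5.36
```

Terms with positive coefficients raise the predicted activity as the similarity value grows, while the two negative terms ([sGSI][DS] and [hFEC][6][7][8]) lower it.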
All 500 equations were examined for the descriptors that appear in the QSAR models, together with the sign (positive or negative) of their coefficients. The result of this analysis is shown as a bar chart in Fig. 2. The term [eCTDC][10] appears with a positive coefficient in almost all the QSAR equations, indicating that residues with a favourable electronic property at this position of the sequence will contribute to improving the activity. The term [hTEP][5][6] also appears with a positive coefficient in more than 400 equations, signifying that the hydrophobic property of the dipeptide segment encompassing the 5th and 6th positions is important for the antibacterial activity.
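The tally behind such a bar chart amounts to counting, for each term, how many models contain it with a positive and with a negative coefficient. A minimal sketch follows; the three toy models are hypothetical (the first reuses three coefficients from eqn (7)), and the function name is ours.

```python
from collections import Counter


def sign_frequencies(models):
    """Count how often each term appears with a positive or a negative coefficient."""
    positive, negative = Counter(), Counter()
    for model in models:
        for term, coefficient in model.items():
            if coefficient > 0:
                positive[term] += 1
            elif coefficient < 0:
                negative[term] += 1
    return positive, negative


# Hypothetical stand-ins for the 500 model equations.
toy_models = [
    {"[eCTDC][10]": 0.30, "[hTEP][5][6]": 0.49, "[sGSI][DS]": -0.32},
    {"[eCTDC][10]": 0.28, "[hTEP][5][6]": 0.51},
    {"[eCTDC][10]": 0.33, "[sGSI][DS]": -0.25},
]

positive, negative = sign_frequencies(toy_models)
print(positive.most_common())  # [('[eCTDC][10]', 3), ('[hTEP][5][6]', 2)]
print(negative.most_common())  # [('[sGSI][DS]', 2)]
```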
Fig. 2 Frequency of appearance in the HomoSAR models of the physicochemical properties associated with different positions in the peptide sequence, for activity against E. coli.
The HomoSAR models for activity of the protegrin peptides against P. aeruginosa, N. gonorrhoeae (F-62), N. gonorrhoeae (FA-19), L. monocytogenes and C. albicans are shown in Table 2. Schematic representations of the contribution of each term in the HomoSAR models for antimicrobial activity against all the species are depicted in Fig. 3a–f and are discussed here.
From the schematic representations shown in Fig. 3b and e, it can be concluded that for good activity against P. aeruginosa and L. monocytogenes, the hydrophobic property calculated for tripeptide segments and summed over the entire length of the peptide is vital, as the terms [hGEWpH9][TS] (the Gibbs energy of unfolding in water at pH 9) and [hSHαh][TS] (the surrounding hydrophobicity in an alpha-helix) appear with positive coefficients. Thus, the hydrophobic character should be preserved as in the reference peptide. The term [hGEWpH9] also makes a positive contribution to the activity against the Gram-negative bacterium N. gonorrhoeae (F-62) for the particular tripeptide segment covering positions [8], [9] and [10].
For antimicrobial activity against the yeast C. albicans, the electronic property plays a remarkable role, specifically at position 5, in the tripeptide segment spanning the N-terminal amino acids [11], [12] and [13], and in the dipeptide segments [14][15] and [15][16]; all these terms have positive coefficients in the model shown in Table 2. With the exception of the eppz3 property for residues 11 to 13, the other properties, namely eIP (isoelectric point) at residue 5, ePol at residues 14 and 15 and again eppz3 at positions 15 and 16, should be kept similar to the reference. The analysis leads to the conclusion that arginine is an ideal candidate at positions 5, 14 and 15 in the sequence. In this context, it is remarkable that the model for activity against the yeast C. albicans differs in character from the models for activity against Gram-negative and Gram-positive bacteria; in the latter cases the electronic property does not have a significant influence on the activity. The descriptor [hFE][13][14] is a hydrophobic term expressing the free energy of solution in water for the dipeptide segment at positions 13 and 14; its negative coefficient means that dissimilarity with the reference is essential for good activity. This requirement can be satisfied by placing a residue such as cysteine at position 13 (as seen in most sequences), which maintains the antiparallel β-sheet structure through a disulfide bond with the cysteine at position 8. However, the cysteine at position 13 must be followed in the sequence by either an arginine or, preferably, a proline that will maintain the β-hairpin bend. This result is in harmony with the results of Cho et al., who came to a similar conclusion by performing alanine substitutions on protegrin-1 peptides and their variants and testing them against C. albicans.55
In the activity models against N. gonorrhoeae (FA-19) and E. coli, the terms [sRSCV][TS] and [sRGS][TS] represent the residue side-chain volume and the radius of gyration of the side-chain respectively, both of which encode the steric property. These terms appear with positive coefficients, indicating that the steric attribute should be similar to that of the reference peptide for activity against these organisms.
Returning to P. aeruginosa, the activity model contains the term [eppz3][SS], which is a sum of the electronic property over the entire peptide. This term appears with a negative coefficient, showing that this attribute should be made dissimilar to the reference peptide for this particular Gram-negative bacterial species. This can be achieved if some of the amino acids in the reference are replaced, particularly with cysteine.
In all the models, the physicochemical parameters of the amino acid residues at positions 17 and 18 do not play a role in the antibacterial activity; this implies that the optimal length of protegrins for exhibiting broad-spectrum antimicrobial activity against bacteria and fungi is 16 amino acids, which corroborates the result published by Cho et al.55
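The observation about positions 17 and 18 can be checked mechanically against Table 2 by collecting every explicitly indexed position in the six model equations. The sketch below does this with a regular expression; note that it deliberately ignores the whole-peptide terms tagged [DS], [TS] and [SS], which carry no explicit position index, and the helper name is ours.

```python
import re

# Right-hand sides of the six models, transcribed from Table 2.
TABLE2_MODELS = {
    "E. coli": "5.36 - 0.32SAB[sGSI][DS] + 0.49SAB[hTEP][5][6] + 0.46SAB[hHV][7][8][9]"
               " + 0.45SAB[sRGS][TS] - 0.45SAB[hFEC][6][7][8] + 0.30SAB[eCTDC][10]"
               " + 0.13SAB[ePol][10][11]",
    "P. aeruginosa": "6.38 + 0.40SAB[hGEWpH9][TS] - 0.62SAB[eppz3][SS] + 0.66SAB[hFEC][4]",
    "N. gonorrhoeae F-62": "4.14 + 1.40SAB[hGEWpH9][8][9][10] + 1.63SAB[hFEC][1][2][3]"
                           " - 0.90SAB[hFE][1][2]",
    "N. gonorrhoeae FA-19": "4.64 + 1.56SAB[sRSCV][TS] + 0.75SAB[hHV][12][13]"
                            " - 1.44SAB[hHV][SS]",
    "L. monocytogenes": "5.60 + 0.23SAB[hTEP][1] + 0.41SAB[eppz3][10][11][12]"
                        " + 0.76SAB[hSHαh][TS]",
    "C. albicans": "3.90 + 0.44SAB[eppz3][11][12][13] - 0.20SAB[hFE][13][14]"
                   " + 0.47SAB[ePol][14][15] + 0.48SAB[eppz3][15][16] + 0.88SAB[eIP][5]",
}


def explicit_positions(model: str) -> set:
    """Return the set of residue positions written as [n] in a model string."""
    return {int(p) for p in re.findall(r"\[(\d+)\]", model)}


all_positions = set()
for species, model in TABLE2_MODELS.items():
    all_positions |= explicit_positions(model)

print(sorted(all_positions))  # [1, 2, ..., 16]
print(max(all_positions))     # 16
```

Every position from 1 to 16 is referenced by at least one model, while 17 and 18 never appear as explicit indices.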
The statistics of the best HomoSAR models generated for each species of microorganism, in terms of the regression coefficient r² and the internal (r²BS; q²LOO) and external (r²pred) correlation coefficients, are satisfactory. The Roy et al. and Todeschini et al. parameters r²m(LOO), r²m(test), r²m(overall), R²p and cR²p were also calculated and, for the most part, found to be in the acceptable range of greater than 0.50; the Δr²m(training) and Δr²m(test) values are mostly less than or close to 0.20. Taken together, these values indicate that the derived models are robust and not the result of chance correlation. The statistics of the HomoSAR models generated for each species of microorganism are given in Table 3. In Table 4, the statistics of the HomoSAR models are placed alongside those of QSAR models reported in the literature.
Parameter | E. coli | P. aeruginosa | N. gonorrhoeae (F-62) | N. gonorrhoeae (FA-19) | L. monocytogenes | C. albicans |
---|---|---|---|---|---|---|
Total no. peptides | 52 | 28 | 27 | 27 | 31 | 45 |
Training set | 40 | 20 | 21 | 20 | 20 | 34 |
Test set | 12 | 8 | 6 | 7 | 11 | 11 |
PLS | 5 | 3 | 4 | 3 | 3 | 3 |
Terms | 8 | 4 | 4 | 4 | 4 | 6 |
r² | 0.85 | 0.80 | 0.81 | 0.80 | 0.83 | 0.85 |
r²BS | 0.83 | 0.80 | 0.80 | 0.80 | 0.82 | 0.84 |
q²LOO | 0.63 | 0.70 | 0.70 | 0.71 | 0.62 | 0.75 |
q²LGO(5 fold) | 0.65 | 0.60 | 0.75 | 0.67 | 0.66 | 0.65 |
PRESS | 0.98 | 1.13 | 2.57 | 2.20 | 1.43 | 1.00 |
LSE | 0.02 | 0.03 | 0.08 | 0.07 | 0.03 | 0.01 |
r²pred | 0.50 | 0.54 | 0.61 | 0.76 | 0.65 | 0.50 |
r²rand | 0.45 | 0.46 | 0.46 | 0.41 | 0.45 | 0.45 |
F-Test | 25.90 | 21.33 | 24.16 | 21.33 | 26.04 | 31.73 |
r²m(LOO) | 0.85 | 0.79 | 0.80 | 0.80 | 0.83 | 0.84 |
r²m(test) | 0.44 | 0.52 | 0.68 | 0.72 | 0.65 | 0.53 |
r²m(overall) | 0.48 | 0.47 | 0.57 | 0.59 | 0.48 | 0.49 |
R²p | 0.54 | 0.47 | 0.48 | 0.50 | 0.51 | 0.54 |
[symbol not recovered] | 0.72 | 0.71 | 0.72 | 0.71 | 0.75 | 0.77 |
Δr²m(training) | 0.13 | 0.17 | 0.18 | 0.17 | 0.15 | 0.14 |
[symbol not recovered] | 0.24 | 0.32 | 0.75 | 0.63 | 0.52 | 0.44 |
Δr²m(test) | 0.38 | 0.38 | 0.13 | 0.18 | 0.25 | 0.20 |
cR²p | 0.58 | 0.52 | 0.53 | 0.56 | 0.56 | 0.58 |
a r²: regression coefficient; r²BS: bootstrap correlation coefficient; q²LOO: cross-validation by leave-one-out; q²LGO(5 fold): cross-validation by leave-group-out; PRESS: predictive residual sum of squares; LSE: least-squares error; r²pred: predictive correlation coefficient of the test set; r²rand: mean value of r² after randomization at 99% confidence interval; Roy et al. validation parameters: r²m(LOO), r²m(test), r²m(overall), R²p and two further parameters [symbols not recovered]. |
Sr. no. | Species | HomoSAR (current work): n | HomoSAR (current work): r² | QSAR ref. 14: n | QSAR ref. 14: r² |
---|---|---|---|---|---|
1 | E. coli | 52 | 0.85 | 55 | 0.68 |
2 | P. aeruginosa | 28 | 0.80 | 32 | 0.67 |
3 | N. gonorrhoeae (F-62) | 27 | 0.81 | 28 | 0.51 |
4 | N. gonorrhoeae (FA-19) | 27 | 0.80 | 27 | 0.48 |
5 | L. monocytogenes | 31 | 0.83 | 36 | 0.63 |
6 | C. albicans | 45 | 0.85 | 45 | 0.60 |
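
The two chance-correlation parameters in Table 3 can be cross-checked from the reported r² and r²rand values. The sketch below assumes the commonly used definitions R²p = r²·√(r² − r²rand) and cR²p = r·√(r² − r²rand); these formulas are our assumption, since the paper's own expressions are not reproduced here, but they recover the tabulated values to within rounding.

```python
from math import sqrt


def rp2(r2: float, r2_rand: float) -> float:
    """R2p = r2 * sqrt(r2 - r2_rand) (assumed definition)."""
    return r2 * sqrt(r2 - r2_rand)


def crp2(r2: float, r2_rand: float) -> float:
    """cR2p = r * sqrt(r2 - r2_rand) (assumed definition)."""
    return sqrt(r2) * sqrt(r2 - r2_rand)


# (r2, r2_rand, reported R2p, reported cR2p), transcribed from Table 3.
TABLE3 = {
    "E. coli": (0.85, 0.45, 0.54, 0.58),
    "P. aeruginosa": (0.80, 0.46, 0.47, 0.52),
    "N. gonorrhoeae (F-62)": (0.81, 0.46, 0.48, 0.53),
    "N. gonorrhoeae (FA-19)": (0.80, 0.41, 0.50, 0.56),
    "L. monocytogenes": (0.83, 0.45, 0.51, 0.56),
    "C. albicans": (0.85, 0.45, 0.54, 0.58),
}

for species, (r2, r2_rand, rp2_reported, crp2_reported) in TABLE3.items():
    print(f"{species}: R2p = {rp2(r2, r2_rand):.3f} (reported {rp2_reported}), "
          f"cR2p = {crp2(r2, r2_rand):.3f} (reported {crp2_reported})")
```

Because the inputs are themselves rounded to two decimals, agreement to within about 0.005 is the most that can be expected from this check.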
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c5ra14402g
This journal is © The Royal Society of Chemistry 2015