Edgar López-López *ab, Oscar Robles c, Fabien Plisson d and José L. Medina-Franco *a
a DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, 04510 Mexico City, Mexico. E-mail: elopez.lopez@cinvestav.mx; medinajl@unam.mx
b Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Section 14-740, 07000 Mexico City, Mexico
c Medicinal Chemistry and Chemogenomics Laboratory, Faculty of Bioanalysis-Veracruz, Universidad Veracruzana, 91700 Veracruz, Mexico
d Department of Biotechnology and Biochemistry, Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV-IPN), Irapuato Unit, Irapuato 36824, Mexico
First published on 1st September 2023
Peptide structure–activity/property relationship (P-SA/PR) studies focus on understanding how the structural variations of peptides influence their biological activities and other functional properties. This knowledge accelerates the rational design and optimisation of peptide-based drugs, biomaterials, or diagnostic agents. These studies examine peptide structures from their primary sequences, essentially encoded from the 20 amino acids. Current approaches often exclude peptide libraries with post-translational and synthetic modifications. The molecular fingerprint MAP4 was recently developed to map complex molecules' sequence/structure diversity, including peptides. This study used structure–activity landscape modelling to conduct the P-SA/PR studies of an exemplary dataset of 223 antimicrobial peptides against methicillin-resistant Staphylococcus aureus (MRSA). To this end, we employed the MAP4 fingerprint to represent the chemical structures of the peptides, study their relationship(s) with the antibacterial activity, and seek the potential activity cliff(s). We identified critical residues and structural motifs that play a crucial role in the anti-MRSA activity of the peptides. This is the first computational study to systematically explore the activity landscape of peptides with non-canonical residues, emphasising the quantification of structural similarity.
A central goal for computational peptide design is to create novel sequences that carry the underlying properties of natural peptides with defined structural and functional properties. Multiple informatic approaches have proven helpful in accelerating peptide design by learning from peptide sequences or tridimensional structures.6,7 In addition, the automation of peptide synthesis on a solid support and the heterologous expression of proteins across biological systems have reduced production costs, making peptide space exploration accessible. These in silico methods predominantly learn from the primary sequences in sizeable datasets rather than from structures because of the high costs associated with solving structures experimentally.8 Yet, current sequence-based approaches still need to systematically study post-translational modifications (PTMs), which can significantly affect the physicochemical, chemical, or biological properties of peptides.9 Chemoinformatics (also called “cheminformatics” or “chemical informatics” in the literature)10 and bioinformatics are independent disciplines with different focuses of study. The former focuses on small molecules, whereas the latter uses computational methods to address biological entities. Computationally, one key difference between the two disciplines is how chemical structures are represented and handled. In biology, the chemical structures are usually large (e.g., proteins, nucleic acids, receptors) and are described as strings of letters or Cartesian coordinates (e.g., in the Protein Data Bank11), unlike low molecular weight compounds, which are encoded in various molecular fingerprints.12 However, some chemical structures lie at the interface between the traditional small molecules used extensively in drug discovery and the proteins and nucleic acids of the biological realm. Peptides exemplify these chemical structures; they vary in size, ranging from small molecules to large proteins.
In recent years, new methodologies and technologies have reduced the gap between chemoinformatics and bioinformatics. New molecular representations based on atom connectivity allow the systematic study of complex molecules and could be applied to mapping the structural diversity of peptides. They may help in understanding the roles of PTMs in the physicochemical properties or biological activities of peptides.12 Different computational strategies to develop peptides are based on analysing sequence alignments and physicochemical similarity metrics.13 However, only some post-translationally modified peptides and their functional measurements are documented, limiting the use of alignment algorithms and the prediction of secondary structures.14 Recent computational methods have contributed to decoding the structure–activity/property relationships (SA/PR) of peptides (P-SA/PR).15,16 A growing number of methods based on primary sequences or derived physicochemical features of peptides (e.g., machine-learning methods, de novo design, linguistic modelling, pattern insertion methods, and genetic algorithms)17 represent new research opportunities to explore P-SA/PR and guide a new era of peptide-based drug design. Computational drug design approaches have decoded the physicochemical and sequence–activity relationships of peptides.18,19 Such approaches have yet to be applied to describing the relationships between small structural changes and their specific biological activity.
Physicochemical properties are commonly used to compare, filter, and classify molecular structures of pharmaceutical interest.20,21 They generally describe global changes, in contrast with more localised structural conformations, small chemical changes, or differences in peptide fold. Consensus similarity metrics were recently implemented to compare peptide structures considering features including tridimensional structure, topology, backbone structure, drug-like properties, amino acid sequence, and molecular fingerprints.17,22 From a conceptual point of view, combining the physicochemical, chemical, sequence and structural descriptors commonly used in chemoinformatics and bioinformatics would provide a comprehensive picture of the peptides.23 For example, different authors recently demonstrated, by predicting properties and designing new peptide structures from a consensus description of known peptidic information,24–27 that not all similar peptides conserve identical properties. This observation is related to the activity cliff concept frequently used in chemoinformatics:28 two or more peptides with high structural similarity but distinct functional measurements.
Activity cliffs have been used to decode the SA/PR of linear and circular peptides against different endpoints.29,30 Also, the presence of activity cliffs in datasets reduces the performance of predictive models by challenging their ability to capture precise relationships between chemical structures and biological activity and their generalisability to new compounds.31,32 The novel circular and topological fingerprint MAP4 is more sensitive to small structural changes in complex molecules, especially in peptides, than conventional fingerprints used in small molecule drug discovery, such as MACCS keys, ECFP4, or ECFP6.33 Additionally, the MAP4 fingerprint has opened the opportunity to create and navigate a representative chemical space of more complex peptides,34,35 and it has been used to improve the performance of artificial intelligence algorithms that predict peptidic properties.36
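The idea of a MinHashed fingerprint distance can be illustrated with a toy example. The sketch below is not the MAP4 implementation: it replaces MAP4's atom-pair shingles with plain character k-mers of a one-letter sequence string, and the function names (`toy_shingles`, `minhash`, `minhash_distance`) and both example sequences are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

import hashlib


def toy_shingles(sequence: str, k: int = 3) -> set[str]:
    """Overlapping k-mers of a string (a stand-in for MAP4 atom-pair shingles)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def _hash(shingle: str, seed: int) -> int:
    """Deterministic 64-bit hash of a shingle under a given seed."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(shingles: set[str], n_perm: int = 256) -> list[int]:
    """MinHash signature: the minimum hash value of the set under each seed."""
    if not shingles:
        raise ValueError("at least one shingle is required")
    return [min(_hash(s, seed) for s in shingles) for seed in range(n_perm)]


def minhash_distance(sig_a: list[int], sig_b: list[int]) -> float:
    """Estimated Jaccard distance: 1 minus the fraction of matching positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return 1.0 - matches / len(sig_a)


# Hypothetical sequences that differ only in the C-terminal residue.
pep_a = "GLFDIVKKVVGALGSL"
pep_b = "GLFDIVKKVVGALGSF"
d = minhash_distance(minhash(toy_shingles(pep_a)), minhash(toy_shingles(pep_b)))
print(f"estimated distance: {d:.3f}")
```

Because the signature length is fixed, sets of very different sizes can be compared at constant cost, which is one reason a MinHash scheme suits molecules ranging from small compounds to peptides.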
This study introduces a new approach to exploring and describing the activity landscape of peptides with non-canonical residues, including PTMs. Our case study uses an exemplary dataset of 223 peptides with reported activity against methicillin-resistant Staphylococcus aureus (MRSA) strains.37 MRSA is considered one of the most critical global health threats due to its high pandemic potential.37–41 Namely, the MRSA strains create an emerging challenge for health systems by increasing the costs associated with the recovery of patients.42 Epidemiologically, MRSA stands out for its efficient dissemination and establishment in environments as diverse as hospitals and communities, and it is related to different types of productive livestock, with repercussions that range from human health to food production and safety.43,44 Visualising peptide structure–activity/property relationships using the MAP4 fingerprint can accelerate the rational design and optimisation of bioactive peptides, e.g., anti-MRSA peptides. The present approach allows (1) mapping the anti-MRSA peptide sequences and studying their structural diversity (using similarity metrics based on the MAP4 fingerprint) and (2) visualising peptide activity cliffs in low-dimensional space (using an extension of a structure–activity similarity map). To this end, we employed a recently developed atom-connectivity fingerprint that is well-suited to represent peptides. We also discuss an interpretation of the peptide activity cliffs.
The methodology used in this study offers a distinctive way to represent non-canonical modifications on peptides and, compared with traditional peptide sequence alignments or other similarity metrics, gives a more realistic structure–activity approximation.
Fig. 1 Graphical representation of a Structure–Activity Similarity (SAS) map (A) and an extension of a SAS map (B). A SAS map is based on a pairwise comparison of all compounds in a data set. Each data point in the map represents a pair of compounds, positioned by the activity difference of the pair against a specific biological endpoint and by their molecular distance. (A) Map with four regions: (I) identifies pairs of compounds with low activity difference and low molecular distance (also called scaffold or R-hopping, or similarity cliffs); (II) represents pairs of compounds with low activity difference and higher molecular distance (smooth SAR cases); (III) represents pairs of compounds with higher activity differences and higher molecular distance (activity cliffs); and (IV) represents pairs of compounds with a discontinuous SAR.47 (B) The extension of the conventional SAS map (extended SAS map) implemented in this study adds the molecular weight difference as a new axis.
SAS maps generated in this study represented all 25185 pairwise comparisons between the 223 peptides. The map displayed structure similarity with the MAP4 fingerprint and the MinHashed distance33 on the X-axis. The Y-axis showed the activity difference using the pMIC50 values of each peptide pair. The Z-axis expressed differences in molecular weight between each pair of peptides. The data points in the SAS maps were further coloured by their SALI value. This index quantifies the activity landscape using the expression proposed by Guha and Van Drie11,48 (eqn (1)):
$$\mathrm{SALI}_{i,j} = \frac{|A_i - A_j|}{1 - \mathrm{sim}(i,j)} \qquad (1)$$
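A minimal sketch of how eqn (1) and the three axes of the extended SAS map could be computed for every pair is given below. The record fields (`id`, `pmic50`, `mw`), the example values, and the `similarity_fn` callback are hypothetical placeholders; the small cap `eps` on the denominator is also an assumption, added only to avoid division by zero when two fingerprints are identical.

```python
from itertools import combinations


def sali(act_i: float, act_j: float, similarity: float, eps: float = 1e-6) -> float:
    """SALI for one pair: |A_i - A_j| / (1 - sim(i, j)), with a capped denominator."""
    return abs(act_i - act_j) / max(1.0 - similarity, eps)


def sas_pairs(records, similarity_fn):
    """Build one row per unordered pair with the extended SAS map coordinates."""
    rows = []
    for a, b in combinations(records, 2):
        sim = similarity_fn(a, b)
        rows.append({
            "pair": (a["id"], b["id"]),
            "distance": 1.0 - sim,                             # X-axis
            "activity_diff": abs(a["pmic50"] - b["pmic50"]),   # Y-axis
            "mw_diff": abs(a["mw"] - b["mw"]),                 # Z-axis
            "sali": sali(a["pmic50"], b["pmic50"], sim),       # colour scale
        })
    return rows


# Hypothetical records; the values are placeholders, not data from this study.
peptides = [
    {"id": "pep1", "pmic50": 6.0, "mw": 1500.0},
    {"id": "pep2", "pmic50": 4.0, "mw": 1520.0},
    {"id": "pep3", "pmic50": 5.5, "mw": 2100.0},
]

# Placeholder similarity: a constant, standing in for a fingerprint similarity.
rows = sas_pairs(peptides, lambda a, b: 0.5)
for row in rows:
    print(row)
```

In practice `similarity_fn` would wrap the fingerprint comparison, so the same loop serves any descriptor.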
Fig. 3 Structural and sequence similarity between the 223 anti-MRSA peptides: (A) modified (extended) structure–activity similarity map; each sphere represents a pairwise comparison of the chemical structure (quantified using the MinHashed distance/MAP4 fingerprints), activity difference, and molecular weight difference. The spheres are coloured according to the SALI values using a continuous scale from low (blue) to high (red) values. An interactive visualisation has been implemented using the DataWarrior software; see File S1 in the ESI.† (B) Sequence alignment and (C) summary characterisation of 11 representative peptide activity cliffs (pairs). SALI: Structure–Activity Landscape Index.
Fig. 4 illustrates the structural similarity between additional representative peptide pairs 12–15. The fingerprint-based similarity protocol allows the identification of small structural changes (pair 12), a single amino acid change in the sequence (pair 13), N-terminal modifications (pair 13), multiple amino acid changes (pair 14), and structural changes associated with post-translational modifications (pair 15).
Fig. 4 Representative anti-MRSA peptide pairs 12–15. Chemical changes observed between each peptide pair are coloured in red, whereas shared chemical structures are depicted in black.
In contrast, the peptides AP02565 and AP02567 (pair 2 in Fig. 3) are structurally different (69% AA sequence identity and 0.561 fingerprint-based similarity) and were located farther apart than peptide pair 1. The TMAP representation illustrates the subtle and complex structural relationships of the peptide pairs. For example, peptide pair 13 (AP00166 and AP00883) presented multiple AA changes and N-terminal modifications, whereas AP03059 and AP03481 (pair 15) differed only by the formation of a disulfide bond.
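For comparison with the fingerprint-based values, a percentage of sequence identity can be sketched with a plain global alignment. The block below is an illustrative Needleman–Wunsch implementation with an assumed scoring scheme (match +1, mismatch −1, gap −1) and an assumed definition of identity (identical columns divided by alignment length); the source does not state which alignment tool or identity definition was used, and the example sequences are hypothetical.

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned columns as a percentage of the alignment length."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    aln_a, aln_b = global_alignment(a, b)
    same = sum(x == y for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a)


# Hypothetical one-letter sequences, not peptides from the dataset.
print(percent_identity("GLFDIVKKVV", "GLFDIVKKVVGA"))
```

A one-letter alignment of this kind cannot represent non-canonical residues or modifications such as a disulfide bond, which is the limitation that motivates the fingerprint-based comparison.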
The peptide pairs 1–11 had a medium-to-high structural similarity (AA sequence identity between 22 and 100%) but were associated with a significant change in their pMIC50 values. The fingerprint-based similarity values (measured with MAP4) correlated positively with the identity values (R² = 0.31, Fig. S4 in the ESI†). Moreover, the fingerprint-based similarity values showed a stronger inverse correlation (−0.12) with the activity difference values of each peptide pair than the identity values did (−0.06). Higher similarity or identity values were correlated with lower activity difference values. This observation suggests that fingerprint-based similarity measures complement the insights derived from sequence alignments but do not replace them. Similarity metrics explore the atom connectivity in peptides, while sequence identity describes residue-level differences. Therefore, considering both small structural and small sequence changes in peptides helps to rationalise the peptide structure–property relationships.
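The correlations quoted above can be reproduced in outline with a few lines of code. The sketch assumes a Pearson correlation coefficient, which the text does not state explicitly, and the two short lists are hypothetical placeholder values rather than data from the study.

```python
import math


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-pair values (placeholders only).
fingerprint_similarity = [0.95, 0.56, 0.80, 0.30, 0.72]
sequence_identity = [96.0, 69.0, 85.0, 22.0, 60.0]

r = pearson_r(fingerprint_similarity, sequence_identity)
print(f"r = {r:.2f}, R^2 = {r * r:.2f}")
```

The same function applied to similarity (or identity) against activity difference would give the inverse correlations discussed in the text.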
Fig. 6 Conformational differences between selected peptides studied in this work. Each peptide is represented with a different colour: red (AP02565), green (AP02566), blue (AP02567), yellow (AP03010), cyan (AP03022), and orange (AP03311). The tridimensional representation of each peptide was modelled with PEP-FOLD.76
In summary, these results suggest that the structural similarity calculations based on the MAP4 fingerprint and the MinHashed function provide a means to explore the peptide activity landscape. Methods such as extended SAS maps and SALI enable the landscape study of the 223 anti-MRSA peptides, rapidly uncovering small structural changes associated with significant changes in the pMIC50 values. Moreover, this methodology is general and could be adapted to study any other property of peptides, i.e., P-SA/PR. We noted that TMAPs helped visualise different features of the peptide property landscape. Nevertheless, it is essential to acknowledge that TMAPs, like other visualisation techniques, rely significantly on the structural representation (such as a molecular fingerprint) and are mainly influenced by the relative size of the peptides. Therefore, we recommend limiting the use of TMAP visualisation to peptides of similar size ranges.
For this reason, it is crucial to use novel approaches to quantify and understand their SA/PR studies. Additionally, antimicrobial peptides (AMPs) containing non-canonical amino acids present several advantages over their canonical counterparts (e.g. higher solubility, higher target affinity, higher stability). One of the earliest reported benefits is the improved bioavailability by reducing proteolytic degradation, achieved by incorporating D-amino acids at protease cleavage sites.66 AMPs with non-canonical amino acids also enhance selectivity by offering a broad range of structures and functionalities not present in the twenty canonical amino acids.67
Peptide analysis heavily relies on the alignment of canonical AA sequences in FASTA format,68 as pinpointing the specific position of active motifs is crucial for SAR analysis.69 However, aligning sequences becomes a challenge when dealing with peptides containing non-canonical amino acids, and we have suggested adapting the SMILES code66 to analyse these peptides. We aim not to replace existing alignment techniques but to complement them. We intend to establish a more robust methodology for identifying highly potent sequences and motifs by analysing both canonical and non-canonical groups within a global screening. Activity and property landscapes have been extensively studied for small organic molecules using structural fingerprints to quantify the similarity of chemical compounds. In contrast, because of the lack of a robust molecular fingerprint to represent peptides, the activity/property relationships had not been developed for (short) peptides. New fingerprints like MAP4 and notations based on the hierarchical editing language for macromolecules (HELM) have now opened new avenues to unifying complex chemical diversity (from small molecules to peptides with non-canonical residues) under a single representation.33,70 Using unifying fingerprints (like MAP4) and notations (like HELM) allows us to explore beyond the canonical realm of peptides and to bring PTMs or synthetic elements into the peptide chemical space.
As mentioned in the introduction, identifying activity cliffs is a key element in designing molecules. In this case, the study of important structural motifs in peptides makes it possible to identify the non-canonical amino acids or modifications that confer certain properties (e.g., anti-MRSA activity) on peptides. That is, it allows rationalising which structural motifs or non-canonical modifications could be added to other peptide sequences to hybridise them and improve their already known properties or, conversely, which molecular portions should be removed because they abolish the desired property.
Bioinformatics approaches enable the identification of activity cliffs using the sequence alignment of peptides. Chemoinformatics approaches (e.g., molecular similarity metrics) based on topologies, connectivity, tridimensional features, and molecular properties offer a new alternative for studying more complex molecules,71 like peptides. Additionally, a previous study using different fingerprints (e.g., MACCS keys, ECFP4, ECFP6, and atom pairs) permitted the construction of peptide landscapes using unique peptides with the same number of amino acids.72 In contrast, this work applies methods typically used in chemoinformatics for small organic molecules to study P-SA/PR using the concept of activity/property landscapes.
The anti-MRSA peptide landscape explored in this work (Fig. 3A) indicated a total of 16953 (∼67.31%) peptide pairs in quadrant I (scaffold or R-hopping peptides); 152 (∼0.60%) in quadrant II (smooth SAR peptides); 8055 (∼31.99%) in quadrant III (peptide activity cliffs, like pairs 2–5); and 25 (∼0.10%) in quadrant IV (peptides with a discontinuous SAR, like pair 1). Namely, almost a third of the peptide pairs are considered activity cliffs, which could limit the modelability of this data set for developing a predictive model of anti-MRSA activity. We point out that a direct SA/PR study based on pairwise comparisons could be established. For instance, in peptide pair 1 (Fig. 3), the terminal phenylalanine (Phe/F) could be associated with the difference in biological activity. This aligns with X-ray diffraction studies, which suggested that a terminal Phe residue in peptide structures could enhance the stability of the helical conformation.73 Furthermore, He et al. confirmed that the activity of antimicrobial peptides depends on the strength of their helical structure.74 Therefore, the protocol presented here to describe the activity landscape of the 223 anti-MRSA peptides could identify minor peptide differences involved in their activity.
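The quadrant shares follow directly from the pair counts reported above. A short sketch of that arithmetic is shown below; the counts are those given in the text, while the function name `quadrant_fractions` is hypothetical.

```python
def quadrant_fractions(counts: dict) -> dict:
    """Fraction of all peptide pairs that falls in each quadrant."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pairs to summarise")
    return {quadrant: n / total for quadrant, n in counts.items()}


# Pair counts per quadrant as reported for the extended SAS map (Fig. 3A).
quadrant_counts = {"I": 16953, "II": 152, "III": 8055, "IV": 25}

total_pairs = sum(quadrant_counts.values())
fractions = quadrant_fractions(quadrant_counts)

print(f"total pairs: {total_pairs}")
for quadrant, fraction in fractions.items():
    print(f"quadrant {quadrant}: {quadrant_counts[quadrant]} pairs ({100 * fraction:.2f}%)")
```

Keeping the counts rather than only the percentages makes it easy to recompute the shares if the quadrant thresholds are changed.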
Another critical example that highlights the impact of a single AA change on the peptide structure/sequence is peptide activity cliff 2 (AP02565–AP02567, Fig. 3), which suggests a crucial role of asparagine (Asn/N). The tridimensional models (generated with PEP-FOLD) (Fig. 6) reflect the impact of this AA change on the stability of the helical peptide structures. Additionally, quantum methods confirm this observation and underline the importance of Asn for peptide reactivity.75
Although the predicted tridimensional structures of the peptides forming activity cliffs are similar (pairs 3 (AP03010–AP02656); 4 (AP03010–AP03022); and 5 (AP03010–AP03311) in Fig. 3 and 6), their topological polar surface area (TPSA) values differ, which suggests changes in their solubility (Table S2 in the ESI†). Such differences could be associated with changes in their biological activity.47 Additionally, the differences between the cationic areas53,55,77,78 (involved in the membrane interaction with MRSA strains) of each peptide pair could be associated with their variations in biological activity (Table S2 in the ESI†).
These results indicate that activity cliffs depend on the descriptors used to quantify the similarity between pairs of peptides.79 For example, using a different descriptor instead of the MAP4 fingerprint, the peptide pairs 2–5 would no longer be considered activity/property cliffs. Namely, these results indicate that the anti-MRSA activity does not depend solely on the peptide sequence and the features encoded in MAP4 fingerprints. The anti-MRSA activity also depends on other criteria, like the tridimensional similarity and the physicochemical properties. We remark that selecting the molecular representation is crucial in decoding the P-SA/PR. The same applies to virtually any other computational study: structure representation is vital.
During the past five years, the concept of SA/PR has been adapted to design and develop novel peptidic entities. The idea of P-SA/PR has been used to discover and create lipopeptides and cyclic peptides80,81 and to decode the membranolytic mechanism of different peptides.82 However, complex challenges remain to be resolved before the in silico peptide design area is consolidated.17,61–64,68 Limited access to quality data and the imbalance between active and inactive reports make generating new information and knowledge challenging. Methods that prioritise the selection of the most representative structures could resolve this issue, at least in part. Additionally, implementing the “Sequence–Structure–Function relationships” concept on peptides is a crucial step forward to exploiting the potential of peptide data. Besides, the biological issues (i.e., immunogenicity, proteolytic degradation, permeability, and toxicity) have been explored only superficially.
Current methodologies used to study P-SA/PR have limitations, and the activity landscape approximation presented in this work is no exception. The fingerprint-based similarity (using MAP4 and the MinHashed distance) is a new method to explore and describe the landscape of any peptide property. However, the results of this study suggest that this methodology could be highly sensitive to structural changes in peptides with fewer than 20 residues, which could limit its applicability and underlines the importance of developing new molecular representations focused on peptides. For this reason, we recommend using multiple criteria and methodologies to understand the P-SA/PR. For example, a combination of activity landscape approaches, classical sequence alignment analysis, and 3D approximations can help decode the P-SA/PR. The present work contributes to establishing a helpful workflow based on structure similarity metrics to explore P-SA/PR and quickly identify non-canonical peptide activity cliffs.
The primary perspective of this research is to utilise fingerprint-based similarity calculations to create consensus virtual screening protocols. These protocols incorporate various factors, including 2D and 3D structure similarity, chemical property similarity, and sequence identity. The objective is to identify peptide structures that possess specific properties. Additionally, the methodology outlined in this study could be applied to curate peptide datasets helpful in developing artificial intelligence techniques for predicting peptide properties. Finally, molecular similarity landscapes of non-canonical peptides make it possible to study, decode, and optimise multiparametric properties in parallel, as in classical multiparametric landscapes such as DAD maps (Dual Activity Differences maps).83,84
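One simple way to picture such a consensus protocol is a weighted mean of normalised similarity scores. The sketch below is only an assumption about how the factors listed above might be combined; the criterion names, the weights, and the example values are hypothetical and are not taken from this study.

```python
from __future__ import annotations


def consensus_score(scores: dict[str, float], weights: dict[str, float] | None = None) -> float:
    """Weighted mean of per-criterion similarity scores, each expected in [0, 1]."""
    if not scores:
        raise ValueError("at least one score is required")
    if weights is None:
        weights = {name: 1.0 for name in scores}  # equal weights by default
    if set(weights) != set(scores):
        raise ValueError("weights and scores must cover the same criteria")
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name!r} is outside [0, 1]")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical scores for one query/candidate peptide pair.
pair_scores = {
    "fingerprint_2d": 0.82,
    "shape_3d": 0.64,
    "property": 0.71,
    "sequence_identity": 0.69,
}
print(f"consensus: {consensus_score(pair_scores):.3f}")
```

Candidates could then be ranked by this score, with the weights tuned to reflect how much each criterion is trusted for the property of interest.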
Footnote
† Electronic supplementary information (ESI) available: Fig. S1. Overview of the protocol implemented in this work; Fig. S2. Descriptive analysis of the 223 anti-MRSA peptides studied in this work; Fig. S3. Alignment analysis of the 223 anti-MRSA peptides studied in this work; Fig. S4. Correlations of identity values and fingerprint-based similarity values; Fig. S5. Alignment analysis of the 20 most potent anti-MRSA peptides. See DOI: https://doi.org/10.1039/d3dd00098b
This journal is © The Royal Society of Chemistry 2023