Nicole C. Parsley (a) and Leslie M. Hicks* (b)
(a) Vestaron Corporation, Durham, North Carolina, USA
(b) Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. E-mail: lmhicks@unc.edu
First published on 11th March 2025
Natural product peptides embody a suite of inherent bioactivities and serve as a template to inspire new chemistries and molecular scaffolds in drug discovery and agrotechnology. Mapping the vast and diverse bioactive peptidome, however, is largely obfuscated by unpredictable molecular transformations in both non-ribosomal sequences and highly post-translationally modified ribosomal protein products. Mass spectrometry is a powerful analytical technique with modern instrumentation achieving unprecedented resolving power, rapid and sensitive gas-phase separations, and versatile multistage fragmentation techniques. As such, mass spectrometry can be (1) leveraged to characterize traditionally difficult-to-sequence natural product peptide modifications via enhanced gas-phase technologies and (2) coupled with complementary ‘Omics’ approaches to predict peptide structure through transcripts, motifs, biosynthetic pathways, and the biomolecular machinery involved in peptide biogenesis. Herein, the challenges of and recent innovations in mass spectrometry towards the discovery and characterization of natural product bioactive peptides are profiled.
Nicole C. Parsley is currently a peptide chemist and mass spectrometrist at Vestaron Corporation. Dr Parsley trained in Dr Hicks' laboratory (PhD, Chemistry, 2020), where her dissertation research focused on the discovery and characterization of cyclic and disulfide-rich botanical bioactive peptides with mass spectrometry.
Leslie M. Hicks is the Chancellor's Science Scholars Term Professor in the Department of Chemistry at the University of North Carolina at Chapel Hill. Dr Hicks received her B.S. in Chemistry, summa cum laude, from Marshall University and her PhD in Analytical Chemistry from the University of Illinois, Urbana-Champaign.
Non-ribosomal peptides (NRPs) are complex natural products manufactured entirely independently of the ribosome. Megadalton systems of non-ribosomal peptide synthetases (NRPSs) construct the NRP backbone from a vast pool of precursor modules and facilitate “tailoring,” e.g., methylation, oxidation, reduction, formylation, or epimerization. Separate trans-acting enzymes encoded in the biosynthetic gene cluster (BGC) may introduce further modifications on the growing peptide chain or on the natural product peptide after it is released from the NRPS.4 Significant chemical diversity of NRPs emerges from the incorporation of primary metabolite-derived non-proteinogenic, D-, β-, N-methyl, or homo amino acids, which are often hydroxylated, methylated, or halogenated (Fig. 1).5 Promiscuity for structurally analogous amino acids among NRPS domains, which enables nimble adaptation to changing targets and environmental pressures, produces heterogeneous populations of NRP analogs.5 Additionally, hybrid systems of NRPSs and polyketide synthases (PKSs), which function in multienzyme complexes to condense small carboxylic acids into polyketide oligomers, generate increasingly complex NRP-PKS natural products (e.g., lipopeptides) with the ability to access new chemistries and modes of action.
Although mature sequences ultimately present similar post-translational modifications, the challenges associated with the distinct biosynthetic origins of ribosomally synthesized and post-translationally modified peptides (RiPPs) and NRPs limit the ability of bioinformatics to predict novel structures. While gene-encoded RiPP core sequences are readily accessible through genome mining, RiPP BGCs are conserved only within RiPP families, and thus homology rule-based tools often fail to detect novel RiPPs.5,6 Additionally, the identification of short RiPP precursor-encoding genes through genomic approaches can yield many false positives given the number of putative short open reading frames within a genome; setting a minimum length threshold may reduce these false positives, but risks excluding legitimate RiPP sequences.6 Efforts to characterize and annotate NRPS BGCs have resulted in increasingly intelligent bioinformatic tools for the prediction of mature NRP structures; however, promiscuity in NRPS enzymes hinders the use of genome mining for complete NRP structure prediction. While RiPPs and NRPs hold the capacity for unique and highly specialized chemistries attractive to medicinal and agricultural biotechnology, the discovery and characterization of new active molecular species is limited by their complexity, indeterminate variability, and unpredictability beyond a finite genetic script.
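The length-threshold trade-off described above can be illustrated with a minimal sketch. It scans only the forward strand for ATG-initiated open reading frames and applies a codon-count filter. The function name, the toy sequence, and the threshold values are illustrative assumptions, not part of any published genome-mining tool.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_short_orfs(dna, min_codons=0, max_codons=100):
    """Return (start, end, n_codons) for forward-strand ORFs.

    n_codons counts the start codon but not the stop codon.
    Only the three forward reading frames are scanned (an assumption
    made to keep the sketch short; real tools also scan the reverse strand).
    """
    dna = dna.upper()
    n = len(dna)
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= n:
            if dna[i:i + 3] == START:
                # Walk codon by codon until an in-frame stop codon.
                j = i + 3
                while j + 3 <= n and dna[j:j + 3] not in STOPS:
                    j += 3
                if j + 3 > n:
                    break  # no in-frame stop remains in this frame
                n_codons = (j - i) // 3
                if min_codons <= n_codons <= max_codons:
                    hits.append((i, j + 3, n_codons))
                i = j + 3
            else:
                i += 3
    return hits


# Toy sequence: a 2-codon ORF followed by a 5-codon ORF.
DNA = "ATGAAATAG" + "ATGGCTGCTGCTGCTTGA"

all_orfs = find_short_orfs(DNA)                    # both ORFs reported
filtered_orfs = find_short_orfs(DNA, min_codons=5)  # the 2-codon ORF is dropped
```

Raising `min_codons` removes the shortest candidate here; in a real genome the same filter discards spurious short ORFs but would equally discard a genuine precursor gene that happens to fall below the cutoff.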
Alternatively, mass spectrometry (MS) is a dynamic platform with the speed and sensitivity required to analyze highly variable and previously uncharacterized natural product extracts7 for the detection of mature bioactive peptides, with or without genomic or transcriptomic information. Traditional MS characterization methods are challenged by combinatorial additions of post-translational modifications, which generate heterogeneous molecular populations and increase source material complexity while decreasing the abundance of any given peptidoform.8 As such, legacy instruments with conventional fragmentation modes, e.g., collision-induced dissociation (CID), and basic data processing are generally limited to the characterization of less complex bioactive peptides, merely scratching the surface of the full repertoire of elusive natural product NRPs and RiPPs. Modern technologies, however, offer increasingly high resolving power and innovative multistage fragmentation methods, and can be coupled with refined bioinformatic strategies for powerful peptidomic analyses; the discovery of novel bioactive peptides relies on these advances in MS proteomics to detect and characterize novel molecular species independent of genetic predictions or to complement bioinformatic ‘Omics’ approaches.
Developments in MSn enable the sequence elucidation of peptides containing structural isomers and populations of complex post-translationally modified peptides inaccessible to traditional mass spectrometric methods. In the absence of genetic information, the discrimination of leucine/isoleucine residues has historically challenged MS-based sequencing of peptide primary structure; inaccurate assignment of leucine/isoleucine can have detrimental effects on protein activity and specificity.11 However, a hybrid multistage mass spectrometry (MS3) approach combining HCD (high-energy collisional dissociation) and ETD (electron transfer dissociation) was demonstrated to unambiguously distinguish leucine and isoleucine residues in proteins and peptides up to 3 kDa, and can be applied to increase the accuracy of de novo sequencing.11 The enhanced peptide backbone fragmentation characteristic of less commonly available hybrid HCD/ETD, or EThcD, has also been leveraged for the de novo sequencing of difficult-to-sequence natural product peptides. In a botanical extract, a novel bioactive peptide, existing in multiple isobaric peptidoforms, challenged conventional characterization by CID fragmentation alone.8 Analysis via EThcD revealed the incorporation of hydroxyproline variably at three different positions along the peptide backbone, enabling comprehensive sequence characterization. Glycopeptides present diverse, heterogeneous populations of covalent N- or O-linked complex carbohydrates and oligosaccharides. Traditionally, glycoproteomics has relied on in vitro enzymatic cleavage of glycans and subsequent mass spectral analyses of glycan chains and associated peptides separately. Although glycan composition can be ascertained, this strategy cannot localize glycan attachment sites and complicates analysis. 
To reduce data complexity and laborious sample preparation in large-scale N-glycopeptidomics, an EThcD fragmentation approach for high-throughput analysis of intact glycopeptides was implemented, in which both HCD and ETD fragmentation information is collected in a single spectrum.12
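Why isobaric peptidoforms and leucine/isoleucine defeat precursor-mass measurements, and why backbone fragments can still localize a hydroxyproline, can be sketched with simple mass arithmetic. The sequences below are hypothetical and are not the peptides from the cited studies; the residue masses are standard monoisotopic values, and only singly charged b ions are modeled.

```python
# Monoisotopic residue masses (Da). "Hyp" is hydroxyproline (Pro + one oxygen).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "P": 97.05276,
    "L": 113.08406, "I": 113.08406, "Hyp": 113.04768,
}
WATER = 18.01056
PROTON = 1.00728


def neutral_mass(residues):
    """Monoisotopic neutral mass of a linear peptide."""
    return sum(RESIDUE_MASS[r] for r in residues) + WATER


def b_ions(residues):
    """Singly charged b-ion m/z values, b1 through b(n-1)."""
    masses, running = [], 0.0
    for r in residues[:-1]:
        running += RESIDUE_MASS[r]
        masses.append(running + PROTON)
    return masses


# Hypothetical positional isomers: the hydroxyl sits on a different proline.
form_1 = ["G", "Hyp", "A", "P", "L"]
form_2 = ["G", "P", "A", "Hyp", "L"]

# Hypothetical Leu/Ile pair.
leu_form = ["G", "L", "A"]
ile_form = ["G", "I", "A"]

same_precursor = abs(neutral_mass(form_1) - neutral_mass(form_2)) < 1e-6
b2_shift = b_ions(form_1)[1] - b_ions(form_2)[1]  # about one oxygen mass
leu_ile_identical = b_ions(leu_form) == b_ions(ile_form)
```

The two hydroxyproline isomers share a precursor mass but differ at b2 by the mass of one oxygen, so sufficient backbone coverage localizes the modification. The Leu/Ile pair is identical at every backbone fragment, which is why side-chain-specific fragments from multistage experiments are needed; this sketch does not model those.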
Less commonly implemented gas-phase separations, such as ion mobility, coupled to MS can reveal novel molecular species in complex matrices through multidimensional separations complementary to LC-based separations, e.g., high-performance liquid chromatography (Fig. 2). Found in natural product NRPs and RiPPs,13 D-amino acids are typically characterized by nuclear magnetic resonance (NMR), which requires milligram quantities of highly purified material. Although standard MS methods are unable to discriminate enantiomers solely on the basis of mass-to-charge measurements, increased access to commercially available ion mobility instruments has permitted the development of methods for enantioselection; recent work has demonstrated the use of a modified commercial miniature ion trap to break the chiral symmetry of sugars, amino acids, and small molecule drugs14 with promising future applications to peptides. Additionally, the ability of ion mobility to resolve analytes by collisional cross-section allows for the differentiation of disulfide-rich peptide conformers15 common in natural products, the characterization of which is essential when evaluating the impact of specific disulfide linkages on peptide bioactivity. In a recent study, the highly complex and dynamic peptidome generated by a germinating seed and its microenvironment was profiled on a commercially available hybrid TIMS (Trapped Ion Mobility Spectrometry)-TOF instrument. A comprehensive analysis of germinating seeds of Phaseolus vulgaris, the common bean, examined peptide variability among eight bean genotypes, identifying >3000 peptides and laying the groundwork for future investigation of bioactive seed-exuded peptides.16
The identification of biosynthetic pathways and machinery can guide the discovery of previously unknown sequences, structures, and post-translational modifications. In RiPPs, NRPs, and hybrid NRP-PKSs, cellular machinery aids or fully orchestrates the synthesis of structurally and functionally diverse bioactive peptides via backbone or side-chain cyclization, single, heterogeneous, or branched post-translational modifications, and the incorporation of non-proteinogenic amino acids and chimeric glycan side chains. Minimally, MS can be used to deduce unpredictable mature peptide products from BGCs with unknown functions.25 When available, genetic information can be paired with MS for the identification of BGCs to guide bioactive peptide discovery, and numerous bioinformatic platforms are available to facilitate large-scale data analysis. Millions of mass spectra in the Global Natural Products Social Molecular Networking (GNPS) infrastructure were searched against eight genomic datasets with MetaMiner, a spectral networking tool that integrates natural product MS and metagenomic datasets for RiPP discovery and tolerates unknown modifications; MetaMiner identified 38 known and unknown RiPPs from diverse sources.26 Interpretation of bacterial genomic data with antiSMASH, a BGC homology-based tool for the identification of NRPs/PKs and novel BGCs, yielded six RiPP, NRP, and PKS BGCs, enabling the prediction and subsequent mass spectrometric characterization of mature bioactive peptides.27 Applying NRPminer, a modification-tolerant tool that mines non-canonical NRPS assembly lines, to only four GNPS MS datasets and their associated genomes identified four novel non-ribosomal peptide families and 180 NRPs.28 HypoRiPPAtlas, a machine-learning, genomics-derived ‘atlas’ of predicted natural product sequences compared in silico with MS data for RiPP discovery and prediction, was used to search 46 GNPS MS datasets and identified numerous bioactive RiPPs and a novel post-translational modification.29 Additionally, MS can provide mechanistic and structural insights into BGCs; transcriptomics targeting BURP-domain peptide motifs revealed a novel bicyclic peptide cyclase, and the mechanism of catalytic activity was monitored with bottom-up proteomics.30,31
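The general idea of pairing a genome-predicted sequence with an observed mass while tolerating modifications can be sketched as follows. This is a deliberately simplified illustration, not the algorithm used by MetaMiner, NRPminer, or HypoRiPPAtlas: the function name, the three-entry modification table, and the tolerance values are assumptions chosen for the example.

```python
from itertools import combinations_with_replacement

# A small, illustrative table of monoisotopic mass shifts (Da).
MODS = {
    "dehydration": -18.010565,
    "hydroxylation": 15.994915,
    "methylation": 14.015650,
}


def explain_delta(predicted_mass, observed_mass, max_mods=2, ppm=10.0):
    """List modification combinations that reconcile two masses.

    predicted_mass: neutral mass of the unmodified, genome-predicted peptide.
    observed_mass:  neutral mass measured by MS.
    Returns tuples of modification names (the empty tuple means unmodified).
    """
    tolerance = observed_mass * ppm / 1e6
    matches = []
    for n in range(max_mods + 1):
        for combo in combinations_with_replacement(sorted(MODS), n):
            shift = sum(MODS[name] for name in combo)
            if abs(predicted_mass + shift - observed_mass) <= tolerance:
                matches.append(combo)
    return matches


# Hypothetical values: a predicted 1000.0 Da core peptide observed after
# losing two waters (e.g., two dehydrated residues).
predicted = 1000.0
observed = predicted - 2 * 18.010565

candidates = explain_delta(predicted, observed)
```

A mass match of this kind only proposes candidate modification sets; fragmentation data are still needed to confirm the sequence and localize each modification, and the search space grows quickly as `max_mods` and the modification table expand.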
Beyond bioactive peptide discovery from isolated natural product extracts in vitro as discussed herein, the complex relationships among peptide structure and function, localization within source material, and cellular/protein targets can be probed through additional, sophisticated MS strategies. Hydrogen/Deuterium eXchange Mass Spectrometry (HDX-MS) monitors hydrogen/deuterium exchange kinetics to deduce protein structure and conformational dynamics, e.g., cis–trans isomerization of the protein backbone. Antimicrobial peptides often exert bactericidal effects through the disruption of microbial membranes; HDX-MS can be used to examine the interactions and structural changes of proteins upon membrane recruitment37 and may be leveraged in future studies to examine the mode of action of membrane-acting peptides. Matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI-MSI) has been implemented to spatially localize endogenous peptides in botanical tissues,38 with the potential to further our understanding of the functionality of bioactive peptides within their source organisms. Cross-linking mass spectrometry (XL-MS) is a maturing technique in which interacting proteins are covalently linked prior to MS analysis, and has promising future applications in the elucidation of interactions among bioactive peptides, intracellular protein interactions, and mechanisms of action.39 As a standalone technique or in tandem with powerful ‘Omics’ strategies, mass spectrometry is a rapidly evolving, dynamic platform with diverse applications across the discovery, primary sequence identification, and structural characterization of natural product bioactive peptides.
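As a brief illustration of how HDX-MS exchange kinetics are commonly quantified, the sketch below converts an isotopic envelope to a centroid mass and normalizes deuterium uptake between an undeuterated and a fully deuterated control. The function names and numeric values are hypothetical, and protonated ions are assumed; back-exchange correction beyond the fully deuterated control is not modeled.

```python
PROTON = 1.007276  # Da


def centroid_mass(mz_values, intensities, charge):
    """Intensity-weighted centroid of an isotopic envelope as a neutral mass."""
    total = sum(intensities)
    centroid_mz = sum(m * i for m, i in zip(mz_values, intensities)) / total
    return charge * (centroid_mz - PROTON)


def fractional_uptake(mass_t, mass_undeuterated, mass_fully_deuterated):
    """Fraction of exchangeable sites deuterated at labeling time t."""
    return (mass_t - mass_undeuterated) / (
        mass_fully_deuterated - mass_undeuterated
    )


# Hypothetical centroid masses (Da) for one peptide.
uptake = fractional_uptake(1002.0, 1000.0, 1008.0)

# Hypothetical two-peak envelope for a doubly charged ion.
mass = centroid_mass([500.0, 501.0], [1.0, 1.0], charge=2)
```

Plotting `fractional_uptake` against labeling time for a peptide in the free and membrane-bound states is one way the protection or exposure of a region would show up as a shift in the uptake curve.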
This journal is © The Royal Society of Chemistry 2025