Bhaskar Bhushan a, Daniele Granata *b, Christian S. Kaas ‡b, Marina A. Kasimova b, Qiansheng Ren c, Christian N. Cramer b, Mark D. White a, Ann Maria K. Hansen d, Christian Fledelius d, Gaetano Invernizzi b, Kristine Deibler e, Oliver D. Coleman f, Xin Zhao c, Xinping Qu c, Haimo Liu c, Silvana S. Zurmühl b, Janos T. Kodra b, Akane Kawamura *af and Martin Münzel *b
a Department of Chemistry, Oxford University, Chemistry Research Laboratory, 12 Mansfield Road, Oxford, OX1 3TA, UK. E-mail: Akane.Kawamura@chem.ox.ac.uk
b Global Research Technologies, Novo Nordisk A/S, Novo Nordisk Park, 2760 Måløv, Denmark. E-mail: myzm@novonordisk.com; dngt@novonordisk.com
c Novo Nordisk Research Center China, Novo Nordisk A/S, Shengmingyuan West Ring Rd, Changping District, Beijing, China
d Global Drug Discovery, Novo Nordisk A/S, Novo Nordisk Park, 2760 Måløv, Denmark
e Novo Nordisk Research Center Seattle, Novo Nordisk A/S, 530 Fairview Ave N # 5000, Seattle, WA 98109, USA
f School of Natural and Environmental Sciences, Chemistry, Newcastle University, Bedson Building, Kings Road, Newcastle University, Newcastle Upon Tyne, NE1 7RU, UK. E-mail: Akane.Kawamura@ncl.ac.uk
First published on 24th February 2022
In any drug discovery effort, the identification of hits for further optimisation is of crucial importance. For peptide therapeutics, display technologies such as mRNA display have emerged as powerful methodologies to identify these desired de novo hit ligands against targets of interest. The diverse peptide libraries are genetically encoded in these technologies, allowing next-generation sequencing to be used to efficiently identify the binding ligands. Despite the vast datasets that can be generated, current downstream methodologies are limited by low-throughput validation processes, including hit prioritisation, peptide synthesis, and biochemical and biophysical assays. In this work we report a highly efficient strategy that combines bioinformatic analysis with state-of-the-art high-throughput peptide synthesis to identify nanomolar cyclic peptide (CP) ligands of the human glucose-dependent insulinotropic peptide receptor (hGIP-R). Furthermore, our workflow is able to discriminate between functional ligands and remote-binding, non-functional ligands. Efficient structure–activity relationship (SAR) analysis combined with advanced in silico structural studies allows deduction of a thorough and holistic binding model, which informs further chemical optimisation, including efficient half-life extension. We report the identification and design of the first de novo, GIP-competitive, incretin receptor family-selective CPs, which exhibit an in vivo half-life of up to 10.7 h in rats. The workflow should be generally applicable to any selection target, improving and accelerating hit identification, validation, characterisation, and prioritisation for therapeutic development.
We chose the type 2 GPCR glucose-dependent insulinotropic peptide receptor (GIP-R) as a challenging and clinically relevant test case.6 Its natural ligand GIP and the close relative glucagon-like peptide 1 (GLP-1) are incretins, i.e., hormones which are secreted after oral nutrient uptake and which augment glucose-dependent insulin release from pancreatic beta cells.7 GLP-1 receptor agonists have been reported not only to have effects in metabolic disease through increasing insulin release, but also to promote satiety by delaying gastric emptying, and are being investigated for their protective effects in obesity,8,9 heart disease,10 and diabetic kidney disease.11 In contrast, the biology of GIP is less clearly understood, and both GIP-R agonists12 and antagonists13,14 are under investigation as potential therapeutics in the fields of type 2 diabetes and obesity. Previous approaches to target the GIP-R have made use of analogues of GIP itself15,16 or GIP-R specific antibodies,14,17–20 and peptide display work was based on dual GIP-GLP-1 analogue libraries.14,19,20 As such, the GIP-R represented an ideal target for our mRNA display-based workflow as we strove to identify potent, ligand-competitive, and subfamily-selective de novo binders, which could provide valuable tools to understand GIP-R biology and serve as starting points for drug discovery efforts.
We employed mRNA display to identify cyclic peptide ligands towards the biotinylated extracellular domain (ECD) of the human GIP receptor (residues 22–138 of hGIP-R), which has been shown to retain binding to GIP.21 Initially, we established conditions for efficient cyclisation by disulphide bond formation for representative library peptides (Fig. S3†), before conducting iterative rounds of mRNA display5,22 using a nucleotide library encoding two conserved cysteines for macrocyclization flanking a randomised region of 4 to 12 amino acids. Negative selections were performed against biotin-loaded streptavidin-functionalised magnetic beads, followed by positive selection against bead-immobilised hGIP-R ECD. Sufficient library enrichment was obtained for hGIP-R after five rounds of selection (percentage recovery relative to input > 0.1%) (Fig. S4a†). NGS of enriched output cDNA was carried out and peptides were clustered by similarity for each selection round (R1–R5). All sequences exceeding six reads in R5 (the top 3160 sequences) were compared against each other using pairwise local alignments in order to derive a complete distance matrix. Single-linkage hierarchical clustering was then performed on the distance matrix, and the threshold for cutting the resulting dendrogram into clusters was chosen based on a statistical optimality criterion.23 For this dataset, the sequence similarity threshold for assigning a sequence to a cluster was set at 0.38, and the minimum number of members that a cluster was allowed to contain was 20. This generated 13 clusters (assigned letters A through M): the largest cluster, A, contained 1880 sequences, while the two least populated clusters, L and M, contained 21 sequences each. The unclustered sequences were assigned to a “noise” cluster (termed cluster 0), encompassing 263 sequences (Fig. 1).
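The clustering step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the alignment scoring scheme (match = 1, mismatch = −1, gap = −1), the normalisation of the alignment score into a distance, and the interpretation of the threshold as a distance cut-off are all assumptions made for illustration, and the function names are hypothetical. The sketch relies on the fact that cutting a single-linkage dendrogram at a distance t yields the connected components of the graph linking all pairs with distance ≤ t.

```python
from itertools import combinations


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score (toy scoring scheme, assumed)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def distance(a, b):
    """Map the alignment score to a distance in [0, 1] (assumed normalisation)."""
    return 1.0 - local_alignment_score(a, b) / min(len(a), len(b))


def single_linkage_clusters(seqs, threshold, min_size):
    """Return one label per sequence: 1, 2, ... by cluster size, 0 for noise."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair closer than the threshold (single-linkage cut).
    for i, j in combinations(range(len(seqs)), 2):
        if distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)

    # Groups below min_size stay in the noise cluster (label 0).
    labels = [0] * len(seqs)
    kept = sorted((g for g in groups.values() if len(g) >= min_size),
                  key=len, reverse=True)
    for label, members in enumerate(kept, start=1):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from the selection data.
    toy = ["LLWPFAA", "LLWPFAG", "KLWPFAA", "DEDEKRK", "DEDEKRR", "QQQQQQQ"]
    print(single_linkage_clusters(toy, threshold=0.38, min_size=2))
```

The all-against-all comparison is quadratic in the number of sequences, which is manageable for a few thousand sequences but would call for an optimised alignment library on larger datasets.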
Fig. 1 High throughput pairwise clustering analysis of 3160 Round 5 output sequences selected against hGIP-R ECD. Lighter colour indicates closer similarity between individual sequences (ESI Data 1†).
In the next step, we selected peptides from each cluster (A–M, 0) based on sequence diversity and abundance in R5 for parallel solid-phase peptide synthesis (SPPS) in a 96-well format. The N-terminus was capped with an acetyl group (Ac) to mimic the fMet in mRNA display, and a C-terminal FLAG tag was added to each peptide to ensure peptide solubility in buffer by serving as a source of negative charge, analogous to the C-terminal mRNA that is present during the selection screens.24 Peptides were assigned identifiers consisting of their cluster ID and their rank order based on abundance in R5 (e.g. B_5). Following synthesis and cleavage, the peptides were macrocyclized via disulphide bond formation in 20% DMSO in buffer.25 Peptide identity, purity, and complete macrocyclization were confirmed by UPLC-MS, and peptide concentrations were determined by UPLC-CAD (ESI Data 3†). For the initial high-throughput characterisation, peptides were used without further purification (average purity 50–70%). To establish the binding properties, we performed high-throughput single-concentration biolayer interferometry (BLI) measurements, using biotinylated hGIP-R ECD immobilised on streptavidin-functionalised biosensors. Selected peptides with favourable binding potencies were resynthesized and purified, and their Kd values were determined by multiple-concentration BLI measurements (Fig. 2). The results were generally in good agreement with the single-concentration data.
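The identifier scheme mentioned above (cluster ID plus abundance rank) could be generated along the following lines. This is a sketch under assumptions: the rank is taken to be a zero-based rank across the whole R5 dataset, which is inferred only from identifiers such as A_0 and M_46 appearing in the text, and the function name, tie-breaking rule and toy read counts are hypothetical.

```python
def assign_identifiers(reads, cluster_of):
    """Build '<cluster>_<rank>' identifiers from R5 read counts.

    reads: dict mapping sequence -> read count.
    cluster_of: dict mapping sequence -> cluster label ('A'..'M' or '0').
    Rank 0 is the most abundant sequence overall (assumption).
    """
    # Sort by descending read count; ties are broken alphabetically (assumption).
    ranked = sorted(reads, key=lambda seq: (-reads[seq], seq))
    return {seq: f"{cluster_of[seq]}_{rank}" for rank, seq in enumerate(ranked)}


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the selection output.
    toy_reads = {"SEQ_ONE": 900, "SEQ_TWO": 40, "SEQ_THREE": 350}
    toy_clusters = {"SEQ_ONE": "A", "SEQ_TWO": "M", "SEQ_THREE": "B"}
    print(assign_identifiers(toy_reads, toy_clusters))
```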
We identified several peptides (>50% of our panel) with nanomolar binding affinities toward the hGIP-R from multiple sequence clusters, particularly from the two most abundant clusters A and B (Fig. 2a and b). However, potent binders were also identified in less abundant sequence families. In particular, peptide M_46 was the second most potent binder amongst the resynthesized peptide panel, while only accumulating 0.2% of the reads compared to the most abundant peptide A_0. This highlights the benefit of following up on peptide hits independent of their ranking based on sequencing reads. Encouragingly, weak-to-no binding was observed from peptide members of cluster 0 (Fig. S8†), corresponding to bioinformatic ‘noise’ in the NGS data, which likely consists of singleton sequences and possible sequencing errors. The FLAG tag itself did not bind to hGIP-R ECD. For members of clusters containing an internal cysteine, e.g. peptide C_24, we generated single cysteine-to-methionine mutants for each site (Met2, Met8, and Met13) and deduced the disulphide pattern from the binding data (i.e. a bond between Cys2 or Cys8 for C_24, see Fig. S8†). In the spirit of the high throughput nature of the workflow, we did not follow up on peptides which failed during parallel synthesis, as the goal was the identification of one hit sequence. As a consequence, we suggest that the pursuit of peptides with complex folds or unpaired cysteines is only indicated if no functional hits can be identified otherwise.
Fig. 2 (a and b) Single concentration hGIP-R ECD binding Kd values and dissociation rates for the most abundant peptide clusters, and corresponding abundances/reads of peptide sequences in the final round of selection against GIP-R ECD. All peptides contain an N-terminal Ac group and C-terminal FLAG tag and are cyclised as disulphides. (c) Radiolabelled [125I]-GIP displacement from hGIPR#5/BHK Creluc 2P cells by cyclic peptides at a single concentration of 1000 nM CPs (x axis), and 100 nM CPs (y axis) (CPM = counts per minute). All peptides were tested crude without prior purification (ESI Data 2†).
The non-biased selection process means that high-affinity CPs can be generated against any site on hGIP-R ECD. We hypothesized that each cluster represents peptides which bind at a specific target binding site, with the possibility of different clusters having overlapping or distinct binding sites on the receptor. In this study, we were interested in identifying inhibitory peptide families which are able to either compete with or disrupt binding of the natural ligand, whether through direct competition or allosteric binding. To investigate this, we selected representative CPs from each cluster and performed competitive displacement assays with 125I-GIP in BHK CreLuc 2P cells stably expressing hGIP-R. Notably, this assay shows (competitive) binding to the full-length GIP-R, whereas our selection and initial screen had been performed against the ECD. As shown in Fig. 2c, only peptides from clusters B and M were found to displace 125I-GIP from hGIP-R (as measured at peptide concentrations of 100 nM and 1000 nM). Interestingly, cluster B contains a LWPF motif at the C-terminus, while cluster M has a related LPWF motif at the N-terminus, indicating a potentially conserved binding site. None of the members of other clusters were found to displace 125I-GIP, even those which were determined to have nanomolar binding affinities to the hGIP-R ECD by BLI, including the most abundant peptide family of the selection (cluster A), which covered over 50% of the whole dataset. This highlights the value of the clustering approach: enrichment of the sequences during mRNA display is driven only by the binding affinities of the encoded CPs to hGIP-R ECD, whereas our subsequent clustering analyses served to reveal different functionalities of these CPs. Thus, the high-throughput, cluster-based prioritisation approach is likely to have a higher chance of identifying functional ligands than prioritisation based on the number of NGS reads alone.
Encouraged by these results, we focussed our subsequent work on cluster B, namely the structurally most diverse members B_3, B_5, and B_68. B_68 had <20% purity in the 96-well synthesis and did not reveal any binding in the initial high-throughput binding assay but was included in the follow-up. Further analysis of the sequence list showed another structurally diverse member of cluster B, B_1275, which we hypothesized would also exhibit potent binding and GIP competition, despite the low number of sequencing reads. These four sequences were scaled up by traditional SPPS and further investigated in multi-concentration BLI and competitive binding assays. All purified peptides showed nanomolar binding and displaced 125I-GIP from BHK cells stably expressing hGIP-R (Fig. 4). Furthermore, none of these ligands were found to displace radiolabelled GLP-1 from GLP-1R, nor glucagon from GCG-R (Fig. S11†). The parent CP sequences also do not bear any significant sequence homology with native GIP, nor with any known interactors of hGIP-R, suggesting that these are the first reported examples of de novo incretin receptor-selective and competitive peptides for hGIP-R.
To optimise the biophysical and physicochemical properties of the CPs, and to identify a suitable attachment point for half-life extending moieties, such as albumin binders, we sought to establish a detailed structure–activity relationship (SAR) for cluster B. We chose the most and least abundant of our cluster B lead peptides (B_3 and B_1275) as starting points for full amino acid mutation scans.24,26 We focussed on natural amino acids only, to retain the option of (semi-)recombinant expression of the compounds, should large amounts be needed for further development stages. Single-mutation variant peptides were synthesised in a 96-well format for each amino acid in the variable region contained between the two conserved cysteine residues C2 and C13. Methionine and cysteine were not included in the mutational scan due to potential oxidation, and asparagine was not included due to the risk of deamidation and isomerisation to iso-aspartate. Additionally, scrambled peptides were included in the panel. The binding affinities of these mutant peptides to hGIP-R ECD were then determined by single-concentration BLI experiments.
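The design of such a single-substitution panel can be sketched in a few lines of Python. The parent sequence below is a hypothetical placeholder, not B_3 or B_1275: it only preserves the features stated in the text (cysteines at positions 2 and 13, and the LWPF motif at positions 9–12). The function name and the label format are likewise illustrative assumptions.

```python
NATURAL = "ACDEFGHIKLMNPQRSTVWY"
# Excluded in the text: Met and Cys (oxidation), Asn (deamidation).
EXCLUDED = set("MCN")
ALLOWED = [aa for aa in NATURAL if aa not in EXCLUDED]

# Hypothetical placeholder parent; NOT a sequence from the selection.
PLACEHOLDER_PARENT = "MCFSGFGVLWPFC"


def single_mutants(parent, first=3, last=12):
    """Yield (label, sequence) for every single substitution.

    Positions are 1-based and cover the variable region between
    the conserved cysteines C2 and C13.
    """
    for pos in range(first, last + 1):
        wild_type = parent[pos - 1]
        for aa in ALLOWED:
            if aa == wild_type:
                continue  # skip the parent residue itself
            mutant = parent[:pos - 1] + aa + parent[pos:]
            yield f"{wild_type}{pos}{aa}", mutant


if __name__ == "__main__":
    panel = dict(single_mutants(PLACEHOLDER_PARENT))
    print(len(panel), "single mutants, e.g. P11A ->", panel["P11A"])
```

With 17 permitted residues and 10 scanned positions, a parent whose variable region contains only permitted residues gives 160 variants, i.e. roughly two 96-well plates per parent once controls such as scrambled peptides are added.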
As shown in Fig. 3a and b, the consensus sequence of cluster B (positions 9–12, LWPF) was found to be the most intolerant to mutation, confirming this region to be critical for binding. Proline at position 11 was found to be completely intolerant to replacement by any other amino acid, suggesting that this residue is critical for maintaining a conformationally restricted binding interface of the CPs. While residues 9–12 are hydrophobic in both parent peptides, it is unlikely that the interaction of the peptides with hGIP-R ECD is non-specific, as none of the scrambled peptides (e.g. LWPF) were found to bind, including the transposed mutants of B_3 and B_1275, with interchanged L9 and W10. The data show some general trends for both B_3 and B_1275. In general, positions 3 and 6 (both F in the parent peptides) favour aromatic residues, position 8 favours aliphatic residues, while residues 4, 5, and 7 are fairly tolerant to mutation. The single amino acid scan did not reveal any substitutions that led to remarkable improvements in binding, suggesting that these peptides may either already be optimised for binding through several rounds of mRNA display selection, or that further binding improvements would only be achieved by synergistic action of multiple substituted residues. The latter could ideally be investigated by mRNA display-based affinity maturation experiments in follow-up studies.24,26
To gain insight into the binding mode of CPs to hGIP-R at the atomistic level, we employed a two-step modelling protocol that first generates multiple conformations of the complex and then selects the final conformation based on stability (for details, see ESI†). As our peptides were demonstrated to be 125I-GIP-competitive, we directly folded them inside the GIP binding site of hGIP-R using Rosetta (Fig. 3c and d; results for B_1275 shown).
Briefly, each of the four crucial residues according to the SAR data (Leu9, Trp10, Pro11 or Phe12) was placed in the GIP binding site and the rest of the peptide was grown around it. The obtained conformations (24 000 in total) were further clustered, and the representative poses of the four most populated clusters were submitted to molecular dynamics (MD) simulations (600 ns for each pose) to assess their stability. The model with the lowest L_RMSD (root mean square deviation of the ligand backbone during the MD trajectory) was selected for further characterization (Fig. 3c). In this model, the C-terminal half of the peptide faces the GIP binding site, forming most of the interactions with hGIP-R. This is in line with the SAR data, which show that the C-terminal region is in general less tolerant to mutagenesis than the N-terminus. Furthermore, the Trp10 residue overlaps with the position of Phe22 in GIP, which is known to be crucial for its binding to the receptor (Fig. 3d). Finally, the overall conformation of the peptide resembles that of a hairpin with the turn located at Phe6 and His7, which are the only two positions (together with the peptide termini, see below) tolerating mutagenesis to a proline.
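The pose-selection criterion can be illustrated with a short Python sketch. The text does not specify how L_RMSD is aggregated over a trajectory or which reference structure is used, so the choices below are assumptions: frames are taken to be already superimposed on the receptor, the first frame serves as the reference, and the per-pose score is the mean RMSD of all later frames. All function names and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) tuples, no refitting."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def mean_ligand_rmsd(trajectory):
    """Mean RMSD of every later frame relative to the first frame (assumption)."""
    reference, later = trajectory[0], trajectory[1:]
    return sum(rmsd(reference, frame) for frame in later) / len(later)


def most_stable_pose(trajectories):
    """Return the name of the pose whose ligand backbone drifts the least."""
    return min(trajectories, key=lambda name: mean_ligand_rmsd(trajectories[name]))


if __name__ == "__main__":
    # Hypothetical two-atom "backbones" with two frames each.
    toy = {
        "pose_a": [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 0)]],
        "pose_b": [[(0, 0, 0), (1, 0, 0)], [(3, 4, 0), (4, 4, 0)]],
    }
    print(most_stable_pose(toy))
```

In practice the coordinates would come from an MD analysis package rather than hand-written tuples; the sketch only shows the ranking logic.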
Fig. 3 Heat maps showing Kd (nM) values for binding of single-residue mutant peptides derived from B_3 (a) and B_1275 (b) to biotinylated hGIP-R ECD as determined by single-concentration BLI. The columns indicate the amino acid changes compared to the parent sequence (displayed at the top of the table). All peptides featured a C-terminal Cys and were tested in crude form. Peptides that were not tested or those where synthesis failed are marked as grey boxes. None of the scrambled mutant peptides showed any binding. (c) Atomistic model of the cyclic peptide B_1275 and GIP receptor complex suggested by molecular modelling. The LWPF sequence of the peptide is shown in red, while the rest of the molecule is coloured by atom type (carbon in yellow, nitrogen in blue, oxygen in red, and sulphur in orange). The receptor is shown in grey; the binding site residues (L35, W39, M67 and Y87) are highlighted in green. (d) Overlay of the crystal structure of GIP complexed with hGIP receptor (PDB 2QKH) and atomistic model of cyclic peptide B_1275 in the binding site of the GIP receptor. Note that the positions of W10 of the cyclic peptide and F22 of GIP overlap.
Next, we wanted to use the accrued binding and SAR information to design peptides for in vivo studies. For these, peptides need to fulfil two crucial requirements: firstly, they need to be stable in plasma and, secondly, they need to be protected from renal clearance. The latter is crucial to the development of any peptide therapeutic, as even protease-resistant peptides usually exhibit plasma half-lives in the range of a few minutes. Attachment of a fatty acid-based albumin binder (termed protractor) is one of the most commonly employed strategies to extend peptide half-life, and we opted for a 2xOEG-gGlu-C18 diacid albumin binder for our peptide hits.3,27–29 The binding model suggested that the addition of an albumin-binding moiety is tolerated at the N- or C-terminus, as these are solvent exposed. The additional finding that the Met in position 1 can be replaced with Ala in both B_3 and B_1275 without loss of affinity, together with the ease of synthesis, prompted us to design four peptides with a protractor attached to the N-terminus (B_3.1, B_5.1, B_68.1 and B_1275.1). Furthermore, we synthesized linear versions of the parent peptides in order to investigate the importance of macrocyclization for binding and stability (B_3.2, B_5.2, B_68.2 and B_1275.2). Additionally, we synthesized a peptide without FLAG tag to prove that binding is indeed mediated by the macrocycle and not the tag (B_1275.3). During the purification of the latter, we realized that solubility was low (as expected from the rather hydrophobic sequence), complicating purification and analysis. Thus, we utilized our SAR understanding to design variants with improved biophysical properties by reducing the hydrophobicity and increasing the charge at positions where mutation was tolerated (especially positions 1, 4 and 7). Furthermore, we replaced the Met residues to avoid oxidation of the sulphur atom.
We focussed on B_1275 as we anticipated from the sequence that this peptide is the most hydrophilic, as also indicated by the shortest retention time in HPLC chromatography (Table S1†). All resulting peptides containing a FLAG tag (B_1275.7 through B_1275.11) were soluble under the relevant conditions; however, only B_1275.5 showed high solubility without the tag, as measured by nephelometry at different PEG concentrations (Fig. S14†). Pleasingly, addition of an albumin binder (B_1275.6) did not negatively affect solubility. Analysis of the peptides by multi-concentration BLI and radio-GIP displacement showed that N-terminal modification was indeed tolerated for most peptides (Fig. 4). None of the linear variants (B_3.2, B_5.2, B_68.2, B_1275.2) showed binding, suggesting that macrocyclization is crucial for the interaction. Most mutant peptides, however, retained binding, including the triple mutant B_1275.9 (M1A, T4R, H7E; Kd = 31 nM) and the quadruple mutants B_1275.7 (M1A, F3W, H5E, H7E; Kd = 101 nM) and B_1275.8 (M1S, T4R, H5A, H7Q; Kd = 184 nM), showing the power of having detailed SAR information available for multi-factorial optimisation of hits. Interestingly, in contrast to the binding data obtained for F11P mutants in the high-throughput mutagenesis studies (Fig. 3), B_1275.10 with a F11P substitution did not show any binding. This exemplifies the limits of high-throughput SAR analysis using crude peptides, as the terminal P possibly interferes with cyclisation and might lead to multimers or polymers which interfere with the BLI assay.
Finally, we tested a panel of the optimised peptides for stability in human plasma and determined in vivo pharmacokinetic parameters in rats. Most cyclic peptides (B_3.1, B_1275.3, B_1275.4, B_1275.5 and B_1275.6) showed no loss of stability in plasma over the course of 5 h, including in the presence of a FLAG tag or of a protractor (Fig. 4b). While parent B_1275 had a t1/2 of 3.5 h, the two linear peptides tested (B_3.2 and B_1275.2) exhibited low levels of degradation (t1/2 ca. 2 h, see Table S2 in ESI† for exact values), and the biologically active linear peptides GIP and GLP-1 were more rapidly degraded (t1/2 ca. 35–45 min), which highlights the benefits of cyclic peptides in terms of druggability. Having established the plasma stability, we turned our attention to the in vivo half-life. We chose two protracted parent peptides (B_3.1 and B_1275.1) and the optimised B_1275 variants with FLAG tag (B_1275.4), without FLAG tag (B_1275.5) and with protractor but without FLAG tag (B_1275.6). The two non-protracted peptides exhibit very short half-lives (B_1275.4 = 3.5 min, B_1275.5 = 1.8 min), whereas the peptides carrying an albumin binder show significant plasma exposure over a long period of time (t1/2: B_3.1 = 9.8 h, B_1275.1 = 10.7 h, B_1275.6 = 4.2 h), with half-lives being increased by more than 100-fold (Fig. 4d). For comparison, the GLP-1 analogue semaglutide, which is dosed once weekly in humans, has a t1/2 in rats of 7 h.28
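The magnitude of the half-life extension can be checked with simple arithmetic on the values reported above. The short Python sketch below converts all half-lives to minutes and computes fold increases. Which non-protracted peptide serves as the reference for which protracted peptide is an assumption made here for illustration (the text does not state the pairing), and the names are hypothetical.

```python
# In vivo half-lives in rats as reported in the text, converted to minutes.
HALF_LIFE_MIN = {
    "B_1275.4": 3.5,        # optimised, FLAG tag, no protractor
    "B_1275.5": 1.8,        # optimised, no FLAG tag, no protractor
    "B_3.1": 9.8 * 60,      # protracted parent
    "B_1275.1": 10.7 * 60,  # protracted parent
    "B_1275.6": 4.2 * 60,   # optimised, protractor, no FLAG tag
}


def fold_increase(protracted, reference):
    """Ratio of two half-lives; the pairing of peptides is an assumption."""
    return HALF_LIFE_MIN[protracted] / HALF_LIFE_MIN[reference]


if __name__ == "__main__":
    # Same optimised sequence with and without protractor (both lack the tag).
    print(round(fold_increase("B_1275.6", "B_1275.5")))
    # Protracted parent against the longer-lived non-protracted variant.
    print(round(fold_increase("B_1275.1", "B_1275.4")))
```

Under these pairings both ratios exceed 100, consistent with the stated increase of more than 100-fold.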
Footnotes
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d1sc06844j
‡ Novo Nordisk Bio Innovation Hub, Novo Nordisk A/S, 255 Main St, 10th Floor, Cambridge, MA 02142, United States
This journal is © The Royal Society of Chemistry 2022