Site-selective C–C modification of proteins at neutral pH using organocatalyst-mediated cross aldol ligations†

†Electronic supplementary information (ESI) available. See DOI: 10.1039/c8sc01617h

An organocatalyst-mediated protein aldol ligation (OPAL) affords C–C linked bioconjugates at neutral pH.


Introduction
Protein-small molecule bioconjugates have revolutionised the fields of chemical medicine, chemical biology and cell biology, 1-4 but their utility can be undermined by the instability of the covalent linkages generated by existing protein chemical modification strategies. 5 Carbon-carbon bonds, the backbone of all organic molecules, are however inherently stable across a range of conditions, and are therefore established as the most coveted linkage in bioconjugation studies. Although a small number of elegant strategies for the chemical assembly of protein C-C bonds, including carbon free-radical additions to alkenes, 6,7 Knoevenagel 8 or Mukaiyama 9 based condensations, and Pictet-Spengler type ligations, 10 in addition to other bioorthogonal ligations, 3,11-13 have recently demonstrated impact beyond academic labs, the general use of some such methods can be hindered by practical limitations. These include the requirement for chemical probes containing reactive handles accessible only through multi-step syntheses, or probes which are themselves prohibitively expensive and used in large excess, or which have reduced reactivity under biological conditions such as in the presence of oxygen. Of particular significance is the frequent requirement for acidic or basic pH during bioconjugation, as the folding, stability and function of many proteins (and complexes) depend on the maintenance of a pH close to neutral. Deviations from this optimal pH window can therefore have deleterious effects, 14 exemplified by the disassembly of the histone octamer nucleosome core, 15 human hemoglobin dissociation, 16 aggregation of antibodies 17 and protein aggregation events associated with neurodegenerative diseases such as Alzheimer's 18 and prion diseases. 19 Indeed, acidic pH is a characteristic feature of the lysosome, where it facilitates the degradation of proteins.
There is therefore a pressing need to develop fully biocompatible ligations that address these limitations and enable efficient C-C bioconjugation of proteins at neutral pH. Using affordable simple probes in ratios acceptable for small molecule chemistry would also allow wider access to the methodology in non-specialist labs or those with limited resources.
In comparison to bioconjugation chemistry, the synthesis of C-C bonds in small molecule chemistry is well established, and in the 2000s was redefined by the emergence of 'organocatalysts' 20,21 capable of optimising existing transformations as well as inspiring new reactions. 22,23 Our interest in this area was piqued by the prominent role aldehyde chemistry has played in the exponential development of the field. 24 This is notable as aldehydes are chemical handles which are easily installed into proteins through modification of both natural 25,26 and unnatural amino acids, 27 with efficiencies akin to the installation of widely utilised protein modification tags such as dehydroalanine (from cysteine 7,28 or phosphoserine 6) and azides or alkynes. 29 We therefore sought to explore whether the challenges of constructing C-C modified proteins could be overcome through exploitation of aldehyde handles, and the development of a novel ligation which merged established small molecule aldehyde organocatalysis methods with developing bioconjugation chemistry techniques.
We were attracted to Northrup and MacMillan's seminal work on cross aldol reactions of aldehydes using L-proline 1 as an organocatalyst, 30 and the water compatibility of this chemistry. 31 We envisioned that an analogous cross aldol reaction on protein aldehydes might have widespread utility because it would enable site-selective C-C modification of proteins using uncomplicated aldehyde probes and non-toxic affordable organocatalysts. As an additional benefit, we anticipated that the β-hydroxy aldehyde product of the cross aldol reaction could be subjected to alternative aldehyde ligation conditions to afford dual modified proteins, which are of increasing utility. 32 Herein we disclose the realisation of this strategy in an 'organocatalyst-mediated protein aldol ligation' (OPAL) that enables site-selective formation of stable C-C bonds to protein aldehydes 2 at neutral pH at internal sites within folded proteins, and at the N-terminus (Fig. 1, enamine activation mode). OPAL is a stand-alone protein bioconjugation reaction which is highly efficient and complete within 1 h using as few as 2 equivalents of simple aldehyde probe. The selectivity of the ligation is also demonstrated in complex mixtures through the affinity purification of a pH sensitive protein from a cell lysate. Furthermore, we establish that the β-hydroxy aldehyde OPAL product 3 can subsequently take part in an organocatalyst-mediated β-hydroxy oxime ligation that is unexpectedly accelerated at neutral over acidic pH, affording access to dual differentially modified proteins 4. The utility of this tandem method is showcased in the 'chemical mimicry' of a dual post-translationally lipidated surface protein, integral to the pathogenesis of Leishmaniasis. 33

Feasibility studies on proteins
To establish conditions for the OPAL we studied two model protein systems, haem co-factor containing horse heart myoglobin 5, and disulfide bond containing thioredoxin 6, both bearing non-enolisable N-terminal α-oxo aldehydes 34,35 (for aldehyde installation methods see ESI Fig. 1†). Preliminary ligations in phosphate buffer (PB) afforded full conversion to the desired β-hydroxy protein aldehydes within 6 h at neutral pH with 100 mM L-proline organocatalyst 1, using butyraldehyde 7 as an aldol donor (Fig. 2a). Importantly, only a single organocatalyst-dependent aldol modification occurred, confirming the expected stability of the β-hydroxy aldehyde motif to further aldol reactions. 36 Additionally, UV/Vis spectroscopic measurements of the haem group in the modified myoglobin demonstrated that the protein's tertiary structure had not been compromised (ESI Fig. 2†), and trypsin digest and LC-MS/MS analysis of the resulting peptide fragments confirmed the site-selective nature of OPAL (ESI Fig. 3†).
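As a rough illustration of how a single aldol addition can be distinguished from a dehydrated (enal) product by intact-protein mass spectrometry, the sketch below computes the expected average mass shifts for butyraldehyde 7 as donor. It is not taken from the paper: the helper names are hypothetical, the atomic masses are standard average values rounded to three decimals, and the calculation ignores possible hydration of the aldehyde (gem-diol formation), which would add the mass of one water.

```python
# Average atomic masses in Da (standard values, rounded to three decimals).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# Elemental compositions. Butyraldehyde is the aldol donor named in the text.
BUTYRALDEHYDE = {"C": 4, "H": 8, "O": 1}
WATER = {"H": 2, "O": 1}


def formula_mass(formula):
    """Average mass (Da) of a composition given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def aldol_addition_shift(donor):
    """Mass added to the protein aldehyde by aldol addition.

    Aldol addition is atom-economical, so the beta-hydroxy aldehyde
    product gains the full mass of the donor aldehyde.
    """
    return formula_mass(donor)


def aldol_condensation_shift(donor):
    """Mass added if the adduct also loses water to give the enal."""
    return formula_mass(donor) - formula_mass(WATER)


if __name__ == "__main__":
    print(f"addition:     +{aldol_addition_shift(BUTYRALDEHYDE):.3f} Da")
    print(f"condensation: +{aldol_condensation_shift(BUTYRALDEHYDE):.3f} Da")
```

Under these assumptions the two outcomes differ by the mass of one water (about 18 Da), which is the difference a deconvoluted protein mass spectrum would need to resolve.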

Optimisation on peptides and proteins
Encouraged by the biological compatibility and site-selectivity of these preliminary ligations, attention next turned to optimising OPAL by focusing on the choice of organocatalyst and aldehyde donor species. We screened a panel of secondary amines to investigate their ability to catalyse the ligation using the well studied model peptide substrate α-oxo-aldehyde-LYRAG 8 and butyraldehyde 7, and determined second order rate constants for each catalyst at 1, 10, and 25 mM loadings (Fig. 2b; see ESI Table 1† and ESI Fig. 4† for MS/MS analysis, and ESI† page S35 for a discussion on stereochemistry). There was a 60-fold range in the magnitude of the rate constants across the panel, with tetrazole 9 exhibiting the highest value. 37,38 Similar reactivity correlations were also evident using a protein substrate (ESI Fig. 5†). Further peptide screens demonstrated that the nature of the α-carbon substituent of the aldehyde donor also significantly affected the rate of ligation: in the presence of L-proline 1 and tetrazole 9, donors such as phenylacetaldehyde 10, bearing an aryl substituent, reacted ~240-fold faster than aldehyde donors bearing alkyl substituents. Overall, the rate constant of ~24 M⁻¹ s⁻¹ using tetrazole 9 compares favorably with the fastest aldehyde bioconjugations reported in the literature at any pH, 24,39 demonstrating that judicious choice of both organocatalyst and donor species is essential for achieving optimal rates of ligation.
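To give a feel for what a second order rate constant of this size means in practice, the following minimal sketch estimates conversion over time under a pseudo-first-order approximation. This is an illustrative calculation, not the authors' kinetic analysis: the function names are hypothetical, the 1 mM probe concentration in the usage example is an assumed value, and the approximation only holds when the aldehyde probe is in large excess over the protein (it would not be accurate at the 2 equivalents reported for some ligations).

```python
import math


def observed_rate_constant(k2, probe_conc_molar):
    """Pseudo-first-order rate constant k_obs = k2 * [probe], in s^-1.

    k2 is the second order rate constant in M^-1 s^-1 and the probe
    concentration (in M) is assumed to stay constant because the probe
    is in large excess.
    """
    return k2 * probe_conc_molar


def half_life_s(k2, probe_conc_molar):
    """Time (s) for half of the protein aldehyde to be consumed."""
    return math.log(2) / observed_rate_constant(k2, probe_conc_molar)


def fraction_converted(k2, probe_conc_molar, time_s):
    """Fraction of protein aldehyde converted after time_s seconds."""
    return 1.0 - math.exp(-observed_rate_constant(k2, probe_conc_molar) * time_s)


if __name__ == "__main__":
    k2 = 24.0      # M^-1 s^-1, approximate value quoted in the text
    probe = 1e-3   # 1 mM probe: an assumed, illustrative concentration
    print(f"half-life: {half_life_s(k2, probe):.1f} s")
    print(f"conversion after 5 min: {fraction_converted(k2, probe, 300):.4f}")
```

Because k_obs scales linearly with probe concentration in this model, halving the assumed probe concentration doubles the estimated half-life.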

Site-selective functionalisation of proteins
Next we set out to explore the scope of these optimised conditions in the site-selective bioconjugation of a range of proteins using functionalised α-aryl aldehyde probes. Guided by a desire to simplify protein chemical modification procedures, we designed a practical synthetic route to access α-aryl substituted aldehyde donors bearing functional tags, including a fluorescent label 11, a biotin affinity tag 12, a folate targeting moiety 13, and a bioorthogonal azide handle 14 (Fig. 3a). Probes bearing 1,2-amino-alcohols 15 were constructed using solid phase peptide synthesis (SPPS) from readily available building blocks, and unmasked using a biologically compatible periodate oxidation to reveal the desired aldehyde (Fig. 3b). These aldehyde probes were then deployed in the site-selective OPAL modification of a variety of α-oxo aldehyde containing proteins using 25 mM tetrazole 9 at neutral pH (Fig. 3c). Thioredoxin, myoglobin and hydrophilic acylated surface protein A (HASPA) from Leishmania donovani, 33 all bearing α-oxo aldehydes at their N-termini, were modified in quantitative conversion within 1 h using 2-20 equivalents of aldehyde probe, with no modification observed on proteins that did not bear the required aldehyde functionality (see ESI† Section 6 for seven further protein bioconjugation examples). GFP bearing an α-oxo aldehyde in addition to a bioorthogonal strained alkyne (cyclooctyne-lysine) at position 39 was also compatible with the OPAL conditions and was quantitatively modified with both folate (Fig. 3c) and biotin tags (see ESI† page S58). Additionally, OPAL linkages showed no lability after incubation in 25 mM PB pH 7.5 over 72 hours at 37 °C (ESI Fig. 6†).

Functionalisation of proteins at internal residues and in cell lysate
We also demonstrated the compatibility of OPAL in bioconjugations at internal sites within folded green fluorescent protein (GFP) and superfolder GFP (sfGFP) (Fig. 4a, ESI Fig. 7†). α-Oxo aldehydes were first quantitatively installed into the proteins at neutral pH using a biocompatible Pd-mediated decaging (1 equivalent for 1 h) 27 of an incorporated unnatural thiazolidine-lysine (ThzK) amino acid, recently developed in our lab. Both internal aldehydes were then functionalised using OPAL to install azide handles at position 39 of GFP and position 150 of sfGFP in 1 h using aldehyde probe 14, with no observable loss of fluorescence. Furthermore, we demonstrated the compatibility of both the Pd-mediated decaging and OPAL modification in cell lysate (Fig. 4b; see ESI Fig. 8-10† for uncropped gels). Following expression of the ThzK containing GFP in E. coli, the cell lysate was subjected to a 1 h Pd-mediated decaging followed by a further 1 h OPAL using biotin tag 12 at neutral pH. The lysate was then loaded onto an avidin affinity column, which was washed prior to elution with 2 mM biotin to afford the OPAL biotinylated GFP protein (see ESI† page S16 for protein recovery at each step). The ability to selectively pull down only the α-oxo aldehyde containing GFP from the lysate in a 2 h procedure showcases the selectivity and efficiency of OPAL for the modification of an internal aldehyde in folded GFP within a mixture of proteins. The retention of GFP fluorescence (which is quenched at mildly acidic pH (ref. 40)) also highlights the significant advantage of functionalisation at neutral pH.

Optimising bi-functional modification
As previously observed by MacMillan in small molecule cross aldol reactions, 36 the β-hydroxy aldehyde product of the OPAL displayed no reactivity in further aldol reactions. 41 We therefore next examined whether this aldehyde was reactive under alternative conditions which would enable construction of challenging but coveted differentially bi-functionalised proteins (Fig. 5), which have wide-ranging potential applications. 32,42,43 Using peptide β-hydroxy aldehyde-LYRAG 16, we screened the reactivity of the β-hydroxy aldehyde as an electrophilic partner in two high yielding literature bioconjugation reactions, the iso-Pictet-Spengler ligation 44 and the 2-amino benzamidoxime (ABAO) ligation 45 (both with optimal reactivity at acidic pH). Although conversion to bi-functionalised product was observed, yields were disappointingly low (ESI Fig. 11†), emphasising the relative stability of the β-hydroxy aldehyde moiety compared to other aldehydes previously used in bioconjugation studies. 25 We therefore turned our attention to the classical acid-catalysed oxime ligation (pH 4.5 optimum), which proceeds more slowly at neutral pH but can be accelerated by the addition of aniline 17 as an organocatalyst. 46 Gratifyingly, in studies using β-hydroxy aldehyde-LYRAG 16 and an aminooxy nucleophile at pH 4.5 we achieved 61% conversion to bi-functionalised product 18 in the presence of aniline organocatalyst 17. Unexpectedly, however, the conversion to the β-hydroxy oxime product 18 increased further to 95% when the reaction was performed at pH 7.5 (Fig. 5a, aniline Schiff base activation mode), which is a reversal of the precedent for oxime formation with other aldehyde handles. 46 This trend was again evident when screening alternative aniline organocatalysts which have previously been reported for hydrazone/oxime ligation, 47 as well as with alternative peptide and protein β-hydroxy aldehyde substrates (ESI Fig. 12†), suggesting a general trend in reactivity and improved biological compatibility. Akin to observations made for imine formation at ortho-substituted aromatic aldehydes in water, 48 this reversal may be governed by intramolecular H-bonding between the β-hydroxy group and the protonated aldehyde or aniline Schiff base intermediates formed en route to oxime formation, and merits further investigation.

Bi-functional modication of proteins through b-hydroxy oxime ligations
Having established an unexpected reversal in pH dependence for the rate of oxime formation, we subsequently demonstrated the potential utility of the organocatalyst-mediated β-hydroxy oxime ligation in tandem with OPAL for the construction of dual N-terminal functionalised proteins using both a biotin aminooxy affinity handle 19, and a polyethylene glycol (PEG) aminooxy reagent 20 (Fig. 5b). A two-step organocatalyst-mediated tandem modification of thioredoxin 6 was achieved at neutral pH, firstly using OPAL to quantitatively install a fluorescent label into the protein using 25 mM tetrazole 9 and probe 11, followed by tandem oxime ligation under optimised reaction conditions of 10 mM p-anisidine 21 organocatalyst in PB (50 mM, pH 7.5) and biotin affinity handle 19, to afford differentially functionalised thioredoxin 22 (Fig. 5c). Under identical organocatalyst conditions thioredoxin could also be modified with both an azide and a biotin probe, with detection of the bifunctionalised product 24 by Western blot (Fig. 5d, ESI Fig. 14†), as could myoglobin with both azide and biotin (ESI† page S67). Fluorescent myoglobin OPAL product 25 was also further functionalised by PEGylation following treatment with aminooxy probe 20 (Fig. 5e, ESI Fig. 15†), a strategy used for increasing the circulatory lifetime of therapeutic proteins. Additionally, we noted that the β-hydroxy oxime linkages resulting from tandem bioconjugation reactions showed no degradation over 30 days at neutral pH (ESI Fig. 16†).

Towards chemical mimicry of natural dual PTMs of HASPA
Finally, we sought to explore the suitability of the tandem OPAL-oxime ligation for the 'chemical mimicry' of a natural dual post-translational modification (PTM) integral to the pathogenesis of the neglected tropical disease Leishmaniasis. Hydrophilic acylated surface proteins (HASPs) are present in all human infective Leishmania parasites. They are highly immunogenic and form the basis of a visceral leishmaniasis vaccine currently undergoing clinical trials in humans. 49 Their expression is stage regulated during human infection; however, the exact role they play in the parasite lifecycle has yet to be determined. 33 Non-classically, HASPA is dually acylated at the N-terminus (Fig. 6, top) with both myristoyl 27 (at Gly1) and palmitoyl 28 (at Cys4) lipids, PTMs which are thought to govern its ability to associate with plasma membranes 50 but which are not fully rationalised. Whilst co-translational myristoylation of HASPA by the parasite N-myristoyl transferase (NMT) can be recapitulated in vitro, the S-palmitoyltransferase is unknown and the Cys4 containing protein is prone to precipitation, which has limited recombinant access to natural dual lipidated protein for further study. We therefore designed a strategy using our tandem biocompatible ligations to chemically mimic the structural modifications of HASPA for the first time (Fig. 6, bottom), and provide access to a dual lipidated construct. Using myristoyl aldehyde 29 as donor and recombinantly expressed Leishmania donovani HASPA bearing an N-terminal α-oxo aldehyde as a substrate (C4S mutant, see ESI Fig. 17†), we firstly used OPAL to construct a chemical mimic 30 of the natural myristoylated protein at neutral pH in quantitative conversion (ESI-MS characterisation, Fig. 7b). To establish that this bioconjugate structurally mimics myristoylation we used NMR spectroscopy to characterise both the unmodified HASPA 31 and enzymatically myristoylated HASPA 27, lipidated using purified recombinant N-myristoyl transferase. 51 Following resonance assignment of backbone nuclei (ESI Fig. 18†), comparison of the 2D (1H, 15N) HSQC spectra revealed that myristoylation of the native protein caused exchange broadening of resonances for residues near the N-terminus, including Y3, S4, T5 and S8 (Fig. 7a, left panel). Subsequent NMR characterisation of the chemically myristoylated HASPA 30 was then performed, and the HSQC spectrum was found to be highly comparable to that of the enzymatically modified protein 27 (Fig. 7a, right panel), with residues near the N-terminus also displaying the characteristic exchange broadening following modification. These data therefore demonstrate that OPAL modification of HASPA replicates the in vitro solution properties and structure of the enzymatically modified protein. They also highlight the simplicity of the OPAL procedure for use in the chemical mimicry of protein myristoylation, which in this example requires only three affordable commercial reagents: sodium periodate for aldehyde installation (see ESI Fig. 1†), and aldehyde 29 and organocatalyst 9 for the OPAL.
To begin to investigate the potential function of in vivo dual modification of HASPs, we next subjected the OPAL myristoylated HASPA 30 to tandem organocatalyst-mediated β-hydroxy oxime ligation using palmitoyl aminooxy 32. The dual modified product 33, a chemical mimic of myristoylation and palmitoylation, was also characterised by ESI-MS (Fig. 7b). Finally, both the OPAL myristoylated HASPA 30 and this dual modified HASPA 33 were characterised using time course liposome sedimentation assays to assess their capacity to bind to model biological membranes (Fig. 7c). SDS-PAGE analysis revealed that over a 30 h period both lipid modified HASPs displayed substantially greater levels of in vitro association to membranes than the unmodified HASPA 31 (Fig. 7c), and approximately twice as much protein was bound to the membrane when HASPA was dual lipidated in 33 compared to mono myristoylation in 30 (ESI Fig. S22†). Results from this in vitro 'chemical model system' therefore substantiate the notion that the in vivo role of the lipid PTMs is to facilitate attachment of HASPA to parasite membranes and determine its localisation to the cell surface, 50 and emphasise that two lipid modifications may be required to maximise in vivo binding, with the caveat that further in-depth functional comparison of single and dual lipidated HASPs in cellulo is still warranted.

Conclusion
In conclusion, we have validated the OPAL as a powerful standalone C-C forming bioconjugation strategy using simple aldehyde probes for the mild site-selective modification of a range of proteins at both internal and N-terminal sites, demonstrating flexibility in both positioning and functionalisation. We also demonstrated the compatibility and selectivity of both Pd-mediated decaging of internal α-oxo aldehydes and subsequent OPAL modification in a mixture of proteins within cell lysate to enable affinity pull-down of GFP, which maintains its fluorescence as a consequence of purification at neutral pH. We anticipate the potential applicability and simplicity of this strategy may also serve to nucleate further organocatalyst-mediated protein modification studies.
In contrast to other protein bioconjugations, OPAL not only 'survives' but 'thrives' in biocompatible conditions. This is because at aqueous neutral pH the β-hydroxy aldehyde product of the OPAL predominates over the enal (aldol condensation) product. The β-hydroxy group of this newly installed aldehyde prevents further aldol reactions, facilitating a single site-selective modification, and as an additional benefit serves as a "pH switch" for a novel tandem β-hydroxy oxime ligation which is unexpectedly accelerated at neutral pH over acidic pH. Oxime ligations are perhaps the most widely used protein bioconjugation reaction, 47 but are limited in that they are most efficient at acidic pH even when using aniline organocatalysts. The presence of the β-hydroxy group seemingly alters this acidic bias and constitutes a new aldehyde scaffold for bioconjugations, which has an in-built preference to react at neutral pH and is stable at that pH.
Finally, we utilised OPAL in the chemical mimicry of N-terminal myristoylation of a HASP protein integral to the pathogenesis of Leishmaniasis and demonstrated the ability to mimic the structural effects of enzymatic myristoylation through characterisation by protein NMR. Subsequent chemical palmitoylation by a tandem β-hydroxy oxime ligation generated a construct that also mimics the previously inaccessible natural dual modified protein, and was furthermore used to explore the effects of single vs. dual lipid modification on binding to membranes. Notably, this strategy may lend itself to the in vitro study of other proteins bearing multiple post-translational lipid modifications 52 and may also prove broadly applicable in the functionalisation and immobilisation of other biomolecules.