Selective chemical labeling of proteins

Over the years, there have been remarkable efforts in the development of selective protein labeling strategies. In this review, we deliver a comprehensive overview of the currently available bioorthogonal and chemoselective reactions. The ability to introduce bioorthogonal handles to proteins is essential to carry out bioorthogonal reactions for protein labeling in living systems. We therefore summarize the techniques that allow for site-specific “installation” of bioorthogonal handles into proteins. We also highlight the biological applications that have been achieved by selective chemical labeling of proteins.


Introduction
Since proteins carry many copies of different functional groups, it is difficult to label a protein selectively at a defined site. Additionally, the labeling conditions have to be mild, and the reactions must proceed in aqueous solution. Moreover, the labeling reagents should be orthogonal to other functionalities in living systems when a protein is to be labeled in vivo.11,12 These restrictions make it challenging to label proteins in a chemoselective and site-specific manner.
Recent years have seen tremendous progress in chemical protein labeling.12-18 In this review, we compare the features of reported bioorthogonal reactions and chemical tagging approaches, e.g. reaction rate or time, reaction conditions and reagents. This comparison should help to identify the most suitable labeling reaction for a particular application. Moreover, we highlight the applications of established chemical labeling techniques to tackle biological problems.

The toolbox of bioorthogonal chemistry for protein labeling
Bioorthogonal reactions are robust and invaluable tools for chemical protein labeling.4,6 Typically, selective protein labeling is accomplished by incorporation of bioorthogonal groups into a protein, followed by chemoselective modification.18 This approach is also designated as "tag-and-modify".19 A variety of bioorthogonal reactions have been developed, which can be classified into: (1) condensation reactions through carbonyls, (2) "click" reactions through azides, (3) inverse electron-demand Diels-Alder cycloadditions (DAINV) and other cycloaddition reactions, (4) transition metal-catalyzed coupling and decaging reactions, and (5) labeling reactions at cysteine residues (Table 1). In parallel, many elegant approaches have been established to selectively equip proteins with bioorthogonal handles (discussed in later sections), fulfilling the requirements for the subsequent bioorthogonal modification.

Condensation through carbonyls
Ketones and aldehydes can react with hydroxylamine and hydrazide compounds under aqueous conditions to form stable oxime and hydrazone linkages, respectively. Oxime ligation 76 is slow at neutral pH and therefore requires an aniline catalyst.20 Recently, an improved catalyst, p-phenylenediamine (p-PDA), was reported, which displays a higher water solubility and a better catalytic efficacy (10-120 times faster) than aniline.21 With respect to the condensation using hydrazides (hydrazone ligation), electronic and acid/base effects strongly influence the reaction efficiency at pH 7.4. For instance, carbonyl compounds with neighboring acid/base groups (e.g. carboxylate) form hydrazones at accelerated rates of 2-20 M⁻¹ s⁻¹.77 In addition, in the presence of a 5-methoxyanthranilic acid (5MA) catalyst, the condensation rate is enhanced 84-fold (6.6 M⁻¹ s⁻¹, 1 mM 5MA) compared to the reaction without a catalyst (0.08 M⁻¹ s⁻¹) at pH 6.5 (Table 1A).22
Aside from oxime and hydrazone ligation, aldehydes and ketones can undergo a Pictet-Spengler reaction with β-arylethylamines,78 which has been used for protein labeling at the N-terminus.23 To date, several modified versions have been introduced, including Pictet-Spengler ligation 24 and hydrazino-Pictet-Spengler ligation.25 Additionally, proteins engineered with an N-terminal aldehyde tag can be labeled via the Mukaiyama aldol condensation using silyl ketene reagents with the formation of a stable C-C bond.28 KAHA (α-ketoacid-hydroxylamine) ligation allows the condensation between an α-ketoacid and a hydroxylamine or 5-oxaproline to form a native amide bond. This ligation has been a valuable alternative to native chemical ligation (NCL) 73 for joining two unprotected peptide fragments in peptide synthesis.29,30 Owing to the rapid association between aldehydes and amines, an aldehyde has been elegantly employed as an amine-capture auxiliary in aldehyde capture ligation (ACL). In ACL, a C-terminal selenobenzaldehyde ester interacts with the N-terminus of a peptide/protein. A native amide bond is formed following a Se→N acyl shift. ACL has been used for site-specific N-terminal modification of ubiquitin (Table 1A).31
A recently introduced reaction is ABAO (2-aminobenzamidoxime) ligation. ABAO combines an aniline moiety for iminium-based activation of the aldehyde with a nucleophilic group at the ortho-position to the amine for intramolecular ring closure. In addition to the rapid condensation kinetics (up to 40 M⁻¹ s⁻¹), the condensation forms a fluorescent dihydroquinazoline derivative, making it possible to develop fluorogenic aldehyde-reactive probes.26 Alkyl aldehydes can also efficiently couple with aryl diamines under mild conditions (RT, neutral aqueous solution) in the presence of Cu(II) or Zn(II) ions via an oxidative condensation process.27 This reaction forms stable benzimidazole linkages and has been utilized to label the T4 lysozyme protein with an aldehyde dye (Table 1A).
Carbonyl condensations have not been widely used for labeling biomolecules in live cells or organisms. This is because the catalysts are usually toxic, and endogenous ketones and aldehydes, e.g. glucose and pyruvate, would interfere with the labeling reaction. Nevertheless, ketones and aldehydes are generally not present on the cell surface. Therefore, carbonyls serve as useful chemical handles for labeling biomolecules on the cell surface using hydrazide or aminooxy probes.

"Click" through azides
Azide is a small and stable group, which has a unique dipole for a variety of bioorthogonal reactions.84 Click reactions using azides include Staudinger ligation,32 traceless Staudinger ligation,33,34 Staudinger-phosphite ligation,35 copper-catalyzed azide-alkyne cycloaddition (CuAAC),36 strain-promoted azide-alkyne cycloaddition (SPAAC) and oxanorbornadiene cycloaddition (Table 1B).38 Among these reactions, CuAAC and SPAAC appear to be the most popular and are discussed in detail in this section.4
CuAAC is a hallmark of bioorthogonal chemistry that was reported independently by Sharpless 36 and Meldal 85 in 2002. Its application as a bioorthogonal reaction revolutionized our ability to modify and manipulate proteins.86,87 CuAAC became popular mainly for the following reasons: (1) the azide and alkyne groups are highly specific toward each other but remain inert to other chemically active molecules in living systems; (2) the reaction regioselectively produces a 1,4-disubstituted triazole, which is stable and inert; (3) CuAAC exhibits fast reaction kinetics (∼3 M⁻¹ s⁻¹ in the presence of 50 μM Cu(I) and 50 μM TBTA),37 and various ligands have been developed to stabilize Cu(I) and further increase the reaction speed.
Cu ions catalyze the production of reactive oxygen species, leading to cytotoxicity.88 This limits the application of CuAAC in living systems. With an appropriate ligand, CuAAC can be made biocompatible, with minimal cytotoxicity and an increased reaction rate.89 A panel of these ligands is summarized in Scheme 1A. Copper-chelating azides bring the Cu(I) ion into close proximity and thereby significantly increase the reaction rate (Scheme 1B).90 The ligand BTTP (3-[4-({bis[(1-tert-butyl-1H-1,2,3-triazol-4-yl)methyl]amino}methyl)-1H-1,2,3-triazol-1-yl]propanol) stabilizes the Cu(I) ion while speeding up the click reaction (Scheme 1A). In addition, its complex with Cu(I) is cell permeable and non-cytotoxic, facilitating CuAAC in live E. coli cells.91 Using this reaction, an environment-sensitive fluorogenic fluorophore (alky-4DMN) was site-specifically introduced into HdeA in both the periplasm and cytoplasm of E. coli. HdeA, an acid-stress chaperone that undergoes pH-dependent conformational changes, was genetically encoded with an azide-carrying unnatural amino acid, ACPK, at residue 58 within its pH-responsive region. The resulting hybrid pH indicator enables compartment-specific pH measurement to determine the pH gradient across the E. coli cytoplasmic membrane (Scheme 1C).91
Fluorogenic azide probes display substantial fluorescence enhancement upon cycloaddition and therefore confer labeling with minimal background (Fig. 1A). Based on the photo-induced electron transfer (PET) mechanism, highly fluorogenic blue-emissive azidomethyl-substituted anthracene (A) and its analogues have been generated.92 Substitution at the 3 or 7 position of coumarin has a strong impact on its fluorescence properties. Guided by this principle, 3-azido-substituted coumarins (B) show an 80-fold increase in fluorescence intensity upon the cycloaddition reaction.93 The quantum yield (Φ, QY) of a probe after a click reaction can be calculated by density functional theory (DFT), allowing rational design of fluorogenic azide probes.94 Using this approach, the green-emitting azido-fluorescein (C) has been designed, which exhibits a fluorescence enhancement of 29- to 34-fold upon cycloaddition with different alkynes.95 Fluorogenic green- to far-red-emitting CalFluors enable sensitive detection of biomolecules under no-wash conditions (D, E, F).96
An important application of CuAAC lies in the target identification of biologically active small molecules in biomedical research and drug discovery.97 A bioactive compound is typically derivatized with a photo-crosslinking moiety (e.g. diazirine) and a terminal alkyne to give an affinity-based probe (AfBP) (Fig. 1B). Upon photo-crosslinking, the AfBP captures its protein target in situ and forms a stable protein-ligand complex in the cell. The target proteins are subsequently labeled using azide probes, separated by gel electrophoresis and identified by mass spectrometry.97 Among many bioorthogonal groups, a terminal alkyne is a well-suited tag as it is small, interferes minimally with protein-ligand interactions, is chemically inert, and can be easily modified via CuAAC.
Although the cytotoxicity of Cu(I) can be reduced by using ligands, copper-free click chemistry is more straightforward. Strain-promoted azide-alkyne cycloaddition (SPAAC) and other copper-free click reactions 38 that do not require a metal catalyst have been developed (Table 2A). The first SPAAC reaction for protein labeling was reported by Bertozzi and coworkers in 2004 using a strained cyclooctyne.39 However, the reaction is slow (k = 2.4 × 10⁻³ M⁻¹ s⁻¹), comparable to Staudinger ligation (k = 3 × 10⁻³ M⁻¹ s⁻¹). Since then, new variants of strained alkynes have been developed with improved properties, such as enhanced specificity, reduced lipophilicity 98 and increased reaction rates (Table 2B).
There are several ways to improve strained alkynes.99 The first strategy is to modulate the electronic properties by introducing electron-withdrawing (EW) groups, e.g. fluorine, near the triple bond.100 Examples include MOFO 101 and DIFO.40 The second approach is to fuse the cyclooctynes with rigid aromatic rings, which enhances reactivity by increasing ring strain. The ring-fused cyclooctynes, including DIBO,41,102 DIBAC,103 COMBO 104 and BARAC,42 show a 25-400-fold increase in the reaction rate.105 However, the synthesis of these cyclooctynes is usually laborious. Bicyclononyne (BCN), a cyclopropane-fused cyclooctyne, shows relatively fast reaction kinetics (0.3-1 M⁻¹ s⁻¹) toward azides and can be prepared in only three steps.43 Using an electron-deficient azide, e.g. 4-azido-1-methylpyridin-1-ium iodide, the cycloaddition reaction rate with BCN can be further increased to 2-2.9 M⁻¹ s⁻¹.44 By introducing both EW groups and ring strain, difluorobenzocyclooctyne (DIFBO) shows only a moderate increase in reactivity (0.22 M⁻¹ s⁻¹) with a significant reduction in stability.106 The third strategy is to reduce the ring size, as exemplified by the 7-membered tetramethylthiacycloheptyne (TMTH),107 cyclohexyne and cyclopentyne.108 However, these compounds have poor stability (Table 2B). The copper-free click reactions have been used for antibody-free western blot analysis,109 visualization of glycosylation on cell surfaces 110 and protein labeling inside live cells.111
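To put the second-order rate constants quoted in this section into perspective, the following minimal Python sketch converts them into labeling half-lives under pseudo-first-order conditions, i.e. with the probe in large excess over the tagged protein, where t1/2 = ln 2/(k[probe]). The probe concentration of 100 µM, the dictionary labels and the function name `labeling_half_life` are illustrative assumptions and are not taken from the cited studies.

```python
import math

# Second-order rate constants (M^-1 s^-1) as quoted in the text above.
RATE_CONSTANTS = {
    "Staudinger ligation": 3e-3,
    "SPAAC (first cyclooctyne)": 2.4e-3,
    "SPAAC (BCN, lower bound)": 0.3,
    "SPAAC (BCN, electron-deficient azide)": 2.9,
    "CuAAC (50 uM Cu(I)/TBTA)": 3.0,
}


def labeling_half_life(k2, probe_conc):
    """Half-life (s) of the tagged protein when the probe is in large excess.

    k2: second-order rate constant in M^-1 s^-1
    probe_conc: probe concentration in M, assumed to stay constant
    """
    if k2 <= 0 or probe_conc <= 0:
        raise ValueError("k2 and probe_conc must be positive")
    return math.log(2) / (k2 * probe_conc)


if __name__ == "__main__":
    probe = 100e-6  # assumed probe concentration: 100 uM
    for name, k2 in RATE_CONSTANTS.items():
        hours = labeling_half_life(k2, probe) / 3600
        print(f"{name:40s} t1/2 = {hours:8.2f} h")
```

Under these assumed conditions, the first-generation cyclooctyne and the Staudinger ligation need weeks to reach half conversion, whereas BCN-based SPAAC and ligand-assisted CuAAC need hours or less.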

Inverse electron-demand Diels-Alder cycloaddition (DAINV) and other cycloadditions
The inverse electron-demand Diels-Alder cycloaddition (DAINV) occurs between an electron-rich dienophile (e.g. a strained alkene or alkyne) and an electron-poor diene, typically a 1,2,4,5-tetrazine (Table 3A). DAINV is the most rapid bioorthogonal reaction reported so far (up to 8.6 × 10⁵ M⁻¹ s⁻¹).45,112 In contrast, the reaction rate of SPAAC is only up to 1 M⁻¹ s⁻¹. Theoretical calculations suggest that the rapid reaction rate can be attributed to the much higher interaction energy between tetrazine and trans-cyclooctene (TCO) in comparison to that between an azide and a strained alkyne.113
Tetrazines are so reactive that they also readily react with strained alkynes, isonitriles 52 and even terminal alkenes. Among these reactions, those between tetrazine and BCN or cyclopropene have proven to be the most useful. Tetrazine reacts rapidly with BCN, with a rate constant of 3000 M⁻¹ s⁻¹.46 In contrast to TCO, which requires complicated synthetic procedures, BCN is readily available through organic synthesis. Compared to BCN and TCO groups, cyclopropene (Cpp) is much smaller, and therefore has been employed as a "minimalist" tag for live-cell imaging and affinity-based protein labeling.114
Alkenes and alkynes display different reactivity toward tetrazine (Table 3B).116 sTCO features a cyclopropane ring, which brings additional strain and increases the reaction rate of the cycloaddition up to 160-fold in comparison to TCO.116 The carbamate bond near the trans-double bond in TCO* reduces the chance of nucleophilic attack. Thus, TCO* shows better stability than TCO with only a 2-fold reduction in reactivity toward tetrazine.117 The reaction of BCN with tetrazine is about 10 times slower than that of TCO.46 Norbornene is ca. 10 000 times less reactive than TCO, as it is bulkier and exhibits less ring strain.46,47 Other alkenes, including Cpp,48,49,114 acyl-azetine 50 and terminal alkenes,51 are less reactive, and therefore can react only with highly reactive tetrazines.
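The practical consequence of these rate differences becomes clear when both reaction partners are present at low, equal concentrations, as is often the case for labeling in cells. The sketch below uses the integrated second-order rate law for equal initial concentrations, for which the time to reach a fractional conversion f is t = f/((1 - f) k c0). The 1 µM concentration and the function name `time_to_conversion` are illustrative assumptions; the rate constants are the values quoted above.

```python
def time_to_conversion(k2, conc, fraction=0.5):
    """Time (s) to reach a given conversion for A + B -> P with [A]0 = [B]0.

    k2: second-order rate constant in M^-1 s^-1
    conc: initial concentration of each reactant in M
    fraction: target fractional conversion, strictly between 0 and 1
    """
    if k2 <= 0 or conc <= 0:
        raise ValueError("k2 and conc must be positive")
    if not 0 < fraction < 1:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return fraction / ((1 - fraction) * k2 * conc)


if __name__ == "__main__":
    conc = 1e-6  # assumed concentration of each partner: 1 uM
    reactions = {
        "DAINV (fastest reported)": 8.6e5,
        "Tetrazine + BCN": 3000.0,
        "SPAAC (upper range)": 1.0,
    }
    for name, k2 in reactions.items():
        t_half = time_to_conversion(k2, conc)
        print(f"{name:28s} time to 50% conversion = {t_half:12.1f} s")
```

At the assumed 1 µM, the fastest DAINV reaction reaches half conversion in about a second, the tetrazine-BCN reaction in minutes, and a reaction with k = 1 M⁻¹ s⁻¹ in more than a week.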
Different tetrazines show varied reactivity toward strained alkenes (Table 3C). Studies have been conducted to evaluate the reactivity and stability of tetrazines.115 The introduction of EW substituents can substantially increase the reaction rate. However, this is a double-edged sword: an increase in the reactivity of a tetrazine often reduces its stability and lifetime in serum. Electron-donating (ED) substituents, on the other hand, decrease the reactivity of tetrazines.115 Besides electronic effects, steric effects also play a crucial role. In general, there is a trade-off between reactivity and stability. The rate constants of tetrazine reactions with TCO as well as their stability are summarized in Table 3C.
Green- and red-emitting fluorophores display electronic interactions with tetrazine chromophores, which have absorption maxima at 500-525 nm. As a consequence, tetrazine-conjugated fluorophores often show reduced fluorescence. After a cycloaddition reaction, the tetrazine is deconjugated and loses its quenching capability. Hence, many tetrazine-fluorophore conjugates are fluorogenic, including those of BODIPY-FL, Oregon Green 488, BODIPY-TMR, VT680,118 TAMRA and fluorescein.119 These probes show a moderate turn-on ratio (up to 20-fold) (Fig. 2A). By adapting through-bond energy transfer (TBET) for fluorescence quenching, Weissleder and coworkers developed green-emitting BODIPY-tetrazine probes with up to 1600-fold turn-on ratio 120 and blue-emitting coumarin-tetrazine probes with up to 11 000-fold fluorescence enhancement.121 The tetrazine moiety is attached at the meta-position to the fluorophore on a rigid phenyl ring (Fig. 2A). Under these conditions, the tetrazine group is perpendicular to the fluorophore moiety, leading to the collinear alignment of two dipoles.121
Table 3 Reactivity of strained alkenes and tetrazines. a Protein labeling reaction. b Rate constants for the reaction with BCN, which are roughly 10-15 times smaller than those with TCO.46 c Reaction with sTCO; stability (%) refers to the intact tetrazine in fetal bovine serum (FBS) at 37 °C after 10 h.115
To achieve dual- or multi-labeling of proteins, mutually orthogonal reactions are desirable.122 By fine-tuning the DAINV reactions, "selectivity enhanced" DAINV reactions have been applied to sequential dual-color labeling of insulin receptors (IRs) and virus-like particles (VLPs) on the surface of HEK293T cells, facilitating dual-color super-resolution microscopy (Fig. 2B).117
In addition to CuAAC, SPAAC and DAINV, which are widely used for protein labeling, other dipolar cycloaddition reactions are summarized in Table 1D, including strain-promoted alkyne-nitrone cycloaddition (SPANC),53 diazo-strained alkyne cycloaddition,54,55 nitrile oxide-norbornene cycloaddition,58 quadricyclane ligation,57 TQ-ligation,59,60 strain-promoted sydnone-BCN cycloaddition,56 and a number of photo-triggered cycloaddition reactions, such as the tetrazole-alkene photo-click reaction 61,62 and azirine ligation.63

Transition metal-catalyzed couplings and decaging reactions
Biocompatible transition metal-catalyzed couplings include the Suzuki-Miyaura coupling,64 Sonogashira coupling and olefin metathesis. These reactions can be performed under mild, aqueous conditions, although they require a transition metal catalyst (Table 1E).15 Suzuki-Miyaura coupling requires an iodophenyl group, a water-soluble palladium(0) catalyst and an appropriate ligand. Recently, an improved ligand, 1,1-dimethylguanidine, has been employed for aqueous Suzuki-Miyaura coupling.123 Suzuki-Miyaura coupling has been successfully used for protein glycosylation and labeling at the cell surface.124 Sonogashira coupling is another Pd-catalyzed coupling reaction.65 An alkyne group is incorporated into the protein and subsequently reacts with an iodophenyl probe. Sonogashira coupling has been used for the modification of ubiquitin in live cells.11,15,125
Terminal olefins can undergo an oxidative Heck reaction with boronic acids in the presence of a Pd(OAc)2/BIAN catalyst. This reaction has been used for the site-specific labeling of 4-oxalocrotonate tautomerase (4-OT).67 Olefin metathesis is the redistribution of alkene fragments by scission and regeneration of carbon-carbon double bonds.117,118 Hence, in order to avoid undesired cross-coupling products, a terminal olefin and a more reactive thiovinyl ether are required.126 Water-soluble ruthenium catalysts have been developed to mediate the reaction in aqueous solution.126,127
Chemical protein labeling not only makes proteins visible but also renders them controllable. Recently, palladium catalysts have been used to manipulate protein function in cells. Pd-mediated cleavage of the propargyl carbamate group generates a free lysine residue. The protected lysine analogue can be genetically and site-specifically incorporated into a protein using unnatural amino acid (UAA) mutagenesis (discussed in a later section). This strategy enables protein activation in living cells by decaging a lysine residue located at the active site of a protein, and has been utilized to elucidate the virulence mechanism of a bacterial type III effector protein in its host cells (Fig. 3A).68 A ruthenium-catalyzed reaction has been used to cleave an allyl carbamate and thereby unmask caged rhodamine 110 (R110) inside living cells (Fig. 3B).128

Selective labeling at cysteine residues
Cysteine is distinguished by its remarkable nucleophilicity among the 20 common amino acid residues. Traditionally, proteins are site-specifically labeled at their solvent-accessible cysteine residues by thiol-reactive alkylation reagents, e.g. maleimides and iodoacetamides.69,129 However, excess maleimide-based reagents also modify the side chains of histidine and lysine as well as α-amino groups. Alternatively, cysteine residues can be converted to dehydroalanine (Dha), which can then be rapidly and specifically labeled by various thiol reagents.71 This approach has been used for the preparation of proteins bearing post-translational modifications, including phosphorylation, glycosylation, methylation, acetylation and lipidation (Table 1F).130,131
Adjacent cysteines are usually oxidized to form disulfide bonds under non-reducing conditions. In these cases, the solvent-accessible disulfide bond can first be gently reduced and subsequently "intercalated" by mono-sulfone reagents. This approach permits the site-specific PEGylation of a variety of therapeutic proteins, including human interferon α-2b and antibody fragments.74 A more recent approach employed 1,3-dichloroacetone (DCA) to introduce a reactive ketone tag, enabling subsequent oxime ligation (Table 1F).75
N-terminal cysteine displays unique reactivity. Proteins carrying an N-terminal cysteine can undergo native chemical ligation with thioester probes and chemoselective ligation with aldehydes to form thiazolidines.132 N-terminal cysteine can also react specifically with cyanobenzothiazole (CBT) derivatives at a fast reaction rate (9 M⁻¹ s⁻¹).72 The reaction of CBT compounds with D-cysteine is highly biocompatible and has been used for bioluminescent imaging of protease activity in live mice.133
Palladium-tolyl complexes using 2-dicyclohexylphosphino-2′,6′-diisopropoxybiphenyl (RuPhos) as the ligand have been developed to mediate efficient and highly selective cysteine conjugation reactions under biocompatible conditions.70 At pH 7.5, the rate of the palladium-mediated reaction is comparable to that of the maleimide reaction. This bioconjugation strategy has demonstrated broad utility for making stapled peptides, site-specific labeling of proteins with a coumarin fluorophore, and the preparation of antibody-drug conjugates (ADCs) (Scheme 2).70

Chemoselective labeling of native proteins
Chemoselective labeling of native proteins is useful for the preparation of protein conjugates such as ADCs 134 and the labeling of endogenous proteins in living cells.135 In addition to selective labeling at cysteine residues as discussed in the previous section, a few other approaches are available (Scheme 3).

N-terminal labeling
The N-terminus of a protein shows unique reactivity and can be site-specifically labeled or converted to a bioorthogonal handle. The N-terminal α-amino group has a lower pKa value (ca. 8) than the ε-amino group of lysine (ca. 10). Transamination methods have been used to convert the N-terminal α-amino group to an aldehyde or a ketone using glyoxylate in the presence of a divalent metal ion and a base 136 or using pyridoxal-5-phosphate (PLP).137 N-terminal glycine shows the highest reactivity in the transamination reaction.137 A ketene compound has been reported to react selectively with an N-terminal α-amino group at pH 6.3.138 Another approach is the oxidation of an N-terminal serine or threonine to an aldehyde or ketone by sodium m-periodate at neutral pH.139 Afterwards, the N-terminal aldehyde or ketone tag is ready for oxime ligation, Mukaiyama aldol condensation, the Pictet-Spengler reaction, etc., as discussed in previous sections (Scheme 3A).
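The pKa difference mentioned above can be turned into a rough estimate of selectivity with the Henderson-Hasselbalch relation, which gives the fraction of an amine present in its neutral, nucleophilic form as 1/(1 + 10^(pKa - pH)). The following Python sketch applies it to the approximate pKa values from the text at pH 6.3. It is only an illustration: it assumes that reactivity scales with the unprotonated fraction and ignores differences in intrinsic nucleophilicity and local protein environment; the function name is hypothetical.

```python
def fraction_unprotonated(pka, ph):
    """Fraction of an amine in its neutral (deprotonated) form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    ph = 6.3             # pH reported for the ketene labeling in the text
    pka_n_terminus = 8.0  # approximate value quoted in the text
    pka_lysine = 10.0     # approximate value quoted in the text

    f_nterm = fraction_unprotonated(pka_n_terminus, ph)
    f_lys = fraction_unprotonated(pka_lysine, ph)
    print(f"N-terminal amine, neutral fraction: {f_nterm:.4f}")
    print(f"Lysine side chain, neutral fraction: {f_lys:.6f}")
    print(f"Ratio (N-terminus / lysine): {f_nterm / f_lys:.0f}")
```

With these approximate pKa values, the neutral fraction of the N-terminal amine at pH 6.3 is roughly two orders of magnitude larger than that of a lysine side chain, which is consistent with the preference for N-terminal modification under mildly acidic conditions.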

Kinetically-controlled protein labeling (KPL)
Under normal bioconjugation conditions, it is usually difficult to achieve site-specific labeling at lysine side chains using regular amine-reactive reagents, such as N-hydroxysuccinimidyl (NHS) esters. Nonetheless, lysine residues on a protein's surface display subtle differences in their individual reactivity. These differences may originate from the different solvent accessibility of the lysines, from their interactions with neighboring residues, or from a combination of both.140 Through kinetically-controlled protein labeling (KPL), proteins can be mono-modified at a specific lysine side chain.140 This approach enables the site-selective introduction of a terminal alkyne or azide into native proteins for a click reaction 140 or Staudinger ligation (Scheme 3B).141 KPL has been utilized for site-specific biotinylation of pharmaceutically relevant proteins for delivery into macrophages 142 and dual labeling of the peptide hormone somatostatin for the visualization of targeted drug delivery in tumor cells.143

Catalytic affinity labeling
Catalytic modules tethered to a ligand can be targeted to the binding pocket of a protein to locally catalyze the labeling reaction on the protein. For instance, a biotin-conjugated dimethylglycine (DMG) group is brought into proximity of the avidin binding pocket, where it specifically catalyzes the acylation of amine probes with Asp108.144 In the so-called modular affinity labeling (MoAL) approach, three modules are required: (1) the catalytic ligand module (biotin-DMG), (2) the labeling module (amine probe) and (3) the reactive module (CDMT). Another catalytic affinity labeling method is termed affinity-guided DMAP (4-dimethylaminopyridine) chemistry. DMAP is an effective acyl transfer catalyst, which can activate an acyl ester for its transfer to a nucleophilic residue. Therefore, a ligand tethered to DMAP allows specific labeling of native proteins by acyl ester probes (Scheme 3C).145

Affinity labeling of endogenous proteins
In ligand-directed tosyl (LDT) chemistry, ligand-tethered tosyl ester probes are used to label endogenous proteins in cells. Upon labeling, the ligand is cleaved off at the tosyl ester linkage. Therefore, the protein remains active after labeling because the active center is no longer occupied by its ligand.146 The concept was employed to construct a turn-on probe based on the release of a quencher upon labeling.147 LDT chemistry has been applied to the labeling of cell surface receptors 148 and native FKBP12 in live cells.149 However, LDT chemistry is slow and typically requires over 12 hours of incubation time. Recently, a faster affinity labeling approach, known as ligand-directed acyl imidazole (LDAI) chemistry, was used to selectively modify the endogenous folate receptor at the cell surface (Scheme 3D). The experiments showed that LDAI labeling is 12-fold more efficient than LDT labeling.150
Newly emerging affinity labeling reagents include chemical probes bearing an O-nitrobenzoxadiazole (O-NBD) unit. Upon the ligand-directed reaction with a lysine side chain, the non-fluorescent O-NBD is converted to fluorescent N-NBD. Translocator protein (TSPO) ligands carrying an O-NBD unit enabled proteomic identification of a partner protein of TSPO, a voltage-dependent anion channel (VDAC). The affinity labeling reaction using O-NBD is quite efficient, with yields of 41% and 76% after 1 h and 12 h, respectively.151

Chemical labeling of proteins in living cells
Proteins function in signaling pathways and interaction networks in the complex environment of cells and organisms. Fluorescent proteins (FPs), fused to individual proteins, have revolutionized our ability to visualize and investigate protein function directly in living cells and organisms. Chemical probes, including organic dyes, offer properties that are not readily achievable with FPs. For instance, many organic dyes are superior to FPs in terms of brightness, photostability, far-red emission, environmental sensitivity, suitability for pulse-chase labeling and the flexibility with which their spectral and biochemical properties can be modified. In the chemical tagging approach, a protein of interest (POI) is fused with a polypeptide tag, which is subsequently labeled with chemical probes. In general, these tags can be classified into the following categories: (1) metal chelation based peptide tags; (2) self-labeling peptide tags; (3) ligand binding domains (LBDs); (4) self-labeling enzymatic domains; (5) peptide sequences for enzymatic modifications; and (6) genetically encoded unnatural amino acids (UAAs) as "minimalist" tags (Table 4).

Metal chelation based peptide tags
Metal chelation methods have been adopted for affinity chromatography in protein purification. The principle has also been utilized for protein labeling, either by non-covalent complex formation or by chelation-driven affinity conjugation.152,185 Examples include the poly-histidine tag (His-tag, HHHHHH) and the tetra-aspartate tag [D4 tag, (DDDD)n, n = 1-3], which can be labeled using Ni-NTA probes and zinc complexes, respectively.152,154,155,186-189 Specific intracellular labeling of His-tagged proteins was achieved using cell-penetrating multivalent N-nitrilotriacetic acid (NTA) carrier complexes.153 The advantages of metal chelation based labeling are the small size of the tag, which minimally disturbs protein function, the high labeling efficiency and selectivity, and the accessibility of various functional probes (Table 4A).190

Self-labeling peptide tags
Tsien and coworkers reported the biarsenical FlAsH (fluorescein arsenical hairpin) as the first chemical surrogate to FPs for labeling proteins in live cells.156 A red version of FlAsH, a resorufin-based biarsenical (ReAsH), was developed later.193 However, this labeling technology suffers in practice from non-specific labeling of thiol-rich biomolecules in the cell and the toxicity of the biarsenical ligands.194 Nevertheless, the tetracysteine motif (CCXXCC) is much smaller than FPs. Other self-labeling peptide tags include the bisboronic RhoBo-tetraserine tagging system,158 hydrazide-reactive (HyRe) tag,159 SpyTag 160 and E3 tag 161,162 (Table 4B).
Table 4 Representative chemical tags for the labeling of proteins in cells
Prominent applications of self-labeling peptide tags involve the visualization of newly synthesized proteins and the tracking of protein trafficking in live cells. This is readily achieved via the "pulse-chase" technique. The old population of a protein is pulse-labeled with green-emitting FlAsH, while newly synthesized proteins are chased with red-emitting ReAsH. Consequently, old and new copies of an individual protein are labeled in two colors. In one example, this approach was used to elucidate the mechanism of connexin assembly and turnover in HeLa cells.191 In another example, the approach was employed to study AMPA receptor (AMPAR) trafficking, the regulation of which is important for neural plasticity. GluR1 and GluR2 are two AMPAR subunits that play a key role in the activity-dependent trafficking of AMPARs during long-term potentiation (LTP) and depression (LTD). In order to examine the trafficking and synthesis of GluR1 and GluR2, a tetracysteine motif (EAAAR-EACCRECCARA) was attached at their C-termini. ReAsH-EDT2 was applied first and, after 6-8 h, FlAsH-EDT2 was applied to cells expressing tetracysteine-tagged GluR1 or GluR2 (Scheme 4B). In this case, the red ReAsH-EDT2 labels all preexisting GluR1/2 subunits, while the green FlAsH-EDT2 labels those AMPAR subunits synthesized during the 6-8 h chase period. The measurements suggested that both GluR1 and GluR2 are synthesized in dendrites and that an activity blockade enhances the dendritic synthesis of GluR1 but not GluR2.195

Ligand binding domains (LBDs)
The specific interaction between a ligand binding domain (LBD) and its small-molecule ligand confers specificity of labeling in cells. SLF', a derivative of the synthetic ligand of FKBP12 (FK506 binding protein), binds to FKBP' (FKBP12_F36V mutant) with more than 1000-fold selectivity over the wild-type FKBP12 protein. 163 Based on this specific binding, proteins fused with FKBP' have been selectively labeled by SLF'-conjugated probes in live cells. 164 The antibiotic trimethoprim (TMP) is a specific inhibitor of E. coli dihydrofolate reductase (eDHFR). TMP binds to eDHFR with nanomolar affinity, which is over 1000-fold higher than its affinity for mammalian DHFR. As a result, the off-target labeling by TMP probes is minimal under certain conditions. 165
Because the binding is non-covalent, labeling via the FKBP' or eDHFR tag is reversible. In order to achieve stable labeling, the affinity conjugation approach has been introduced. A cysteine mutation is introduced in the proximity of the TMP binding site on eDHFR, and this cysteine can be specifically labeled by TMP-acryloyl probes owing to a proximity-induced effect. 196 The reaction of the mildly reactive acryloyl group with other thiols in the cell is minimal under certain conditions. 197 Based on the affinity conjugation principle, a rapid and fluorogenic TMP-AcBOPDIPY probe was developed with a half-life of less than 2 min for covalent labeling (Fig. 4A). 166 Intracellular proteins fused with eDHFR_N23C were rapidly labeled by the TMP-AcBOPDIPY probe under no-wash conditions (Fig. 4B). In addition, the chemical probe displays a superior dynamic range in fluorescence lifetime imaging microscopy (FLIM) for intracellular FRET studies.
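As a rough guide to what a labeling half-life of less than 2 min implies, the sketch below converts a half-life into an apparent first-order rate constant and into the fraction of tagged protein labeled after a given time. It assumes simple (pseudo-)first-order kinetics, which is an assumption made here for illustration and not a statement from the cited work. The function names are hypothetical, and 120 s is used as the upper bound on the half-life, so the results are lower bounds on the rate and extent of labeling.

```python
import math


def rate_constant_from_half_life(t_half_s: float) -> float:
    """Apparent first-order rate constant (s^-1) for a given half-life (s)."""
    return math.log(2) / t_half_s


def fraction_labeled(t_s: float, t_half_s: float) -> float:
    """Fraction of tag labeled after t_s seconds, assuming first-order decay
    of the unlabeled population (illustrative assumption)."""
    return 1.0 - 0.5 ** (t_s / t_half_s)


def time_to_fraction(fraction: float, t_half_s: float) -> float:
    """Time (s) needed to reach a given labeled fraction (0 < fraction < 1)."""
    return t_half_s * math.log2(1.0 / (1.0 - fraction))


# Upper bound on the half-life quoted in the text: 2 min = 120 s.
T_HALF_S = 120.0

k_app = rate_constant_from_half_life(T_HALF_S)
labeled_10_min = fraction_labeled(600.0, T_HALF_S)
t_99 = time_to_fraction(0.99, T_HALF_S)

print(f"apparent rate constant: {k_app:.4f} s^-1")
print(f"fraction labeled after 10 min: {labeled_10_min:.3f}")
print(f"time to 99% labeling: {t_99 / 60:.1f} min")
```

Under these assumptions, a half-life of 2 min corresponds to about 97% labeling after 10 min and to 99% labeling after roughly 13 min.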
Recently, a more versatile "tagging-then-labeling" approach has been realized, enabling efficient introduction of bioorthogonal groups into proteins for bioorthogonal labeling in live cells. The TMP-AcAz ligand incorporates an azido group into proteins fused with the eDHFR tag (Fig. 4C). Subsequently, strain-promoted cycloaddition reactions using DBCO- or BCN-conjugates facilitate protein labeling with various probes inside live cells (Fig. 4D). 111
The eDHFR tag has been used for live-cell imaging of protein-protein interactions (PPIs) between the first PDZ domain of ZO-1 (fused with eDHFR) and the C-terminal YV motif of claudin-1 (fused with GFP) using the time-resolved luminescence resonance energy transfer (LRET) technique (Fig. 5). 198 Conventional FRET imaging suffers from fluorescence bleed-through, leading to high background. In the LRET approach, background signals from cellular auto-fluorescence and direct excitation of GFP were effectively eliminated by imposing a time delay of 10 µs between excitation and detection.
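The effect of the 10 µs delay can be estimated from single-exponential decay: the fraction of an emitter population still excited after a delay t is exp(-t/τ). The sketch below compares a short-lived fluorescent background with a long-lived luminescent donor. The two lifetimes are assumed, typical orders of magnitude (a few nanoseconds for GFP and autofluorescence, about a millisecond for a lanthanide-type LRET donor); they are not taken from the cited study, and the names in the code are hypothetical.

```python
import math


def surviving_fraction(delay_s: float, lifetime_s: float) -> float:
    """Fraction of excited emitters remaining after delay_s, assuming
    single-exponential decay with the given lifetime."""
    return math.exp(-delay_s / lifetime_s)


GATE_DELAY_S = 10e-6  # 10 µs delay between excitation and detection (from the text)

# Assumed, illustrative lifetimes (not from the cited study):
ASSUMED_FLUORESCENCE_LIFETIME_S = 3e-9  # prompt fluorescence, nanosecond range
ASSUMED_DONOR_LIFETIME_S = 1e-3         # long-lived luminescent donor, millisecond range

background_left = surviving_fraction(GATE_DELAY_S, ASSUMED_FLUORESCENCE_LIFETIME_S)
donor_left = surviving_fraction(GATE_DELAY_S, ASSUMED_DONOR_LIFETIME_S)

print(f"prompt fluorescence remaining after the gate: {background_left:.3e}")
print(f"long-lived donor emission remaining after the gate: {donor_left:.3f}")
```

With these assumed lifetimes, the prompt fluorescence has decayed essentially completely by the time detection starts, whereas about 99% of the long-lived donor population is still available to produce signal.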
Photoactive yellow protein (PYP) is a small (14 kDa) soluble protein found in several purple bacteria. It binds to a natural cofactor, the CoA thioester of 4-hydroxycinnamic acid, through transthioesterification with Cys69. It also binds to the thioester derivative of coumarin-3-carboxylic acid. Since PYP and its ligands do not exist in animal cells, they can be employed for bioorthogonal labeling of proteins (k = 1.1-124 M−1 s−1). 167,168

Self-labeling enzymatic domains
Enzyme-catalyzed reactions that proceed via irreversible conjugation with "suicide substrates" have been used for protein labeling in live cells. A variety of enzyme-substrate pairs are available. Many of these enzymes tolerate modifications to their substrates. By fusing the enzyme to a protein of interest, it is possible to covalently link the modified substrate to that protein. Generally, these reactions are specific, rapid and irreversible.
O6-Alkylguanine-DNA alkyltransferase (AGT), a human DNA repair protein, has been used as a self-labeling tag. 169,199 The reaction involves the irreversible transfer of the alkyl group of O6-benzylguanine (BG) derivatives to the reactive cysteine residue within the enzyme to generate a covalently modified protein. More efficient AGT mutants, termed SNAP-tags (19 kDa), have been developed. 169 An orthogonal AGT-based tag, termed the CLIP-tag, reacts specifically with O2-benzylcytosine derivatives. 157
Scheme 4 Dual-color "pulse-chase" labeling of neurons using ReAsH and FlAsH for the visualization of the synthesis and trafficking of AMPA receptors.
Single-molecule imaging often requires photostable and bright organic dyes, which chemical protein labeling approaches make accessible. The spliceosome is a complex machine responsible for removing introns from the precursors of messenger RNAs (pre-mRNAs). The SNAP-tag has been exploited in combination with the eDHFR-tag to enable single-molecule imaging of the spliceosome in yeast cell extracts (Fig. 6A). The SNAP- and eDHFR-tags facilitate labeling of pairs of small nuclear ribonucleoprotein (snRNP) components of the spliceosome in cell extracts with bright organic dyes (TMP-Cy5 and BG-DY549), thereby enabling imaging of their assembly on individual pre-mRNAs. The measurements revealed that individual spliceosomal subcomplexes associate with pre-mRNA sequentially through an ordered pathway and that subcomplex binding is reversible. 200
Halo-tag is a modified haloalkane dehalogenase that covalently binds to synthetic chloroalkane derivatives. 171,201 Halo-tag is commercially available and has been widely used to label proteins in cells. This approach has been applied to small molecule-induced protein degradation, namely, HaloPROTACs. PROTACs are a class of heterobifunctional molecules that link a ligand of an E3 ligase to a ligand for a protein of interest (POI). 202 PROTACs recruit the E3 ligase to the POI, resulting in its ubiquitination and subsequent degradation by the proteasome. A bifunctional HaloPROTAC contains a chloroalkane and a hydroxyproline derivative that binds the E3 ligase VHL. The compound induces binding between the HaloTag7 fusion protein and the E3 ligase, leading to the degradation of the HaloTag7 fusion protein by the proteasome (Fig. 6B). 203
β-Lactamase is a small bacterial enzyme (29 kDa) that hydrolyzes β-lactam antibiotics. The E166N mutation of β-lactamase leads to the accumulation of acyl-enzyme intermediates owing to the dramatic suppression of deacylation. The mutant β-lactamase, termed BL-tag, and β-lactam probes have been used for the covalent labeling of proteins in cells. 170 The extensive work already done on β-lactam antibiotics makes it possible to design a variety of β-lactam probes. 204 Cephalosporin-based probes featuring substituent elimination facilitate the development of fluorogenic probes. 205 Other examples in this category include cutinase 172 and catalytic antibodies (Abs) 206 (Table 4D).

Enzymatic modifications
A number of post-translational modifications have been harnessed to specifically incorporate chemical probes or bioorthogonal handles into proteins (Table 4E). The first example is the modification of cell surface proteins using E. coli biotin ligase (BirA). The enzyme recognizes and biotinylates a 15-mer acceptor peptide (AP) sequence. Using a synthetic ketone-containing biotin isostere (keto-biotin) as a substrate, BirA can be used as a "ketone ligase" to introduce a ketone handle for oxime ligation. 83 Another example is E. coli lipoic acid ligase, which is able to transfer a lipoic acid derivative carrying an azide 178 or a TCO moiety to proteins. 119 The introduced azide and TCO moieties facilitate subsequent labeling using strained cyclooctyne and tetrazine probes, respectively. 119,178 The formylglycine-generating enzyme (FGE) specifically oxidizes the cysteine in the consensus LCXPXR motif to formylglycine. Subsequently, the aldehyde tag generated by the FGE on the target protein can selectively react with aminooxy and hydrazide probes. 79 Reconstitution of split inteins mediates protein trans-splicing (PTS). After ligation of the N- and C-peptide fragments, the intein is eventually removed. 181 By choosing appropriate naturally split inteins, PTS reactions can proceed efficiently in live cells. 182,207 The bacterial enzyme AnkX from Legionella pneumophila has been used to transfer phosphocholine moieties from synthetically produced CDP-choline derivatives to a consensus sequence, TITSSYYR, at the termini or internal loop regions of a POI. The covalent label can be removed by another Legionella enzyme, the dephosphocholinase Lem3. Many other enzymes have also been used for the selective modification of proteins, including phosphopantetheine transferase (AcpS or Sfp), [173][174][175]208 transglutaminases (TGases), 176,177 sortase (SrtA), 179 protein farnesyl transferase (PFTase), 180 glycosyltransferase, 209 N-myristoyl transferase (NMT) 210 and tubulin tyrosine ligase (TTL) 184 (Table 4E). The advantages of enzymatic modifications lie in the small size of the tag and the high efficiency of the reactions. However, many of the substrates are not cell-permeable and are therefore not suited for intracellular labeling.
The enzymatic modification approach has been elegantly applied to spatially resolved proteomic mapping in living cells. 211 An ascorbate peroxidase (APEX) fused with a "mito" sequence was targeted to the mitochondrial matrix. Labeling was initiated by the addition of biotin-phenol and H2O2 to live cells. The resulting phenoxyl radicals are short-lived and membrane-impermeant and therefore only label neighboring endogenous proteins. The biotinylated proteins were recovered with streptavidin-coated beads and identified using mass spectrometry (Fig. 7). This approach led to the identification of 495 proteins within the human mitochondrial matrix, including 31 proteins that were not previously linked to mitochondria.

Incorporation of UAA as a "minimalist" tag
Unnatural amino acid (UAA) mutagenesis has emerged as a powerful tool for the site-specific modification of proteins. 212 The incorporation of UAAs into a protein sequence can be considered as the introduction of a "minimalist tag" (i.e. only a single amino acid residue as opposed to a peptide sequence or a protein domain). Unnatural amino acids can be co-translationally incorporated into proteins in either a residue- or a site-specific fashion. The incorporation of UAAs carrying small bioorthogonal groups followed by chemoselective reactions makes it possible to label proteins with a diverse range of probes. A pool of UAAs featuring a variety of bioorthogonal handles has been added to the genetic codes of E. coli, yeast, and mammalian cells. 34][215] This approach has also been used to make therapeutic proteins and to prepare a new generation of ADCs. 134 More details about this topic are summarized in a recent review. 3

Concluding remarks
Selective protein labeling techniques have provided unprecedented views of protein structures, dynamics and functions in vitro, in live cells, and in whole organisms. These approaches have demonstrated the power of chemistry as a tool in a wide range of biological research areas, including post-translational modifications of proteins, preparation of protein-based pharmaceuticals, super-resolution imaging, 117,216 visualization of intracellular protein-protein interactions, modulation of protein function in live cells, proteome labeling 49 and affinity-based protein profiling. 114
Despite successful applications in selective protein labeling, numerous challenges remain in bioorthogonal chemistry. Firstly, many bioorthogonal functional groups are not truly "bioorthogonal". For instance, strained alkynes may react with free thiols in living systems. Aldehyde and ketone functionalities are also present in many metabolites in living systems. Secondly, some of the bioorthogonal groups, such as cyclooctynes and cyclooctenes, are too large and lipophilic, causing non-specific staining and reducing the effective concentration of reactants in live cells. Thirdly, the instability of tetrazine, phosphine and other groups in live cells during prolonged incubation leads to the deactivation of the bioorthogonal moieties. Fourthly, some metal-catalyzed reactions are incompatible with living conditions, as exemplified by the cytotoxicity of the copper ion used for CuAAC. To address these issues, the development of new bioorthogonal chemistry is required. Since there are few "perfect" bioorthogonal reactions available, one has to carefully consider both the pros and cons of each reaction in order to identify the most suitable one for a particular application.
Chemical tagging approaches allow for the incorporation of probes or bioorthogonal handles into proteins in cells and organisms. However, most of these approaches rely on exogenous expression of the POI with tags or UAAs. Although the investigation of exogenous proteins is useful for unraveling biological processes, advances in the selective labeling of endogenous proteins should facilitate the proteomic analysis of cellular organelles and protein complexes, target identification and diagnosis. An advantage of chemical probes over FPs lies in the flexibility of modifications on organic dyes. Therefore, the development of new organic dyes with special properties, e.g. photo-switchable, activatable, highly fluorogenic, bright, far-red emissive, etc., should substantially help to understand the biological mechanisms of proteins in the context of living systems.

Fig. 5 Time resolved LRET for live-cell imaging of protein-protein interactions using the eDHFR tag.

Fig. 4 (A) The chemical structure of the TMP-AcBOPDIPY probe (BOPDIPY: boron phenyldipyrrolemethene). (B) Cellular labeling of eDHFR_N23C fused with a K-Ras C-terminal sequence (CAAX), Rab1, Rab5, and nucleus localizing sequence (NLS) at the plasma membrane (PM), the Golgi body, the endosomes, and the nucleus, respectively, using the TMP-AcBOPDIPY probe under no-wash conditions. (C) The "tagging-then-labeling" approach using the TMP-AcAz ligand and BCN or DBCO probes for labeling intracellular proteins in live cells. (D) Specific labeling of EGFP-eDHFR_N23C/L28C-NLS at the nucleus in living HeLa cells by the BCN-TAMRA probe. Scale bar: 10 µm. 183

Fig. 6 (A) Experimental setup for single-molecule tracing of pre-mRNA splicing in yeast cell extracts using SNAP tagging and TMP tagging technologies. (B) Schematic depiction of HaloPROTAC in inducing degradation of HaloTag7 fused proteins. The chemical structure of one representative HaloPROTAC, HaloPROTAC-3, is given. Ub = ubiquitin; E3 = E3 ligase.