5-Hydroxymethylcytosine: the many faces of the sixth base of mammalian DNA

Epigenetic phenomena play a central role in cell regulatory processes and are important factors for understanding complex human disease. One of the best understood epigenetic mechanisms is DNA methylation. In the mammalian genome, cytosines (C) in CpG dinucleotides have long been known to undergo methylation at the 5-position of the pyrimidine ring (mC). It was later found that mC can be oxidized to 5-hydroxymethylcytosine (hmC) and further to 5-formylcytosine (fC) and 5-carboxylcytosine (caC) by 2-oxoglutarate-dependent dioxygenases of the TET family. These findings unveiled a long-elusive mechanism of active DNA demethylation and spurred a wave of studies in the area of epigenetic regulation in mammals. This review is dedicated to a critical assessment of recent data on the biochemical and chemical aspects of the formation and conversion of hmC in DNA, the analytical techniques used for the detection and mapping of this nucleobase in mammalian genomes, and the epigenetic roles of hmC in DNA replication, transcription, cell differentiation and human disease.


Introduction
The key function of the genome is the storage, replication and transmission of the encoded genetic information. A meaningful and timely reading of a genome consisting of billions of recurring G:C or A:T base pairs in different types of cells is made possible by a sort of (epi)genetic "bookmarking" system: a mechanism that allows living organisms to maintain or reprogram the identity of each cell during development or in response to environmental cues. A key player in this process is DNA methylation, which occurs by enzymatic transfer of a methyl group from the ubiquitous cofactor S-adenosyl-L-methionine (SAM) onto specific targets in DNA. The three major products of DNA methyltransferases are N6-methyladenine, N4-methylcytosine and 5-methylcytosine (mC). Notably, the biologically installed methyl groups do not alter the pairing specificity of the target nucleobases, thus preserving the original genetic content of the genome. Their exposure in the major groove of the DNA helix (Fig. 1) makes these "steric" groups readable by specialized cellular proteins, enzymes or large multicomponent complexes. These features make such modified bases well suited to serve as epigenetic marks for biological signaling that operates as an additional regulatory layer "above" the genome. All three types of DNA methylation are found in microorganisms, where they occur sequence-specifically. In vertebrates, the dominant methylation product is mC (Fig. 2a), which is deposited in a sequence-specific (predominantly, but not exclusively, at CpG dinucleotides) and locus-specific manner. 1,2 DNA methylation levels vary dramatically during development, but in somatic tissues the majority (70-80%) of CpGs are methylated, except for those localized in so-called CpG islands (genomic regions highly enriched in CpG sites).
3,4 Traditionally, mCs localized at CpG islands are regarded as important transcriptional silencers of gene promoters. Three major DNA methyltransferases are active on mammalian genomes. Initial methylation patterns are thought to be established by the so-called de novo DNA methyltransferases Dnmt3a and Dnmt3b, whereas preservation of CpG methylation marks across cell divisions is carried out by the maintenance methyltransferase Dnmt1. Underlining its importance, mC is often called the fifth base of DNA.

Occurrence of 5-hydroxymethylated nucleobases in DNA
The presence of hmC in DNA was first observed in certain bacteriophages. In T-even phages, 5-hydroxymethylcytosine is incorporated into the genome during DNA synthesis and is subsequently modified by phage α- and β-glucosyltransferases, creating highly glucosylated DNA containing 5-glucosyloxymethylcytosine (glc-hmC) residues. Such DNA becomes resistant to cleavage by host restriction endonucleases. A similar modified base, base J (5-(β-D-glucosyloxymethyl)uracil) (Fig. 1), is present in the DNA of flagellated protozoa (which include the parasites Trypanosoma brucei and Leishmania sp.) and closely related unicellular algae. 5 It replaces up to 1% of thymines and is mostly found in telomeric repeats. It is produced by the oxidation of thymine residues in DNA by the JBP1/JBP2 2-oxoglutarate-dependent dioxygenases, followed by enzymatic glucosylation of the resulting 5-hydroxymethyluracil (hmU). Base J was demonstrated to be essential for transcription termination at the ends of the polycistronic gene clusters that are a hallmark of Leishmania and related trypanosomatids. 6,7
The presence of hmC in genomic DNA extracted from the brains of adult mice, rats and frogs was first reported in the early 1970s. 8 However, this finding was not confirmed by other laboratories for almost 40 years. 9 In 2009, two groups independently re-confirmed the existence of hmC in mouse brain cells and mouse embryonic stem cells (ESCs). 10,11 Using thin layer chromatography, Kriaučionis and Heintz found that hmC constitutes 0.6% of total nucleotides in Purkinje cells and ~0.2% in granule cells. 10 Building on knowledge of the above-mentioned trypanosome proteins JBP1 and JBP2, Rao and co-workers identified their mammalian counterparts, the ten-eleven translocation (TET) proteins TET1, TET2 and TET3, as potential modifiers of mC to hmC (Fig. 2b).
11 The three homologs share the common features of 2-oxoglutarate (2OG)- and Fe(II)-dependent dioxygenases. Two years later, the groups of Zhang and Xu demonstrated that the TET enzymes are capable of further converting mC and hmC to 5-formylcytosine (fC) and 5-carboxylcytosine (caC) in vitro and in vivo, 12,13 and, independently, low levels of genomic fC were identified in mouse ESCs by the Carell group. 14 Together, this evidence and subsequent studies convincingly demonstrated that hmC is indeed an endogenous biological component of mammalian genomic DNA (also called the sixth base of DNA), acting as a key intermediate in the long-sought pathway of active DNA demethylation.
Besides the well-established TET-mediated pathway, another possible source of hmC could be direct hydroxymethylation of cytosine in DNA. A chemical precedent comes from in vitro experiments with representative SAM-dependent C5-MTases, which unexpectedly showed that these enzymes can catalyze the covalent addition of formaldehyde to the C5-position of their target cytosine residues in DNA, yielding hmC. 15 Formaldehyde is an essential metabolite naturally present at ~0.1 mM concentrations in mammalian cells, tissues and fluids, although the concentration can vary depending on the cell type and physiological conditions. 16 However, the significance of this chemo-enzymatic route has not been confirmed in vivo.
Another potential route for hmC to enter DNA is the nucleotide salvage pathway (NSP) (Fig. 3). In addition to de novo synthesis, cells recycle nucleotides arising from the breakdown of DNA (DNA repair, apoptosis, etc.) or from nutrient uptake. Akin to deoxycytidine, modified cytidines could enter deoxynucleotide pools via an enzymatic phosphorylation cascade in which deoxycytidine kinase (DCK) produces a monophosphate, cytidine monophosphate kinases (CMPK) convert it into a diphosphate, and a family of nucleoside diphosphate kinases phosphorylates it into a triphosphate. However, a key barrier to the formation of 5-modified dCTPs, and hence to the entry of modified cytidines into DNA, is the high selectivity of CMPK for unmodified dCMP, which excludes nucleotides carrying C5 modifications. Nevertheless, like mC, hmC is readily deaminated at the level of dNMP (or dN, depending on the cell type) 17-19 to produce the hmU 5′-monophosphate (hmdUMP). Since thymidylate kinase (DTYMK) is promiscuous with respect to 5-modifications, hmdUMP could be phosphorylated and then incorporated into DNA, 19 where hmU can then be targeted for excision by SMUG1 or TDG. To avoid cytotoxic complications associated with the appearance of hmU, cells deploy a dedicated enzyme, 2′-deoxynucleoside 5′-monophosphate N-glycosidase (DNPH1, also known as RCL), which degrades hmdUMP, eliminating it from the nucleotide pool. 17 These two main barriers of the NSP guard mammalian DNA against sporadic incorporation of 5-hydroxymethylated pyrimidines, thereby preventing their potential interference with the regulatory mechanisms mediated by mC-derived epigenetic modifications. This safeguarding system is often weakened in fast-proliferating cancer cells, which renders them susceptible to treatment with modified cytidines as potential therapeutic agents.
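The two NSP barriers described above act like sequential gates. The toy Python sketch below encodes only the qualitative routing stated in this section (CMPK exclusion of C5-modified dCMP, deamination into the thymidylate branch, DNPH1 clearance of hmdUMP); the function name, the string labels and the `dnph1_active` switch are hypothetical illustration devices, not a kinetic or quantitative model.

```python
def salvage_fate(nucleoside: str, dnph1_active: bool = True) -> str:
    """Coarse fate of a 2'-deoxynucleoside entering the salvage pathway.

    Toy routing only: it mirrors the qualitative gates described in the text
    and ignores kinetics, compartments and cell-type differences.
    """
    if nucleoside == "dC":
        # DCK -> CMPK -> NDPK cascade accepts unmodified dC
        return "dCTP pool"
    if nucleoside in ("5mdC", "5hmdC"):
        # CMPK rejects C5-modified dCMP, so the cytidine branch is blocked;
        # deamination (dNMP or dN level) diverts them to the thymidine branch
        deaminated = {"5mdC": "dT", "5hmdC": "5hmdU"}[nucleoside]
        return salvage_fate(deaminated, dnph1_active)
    if nucleoside == "dT":
        return "dTTP pool"
    if nucleoside == "5hmdU":
        if dnph1_active:
            # DNPH1 (RCL) hydrolyzes hmdUMP before further phosphorylation
            return "degraded by DNPH1"
        # DTYMK is promiscuous, so hmdUMP can proceed to the triphosphate
        return "incorporated as hmU, can be excised by SMUG1 or TDG"
    raise ValueError(f"unknown nucleoside: {nucleoside!r}")


for n in ("dC", "5mdC", "5hmdC"):
    print(n, "->", salvage_fate(n))
print("5hmdC without DNPH1 ->", salvage_fate("5hmdC", dnph1_active=False))
```

Switching `dnph1_active` off mimics, in the crudest possible way, the weakened safeguard described for fast-proliferating cancer cells.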

Reversal of genomic cytosine-5 methylation
Loss of mC in genomic DNA can occur through a passive or an active DNA demethylation pathway. In the first scenario, the methylation marks are diluted in a replication-dependent manner in the absence of methylation maintenance activity on the daughter strand. This mechanism has been well documented in many cases where DNMT1 is downregulated or excluded. 20,21 Alternatively, mC can be "actively" converted back to unmodified C within the same DNA strand, independently of DNA replication. Active DNA demethylation has long been known in plants. In Arabidopsis thaliana, a group of DNA glycosylases of the Demeter family excise mC by cleaving the N-glycosidic bond, resulting in an abasic site on the DNA strand. 22 AP lyases and AP endonucleases then form a single-nucleotide gap that is subsequently filled by DNA polymerases and ligases. However, no such glycosylases have been found to act directly on mC in mammals. The existence of active demethylation in vertebrates long remained elusive, but the picture suddenly became much clearer with the discovery of hmC in DNA.
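The passive route can be made quantitative with a simple worked model. Assume (an idealization, not a measured parameter) that each replication pairs every parental strand with a new strand, and that maintenance methylation copies a mark onto the new strand with probability e only when its parental template is methylated. The fraction m of methylated strands at a CpG then obeys m(n+1) = m(n)(1 + e)/2, so with no maintenance (e = 0) it halves at every division.

```python
def methylated_fraction(n_divisions: int, maintenance_efficiency: float = 0.0,
                        initial_fraction: float = 1.0) -> float:
    """Fraction of strands still methylated at a CpG after n replications.

    Idealized model: parental strands keep their marks; a new strand is
    methylated with probability `maintenance_efficiency` only if its
    template strand is methylated (hypothetical parameterization).
    """
    if not 0.0 <= maintenance_efficiency <= 1.0:
        raise ValueError("maintenance_efficiency must be between 0 and 1")
    fraction = initial_fraction
    for _ in range(n_divisions):
        # half of the strands are parental, half are newly synthesized
        fraction = (fraction + maintenance_efficiency * fraction) / 2
    return fraction


print(methylated_fraction(3))                    # 0.125 (no maintenance)
print(methylated_fraction(3, 1.0))               # 1.0 (perfect maintenance)
print(round(methylated_fraction(10, 0.9), 3))    # 0.599
```

Even a modest drop in maintenance efficiency compounds over divisions, which is why DNMT1 downregulation or exclusion suffices for replication-dependent loss of mC.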

Chemistry of DNA demethylation
As mentioned above, biological methylation is performed by DNA MTases, which catalyze the SN2 transfer of the sulfonium-bound methyl group from the cofactor SAM to defined positions in DNA (Fig. 2a). Although all reactions are, in theory, reversible, only one example of direct DNA demethylation via an SN2 reaction is known to date. O6-Methylguanine, a product of DNA alkylation damage by exogenous or endogenous compounds, is reverted to guanine directly via transfer of the O6-methyl group onto a cysteine residue of the methylguanine DNA methyltransferase (MGMT) protein (Fig. 2c). This reaction requires stoichiometric amounts of the protein, which is a costly burden for the cell, perhaps justified by the rarity and severity (altered base pairing) of the lesion. This seems to illustrate that direct SN2 demethylation is only practicable for O-bound methyl groups, and that it is perhaps chemically unfeasible for N- or C-methylation due to the high thermodynamic stability of the N-CH3 and C-CH3 bonds. Indeed, in all other reported demethylation cases, a multi-step mechanism based on radical redox chemistry is deployed. These reactions are carried out by a large family of 2OG-dependent dioxygenases. For N-alkylated nucleobases (N3-methylcytosine, N6-methyladenine), enzymatic oxidation by the AlkB family of oxygenases leads to the corresponding N-hydroxymethyl derivatives (Fig. 2d), which then undergo spontaneous hydrolytic release of formaldehyde from the nitrogen to directly regenerate the original unmodified base. Demethylation of biological N-methyl marks in histones occurs through a similar route. However, hydroxymethyl groups produced at the C5-position of cytosine turn out to be chemically quite stable and long-lived under physiological conditions. In this case, additional cycles of enzymatic oxidation are required to produce modified bases that "appear" like DNA lesions and can be further processed by the repair machinery to ultimately yield unmodified cytosine (Fig. 2b).
As mentioned above, the TET proteins were identified in mammals as the modifiers of mC to hmC 11 and were also shown to be capable of oxidizing the resulting hmC to fC and caC (hmC, fC and caC are collectively termed ox-mC). 12,13 He et al. demonstrated that mC and hmC are almost fully (90%) converted to caC by TET1 and TET2 without the appearance of fC, whereas Ito et al. reported that fC accumulates relative to caC. 12,13 Subsequent enzymatic and structural studies showed that TET2 can yield fC and caC by acting iteratively in a single encounter with mC-containing DNA, without release of the hmC intermediate; but once released, hmC is a less favorable substrate than mC. 23,24 As a result, only a fraction of hmC is further oxidized to fC/caC, making hmC a rather stable cytosine modification. Taken together, these results suggest that the efficiency and the final product of the oxidation steps performed by TET proteins depend on various conditions that are not yet fully understood.
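How strongly the product distribution depends on relative rates can be illustrated with a toy kinetic model. The sketch below treats TET oxidation as three sequential, distributive first-order steps (mC to hmC to fC to caC) with arbitrary rate constants k1 to k3, which are assumptions rather than measured values; it deliberately ignores the processive, single-encounter mode described for TET2. It only shows that when hmC is a poorer substrate than mC (k2 much smaller than k1), hmC accumulates as a relatively stable product, whereas fast downstream steps drive the pool to caC.

```python
def tet_oxidation_toy(k1, k2, k3, t_end, dt=0.001):
    """Fractions of (mC, hmC, fC, caC) after t_end, starting from 100% mC.

    Sequential first-order steps mC -k1-> hmC -k2-> fC -k3-> caC, integrated
    with explicit Euler steps. Rate constants are arbitrary illustrative values.
    """
    mc, hmc, fc, cac = 1.0, 0.0, 0.0, 0.0
    for _ in range(round(t_end / dt)):
        r1, r2, r3 = k1 * mc, k2 * hmc, k3 * fc  # rates from the current state
        mc -= r1 * dt
        hmc += (r1 - r2) * dt
        fc += (r2 - r3) * dt
        cac += r3 * dt
    return mc, hmc, fc, cac


hmc_disfavored = tet_oxidation_toy(k1=1.0, k2=0.1, k3=0.1, t_end=5.0)
hmc_favored = tet_oxidation_toy(k1=1.0, k2=10.0, k3=10.0, t_end=5.0)
print("k2 << k1:", [round(x, 3) for x in hmc_disfavored])
print("k2 >> k1:", [round(x, 3) for x in hmc_favored])
```

The total of the four fractions is conserved at 1, so the model only redistributes the starting mC among the oxidation states.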

Modulators of TET activity
Not surprisingly, the enzymatic activity of the TET proteins can be affected by the availability of the small molecules involved in their catalytic reaction: oxygen (O2), 2-oxoglutarate (2OG) and Fe(II) (Fig. 4). The activity of TET enzymes is often, but not always, 25,26 reduced in hypoxic environments due to oxygen shortage, 27 and hypoxia is one of the key elements in cancer development and a potential treatment target (reviewed in 28). Likewise, exogenous 2OG increases TET activity and the hmC/mC ratio in cultured mouse embryos 29 and in different tissues of adult mice, 30 and significantly improves anti-cancer treatment through epigenetic modulation. 31 Intracellular levels of 2OG are thought to direct the timing of ESC differentiation. 32 On the other hand, certain metabolites and 2OG analogues, such as N-oxalylglycine (NOG) 33-35 and 2-hydroxyglutarate (2HG), 36,37 can act as potent inhibitors of 2OG-dependent oxygenases. Supplementation with Fe(II) in cell culture likewise results in elevated levels of hmC. 38 Iron-deficient diets during neurodevelopment result in decreased TET activity and reduced global hydroxymethylation in the rat brain. 39 Apart from the reaction substrates, vitamin C was found to enhance the abundance of ox-mC in various cell cultures, from iPSCs and ESCs to cancer cell lines, and also to activate histone demethylation by Jhdm enzymes. 40 Although vitamin C was originally ascribed a role in maintaining iron in the reduced Fe(II) form, no other reducing agents were found to exert a similar stimulation of oxygenase activity. Biochemical studies showed direct interaction of this compound with the catalytic domain of TET2. 41 Curiously, vitamin C in its hydrated form can itself serve as a co-substrate in an oxidative mC modification catalyzed by an algal TET homolog, CMD1 (Fig. 4).
42 The latter finding, highlighting a structural similarity between 2OG and ascorbate hydrate (both contain the 2-oxocarboxylate moiety), may point to mechanisms of ascorbate-induced TET stimulation other than keeping iron in the reduced state; for example, if TETs could directly utilize ascorbate in lieu of 2OG for the generation of the active oxygen species, the reaction would then yield 5-carbon This journal is © The Royal Society of Chemistry 2024 in DNA upon sporadic deamination of mC, was found to directly excise fC and caC, while leaving mC and hmC untouched. 13,44 The resulting abasic sites in DNA are then repaired via the enzymatic cascade of base excision repair (BER), whereby the excised nucleotides are replaced by newly incorporated unmodified nucleotides. Based on these observations, a pathway for active demethylation involving iterative mC oxidation by TET proteins coupled with TDG-mediated BER has been proposed (Fig. 5). A major caveat to this mechanism was that double-strand DNA breaks could be generated at densely methylated loci or if demethylation occurred on both strands of a CG/CG site. However, these concerns were alleviated when a reconstituted TET-TDG-BER system in vitro was shown to demethylate symmetrically methylated CGs productively and sequentially, thereby avoiding DNA double-strand break formation. 45 It is now well established that both the BER and nucleotide excision repair (NER) pathways play a central role in active DNA demethylation in the mouse zygote. 46,47
An alternative pathway for hmC processing was proposed based on the ability of some DNA glycosylases to excise hmU, which might arise via deamination of hmC (Fig. 5). 48 Notably, some DNA glycosylases, such as single-strand-selective monofunctional uracil-DNA glycosylase 1 (SMUG1) and MBD4, have no significant activity for excision of caC or hmC, 13 but they efficiently remove hmU. 49,50 TDG was also shown to excise hmU from hmU-G mispairs in dsDNA.
49 This pathway relies on the assumption that AID/APOBEC deaminases (or some other, unknown deaminase) can effectively deaminate hmC in duplex DNA in vivo, although in vitro studies indicated a strong preference of these enzymes for unmodified cytosines located in single-stranded DNA. Some studies have suggested that hmU is unlikely to result from the deamination of hmC, as most hmU in DNA originates from TET-catalyzed oxidation of thymine. 51 Therefore, the deamination-mediated pathway still requires strong biochemical support. 52,53
Yet another route, which has lately gained substantial support, is based on direct conversion of hmC, fC or caC to C via C-C bond cleavage, thus avoiding altogether the problems related to the generation of abasic sites. Indeed, active demethylation was shown to occur in certain cases in the absence of TDG. An early hint was provided by DNA C5-MTases, which are not only capable of coupling formaldehyde to cytosine, but can also promote sequence-specific conversion of hmC 15,54 or caC 55 to unmodified C in vitro. The MTase-directed reaction proceeds via a covalent intermediate at C6 (Fig. 6a), resembling the thiol-mediated 56 or bisulfite-mediated decarboxylation of caU and deformylation of fC (Fig. 6). 57 These chemical precedents suggest that certain DNMTs or some dedicated enzymes may in principle remove the oxidized groups (5-hydroxymethyl, 5-formyl or 5-carboxyl) to give unmodified cytosine residues.
14,53,58-60 It seems that this scenario is operational in certain cases; however, no enzyme or other cellular component directly responsible for these reactions has yet been identified.
this modified nucleotide as well as for the development and validation of new analytical techniques. Chemical synthesis of hmC-containing oligodeoxyribonucleotides was first reported in 1997; 62 it used phosphoramidite precursors with reversible 2-cyanoethyl O-derivatization of the 5-hydroxymethyl group along with the canonical N-benzoylation of the exocyclic amine. More recently, an alternative strategy based on N,O-carbamoyl bridging of both exocyclic groups afforded improved yields and purity of the target oligomers. 63,64 Both types of phosphoramidites are commercially available and used by DNA synthesis services.
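The double-strand-break concern raised for the TET-TDG-BER pathway, and why sequential processing of a symmetrically methylated CG/CG site avoids it, can be illustrated with a small state machine. This is a hypothetical toy: the state names, the event-schedule format and the rule that two simultaneous gaps at one dyad flag a double-strand-break risk are simplifications introduced here, not a model from the cited studies.

```python
OXIDATION = {"mC": "hmC", "hmC": "fC", "fC": "caC"}  # successive TET steps
TDG_SUBSTRATES = {"fC", "caC"}  # TDG excises fC and caC, not mC or hmC


def demethylate_dyad(schedule):
    """Apply (strand, action) events to a fully methylated CpG dyad.

    Returns the final state of both strands and whether both strands were
    gapped at the same time (a proxy for double-strand break risk).
    """
    dyad = {"top": "mC", "bottom": "mC"}
    dsb_risk = False
    for strand, action in schedule:
        base = dyad[strand]
        if action == "oxidize" and base in OXIDATION:
            dyad[strand] = OXIDATION[base]
        elif action == "excise" and base in TDG_SUBSTRATES:
            dyad[strand] = "gap"  # AP site processed into a single-nucleotide gap
        elif action == "repair" and base == "gap":
            dyad[strand] = "C"  # BER inserts unmodified dC and seals the nick
        else:
            raise ValueError(f"cannot {action} {base} on the {strand} strand")
        if dyad["top"] == dyad["bottom"] == "gap":
            dsb_risk = True
    return dyad, dsb_risk


# Oxidize both strands to fC (two TET steps each)
oxidation = [(s, "oxidize") for s in ("top", "bottom") for _ in range(2)]
concurrent = oxidation + [("top", "excise"), ("bottom", "excise"),
                          ("top", "repair"), ("bottom", "repair")]
sequential = oxidation + [("top", "excise"), ("top", "repair"),
                          ("bottom", "excise"), ("bottom", "repair")]

print(demethylate_dyad(concurrent))  # ({'top': 'C', 'bottom': 'C'}, True)
print(demethylate_dyad(sequential))  # ({'top': 'C', 'bottom': 'C'}, False)
```

Both schedules end in fully demethylated C/C, but only the schedule that completes BER on one strand before excising the other never exposes two opposing gaps at once.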

Production and analysis of hmC in DNA in vitro
Akin to the cellular reactions, hmC-modified DNA can in principle be produced from methylated DNA in vitro by oxidation of mC residues using TET oxygenases, but the reaction must be tightly controlled to avoid further oxidation products. This has been achieved by directed engineering of the catalytic processivity of human TET2 oxygenase 65 and further by developing a series of reactions permitting interconversion of mC and its oxidized forms for analytical applications (see below). The positions of the resulting hmC residues are then determined by those of the original mCs. Direct sequence-specific installation of hmC in DNA is possible using an atypical chemo-enzymatic reaction of DNA C5-methyltransferases, 15,66 which catalyze a reversible coupling of formaldehyde to their target cytosine residues in vitro. The reaction is highly specific for the methyltransferase target sites, although the efficiency of hydroxymethylation may vary with the enzyme used. Random replacement of C with hmC in DNA in vitro 67 or in vivo 68 is achieved by supplying the corresponding 2′-deoxynucleoside 5′-triphosphate (hmdCTP) in DNA polymerase-dependent strand extension reactions, including PCR.
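For random replacement from a mixed nucleotide pool, a first estimate of the modification density follows from binomial arithmetic. The sketch below assumes (an idealization, not something stated in this section) that the polymerase does not discriminate between hmdCTP and dCTP and that each C position is filled independently; real polymerases may show bias, so the numbers are only a starting point for choosing a hmdCTP:dCTP ratio.

```python
def hmc_incorporation(n_c_positions: int, hmdctp_fraction: float):
    """Expected hmC count and probability of at least one hmC in one strand.

    Assumes each C position independently receives hmC with probability equal
    to the hmdCTP fraction of the (hmdCTP + dCTP) pool, i.e. no polymerase bias.
    """
    if not 0.0 <= hmdctp_fraction <= 1.0:
        raise ValueError("hmdctp_fraction must be between 0 and 1")
    expected = n_c_positions * hmdctp_fraction
    p_at_least_one = 1 - (1 - hmdctp_fraction) ** n_c_positions
    return expected, p_at_least_one


expected, p_any = hmc_incorporation(n_c_positions=100, hmdctp_fraction=0.05)
print(expected, round(p_any, 3))  # 5.0 0.994
```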

Detection and quantification of hmC
Several techniques have been developed or adapted for detection and quantitation of hmC in DNA. Early studies relied on thin layer chromatography (TLC) of radiolabeled nucleotides. 10-12 Despite its high sensitivity, the use of this technique is increasingly limited by the hazards associated with handling and disposing of radioactive materials in standard laboratories.
The gold standard for analysis of the global nucleoside composition of unlabeled DNA samples is liquid chromatography coupled with UV or MS detection. Higher selectivity and sensitivity can be achieved with modern MS/MS detectors, and reliable quantitation of the nucleosides is possible using synthetic stable-isotope-labeled internal standards. 69,70 These methods are well suited for quantitation of global hmC, fC and caC levels in genomic DNA; however, they require specialized equipment. A high-throughput version has been developed using direct-injection MS, whereby DNA hydrolysates are mass-analyzed without chromatographic fractionation of nucleosides. 71 Due to the rarity of fC and caC, and sometimes hmC, special care is required during sample processing, 72 and base-specific derivatization can be used to increase the sensitivity (by MS or optical detection) and to enable enrichment of the target fragments from bulk genomic DNA. 73,74 A particularly useful analytical modification of hmC is its enzymatic glucosylation to a much bulkier and distinctive residue, 5-glucosyloxymethylcytosine, using T4 β-glucosyltransferase (BGT) and the uridine-5′-diphospho-D-glucose (UDP-glc) cofactor or its analogs (see below). The BGT reaction is robust and highly selective for the 5-hydroxymethyl group of hmC (or hmU). Such treatment can be used to attach tritium-labeled glucose moieties from UDP-[3H]glucose to DNA, permitting direct quantification of hmC by scintillation counting.
75 New hmC detection methods involving fluorescence strategies, such as fluorescence resonance energy transfer 76,77 and fluorescent tags introduced by enzymatic 78-80 or chemical 81 functionalization, have been developed in recent years. These methods offer high sensitivity and specificity and are usually amplification-free. Their combination with amplification techniques such as rolling circle amplification, 82 loop-mediated isothermal amplification 83 and isothermal exponential amplification 84 enables even more sensitive hmC detection in a simple and time-saving manner. Furthermore, the unique MTase-catalyzed reaction that replaces the hydroxyl group of hmC with a sulfhydryl group from cysteamine 15 was combined with fluorescence, 85 electrochemiluminescence 86,87 and photoelectrochemical methods 88,89 to create sensitive biosensors for the detection of hmC in tissues.
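The isotope-dilution LC-MS/MS quantitation described above reduces to simple ratio arithmetic. A minimal sketch, assuming that each analyte's peak area relative to its co-eluting stable-isotope-labeled standard is proportional to the amount ratio (with an optional, hypothetical calibration-derived `response_factor`); the peak areas below are invented for illustration.

```python
def isotope_dilution_amount(area_analyte, area_standard, standard_pmol,
                            response_factor=1.0):
    """Amount of a nucleoside from its peak area relative to a co-eluting
    stable-isotope-labeled internal standard spiked at a known amount."""
    return (area_analyte / area_standard) * standard_pmol / response_factor


def percent_of_total_cytosine(amounts_pmol):
    """Express each cytosine species as a percentage of all cytosines."""
    total = sum(amounts_pmol.values())
    return {name: 100 * value / total for name, value in amounts_pmol.items()}


# Hypothetical peak areas (analyte, labeled standard); 1 pmol of each standard spiked
areas = {"dC": (95_000, 1_000), "mdC": (4_500, 1_000), "hmdC": (500, 1_000)}
amounts = {name: isotope_dilution_amount(a, s, 1.0)
           for name, (a, s) in areas.items()}
print(percent_of_total_cytosine(amounts))
# {'dC': 95.0, 'mdC': 4.5, 'hmdC': 0.5}
```

Expressing hmC as a percentage of all cytosines is the same normalization used for the tissue levels quoted in the next section.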

Technologies to map hmC in genomic DNA
Numerous studies have shown that although all mammalian tissues contain similar levels of mC (4-6% of all C), 2,90 hmC amounts are tissue-dependent and vary from 0.03% to 1.2%. 12,75,91,92 hmC levels increase with age 93 and are strongly depleted in tumor samples, suggesting the utility of hmC as a prognostic biomarker (see below). The other forms of ox-mC, fC and caC, are present at 100-1000-fold lower levels: fC varies from 0.00002% to 0.002% of C, whereas caC amounts do not exceed 0.0003% of C. 12,94,95 The sparsity of ox-mC in genomes poses challenges in developing a universal method that would be sensitive and specific and, most importantly, would inform on the cytosine modification status of each CpG. A myriad of methods have been developed since the discovery of hmC to investigate its genomic profiles; they can be divided into enrichment-based methods and single-nucleotide resolution approaches (Fig. 7). The methods differ considerably in genomic coverage, experimental and analysis strategies, and cost.
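The sparsity argument can be made concrete with back-of-the-envelope arithmetic: how many cytosine calls are needed, without any enrichment, to expect a given number of modified cytosines at the global levels quoted above? This sketch assumes modifications are sampled at their global frequency, which is a simplification, since real levels vary strongly between loci.

```python
def cytosine_calls_for_hits(percent_of_c: float, expected_hits: float) -> float:
    """Cytosine calls needed to expect `expected_hits` modified cytosines,
    if modifications are sampled at their global frequency (no enrichment)."""
    return expected_hits / (percent_of_c / 100)


# Global levels quoted in the text (percent of C)
levels = {"hmC (lowest tissue level)": 0.03,
          "caC (upper bound)": 0.0003,
          "fC (lowest level)": 0.00002}
for name, pct in levels.items():
    print(f"{name}: {cytosine_calls_for_hits(pct, 10):.3g} C calls for 10 hits")
```

Going from hmC to fC multiplies the required number of cytosine calls by three orders of magnitude, which is why enrichment or highly selective conversion chemistry is attractive for the rarer ox-mC forms.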

Affinity enrichment-based methods
Methods based on affinity enrichment rely on the selective binding of short hmC-containing DNA fragments (usually 200-500 bp) to hmC-specific antibodies or other hmC-binding proteins, or on derivatization of hmC with reporter groups, permitting their physical extraction from the rest of the DNA for analysis using quantitative PCR, DNA microarrays or sequencing. The length of the fragmented genomic DNA defines the resolution limit of all enrichment-based methods. Akin to methylated DNA immunoprecipitation (MeDIP), antibodies have been raised against hmC 96,97 for immunoprecipitation of hmC-containing DNA and its subsequent sequencing by the hMeDIP (hydroxymethylated DNA immunoprecipitation) approach (Fig. 7b). However, results from different studies show considerable variation even in the analysis of the same genomes, perhaps due to typical shortcomings of antibody-based pull-down approaches, such as possible cross-reactivity with methylated and unmodified cytosines, as these bases bear very few chemical differences for discrimination. Very recently, the reliability of current DIP protocols has been called into question, as differences were shown to arise from the intrinsic affinity of IgG antibodies for short unmodified DNA repeats. 98
The development of covalent hmC modification significantly improved the sensitivity and specificity of enrichment-based strategies. To discriminate between mC and hmC, many such approaches involve highly selective enzymatic glucosylation of hmC by BGT to the much bulkier and distinctive glc-hmC residue. The most widely used technique is hMe-Seal, in which BGT derivatizes hmC by transferring an azide-modified glucose from the chemically modified cofactor analogue UDP-glc-azide. 99 A biotin is subsequently attached through click chemistry and used for selective streptavidin-biotin pull-down of hmC-containing DNA, followed by sequencing (Fig. 7b). Using this technique, numerous studies have mapped hmC genome-wide in various cell lines, tissues and cell-free DNA (cfDNA). 100,101 Its further development, Nano-hmC-Seal, demonstrated hmC mapping with ultra-low quantities of genomic DNA. 102
An alternative chemo-enzymatic derivatization of hmC that requires no UDP-glc-azide cofactor has been proposed by Liutkevičiūtė et al. 103 It was found that M.SssI (and a few other DNA C5-MTases) can catalyze sequence-specific replacement of the hydroxyl of the 5-hydroxymethyl group at hmCpG sites with alkylthio moieties carrying functional amino groups. The attached aliphatic amino group permits chemical ligation of NHS-biotin reagents for selective enrichment and analysis of hmC-containing DNA.
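When hmC-enriched fragments are analyzed by quantitative PCR, enrichment at a locus is commonly reported as percent of input. The sketch below uses the widely used ΔCt form of that calculation; it is a generic illustration rather than a protocol from the cited studies, and it assumes ~100% PCR efficiency (one doubling per cycle) and a known input fraction. The Ct values are invented.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent-input enrichment from qPCR threshold cycles (Ct).

    Assumes ~100% PCR efficiency and that the input sample represents
    `input_fraction` of the material used for the pull-down.
    """
    # Correct the input Ct for dilution, e.g. a 1% input is 100-fold diluted
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (adjusted_input_ct - ct_ip)


print(round(percent_input(ct_ip=28.0, ct_input=25.0, input_fraction=0.01), 3))  # 0.125
```

Because the readout is per amplicon, its spatial resolution is still bounded by the 200-500 bp fragment length noted above.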

Tethered oligonucleotide-primed sequencing
To push the resolution from the 200-500 bp offered by enrichment-based methods to a single nucleotide, an approach called TOP-seq (Tethered Oligonucleotide-Primed sequencing) has been proposed (Fig. 7b). The salient feature of the approach is that a DNA oligonucleotide, instead of a biotin moiety, is covalently tethered to enable non-homologous priming of DNA polymerase strand synthesis right at the covalently tagged CpG genomic site. 104 In its further extension, hmTOP-seq, a DNA oligonucleotide is tethered through copper(I)-promoted azide-alkyne click chemistry to glucosylated hmC residues carrying a BGT-installed azide; the oligonucleotide then serves as a primer for generating amplicons in which the starting sequence marks the precise position of the original hmC in the genome. 66 Pilot hmTOP-seq studies demonstrated wide applicability of the method for high-resolution and cost-efficient profiling of hmC in mammalian genomic samples and cfDNA. 26,66,105 Independently, a similar approach has been proposed that exploits copper-free click chemistry to attach a DNA hairpin to hmC for sequence readout within a ~20 bp window around the modified cytosines; 106 in this case, a much bulkier tethering moiety is produced, which seems to reduce the precision and efficiency of polymerase priming compared with copper(I)-promoted tagging. 66
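In tethered-primer data, the read start rather than a 200-500 bp fragment localizes the modification. A minimal sketch, assuming aligned reads given as hypothetical `(chrom, start, end, strand)` tuples in 0-based, half-open coordinates, tallies read 5′ ends into candidate hmC positions. The `offset` parameter is a hypothetical placeholder for any fixed distance between the read start and the tagged cytosine, which this section does not specify.

```python
from collections import Counter


def tally_tagged_sites(reads, offset=0):
    """Count putative modified positions from read 5' ends.

    reads: iterable of (chrom, start, end, strand) alignments, 0-based,
    half-open [start, end). The 5' end is `start` on '+' and `end - 1` on '-'.
    `offset` is a hypothetical correction between the read 5' end and the
    tagged cytosine; its true value depends on the library design.
    """
    counts = Counter()
    for chrom, start, end, strand in reads:
        if strand == "+":
            pos = start - offset
        elif strand == "-":
            pos = end - 1 + offset
        else:
            raise ValueError(f"bad strand: {strand!r}")
        counts[(chrom, pos, strand)] += 1
    return counts


reads = [("chr1", 100, 150, "+"), ("chr1", 100, 148, "+"),
         ("chr1", 300, 350, "-"), ("chr2", 5, 60, "+")]
print(tally_tagged_sites(reads).most_common(2))
```

Reads with different 3′ ends but the same 5′ end collapse onto one position, which is the property that gives such methods single-nucleotide resolution.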

Chemical conversion-based base resolution methods
For many years, the gold standard method for genome-wide mC profiling at single-C resolution was bisulfite sequencing (BS-seq). The method is based on the differential reactivity of mC and C in the presence of bisulfite: mC is very stable to bisulfite-promoted deamination and is subsequently read as normal C, whereas unmodified C is deaminated and read as T. With the discovery of ox-mCs, standard bisulfite treatment proved insufficiently informative, because bisulfite attack at the 5-hydroxymethyl group yields a hydrolytically stable 5-sulfonylmethylcytosine (smC, also called cytosine 5-methylenesulfonate), which cannot be discriminated from mC (Fig. 7a and 8). Moreover, the products of further hmC oxidation, fC and caC, are interpreted as unmodified C (read in the T lane).
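The readout rules in this paragraph can be summarized as a lookup table. The sketch below is a simplification that ignores incomplete conversion and strand effects; the dictionary and function names are illustrative. It shows directly why a C call after bisulfite cannot distinguish mC from hmC.

```python
# Base calls after standard bisulfite conversion and PCR, as described above
BISULFITE_READOUT = {
    "C": "T",    # deaminated to U, read as T
    "mC": "C",   # resistant to bisulfite-promoted deamination
    "hmC": "C",  # converted to smC (cytosine 5-methylenesulfonate), read as C
    "fC": "T",   # read as unmodified C (T lane)
    "caC": "T",  # read as unmodified C (T lane)
}


def bisulfite_read(bases):
    """Translate a list of (possibly modified) bases into sequenced calls."""
    return "".join(BISULFITE_READOUT.get(b, b) for b in bases)


template = ["A", "C", "G", "mC", "G", "hmC", "G", "fC", "G", "caC", "G"]
print(bisulfite_read(template))  # ATGCGCGTGTG
```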
The indistinguishable behavior of mC and hmC, or of fC and caC, in BS is most likely the reason why ox-mC were long missed in DNA modification analysis. Since standard bisulfite chemistry provides only an aggregated mC + hmC signal versus unmodified cytosine (plus the very rare fC and caC), specific pre-treatment approaches were needed to detect hmC. The two most widely used methods are based on enzymatic or chemical oxidation prior to standard BS (Fig. 7a and 8). TET-assisted bisulfite sequencing, TAB-seq, 107 is a chemo-enzymatic approach that exploits the protection of hmC by glucosylation and the subsequent oxidation of mC to caC by the mouse TET1 enzyme. Following bisulfite sequencing, hmC is read as C, while all other cytosine forms are interpreted as unmodified C (read in the T lane). In another method, oxidative bisulfite sequencing (OxBS), 108 hmC is oxidized by potassium perruthenate (KRuO4) to fC and is then read as T, whereas mC remains C. The absolute amount and genomic positions of hmC in OxBS are determined by subtracting the OxBS signal (mC only) from the standard BS signal (mC + hmC), which necessitates parallel processing of two samples. A subtraction-based strategy is also required in TAB-seq to discriminate between hmC and mC (here, the TAB-seq hmC signal is subtracted from BS to obtain mC), which considerably increases the analysis cost, as the sequencing depth for each cytosine must be high to detect low-level hmC in both approaches.
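A minimal sketch of the OxBS subtraction at a single CpG shows why depth matters. The hmC level is estimated as the C-call fraction in BS minus the C-call fraction in OxBS; the normal-approximation binomial standard error used here is an assumption of this sketch (a common first-pass choice, not the estimator of the cited works), and the read counts are invented.

```python
import math


def hmc_by_subtraction(bs_c, bs_total, ox_c, ox_total):
    """hmC level at one CpG from paired BS and OxBS read counts.

    The BS C-fraction estimates mC + hmC; the OxBS C-fraction estimates mC
    alone. Returns the difference and its binomial standard error.
    """
    p_bs = bs_c / bs_total
    p_ox = ox_c / ox_total
    se = math.sqrt(p_bs * (1 - p_bs) / bs_total + p_ox * (1 - p_ox) / ox_total)
    return p_bs - p_ox, se


for depth in (100, 1000):
    est, se = hmc_by_subtraction(8 * depth // 10, depth, 7 * depth // 10, depth)
    print(depth, round(est, 3), round(se, 3))
# 100 0.1 0.061
# 1000 0.1 0.019
```

A 10% hmC level is only about 1.6 standard errors above zero at 100x coverage of each library, but about 5 at 1000x, and the same arithmetic applies to deriving mC from BS minus TAB-seq.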
On the downside, bisulfite treatment leads to extensive degradation of input DNA, 109 while deamination of all unmodified cytosines causes analytical challenges due to reduced sequence complexity. This prompted the development of alternative bisulfite-free single-base analysis methods (see Fig. 7a and 8). One such chemical strategy is used in the chemo-enzymatic bisulfite-free method termed TET-assisted pyridine borane sequencing (TAPS). 110 In this method, TET proteins convert mC and hmC into caC, which is then reduced to dihydrouracil by pyridine borane (PB) and sequenced as T. Important advantages of pyridine borane chemistry are the direct readout of modified bases, leaving unmodified C intact, and its less destructive nature, which gives improved sequence quality, mapping rate and coverage compared with BS. 110 Although the first TAPS protocol provided only the aggregated mC + hmC signal, recently introduced modifications, namely replacing TET-mediated oxidation with potassium ruthenate (K2RuO4) treatment, or protecting hmC by glucosylation prior to PB (or picolyl borane (pic-borane)) conversion, enabled independent mapping of hmC and mC in the CAPS and TAPSβ approaches, respectively. 111 Another bisulfite-free chemical strategy to detect hmC, termed hmC-CATCH, is based on the selective oxidation of hmC to fC with K2RuO4 and subsequent labeling of the newly generated fC with 1,3-indandione derivatives, so that hmC is read as T in sequencing, while endogenous fC is blocked before the oxidation reaction (Fig. 7a). 112 hmC-CATCH causes minimal DNA degradation and offers single-base hmC analysis with limited amounts of input DNA, as shown by its application to the analysis of cfDNA.
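The pyridine borane family of assays differs only in which cytosine states end up read as T. The sketch below encodes the readouts stated in this paragraph for C, mC and hmC only (endogenous fC and caC are deliberately left out), and infers which states are consistent with a set of calls; the table and function names are illustrative.

```python
# C/T calls for unmodified C, mC and hmC in the bisulfite-free assays above
READOUT = {
    "TAPS":  {"C": "C", "mC": "T", "hmC": "T"},  # TET -> caC -> dihydrouracil (T)
    "CAPS":  {"C": "C", "mC": "C", "hmC": "T"},  # K2RuO4 oxidizes only hmC to fC
    "TAPSβ": {"C": "C", "mC": "T", "hmC": "C"},  # glucosylated hmC is protected
}


def infer_state(calls):
    """Cytosine states consistent with per-assay calls, e.g. {"CAPS": "T"}."""
    states = ("C", "mC", "hmC")
    return sorted(s for s in states
                  if all(READOUT[assay][s] == call for assay, call in calls.items()))


print(infer_state({"TAPS": "T"}))                 # ['hmC', 'mC'] (aggregated)
print(infer_state({"CAPS": "T", "TAPSβ": "C"}))   # ['hmC']
print(infer_state({"CAPS": "C", "TAPSβ": "T"}))   # ['mC']
```

TAPS alone cannot separate mC from hmC, whereas pairing CAPS with TAPSβ resolves all three states without subtracting one aggregated signal from another.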

Enzyme-assisted base resolution methods
Milder treatment of DNA than in BS is an advantage of the ACE-seq (APOBEC-coupled epigenetic sequencing) and EM-seq (Enzymatic Methyl-seq) methods, which both employ hydrolytic deamination of C into U using DNA deaminases of the AID/APOBEC family (Fig. 7a). 113,114 ACE-seq provides direct sequencing of hmC, whereas EM-seq detects the combined mC + hmC signal and requires subtraction of the ACE-seq signal to localize mC. Although both APOBEC deamination-based methods are non-destructive, they may still suffer from the low sequence complexity that results from C deamination, and thus analysis of low DNA amounts may be complicated. A similar set of enzymatic nucleotide conversions is used in a recently proposed sequencing platform that can read out the four genetic bases and two epigenetic states of cytosine (mC and hmC) without lane subtraction. This is achieved by physically linking the two strands of each genomic fragment and processing them together in one experimental workflow. 115

Restriction endonuclease cleavage has long been employed in DNA modification analysis as a simple and cost-efficient strategy. Owing to the strict sequence specificity of restriction endonucleases, such methods offer base-resolution DNA modification profiling. However, since their target sites are longer than two nucleotides (typically 4-6 bp), the approach can query only a small subset of all genomic CpG sites. Several methods for hmC analysis have been developed that combine restriction enzyme digestion with enzymatic glucosylation of hmC. The first of them adapted for genome-wide analysis of hmC exploits the differential sensitivity of the HpaII and MspI restriction endonucleases to cytosine modification within their target sequence CCGG. βGT-directed glucosylation makes ChmCGG sites resistant to MspI cleavage; such sites can then be enriched and analyzed using qPCR, 116 microarrays or next-generation sequencing.
117,118 More recent examples include the Aba-seq 119 and Pvu-Seal-seq 120 approaches. In Aba-seq, the AbaSI restriction endonuclease cleaves within a narrow range of distances from the recognized glucosylated hmC, and the cleaved fragments are then pulled down and sequenced. In Pvu-Seal-seq, the PvuRts1I restriction endonuclease recognizes hydroxymethylated target sites and cleaves 11-12 bp downstream (see Fig. 7b). PvuRts1I cleavage is further combined with enrichment of hydroxymethylated target sites by covalent labeling of hmC with βGT and UDP-azido-glucose for specific separation of modified DNA, as described for the hmC-selective chemical labeling approach hMe-Seal (see above). Although restriction endonuclease-based methods cannot determine the absolute amounts of hmC genome-wide, they are inherently sensitive to lowly hydroxymethylated target sites without requiring the deep sequencing necessary in BS-based approaches.
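The enzyme logic behind the HpaII/MspI approach can be expressed as a small decision table. This is a simplified sketch under stated assumptions: it considers only the internal C of a single, symmetrically modified CCGG site and encodes the commonly cited behavior that HpaII is blocked by any 5-modification of the internal C, while MspI tolerates mC and hmC but is blocked by glucosylated hmC (ghmC). Hemi-modified sites, partial digestion and the qPCR or sequencing readout are ignored, and all names are hypothetical.

```python
from enum import Enum

class Cyt(Enum):
    C = "C"
    MC = "mC"
    HMC = "hmC"
    GHMC = "ghmC"  # hmC after beta-glucosyltransferase (betaGT) treatment

def apply_bgt(state: Cyt) -> Cyt:
    """betaGT glucosylates hmC only; other states are unchanged."""
    return Cyt.GHMC if state is Cyt.HMC else state

def cuts(enzyme: str, internal_c: Cyt) -> bool:
    """Simplified cleavage rule for a CCGG site, given the state of its internal C."""
    if enzyme == "HpaII":
        return internal_c is Cyt.C          # blocked by any 5-modification
    if enzyme == "MspI":
        return internal_c is not Cyt.GHMC   # tolerates mC/hmC, blocked by ghmC
    raise ValueError(f"unknown enzyme: {enzyme}")

def classify(internal_c: Cyt) -> str:
    """Call the CCGG site from HpaII and MspI digests of betaGT-treated DNA."""
    treated = apply_bgt(internal_c)
    if cuts("HpaII", treated):
        return "C"    # cut by both enzymes
    if cuts("MspI", treated):
        return "mC"   # resistant to HpaII only
    return "hmC"      # resistant to both after glucosylation

print([classify(s) for s in (Cyt.C, Cyt.MC, Cyt.HMC)])
```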

Single-molecule long read biophysical detection
Several approaches permit single-molecule profiling of epigenetic modifications in native stretches of DNA. One such method is optical mapping of hmC in native chromosomal DNA. This method uses two-step labeling of hmC with a fluorophore, followed by stretching of the DNA strands in nanochannels for fluorescence image analysis. Physical positioning of the fluorophores along the linear DNA strand generates epigenetic profiles of long DNA molecules at the sub-megabase scale, 102 which can be interpreted alone or together with a genetic (sequence-specific) fluorescence profile. The mapping resolution of about 1 kb is limited by the microscope technology and by physical fluctuations of the DNA strands in the nanochannels; pilot experiments using DNA stretching on solid surfaces and dSTORM imaging have shown that it can reach 10 nm, i.e. just about 20 bp. 121 Third-generation sequencing technologies generally avoid DNA derivatization and amplification, so that native stretches of DNA are sequenced directly. In nanopore sequencing, commercially offered by Oxford Nanopore Technologies (ONT), sequencing is achieved by reading ionic current profiles as a nucleic acid strand passes through a nanopore. Single-molecule real-time (SMRT) sequencing, offered by Pacific Biosciences, tracks the incorporation of fluorescently labeled dNTPs by a DNA polymerase immobilized in a zero-mode waveguide (Fig. 7c). In this case, base modifications present in the template strand alter the kinetics of nucleotide incorporation (dNTP arrival and residence times), permitting their identification over multiple sequencing passes. In both cases, DNA modifications cause subtle and context-dependent deviations of the measured signal. Therefore, extensive signal processing and machine learning using in vitro prepared training DNA substrates are required to decipher the presence and identity of the modified bases. 122 A proof of principle for discriminating among the five cytosine variants in certain CpG contexts by their nanopore signal signatures has been demonstrated, 123 while selective signal enhancement for improved SMRT detection of mC and hmC can be achieved by their enzymatic conversion to caC 124 and glc-hmC, 125 respectively. Both ONT and PacBio have integrated modification calling into their default sequencing software. For native DNA, detection is currently restricted to CpG sites and is only available for mC and hmC (ONT Remora) or mC (PacBio). Further rapid developments in third-generation sequencing appear poised to deliver the benefits of single-molecule sensitivity, reliable discrimination of native cytosine modifications, long reads and high-throughput automation.

Fig. 8 Chemical conversions of C5-modified cytosines for their analysis by DNA sequencing. Upon treatment of DNA with bisulfite (leftward reactions), hmC forms a hydrolytically stable 5-sulfonylmethylcytosine (smC) and cannot be discriminated from unreactive mC (both read as C), whereas caC and fC are deaminated and interpreted as unmodified C (read as T). hmC can be oxidized by potassium ruthenate (K2RuO4) into fC and sequenced directly (read as C) or further derivatized using an azido derivative of 1,3-indandione (fC-AI) for biotin pull-down sequencing (read as T). In pyridine-borane treatment-based methods, mC and hmC are first converted to caC, and then caC is reduced to dihydrouracil (DHU) and read as T, whereas C remains unaffected by the pyridine-borane treatment (bottom right).

This journal is © The Royal Society of Chemistry 2024
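For SMRT data, the kinetic signal described above is commonly summarized as an inter-pulse duration (IPD) ratio: the polymerase pause at a template position in native DNA compared with the same position in unmodified control DNA (for example, a PCR-amplified copy, which carries no modifications). The sketch below is deliberately naive and rests on assumptions: the flagging threshold of 1.5 and the example IPD values are hypothetical, the function names are illustrative, and, as noted above, practical callers rely on context-aware signal processing and machine-learning models trained on in vitro substrates rather than a single cutoff.

```python
from statistics import mean

def ipd_ratio(native_ipds: list[float], control_ipds: list[float]) -> float:
    """Mean inter-pulse duration in native DNA divided by that in unmodified
    control DNA, for one template position."""
    if not native_ipds or not control_ipds:
        raise ValueError("need IPD observations from both samples")
    return mean(native_ipds) / mean(control_ipds)

def is_candidate_modification(native_ipds: list[float], control_ipds: list[float],
                              threshold: float = 1.5) -> bool:
    """Flag a position whose polymerase kinetics are slowed beyond a hypothetical threshold."""
    return ipd_ratio(native_ipds, control_ipds) >= threshold

# Hypothetical IPDs (arbitrary units) from several passes over one position
native = [0.9, 1.4, 1.1, 1.6]
control = [0.6, 0.7, 0.65, 0.75]
print(round(ipd_ratio(native, control), 2), is_candidate_modification(native, control))
```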

Structure and interactions of hmC-DNA
6.1.7][128][129][130] There is a general tendency for reduced twist and increased roll angles when C is substituted by mC or hmC, probably due to steric hindrance by the cytosine 5-substituent. 129 hmC widens the major groove by 0.8 Å (4.5%) at the site of modification, 127 similarly to mC and other C5 modifications 131 (reviewed in ref. 132). The hydroxymethyl moiety extends into the major groove (Fig. 9), and the hydroxyl group predominantly (70-80% occupancy) points towards the 3′-neighboring base. In an alternative conformation (20-30%), the hydroxyl interacts with a backbone phosphodiester oxygen via a bridging water molecule. 127,128,130 NMR experiments showed no evidence for increased imino tautomerization of hmC, consistent with Watson-Crick base pairing. The dynamic opening rates of the modified base pair are similar to those of C or mC and do not correlate with the excision preferences of TDG for the oxidized mC forms. 44,1305][136] As for hmC, there is no consensus on whether it makes the DNA duplex more flexible or stiffer, more stable or more prone to melting, compared to unmodified and methylated DNA. In vitro approaches, including UV melting, circularisation with ligation or FRET, nucleosome assembly/disassembly, molecular force assays and molecular dynamics simulations, give conflicting results among independent studies, or even within the same study for different DNA sequences or for the same sequence with different hmC modification densities. In general, hmC is considered to reverse the impact of methylation on the physical properties of DNA. Numerous studies have shown that while mC increases the duplex melting temperature, hmC brings it back closer to that observed with C. 68,126,127,129 A single-molecule mechanical force study showed that the mechanical stability of DNA (resistance to zipper-mode strand separation) increases with the number of hmC bases.
137 Different approaches have been taken to evaluate the effects of epigenetic DNA modifications on DNA flexibility, in the hope of finding a direct mechanistic link to nucleosome binding and gene expression regulation. However, DNA circularisation with either ligation or FRET, nucleosome assembly or disassembly, and different molecular dynamics simulations have led to contrasting conclusions. 129,136 Finally, the effects of solitary hmC modifications may differ from their cooperative impact in larger clusters. 129,131

Among non-canonical DNA structures, G-quadruplexes (G4) have emerged as significant players in the regulation of gene expression, replication and other genome metabolic processes. G4s form not only in telomeric regions but also throughout the genome, in promoters and replication start sites, 138 and therefore partly overlap with methylation sites. 139 While some studies find no marked changes in the formation or stability of G4 structures after incorporation of hmC, 140 others report more complex results 138 and discuss their possible significance for promoter G4 formation in senescent cells.

Protein-DNA interactions
Clearly, the thermodynamic and structural effects of the natural cytosine modifications on DNA are too weak and diverse to account for their biological effects on physical grounds alone. Rather, the prevailing view is that the regulatory roles of cytosine modifications are largely exerted at the level of protein-DNA interactions. 141 In this respect, compared to a methyl group, a hydroxymethyl group is slightly larger, more polar and can serve as an H-bond donor or acceptor. However, dedicated hmC ''readers'' have for the most part remained elusive, as a large fraction of hmC binders also bind mC. 86,142,143 For example, methyl-CpG-binding protein 2 (MeCP2) can bind mC and hmC, 144,145 whereas methyl-CpG binding domain protein 1 (MBD1) specifically binds mC but not ox-mCs. 146 Altogether, hmC appears unable to interact with a number of proteins that recognize mC, suggesting that hmC is an anti-binding modification that serves to exclude mC readers. 147 Nonetheless, some proteins bind mC and hmC using distinct interaction modes. A distinct thermodynamic signature of binding to hmC-DNA, compared to methylated or unmethylated DNA, has been shown for MeCP2; mutations affecting the hmCpG-MeCP2 interactions are associated with the Rett syndrome phenotype. 148 UHRF2 has been shown to use a base-flipping mechanism, first discovered in a DNA MTase, 149,150 for exclusive recognition and enhanced binding of hmC-DNA. 149 Altering the target specificity of natural proteins, or engineering artificial hmC binders de novo for biotechnological applications, has been demonstrated for TALEs 151 and MeCP2. 152 Notably, the plant glycosylases DME and ROS1, which remove mC in plants, have also been shown to excise hmC in vitro. As no hmC was detected in Arabidopsis thaliana, this activity may simply reflect a promiscuous substrate specificity of this protein family, which could be exploited for epigenetic editing.
22 Another important class of proteins are the mammalian CpG-specific DNMTs. In particular, DNMT1, which is largely responsible for copying methylation patterns onto the daughter strand, shows a strong preference (80-fold on average) for hemimethylated over unmethylated CpG sites. Several studies using different model DNA substrates and protein variants suggested that hemi-hydroxymethylated CpG sites (hmCG/CG) are poor substrates (similar to unmodified sites), so that hmC marks would be passively erased in the progeny. 50,153 However, in more recent studies, two groups reported that the DNMT1 methylation preferences in vitro gradually decline in the order mC > hmC > fC ≈ C, 154 and that the mC/hmC preference is on average only 13-fold, while the hmC/C maintenance preference is still ~7-fold. 155 These findings suggest that hemi-hydroxymethylated sites could be partially maintained (behaving as attenuated hemimethylated sites), and thus passive DNA demethylation via hmC generation may not be as efficient as previously thought. As DNMT1 activity is strongly regulated by many external factors and by features of the target DNA, the role of hmC in the loss of methylation patterns during DNMT1-directed maintenance methylation in vivo requires further attention.
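As a rough consistency check, the two in vitro ratios from ref. 155 can be chained, assuming that the preferences multiply (i.e. that both were measured on comparable substrates under the same conditions). Here k_X is illustrative notation for the DNMT1 methylation rate at a hemi-modified CpG carrying X on the parental strand (X = C meaning unmodified):

```latex
\frac{k_{\mathrm{mC}}}{k_{\mathrm{C}}}
  = \frac{k_{\mathrm{mC}}}{k_{\mathrm{hmC}}}\cdot\frac{k_{\mathrm{hmC}}}{k_{\mathrm{C}}}
  \approx 13 \times 7 \approx 91,
\qquad
\frac{k_{\mathrm{hmC}}}{k_{\mathrm{mC}}} \approx \frac{1}{13} \approx 0.08
```

The product is of the same order as the ~80-fold hemimethylated/unmethylated preference quoted above (which comes from other studies), and it implies that a hemi-hydroxymethylated site is methylated at roughly 8% of the rate of a hemimethylated site: attenuated, but clearly above the unmodified baseline.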
As discussed above, in line with its distinctive chemical properties (chemically active aldehyde group), fC is a target for numerous specific binders. 142[158]

Biological roles of hmC in mammals
The discovery of active and passive mC removal provided evidence that DNA methylation is a bi-directional and dynamically regulated process; methylation and demethylation events occur at different times and in different genomic regions under different developmental programs and environmental cues. After more than a decade of hmC studies, its biological implications are beginning to emerge but are still under extensive debate. Although hmC is generated from its precursor mC in the active TET-mediated demethylation pathway, hmC appears to persist in many cell types, 159 arguing against hmC being solely a demethylation intermediate.

hmC role in replication, repair and recombination
An intimate interaction of hmC and DNA repair stems from the involvement of BER/NER pathways in TET-induced DNA demethylation (Fig. 5). TET activity that increases hmC levels also generates more fC and caC, which in turn activate BER. This idea is consistent with decreased mutation frequencies in hmC-rich regions, in contrast to mC, which marks hotspots of genomic CpG depletion due to mC→T mutations. 160 Notably, however, hmC, as opposed to the related nucleobases hmU, U, fC and caC, is not a substrate for the base excision machinery, and thus can accumulate to substantial levels in genomic DNA.
Genomic profiling studies detected elevated hmC levels at sites of DNA damage 161 and at stalled replication forks. 162 hmC was also found at DNA double-strand breaks during meiotic recombination in gametes. 163,164 It was suggested that the conversion of mC to hmC may prevent binding of factors with affinity for mC. 163 However, experiments in which TET activity is altered to vary hmC levels do not clarify whether the generated hmC is directly involved in recruiting DNA repair components to the problematic loci or is just a passive bystander produced along with the actively excised fC and caC.
Replication origins in mammals, which are not clearly marked by DNA methylation, specific histone modifications or consensus sequences, have been shown to be enriched for hmC. 165 hmC appears to direct the assembly of the pre-replication complex in G1 and M phases; however, before or during origin firing, hmC has to be removed and then reinstalled in newly replicated DNA. Therefore, elevated hmC levels delay progression through G1 phase, reducing cell division. These observations point to a direct involvement of hmC in cell cycle regulation and may explain why non-dividing neuron populations show the highest hmC content, whereas rapidly dividing neural progenitors are almost devoid of hmC. 165 On the other hand, it was proposed that, conversely, the genomic hmC content may in fact depend on the duration of the cell cycle. 159 Metabolic isotope labeling of cytosine derivatives in DNA of mammalian cells and tissues showed that, in contrast to DNA methylation, which occurs during or immediately after replication, hmC forms slowly over a 30 h period following DNA synthesis. This delayed hmC appearance can thus explain why rapidly dividing cells contain less hmC, in line with the observed inverse correlation between global hmC levels and the rate of cell proliferation.
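The argument that slow hmC formation alone can produce an inverse correlation with proliferation can be illustrated with a toy model. The sketch assumes, purely for illustration, that after replication hmC accumulates on a newly synthesized strand with first-order kinetics, h(t) = 1 - exp(-t/τ), relative to its eventual level, and that each division restarts this clock. The time constant τ = 10 h is a hypothetical value chosen so that formation is about 95% complete after 30 h, and the function name is hypothetical. The function returns the level averaged over one cell cycle of length T, which equals 1 - (τ/T)(1 - exp(-T/τ)).

```python
import math

def mean_hmc_on_new_strand(cycle_h: float, tau_h: float = 10.0) -> float:
    """Cycle-averaged hmC level on a newly replicated strand, as a fraction of its
    eventual level, assuming h(t) = 1 - exp(-t / tau) restarted at each replication.
    Toy model: tau_h is a hypothetical time constant, not a measured value."""
    if cycle_h <= 0 or tau_h <= 0:
        raise ValueError("cycle length and time constant must be positive")
    x = cycle_h / tau_h
    return 1.0 - (1.0 - math.exp(-x)) / x

# Shorter cell cycles leave less time for hmC to build up before the next division
for cycle_h in (12, 24, 72):
    print(f"{cycle_h:>2} h cycle: {mean_hmc_on_new_strand(cycle_h):.2f}")
```

Under these assumptions, the cycle-averaged level rises from about 0.42 for a 12 h cycle to about 0.86 for a 72 h cycle, reproducing the qualitative trend without invoking any effect of hmC on the cell cycle itself.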

hmC involvement in transcription
Rausch et al. measured the effects of mC and hmC on transcription and replication of the whole genome in live bacterial, yeast and mammalian cells. They concluded that while mC stabilizes the DNA helix and reduces the speed of DNA helicase and RNA/DNA polymerases, hmC reverts the duplex-stabilizing and genome-metabolic effects to the level of unmodified cytosine. 68 In biochemical assays, the presence of hmC at the ICV promoter strongly inhibited transcription, while its presence within gene bodies had almost no effect on transcription. 166 Different hmC profiling studies have found it preferentially distributed in gene bodies of active genes, 3′UTRs, active or poised enhancers, promoters of development-associated genes, and around transcription factor binding sites. 99,107,167,168 Involvement of hmC in epigenetic regulation is already indicated by the differential accumulation of hmC and fC/caC across the genome. For example, in mouse ESC, >90% of hmCG positions do not correspond to those of fC and caC, and only ~19% of fC/caC sites overlap with hmC. 169 Moreover, fC enrichment is stronger in active enhancers, which usually show lower DNA methylation than hmC-containing loci. 170 Therefore, hmC may mark less active chromatin loci than caC and fC (the suggested order of activity is caC > fC > hmC). 171 The positioning of hmC at exon-intron boundaries and its greater abundance in constitutive than in alternatively spliced exons suggested a role in alternative splicing. 117 It was demonstrated that CTCF, the major methylation-sensitive genome insulator and multifunctional regulator, binds to intragenic hmC in alternative exons and is associated with alternative exon inclusion. 172 Interestingly, hmC levels were found to be higher on the sense strand.
173,174 Indeed, the association of gene body hmC with gene expression is widely attested, although its influence is bi-directional: for example, it promotes gene expression in ESC and many other cell types, but negatively influences transcription in neuronal progenitors. 173,175,176 In contrast, highly expressed genes usually show hmC depletion at promoters in many cell types. 173 The role of hmC and other ox-mCs as independent epigenetic regulators is indirectly supported by the existence of partially different sets of their binders, which are mainly transcriptional regulators. 86,142,143 For example, MeCP2 can bind mC and hmC, which alters chromatin structure and facilitates gene expression in neural cells. 144,145 In contrast, MBD1 can specifically bind to mC but not to ox-mCs. 146 mC-bound MBD1 can recruit the histone methyltransferase SETDB1 and thus promote H3K9 methylation and gene repression. 146 In addition, hmC distribution in gene bodies modulates the deposition of H3K36me3 marks, which indicate active transcription, 177 thereby collectively altering chromatin configuration to switch gene activity on or off in heterochromatic and/or euchromatic regions. 176
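The two overlap figures quoted for mouse ESC earlier in this section use different denominators (all hmCG positions versus all fC/caC sites), so they need not match. A minimal sketch with hypothetical toy site sets (not real data) makes this explicit. It also shows an arithmetic consequence: if at most ~10% of hmC sites and ~19% of fC/caC sites fall into the shared set I, then 0.19·|fC/caC| = |I| ≤ 0.10·|hmC|, so hmC sites must outnumber fC/caC sites by at least roughly 1.9-fold in that dataset. The function and variable names are hypothetical.

```python
def overlap_fractions(a: set, b: set) -> tuple[float, float]:
    """Return (|A ∩ B| / |A|, |A ∩ B| / |B|): one intersection, two denominators."""
    shared = len(a & b)
    return shared / len(a), shared / len(b)

# Hypothetical toy site sets as (chromosome, position) pairs; not real data.
hmc_sites = {("chr1", p) for p in range(0, 2000, 10)}           # 200 sites
fc_cac_sites = ({("chr1", p) for p in range(0, 190, 10)}        # 19 sites shared with hmC
                | {("chr1", p) for p in range(5, 810, 10)})     # 81 fC/caC-only sites

frac_of_hmc, frac_of_fc_cac = overlap_fractions(hmc_sites, fc_cac_sites)
print(f"{frac_of_hmc:.1%} of hmC sites overlap; {frac_of_fc_cac:.1%} of fC/caC sites overlap")
```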

hmC in cell differentiation and reprogramming
Some controversy exists regarding the role of TET proteins and hmC in stem cell pluripotency. 9][180][181] However, TET1/TET2 deficiency may delay transcriptional changes during differentiation or induce the commitment of ESCs towards specific lineages. 97,180 Moreover, triple-mutant (TET1, TET2 and TET3) mESCs were found to be viable and pluripotent but showed depletion of hmC, hypermethylation of promoters and impaired differentiation potential, 182 supporting an essential role of active DNA demethylation in differentiation.
It seems that a major role of TET-mediated oxidation in ESC is the maintenance of the demethylated state of regulatory regions, particularly enhancers and promoters, 183 which are the primary targets in reprogramming. In ESC, high levels of hmC mark silent bivalent promoters, which carry both the repressive H3K27me3 and the activating H3K4me3 histone marks and are typically activated upon differentiation. 53 TET and DNMT proteins appear to compete with each other to regulate the methylation status of enhancers and promoters; 184,185 TET1 can exclude DNMT3A1 from promoters, where TET1 preferentially binds in ESC. 186 Moreover, in tight cooperation with transcription factors, TET proteins mediate the stepwise oxidation of mC and regulate the expression of developmental genes. 187 For example, TET1 and TET2 interact with the pluripotency factor Nanog to enhance reprogramming efficiency, 188 or with various heterochromatin-associated proteins, such as HDAC histone deacetylases, to remodel chromatin and regulate transcription. 97,189 Other studies demonstrate that TET proteins and ox-mCs are important for telomere length maintenance, which is important in sustaining pluripotency, self-renewal and genomic stability. For example, the double TET1/TET2 mutant shows elevated DNMT3b levels and an increased mC/hmC ratio, which lead to telomere shortening. 183,190 Interestingly, mutational engineering of TET2 activity toward preferential production of hmC made it possible to dissect the biological significance of hmC versus the other ox-mCs during reprogramming to induced pluripotent stem cells (iPSC).
191 The increased generation of hmC itself appeared insufficient to drive the epigenetic changes, and the formation of fC/caC was necessary to promote rapid changes at reprogramming loci. In addition, during differentiation of pluripotent stem cells into neurons, glial cells and hepatocytes, increased caC amounts were detected in promoters of cell-specific genes; thus, a caC pathway might exist as a general epigenetic rewiring mechanism for establishing cell lineages. 192,193 In mouse zygotic reprogramming, most of the hmC is generated from de novo mC, indicating a specific role of hmC in early embryo development. 194 Generally, the epigenetic role of hmC becomes apparent in the context of cell differentiation. During erythropoiesis, a decrease in global hmC level was observed, but hmC at certain genomic loci remains highly enriched despite multiple rounds of DNA replication. 195 In asymmetrically dividing stem cells, higher hmC has been shown to identify immortal DNA strand chromosomes. 196 In neuronal cells, hmC is depleted from TSSs, regardless of gene expression level or CpG content. 145,174 It was suggested that in early development, brain-specific enhancers initially become hydroxymethylated, but later both hydroxymethylation and demethylation at enhancers are differentially regulated in a cell-type-specific manner. Overall, these data point to the requirement of properly functioning TET-mediated DNA demethylation for differentiation, reprogramming and cell fate decisions.

hmC as a diagnostic and prognostic marker
The balance between proper DNA methylation and demethylation maintains the DNA methylation landscape of healthy cells, which becomes heavily disturbed in cancers. It has been well documented that global levels of hmC are significantly reduced in many cancer types, suggesting potential clinical relevance of hmC. One might think that in intensively dividing cells such as cancer cells, the decreased hmC might be related to hmC removal through passive DNA demethylation. However, in healthy developing cells, hmC at certain genomic loci remains highly enriched despite multiple rounds of DNA replication and decreased global hmC levels. 195 In many cancers, genetic alterations of TETs are absent, and thus the basis for the global loss of hmC is not always clear. Some tumors carry gain-of-function mutations in the isocitrate dehydrogenase genes IDH1 or IDH2, which lead to production of 2-hydroxyglutarate (2HG), an oncometabolite and inhibitor of TET function. 36,37 In addition, decreased glucose/glutamine levels in the nutrient-deprived tumor environment may alter the intracellular 2OG/succinate ratio, which downregulates TET activity and may cause hmC loss even in the absence of genetic TET and IDH mutations. 197 Other factors, such as a shortage of Fe(II), ascorbate or oxygen, may also attenuate the activity of TETs (see above). Supplementation of cancer cells with ascorbic acid may restore hmC levels and inhibit cancer growth and migration. 198,199 Generally, the mechanisms of hmC loss in cancers are likely more global than just aberrant functioning of TET or IDH enzymes. Interestingly, it was shown that simply re-installing hmC and TET enzymes may reduce tumor growth and aggressiveness.
200 The correlation between decreased hmC and tumor aggressiveness has been suggested by multiple studies, 10,201 raising the idea that hmC levels could be used to predict metastasis, recurrence and prognosis.3][204] Besides the widely attested links between hmC and cancer, the clinical relevance of tissue-specific hmC monitoring has been demonstrated for other complex diseases, for example in non-invasive diagnostics of fetal Down syndrome, Alzheimer's disease and type 2 diabetes. 105,205,206 In addition to its value as a disease marker, hmC may have therapeutic value. As mentioned above, cells deploy a two-tier safeguarding system to avoid sporadic incorporation of modified nucleosides into DNA via the nucleotide salvage pathway (Fig. 3). However, this system is more promiscuous in fast-proliferating cancer cells, which makes them more susceptible to treatment with modified cytidines and thus opens new potential therapeutic options for controlling cancer. 17,18

Conclusions
From a biological standpoint, the generated hmC represents an epigenetic state of cytosine in which the proper 5-methyl mark is no longer present, but an exocyclic group at the C5 position is still present, precluding its remethylation. How is this chemically stable (functionally demethylated) epigenetic state of cytosine resolved? Notably, hmC is not recognized by some of the methyl-CpG binding domain proteins, such as the transcriptional repressors MBD1 and MBD2, 50,207 suggesting that conversion of mC to hmC may attenuate the gene silencing effect of mC. In the presence of maintenance MTases such as Dnmt1, hmC patterns would be partially replicated into mC on the daughter strand (passively reduced). It is reasonable to assume that, although produced from a primary epigenetic mark, mC, hmC may play its own regulatory role as a secondary epigenetic mark; in many cases, though, it may behave as an attenuated (partially muted, demoted) methyl group in DNA.
Several other factors should be considered. The distribution of hmC is not uniform across the genome and differs from that of the other modified cytosines (see above). The abundance of hmC is lower than that of mC, but in certain cell types and genetic loci it is substantial enough to persist as a causative epigenetic mark. A tight safeguarding system precludes sporadic incorporation of hmC nucleosides into mammalian DNA via the NSP (Fig. 3). All this indicates that the presence or absence of hmC itself (rather than loss of mC) at certain loci and certain times has functional importance. There are several well-established examples of direct hmC interactions with the cellular machinery leading to defined consequences for the cell. Therefore, it seems fair to conclude that besides its well-established role as an intermediate in the demethylation pathway, hmC is exploited to play additional epigenetic roles as a biological marker and/or controller of biological processes in the mammalian cell.

Fig. 1
Fig. 1 Base pairing and epigenetic modifications in DNA of eukaryotic organisms. Biological cytosine-5 modifications (xC, x = methyl, hydroxymethyl, formyl or carboxyl) point into the major groove of B-helical DNA and do not interfere with G:C base pairing interactions (left). Structure of base J (5-(β-D-glucosyl)oxymethyluracil) present in DNA of flagellated protozoa (right).

Fig. 2
Fig. 2 Chemical strategies for biological methylation and demethylation of DNA nucleobases at O, N and C atoms. (a) Methylation of cytosine by C5-MTases via an SN2 transfer of a sulfonium-bound methyl group onto a covalently activated target cytosine residue; (b) oxidation of mC in DNA by mammalian TET dioxygenases yields chemically stable hmC, which is further oxidized to fC or caC to be removed by dedicated glycosylases; (c) a product of DNA alkylation damage, O6-methylguanine, is reverted to guanine directly via an SN2 transfer of the O6-methyl group onto a cysteine residue of a methylguanine DNA methyltransferase (MGMT) protein; (d) oxidation of representative N-alkylated nucleobases (N3-methylcytosine, N6-methyladenine) in DNA by the AlkB family of dioxygenases yields the corresponding N-hydroxymethyl derivatives, which then undergo spontaneous hydrolytic release of formaldehyde to directly generate the unmodified base.

4.1. Production of hmC in DNA in vitro
Production of DNA substrates containing hmC residues is required for studies of chemo-enzymatic transformations of

Fig. 5
Fig. 5 Formation and transformations of hmC in mammalian DNA. Cytosine (C) is converted to 5-methylcytosine (mC) by the action of endogenous DNA MTases of the DNMT family (green pathway). Several mechanisms for DNA demethylation, in which 5-methylcytosine (mC) is converted back to C, have been proposed. Red arrows represent oxidation-based pathways performed by TET dioxygenases yielding 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC) and 5-carboxycytosine (caC), which may also produce minor amounts of 5-hydroxymethyluracil (hmU). The blue arrow shows the deamination-based pathway in which hmC is deaminated to hmU by AID/APOBEC or other deaminases. Grey arrows denote base excision repair (BER) pathways initiated by TDG and SMUG1 glycosylases, and DNA replication. Black dashed arrows denote the hydroxymethylation and dehydroxymethylation reactions performed by cytosine-5 methyltransferases in vitro and the putative deformylation and decarboxylation of fC and caC, respectively, yielding unmodified C. BER, base excision repair; NER, nucleotide excision repair; NSP, nucleotide salvage pathway.

Fig. 6
Fig. 6 Removal of 5-hydroxymethyl or 5-carboxyl groups is facilitated by transient nucleophilic addition at C6 of the pyrimidine ring. (a) DNA cytosine-5 methyltransferase-directed conversion of hmC to C residues in DNA. (b) Light-induced dehydroxymethylation of cytosine. 61 (c) and (d) Nucleophile-promoted deformylation of fC and decarboxylation of caC, respectively.