Next-generation DNA damage sequencing

Cellular DNA is constantly chemically altered by exogenous and endogenous agents. As all processes of life depend on the transmission of the genetic information, multiple biological processes exist to ensure genome integrity. Chemically damaged DNA has been linked to cancer and aging; therefore, it is of great interest to map DNA damage formation and repair to elucidate the distribution of damage on a genome-wide scale. While the low abundance and inability to enzymatically amplify DNA damage are obstacles to genome-wide sequencing, new developments in the last few years have enabled high-resolution mapping of damaged bases. Recently, a number of DNA damage sequencing library construction strategies coupled to new data analysis pipelines allowed the mapping of specific DNA damage formation and repair at high and single nucleotide resolution. Strikingly, these advancements revealed that the distribution of DNA damage is heavily influenced by chromatin states and the binding of transcription factors. In the last seven years, these novel approaches have revealed new genomic maps of DNA damage distribution in a variety of organisms as generated by diverse chemical and physical DNA insults: oxidative stress, chemotherapeutic drugs, environmental pollutants, and sun exposure. Preferred sequences for damage formation and repair have been elucidated, thus making it possible to identify persistent weak spots in the genome as locations predicted to be vulnerable to mutation. As such, sequencing DNA damage will have an immense impact on our ability to elucidate mechanisms of disease initiation, and to evaluate and predict the efficacy of chemotherapeutic drugs.


Cécile Mingard
Cécile Mingard received her Master's degree in Toxicology from the University of Basel in 2016. Under the supervision of Prof. Stephan Krähenbühl, she studied mechanisms of hepatotoxicity induced by tyrosine kinase inhibitors at the University Hospital of Basel. She then moved to Zürich to start a PhD at the ETH in Prof. Shana J. Sturla's lab, where she is studying genome-wide patterns of DNA damage distribution associated with mutagenesis.

Junzhou Wu
Junzhou Wu obtained his PhD degree (2019) from ETH Zurich under the supervision of Prof. Shana J. Sturla, concerning the development of DNA damage sequencing methods. He then received an SNF early postdoc. mobility fellowship and joined Prof. Peter Dedon's lab as a postdoctoral fellow at MIT, where he is currently developing a novel RNA sequencing technology to interrogate the epitranscriptome in human, bacterial and viral diseases.

Introduction
All physiological processes tied to cellular replication rely on the chemical integrity of DNA, and its damage is associated with a range of adverse outcomes such as accelerated aging and cancer. [1][2][3] Endogenous cellular processes and the effects of exogenous agents, including UV irradiation and chemicals from environmental, dietary, drug, and occupational exposures, are constantly altering DNA (Fig. 1). 4 These reactions create up to 70 000 distinct damage events in a cell each day, distorting the structure of the DNA and, if left unrepaired, potentially stalling replication or impacting gene expression. 4 Translesion DNA synthesis (TLS), involving specialized DNA polymerases that can bypass DNA damage, counters the cytotoxic effects of DNA replication stalling and acts in concert with DNA repair functions. In cancer therapy with DNA-binding agents, which target and stall replication, this process can contribute to drug resistance. In normal cells, TLS may be protective, but even if cytotoxicity may be avoided in the short term, DNA damage bypass can be highly mutagenic and contribute to cancer and other adverse outcomes in the long run. The biological and toxicological consequences of DNA damage, repair, and bypass depend fundamentally on not only their structure and abundance, but also their distribution in the genome, including the interplay of chemical modification and higher chromatin structures in gene expression and mutagenesis.
High-throughput sequencing has recently enabled the whole genome sequencing of numerous cancer genomes. From these large datasets, mutational signatures that describe characteristic imprints left by mutational processes, including DNA damage and repair, have been deciphered in cancer genomes. 5 Because the cellular impacts of DNA damage are also the basis of the most common cancer therapy drugs, understanding the genomic distribution of DNA modification induced by anticancer drugs is a potential strategy to improve the safety and efficacy of cancer therapy. While there are many techniques to study outcomes of DNA damage (i.e. mutation, cytotoxicity), there is a lag in methods available to map how DNA is initially modified, therefore limiting the ability to predict adverse or therapeutic outcomes on the basis of early measurable markers.
Defining the relationship between the distribution of chemical forms of DNA damage on a genome-wide scale and adverse or therapeutic biological outcomes is a considerable challenge. Early models of DNA damage and mutagenesis were built around a simple direct relationship between damage formation and the acquisition of a mutation, but there is a complex interplay between genetic and epigenetic landscapes factoring into cancer evolution and progression. Indeed, cancer is driven by natural selection enabled by the evolution of mutations conferring a growth advantage. 6 However, within the large mutational landscape, only some mutations are driver mutations that confer a selective growth advantage, whereas many of the other mutations are passenger mutations acquired by a cell with driver mutations. 7,8 There is controversy concerning whether most mutations in cancer genomes arise from DNA replication errors or other intrinsic events (the bad luck hypothesis), which are hard to prevent, 9 or from extrinsic factors that, on the contrary, could be avoided. 10 Indeed, mutational signatures are at the core of extensive ongoing work to uncover the etiology of individual cancers, but strategies for tracking analogous precursor DNA damage signatures in the genome lag behind gene sequencing and epigenetics because of their inherent chemical complexity and variation. 11 There are many well-established strategies for DNA damage quantification integrated over the whole genome as well as strategies for identifying the sequence-specific locations of damage in isolated genes, but typically not both. For example, mass spectrometry 12 and 32 P-postlabelling 13 allow high-sensitivity quantification of total DNA damage in biological samples, but do not provide sequence or location information.
In contrast, ligation-mediated polymerase chain reaction (LM-PCR) is based on the principle that DNA polymerases cannot synthesize DNA past certain types of damage. Thus, LM-PCR can indicate the exact sequence and position of DNA damage on the basis of PCR termination sites; however, this method is not damage specific, meaning the chemical nature of the damage may be unknown. These strategies have various advantages, but do not allow one to relate the chemistry of damage formation with biological changes in particular genes or in the genome.
In addition to the conceptual challenge of examining the complex relationship between genome sequences, structure, regulation and potential patterns of DNA damage processes, there are two major technical challenges towards obtaining the necessary robust damage sequence data to evaluate these relationships. The first is that DNA damage events are rare on a genome-wide scale. Typically 0.1-100 endogenous DNA damage events occur per 10^6 nucleotides, 14 and those arising from discrete interactions with particular chemicals can be of even lower magnitude, such as one DNA adduct per 10^11 nucleotides. 15 The second challenge is that chemical damage is not typically read by DNA polymerases, and they may either stall or insert an incorrect base or combination of incorrect bases opposite the altered site. As a result, the chemical identity of DNA damage is generally lost in the process of standard DNA sequencing. Nonetheless, there are several very recent examples discussed in this review of exciting and innovative approaches to address these technical challenges and yield the first insights on DNA damage distribution at the genome-wide level.
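To make the rarity argument concrete, the quoted frequencies can be converted into expected lesion counts per genome copy. The short Python sketch below does this arithmetic; the genome size of 3.1 x 10^9 nucleotides is an assumed round figure for one haploid human genome and is not taken from the text, and the function name is illustrative only.

```python
# Assumed size of one haploid human genome in nucleotides (round figure,
# introduced here for illustration; not stated in the review).
GENOME_NT = 3.1e9


def expected_lesions(lesions_per_nt: float, genome_nt: float = GENOME_NT) -> float:
    """Expected number of damaged nucleotides in one genome copy."""
    return lesions_per_nt * genome_nt


# Endogenous damage: 0.1 to 100 events per 10^6 nucleotides.
low = expected_lesions(0.1 / 1e6)
high = expected_lesions(100 / 1e6)

# A rare chemical adduct: one event per 10^11 nucleotides.
rare = expected_lesions(1 / 1e11)

print(f"endogenous damage: {low:.0f} to {high:.0f} lesions per genome copy")
print(f"rare adduct: {rare:.3f} lesions per genome copy")
```

Under this assumption, the rarest adducts are expected in only a small fraction of genome copies, which is one way to see why enrichment steps are central to the methods reviewed here.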
We provide here a comprehensive review of the progress in sequence-specific mapping of DNA damage. Emerging methods described in this review have addressed long-standing obstacles facing damage sequencing by including a combination of damage enrichment, damage-specific recognition, and functional marking of the damage position with a sequencing-compatible adaptor (Fig. 2). A few reviews highlight specific DNA damage sequencing methods; however, no reviews exist that cover all classes of DNA damage and discuss the importance of the biological findings. [16][17][18][19][20] For each major class of DNA damage, we first provide an overview of the occurrence and biological relevance. Next, we describe each of the novel strategies that have enabled successful DNA damage sequencing of these specific DNA damage classes (Table 1). Finally, we compare the opportunities and challenges for each of these methods, focusing on the early glimpses of biological insight enabled by each unique method. The rapid improvement and adoption of these approaches is expected to spur advances in the study and prevention of aging, cancer, and disease related to genomic instability.

Occurrence and relevance
Guanine has the lowest redox potential of the DNA bases, and thus can be easily oxidized to form 8-oxo-7,8-dihydrodeoxyguanine (8-oxodG) via single-electron transfer mediated by reactive oxygen species (ROS) such as superoxide, hydrogen peroxide, and hydroxyl radicals. 21 ROS are generated by normal metabolic processes or as a consequence of exposure to environmental pro-oxidants, such as components of cigarette smoke, alcohol, ionizing and UV radiation, pesticides, and ozone. 22 The production and scavenging of ROS are highly coordinated by cellular antioxidant networks essential for cell signaling and homeostasis. 23 A mild increase of ROS production in cells and organisms has a variety of anti-aging and longevity-extending hormesis effects by stimulating endogenous defense mechanisms and stress resistance. 24 However, ROS are a double-edged sword. Under typical physiological ROS levels, 8-oxodG is generated at a frequency of at least several hundred damage events per human cell per day; this rate is further increased under conditions of oxidative overload. 25 Oxidative stress contributes to cancer, atherosclerosis, diabetes, aging, and pathologies of the central nervous system, 26-28 making 8-oxodG an indicator of oxidative stress and a cellular biomarker of pathophysiological processes.

8-OxodG repair, mutagenicity, and toxicity
Efficient search and removal of 8-oxodG is performed by the base excision repair (BER) pathway to maintain cell integrity. Three different enzymes cooperate to handle 8-oxodG in the cell, involving 8-oxo-dGTP diphosphohydrolase, 8-oxoguanine DNA glycosylase and adenine DNA glycosylase (Table 2). [29][30][31] By monitoring the nucleotide pool, 8-oxo-dGTP diphosphohydrolase prevents the incorporation of 8-oxodG into nascent DNA. 8-Oxoguanine glycosylase acts on 8-oxodG within double stranded DNA (dsDNA), directly removing it when paired with cytosine. Additionally, persistent 8-oxodG may pair with adenine, promoted by Hoogsteen bonding during replication, 32 in which case, adenine DNA glycosylase can excise the incorporated adenine. When this defense system is overwhelmed and 8-oxodG persists during replication, it is prone to G→T transversion mutation. Indeed, this type of mutation is prevalent in the MTH1/OGG1/MUTYH triple knockout mouse model (TOY-KO) and MUTYH-associated polyposis (MAP) syndrome colorectal cancer. 33,34 Aside from pro-mutagenic effects, 8-oxodG is also a source of toxicity when transcribed. 8-OxodG can significantly arrest transcription by direct structural interference of transcription components or the repair intermediate of 8-oxodG/OGG1. 35 Furthermore, when 8-oxodG is located on the transcribed DNA strand, other consequences like erroneous bypass of the lesion by the transcribing RNA polymerase may occur. Such transcriptional mutagenesis often results in a specific C→A mutation in the RNA transcript and aberrant protein production, 36 which may play a role in protein aggregation and the pathogenesis of neurodegenerative diseases, such as Alzheimer's and Parkinson's disease. 37

8-OxodG and OGG1 modulate gene expression
Despite the toxicological implications of 8-oxodG, mounting evidence supports that 8-oxodG may be a cellular friend by facilitating gene activation in response to oxidative stress, countering conventional models of DNA damage effects. 8-OxodG-induced gene expression involves several pathways, including direct interactions of OGG1 with transcription factors (TFs) or chromatin remodelers and allosteric transition of 8-oxodG-containing G-quadruplexes.
When 8-oxodG is located at promoter regions, OGG1 is recruited and enhances the binding of several TFs, including hypoxia-inducible factor 1α (HIF-1α), 38 signal transducer and activator of transcription 1 (STAT1), 39 and nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB). 40,41 Reduction in OGG1 expression in rat pulmonary arterial endothelial cells strongly reduced the binding of the TF HIF-1α to the vascular endothelial growth factor gene (VEGF) promoter and reduced VEGF expression. 38 OGG1 both coactivates STAT1 and induces the transcriptional activation of pro-inflammatory mediators after lipopolysaccharide (LPS) stimulation. 39 In addition, the binding of OGG1 to 8-oxodG in promoter regions enhanced NF-κB/RelA binding to cis-regulatory elements and facilitated the rapid recruitment of specificity protein 1 (Sp1), transcription initiation factor II-D and phosphorylated-RNA polymerase II (Pol II), resulting in prompt gene expression upon oxidative exposure. 40,41 Thus, interactions between 8-oxodG, OGG1 and relevant TFs lead to the expression of oxidative stress-induced genes.
In addition to the interactions between OGG1 and TFs, 8-oxodG in gene promoter regions regulates transcription via G-quadruplex (G4) folding. Indeed, potential G-quadruplex sequences are widely distributed in the human genome, with high enrichment in gene promoters. 42,43 The formation of 8-oxodG in G-rich sequences can either impede G4-protein interactions or stall repair proteins at G4 structures which further recruit TFs. For example, the VEGF promoter contains three G-rich Sp1 binding sites, which are critical for regulating mRNA synthesis. 44,45 When 8-oxodG accumulates due to hypoxia, Sp1 binding decreases in these G-rich elements, resulting in the up-regulation of VEGF transcription. 38,46 These observations suggest that G4 formation activates transcription when 8-oxodG is present. Recently, Burrows et al. reported that plasmids containing 8-oxodG in G4 promoter regions produced more target protein than the same plasmid without 8-oxodG. 47 The data suggest that 8-oxodG in G-rich regions of the VEGF promoter was removed by OGG1, generating abasic sites (AP sites) and destabilizing the duplex structure. This loss of stability led to the formation of a new G4 structure with an abasic-site-containing loop, which facilitated the binding and stalling of APE1 to the AP site, further stimulating TF binding and activating transcription. [47][48][49][50] The emerging role of 8-oxodG as a transcriptional regulator highlights its biological and health relevance beyond classic toxicity aspects of DNA damage. However, genome-wide associations of 8-oxodG with gene expression and further with pathological processes are not understood due to the lack of precise location information of 8-oxodG in the genome. Thus, extensive efforts have been made to locate 8-oxodG, with several recent high-throughput sequencing strategies providing advanced tools to understand how 8-oxodG is distributed and can modulate gene expression on a genome-wide level.

Genome-wide mapping of 8-oxodG
Initial strategies to map 8-oxodG involved enrichment by antibody pull-down of fragmented sequences containing 8-oxodG (i.e. analogous to ChIP-sequencing), where antibodies are used to selectively enrich for DNA sequences bound by a particular protein to map global protein-binding sites in cells. Following antibody binding or enrichment, 8-oxodG location could be obtained using several methods including microscopy, Sanger sequencing, high-throughput microarray assays and more recently, next-generation sequencing.
The first genome-wide map of 8-oxodG was constructed nearly 14 years ago using an antibody enrichment strategy, resulting in a map of 8-oxodG in human metaphase chromosomes at a 1000 kb resolution, revealing its heterogeneous distribution in the genome. 51 Specifically, immunofluorescence revealed that 8-oxodG was unevenly distributed and located primarily within regions with a high frequency of recombination and single nucleotide polymorphisms (SNPs) in cultured human lymphocytes. 51 However, the relatively low resolution of optical microscopy limited the resolution of the 8-oxodG map. By Sanger sequencing, a map of 8-oxodG at a 100 base pair resolution was achieved in mouse renal cortical samples, 52 allowing for 8-oxodG analysis at the gene-level. These data suggested that 8-oxodG is preferentially enriched in highly expressed genes, presenting the first clue for the potential impact of 8-oxodG on gene expression. 52 However, due to the limited throughput of Sanger sequencing, the resulting map only revealed several hundred 8-oxodG sites in the mouse genome. More recently, two microarray analyses allowed for a higher throughput genome-wide mapping of 8-oxodG in kidney tissues from rats and mice (244 000 probes for rat genome and 720 000 probes for mouse genome). 53 Both studies revealed that 8-oxodG was preferentially located in gene deserts, devoid of protein-coding genes, and correlated with lamina-associated domains. 53,54
In the last 5 years, next-generation sequencing technologies have been combined with affinity enrichment strategies to achieve genome-wide high-throughput mapping of 8-oxodG. As one example, OxiDIP-seq used an 8-oxodG antibody for immunoprecipitation followed by high-throughput sequencing in human non-tumorigenic epithelial breast cells and mouse embryonic fibroblasts. 55,56 The sequencing revealed that 8-oxodG sites accumulated in the transcribed regions of long genes and at DNA replication origins, overlapping with γH2AX ChIP-seq signals and double-strand breaks. Furthermore, a strong reduction of 8-oxodG was observed within promoter regions with high GC content in quiescent (G0) cells without DNA replication. As another example, an OGG1 K249Q mutant lacking glycosylase activity was used to trap a stable complex of OGG1 with the sequences containing 8-oxodG (enTRAP-seq). 57 Following affinity precipitation and sequencing, enTRAP-seq revealed enrichment of 8-oxodG in transcriptionally active chromatin regions and regulatory elements such as promoters, 5′ UTRs, and CpG islands in the mouse embryonic fibroblast genome. While 8-oxodG-specific binding proteins are useful tools for 8-oxodG enrichment and sequencing, further studies comparing the binding specificity of antibody clones and glycosylase mutants will help to understand apparently conflicting results.
Besides protein-based enrichment, two chemical enrichment methods have also been developed for high-throughput sequencing of 8-oxodG. The first approach was based on the selective oxidation of 8-oxodG to form an electrophilic intermediate that can be specifically recognized and labelled with amine-terminated biotin for affinity enrichment (OG-seq). 58 In mouse embryonic fibroblast cells, 8-oxodG levels were elevated in promoters, 5′-UTRs, 3′-UTRs and G4 structures in comparison with the baseline random distribution throughout the genome. The second chemical-based enrichment approach was based on the reaction between an AP site released from 8-oxodG and an aldehyde reactive probe (AP-seq), enabling both the specific recognition and enrichment of 8-oxodG sequences. 59 In HepG2 cells, a reduction of 8-oxodG was found in functional elements such as promoters, exons, TF binding sites, and termination sites in a seemingly GC content-dependent manner. However, AP-seq has been used to sequence other aldehyde-containing nucleotides, to be discussed in the abasic site section. Depending on the biological questions addressed, a potential drawback of these protein- and chemical-based enrichment methods is lack of nucleotide resolution, preventing determination of sequence-specific 8-oxodG occurrence and distribution at the resolution level, for example of mutational signatures.
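Feature-level statements such as "elevated in promoters" generally rest on a comparison of read density in the enriched library with that in an input control. The Python sketch below illustrates one common way to express such a comparison as a log2 ratio of reads per million; the normalization, the pseudocount, and all read counts are assumptions made for illustration and are not the published pipelines of OG-seq, AP-seq, or the other methods discussed.

```python
import math


def log2_enrichment(enriched_reads: int, input_reads: int,
                    enriched_total: int, input_total: int,
                    pseudocount: float = 1.0) -> float:
    """log2 ratio of reads per million in an enriched library versus input.

    The pseudocount avoids division by zero for features without reads.
    """
    enriched_rpm = (enriched_reads + pseudocount) / enriched_total * 1e6
    input_rpm = (input_reads + pseudocount) / input_total * 1e6
    return math.log2(enriched_rpm / input_rpm)


# Hypothetical read counts per feature class: (enriched library, input control).
feature_counts = {
    "promoter": (8_000, 4_000),
    "exon": (5_000, 5_000),
    "intergenic": (2_000, 4_000),
}
ENRICHED_TOTAL = 1_000_000  # hypothetical library sizes
INPUT_TOTAL = 1_000_000

scores = {
    name: log2_enrichment(enr, inp, ENRICHED_TOTAL, INPUT_TOTAL)
    for name, (enr, inp) in feature_counts.items()
}
for name, score in scores.items():
    print(f"{name}: log2 enrichment = {score:+.2f}")
```

Positive values indicate more signal than expected from the input, negative values less. Because the signal is aggregated over fragments, a score of this kind cannot by itself resolve which nucleotide within a fragment carried the lesion.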
A nucleotide resolution map of 8-oxodG is of interest to better understand sequence context effects of 8-oxodG formation and repair, and origins of mutational signatures. Recently, we reported a nucleotide resolution sequencing method, click-code-seq, to map 8-oxodG. 60 In this approach, 8-oxodG sites are specifically recognized and removed by an 8-oxodG glycosylase, generating a gap with a free 3′-hydroxyl at the damage site. Next, a synthetic O-3′-propargyl modified nucleotide (prop-dGTP) is incorporated into the resulting gap by DNA polymerase, giving rise to a 3′-alkynyl-modified end. The 3′-alkynyl DNA is then ligated to a 5′-azido-modified code sequence via a copper(I)-catalyzed click reaction, resulting in triazole-linked DNA that can be amplified by DNA polymerases. 61 Via this process, 8-oxodG sites are stably labelled with a code sequence that serves as a tag for affinity enrichment, an adaptor for PCR amplification, and a sequencing-compatible marker of the damage locations. 60 Using click-code-seq, a single-base resolution whole genome map of DNA oxidation was obtained for S. cerevisiae. 60 On a genome level, the first G in a 5′-GG-3′ dimer was more frequently oxidized than in other contexts. By analyzing 8-oxodG within discrete genomic features, especially transcription start sites (TSS), transcription terminator sites, DNase I hypersensitive sites, and autonomously replicating sequences, less 8-oxodG could be observed relative to the average coverage over the entire genome. On the other hand, telomeres, nucleosomes, and positions of low RNA Pol II occupancy had higher 8-oxodG frequency. Meanwhile, nucleosomes with post-translational modifications that accelerate nucleosome unwrapping had less 8-oxodG compared to nucleosomes without these modifications. These data suggest that chromatin accessibility may shape 8-oxodG distribution, with an accumulation of 8-oxodG in regions of reduced chromatin accessibility where repair proteins cannot penetrate.
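A sequence context observation of this kind can be obtained, in principle, by tallying the dinucleotide that starts at each mapped damage position, read 5′ to 3′ on the damaged strand. The Python sketch below shows that tally on a toy reference; the reference string, the site list, and the function names are hypothetical, and the sketch is not the click-code-seq analysis pipeline.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def dinucleotide_at(reference: str, pos: int, strand: str):
    """Damaged base plus its 3' neighbour, read 5'->3' on the damaged strand.

    pos is a 0-based coordinate on the forward reference. Returns None when
    the 3' neighbour would fall outside the reference.
    """
    if strand == "+":
        if pos + 2 > len(reference):
            return None
        return reference[pos:pos + 2]
    if pos - 1 < 0:
        return None
    # On the reverse strand the 3' neighbour lies at pos - 1 on the forward strand.
    return revcomp(reference[pos - 1:pos + 1])


# Toy reference and hypothetical damage sites given as (position, strand).
reference = "ATGGCACCTA"
sites = [(2, "+"), (3, "+"), (7, "-"), (6, "-")]

contexts = Counter(
    ctx for ctx in (dinucleotide_at(reference, p, s) for p, s in sites)
    if ctx is not None
)
print(contexts)
```

In a real analysis the observed counts would need to be compared with the background frequency of each dinucleotide in the genome before concluding that a context is preferred.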
All of the genome-wide sequencing methods for 8-oxodG reported to date rely on damaged sample enrichment (Fig. 3). However, 8-oxodG can also be sequenced directly at nucleotide resolution without enrichment using third generation sequencing technologies, such as single-molecule real-time sequencing 62 and nanopore sequencing. [63][64][65] Finally, a number of methods have the potential to detect 8-oxodG at nucleotide resolution, but were designed for one gene or one position, such as DNA hybridization probes containing a non-natural nucleoside specific for 8-oxodG, 66 LM-PCR, 67 third base pair based amplification, 68 BER-mediated deletion mutation 69 and Hoogsteen base pairing-mediated PCR-sequencing. 70 These technologies are faster and cheaper than whole genome sequencing and may be used as diagnostic tools to detect 8-oxodG hotspots within the genome.
From the 10 currently available genomic maps of 8-oxodG in biological contexts ranging from yeast and rodents to cultured human cells, and with resolutions varying from thousands of kb to a single nucleotide, a consistently emerging observation is that there is a non-uniform genomic distribution of 8-oxodG. In particular, DNA oxidation depends on the heterogeneous structure of a chromosome, consisting of protein-bound regions, open regulatory regions, and actively transcribed genes. However, it is too early to make strong biological conclusions from these data due to the differing species, conditions, library preparation protocols, and processing. Further methodological improvements are needed to understand, eliminate, or correct for embedded biases, as well as to control for artefactual DNA oxidation during sample preparation, a notorious problem in DNA oxidation analysis. Potential biases may arise from the binding specificity of different antibody clones/glycosylases, reaction selectivity of chemical probes, adaptor ligation, and PCR amplification. [71][72][73] Meanwhile, artefactual 8-oxodG may arise from genomic DNA extraction and DNA shearing, leading to false positive reads during sequencing. 74 Additionally, further work is anticipated to improve data reliability and the sensitivity of 8-oxodG sequencing methods. In the future, systematic sequencing studies of DNA oxidation are expected with a complement of robust methods to reveal a genomic basis of cellular oxidative stress responses.

Relevance and repair
Cisplatin (cis-diamminedichloroplatinum II, [Pt(NH3)2Cl2]), and related platinum compounds such as oxaliplatin (C8H14N2O4Pt), are drugs used to treat a variety of solid tumors including breast, ovarian, head and neck, testicular, bladder, lung, brain, and esophageal cancers. [75][76][77] Platinum-based drugs bind to DNA, forming interstrand or intrastrand DNA crosslinks, especially Pt-d(GpG), followed by Pt-d(ApG) and Pt-d(GpNpG). 75,78 Once DNA is platinated, replicative DNA synthesis is stalled and the damage, if left unrepaired, can lead to DNA double-strand breaks and cell death. 79 Cisplatin-DNA crosslinks are recognized and removed by nucleotide excision repair (NER) machinery, and upregulation of NER function is often observed in resistant tumors, along with increased cisplatin glutathione conjugation, increased cellular efflux, or proficient bypass of Pt-DNA damage by translesion DNA synthesis. 80,81 Because cisplatin resistance deteriorates the prognosis for cancer patient survival, recent DNA damage repair mapping techniques aim to define the genomic distribution of cisplatin crosslinks and the influence of altered NER function in cancer cells.

Genome-wide mapping of cisplatin-DNA damage
Two methods, damage-seq and cisplatin-seq, have been used to map cisplatin damage in human cells at single nucleotide resolution. Due to the low abundance of damage, both methods require an initial enrichment step. Damage-seq involves using an antibody specific either for cisplatin or oxaliplatin damage, 82 and cisplatin-seq involves using the HMGB1 domain A protein, which binds to distorted helices induced by cisplatin-DNA damage (Fig. 4). 83 In developing the cisplatin-seq approach, different constructs of HMGB1 protein were tested for their affinity towards cisplatin-induced distorted DNA structures. HMGB1 domain A was found to be the most specific for binding cisplatin-induced damage. Interestingly, in the damage-seq study, enrichment for cis-GG and cis-AG damage products, which was previously claimed to be impossible using any commercially available antibody for Pt-DNA damage, 83 was successful. In both studies, only intrastrand crosslinks could be mapped due to the specificity of these antibodies and proteins, but this selectivity is biologically relevant since intrastrand crosslinks are the most potent in killing cancer cells.
Following enrichment, both methods then take advantage of the fact that cisplatin-DNA damage stalls polymerases to mark the specific location of damage during PCR (Fig. 4). Specifically, a biotinylated primer is used in damage-seq for amplification of the enriched DNA from human lymphocytes with the high-fidelity Q5 polymerase. Q5 polymerase stalls upon encountering Pt-DNA damage such that DNA synthesis termination sites mark the site of the damage. Next, the differently sized DNA fragments extended from the biotinylated primer are purified using streptavidin beads. Finally, a second adapter is ligated, and the resulting DNA library is sequenced by next-generation sequencing. Alignment of the sequencing reads with a human reference genome then allows for the identification of damage sites. Cisplatin-seq follows a protocol very similar to that of damage-seq for marking the damage location. Damage-seq and cisplatin-seq methods reported the first genome-wide maps of cisplatin damage distribution in the human genome at single nucleotide resolution.
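The step from aligned reads to damage sites can be illustrated with a small coordinate calculation. The Python sketch below assumes that each aligned interval represents a primer extension product, that the lesion sits on the opposite (template) strand, and that the polymerase stops a fixed number of nucleotides before the lesion. The default offset of 1, the coordinate convention, and the example intervals are all assumptions for illustration; the actual offsets used in damage-seq and cisplatin-seq should be taken from the original publications.

```python
from collections import Counter


def damage_site(start: int, end: int, strand: str, offset: int = 1):
    """Infer the lesion position from an aligned primer extension product.

    start and end are 0-based, half-open coordinates on the forward reference;
    strand is the strand of the extension product. The lesion is placed on the
    opposite (template) strand, offset nucleotides beyond the 3' end of the
    product. offset=1 assumes the polymerase stops immediately before the lesion.
    """
    if strand == "+":
        # 3' end of a forward-strand product is at end - 1.
        return end - 1 + offset, "-"
    # 3' end of a reverse-strand product is at start.
    return start - offset, "+"


# Hypothetical aligned extension products: (start, end, strand).
products = [
    (100, 150, "+"),
    (120, 150, "+"),
    (100, 150, "-"),
]

# Number of independent products supporting each inferred damage site.
site_counts = Counter(damage_site(*p) for p in products)
print(site_counts)
```

Several products that terminate at the same position support the same inferred site, so the tally gives a simple per-site measure of signal.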
In addition to formation of DNA damage, DNA repair is expected to have a major role in shaping DNA damage distribution in the genome. Therefore, Sancar and co-workers examined damage distribution with damage-seq, but also used another method called XR-seq to map NER repair events in order to relate damage formation and repair patterns on a genome-wide level. 82 The XR-seq method was previously described to map UV damage, but was adapted to map NER repair of cisplatin (Fig. 4). 84 Therefore, the methodological details of XR-seq will be discussed below in the UV section. XR-seq requires DNA damage reversion for proper strand amplification containing cisplatin damage. Reversion was achieved by using sodium cyanide which can remove platinated DNA damage. While XR-seq was already a known technique, the strength of this study resided in the establishment of the damage-seq method which involved mapping any cisplatin damage present in genomic DNA and not only repair events. 82

Genome-wide profiles of cisplatin damage in human cells
Damage-seq and cisplatin-seq studies led to the first genomic maps of cisplatin-induced DNA damage at single nucleotide resolution, opening a new possibility towards understanding anti-cancer agent efficacy and strategies to overcome resistance in cancer patients on the basis of genome-wide damage profiles. Both approaches mapped cisplatin damage in the human genome, though with different cell lines and cisplatin doses. Nonetheless, in both cases, G-G dinucleotides were highly enriched at damage sites in accordance with previous observations that the major cisplatin intrastrand crosslink is Pt-d(GpG), 75 and to a lesser extent, A-G dinucleotides were enriched at the damage sites, confirming Pt-d(ApG) to be the second most prevalent intrastrand crosslink. This latter observation refutes that the pull-down is specific to only one type of intrastrand crosslink. Finally, in both studies an increase in damage at the TSS was also noted.
An important strength of the damage-seq study 82 was the coupling of damage formation data derived from damage-seq with damage-specific NER events derived from XR-seq. The coupling of these types of data permitted several key findings. First, sequence context analysis revealed a preference for cisplatin damage formation at G-G dinucleotides downstream of A, but in damage repair data the preference switched to a T upstream and a G downstream, meaning that the first was more prevalently formed but the latter was more resistant to repair. These sequence context findings are in conflict with previous biochemical studies testing DNA damage recognition for NER where the preference was for an A both up- and downstream. 85 Therefore, further studies are needed to determine whether the differences are due to the cellular context or due to biases introduced in the library preparation. Second, comparing the damage distribution maps over time as well as the XR-seq maps indicated that NER repair is the main driver in shaping the distribution of platinum-DNA damage. In particular, overall damage formation was fairly uniform in genomic regions with the exception of only a slightly higher damage abundance at the TSS and a slightly lower one at the TES. Interestingly, less damage was found on the transcribed strand (TS), which is consistent with TC-NER being a key repair process dictating the distribution of damage. 86 Damage-seq and XR-seq were compared to nucleosome occupancy data for the same lymphocyte GM12878 cell line annotated in the ENCODE database. This analysis suggested how chromatin folding may impact damage distribution. A 5% reduction in damage formation was observed within the nucleosome center, whereas repair was substantially inhibited due to the inaccessibility of the nucleosome center. As such, the overall damage load was higher in the nucleosome center, aligned with observations made for UV-induced photodimers. 87
There are several important unique findings worth noting from the cisplatin-seq study. 83 The combined damage-seq and XR-seq study mainly focused on chromosome 17, which contains TP53, whereas the cisplatin-seq study provided a more detailed investigation of all chromosomes and mitochondrial DNA. Interestingly, in comparison to the fairly uniform distribution of cisplatin damage load observed using damage-seq, results from cisplatin-seq differed up to 3-fold amongst chromosomes. In addition, mitochondrial DNA (mtDNA) carried the largest amount of cisplatin damage, likely because NER does not take place in the mitochondria. 88 This finding was supported in later studies in mice (described below). 89 Furthermore, short-duration cisplatin exposure led to less damage on the mtDNA light strand, which carries more genes than the heavy strand, suggesting protection or repair proficiency especially for the mtDNA light strand by an as yet undefined mechanism. Unlike damage-seq, which benefited from the previously annotated nucleosome occupancy, the cisplatin-seq study additionally performed ChIP-seq on HeLa cells to determine the influence of chromatin states on damage distribution. Here, an increase in cisplatin damage was observed to coincide with nucleosome signals, suggesting that there is preferred crosslinking of cisplatin on nucleosomes.
These last results 83 contradict the damage-seq study, 82 which may be due to the lack of repair data in the cisplatin-seq study. Specifically, the higher cisplatin damage load could be the result of a lack of repair rather than a preference for damage formation. Subsequent studies mapping NER events in mice support this possibility, having demonstrated a very rapid peak of transcription-coupled NER (TC-NER) activity 2 hours after cisplatin exposure. 90 Given that cisplatin-seq was performed 3-24 h after cell exposure, it is likely that TC-NER had already taken place. Finally, cisplatin-seq data were compared with ChIP-seq data, indicating that the occupancy of the DNA-binding proteins Pol II, EZH2, and CTCF coincides with cisplatin damage. The conclusion of this comparison is that there is an increase in DNA damage formation at sites where NER accessibility is reduced. As repair seems to play a major role in cisplatin crosslink distribution, further efforts should characterize the influence of genomic architecture on repair accessibility. Additionally, because XR-seq data represent a snapshot of repair at a given moment, these sequencing techniques should be applied in a time-recovery course to investigate how damage distribution changes over time. Finally, the cisplatin doses investigated were substantially higher than therapeutic levels; therefore, future studies should address relevant doses.
Damage-seq and XR-seq have been applied to investigate the effect of cisplatin chronochemotherapy on genome-wide cisplatin damage distribution and repair across different organs in mice, the first time DNA damage mapping has been performed in vivo. [89][90][91] In addition to addressing a basis for cisplatin resistance, a second motivation to map cisplatin damage concerns potential off-target effects, for example, in chronochemotherapy, to find the optimal time of day at which the drugs will most efficiently kill cancer cells while reducing toxicity in other organs. Thus, NER of cisplatin-DNA damage was characterized using XR-seq in mouse liver and kidney, chosen because of the common side effects of nephrotoxicity and hepatotoxicity. 91 Damage in TSs was repaired up to 10-fold more efficiently than in non-transcribed strands (NTSs), an observation potentially explained by TC-NER being more active than global genome NER (GG-NER). Indeed, TC-NER is active all the time because it depends on transcription, whereas for GG-NER a peak was observed at a specific time within the circadian rhythm (here, Zeitgeber time ZT08). This study showed when each gene strand is repaired, giving the first circadian DNA damage repair map in mice, and is now being extended by the same group to obtain individual circadian maps in different human tumor cell lines.
In a second study, this time mimicking clinical dosing (70 days), the same approach combining damage-seq and XR-seq was used to characterize damage maps in mouse liver. 90 Results indicated that up to 5 weeks were needed to completely remove platinated DNA crosslinks from the mouse genome. Again, 90% of TSs were repaired after only 2 days, whereas damage persisted on NTSs, which might have a detrimental effect on healthy cells by causing replication fork arrest, leading to cell death. Therefore, TC-NER should be considered the dominant form of cisplatin damage repair following drug administration, and could be an important pathway for additional targeted therapeutic strategies.
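The TS versus NTS comparisons discussed in these studies rest on a simple strand assignment, sketched below in Python. This is an illustrative sketch rather than the authors' analysis code: the function names and the example records are hypothetical, and it assumes the usual annotation convention in which a gene's annotated strand is its coding (non-transcribed) strand, so that a read on the opposite strand lies on the TS.

```python
def classify_strand(gene_strand, read_strand):
    """Return 'TS' if the read lies on the transcribed (template) strand, else 'NTS'.

    Assumes gene_strand is the annotated coding strand ('+' or '-'),
    so the template strand is the opposite one.
    """
    return "NTS" if read_strand == gene_strand else "TS"

def ts_nts_counts(records):
    """Tally reads per strand class from (gene_strand, read_strand) pairs."""
    counts = {"TS": 0, "NTS": 0}
    for gene_strand, read_strand in records:
        counts[classify_strand(gene_strand, read_strand)] += 1
    return counts

# Hypothetical reads overlapping annotated genes
records = [("+", "-"), ("+", "+"), ("-", "+"), ("-", "-"), ("+", "+")]
counts = ts_nts_counts(records)
```

For XR-seq reads the counts reflect repair events on each strand, whereas for damage-seq reads they reflect remaining damage, so the same tally supports either comparison.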
Finally, the most recent study of the genome-wide distribution of cisplatin-DNA damage coupled XR-seq, damage-seq, and RNA-seq data for mouse kidney, liver, lung, and spleen. 89 The study revealed that the rate of NER on the TS and NTS of active genes is positively correlated with gene expression. Specifically, repair in the TS and NTS increases with gene expression and plateaus in the TS among highly expressed genes.
The data further suggest that cellular transcription stimulates the repair of damage in the NTS because transcription is associated with an open chromatin conformation and increased accessibility to repair machinery. Interestingly, the spleen carried the least cisplatin damage, which could be explained by the downregulation of genes thought to be associated with cisplatin transport (atp7B and Steap3). Finally, consistent with cell-based results, patterns of damage distribution appear to be driven mainly by repair activity.

Occurrence, relevance, and repair
Ultraviolet light (UV) is associated with the occurrence of at least 95% of all skin cancer cases. UVC (254 nm) and UVB (290-320 nm) wavelength ranges photoexcite pyrimidines in DNA to form cyclic dimers, 92 the most frequent of which are cyclobutane pyrimidine dimers (CPDs) and 6-4 pyrimidine-pyrimidone photoproducts (6-4PPs). 87 UVC light is less of a concern as it is mainly blocked by the atmosphere. 93 UVA light (320-400 nm) can also lead to CPD DNA damage to a lesser extent, but can penetrate the skin layers more deeply, reaching the dermis. 94 Additionally, UVB and UVA can lead to the formation of oxidative damage (8-oxodG) in DNA. 92 It is well known that formation of pyrimidine dimers occurs predominantly at TT sites, with much lower amounts of the corresponding TC, CT, and CC dimers. 95 Furthermore, CPDs are more abundant and likely more cytotoxic, but 6-4PPs are more mutagenic. 96 All of these bulky DNA damage events are the basis of human skin cancer because they can mispair during DNA replication, giving rise to mutations.
The pathological role of DNA photodimerization was compellingly revealed through the characterization of human genetic deficiencies in XP proteins, which contribute to the rare human disease xeroderma pigmentosum. These proteins were later understood to be part of the NER machinery, which effectively removes these dimers from cells and protects against mutagenicity. 97 Similar to cisplatin crosslinks, UV-induced DNA damage is removed by NER and therefore UV damage repair can also be investigated using mapping methods specific to NER events. 98 If cellular repair is overwhelmed, there are deleterious implications for the cell, including cell death 95 due to the stalling of replicative polymerases during DNA synthesis. 99 Finally, as specialized TLS polymerases bypass UV lesions, 96,100,101 there is a characteristic mutation signature composed mainly of C → T and CC → TT mutations 102,103 that corresponds to signatures found in skin cancer. 5 To test whether damage distribution is predictive of mutations, there is a need for more insight into the genome-wide location of UV damage and identification of regions that are recalcitrant to NER.

Genome-wide mapping of UV damage
Given the strong association of skin cancer with the UV-induced mutational signature, 102 methods to map UV damage in the genome are essential to resolve the gap between DNA damage formation and mutagenesis. Two precursor studies obtained CPD maps with lower resolution by using antibodies for enrichment followed by microarray hybridization in yeast and human cells. 104,105 Another study used a similar strategy of damaged DNA immunoprecipitation (DDIP-seq) but followed by next-generation sequencing to map CPD damage at a resolution of 100-300 base pairs in nuclear and mitochondrial DNA in human cells. 106 At least four methods have been developed to map genome-wide UV damage formation or repair with single-nucleotide resolution: excision-seq, CPD-seq, XR-seq, and HS-damage-seq (Fig. 5). Excision-seq and CPD-seq rely on enzymes that cleave upstream or downstream of UV damage. XR-seq is unique in that it does not directly map UV-specific damage formation, but rather maps NER events occurring at UV damage sites. Finally, HS-damage-seq is an improvement of the damage-seq method described earlier in this review for mapping cisplatin- and oxaliplatin-induced DNA damage. As discussed above, damaged DNA is first specifically enriched using an antibody against the DNA damage, and stalling of a high-fidelity polymerase during DNA synthesis then marks the damage location during library preparation. The first two methods (i.e. excision-seq and CPD-seq) have been used to map damage in yeast, and XR-seq and HS-damage-seq have been effectively extended to gain insight into the human genome from cultured cells.
Of the four strategies reported to date addressing the genome-wide mapping of UV damage at single-nucleotide resolution, excision-seq was the first to map UV damage at the genome-wide level. 107 Genomic yeast DNA was selectively digested by the UV damage endonuclease (UVDE) enzyme, which cleaves upstream of CPDs and 6-4PPs (Fig. 5). This enzymatic digestion releases short damaged dsDNA fragments that need to be repaired before amplification. Next, specificity for either CPD or 6-4PP mapping was achieved by repairing the fragments with specific photolyases. Specifically, Vibrio cholerae CPD photolyase or X. laevis 6-4 photolyase was used to repair the pyrimidine dimers into mono pyrimidines, thus allowing for end-repair of the damage of interest and enabling adapter ligation for NGS library preparation. DNA fragment ends read in the sequencing data thus correspond to the location of previous UV damage.
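The last step, reading damage positions from fragment ends, can be sketched in Python. This is a minimal illustration and not the published excision-seq pipeline: the function names, the 0-based half-open coordinate convention, and the `offset` parameter (the distance between the fragment's 5′ end and the lesion, which depends on the cleavage chemistry) are all assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def end_dinucleotide(reference, start, end, strand, offset=0):
    """Dinucleotide at the 5' end of an aligned fragment, read 5'->3' on the fragment's own strand.

    start, end: 0-based half-open coordinates on the forward reference (assumed convention).
    offset: hypothetical shift between the fragment's 5' end and the lesion.
    """
    if strand == "+":
        pos = start + offset
        return reference[pos:pos + 2]
    # For minus-strand fragments the 5' end sits at the right-hand coordinate.
    pos = end - offset
    return revcomp(reference[pos - 2:pos])

reference = "ACTTGGAAC"
plus_call = end_dinucleotide(reference, 2, 9, "+")    # forward-strand fragment starting at the TT
minus_call = end_dinucleotide(reference, 0, 8, "-")   # reverse-strand fragment whose 5' end covers the AA
```

Both toy fragments report a TT dinucleotide, the second one only after reverse-complementing; tallying such calls over all fragments gives the dipyrimidine spectrum at fragment ends.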
With a very similar approach to excision-seq, CPD-seq also achieved precise genome-wide mapping of CPD damage, while also adding a new dimension to previous information on the distribution of photodimers in the yeast genome by integrating insight on the impact of repair and chromatin structure. 87 Genomic yeast DNA treated with UV was fragmented, end-repaired, dA-tailed, and ligated to adapters prior to digestion by T4 endonuclease V, which specifically cleaves downstream of CPD damage, generating single-strand DNA breaks with AP sites at the 3′ end. Next, the APE1 enzyme was used to remove the AP sites, releasing 3′-OHs at the end of the short ssDNA breaks to allow ligation and sequencing (Fig. 5). 108 CPD-seq specifically permitted mapping of UV CPD damage at single-nucleotide resolution in yeast.
Application of excision-seq and CPD-seq to map CPD photodimers in UV-exposed yeast, despite the use of UV doses 100 times higher for excision-seq, revealed similar sequence-associated preferences for photodimerization. As expected, CPD dimers primarily occurred between two Ts. The next most prevalent CPD sequence pairings were T-C, C-T, and C-C. Excision-seq additionally mapped 6-4PPs, indicating that T-C is the most abundant, followed by T-T, C-C, and C-T. These results confirm older chromatography data. 109 A key benefit of the sequencing approach is the ability to examine the sequence context surrounding the dimer positions. Specifically, excision-seq data indicated a preference for an A downstream of 6-4PPs in yeast, and this same preference was also observed in later experiments in human cells. 107 As the downstream A preference was not observed in the CPD dataset, it was concluded that the UVDE enzymes did not introduce this sequence bias; however, it is possible that this is an artifact related to the sequence preference of X. laevis 6-4 photolyase. While both studies revealed similar sequence preferences, excision-seq revealed a uniform distribution of UV damage (CPD and 6-4PPs) in the yeast genome, whereas CPD-seq indicated that CPD damage distribution was not uniform. These differences may be a result of either the methodology or repair.
One strength of the CPD-seq study is that NER repair, chromatin structure, and their influence on CPD damage distribution were measured. Notably, UV damage formation and repair were reduced at strong nucleosome positions. Furthermore, NER was inhibited at translational positions near the strongly positioned nucleosome dyad and CPD formation within the nucleosome was lower at inward rotational DNA and higher in outward rotational settings. The interpretation of these data is that inward-rotated DNA is protected from UV damage because of DNA bending and flexibility imposed by the nucleosome structure (i.e. due to the principle that two pyrimidines need to be close and correctly aligned to form a dimer). Interestingly, cells might use the inward setting to protect A-T rich regions, which are more prone to be damaged by UV. Another finding was that there was significantly less CPD at TF-binding sites, suggesting TFs may act as guardians of important DNA sequences. While more studies are required, the CPD-seq study suggests UV-induced damage distribution is strongly influenced by nucleosomes and TF-binding sites.
Both excision-seq and CPD-seq methods are effective at mapping UV damage; however, the use of digestion enzymes brings potential liabilities with regard to damage specificity and the potential for introducing artifacts during library preparation. For instance, UVDE enzymes can cleave after other bulky DNA adducts, or there could be certain sequence contexts in which the excision/repair enzymes are more efficient. In both studies, results were presented from experiments involving high levels of UV exposure (10 000 J m⁻² for excision-seq and 125 J m⁻² for CPD-seq) with no indication of a dose-response relationship or threshold for effective mapping. Additionally, a common limitation is that these methods may not be entirely specific and cannot be generally extended to every bulky DNA adduct because damage-specific digestion/repair enzymes are required. Finally, these observations contrast with subsequent findings in the human genome, highlighting that DNA damage distribution might be unique to certain species, or even to certain cancers.
The distribution of UV damage in the genome has also been characterized using more generally applicable methods for bulky adducts: specifically XR-seq and HS-damage-seq (Fig. 5). 84,98 Rather than measuring damage itself, XR-seq uses NER's unique characteristic of releasing excised 30-mer damaged fragments during repair. These small fragments are isolated from genomic DNA based on their low molecular weight and their susceptibility to immunoprecipitation of a specific NER repair protein (TFIIHa). Fragments that are pulled down can then be subjected to a second damage-specific immunoprecipitation (in this case for CPD or 6-4PPs). Finally, either CPD or 6-4PP photolyases are used to repair the damaged fragments to allow PCR amplification and sequencing. XR-seq was applied to map UV damage repair in human cells. However, given that XR-seq captures a map of NER of bulky adducts, it is a general approach and has also been used to map cisplatin and BPDE damage as well as NER events in various model organisms including bacteria, plants, yeast, mice, and humans. [110][111][112][113][114][115] HS-damage-seq can also be considered a general approach for mapping bulky DNA damage. In fact, HS-damage-seq is based on the previously published damage-seq method (used to map cisplatin and oxaliplatin damage 82 ) but includes an extra antibody enrichment for UV damage. As such, HS-damage-seq provides the increase in sensitivity needed to map more physiologically relevant exposure conditions and only requires 1 µg of input DNA. Briefly, HS-damage-seq requires initial immunoprecipitation of the damage of interest. This enrichment process is followed by a primer extension using the pull-down fragments as templates and a high-fidelity polymerase to perform DNA synthesis. The high-fidelity polymerase, as in cisplatin-seq and damage-seq described above, stalls at the site of the bulky damage, leading to a termination site and production of a shorter DNA fragment attached to biotin. 
The resulting synthesized strands are purified by a biotin-streptavidin system and then undergo a subtractive hybridization step to further remove undamaged strands prior to amplification. As such, more than 95% of the reads generated through sequencing are specific to the damage of interest, also increasing sensitivity. To evaluate HS-damage-seq, a new cisplatin map was generated using only 1 µg of input DNA. Importantly, this new map compared favorably with the original cisplatin map generated using damage-seq, 98 confirming that the new HS-damage-seq method enabled more sensitive mapping of damage. HS-damage-seq and XR-seq are excellent methods to understand UV damage distribution dynamics and can be further combined to understand the basis of damage-induced cancer in human cells.

Genome-wide profiles of UV damage and repair in human cells
XR-seq and HS-damage-seq were combined to map UV damage in human skin fibroblast cells with deficiencies in GG- or TC-NER pathways. Specifically, cells were exposed to environmentally relevant UV levels (20 J m⁻²) and DNA was collected at different time points (ranging from 20 to 240 min). The overall sequence context preference for UV damage was similar to what was reported in yeast using CPD-seq. However, when focusing on repair preferences and the influence of chromatin, several interesting patterns emerged.
First, global genome repair of CPD was slightly higher around the TSSs of genes. While the reason is not clear, it may be due to the higher levels of CPD formation in this area. However, it is more likely that the higher CPD repair in the TSS regions is a direct consequence of the higher levels of binding of the TF TFIIHa, which ultimately initiates global NER. Additionally, the correlation between 16 different TF-binding sites and damage occurrence was investigated. However, no general pattern relevant to all TF-binding sites emerged. Rather, the relevance of TF-binding sites in damage formation is dependent on the TF and type of damage. As one example, the BHLHE40 TF position correlates with higher 6-4PPs but lower cisplatin damage load.
This study also confirmed that CPD damage is mainly repaired by TC-NER, while 6-4PPs are mainly repaired by GG-NER. 98 However, CPD distribution was the same regardless of chromatin states whereas the occurrence of 6-4PPs varied by chromatin state. In particular, 6-4PPs were most abundant in poised and active promoters as well as in repetitive regions, but had a lower frequency in heterochromatin. Therefore, similar to the findings of 8-oxodG and cisplatin, UV damage distribution appears to be the result of fairly uniform damage formation, but heterogeneous repair. In summary, the combination of XR-seq and HS-damage-seq helps to understand the importance of damage formation and damage repair in providing the overall map of damage and may be useful to more closely link damage patterns with mutagenesis and mutational signatures.

Relevance and repair
Benzo[a]pyrene (BaP) is a known human carcinogen, particularly associated with lung cancer in patients who smoke. 116 BaP is a polycyclic aromatic hydrocarbon environmental pollutant resulting from incomplete combustion. 117 While BaP is found naturally in fossil fuels, shale and crude oils, and coal tars, 117 human exposure to BaP is mainly from diet or cigarette smoke. 118 In cells, BaP can be metabolized by CYP1A1 and CYP1B1 to form the ultimate carcinogen diol epoxide (BPDE), which forms BPDE-DNA adducts, mainly by addition to the N² position of guanine. 119 BPDE-dG DNA damage is also repaired by NER 120 and, if unrepaired, BPDE-dG induces apoptosis and necrosis associated with mitochondrial signalling, and DNA double-strand breaks. 121,122 Different TLS polymerases are capable of bypassing BPDE-dG: Pol η is highly error-prone when bypassing BPDE-dG, inserting an A or T more efficiently than C, 123 whereas Pol κ can perform error-free TLS. 124 Pol ι is unable to bypass BPDE-dG, whereas Pol ζ, in combination with Rev1, is error-prone. 125 Thus, BPDE-dG DNA damage typically gives rise to G → T transversions, preferentially at CpG dinucleotides. 126 Using cleavage enzymes on genomic DNA, followed by LM-PCR with primers specific to TP53 exons, preferential formation of BPDE-dG in hotspot codons 273 and 248 in the human TP53 gene was observed, indicating a causal link between BPDE-dG damage and cancer development. 127,128 Unfortunately, it is not yet feasible to extend this strategy to an entire genome, and modern methods for whole-genome identification of BPDE-dG are highly desirable.

Genome-wide mapping of benzo[a]pyrene
Since BPDE-dG DNA damage is repaired by NER, XR-seq mapping at the genome-wide level is possible (Fig. 6). 110 XR-seq, described above in the context of UV and cisplatin, involves capturing the short oligomers that are released during NER of bulky adducts. The fragments that contain damage are enriched using antibodies specific for the damage of interest. The captured fragments are then repaired to reverse the damage and are used for NGS library preparation and sequencing.
In applying XR-seq to map BaP damage, there is one major challenge: no enzymes are available to reverse BPDE-dG damage. Therefore, XR-seq was modified by incorporating a translesion DNA synthesis step during PCR amplification using Pol κ for N²-BPDE-dG and Pol η for CPD, permitting bypass of the damage. This variation on XR-seq, termed tXR-seq, expands its capacity to map damage repaired by NER. Nonetheless, there is still a limitation related to the availability of an error-free TLS polymerase or combination of polymerases that bypass the damage of interest. After validating the tXR-seq method by comparing the previous XR-seq CPD data with new tXR-seq CPD data, tXR-seq was applied to provide the first human map of repair events for BPDE-dG DNA damage, which is associated with a wide range of exposures in humans.

Genome-wide profiles of BPDE-dG repair in human cells
To map NER events at BPDE-dG using tXR-seq, DNA was extracted from a human lymphocyte cell line (GM12878) that is well characterized by the ENCODE project with respect to chromatin states and regulatory regions. Cells were treated with 2 µM BPDE and repair was allowed to take place for 1 h before DNA extraction. Preferred sequence contexts, such as a high frequency of CG dinucleotides for BPDE-dG, were found in the NER-excised fragments. The higher frequency of CG is likely related to the increased formation of BPDE-dG at CpG islands; 129 however, it is not clear whether these preferred sequence contexts are related to damage formation or repair.
Damage formation data, for instance from a damage-seq experiment, would be essential to discriminate between these two processes. Furthermore, the sequence coverage in this experiment was insufficient to make any claims regarding the repair of relevant genes involved in cancer development, such as TP53 hotspots. However, in comparing the repair of BPDE-dG to previous repair data (i.e. UV damage), the authors observed that the rate of NER of BPDE-dG damage was intermediate between the fast repair of 6-4PPs and the slower repair of CPDs. Furthermore, the data revealed that BPDE-dG damage was only slightly more prevalent on the NTS, suggesting a minor role of TC-NER. This contradicts mutational signature data, specifically signature 4 associated with tobacco exposure, in which G → T mutations exhibit strong transcriptional strand bias. 130 This difference could be explained by a tissue type difference, given that lymphocytes, which were used in the tXR-seq study, are not the most biologically relevant model for tobacco exposure. Regardless, the novel method tXR-seq is an improvement on the previous XR-seq, which could not have been applied to map BPDE-dG DNA damage. The main limitation of tXR-seq is the requirement to identify suitable TLS polymerases, which are known to be more efficient in bypassing BPDE-dG DNA damage in certain sequence contexts, therefore introducing a small bias during the one-round PCR amplification. Future studies on benzo[a]pyrene damage distribution should aim to treat cells with meaningful doses of BaP or BPDE, use higher sequencing depths and improved bioinformatic and statistical analysis, and couple repair data to new damage formation datasets.

Relevance and occurrence
O⁶-methyl-2′-deoxyguanosine (O⁶-MedG) only contributes ~5% of total damage induced by alkylating agents, but it has dramatic biological effects compared with more abundant N-methyl adducts. 131 Endogenous methylation results mainly from S-adenosyl methionine, 132 which forms 10-30 residues of O⁶-MedG per cell each day. 133 Exogenous alkylating agents also contribute to the occurrence of O⁶-MedG in the genome, including natural and anthropogenic components of air, water, and food, as well as several chemotherapeutic agents. 134

For example, the oral methylating agent temozolomide (TMZ) is widely used to treat brain cancers, including glioblastoma multiforme and astrocytomas. 135 While there is evidence that TMZ extends the survival of patients, resistance often occurs due to increased expression of the repair enzyme alkylguanine alkyltransferase (AGT). 136 The primary repair of O⁶-MedG is through direct reversion by AGT. Biochemical assays, 137 crystal structure data, 138 cellular studies, 139 and transgenic mouse studies 140 all confirm that AGT repairs O⁶-MedG and other O⁶-alkylguanine DNA damage, and that its expression greatly reduces the incidence of mutations caused by exposure to methylating agents. On the other hand, in the case of TMZ therapy, AGT overexpression renders TMZ less effective. To mitigate this problem, the methylation status of the AGT gene promoter, MGMT, is used as a diagnostic strategy for stratifying treatment regimes, and its epigenetic silencing in tumor cells is associated with glioma sensitivity to TMZ. 141,142 AGT removes alkyl groups located on the O⁶-position of guanine in one direct transfer step, regenerating the undamaged guanine residue. The current model of AGT-mediated repair involves cooperative binding of the AGT protein to the minor groove of DNA. AGT then scans the genomic DNA and flips the O⁶-MedG residue into its active site, permitting the transfer of the alkyl group and releasing the dealkylated DNA. 143 Several biochemical studies have revealed that sequence context, including the opposing base, impacts the repair of O⁶-MedG by AGT. 137,[144][145][146] In contrast, recent work from Essigmann and coworkers showed no specific mutational patterns arising from AGT repair in mouse cells treated with the alkylating agent N-methyl-N-nitrosourea (MNU). 147 However, there is little work on AGT accessibility to different chromatin states in mammalian cells. 
Thus, developing a mapping method for methylation formation and repair processes would help fill this gap. 148 O⁶-MedG is highly mutagenic as a result of mispairing during DNA replication. 149 Replicative polymerases and TLS polymerases including Pol η, Pol κ and Pol ζ can bypass O⁶-MedG, causing misincorporation of T with up to a 10-fold misinsertion. [150][151][152][153] The O⁶-MedG:T mismatch results in a G → A transition mutation upon the second round of synthesis. In addition to being mutagenic, O⁶-MedG is cytotoxic via a unique indirect process resulting from the recognition of the O⁶-MedG:T mispairing by the mismatch repair pathway. [154][155][156] This mispairing gives rise to mutational spectra mainly composed of G → A transitions. 157 In particular, signature 11 is observed in malignant melanomas and glioblastoma multiforme treated with the alkylating agent temozolomide. 5 Likewise, in vitro immortalized primary murine embryonic fibroblasts treated with the methylating agent N-methyl-N′-nitro-N-nitrosoguanidine (MNNG) showed signature 11. 158 A key defining feature of signature 11 is the prevalence of these G → A transition mutations, 159 which suggests that signature 11 is relevant to O⁶-MedG damage derived from exposure to methylating agents. However, there are no methods to map O⁶-MedG at single-nucleotide resolution. Such methods would allow for the extrapolation of a damage spectrum that could be compared directly with mutational signatures, highlighting the contribution of specific DNA damage to the final mutational load.
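The two-round route from O⁶-MedG to a fixed G → A transition can be traced with a toy Python example. This is a deliberately simplified sketch: the lesion is represented by the hypothetical symbol `m`, it is assumed to pair with T every time (real polymerases only do so with some probability), and strands are copied position by position without reversal so that indices line up.

```python
# 'm' marks O6-MedG; always pairing it with T is the simplifying assumption of this sketch.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G", "m": "T"}

def copy_strand(template):
    """Return the newly synthesized strand, base by base (no reversal, so positions align)."""
    return "".join(PAIRING[base] for base in template)

original = "ACGTG"   # undamaged sequence
damaged = "ACmTG"    # the G at index 2 carries the O6-methyl group

first_round = copy_strand(damaged)        # T is inserted opposite the lesion
second_round = copy_strand(first_round)   # copying that T places an A at the original G position

# Positions where the second-round product differs from the undamaged original
mutations = [(i, old, new)
             for i, (old, new) in enumerate(zip(original, second_round))
             if old != new]
```

After the second round, the only difference from the undamaged sequence is an A at the position of the former G, which is the G → A transition that dominates the spectra described above.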

Sequence-based DNA damage analysis
Despite the high volume of sequencing methods and studies addressing methylated cytosine (5-methylcytosine, 5-mC) [160][161][162][163] and methylated adenosine (N⁶-methyladenosine, 6-mA), [164][165][166][167] there are no genome-wide sequencing methods for mapping O⁶-MedG or other chemically induced alkyl DNA damage. The lack of methods is likely a consequence of both the base pair misincorporation by and stalling of polymerases opposite O⁶-MedG, 168 whereas natural modifications such as 5-mC and 6-mA are more readily amplified. Despite this major challenge, there has been recent progress in using unnatural nucleobases that specifically pair with this damage as an approach to identify or amplify it within particular sequence contexts by hybridization or polymerase-based amplification strategies. 169,170 A nanoparticle-based hybridization strategy permitted sensitive detection of O⁶-MedG in a sequence-specific manner. 169 In this approach, elongated hydrophobic nucleobase analogues were designed to base pair with O⁶-MedG. As such, short oligonucleotides that contained the nucleobase analogue formed a more stable duplex with a complementary sequence that contained O⁶-MedG vs. G. To develop a biosensor from this system, the nucleobase-containing oligonucleotides were conjugated to gold nanoparticles. In this way, the gold nanoparticles served as a quantitative dose-responsive optical readout, where dispersed nanoparticles displayed a bright red color and aggregated nanoparticles resulted in a measurable blue color. In particular, the abundance of O⁶-MedG located at a mutational hot spot could be quantified within the human KRAS gene in mixtures with competing unmodified DNA. While many challenges remain before this approach can be implemented for biological studies, the approach is suited for massively parallel sequence probes for analysis of multiple genes, but not whole genome analysis. 
Moreover, high levels of input DNA are needed, 171 so additional enrichment of samples would be required. In spite of these hurdles, this proof of concept suggests hybridization-based assays that incorporate nucleotide modifications as a strategy for detecting DNA damage within defined genomic contexts.
A second chemistry-oriented approach to the selective amplification of O⁶-MedG in defined sequence contexts, and a potential basis for mapping DNA alkylation damage, involves the use of artificial nucleotides incorporated opposite the damage site by a DNA polymerase. In this way, DNA adducts can be marked by a non-natural nucleotide rather than by inserting a mismatched T, which causes a loss of the damage identity. Furthermore, if a polymerase can efficiently incorporate the artificial nucleotide, exponential amplification of the marked damage may also be possible, thus enabling the detection of extremely low abundance DNA damage. Polymerase-mediated incorporation of synthetic nucleotides opposite a DNA adduct has been reported for AP sites, 172 8-oxodG, 173 cis-platinated guanine, 174 as well as for O⁶-alkyldG. 170,175 The detection of damaged bases was made possible by the development of new artificial nucleotides that were designed and tested for base pairing and that are reviewed elsewhere. 176 Recently, we have used this strategy to quantify and localize the related mutagenic O⁶-alkyl-G adduct O⁶-carboxymethyldG by amplification with artificial nucleotides. 177

DNA sugar and backbone damage
We have so far reviewed methods for mapping various DNA damage types including oxidative DNA damage (e.g. 8-oxodG), cisplatin-derived damage (i.e. crosslinks), UV damage (i.e. 6-4PPs and CPDs), and alkylation damage including BPDE-dG and O6-MedG, focusing on how the chemistry of the damage drives the strategies used to map it. However, several downstream forms of DNA damage also arise from these chemically induced adducts and their repair. For example, AP sites are a common product of DNA damage and are generated as an intermediate in base excision repair (BER). Single-strand breaks (SSBs) also arise from oxidative DNA damage or a failure in DNA repair and are among the most common forms of DNA damage. SSBs may also lead to double-strand breaks (DSBs) which, if unrepaired, lead to cell death and, if mis-repaired, cause chromosomal translocations, an early step in carcinogenesis. Finally, due to the high levels of ribonucleotide precursors present as potential substrates during DNA synthesis and repair, many ribonucleotides are incorporated into DNA. In addition, studies have addressed the distribution of uracil in DNA. Here, we introduce recent technologies to map these general forms of DNA damage and summarize the most important resulting biological insights.

Abasic sites
AP sites commonly arise as a by-product of DNA damage, either because damage formation typically accelerates depurination rates, 178 or because they are generated as an intermediate in base excision repair, wherein small DNA base modifications such as 8-oxodG or uracil are removed by DNA glycosylases. 179 Another source of AP sites is the maintenance of epigenetic DNA base modifications by thymine DNA glycosylase when it removes 5-formylcytosine (5-fC) or 5-carboxylcytosine. 180 Typically, AP sites are recognized by AP endonuclease and repaired with reinsertion of the correct base. However, if not repaired, their persistence can lead to mutations, stalling of polymerases and genomic instability. 181 It has been shown that AP sites can cluster in the genome during DNA replication, but little is known about their precise location in the genome and the sequence contexts in which they preferentially form or persist. 182 Three methods were recently developed to sequence AP sites: AP-seq, 59 snAP-seq 72 (Fig. 7) and Nick-seq, 183 with the latter two having single-nucleotide resolution. AP-seq and snAP-seq rely on the reactivity of the aldehyde group exposed in the acyclic form of the 2′-deoxyribose ring to tag the AP site with biotin. Following tagging, the DNA is enriched using streptavidin, recovered, and prepared for NGS, where AP sites are called and thus mapped to the genome. Since the epigenetic base modifications 5-fC and 5-formyluracil (5-fU) contain reactive aldehydes, and sometimes occur at a higher abundance than AP sites, snAP-seq includes an additional alkaline cleavage step in which only the AP sites (i.e. not the formylated bases) lead to DNA strand scission. This selection increases the specificity of the capture. 184 As such, the DNA fragments containing AP sites are released from the biotin pull-down and are recovered for NGS, whereas DNA fragments containing 5-fC and 5-fU remain immobilized. 72 More recently, Nick-seq was reported to use endonuclease IV (Endo IV), an AP endonuclease, to create new strand breaks at AP sites after blocking pre-existing breaks. Nick-seq is a versatile method that could generally be used for any DNA damage that can be converted into a single-strand break. The strand breaks originating from AP sites were captured at the 3′- and 5′-ends using two complementary strategies: nick translation with α-thio-dNTPs for 5′-end sequencing and terminal transferase tailing for 3′-end sequencing. The co-location of AP sites at both ends increases the sensitivity and specificity of the resulting map. As such, these AP-specific sequencing methods provide efficient approaches for further exploring the genome-wide biological impacts of AP sites.
AP sites have so far been mapped in a protozoan parasite (Leishmania m.), bacteria and human cell lines (HepG2 and HeLa). In the snAP-seq study, AP site distribution was investigated at single-nucleotide resolution in human cells that were either proficient or deficient in BER (APE1). No specific locations where AP sites might accumulate were identified, so it was concluded that AP site accumulation is stochastic in a cell population. On the other hand, when the data were binned to characterize genomic regions more broadly, APE1 was identified as especially proficient in repairing AP sites in regulatory and genic regions. This latter observation highlights a paradox: at single-nucleotide resolution it may be harder to draw conclusions because of the heterogeneity between individual cells, and lower-resolution analysis is often more informative.
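The binning step mentioned above can be illustrated with a minimal sketch. It assumes that single-nucleotide site calls are available as (chromosome, position) pairs; the function name `bin_sites`, the bin size and the example coordinates are hypothetical and are not taken from the snAP-seq study.

```python
from collections import Counter


def bin_sites(sites, bin_size):
    """Count damage sites per fixed-width genomic bin.

    sites: iterable of (chromosome, 0-based position) tuples.
    Returns a dict mapping (chromosome, bin_start) to the number of sites.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    counts = Counter()
    for chrom, pos in sites:
        # Integer division assigns each position to the start of its bin.
        bin_start = (pos // bin_size) * bin_size
        counts[(chrom, bin_start)] += 1
    return dict(counts)


# Hypothetical single-nucleotide calls (illustrative values only).
calls = [("chr1", 105), ("chr1", 980), ("chr1", 1003), ("chr2", 42)]
binned = bin_sites(calls, bin_size=1000)
```

Aggregating in this way trades positional precision for counts that are large enough to compare between regions or samples, which is the rationale for the lower-resolution analysis described above.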

Single-strand breaks
SSBs are one of the most common types of DNA damage. They can arise from oxidative stress, DNA repair failure, topoisomerase activity, or destruction of the sugar backbone. 185 SSBs can cause replication fork collapse, leading to more severe damage such as DSBs. 186 Three methods have therefore been developed to map SSBs: SSB-Seq, 187 SSiNGLe (single-strand break mapping at nucleotide genome level) 188 and GLOE-Seq. 189 In the SSB-Seq study, SSBs were tagged with digoxigenin-labeled nucleotides during a nick translation step with DNA polymerase I, followed by immunoprecipitation with an anti-digoxigenin antibody. 187 The results suggested that SSBs induced by topoisomerase II were primarily located in the promoter regions of genes in human cell lines.
The SSiNGLe method involves tagging the 3′-OH terminus of a DNA strand, which represents the position of an SSB, by adding a poly(dA) tail with a terminal transferase. Helicos Single Molecule Sequencing (SMS) and Illumina (ILM) platforms were used to map SSBs in genomic DNA. Extracted DNA is first fragmented by MNase, leaving 3′-phosphate ends that are not recognized by the terminal transferase. The poly(dA) tail is then either captured on a flow cell harboring oligo-dT chains for SMS, or amplified with oligo-dT primers before adapter ligation for Illumina sequencing. The distribution of SSBs was characterized in a variety of human and mouse cell lines and termed the breakome. The patterns of SSB distribution changed across cell types, as well as within the same cell type in response to anticancer drugs. Moreover, the breakome of peripheral blood mononuclear cells from patients correlated with age, which shows the close association between SSBs and aging-related disease states.
Similarly, GLOE-Seq takes advantage of the presence of the 3′-OH terminus, but introduces a new strategy to eliminate the poly(dA) tail and the possible repetitive-sequence limitation inherent to the Illumina platform. Thus, a biotinylated adaptor was ligated to the 3′-OH terminus of the SSB, followed by a biotin pull-down. The strength of this method is its applicability to double-strand breaks, Okazaki fragments and any type of DNA damage that can be converted into a nick or a gap with a free 3′-OH terminus. SSBs were mainly located on the leading strand, which was attributed to polymerase ε activity. The authors suggest that, since polymerase ε incorporates ribonucleotides, the main cause of SSBs is repair intermediates of ribonucleotide misincorporation.

Double-strand breaks
DNA DSBs are an ultimate fate of diverse forms of DNA damage. For example, DSBs arise indirectly from two closely located SSBs or during the repair of other DNA damage. DSBs are also formed due to replication fork collapse, which occurs during the replication of SSBs or damaged bases. 190 In addition, DSBs arise during transcription, meiosis, and replication stress. 191,192 Regardless of the etiology, DSBs occur at an estimated rate of 50 breaks per cell per day and, if unresolved, may lead to cell death or mutagenesis, including translocations, deletions and amplifications. 193 As such, numerous methods have been developed to sequence the location of DSBs in whole genomes.
First developed in 2013, the BLESS (breaks labeling and enrichment on streptavidin and sequencing) method involves the ligation of DSBs to a biotinylated linker. Following streptavidin enrichment, an additional linker is added, which allows PCR amplification and sequencing of DNA fragments containing the DSBs. 194 Follow-up methods to this basic ligation and enrichment strategy include Break-seq, 195 DSBCapture, 196 END-seq, 197 GUIDE-seq, 198 BLISS, 199 DSB-Seq, 187 and qDSB-Seq. 200 Notable improvements on the original method include mapping of DSBs in mice in vivo (i.e. END-seq) and increased sensitivity (i.e. DSBCapture, 196 END-seq, 197 and BLISS 199 ). For example, END-seq was demonstrated to be sensitive enough to detect a single DSB within a sample of 10 000 cells.
The most recently reported method, qDSB-Seq, uses site-specific endonucleases to create DSBs in a controlled manner and uses this material for quantitative normalization of the subsequent analysis. As such, the key innovation of qDSB-Seq is the combination of nucleotide-resolution mapping for localization with simultaneous quantification of DSBs. The coupling of damage localization with direct quantification is a very new concept that might play a central role when investigating dose-response relationships for DNA-damaging agents.
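The normalization idea can be sketched as a simple scaling calculation. This is a simplified illustration and not the published qDSB-Seq pipeline: it assumes that the frequency of the enzyme-induced DSB (breaks per cell at the cut site) has been measured independently and that read counts scale linearly with break frequency. The function name, parameter names and numbers are hypothetical.

```python
def dsbs_per_cell(reads_at_locus, reads_at_spike_in, spike_in_dsbs_per_cell):
    """Convert a read count at a locus into an estimated DSB frequency.

    reads_at_spike_in: reads mapped to the enzyme-induced (calibration) DSB.
    spike_in_dsbs_per_cell: independently measured frequency of that DSB.
    Assumes a linear relationship between reads and breaks (a simplification).
    """
    if reads_at_spike_in <= 0:
        raise ValueError("the calibration site must have at least one read")
    # DSBs per cell represented by a single sequencing read.
    dsbs_per_read = spike_in_dsbs_per_cell / reads_at_spike_in
    return reads_at_locus * dsbs_per_read


# Illustrative numbers only: the calibration site is cut in half of the cells.
estimate = dsbs_per_cell(reads_at_locus=80,
                         reads_at_spike_in=2000,
                         spike_in_dsbs_per_cell=0.5)
```

With these illustrative inputs the locus would carry roughly 0.02 DSBs per cell, i.e. about one break in every 50 cells.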

Ribonucleotides
Ribonucleotides, the building blocks for RNA synthesis, are also incorporated into nuclear DNA by several polymerases, including DNA polymerases, [201][202][203] RNA primase, 204 and PrimPol. 205 As a result, ribonucleotides are by far the most abundant noncanonical nucleotides present in eukaryotic cells. As an example, in excess of 1.3 million ribonucleotides are introduced into the nuclear genome at each cell division in mouse embryonic fibroblasts. 206 To maintain genome stability, ribonucleotides are quickly removed in multiple ways, including Okazaki fragment (OF) maturation, ribonucleotide excision repair (RER), topoisomerase 1 (TOP1) cleavage, and prokaryotic nucleotide excision repair. 207 Unrepaired ribonucleotides represent a major threat to genome integrity due to the reactive hydroxyl group at the 2′ position of the ribose sugar, 208 leading to strand breaks, 209 deletions, 210 cell cycle checkpoint activation, 211 abortive DNA ligation, 212 aberrant recombination, 206 and the formation of protein-DNA crosslinks. 213 Strategies to map ribonucleotides at single-nucleotide resolution are based on specific alkaline or enzymatic cleavage at ribonucleotides, which exploits the highly reactive 2′-hydroxyl group (Fig. 8). One strategy is to use Arabidopsis thaliana tRNA ligase (AtRNL) to ligate the 2′-phosphate termini of DNA derived from alkaline cleavage to the 5′-phosphate terminus of the same DNA strand, producing an ssDNA circle containing the embedded ribonucleotide (Ribose-seq). 214 Sequencing results using Ribose-seq in yeast show higher rNMP incorporation on the newly synthesized leading strand than on the lagging strand. This is consistent with results showing that the leading-strand DNA Pol ε discriminates against ribonucleotides with lower fidelity than the lagging-strand Pol δ. 214 Other sequencing methods include HydEn-seq, 215 which involves ligation of an adaptor directly to the free 5′-OH end, and Pu-seq, 216 which uses random hexamer primer extension to synthesize a flush end adjacent to the initial ribose. Another strategy, emRiboSeq, 217,218 uses RNase H2 to recognize and cleave at ribonucleotide sites, generating 3′-OH and 5′-phosphate groups. All four methods have been applied to budding yeast, including strains with mutant replication polymerases that introduce excess ribonucleotides into DNA.
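The leading- versus lagging-strand comparison described above can be sketched for an idealized case. The example assumes a single replication origin with two forks moving outwards and no termination zones or neighboring origins, and it assumes that each mapped ribonucleotide lies in the nascent strand. Under these assumptions the nascent leading strand is the "+" strand to the right of the origin and the "-" strand to the left. The function names and coordinates are hypothetical.

```python
def nascent_strand_class(position, strand, origin):
    """Classify a ribonucleotide in nascent DNA as 'leading' or 'lagging'.

    Idealized model: one origin, two diverging forks, no termination zones.
    Right of the origin the fork moves rightwards, so the continuously
    synthesized (leading) nascent strand is '+'; left of it, it is '-'.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    fork_moves_right = position > origin
    is_leading = (strand == "+") == fork_moves_right
    return "leading" if is_leading else "lagging"


def leading_fraction(sites, origin):
    """Fraction of (position, strand) sites assigned to the leading strand."""
    classes = [nascent_strand_class(pos, strand, origin) for pos, strand in sites]
    if not classes:
        raise ValueError("no sites given")
    return classes.count("leading") / len(classes)


# Hypothetical ribonucleotide calls around an origin at position 1000.
rnmp_sites = [(1500, "+"), (1800, "+"), (1700, "-"), (400, "-"), (300, "+")]
fraction = leading_fraction(rnmp_sites, origin=1000)
```

A fraction above 0.5 in wild-type cells would be in line with the leading-strand bias reported for yeast, although real analyses need genome-wide origin maps and normalization for base composition.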
To date, all applications of ribonucleotide sequencing have focused on leveraging the distribution of ribonucleotides as markers of replication enzymology. Error-prone synthesis by Pol α is retained in yeast and incorporates ribonucleotides into 1.5% of the mature genome. The different methods all obtained similar results concerning replication polymerases, and it appears that these methods are well suited to address broader biological or toxicological questions. For example, there are several open questions regarding the impact of ribonucleotide misincorporation and the influence of chromatin state, transcriptional activity and local sequence context on ribonucleotide distribution. Furthermore, future studies may be useful for understanding the genomic connections between embedded ribonucleotides and diseases related to RNase H2 mutation.

Uracil
Uracil in genomic DNA (dU) results from cytosine deamination or direct incorporation during DNA synthesis. Cytosine deamination leads to U:G mispairing, eventually leading to C>T mutations, whereas incorporation of dUTP instead of dTTP leads to A:U pairing, which is not directly mutagenic. 219 Uracil can also be repaired by uracil glycosylases (UDG enzymes) involved in BER, resulting in abasic sites and mutation. Therefore, several methods are aimed at locating uracils.
Excision-seq was the first method used to map uracil in genomic DNA. 107 Specifically, cleavage at the site of the uracil using UDG and endonuclease IV, coupled to Excision-seq, generated a single-nucleotide resolution map of uracil distribution. The authors observed that the distribution of uracil in the genome is tightly correlated with replication timing and hypothesized that it arises from changes in nucleotide pool composition during replication. This observation could be important for relating uracil distribution to mutational signatures in which a replication timing factor is also observed.
Surprisingly, Excision-seq has not been used further since its report in 2014, but lower-resolution methods have been developed, such as dU-seq, 220 UPD-seq 221 and U-DNA-seq. 222 Alternatively, a gap-ligation approach developed by Burrows et al. could in theory be used to map any lesion repaired by BER, but has not yet been used for whole genomes. 69 dU-seq uses UDG to remove uracil bases and replaces them with biotinylated nucleotides to yield pull-down fragments for sequencing. The authors found that uracil content was high at centromeres in the human genome. UPD-seq also uses UDG to remove uracil and then tags the resulting abasic sites with a biotin-containing chemical probe bearing an easily cleaved disulfide link (ssARP). With the UPD-seq strategy, the so-called uracilome was defined in a bacterial strain expressing an active human apolipoprotein B mRNA-editing enzyme catalytic subunit (APOBEC), allowing for correlation with the mutational footprint left by cytosine deamination enzymes.
Finally, U-DNA-seq involves the use of a uracil sensor (a mutant of the human protein UNG2, one of the BER glycosylases for uracil) to locate uracil, followed by immunoprecipitation and purification before sequencing of the uracil-enriched DNA. With this method, uracil distribution in human tumor cells was found to move from heterochromatin regions to euchromatin (transcriptionally active regions) upon chemotherapeutic treatment with raltitrexed and 5-fluoro-2′-deoxyuridine.

Conclusion
The field of DNA damage sequencing is new and advancing rapidly. A massive amount of data is available concerning landscapes of mutation in disease, but comparatively little is known about how the chemistry-driven process of DNA damage and its patterns of distribution in the genome shape them. The more insight we gain into damage distribution, with regard to both local sequence context and higher-scale genome arrangement, the better we will be able to link mutational signatures observed in human cancer to their etiological basis in a potentially prognostic manner. Thus, this review considers a broad spectrum of DNA damage sources, derived both from specific chemical modifications, such as carcinogen-DNA adducts, and from general forms of damage, such as strand breaks, with regard to the state of the art in how to sequence them and the emerging biological implications.
In the future, DNA damage maps are expected to be an important tool in the sequencing arsenal for studying mutagenesis, carcinogenesis, aging processes, and responsiveness to DNA-damaging drugs. However, sequencing DNA damage in a reliable and robust manner still requires significant work. Notably, most of the methods discussed here rely on a damage-specific step and therefore cannot be extended to different forms of DNA damage. This specificity is required because damage products are present at such low frequencies that DNA samples need to be enriched for analysis. For 8-oxodG, platinated DNA, UV damage, benzo[a]pyrene DNA damage and uracil, enrichment has been achieved using antibodies. Apart from such antibodies, which could be devised de novo, albeit with significant time and expense, other enrichment strategies such as specific excision/cleavage or chemical conjugation are applicable only to the particular forms of damage that they naturally target. Thus, specific direct excision of the damage, or cleavage next to the damage site to insert a probe/adapter, has been possible for 8-oxodG and UV damage, in addition to ribonucleotides and abasic sites. Finally, damage-specific chemical conjugation has been used to enrich 8-oxodG and abasic sites.
Enriched samples of damaged DNA fragments, obtained e.g. by antibody pull-down, could be sequenced directly after damage removal, or by using TLS polymerases to bypass the DNA damage. The sequencing resolution of this general approach depends on the basis of fragmentation and the resulting size of the fragments sequenced. In the case of tXR-seq and XR-seq, the fragments were released by NER enzymes and were short (20-mer). Thus, a single damaged nucleotide position could be identified because the damage was present exactly in the middle of the resulting oligonucleotides. With OxiDIP-seq and enTRAP-seq, however, the resolution depended on the size of the sonicated or digested DNA fragments, and was around several hundred base pairs. A significant advance toward single-nucleotide resolution was introduced by the cisplatin-seq, damage-seq and HS-damage-seq methods, namely a high-fidelity polymerase that stalls at the site of DNA damage during a single PCR round. While this high-fidelity polymerase strategy was effective for large adducts that reliably block synthesis, including platinated crosslinks, UV damage and BPDE-dG, such an approach is unlikely to work for smaller DNA damage such as 8-oxodG or O6-MedG. In this case, third-generation sequencing technologies (single-molecule real-time sequencing and nanopore sequencing), which do not require amplification and can detect small epigenetic DNA modifications or DNA damage, could potentially be useful. 64,[225][226][227][228][229][230][231] The second strategy, involving specific DNA strand cleavage or enzymatic excision of the damage, was also suited to mapping DNA damage independent of its size. This strategy was based on the recognition of DNA damage by nature's own recognition systems, such as Fpg for 8-oxodG, UVDE for UV damage, T4 endonuclease V for CPD damage, RNase H2 for ribonucleotides, or UDG for uracil. These methods created a nick or gap at the damage location, enabling the introduction of a sequencing adaptor/enrichment probe by ligation, as in Excision-seq, CPD-seq, all ribonucleotide sequencing methods, dU-seq or UPD-seq, or by click chemistry, as in click-code-seq. One of the largest shortcomings of these strategies was false-positive reads from background gap or nick sites generated, for instance, during DNA sonication. Excision-seq, Pu-seq, and HydEn-seq corrected for this by fragmenting genomic DNA directly with the cleavage treatment, thereby removing the sonication step. CPD-seq, emRiboSeq, Ribose-seq, click-code-seq and Nick-seq corrected for false-positive signals by blocking the pre-existing nicks or gaps with an adapter before excision/cleavage, and only then introducing a second adapter/ddNTP for specific amplification. Finally, as in the case of antibodies, enzymatic cleavage enrichment methods are also potentially limited with regard to specificity. Namely, the excision enzymes might not be specific to a single substrate, so the resulting map reflects the substrate scope of the enzyme rather than a single form of DNA damage. Nonetheless, such approaches are valuable for understanding the biological role of an enzyme in a controlled experiment with controlled exposures that generate DNA damage.
A third key strategy for damage sequencing has involved direct chemical conjugation of DNA damage with affinity probes, such as OG-seq for 8-oxodG, and AP-seq and snAP-seq for abasic sites. Damage-enriched DNA could be used directly for sequencing with ~100 bp resolution (OG-seq, AP-seq) or fragmented at damage sites by alkaline treatment in order to mark the single-nucleotide position (snAP-seq). Chemical conjugation can also pose a problem of specificity, as the probe might react with other aldehyde groups present on DNA, such as those arising from 2-deoxyribose oxidation.
Despite the impressive obstacles that have been overcome with damage sequencing, significant limitations remain. One of these is the difficulty of determining both the quantity and the location of DNA damage, which is critically needed to evaluate dose-response relationships for DNA-damaging agents. So far, only qDSB-Seq has been able to accurately quantify the number of DSBs simultaneously with their location, by using calibrated samples that have a known amount of DSBs. Additionally, when working with large genomes such as the human or mouse genome, billions of reads need to be sequenced to draw robust biological conclusions for specific genes. Finally, bioinformatics pipelines have been developed for each of the strategies, integrating key aspects such as data normalization; however, there are no standardized pipelines comparable to those for more common sequencing analyses such as variant calling. Despite these aspects that are still under development, damage sequencing has already offered the opportunity to understand the dynamics of damage formation and repair at the genome-wide level in a variety of organisms.
Given the difficulties of uncovering discrete chemical reactions on the scale of the genome, it is astounding that there are finally several possibilities to use sequencing methods to address the distribution of chemically induced DNA damage, from the level of local sequence context to the influence of transcription factors and higher-order chromatin structures. Almost all of these data have illuminated that there is a heterogeneous distribution of DNA damage in the genome, resulting from the combined dynamics of damage formation and repair activity, both of which are influenced to varying extents by sequence context, DNA-protein binding sites and nucleosome positioning. For instance, coupling of damage sequencing with nucleosome location data has revealed how the rotational setting of DNA wrapped around histones influences damage formation and repair. Additionally, DNA damage such as 8-oxodG in gene promoters has been found to have a regulatory role in gene expression. As DNA damage distribution may be an early event dictating potentially adverse effects, the anticipated future development of diverse DNA damage maps is expected to help us understand and better predict the etiologies of mutational landscapes in human cancer genomes and other biological processes driven by genome instability.

Conflicts of interest
There are no conflicts to declare.