Manjula Jaisal‡, Rajesh Kumar Reddy Sannapureddi‡, Arjun Rana and Bharathwaj Sathyamoorthy*
Department of Chemistry, Indian Institute of Science Education and Research, Bhopal 462066, India. E-mail: bharathwaj@iiserb.ac.in
First published on 5th December 2022
DNA epigenetic modifications such as 5-methyl (5mC), 5-hydroxymethyl (5hmC), 5-formyl (5fC) and 5-carboxyl (5caC) cytosine have unique and specific biological roles. Crystallographic studies of 5mC containing duplexes were conducted in the A-, B- or the intermediate E-DNA polymorphic forms. 5fC-modified duplexes initially observed in the disputed F-DNA architecture were subsequently crystallized in the A-form, suggesting that epigenetic modifications enable DNA sequences to adopt diverse conformational states that plausibly contribute to their function. Solution-state studies of these modifications were found in the B-DNA form, with marked differences in the conformational flexibility of 5fC containing duplexes in comparison to C/5mC containing duplexes, compromising the DNA duplex's stability. Herein, we systematically evaluate sensitive and commonly inaccessible NMR parameters to map the subtle differences between C, 5mC, and their oxidized (5hmC/5fC) counterparts. We observe that 15N/1H chemical shifts effectively report on the weakening of 5fC–G Watson–Crick base-pair H-bonding, extending the instability beyond any achievable within the sequence-specific changes in DNA. Triple 5fC containing sequences propagate the destabilization farther from the site of modifications, explaining reduced duplex stability upon multiple modifications. Additionally, scalar and residual dipolar coupling measurements unravel local sugar pucker fluctuations. One-bond 13C–1H scalar coupling measurements point towards a significant deviation away from the anticipated C2′-endo pucker for the 5fC modified nucleotide. Structural models obtained employing 13C–1H residual dipolar couplings and inter-proton distances corroborate the sugar pucker's deviation for 5fC modified DNA duplexes. The changes in the sugar pucker equilibria remain local to the 5fC modified nucleotide sans additive/long-range effects arising from multiple contiguous modifications. 
These observations highlight the impact of a major groove modification that alters the physical properties of the DNA duplex without disturbing the Watson–Crick face. The changes observed in our studies for the 5fC containing DNA contrast with the perturbations induced by damage/lesion, highlighting the varied conformational preferences that modified nucleobases impart to the DNA duplex. As sequence-specific DNA transactions are rooted in the base-pair stability and pucker deviations, the observed structural perturbations for 5fC-modified DNA potentially play critical functional roles, such as in protein–DNA recognition and interactions.
Cytosine modifications in the major groove retain the conventional Watson–Crick hydrogen (H-)bonding pattern (Fig. 1A). X-ray crystallographic studies of singly hemi-modified 5mC/5hmC/5fC in the CpG step of palindromic Drew-Dickerson dodecamer duplex DNA (5′-CGCAATTNGCG-3′, referred to as DDDN, N = 5mC/5hmC/5fC modification, indicates the N-G pair) showed minimal perturbation from the B-DNA architecture.33–35 5mC incorporated in a G-C base-pair rich palindromic hexamer d(5′-GGCC-3′)2 was crystallized in an intermediate E-DNA form with bases being perpendicular to the helical axis (B-form like) while the sugars sample the A-form-like C3′-endo pucker.36 The metastable E-DNA eventually equilibrates under crystallographic conditions to the A-DNA form.36 On the other hand, the triply 5fC modified palindromic dodecamer sequence (5′-CTATAG-3′, referred henceforth as DNAF3, Fig. 1A) was crystallized in a form that alters the hydration pattern stabilizing propeller twist and base-pair opening parameters, which appeared to differ significantly from A- and B-DNA forms, and hence led to a newly proposed class of architecture called the F-DNA.37 Such an observation correlated with differences in the circular dichroism (CD) signatures of DNAF3 compared to the unmodified DNA (DNAcontrol, Fig. 1A), in line with in silico modeling that predicts that the helical under-winding traps water molecules stabilizing the proposed F-DNA form.38 However, a subsequent study showed that structures of both DNAF3 and DNAcontrol sample the A-DNA form with no significant differences in the spatial arrangement of heavy atoms.39 Previously reported differences in CD signatures between DNAF3 and DNAcontrol were attributed to potential changes in the local electronic transition dipole moment rather than to global structural perturbations of the DNA duplex.39 Hence, the question follows whether the structure observed in the crystal form is retained or differs under solution-state conditions.
Solution-state 1H-based nuclear magnetic resonance (NMR) studies of DNAF3 substantiated that the 5fC modification maintained the B-DNA form, as adjudged from the inter-proton distance and 1H–1H scalar coupling measurements.39 Interestingly, this study hinted at a deviation from the C2′-endo pucker only for the 5fC-modified nucleotides. Imino 1H-exchange NMR experiments performed on hemi-modified DDDN (N = 5mC/5hmC/5fC) samples showed increased base-pair opening rates for 5fC compared to the unmodified duplex suggesting subtle differences in their conformational landscape.35 Single-molecule fluorescence-based DNA cyclization assays revealed that 5fC modification imparts enhanced flexibility compared to unmodified cytosine-containing duplexes, while 5mC rigidifies the duplex.40 Steady-state and time-resolved infrared spectroscopy showed that 5fC in DNAF3 increases base-pair fluctuations reducing the cooperativity of duplex formation and thereby increasing the double-strand dissociation rate constant.41 The weakening of the duplex was attributed to the reduced pKa of the N3 nitrogen atom in 5-formyl modified cytosine that accepts the proton from the pairing guanosine nucleobase (Fig. 1A).42,43 Recently, solution-state 1H-based relaxation dispersion measurements have demonstrated an increase in the population of the single-stranded form for the 5fC containing DNA duplex44 (5′-GCGATGATCGC-3′). Additionally, it was reported that the destabilization propagates across the DNA duplex beyond the single 5fC–G fully modified base-pair. These observations suggest that 5fC modification might not alter the structure as much in comparison to cytosine or 5mC, but may interfere with the conformational fluctuations due to its unique chemical properties.
While the effect of a single site modification has been characterized, the influence of multiple contiguous modifications on DNA duplex structure is yet to be explored. Additionally, cytosine nucleotides are known to exhibit enhanced sugar puckering dynamics in comparison to other canonical nucleotides, catering towards sequence-specific recognition.45,46 Therefore, a question arises whether these modifications alter such specific conformational dynamics of DNA duplexes, and whether there can be more NMR probes for measuring the same. Also, we sought to compare the destabilization/fluctuations achieved by the 5fC–G pair to what is achievable within the canonical C–G framework without modifications by only altering the primary sequence. In this study, we present NMR probes to understand the effect of single and multiple cytosine modifications (5mC, 5hmC, and 5fC) on the global structure and dynamics of DNA duplexes using solution-state NMR spectroscopy. Additionally, using these parameters we probe the presence/absence of differential sugar puckering of 5fC-containing duplexes.
Heteronuclear 13C/15N chemical shifts,47–50 scalar couplings,51,52 and partial anisotropic parameters, such as residual dipolar couplings53–57 (RDCs), are sensitive in characterizing conformational properties of DNA duplexes.58 RDCs provide the relative orientation of bonds across the molecule and thus improve the definition of the global structure of DNA duplexes, which otherwise evades conventional characterization that employs inter-proton distances and 1H–1H scalar coupling measurements. Structural perturbations have been well characterized using RDCs for DNA duplexes comprising A-tracts,55 nucleotides with a locked sugar pucker,56 and the N1-methyladenine57 (m1A) modification. In particular, the damage modification m1A present in duplexes results in bending of the helical axis and contributes to local base-pair melting, suggesting a pre-primed bent DNA for effective protein recognition toward damage repair.57
In this work, we employ an optimized sparse sampling methodology that reduces overall measurement times of two-dimensional NMR data by 75%, thus making it possible to measure heteronuclear (13C/15N) shifts and RDCs robustly at low concentration (∼100 μM) in natural isotopic abundance samples (ESI†). Application of the optimized methods reveals that 15N imino chemical shifts of the paired guanosine are sensitive to the weakening of the H-bond for 5fC modified duplexes in comparison to DNAcontrol. The triply 5fC modified sample (DNAF3, Fig. 1A) shows a weakening of H-bonds farther than the singly modified samples (DNAF6/F8, Fig. 1A) indicating propagation of base-pair destabilization. At the same time, no discernable effect is observed for the 5mC/5hmC analog. One-bond 13C–1H scalar coupling (1JCH) measurements for sugar C1′–H1′ bonds point towards deviation from the C2′-endo pucker confined to the 5fC modified nucleotides. Structural models, obtained by employing inter-proton distances and one-bond 13C–1H residual dipolar couplings (RDCs, 1DCH), indicate that 5fC modified nucleotides’ sugar moiety samples conformations away from the C2′-endo pucker, while C, 5mC, and 5hmC containing DNA duplexes do not display any appreciable excursions. Such sugar pucker perturbations are localized to 5fC modified sites, with no additive effect arising from multiple modifications next to each other. The results highlight the impact that conformational changes due to 5fC incorporation may potentially have on protein–DNA recognition.
Firstly, the G5 imino chemical shift (associated with C8–G5 pairing) in DNAM8/H8/F8 was examined to probe the influence of modifications solely on the base pairing. G5–N1/H1 resonances shift upfield by ∼0.8/0.4 ppm and ∼0.3/0.1 ppm for 5fC and 5hmC, respectively, in comparison to unmodified C, while 5mC shows marginal downfield shifts of 0.05/0.05 ppm (Fig. 2A, B and Table S1, Fig. S3, ESI†). It is evident that amongst the C–G pairs, modification with 5fC shifts both G–N1/H1 resonances significantly, in contrast to the control and the other epigenetic modifications. The electron donating/withdrawing characteristics of the CH3, CH2OH, and CHO functional groups present in modified cytosine are correlated to the direction of the imino 1H chemical shift perturbation (CSP). A chemical modification on the C alters the C[N3]–G[N1] H-bond distance, which in turn causes deshielding/shielding of the G–N1/H1 spins, affecting the CSP relative to the unmodified cytosine.48,59 A longer hydrogen bond increases the shielding of the imino group, whereas a shorter one increases its deshielding. Consequently, the imino resonances of the paired G–N1/H1 are upfield shifted for 5fC/5hmC and downfield shifted for 5mC in comparison to unmodified C. Prior computational studies predict a correlated change in G–N1 and G–H1 chemical shifts due to the weakening of the C–G base pair upon chemical modification of the cytosine base.48
Having assessed the effect of cytosine modification on base pairing, the next step is to quantify the changes that may arise due to the stacking of a chemically altered base on the 5′- and 3′-neighbors. The G7–N1/H1 resonances in DNAN8 (5′-end neighbor of C8, Fig. 2 and Fig. S3, ESI†) are downfield shifted by 0.15/0.06 ppm for 5fC, while a negligible change is observed for 5m/5hmC (Table S1 and Fig. S3, ESI†), suggesting either a ring current or a stacking change (or both) only for the 5fC modification. These measurements help interpret the chemical shift perturbation for DNAN6 modifications, wherein a mere arithmetic sum of H-bonding and ring current effects would indicate no appreciable difference between single hemi-modified (i.e., DNAN8) and single fully modified (i.e., DNAN6) cases. The magnitude and directionality of G–N1/H1 chemical shift perturbation for the C6–G7 pair in DNAN6 are in line with the observation for C8–G5 in DNAN8 sequences across all modifications (N = M/H/F). Such an observation suggests that base pairing affects the chemical shifts more significantly than the modified ring current. Importantly, the G7–N1/H1 shifts in DNAN6 (for all modifications) show a simple arithmetic sum of chemical shift perturbation due to H-bonding and 3′ neighbor effect, indicating no significant structural changes from single hemi-modified to single fully modified systems (Table S2, ESI†).
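The additivity argument above can be written out as a short calculation. The following is a minimal sketch, assuming the approximate 5fC perturbations quoted in the text and treating the H-bonding and neighbor contributions as independent; the function names and the idea of a per-nucleus tolerance are illustrative assumptions, and the observed DNAF6 values (Table S2, ESI†) are not reproduced here.

```python
# Chemical shift perturbations (ppm) relative to the unmodified duplex,
# written as (15N, 1H); negative = upfield, positive = downfield.
# Approximate 5fC values quoted in the text.
HBOND_CSP_5FC = (-0.8, -0.4)     # G paired to 5fC (G5 in DNAF8)
NEIGHBOR_CSP_5FC = (0.15, 0.06)  # G neighboring 5fC (G7 in DNAF8)


def additive_csp(*contributions):
    """Sum per-nucleus CSP contributions, assuming they are independent."""
    return tuple(round(sum(values), 3) for values in zip(*contributions))


def is_additive(observed, predicted, tolerance):
    """True if each nucleus matches the additive prediction within tolerance (ppm).

    `tolerance` is a hypothetical per-nucleus uncertainty chosen by the user.
    """
    return all(abs(o - p) <= t for o, p, t in zip(observed, predicted, tolerance))


# Additive expectation for a G that is both paired to and neighboring 5fC.
predicted_g7 = additive_csp(HBOND_CSP_5FC, NEIGHBOR_CSP_5FC)
print(predicted_g7)
```

A measured (15N, 1H) perturbation can then be passed to `is_additive` together with the prediction; a mismatch larger than the chosen tolerance would hint at a structural change beyond the two summed effects.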
Next, the question arises whether single versus multiple modifications cause any differential effects on the DNA duplex. Like the observation in DNAN6 systems, G5–N1/H1 and G7–N1/H1 chemical shift changes in DNAN3 (for all modifications) are simple arithmetic sums of the single fully (6th position) and hemi-modified (8th position) chemical shift changes. The only exception is observed with the magnitude of the G9–N1/H1 chemical shift change that arises due to inherent differences in the dinucleotide step (A vs. G). Noticeably, in DNAF3, the T10–N3/H3 and T2–N3/H3 nuclei experience a significant upfield shift by 0.25/0.13 ppm and 0.11/0.03 ppm, suggesting a weakening of pairing that is two base pairs away from the site of 5fC modification (Fig. 2A and Table S1, Fig. S3, ESI†) for the triply 5fC modified system. This observation is in agreement with complementary infra-red41 and NMR44 experiments, where the rate of duplex association is markedly reduced while that of dissociation is increased upon 5fC incorporation.
It is intriguing to comprehend the implications of the upfield shift of imino resonances of 5fC–G pairs in the context of the DNA duplex structure. Comparison of the measured shifts for the imino resonances of C–G pairs across primary sequence contexts would yield insights into how the 5fC–G pair differs from the canonical unmodified C–G pair. This was carried out by generating DNA samples consisting of trinucleotide steps in the non-terminal regions of dodecamer duplexes with C–G being the middle base pair (i.e., 5′-XY-3′ paired “•” with 5′-Y′X′-3′) flanked by canonical Watson–Crick pairs (X–X′ and Y–Y′). The first nearest neighbors to the C–G pair on both 5′- and 3′-ends were sampled across all possible trinucleotides (X/X′/Y/Y′ = A/T/G/C) resulting in 16 combinations, with a minimum of four replicates for each combination (unpublished data). The average G–N1/H1 chemical shift for all C–G pairs is observed to be 146.9/12.75 ppm (110 data points, Fig. 2C), agreeing well with the data obtained for DNAcontrol. The 5mC and 5hmC modified G–N1/H1 resonate at 147.0/12.87 ppm and 146.7/12.79 ppm, respectively, with 5 data points each across DNAM#/H# (Fig. 2B). Interestingly, for the 5fC modified base-pair, G–N1/H1 are well resolved from the entire cluster of C–G canonical pairs and resonate at 146.2/12.51 ppm (5 data points across DNAF#) – upfield shifted in both 15N and 1H dimensions (Fig. 2C). The significant average upfield shift for G–N1/H1 paired to 5fC in comparison to 5mC/5hmC and the entire C–G cluster indicates that the destabilization achieved for C–G upon formylation is beyond the scope that is achievable for any given trinucleotide primary sequence of DNA. This is an important observation given the fact that C–G pairs tend to impart stability to the DNA duplex in comparison to A–T pairs. 
The 5fC modification, in contrast, relaxes this property and contributes to the necessary level of destabilization beyond the scope achievable from the primary sequence, yet suitably retaining the Watson–Crick pairing that is essential for biomolecular processes.
Amino 1H spins present in the cytosine nucleobase (C–H41/H42) also corroborate the above observations. 1H chemical shifts of C–H41, which is also involved in the formation of Watson–Crick H-bonding, are relatively downfield shifted at the 5m/5hm/5fC nucleotide position. On the other hand, the chemical shift of C–H42 experiences an upfield shift for 5mC (0.30–0.40 ppm) and 5hmC (0.10–0.14 ppm), while 5fC modification results in a significant downfield shift (∼1.5 ppm). This observation supports the formation of an intranucleobase H-bond between the formyl group's carbonyl oxygen (CO) and the amino proton (H42) of 5-formyl cytosine.60 This intramolecular H-bonding of 5fC restricts the formyl substituent conformation and hence forces it to be in plane with the cytosine aromatic ring, consistent with the previous reports.35 The small magnitude of chemical shift perturbation for 5m/5hmC indicates that these bases do not form this type of H-bond (CHO H-bond for instance), with prior crystallographic studies involving 5hmC containing DNA providing evidence that the orientation of CH2OH precludes such intramolecular H-bond formation with C(H42).34 Such an intramolecular H-bond excludes the interaction of water molecules at this site, which is otherwise available with the CH3 and CH2OH modifications.61
Following the characterization of 15N/1H imino/amino shifts, changes in 13C/1H were pursued for the aromatic base [C–C6/H6 and G–C8/H8]. As anticipated, the CSP of C–C6/H6 was highest for the modified base due to the change in the functional group present in the 5th position, with an upfield shift (3.3/0.2 ppm) for 5mC and downfield shifts for 5fC (13.3/0.9 ppm) and 5hmC (1.7/0.04 ppm) (Fig. S3, ESI†). Importantly, G5–C8/H8 in DNAF8 (5fC pair) experiences a downfield shift of 0.3/0.04 ppm (Fig. S3, ESI†), sensing the weakening of 5fC–G H-bond strength propagated by the aromaticity of the nucleobase. Next, 13C–C8 CSP of G7 in DNAN6 samples was analyzed to probe for any effects that may arise due to single contiguous modifications in the DNA duplex, versus a hemi-modified case (DNAN8). We observe a simple arithmetic sum of the H-bonding and ring-current changes manifested by the 5′/3′-neighbor (as adjudged from DNAN8) for all the cytosine modifications (Table S2, ESI†), without any exceptions. This suggests that the modifications do not confer any additive effect in terms of structural perturbations beyond the site of change. A similar observation is made when comparing 13C-C8 CSP of G5, G7, and G9 for DNAN3 samples, potentially indicating minimal changes along the major groove of the DNA duplex due to multiple contiguous modifications present in the system. Like aromatic 13C–H chemical shift perturbations, the furanose ring was most affected for the modified bases, with 5fC–C1′/H1′ nuclei experiencing the highest magnitude of 0.7–0.9/∼0.02 ppm (Fig. S3, ESI†). Although C1′ shifts report on sugar pucker equilibria,49,62 their interpretation in this case is complicated by the strong influence of ring current effects. Thus, furanose 13C/1H shifts are not further interpreted.
Beginning with the DNAcontrol system, we observe that the position of the cytosine in the sequence influences the magnitude of the one-bond C1′–H1′ scalar coupling (1JC1′–H1′). For instance, 1JC1′–H1′ for the cytosine nucleotide in the RG (R = purine, A or G) trinucleotide step is found to be ∼166 Hz, while that for 5′-T (cytosine positioned at the 5′-end of the DNA strand) averages ∼172 Hz. This is expected as conformational degrees of freedom allow the 5′-terminal cytosine to sample a broader range of puckers and glycosyl torsion angles. No significant difference in 1JC1′–H1′ (Δ1JC1′–H1′, relative to DNAcontrol) is observed for all nucleotides present in DNAM# and DNAH# within the measurement uncertainty (±2 Hz) (Fig. 3A). On the other hand, 1JC1′–H1′ for singly modified 5fC6 (in DNAF6) and 5fC8 (in DNAF8) increases by 5–6 Hz, while the unmodified cytosine nucleotides within these samples show no change (Fig. 3A). All 5fC-modified nucleotides in DNAF3 also exhibit an increase of 3–6 Hz (Fig. 3A). No significant changes were observed for aromatic 13C–1H 1JCH (adenine C2–H2, pyrimidine C6–H6, purine C8–H8), indicating the reliability of the scalar coupling measurements (Fig. S4, ESI†). An increase in 1JC1′–H1′ indicates a deviation from the C2′-endo sugar pucker as predicted from a computational study involving ribose sugars for a given anti glycosyl dihedral angle, with C3′-endo being predicted to have a coupling of 178 Hz, a 10 Hz increase over the C2′-endo pucker.52 NMR data analysis across 2D spectra (NOESY, HMQC, and HSQC) of 5fC modified DNA (DNAF#) rules out any evidence of 5fC/G syn orientation. Hence, the increased 1JC1′–H1′ of 5fC potentially arises due to the shift in sugar pucker equilibrium away from C2′-endo and plausibly subtle changes in the glycosidic dihedral angle.52,66,67
The 1JCH magnitude is also influenced by the 13C–1H bond distance.66 Formyl, being an electron-withdrawing group, might affect the bond lengths of base C6–H6 and furanose C1′–H1′ due to the resonance effect in aromatic rings. Although C6–H6 chemical shifts are most affected by C5 modifications of cytosine, Δ1JC6–H6 for all nucleobases (including modified cytosine) remains within ±2 Hz across all systems (DNAN#, Fig. S4, ESI†). Moreover, if it were the bond distance that caused a change in 1JC1′–H1′, then the magnitude of change would have remained constant irrespective of the position and across samples (i.e., DNAF6/F8 and DNAF3). The mere fact that Δ1JC1′–H1′ for 5fC in the sixth position differs between DNAF6 (∼6 Hz) and DNAF3 (∼3 Hz) suggests that the change in scalar coupling is not due to bond-distance changes. Additionally, a comparison of high-resolution (∼1 Å) crystal structures of the cytosine nucleotide (BOXGIE, CCDC 114593) and 5fC (RAKLOG, CCDC 843055) showed no substantial increase in the C1′–H1′ bond length, supporting the view that the change is not due to a change in bond length but due to other structural factors (pucker and glycosyl dihedral angle).
To put things in perspective regarding scalar coupling measurements, similar data were measured for cytosine present across trinucleotide repeats and in the 5′/3′ termini of duplex DNA (unpublished data). Cytosine in the 5′-terminus, observed for 5′-G and 5′-T, gives 1JC1′–H1′ of 169.8 ± 0.8 and 172.6 ± 0.5 Hz, respectively, while the 3′-terminal G-3′ displays an average of 167.3 ± 1.3 Hz (Fig. 3B). Cytosine penultimate to the 5′/3′-termini shows a reduction (166.7 ± 1.1 for 5′-GC and 166.0 ± 1.1 Hz for TG-3′) in the magnitude with respect to the termini by 1–3 Hz. Similar measurements across the RR, RY, YR, and YY (where R = purine and Y = pyrimidine) trinucleotide steps within the “core” of the duplex resulted in 166.4 ± 1.0, 167.6 ± 1.5, 165.8 ± 1.2, and 168.2 ± 2.0 Hz, respectively, with the highest magnitude and spread of measured scalar couplings for the YY (Fig. 3B) step. The observations are thus consistent with the fact that the cytosine nucleotide tends to sample a larger conformational pool68 depending on the available degrees of freedom, with measurements reflecting the same. The increase in 1JC1′–H1′ by 3–6 Hz suggests that 5fC modification to the RG step makes it behave like the YY step, the most conformationally flexible trinucleotide present.
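The comparison with the core trinucleotide steps can be illustrated numerically. The sketch below is not the analysis performed in this work; it simply assumes that each step class can be summarized by the mean and standard deviation quoted above, and ranks the classes by how many standard deviations a given coupling lies from each mean. The function names are hypothetical.

```python
# Mean and standard deviation (Hz) of the one-bond C1'-H1' coupling of
# cytosine in each core step class, as quoted in the text.
CORE_STEP_COUPLINGS = {
    "RR": (166.4, 1.0),
    "RY": (167.6, 1.5),
    "YR": (165.8, 1.2),
    "YY": (168.2, 2.0),
}


def z_scores(coupling_hz, reference=CORE_STEP_COUPLINGS):
    """Deviation of a coupling from each class mean, in units of that class's SD."""
    return {step: (coupling_hz - mean) / sd for step, (mean, sd) in reference.items()}


def closest_step(coupling_hz, reference=CORE_STEP_COUPLINGS):
    """Step class whose distribution lies nearest to the given coupling."""
    scores = z_scores(coupling_hz, reference)
    return min(scores, key=lambda step: abs(scores[step]))


# An RG-step cytosine at ~166 Hz raised by 3 to 6 Hz upon 5fC modification.
for shifted in (166.0 + 3.0, 166.0 + 6.0):
    print(shifted, closest_step(shifted))
```

With these inputs both shifted values fall nearest to the YY class, which matches the qualitative statement above; the ranking is only as reliable as the assumption that the quoted means and spreads describe each class adequately.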
To further validate the results obtained from 1JC1′–H1′, control experiments were performed with ribose sugars in a non-palindromic DNA duplex (Fig. 3C, reference “Chi” system) anticipated to force pucker equilibria away from C2′-endo.69,70 In this sequence, ribose sugars were strategically positioned to increase the population of the C3′-endo pucker on the cytosine nucleotide (C7). Positioning the ribose sugar in A6 (Fig. 3D, Chi6) results in an increase in 1JC1′–H1′ of ∼7 Hz, accompanied by a decrease in ΣH1′ (H1′–H2′′) of ∼7 Hz (Fig. 3E), indicating the pucker equilibria shifting towards C3′-endo. This is validated by ribose sugar modification for Chi at positions C7 (Chi7), A6 and C7 (Chi6,7), and C4–A9 (Chi4–9), where 1JC1′–H1′ of C7 increased by 7–12 Hz (Fig. 3F), and by the disappearance of the H1′–H2′ cross peak in the DQF-COSY spectrum. Hence, a change in 1JC1′–H1′ for 5fC modified nucleotides indicates puckering away from C2′-endo by a small yet significant degree.
Measuring RDCs and correlating the measured values across DNAcontrol and modified systems (DNAN#) would aid in characterizing any global bending that may be present upon cytosine modification. To start with, a good RDC agreement (Pearson's coefficient of R2 ∼ 0.95 and RDC RMSD ∼ 1.2 Hz, Fig. 4A) was observed for concentrated (2.7 mM, uniform Nyquist NMR data sampling and conventional Fourier transform processing) and diluted (500 μM, 25% sparse sampling and compressed sensing processing) DNAcontrol samples indicating that the sparse methodology for limited concentration samples works as efficiently (within the experimental uncertainty of ∼2 Hz) as the routinely employed conventional methods.
RDCs measured for 5mC and 5hmC modified samples (DNAM# and DNAH#) correlate well with DNAcontrol (R2 in the range of 0.86–0.91 and RMSD < 2 Hz, Fig. S9, ESI†), indicating similarity in their overall structure. Strikingly, significant RDC differences are observed for DNAF6 and DNAF3 (R2 0.75–0.80, RMSD 3.0–3.5 Hz, Fig. 4B and D) but within the experimental uncertainty for DNAF8 (R2 0.88, RMSD 2.0 Hz, Fig. 4C) pointing at differences between single hemi-modified (DNAF8) and single fully modified (DNAF6) systems. Noticeably, 5fC–C1′–H1′ RDC is the only data point (indicated in pink color in Fig. 4B and D) that deviates by 6–10 Hz reduction in the correlation plot. Removal of these 5fC C1′–H1′ RDC outliers improves the correlation (R2 ∼ 0.90, RMSD′ < 2 Hz, Fig. S9, ESI†), implying only a change in the local structure for DNAF6/F3 with no apparent helical bending that is any different from DNAcontrol.
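The correlation analysis described above, including the effect of removing a single outlying bond vector, can be sketched as follows. This is a minimal illustration assuming RDCs are stored per bond vector; the data are hypothetical placeholders rather than values measured in this work, and the function names are assumptions.

```python
import math


def pearson_r2(x, y):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def rmsd(x, y):
    """Root-mean-square deviation between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def compare_rdcs(reference, modified, exclude=()):
    """R^2 and RMSD (Hz) over bond vectors shared by both samples, skipping `exclude`."""
    shared = [k for k in reference if k in modified and k not in exclude]
    ref = [reference[k] for k in shared]
    mod = [modified[k] for k in shared]
    return pearson_r2(ref, mod), rmsd(ref, mod)


# Hypothetical RDCs (Hz) keyed by bond vector; NOT values from this work.
control = {"C4 C1'-H1'": 6.0, "G5 C8-H8": 18.0, "C6 C1'-H1'": 13.0,
           "G7 C8-H8": 21.0, "T10 C6-H6": 15.0}
# Same set with only the modified-site C1'-H1' RDC reduced.
formyl = {**control, "C6 C1'-H1'": 6.0}

r2_all, rmsd_all = compare_rdcs(control, formyl)
r2_trim, rmsd_trim = compare_rdcs(control, formyl, exclude=("C6 C1'-H1'",))
print(r2_all, rmsd_all, r2_trim, rmsd_trim)
```

If the agreement recovers once the modified-site vector is excluded, as in this toy data set, the difference between the two samples is local to that vector rather than a global change in alignment or helical shape.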
The RDC measurement also helps rule out the possibility of C–H bond length changes for the C1′–H1′ bond vector. A back-of-the-envelope calculation suggests that a ∼6 Hz decrease in RDC (given an alignment and B-DNA structure for DNAcontrol and DNAN#) requires an increase of ∼0.25 Å in the C1′–H1′ bond length, which is rather unlikely. The 5fC selective deviations corroborate the ∼6 Hz increase in 1JC1′–H1′, suggesting a local structural perturbation induced by 5fC, plausibly due to changes in sugar pucker equilibria away from the canonical C2′-endo conformation for B-DNA.
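The back-of-the-envelope estimate above follows from the inverse-cube dependence of the dipolar coupling on the internuclear distance. The sketch below assumes a fixed alignment and bond orientation, a starting C1′–H1′ RDC of 13 Hz, and a typical C–H bond length of 1.09 Å; both numbers are illustrative assumptions rather than values taken from this work, and the function name is hypothetical.

```python
def bond_length_for_rdc_change(rdc_hz, rdc_change_hz, bond_length_angstrom=1.09):
    """Bond length (angstrom) needed to explain an RDC change purely by distance.

    Uses D proportional to r**-3 with alignment and orientation held fixed,
    so r_new = r_old * (D_old / D_new) ** (1/3).
    """
    new_rdc = rdc_hz + rdc_change_hz
    if rdc_hz == 0 or new_rdc / rdc_hz <= 0:
        raise ValueError("RDC must keep its sign and be non-zero for a pure distance effect")
    return bond_length_angstrom * (rdc_hz / new_rdc) ** (1.0 / 3.0)


# Assumed 13 Hz RDC reduced by 6 Hz, starting from an assumed 1.09 angstrom bond.
new_length = bond_length_for_rdc_change(13.0, -6.0)
increase = new_length - 1.09
print(round(new_length, 3), round(increase, 3))
```

Under these assumptions the required stretch comes out at roughly a quarter of an ångström, far larger than any chemically plausible change in a C–H bond, which is why a distance-only explanation is set aside.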
It is pertinent to note here that the magnitude of terminal 5′-T C1′–H1′ RDCs is in the range of −5 to −8 Hz across DNAcontrol and DNAN# samples (Table S1, ESI†). This scenario yet again highlights that 5fC alters the local structure in terms of pucker and glycosyl dihedral angle for the RG step; however, it does not make it as flexible as the terminal cytosine nucleotides.
Next, the characterization of the structures sampled by DNAcontrol and DNAN# was pursued using inter-proton distances and RDCs as constraints. As the number of measurements/constraints is small relative to the total number of degrees of freedom available for nucleic acids,45 the aim here was to avoid overfitting the NMR data yet obtain a (low-resolution) conformational model for DNAcontrol and DNAN# that may highlight any differences in the DNA duplex upon modification. Also, as the modifications are in the major groove with no effect on Watson–Crick pairing, the unmodified cytosine nucleobase was refined against the measured NMR parameters for each of the DNAN# modified sequences. Thus, the measured data (inter-proton distances and RDCs, Table S3, ESI†) were supplied to refine structures initialized from “idealized” B-DNA geometry using the XPLOR-NIH structure refinement program77 (see Experimental methods).
Upon refinement, DNA systems studied (DNAcontrol and DNAN#) continue to sample an overall B-DNA as anticipated and predicted in previous studies (Fig. S5, ESI†).39 Notably, RDCs refine the B-DNA structure where back-prediction of RDCs measured for DNAN# with the DNAcontrol structure (and vice versa) yields experimentally derived correlations (Table S5, ESI†). It indicates that refined structures mimic conformations sampled across these modifications. Structural analysis of refined conformers was performed to determine base pairs, base-pair step parameters, sugar pucker using 3DNA, and Curves+ to determine DNA helical curvature (methods, Table S4, ESI†). Parameters that are used to define intra-basepair78 (shear, stretch, stagger, buckle, propeller, and opening) and inter-basepairs78 (shift, slide, rise, roll, tilt, and twist) and dihedral angles (backbone: α, β, γ, δ, ε, ζ; glycosidic dihedral angle χ; and sugar: ν0–ν4) follow the anticipated distribution about the canonical B-DNA geometry without any exceptions. No differences between average helical bending (within the measurement uncertainty and structural noise) and major groove widths were observed between DNAcontrol and DNAN#.
Sugar pucker analysis of the refined structures agrees with the inferences derived from one-bond scalar and residual dipolar coupling measurements. Sugar puckers in B-DNA are known to sample conformations about the C2′-endo puckers, with drifts commonly observed towards O4′-endo. This expectation is preserved for DNAcontrol and DNAM#/H# systems (Fig. 5A). Mainly, the AG ( for C4) versus GG ( 11–14 Hz for C6 and C8) trinucleotide step indicates a discernable difference in the pucker equilibria corroborating the RDC measurements for these steps in DNAcontrol (Fig. 5A).
In the single 5fC-modified systems, it is observed that the C6 nucleotide in DNAF6 shows more extensive excursions towards O4′-endo compared to DNAcontrol. In contrast, C8 in DNAF8 shows such excursions to a lesser extent, in agreement with the coupling measurements, highlighting the difference between single hemi-modified and single fully modified 5fC systems. DNAF3 alters the pucker clearly for C6 and C8 away from C2′-endo, while C4, which is already at O4′-endo, is altered to a smaller extent. Additionally, pucker changes tend to affect the glycosidic torsional (χ-)angle, as observed for A- (C3′-endo, χ = −150°) and B-DNA (C2′-endo, χ = −110°). A correlation was plotted between sugar pucker and χ (Fig. 5B) for the refined DNA structures to see whether a similar effect persists upon 5fC modification. Indeed, for nucleotides C6 (DNAF6) and C8 (DNAF3), C4 is affected in DNAcontrol and DNAN# due to its presence in the AG step (Fig. 5B). In contrast, all complementary base-paired guanosine nucleotides (i.e., G5, G7, and G9) exist in C2′-endo with χ near −100°, pointing to the relative orientation between base and sugar changing locally at the 5fC site.
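For readers who want to reproduce the pucker assignment from the sugar torsions ν0–ν4, the standard Altona–Sundaralingam pseudorotation phase can be computed directly. The sketch below is an independent illustration of that textbook relation, not the 3DNA code used in this work; the sector labels follow the conventional 36° division of the pseudorotation wheel, and the function names are assumptions.

```python
import math

# Pucker names for successive 36-degree sectors of the pseudorotation wheel,
# starting at P = 0 degrees.
PUCKER_SECTORS = ("C3'-endo", "C4'-exo", "O4'-endo", "C1'-exo", "C2'-endo",
                  "C3'-exo", "C4'-endo", "O4'-exo", "C1'-endo", "C2'-exo")


def pseudorotation_phase(nu0, nu1, nu2, nu3, nu4):
    """Pseudorotation phase angle P in degrees, in [0, 360).

    tan P = ((nu4 + nu1) - (nu3 + nu0)) / (2 * nu2 * (sin 36 + sin 72)),
    with endocyclic torsions nu0..nu4 in degrees; atan2 resolves the quadrant.
    """
    numerator = (nu4 + nu1) - (nu3 + nu0)
    denominator = 2.0 * nu2 * (math.sin(math.radians(36.0)) + math.sin(math.radians(72.0)))
    return math.degrees(math.atan2(numerator, denominator)) % 360.0


def pucker_label(phase_deg):
    """Name of the 36-degree sector containing the given phase angle."""
    return PUCKER_SECTORS[int(phase_deg // 36.0) % 10]
```

As a usage example, torsions generated for a phase of about 162° are labeled C2′-endo, about 90° gives O4′-endo, and about 18° gives C3′-endo, which are the three regions discussed in this section.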
Further, to assess whether any correlated change occurs in the phosphate backbone due to alteration in the pucker, the phosphate backbone dihedral angles ε and ζ were measured from the refined structures to see whether BI (ε − ζ < 0) and BII (ε − ζ > 0) equilibria get affected. The correlation of the sugar pucker to ε − ζ indicates that all cytosine nucleotides in DNAcontrol and DNAN# are in BI backbone conformation (Fig. S6, ESI†), without exceptions. Indeed, the results are analyzed conservatively, as without 31P chemical shifts and scalar coupling ( and ) measurements the observations cannot be further refined/validated. Thus, 5fC modification in duplex DNA alters sugar pucker equilibria without significant changes to other conformational and structural properties.
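The BI/BII criterion stated above reduces to the sign of ε − ζ once the difference is wrapped onto a single turn. The following is a minimal sketch of that classification; the wrapping convention, the handling of an exactly zero difference, and the example angles are assumptions made for illustration and are not taken from the refined structures.

```python
def wrap_degrees(angle):
    """Map an angle in degrees onto the interval (-180, 180]."""
    wrapped = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped


def backbone_state(epsilon_deg, zeta_deg):
    """Classify a phosphate as BI (epsilon - zeta < 0) or BII (epsilon - zeta > 0)."""
    difference = wrap_degrees(epsilon_deg - zeta_deg)
    if difference < 0:
        return "BI"
    if difference > 0:
        return "BII"
    return "undefined"  # the criterion in the text does not cover a zero difference


# Hypothetical torsions (degrees) typical of each state.
print(backbone_state(-175.0, -95.0))  # difference of -80 degrees
print(backbone_state(-114.0, 174.0))  # difference wraps to +72 degrees
```

Wrapping the difference makes the result independent of whether the torsions are reported on a −180° to 180° scale or a 0° to 360° scale.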
Prior studies have pointed out that CHO (5fC) and COOH (5caC) modifications in cytosine change the pKa of the H-bond accepting N3 nitrogen atom that was predicted to cause a weakening of the H-bond for DNA duplexes.42,43,82 Computational studies performed on such modified cytosine duplex systems report that the calculated isotropic chemical shift of both the imino proton (1H) and nitrogen (15N) shows a correlated change with the increasing or decreasing H-bond distance in the C–G base pair.48 Geometry optimized and energy minimized structures of C–G pairs predict an increase in the G:N1–H1⋯N3:C distance upon varying C from 5mC to 5hmC, 5fC, and 5caC, the longest being for the 5fC–G base pair.59 Such a weakening of the H-bond is attributed to enhanced base-pair opening rates35 and increased population of single-stranded DNA.41,44 However, direct measurement of structural changes in duplex DNA upon 5fC modification would be convenient and aid in characterizing other pertinent modifications in nucleic acids.
Our results of 15N/1H chemical shifts of the guanosine base paired with the modified cytosine provide an unbiased way of assessing local structural changes. Notably, the measurements are made without the need for 15N-isotopically enriched samples, demonstrating 13C/15N chemical shift measurements to be a viable approach to studying modified nucleotides – an unexplored treasure trove in terms of epigenetics, damage/lesion, and epitranscriptomics. 15N/1H chemical shifts measured from the complementary G paired to 5mC- and 5fC-modified nucleotides show significant downfield and upfield shifts, respectively, indicating the strengthening and weakening of the H-bond. In addition, the weakening of the 5fC–G base pair propagates beyond the modification site, as reported for DNAF3, substantiating the previous findings that 5fC destabilizes the whole DNA duplex.44,82 Thus, measurement of 15N chemical shifts could serve as a proxy for H-bond strengthening/weakening, akin to the chemical exchange saturation transfer type experiments. This also explains why 5fC containing DNA templates display reduced substrate specificity of dGTP incorporation, as observed experimentally.30 The insertion of dGMP opposite to 5fC is less efficient in comparison with the insertion of dGMP opposite to unmodified C, with dAMP/dTMP being more frequently misincorporated.83
DNA duplexes are known to exhibit exchange across lowly populated conformational states (such as Hoogsteen and tautomeric forms) that have been implicated in various functional roles.84–88 As G–C Hoogsteen pair formation requires C–N3 protonation, we speculate that the lowering of the cytosine pKa (4.5 units) upon 5-formyl incorporation (2.1 units) would reduce the Hoogsteen population. Also, prior studies have indicated that 5-formyl substitution could potentially drive cytosine to a lesser-known imino tautomer rather than the conventional amino form.89 To keep the three H-bonds in the G–C pair, such a change would force the paired guanosine to sample the enol (Genol) form away from the keto form. Interestingly, the formation of Genol has been documented to shift the G–N1 chemical shift (in the context of the dG·dT wobble pair) downfield by 30–50 ppm.90,91 However, for the 5fC–G pair we observe a moderate 0.8 ppm upfield shift of the N1 15N resonance of the paired guanosine, indicating that such a tautomeric base pair formation appears less likely.
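To give a feel for the size of the pKa argument above, the Henderson–Hasselbalch relation can be used to estimate the fraction of N3-protonated base at a given pH. This is only a rough sketch: the two pKa values are those quoted in the text, whereas the pH of 7.0 is an assumed value, and the estimate treats an isolated titratable site rather than a base embedded in a duplex, so it does not by itself predict Hoogsteen populations.

```python
def protonated_fraction(pka, ph):
    """Fraction of a titratable site that is protonated at a given pH
    (Henderson-Hasselbalch relation for a single site)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


PKA_CYTOSINE = 4.5   # value quoted in the text for cytosine
PKA_5FC = 2.1        # value quoted in the text for 5-formylcytosine
ASSUMED_PH = 7.0     # assumption, not taken from the text

fraction_c = protonated_fraction(PKA_CYTOSINE, ASSUMED_PH)
fraction_5fc = protonated_fraction(PKA_5FC, ASSUMED_PH)

print(f"C   protonated fraction: {fraction_c:.2e}")
print(f"5fC protonated fraction: {fraction_5fc:.2e}")
print(f"ratio C / 5fC: {fraction_c / fraction_5fc:.0f}")
```

Under these assumptions the protonated fraction drops by roughly two orders of magnitude on going from C to 5fC, which is the direction of the effect invoked in the speculation above.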
Crystal structures of the DNA duplex containing 5mC36 and 5fC37 have reported significant deviations from B-DNA. However, prior solution NMR studies refuted such claims based on NOE-based distances, indicating only subtle differences in the 5fC-modified nucleotides.39 In our studies, heteronuclear 13C/15N chemical shifts and coupling-based measurements complement NOEs and aid in confirming that the overall structure of 5m/5hm/5fC DNA does not deviate from that of canonical B-DNA. RDCs are effective probes for global structural perturbations and our results provide no evidence favoring the presence of E- or F-DNA forms under solution conditions. Heteronuclear scalar and residual dipolar couplings aid in capturing subtle variations in the local structure upon 5fC incorporation. Combined analysis across various NMR parameters shows that 5fC influences the local nucleotide structure in the sugar pucker and the glycosyl dihedral angle.
Contrary to common misconception, the DNA duplex embeds subtle differences on top of the uniform double-helix structure based on the primary sequence. For instance, sequence-specific variation in structure is essential for indirect DNA readout carried out by regulatory proteins.92 Conformational flexibility of DNA allows the torsion angles to sample sparsely populated states and is often functionally relevant. Hoogsteen base pair formation for A–T and C–G pairs is a good example and is known to induce helical bending and increase the propensity of DNA damage at the Watson–Crick face.57,88,93 Similarly, in B-DNA, 2′-deoxyribose sugar moieties primarily pucker proximal to the C2′-endo region, transitioning to the C3′-endo conformation at 5–20% population based on the nucleobase type.94 This is not surprising given that the C2′-endo form in B-form DNA is only marginally more stable than the C3′-endo form by ∼1 kcal mol−1, with transitions occurring on the pico- to nanosecond timescale (energy barrier 2–5 kcal mol−1).68,95–97 Molecular dynamics simulations show that C2′-endo to C3′-endo transitions occur stochastically and are uncooperative.94 Hence, individual sugar puckering is rapid and such effects cannot be directly studied by spectroscopy as they do not dramatically impact the average duplex structure. Importantly, C3′-endo conformations are more commonly observed in pyrimidine (especially for C) nucleotides than in purine nucleotides.45,46 The lifetime and population of C3′-endo conformation increase to 20% for C located in the CG, CA, and TG steps compared to other dinucleotide steps, with CA, TG, TA, and CG being the most flexible steps in the DNA duplex.46,98 5fC exploits this unique property of C, enhances the flexibility of DNA and establishes itself as a distinct cytosine modification compared with 5mC and 5hmC.
Such a facet of 5fC, in addition to weakened H-bonds, enables duplex DNA containing the modification to transiently sample locally melted and flexible states, which results in faster duplex cyclization rates for 5fC in comparison to C/5mC/5hmC. The rate increases with multiple 5fC modifications in the sequence.40
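The statement above that a ∼1 kcal mol−1 preference for C2′-endo is compatible with a 5–20% C3′-endo population can be checked with a two-state Boltzmann estimate. The sketch below assumes a simple two-state equilibrium and a temperature of 298.15 K; neither assumption is taken from the text, and the function names are illustrative.

```python
import math

GAS_CONSTANT = 0.0019872041  # kcal mol^-1 K^-1


def minor_state_population(delta_g, temperature=298.15):
    """Population of the less stable state in a two-state equilibrium,
    where delta_g (kcal/mol) is its free energy above the stable state."""
    return 1.0 / (1.0 + math.exp(delta_g / (GAS_CONSTANT * temperature)))


def free_energy_gap(population, temperature=298.15):
    """Inverse relation: free energy gap (kcal/mol) that gives the stated
    minor-state population in a two-state equilibrium."""
    return GAS_CONSTANT * temperature * math.log((1.0 - population) / population)


print(f"population at 1 kcal/mol: {minor_state_population(1.0):.3f}")
print(f"gap for  5% population: {free_energy_gap(0.05):.2f} kcal/mol")
print(f"gap for 20% population: {free_energy_gap(0.20):.2f} kcal/mol")
```

Within this simplified picture, a gap of about 1 kcal mol−1 corresponds to a minor-state population inside the 5–20% range quoted for C3′-endo.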
It is now well documented that the chemical structure of the modifications at the 5th position of the cytosine base serves as a mode of recognition and binding of proteins.25,99–102 For instance, 5fC modification strongly interacts with transcriptional regulators, DNA repair factors and chromatin regulators.25 The CHO group present in 5fC is known to form covalent interactions with the amine groups present in proteins such as methyltransferases103 and histones.104 The motivation of our study was to interrogate the plausible effects that transcend the chemical structure and potentially drive conformational changes that modulate the properties of the double helical DNA structure. Our results unequivocally indicate that 5fC introduction into the DNA duplex results in the sampling of C–G conformations that are not accessible within any sequence context. Hence, the weakening of H-bond strength due to the formyl modification in the 5fC–G pair enhances the base opening rate,35 local fluctuations,41 and double-strand DNA dissociation constant, resulting in reduced DNA duplex stability44 in comparison to any possible canonical primary sequence containing Watson–Crick base pairs. This is important as transcription factors are known to exploit the weakened base pair towards recognition.105 Hence, because of base-pair wobbling around the 5fC–G base pair, the duplex achieves an enhanced degree of flexibility. Weakening of the 5fC–G H-bond increases the probability of 5fC base flipping and unstacking relative to 5mC and 5hmC, which may assist thymine DNA glycosylase (TDG) in recognizing the modification. Therefore, the base flipping into the catalytic pocket of the TDG/base-excision repair106 enzymes is plausibly facilitated.
Another factor to highlight here is the difference between epigenetic and damage modifications in duplex DNA. For instance, 1-methyladenine (m1A) is a known form of DNA damage with a methyl group inhibiting Watson–Crick pairing and facilitating Hoogsteen pairing.57 Such a modification is found to enhance local fluctuations on the millisecond time scale. In contrast, 5fC epigenetic modification enhances conformational flexibility on the faster pico- to nanosecond time scale (as no appreciable resonance broadening is observed in the NMR spectra of DNAF#), contrasting the effects of an epigenetic versus a damage (m1A) modification on the conformational landscape of DNA duplexes. This potentially underlines the fact that damage modifications that severely affect the function of DNA duplexes cause more alarming conformational changes in comparison to epigenetic modifications that play more than one given role in the biological context. A thorough structural mapping of damage and natural modifications would aid in testing/refining this hypothesis.
Data were acquired using TopSpin 3.6pl5, with sparse Poisson-Gap110 sampling schedules generated using the macro ‘nusPGSv3’ (PGS_TS3.2 distribution) obtained from the Wagner lab (gwagner.med.harvard.edu). Two-dimensional (2D) heteronuclear 13C–1H and 15N–1H correlations were obtained using the sensitivity-enhanced adiabatic heteronuclear single quantum coherence (HSQC with 13C adiabatic pulses with water flip-back)111 and band-Selective Optimized Flip Angle Short Transient (SOFAST-) heteronuclear multiple quantum coherence (HMQC)112,113 spectroscopy, respectively, from the Bruker pulse program library. The 13C and 15N spectral widths (with carrier position) were set to 8 (83) and 16 (153) ppm, respectively, by spectral aliasing with minimal signal overlap/loss, to obtain maximal resolution (64 ms t1,max). The scheduling lists were generated with 5–30% (5% increments), 50%, 75%, and 95% sampling to obtain the optimum level of sampling, providing a robust measurement of chemical shifts and scalar couplings. Data were then processed using multi-dimensional decomposition114 (qMDD 2.5 v3b) followed by NMRPipe115 and analyzed using NMRFAM-SPARKY.116 The details of the performance of sparse sampling methodology to measure chemical shifts and couplings robustly and reliably are provided in the ESI.†
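Because the indirect-dimension spectral widths described above are much narrower than the full 13C and 15N shift ranges, peaks outside the window appear at aliased positions. The sketch below shows how an apparent position can be predicted, assuming simple wrap-around aliasing (as obtained with complex sampling of the indirect dimension; mirror-type folding would need a different formula). The spectral widths and carrier positions are those quoted in the text, while the true shifts and function names are hypothetical.

```python
def aliased_shift(true_ppm, carrier_ppm, sweep_width_ppm):
    """Apparent shift of a peak after wrap-around aliasing into the window
    [carrier - SW/2, carrier + SW/2)."""
    half_width = sweep_width_ppm / 2.0
    offset = (true_ppm - carrier_ppm + half_width) % sweep_width_ppm
    return carrier_ppm + offset - half_width


def fold_count(true_ppm, carrier_ppm, sweep_width_ppm):
    """Number of spectral widths separating the true and apparent shifts."""
    apparent = aliased_shift(true_ppm, carrier_ppm, sweep_width_ppm)
    return round((true_ppm - apparent) / sweep_width_ppm)


# 13C window from the text: 8 ppm wide, carrier at 83 ppm.
# The 137 ppm resonance is a hypothetical example.
print(aliased_shift(137.0, 83.0, 8.0), fold_count(137.0, 83.0, 8.0))

# 15N window from the text: 16 ppm wide, carrier at 153 ppm.
# A hypothetical resonance at 147 ppm lies inside the window already.
print(aliased_shift(147.0, 153.0, 16.0), fold_count(147.0, 153.0, 16.0))
```

The true shift is recovered by adding the fold count times the spectral width to the apparent shift, which is why the window has to be chosen so that aliased peaks do not overlap.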
The 2D nuclear Overhauser effect spectroscopy (NOESY; 100, 150, and 200 ms mixing times) and double-quantum filtered correlation spectroscopy (DQF-COSY) spectra were acquired with the 3-9-19 WATERGATE water suppression scheme and uniform sampling with inter-scan delays of 2.5 and 1.5 s, respectively.111 1H–1H correlation 2D data were acquired using conventional Nyquist sampling. 1JCH and 1DCH couplings were measured for samples under isotropic and anisotropic conditions, respectively, from the frequency difference between the doublets obtained from 13C–1H 2D HSQC without decoupling in the direct detect 1H dimension.
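The coupling extraction described above amounts to two subtractions, sketched below. The sketch assumes the convention in which the splitting observed under anisotropic conditions equals 1JCH + 1DCH, so that the RDC is the difference between the aligned and isotropic splittings. The peak positions, the 700 MHz 1H frequency and the function names are hypothetical and serve only as an illustration.

```python
def splitting_hz(peak_a_ppm, peak_b_ppm, h1_frequency_mhz):
    """Doublet splitting in Hz from the two 1H peak positions (ppm) of a
    coupled HSQC cross-peak, given the 1H spectrometer frequency in MHz."""
    return abs(peak_a_ppm - peak_b_ppm) * h1_frequency_mhz


def rdc_hz(isotropic_splitting, anisotropic_splitting):
    """Residual dipolar coupling, assuming the aligned-sample splitting
    equals J + D and the isotropic splitting equals J."""
    return anisotropic_splitting - isotropic_splitting


H1_FREQUENCY_MHZ = 700.0  # hypothetical spectrometer frequency

# Hypothetical doublet positions for one C-H pair.
j_coupling = splitting_hz(7.950, 7.650, H1_FREQUENCY_MHZ)      # isotropic
j_plus_d = splitting_hz(7.960, 7.640, H1_FREQUENCY_MHZ)        # aligned

print(f"1J(CH) = {j_coupling:.1f} Hz")
print(f"1D(CH) = {rdc_hz(j_coupling, j_plus_d):.1f} Hz")
```

Other sign or scaling conventions for the dipolar contribution exist, so the convention in use should be checked before comparing values across studies.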
XPLOR-NIH77 version 2.41 was used for structure refinement following a simulated annealing protocol. As DNAcontrol and DNAN# are palindromic in nature, the C2-axis of symmetry was input as a constraint. While data for the modified systems were used, the unmodified cytosine base was employed for the structure refinement protocols as a proxy for 5mC, 5hmC, and 5fC modifications, as only the trends of structural perturbations were sought from such refinements. Alignment tensor parameters (Da and Dr – the axial and rhombic components of the tensor) were optimized for the DNA duplexes based on the measured RDC datasets.54 As imino 1H shifts were observed in the characteristic 12–14 ppm region indicative of Watson–Crick base pairs, H-bond constraints were incorporated in the structure refinement protocol. Dihedral angles (except for ε and ζ angles) were constrained as described earlier. Phosphate backbone dihedral angles were not constrained to assess changes in the BI/BII populations upon modified cytosine incorporation. Fifty structures were annealed starting from the idealized B-DNA geometry, and the five structures having no restraint violations were used for further structural analysis. The number of restraints and the summary of structure refinement for each system are listed in Table S3 (ESI†).
Structural analysis of the refined conformers was performed to determine inter- and intra-base pair parameters using 3DNA,89 while helical bending was assessed using CURVES+.19 RDC comparisons (Table S5, ESI†) were generated by fitting experimental RDCs to refined DNA structures with the module calcTensor (singular value decomposition for best-fitting experimental measurements to back-predicted values) present in XPLOR-NIH.77
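Once back-predicted RDCs are available from such a tensor fit, the agreement with experiment is commonly summarized by a root-mean-square deviation and a quality (Q) factor. The sketch below shows one common definition of each; it is an assumption that these statistics resemble those reported in the ESI, and the RDC values and function names are hypothetical.

```python
import math


def rdc_rmsd(observed, predicted):
    """Root-mean-square deviation (Hz) between observed and predicted RDCs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def rdc_q_factor(observed, predicted):
    """Q factor: rms of the residuals divided by the rms of the observed
    couplings (lower values indicate better agreement)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    scale = sum(o ** 2 for o in observed)
    return math.sqrt(residual / scale)


# Hypothetical observed and back-predicted couplings in Hz.
observed_rdcs = [10.0, -5.0, 20.0]
predicted_rdcs = [11.0, -4.0, 18.0]

print(f"RMSD = {rdc_rmsd(observed_rdcs, predicted_rdcs):.3f} Hz")
print(f"Q    = {rdc_q_factor(observed_rdcs, predicted_rdcs):.3f}")
```

Both statistics are computed on the same set of couplings used in the fit here; a cross-validated version would withhold a subset of RDCs from the tensor fit.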
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2cp04837j
‡ Contributed equally to this work.
This journal is © the Owner Societies 2023