Open Access Article
Shuntaro Takahashi *ab, Lutan Liu a and Naoki Sugimoto *a
a FIBER (Frontier Institute for Biomolecular Engineering Research), Konan University, 7-1-20 Minatojima-Minamimachi, Chuo-ku, Kobe 650-0047, Japan, shtakaha@konan-u.ac.jp
b FIRST (Graduate School of Frontiers of Innovative Research in Science and Technology), Konan University, 7-1-20 Minatojima-Minamimachi, Chuo-ku, Kobe 650-0047, Japan
First published on 23rd December 2025
Nucleic acids form various helical structures through base-pair formation. The most fundamental base pairing is Watson–Crick pairing, which establishes the complementarity rule of nucleic acids. According to this rule, living systems replicate their genes and propagate them correctly to their daughter organisms. The complementarity rule can be explained chemically, because Watson–Crick base pairs are the most stable. On the other hand, non-Watson–Crick base pairings, termed mismatch base pairings, are also frequently found. Mismatched base pairs formed during gene replication lead to mutations, which can drive the evolution of life or cause diseases such as cancer. Such metastable non-Watson–Crick base pairings have been considered random events, and their underlying chemistry has been neglected. However, the stability of Watson–Crick base pairs can be modulated by the environment, and non-Watson–Crick base pairs are sometimes more stable than Watson–Crick base pairs. Moreover, the formation of non-Watson–Crick base pairs in the template strand creates non-duplex structures that can cause replication errors. Therefore, a quantitative study of non-Watson–Crick base pairing under varying solution environments can provide novel insights into genetic mutations regulated by chemistry-validated “non-Watson–Crick rules”. In this review, we summarise basic and recent studies on the chemistry by which non-Watson–Crick base pairs regulate replication and describe how genetic mutations can be chemically controlled. Furthermore, we discuss potential databases for predicting gene mutations under various solution conditions and their integration for future applications.
Fig. 1 Chemical factors for determining the stability of nucleic acids in (a) the formed duplex and (b) the construction of duplexes.
Watson–Crick base pairings determine the complementarity of nucleic acid duplexes.2 This complementarity is chemically defined by the complementarity and orthogonality of hydrogen-bond donors and acceptors between A and T (U) or G and C bases, which form A·T and G·C base pairs that are considered more stable than other pairs. In cells, genes are correctly replicated by polymerising nucleic acid monomers (NTPs and dNTPs) along the template strand according to Watson–Crick base pairing (Fig. 1(b)).2 Because the complementary partner of each of the four nucleic acid bases is uniquely determined, base pairing yields a defined duplex. This duplex allows nucleobase sequences to be treated as digital information in living systems: living systems digitalise and store genetic information as sequences of four types of nucleotides. Consequently, genetic information is stored and replicated with extreme accuracy. Genes must therefore carry digital information for genetic information to be preserved.
Base-pair mismatches, which lead to mutations, occur when an incorrect nucleic acid monomer is incorporated during replication. Genetic mutations underlie major threats to human health, such as global pandemics of viral infections, as exemplified by the new coronavirus,3 and cancer, one of the leading causes of death worldwide.4 Therefore, elucidating the mechanisms of genetic mutations and developing technologies to predict and control them are important issues with high social demand. Genetic mutations have long been regarded as non-digital (i.e. analogue: containing complex elements, vulnerable to noise, and unsuitable for the maintenance and transmission of information) and as occurring randomly in the genome. The sites of mutagenesis associated with DNA damage and repair due to radiation and other factors are considered random.5,6 However, recent studies in genome biology have begun to show that the sequence patterns of gene mutations are biased by intracellular environments influenced by chemicals.7,8 These studies suggest that a chemically metastable state in which genetic mutations are likely to occur may arise depending on both the sequence and the surrounding chemical environment. Hoogsteen base pairing, a type of non-Watson–Crick base pairing, is observed when A and U monomers pair.9,10 This indicates that, between free monomers, Hoogsteen base pairs are more stable than Watson–Crick base pairs.
Hoogsteen-type base pairs have been shown to form transiently not only between monomers but also within duplexes.11 Furthermore, Hoogsteen base pairing can drive the formation of non-duplex nucleic acid structures, such as triplexes, guanine quadruplexes (G4) and i-motif structures (iM), which regulate gene replication and gene expression.12 Thus, owing to the structural dynamics of their backbone, the digital information in nucleic acids may also encode gene mutations and higher-order gene regulation by behaving in an analogue manner. Therefore, rather than treating genomic mutations as events that occur randomly during gene replication, it is desirable to elucidate how chemical factors induce metastable non-Watson–Crick base pairs (Fig. 1).
Based on this background, it is important to quantitatively analyse and understand, at the energetic level, the effects of non-Watson–Crick base-pair formation on gene replication. The structural stability of nucleic acids is influenced by the solution environment, such as the crowded intracellular environment. These environments affect the hydrogen-bonding and stacking interactions of base pairs and thereby change the energetics of base-pair formation. In this article, we present quantitative analyses of the effects of the solution environment on nucleic acid replication reactions. We also outline the new scientific perspectives and medical engineering technologies that can be expected from these studies.
As shown in the energy diagram, the fidelity of replication depends not only on the dNTPs but also on the enzyme structure. Replicative enzymes include proteinaceous polymerases, such as DNA polymerase (DNAP), RNA polymerase (RNAP; mainly catalysing transcription) and RNA-dependent RNA polymerase (RdRp). In addition, RNA polymerases based on RNA enzymes (ribozymes) have been developed.14 The error frequency per nucleotide of one of the original ribozymes, R18, was 4.3 × 10⁻² at 17 °C,15 and the improved ribozymes tC19 and tC19Z showed 2.7 × 10⁻² and 8.8 × 10⁻³ at 17 °C, respectively.16 For proteinaceous polymerases, the reported error frequencies vary with the polymerase family. Polymerases responsible for genomic DNA replication, namely family A (e.g. T7 DNAP), family B (e.g. T4 DNAP and human Polδ and Polε) and family C (e.g. E. coli Pol III), show low error frequencies ranging from 10⁻⁴ to 10⁻⁶ even without considering their proofreading activities (e.g. 3.4 × 10⁻⁵ for T7 DNAP at 37 °C). Next to these classes, DNAPs involved in DNA repair reactions show somewhat higher error frequencies (for example, 1.3 × 10⁻⁴ for the Klenow fragment at 37 °C and 2.1 × 10⁻⁴ for Taq polymerase at 70 °C).17 Proteinaceous DNAPs thus show lower error frequencies than ribozymes, which suggests that evolved proteins have higher enzymatic performance than ribozymes of the kind thought to have existed in the prebiotic RNA world. However, some DNAPs, such as the translesion synthesis (TLS) polymerases (e.g. E. coli Pol IV and V and human Polη and Polκ) that enable bypass of DNA lesions during DNA replication, show the highest error frequencies (10⁻¹–10⁻³) at 37 °C. Interestingly, SARS-CoV-2 RdRp (Nsp12/7/8) shows similarly high error frequencies (10⁻¹–10⁻³) at 37 °C,18 which could generate various viral mutants in a short period.
In solution, the efficiency of incorporation varies with the identity of the mismatch.19,20 Besides the canonical Watson–Crick base pairs, eight single mismatches occur in DNA with varying frequencies and stabilities, namely A·A, A·C (or C·A), A·G (or G·A), C·C, C·T (or T·C), G·G, G·T (or T·G) and T·T.21 Crystal structures of DNAP complexes with substrate DNA and mismatched dNTPs have been solved, showing that the A·G, C·T, G·G, G·T, T·G and T·T mismatches (the first letter denotes the base at the primer end and the second the template base) are well ordered at the post-insertion site.22 These structural data suggest that certain mismatched base pairs can adopt thermodynamically stable conformations. From the perspective of thermodynamics, the difference in free energy between mismatched and matched base pairs in aqueous solution (ΔΔG° = ΔG°(mismatch) − ΔG°(match)) can be understood quantitatively as the contribution of hydrogen bonding between the mismatched bases. Calculations using nearest-neighbour (NN) parameters show that ΔΔG° is less than approximately 3 kcal mol⁻¹ at 37 °C.23 For example, mismatches in GC-rich contexts show relatively large deviations, whereas the G·T wobble shows a relatively small difference.23 Measurements of ΔG° for primer–template DNA strands that reproduce the DNA structure during replication have also been reported, in which the ΔΔG° between strands containing a terminal mismatch and a matched terminus (ΔG°(mismatch) − ΔG°(match)) was less than 0.4 kcal mol⁻¹ for DNAs containing either a correct (A·T) or an incorrect (G·T, C·T or T·T) base pair at the primer 3′ terminus.24 A ΔΔG° of less than 3 kcal mol⁻¹ can account for about one incorrect insertion per 10 to a few hundred correct insertions, assuming that the error frequency f is given by f = e^(−ΔΔG°/RT), where R is the gas constant and T is the absolute temperature (f = 7.7 × 10⁻³ when ΔΔG° = 3 kcal mol⁻¹ at 37 °C).25 This error frequency corresponds well to polymerases with high error frequencies of 10⁻¹–10⁻³. Thus, polymerases with high error frequencies, such as TLS polymerases and SARS-CoV-2 RdRp, depend on the stability of matched and mismatched base pairs during replication. However, low-error polymerases must have additional mechanisms to incorporate (d)NTPs correctly, because their error frequencies of 10⁻³–10⁻⁶ correspond to ΔΔG° values of approximately 4–8 kcal mol⁻¹.26 The ΔG°inc calculated from the equilibrium constant (Ka) of matched dNTP incorporation, which forms the Pol*·DNA+1·PPi state in step 4 (Fig. 2(B)), is much more negative at 37 °C than those for mismatched dNTP incorporation (ΔΔG°inc = 5.5–7 kcal mol⁻¹).26 Thus, in addition to the ΔΔG° values predicted and measured from the ΔG° of matched and mismatched base pairing, the environment of the polymerase active site should be considered as an additional source of ΔΔG° for low-error polymerisation.
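The error-frequency relation used above can be evaluated directly. The sketch below is an illustrative helper, not code from the cited work; it assumes the Boltzmann-type form f = e^(−ΔΔG°/RT) quoted in the text, R = 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹ and a default of 37 °C, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def error_frequency(ddg_kcal: float, temp_c: float = 37.0) -> float:
    """Misincorporation frequency f = exp(-ddG / RT) for a given ddG in kcal/mol."""
    rt = R_KCAL * (temp_c + 273.15)
    return math.exp(-ddg_kcal / rt)


def ddg_for_error_frequency(f: float, temp_c: float = 37.0) -> float:
    """Inverse relation ddG = -RT ln f: discrimination needed for a target f."""
    rt = R_KCAL * (temp_c + 273.15)
    return -rt * math.log(f)


print(f"ddG = 3 kcal/mol -> f = {error_frequency(3.0):.1e}")
for target in (1e-3, 1e-6):
    print(f"f = {target:.0e} requires ddG = {ddg_for_error_frequency(target):.1f} kcal/mol")
```

Running it reproduces f ≈ 7.7 × 10⁻³ at 3 kcal mol⁻¹ and gives about 4.3 and 8.5 kcal mol⁻¹ for error frequencies of 10⁻³ and 10⁻⁶, in line with the approximate 4–8 kcal mol⁻¹ range cited above.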
As shown in Fig. 1, the stability of base pairs is affected by environmental factors. One potential explanation for the energetic source of low-error polymerases is the exclusion of water from the active site and the geometric selection of (d)NTPs through a better fit between the incoming (d)NTP and the primer base and/or enzyme residues.27–29 When the stabilising hydrogen bonds formed with the solvent are included in the calculations, the hydrogen-bonding and base-stacking interactions in DNA or RNA can generate enough ΔΔG° to explain the low error rate of the polymerase reaction. One report indicated that a non-linear analysis of the enthalpy–entropy compensation of the NN parameters of DNA duplexes provides information about the effect of the solvent on the thermodynamic parameters.30 As shown in Fig. 3(a), the relationship between the enthalpy and entropy changes of the NN base pairs (ΔH° and ΔS°), including matched and mismatched ones, was non-linear and hyperbolic. This implies a contribution from solvent organisation, as observed in a report on the influence of water as a solvent on protein interactions.31 Thus, ΔS° does not change in simple proportion to ΔH°. To account for the effect of the solvent surrounding the base pair on the thermodynamic parameters, the relationship between ΔH° and ΔS° was analysed with a hyperbolic function in which the melting temperature Tm (= ΔH°/ΔS°) contains a solvation-dependent constant T0 and an entropy constant a. This hyperbolic relationship reproduced the trend between ΔH° and ΔS° better than linear regression did (Fig. 3(a)). Using the database of matched NN parameters obtained in 1 M NaCl solution, the constants were estimated as a = 80 kcal mol⁻¹ K⁻¹ and T0 = 273 K.30 From these treatments, the free energy of base-pair formation that includes the solvent environment around the base pair (denoted NN + e), ΔG°NN+e, can be calculated. For example, the calculated ΔG°NN+e values correspond well to ΔG°inc at 37 °C.28 Furthermore, ΔΔG°NN+e can be calculated as ΔG°NN+e(mismatch) − ΔG°NN+e(match). The average ΔG°NN+e(match) was −8.33 kcal mol⁻¹, whereas the average ΔG°NN+e(mismatch) was −0.31 kcal mol⁻¹, giving a ΔΔG°NN+e of about 8.0 kcal mol⁻¹. Although these parameters were estimated using data obtained in 1 M NaCl, which is far from physiological conditions, the magnitude of ΔΔG°NN+e can account for the high fidelity of low-error polymerases. The correspondence between the ΔΔG° values obtained from NN-based thermodynamics with the added solvent factor and the ΔΔG°inc values derived from polymerase kinetics originates from the retention of hydrogen-bonding and stacking interactions between primer and template within the DNAP active site. Through induced-fit mechanisms, the active site enforces geometric selection by properly orienting the cognate (d)NTP and minimising the entropic penalties arising from conformational and chemical transitions. In this way, the enzymes take full advantage of the different ΔH° values associated with the installation of matches and mismatches. Therefore, the fidelity of polymerases can be rationalised in terms of the thermodynamics of base-pair formation, wherein the polymerase active site modulates the energetics of hydrogen bonding, base stacking and conformational entropy through finely tuned (de)hydration, preferentially stabilising correctly paired bases according to the ΔG° of duplex formation (Fig. 3(b)).
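As a rough consistency check, the averaged NN + e free energies can be fed into the same error-frequency relation. This is a hypothetical back-of-the-envelope sketch: it assumes the averages refer to 37 °C and that f = e^(−ΔΔG°/RT) can be applied to averaged parameters, neither of which the text states explicitly.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T_K = 310.15          # 37 °C, assumed temperature for the averaged parameters

# Average NN + e free energies quoted in the text (derived from 1 M NaCl data)
DG_MATCH = -8.33      # kcal/mol
DG_MISMATCH = -0.31   # kcal/mol

# Same sign convention as the text: ddG = dG(mismatch) - dG(match)
ddg_nn_e = DG_MISMATCH - DG_MATCH
f_est = math.exp(-ddg_nn_e / (R_KCAL * T_K))

print(f"ddG(NN+e) = {ddg_nn_e:.2f} kcal/mol, estimated f = {f_est:.1e}")
print("inside the 1e-6 to 1e-3 low-error window:", 1e-6 <= f_est <= 1e-3)
```

The estimate lands at about 2 × 10⁻⁶, within the range reported for low-error polymerases, which is the point the paragraph above makes qualitatively.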
Fig. 3 Energetic contribution of base pairing in the polymerase active site to the fidelity of the polymerase reaction. (a) Relationship between ΔH° and ΔS° of the NN parameters. The blue dotted line indicates the linear regression of the matched parameters.23 The pink line shows the fit of the matched parameters to the hyperbolic function.30 (b) Schematic illustration of base pairing in the enzyme active centre.
From the perspective of synthetic biology, the genetic alphabet has been expanded by developing orthogonal unnatural base pairs (UBPs) (Fig. 4(b)). One strategy for designing UBPs is to create new hydrogen-bonding patterns analogous to Watson–Crick base pairs, first reported with the iso-G·iso-C pair, called the S·B base pair.38,39 The other approach is to use a hydrophobic interface instead of hydrogen-bond-based pairing.40–43 The first generation of UBPs based on this concept included Z·F (Z, 4-methylbenzimidazole; F, difluorotoluene), Q·F (Q, 9-methylimidazo[(4,5)-b]pyridine), 7AI·7AI (7-azaindole nucleosides) and Q·Pa (Pa, pyrrole-2-carbaldehyde). The (d)NTPs of these UBPs were incorporated by DNA and RNA polymerases in a manner similar to the natural substrates. However, early reports revealed several limitations of the first-generation designs, including low nucleotide incorporation efficiency, poor extension kinetics and mispairing with natural bases.40,42,43 To address these issues, second-generation UBPs such as Z·P (Z, 6-amino-3-(1′-β-D-2′-deoxyribofuranosyl)-5-nitro-(1H)-pyridin-2-one; P, 2-amino-8-(1′-β-D-2′-deoxyribofuranosyl)-imidazo-[1,2a]-1,3,5-triazin-(8H)-4-one),44 TPT3·NaM,45 5SICS·NaM46 and Ds·Px47 have been developed to enhance catalytic efficiency and fidelity.
Z·P base pairs are more thermodynamically stable than G·C base pairs, which enhances their selectivity.44,48 This system has since been expanded to an eight-letter alphabet (Hachimoji) with the S·B and Z·P base pairs.48 Other second-generation UBPs, such as TPT3·NaM and Ds·Px, achieved high selectivity with natural DNAPs (99.98% selectivity per doubling in polymerase chain reaction (PCR) with OneTaq DNAP and 99.97% selectivity per doubling in PCR with Deep Vent DNA polymerase, respectively).49,50 Similar structures have also been observed for UBPs with four hydrogen bonds51,52 and for other hydrophobic base pairs.53–55 Furthermore, 5-substituted pyrimidine and 7-substituted 7-deazapurine dNTPs are good substrates for DNAPs and can be used for the enzymatic synthesis of base-modified DNA.56 DNA and RNA polymerases generally tolerate large, bulky modifications on (d)NTPs. Recent advances have enabled the incorporation of all four (d)NTPs bearing a fluorescent moiety to produce site-specifically or fully modified DNA and RNA strands.57,58 Interestingly, these modified substrates are incorporated with higher fidelity than natural substrates. These studies indicate that the catalysis and fidelity of the polymerase can be regulated by the geometry of base pairing and the structure of the (d)NTPs.
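Per-doubling selectivity compounds over an amplification. The sketch below assumes, as an interpretation rather than a definition given here, that "selectivity per doubling" is the n-th root of the fraction of UBP-containing amplicons retained after n doublings; the helper names and the choice of 20 doublings are hypothetical and purely illustrative.

```python
def retention_after_doublings(selectivity: float, n_doublings: int) -> float:
    """Fraction of amplicons still carrying the UBP after n doublings."""
    return selectivity ** n_doublings


def selectivity_from_retention(retention: float, n_doublings: int) -> float:
    """Inverse: per-doubling selectivity implied by an observed overall retention."""
    return retention ** (1.0 / n_doublings)


N_DOUBLINGS = 20  # hypothetical amplification depth
for label, sel in (("TPT3-NaM, OneTaq", 0.9998), ("Ds-Px, Deep Vent", 0.9997)):
    kept = retention_after_doublings(sel, N_DOUBLINGS)
    print(f"{label}: {kept:.4f} retained after {N_DOUBLINGS} doublings")
```

Under this reading, both pairs keep more than 99% of the unnatural base after 20 doublings, which is why a difference of 0.01% per doubling is still worth reporting.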
Modification of the (d)NTP backbone is also a fascinating approach for developing novel nucleic acid systems, termed xeno-nucleic acids (XNAs), to create new nucleic acid drug modalities (Fig. 4(c)). Native DNAPs and RNAPs can catalyse reactions with some modified substrates. For example, the commercially available Therminator DNA polymerase can synthesise TNA on a DNA template.59,60 However, many modified substrates are poorly accepted because of low affinity for, or steric hindrance with, the enzymes. Various polymerase mutants have therefore been developed by directed evolution to efficiently incorporate XNA substrates for replication and transcription. Engineering polymerases by rational mutation of specificity-determining residues improved the efficiency of TNA synthesis.61 Engineered Tgo polymerases can incorporate RNA,62 FANA,63 HNA63 and TNA.64 More recently, LNA and 2′-OMe RNA synthesis were demonstrated.65 Although XNA incorporation has succeeded, the error frequency of replication is on the order of 10⁻²–10⁻³,66 which is higher than that for native substrates. These findings indicate that replication fidelity is not determined by hydrogen bonding alone but is dominated by other factors. Crystallographic analysis suggests an imperfect geometric fit between the active site of the XNA polymerase and its substrate.67 Therefore, the polymerase can accept a relatively broad range of substrates. However, because efficiency and fidelity are both governed by the structural factors of nucleic acid stability (Fig. 1(a)), improving the efficiency of the polymerase reaction can lower its fidelity. Moreover, environmental factors also significantly affect replication fidelity, as described in Section 3.
A growing body of evidence indicates that Mn²⁺ can positively influence some DNAPs by conferring translesion synthesis activity or altering substrate specificity. For example, Polβ, which repairs abasic sites through the base excision repair (BER) process,72 has efficient polymerase activity with either Mg²⁺ or Mn²⁺ as the cofactor.73 For cisplatin-lesioned template DNA, Mn²⁺ produced an eightfold enhancement in the correct lesion bypass activity of Polβ, achieved through a fourfold decrease in the Michaelis–Menten constant (Km), reflecting greater substrate affinity, and a twofold increase in the catalytic rate constant (kcat).74 Similar correct lesion bypass has been observed for template DNA containing lesions such as methylated guanine and thymine glycol.75,76 Although the enhancement is modest in most cases, its effect can be significant, because lesion bypass catalysed by Polβ is intrinsically inefficient in the presence of Mg²⁺. These findings suggest a close relationship between efficiency and fidelity, which can be regulated by the chemistry of the DNAP active site. Another type of chemical that affects the polymerase reaction is a denaturant of DNA structures, such as dimethyl sulfoxide (DMSO) or urea. These chemicals are widely used to improve PCR yields from GC-rich sequences.77 Although there is no direct evidence, the attenuation of polymerase progression caused by secondary structures may indirectly affect fidelity; thus, these denaturants could control the replication fidelity of highly structured templates (for example, see Section 4 on the effect of quadruplex structures on replication).
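The eightfold figure follows from the two kinetic constants if bypass activity is read as the specificity constant kcat/Km; that reading is an assumption on our part, and the helper name below is hypothetical.

```python
def specificity_fold_change(kcat_fold: float, km_fold: float) -> float:
    """Fold change in kcat/Km from fold changes in kcat and Km.

    km_fold < 1 means Km decreased (tighter apparent substrate binding).
    """
    return kcat_fold / km_fold


# Values from the text (Mn2+ vs Mg2+, Pol beta, cisplatin lesion):
# kcat doubles and Km drops fourfold.
print(specificity_fold_change(kcat_fold=2.0, km_fold=0.25))
```

A fourfold lower Km contributes a factor of 4 and a twofold higher kcat a factor of 2, giving 2/0.25 = 8.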
Geometric perturbation of the relationship between the primer–template structure and the polymerase active site directly affects fidelity. In a mutational study of the Klenow fragment DNAP (KF), various mutations of highly conserved residues on the exposed surface of the polymerase cleft near the active site drastically increased the error frequency.78 Engineering of the template has also been studied by replacing the oxygen atoms of thymine with atoms of varying size (H, F, Cl, Br and I).79 Interestingly, maximum fidelity and efficiency were found at a base-pair size significantly larger than the natural size, both in vitro and in cells. Thus, a tight steric fit between the substrate and the polymerase active site favours high fidelity. Similar engineering of the RNA polymerase reaction was studied using hydrogen-bond-deficient nucleoside analogues in the template DNA.80 This study showed that replication fidelity depended strongly on the discrimination of incorrect hydrogen-bonding patterns, although efficiency did not depend on hydrogen bonding. Remarkably, the loss of U–T wobble hydrogen bonding increased the error frequency by ∼1000-fold. Thus, hydrogen bonding, stacking and steric compatibility together maintain fidelity in a delicate balance.
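Under the f = e^(−ΔΔG°/RT) approximation used earlier in this review, a fold change in error frequency maps onto a loss of free-energy discrimination equal to RT ln(fold). The sketch assumes 37 °C, since the assay temperature is not given here, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def discrimination_loss(fold_increase: float, temp_c: float = 37.0) -> float:
    """Loss of ddG (kcal/mol) implied by a fold increase in error frequency f."""
    return R_KCAL * (temp_c + 273.15) * math.log(fold_increase)


print(f"~1000-fold more errors ~ {discrimination_loss(1000):.1f} kcal/mol less discrimination")
```

On this approximation, the ∼1000-fold effect corresponds to roughly 4.3 kcal mol⁻¹, the same order as the ΔΔG° values discussed for base pairing above.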
Although such chemical perturbations can be effective, they are rarely encountered in biological systems, so their significance for cellular metabolism remains unclear. However, if perturbations that alter replication fidelity are induced by an endogenous cellular component, they may be closely associated with mismatched replication events in cells. One potential trigger is molecular crowding, an environmental factor that alters the physicochemical properties of the intracellular environment and indirectly affects the stability of biomolecules, particularly nucleic acids.
Thus, tC9Y, which can polymerise both NTPs and dNTPs, was activated by PEG200 during both NTP and dNTP polymerisation (Fig. 5(a)).100 In contrast, T7 RNAP lost its NTP polymerisation activity with increasing PEG200 and instead polymerised dNTPs (Fig. 5(b)). For KF, PEG200 promoted only dNTP polymerisation (Fig. 5(c)). The effect of PEG200 on the fidelity of each polymerase was investigated using single-nucleotide primer extension. In the presence of 20 wt% PEG200, tC9Y added both matched and mismatched NTPs, as well as certain dNTPs, to RNA primer G more efficiently than in the absence of PEG200 (Fig. 5(d)). Enhanced electrostatic interactions between the 2′-OH and the substrate-binding site in 20 wt% PEG200 resulted in the polymerisation of mismatched NTPs. The polymerisation of dGTP likely occurred because of the thermodynamic stability of the G·A mismatch.101 For T7 RNAP, the template-complementary UTP was polymerised at higher levels in the absence of PEG200 than in 20 wt% PEG200, whereas mismatched NTPs were polymerised at higher levels in 20 wt% PEG200 (Fig. 5(e)). Polymerisation of the template-complementary dTTP was facilitated by 20 wt% PEG200, whereas polymerisation of mismatched dNTPs did not change significantly. Therefore, molecular crowding enhanced the accuracy of DNA polymerisation by T7 RNAP. With KF, 20 wt% PEG200 increased the percentage of extended primers; however, the incorrect dATP, dGTP and UTP were also polymerised, indicating lower fidelity (Fig. 5(f)). When mismatched NTPs are incorporated, primer extension by T7 RNAP along the RNA template accelerates further misincorporation.102 These results indicate that molecular crowding can affect the hydrogen bonding and stacking of dNTPs and NTPs with the template and primers, which can increase activity and decrease fidelity.
Fig. 5 RNA and DNA polymerisation in 0–20 wt% PEG200 by different polymerases. (a)–(c) Percentage of primers extended by (a) tC9Y, (b) T7 RNAP and (c) KF, analysed by denaturing PAGE. In the graphs, the percentage of primers extended in reactions with NTPs is shown in green, and that in reactions with dNTPs in blue. (d)–(f) Efficiency of single-nucleotide polymerisation by (d) tC9Y, (e) T7 RNAP and (f) KF without PEG (white) or with 20 wt% PEG200 (black) for 12 h. The original data have been published previously.100 Reproduced from ref. 100 with permission from American Chemical Society, copyright (2019).
Based on these findings, the incorporation of dNTPs along non-natural DNA templates was analysed quantitatively using templates containing different unnatural bases (inosine: Ino, 5-methyl-isocytosine: isoCMe, and isoguanine: isoG) and different sugars (deoxyribonucleic acid: DNA, hexitol nucleic acid: HNA, and arabinose nucleic acid: ANA) (Fig. 6(a)).104 Although none of the dNTPs were cognate substrates for the unnatural template nucleobases, KF preferentially polymerised certain dNTPs. Replication efficiency and fidelity were negatively correlated, and this correlation changed in the presence of PEG200 (Fig. 6(b)). The incorporation of preferred pyrimidine dNTPs was highly efficient but of low fidelity (high error), whereas the incorporation of preferred purine dNTPs was inefficient but of high fidelity (low error). However, in the presence of 20 wt% PEG200, the incorporation efficiency of the preferred pyrimidine dNTPs decreased, whereas that of the preferred purine dNTPs increased, resulting in similar efficiencies regardless of the chemical structure of the templates. These findings indicate that the incorporation of preferred pyrimidine dNTPs depends on hydrogen bond formation, which is destabilised by molecular crowding owing to decreased water activity, whereas molecular crowding facilitates the incorporation of preferred purine dNTPs through base-stacking interactions. More importantly, molecular crowding can affect the hydrogen-bonding and base-stacking interactions between the incorporated natural dNTPs and the unnatural template nucleobases within the active centre of the reacting DNAP. These studies indicate that the fidelity of polymerase reactions, which is maintained by the chemistry of base pairing in the polymerase active site (Fig. 4), can be regulated by the solution environment.
The solution environment can dynamically influence the global structures of DNA and RNA, affecting the processivity of polymerases and the fidelity and efficiency of polymerisation. Therefore, in the next section, we discuss the effect of the template strand on polymerase reactions.
Fig. 6 Effect of molecular crowding on the efficiency and preference of replication along XNA templates.104 (a) Structures of the XNA templates used in the primer extension assay. (b) Plots of efficiency versus preference of primer extension by KF in the absence (blue plots) and presence (red plots) of PEG200.
More recently, the G-quadruplex (G4) and i-motif (iM) have been identified as regulators of gene replication and expression.113 G4 is a tetraplex structure composed of guanine repeats assembled via Hoogsteen base pairs, whereas iM is formed by two intercalated parallel-stranded duplexes held together by hemi-protonated C·C⁺ base pairs. Sequences with the potential to form these structures are briefly denoted as (GₙXₘ)₄ or (CₙXₘ)₄ (X: any base, n ≥ 2 and m ≥ 1). As these sequences are frequent in the genome and relatively enriched in gene promoter regions, their roles in cells may be more general than those of triplet repeats. The most interesting aspect of G4 and iM is the mechanism by which they are stabilised. G4s have specific sites between the G-quartets where Na⁺ and K⁺ bind to stabilise the structure.114 iMs require the protonation of cytosine and thus preferentially form under acidic conditions.115 Moreover, molecular crowding facilitates the formation of G4s and iMs owing to water exclusion and the compaction effect of the cosolute.82,83 Hence, these structures are highly environment-dependent in solution (see our previous reviews82,83). Therefore, G4s and iMs form depending on the cellular environment and dynamically regulate gene replication and expression in cells via environmental factors that affect nucleic acid stability (Fig. 1(a)).
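The (GₙXₘ)₄ and (CₙXₘ)₄ notation can be turned into a simple pattern search. The sketch below is a hypothetical helper, not a tool from the cited studies: it reads the notation as four runs of at least n = 2 G (or C) joined by three loops of any base, and it caps each loop at seven nucleotides, an assumption borrowed from common G4-search heuristics rather than from this review. It only flags candidate sequences on one strand; whether a G4 or iM actually forms depends on cations, pH and crowding, as discussed above.

```python
import re


def find_candidate_motifs(seq: str, base: str = "G", min_run: int = 2, max_loop: int = 7):
    """Return (start, motif) pairs for non-overlapping (base_n X_m)_4 candidates.

    min_run is n (n >= 2 in the text); max_loop caps m and is an assumption.
    Use base="G" for putative G4s and base="C" for putative i-motifs.
    """
    run = f"{base}{{{min_run},}}"          # e.g. "G{2,}": a run of at least n bases
    loop = f"[ACGT]{{1,{max_loop}}}"       # e.g. "[ACGT]{1,7}": a loop of 1..max_loop bases
    pattern = re.compile(f"{run}(?:{loop}{run}){{3}}")  # four runs joined by three loops
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


# Human telomeric repeat (G-rich strand) and its C-rich complement
print(find_candidate_motifs("TTAGGGTTAGGGTTAGGGTTAGGG", base="G"))
print(find_candidate_motifs("CCCTAACCCTAACCCTAACCC", base="C"))
```

Both calls return a single hit spanning the four repeats. Overlapping or alternative G-run assignments are not enumerated, so the helper undercounts dense G-rich or C-rich regions.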
Regarding the effects of G4 and iM on replication, these structures in the template strand prevent DNAPs from proceeding smoothly through processive synthesis (Fig. 7(b)). Such replication stalling can cause double-strand breaks (DSBs) during replication (Fig. 7(c)). DSBs formed at stalled forks are typically repaired by homologous recombination (HR) or, occasionally, by break-induced replication (BIR). However, failure of the repair process causes mutations and/or lesions in genomic information and results in genomic instability.116 Replication stalling is directly related to the frequency of formation and resolution of the G4/iM structure. Therefore, thermodynamic stability (ΔG°37) should be one of the dominant parameters for the fidelity of replication along G4/iM-forming templates.117 The stability of these structures can be tuned using ligands that specifically bind to G4/iMs. Various G4 binders have been tested for stabilising G4 structures in human cells. As expected, G4-stabilising ligands further inhibit polymerase processivity, which promotes genomic instability.118 Therefore, the extent of genomic instability can be described by the ΔG°37 of the G4/iM. However, some relatively unstable G4s containing only two G-quartets on the leading strand also caused genomic instability, suggesting that another factor, independent of thermal stability, determines G4-induced genomic instability.118 To investigate the mechanism of replication stalling by G4/iM, a quantitative analysis of replication stalling is required.
000 G4 DNA sites in human B lymphocytes. Further analyses were subsequently conducted in several species, and the formation of G4 DNA was confirmed in all 12 species analysed.123 This approach is powerful for identifying G4 formation in the genome in a manner that reflects the stability of the structures. Conventional PCR has also been used to evaluate the frequency of G4 formation from the delay in DNA amplification caused by G4 formation.124 Recently, a high-throughput primer extension assay was developed to quantitatively measure how DNAPs stall at over 20 000 short tandem repeat (STR) sequences.125 In this study, structured DNA motifs, such as G4s and hairpins, caused distinct replication-stalling patterns that were identified without relying on prior secondary structure predictions. Persistent stalling correlated with reduced STR expansion, suggesting that polymerase stalling at structured DNA serves as a natural constraint on repeat expansion, which is related to genomic stability and repeat expansion diseases. As shown in these studies, the polymerase stop assay is a useful technique for evaluating G4 formation and relating it to biological responses from a thermodynamic point of view.
A more quantitative use of the polymerase stop assay could elucidate, through physicochemical approaches, the mechanism of gene regulation by G4/iM formation. To study replication stalls, we quantitatively investigated how G4/iMs affect the replication efficiency of DNA strands whose sequences adopt different topologies, including anti-parallel G4, hybrid G4, parallel G4, iM, and hairpin.126 The iM derived from the Hif1a gene, a cancer-related gene, forms stably, with a stability (−ΔG°) of 3.1 kcal mol−1 at pH 6.0. The rate constant for overcoming the stall and completing replication (ks) was 0.39 min−1 at 37 °C. In contrast, the G4 from human telomeres showed similar stability (−ΔG°) but a larger ks of 2.6 min−1. Moreover, the hairpin, despite its relatively higher stability (−ΔG°), showed an even larger ks of 3.7 min−1.
) showed a much larger ks of 3.7 min−1. To analyse quantitatively the effects of the stability and topology of the DNA structure on replication efficiency, we developed a method called “quantitative study of topology-dependent replication (QSTR)” to determine a phase diagram of the replication rate vs. G4 stability and to reveal replication properties depending on the template DNA topology (Fig. 8(a)).126 When QSTR plots were generated from the results of various structures with different stabilities and topologies, including G4s, different linearity plots were obtained depending on the topology (Fig. 8(b)). Because the activation free energy ΔG‡ of the reaction is expressed by –RT
ln
ks, the linearity of the plots indicates that the ratio of ΔG‡ and
is the same for replication when DNAs with the same topology is replicated via the same unfolding mechanism of non-duplexes. The slope of the QSTR plot indicates that iM and the anti-parallel and parallel G4s had the greatest effect on replication stalling among the tested structures. However, this trend in topology-dependent replication changes dramatically under crowded conditions (Fig. 8(c)). The human telomere G4, transformed from a hybrid to a parallel topology under crowding conditions, effectively repressed replication, as observed for iMs in the absence of crowders.
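Because ΔG‡ varies linearly with −RT ln ks, the difference in activation free energy between two templates can be read directly from the ratio of their rate constants, ΔΔG‡ = RT ln(k_fast/k_slow), without knowing the pre-exponential factor. The short calculation below applies this relation to the ks values quoted above for the Hif1a iM, the human telomere G4, and the hairpin at 37 °C; the function name `extra_barrier` is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1
T_K = 310.15          # 37 °C, the temperature of the replication reactions

# Rate constants for overcoming the stall (min^-1), as quoted in the text.
K_S = {"Hif1a iM": 0.39, "human telomere G4": 2.6, "hairpin": 3.7}


def extra_barrier(k_slow: float, k_fast: float, temp_k: float = T_K) -> float:
    """ΔG‡(slow) - ΔG‡(fast) in kcal/mol.

    With ΔG‡ = -RT ln k_s + C (C constant at fixed T), C cancels in the
    difference, leaving RT ln(k_fast / k_slow); the units of k cancel too.
    """
    return R_KCAL * temp_k * math.log(k_fast / k_slow)


if __name__ == "__main__":
    ref = "Hif1a iM"
    for name, k in K_S.items():
        if name != ref:
            print(f"{ref} vs {name}: ΔΔG‡ ≈ {extra_barrier(K_S[ref], k):.2f} kcal/mol")
```

The iM barrier comes out roughly 1.2 kcal mol−1 above that of the telomere G4 and roughly 1.4 kcal mol−1 above that of the hairpin. This puts a number on the point that the iM imposes the largest additional barrier of the three, even though its stability is similar to that of the telomere G4 and lower than that of the hairpin.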
Fig. 8 Quantitative study of topology-dependent replication (QSTR) and its applications. (a) Parameters of the QSTR plot: the stability of the non-duplex (−ΔG°) and the activation energy required for polymerases to overcome the structure (ΔG‡). (b), (c) QSTR plots showing the relationship between the stability of the non-duplex structure and replication efficiency in the absence (b) or presence (c) of crowders at pH 6.0.126,127 The dependence of replication inhibition on topology and dynamics is indicated by the differences in the slopes of the QSTR plots. All replication reactions were performed at 37 °C.
Molecular crowding also affects the dynamics of G4s and iMs, thereby regulating replication stalls. In the presence of 20 wt% PEG1000, the replication of iMs was effectively repressed (Fig. 8(c)). To study how iM dynamics contribute to the different responses to PEGs, MD simulations and NMR were used to investigate structural changes in the iM.127 The MD simulations revealed that the twisting of the iM strand was affected by the PEG size. This indicates that the twisting motion, estimated from the NMR and MD simulation data to occur on the microsecond timescale or faster, affects the polymerase reaction, even though the polymerase reaction should be slower than the twisting motion of the iM. This may be because dynamic changes in DNA alter its mobility and direction of motion, which perturbs DNA recognition by the protein.128 These results suggest that the twisting dynamics triggered by molecular crowding raise or lower the energy barrier for polymerase-mediated iM recognition, thereby regulating the subsequent iM unfolding process. Therefore, each crowding condition differentially regulates the processivity of DNAP along a template DNA by altering the stability and topology of the DNA structure, and hence the activation free energy for unwinding. This energetic treatment provides an index for quantitatively interpreting the effects of the environment on gene replication and expression, depending on the stability and topology of the template DNA.
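The energetic index discussed here is, in practice, the slope of a QSTR plot. The sketch below assumes that each data point is given as (series label, −ΔG° in kcal mol−1, ks in min−1), where a series groups templates of one topology under one condition (for example, with or without crowders). For each series, it fits an ordinary least-squares line of −RT ln ks against −ΔG°. At fixed temperature, ΔG‡ differs from −RT ln ks only by a constant, so the fitted slope equals the slope of ΔG‡ vs. −ΔG°. The function name `qstr_slopes` and the input format are hypothetical, and the demonstration points are synthetic, not measured values.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def qstr_slopes(
    points: Iterable[Tuple[str, float, float]], temp_k: float = 310.15
) -> Dict[str, float]:
    """Least-squares slope of -RT ln k_s versus -ΔG° for each labelled series.

    points: (series label, -ΔG° in kcal/mol, k_s in min^-1).
    Series without at least two distinct -ΔG° values are skipped.
    """
    series: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for label, minus_dg, k_s in points:
        # y equals ΔG‡ up to an additive constant, which does not change the slope.
        series[label].append((minus_dg, -R_KCAL * temp_k * math.log(k_s)))

    slopes: Dict[str, float] = {}
    for label, xy in series.items():
        n = len(xy)
        mean_x = sum(x for x, _ in xy) / n
        mean_y = sum(y for _, y in xy) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in xy)
        if sxx == 0.0:
            continue  # slope undefined (single point or identical -ΔG° values)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in xy)
        slopes[label] = sxy / sxx
    return slopes


if __name__ == "__main__":
    rt = R_KCAL * 310.15
    # Synthetic points for which -RT ln k_s = 0.2 + slope * (-ΔG°) exactly
    # (illustration only, not measured data).
    demo = [
        (label, dg, math.exp(-(0.2 + slope * dg) / rt))
        for label, slope in (("series A", 0.5), ("series B", 0.1))
        for dg in (1.0, 2.0, 3.0, 4.0)
    ]
    for label, s in qstr_slopes(demo).items():
        print(f"{label}: slope ≈ {s:.2f}")
```

A steeper slope means that each additional kcal mol−1 of non-duplex stability raises the replication barrier more, which is how the differences in slope between topologies or crowding conditions can be compared. If a QSTR plot is instead drawn as ln ks against −ΔG°, its slope differs from this one only by the factor −RT.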
For ligand-based assays, the QSTR method provides unique information about G4/iM binders. We found that the plant flavonol fisetin binds specifically to the iM derived from the promoter region of the human VEGF gene.135 This binding dramatically affected the photoinduced excited-state intramolecular proton transfer reaction, significantly enhancing the intensity of the tautomer band in fisetin fluorescence.136 This unique response was attributed to a concurrent structural change from the iM to a hairpin, formed through putative Watson–Crick base pairing between some guanines in the iM loop region and cytosines. The QSTR plot indicated that fisetin shifted the replication property of the iM (Hoogsteen-type) to that of hairpins (Watson–Crick-type). The VEGF iM did not block replication in the presence of fisetin, indicating that fisetin inhibits VEGF gene expression by converting the secondary structure of the DNA from the Hoogsteen type to the Watson–Crick type.
The QSTR technique has also been used to rationally design G4 binders. Various compounds were analysed by systematically changing their functional groups, and the QSTR plots revealed a relationship between the functional groups of a G4 binder and its effects on both replication stalling and G4 stabilisation. The systematic QSTR data guided the design of a naphthalene diamide compound that binds specifically to the human telomere G4 by interacting simultaneously with the G-quartet surface and the loop region.137 The newly designed compound stabilised the G4 and stalled replication far more strongly than the original compound.137
In another study, we investigated the chemical recovery of G4 formation in an oxidised G4 sequence from human VEGF. Guanine bases in G4s are sensitive to oxidation, which converts them to 8-oxo-7,8-dihydroguanine (8-oxoG). Because G4 formation regulates the expression of some cancer genes, 8-oxoG in a G4 sequence may affect epigenetic modifications of the genome and cancer progression.138 We found that an 8-oxoG-containing G4 derived from the promoter region of the human VEGF gene adopted a different topology from the unoxidised G4 and did not block replication, as shown in the QSTR plots.139 To recover G4 function, we developed an oligonucleotide comprising a pyrene-modified guanine tract (5′-pyrene-UGGGT-3′) that replaces the oxidised guanine tract and forms a stable intermolecular G4 with the other intact guanine tracts.139 The QSTR plots indicated that the modified oligonucleotide restored the ability of the G4 to stall replication. As these examples show, a distinctive feature of QSTR is that it reveals how G4/iMs affect replication depending not only on their stability (−ΔG°) but also on other factors, such as the structural dynamics of the polymerase–G4/iM complex. Therefore, these quantitative outputs can be used to understand the dynamic regulation of replication by G4/iMs and to develop novel materials that control G4/iM dynamics to achieve specific gene-expression behaviours.
This article highlights that genomic manipulation is dominated by the three-dimensional structure and stability of nucleic acids. These are determined by a combination of structural factors, such as hydrogen bonding, stacking interactions between base pairs, and the conformational entropy of the backbone, and environmental factors, such as cations, hydration, and molecular crowding. These stability factors have been investigated individually; however, a comprehensive understanding of their energetic contributions to replication efficiency and fidelity, particularly that of conformational entropy, is still lacking. Interestingly, the active sites of present-day polymerases are not tightly packed with substrate DNA, even though fidelity and efficiency depend on this packing.79 This implies that living systems leave room for conformational entropy to differentiate the functions of biomolecules, including DNA, RNA, and proteins. Understanding the energetic contribution of each factor to the replication process will open up new avenues in the fields of genomic mutagenesis and functional materials.
This journal is © The Royal Society of Chemistry 2026