Open Access Article
Luciana M. Oliveira‡ a, Adam S. Long§ b, Tom Brown c, Keith R. Fox b and Gerald Weber* a
a Departamento de Física, Universidade Federal de Minas Gerais, 31270-901 Belo Horizonte, MG, Brazil. E-mail: gweberbh@gmail.com; Fax: +55 31 3409 5600; Tel: +55 31 3409 6616
b School of Biological Sciences, University of Southampton, Life Sciences Building 85, Southampton SO17 1BJ, UK
c Department of Chemistry, University of Oxford, Oxford, UK
First published on 23rd July 2020
Unlike the canonical base pairs AT and GC, the molecular properties of mismatches, such as hydrogen bonding and stacking interactions, are strongly dependent on the identity of the neighbouring base pairs. As a result, due to the sheer number of possible combinations of mismatches and flanking base pairs, only a fraction of these have been studied in experiments or with theoretical models. Here, we report on the melting temperature measurement and mesoscopic analysis of contiguous DNA mismatches in nearest-neighbour and next-nearest-neighbour contexts. A total of 4032 different mismatch combinations, including single, double and triple mismatches, were covered. These were compared with 64 sequences containing all combinations of canonical base pairs in the same location under the same conditions. For a substantial fraction of single mismatch configurations, 15%, the measured melting temperatures were higher than for the least stable AT base pair. The mesoscopic calculation, using the Peyrard–Bishop model, was performed on the full set of 4096 sequences and resulted in estimates of on-site and nearest-neighbour interactions that can be correlated with hydrogen bonding and base stacking. Our results confirm many of the known properties of mismatches, including the peculiar sheared stacking of tandem GA mismatches. More intriguingly, they also reveal that a number of mismatches present strong hydrogen bonding when flanked on both sides by other mismatches. To highlight the applicability of our results, we discuss a number of practical situations such as enzyme binding affinities, thymine DNA glycosylase repair activity, and trinucleotide repeat expansions.
Mismatches can occur in genomic DNA and are produced by a range of factors, such as replication errors,2 misincorporation3 and cytosine methylation.4 When they do occur, they are checked and corrected by an extensive array of repair mechanisms.5 However, if left uncorrected they give rise to mutations. A central aspect of repairing a mismatch defect is its recognition by specialized enzymes such as MutS,6–8 Msh2–Msh6,9,10 and Rad4/XPC.11 Mismatch recognition is also known to be important for base pair substitution in Cas9-induced DNA breaks.12 Furthermore, mismatch recognition can be performed by a substantial number of small organic molecules and metal complexes with the potential for acting as drugs.13 In most cases, the efficiency of mismatch recognition depends strongly on the type of mismatch as well as on its neighbouring base pairs.14,15 Similarly, mismatch repair may depend on the type of flanking base pair. For instance, the efficiency of thymine excision from GT mismatches by thymine-DNA glycosylase has a well-known dependence on the type of base pairs neighbouring GT.16–18
Evaluating the dependence of the thermal and structural properties of mismatches on nearly all possible nearest neighbours, which we will refer to as the context, is a challenging problem. Few theoretical models can deal simultaneously with the large number of sequences needed to cover that many mismatch contexts. These models need to be computationally efficient, which requires a considerable level of simplification. For instance, the nearest-neighbour (NN) model is simple enough to be numerically efficient, but does not provide the desired level of structural information. Mesoscopic models, however, have a numerical efficiency comparable to that of NN models, yet can provide details on intramolecular interactions.19,20
In the NN model, the parameters for single mismatches can be derived from a relatively small set of melting temperatures21–24 and are generally sufficient for melting temperature prediction. However, NN models generally provide little insight into the detailed intramolecular interactions. More elaborate models, such as mesoscopic models,19,20 can provide some information about intramolecular interactions, but they require a much more complete and diverse set of melting temperatures. Therefore, the existing set of published melting temperatures has been insufficient for applying mesoscopic models to the task of providing information on hydrogen bonds and stacking interactions. Larger sets for specific conditions do exist, for instance a microarray probe set with single mismatches by Hooyberghs et al.25 However, hybridization to immobilized probes in microarrays can affect the melting temperatures,26 and to date there are no validated mesoscopic models for this experimental situation.
Early studies on melting temperatures established that guanine mismatches (GT, GG and AG) are the most stable and cytosine mismatches (AC, CC) the least stable.27–29 Werntges et al.,30 using melting temperatures, classified mismatched base pairs as wobble pairs (GT, GG, AC, AA and AG), open pairs (TT, CT, TC and CC) and weak base pairs (GT, AC and AG). Tandem AG mismatches were found to be particularly stable,31 and the influence of flanking base pairs and of the terminal position is also well established.32–35 In an important work, a larger set of melting temperatures was published,21–24 covering all eight single mismatch types and including a few sequences from previous studies.28,36 Further thermodynamic studies focused on terminal mismatched base pairs,37 changes in buffer conditions,38–41 and the use of differential scanning calorimetry (DSC).42–44
While melting temperatures are mostly calculated with NN-type models,21–24 it is possible to use more elaborate statistical physics approaches such as the Peyrard–Bishop (PB) model.19 The PB models use simple potentials for the basic intramolecular interactions. Specifically, a Morse potential is used to describe base-pair dependent interactions, which are mostly hydrogen bonds, and an elastic potential that mimics the stacking interactions. Both are effective potentials, that is, they cover all interactions that are either base-pair or nearest-neighbour dependent. The model Hamiltonian describes the energy contributions of these two potentials and is used to evaluate the classical partition function over all possible DNA configurations.45 We showed that it is possible, with suitable parameters, to derive an index from the partition function which can be used to calculate melting temperatures.19 We demonstrated that it is also possible to run the procedure in reverse, starting from melting temperatures to calculate model parameters.20 The parameters obtained from melting temperatures in this way were shown to be consistent with existing knowledge of hydrogen bonds and stacking interactions in a number of situations: RNA,46 GU mismatches in RNA,47 deoxyinosine,48 and more recently DNA–RNA hybrids.49 In all cases the temperature-derived parameters reproduced all the main characteristics of these nucleic acids.
Here we report the measurement and mesoscopic analysis of 4096 fixed-length sequences in which the central three base pairs cover all possible combinations of the bases A, C, G and T. Therefore, in addition to the 64 sequences that only contain the canonical base pairs AT and CG, we include 576 single, 1728 double and 1728 triple mismatches. Single and double mismatches are covered in all nearest-neighbour contexts, except for terminal mismatches, which are not considered here. The mesoscopic analysis is performed with the Peyrard–Bishop (PB) statistical physics model using microscopic potentials, taking into account the effective hydrogen bonding and stacking interactions.45 Despite the computational efficiency of the PB model calculation,46 the large number of context dependent nucleotide configurations, of sequences and of parameters to evaluate required a considerable computational effort. The final parameters comprise 440 Morse potentials and 3084 elastic constants, which can be related to hydrogen bonds and stacking interactions, respectively. The analysis of these parameters has confirmed several well-known properties of DNA mismatches, but has also yielded numerous previously unreported results, of which perhaps the most intriguing is the surprising stability of several mismatch triplets. In terms of stacking parameters, we observed very large interactions for the well-known GA–AG mismatches in a sheared stacking conformation. To exemplify some possible applications of the new parameters, we evaluated triplet mismatches related to enzyme binding, trinucleotide repeats and thymine DNA glycosylase, and found positive correlations with experimental results.
The main components of this model are the hydrogen bonds, represented by a Morse potential,
$$V(y_i) = D_i\left(e^{-y_i/\lambda_i} - 1\right)^2 \qquad (1)$$
![]() | (2) |
The two potentials, eqn (1) and (2), are combined into the configurational part of the Hamiltonian
$$U(y_i, y_{i+1}) = w(y_i, y_{i+1}) + V(y_i) \qquad (3)$$
The two main model parameters are the Morse potential depth D and the stacking constant k. Fig. 1 shows an example of these interactions, represented by their main parameters, for the case of a double mismatch CC and CT. Eqn (3) is summed over all base pairs N in the partition function
![]() | (4) |
The partition function, eqn (4), is used to calculate an adimensional index τ which can be correlated to experimental melting temperatures, as described in the next section. For the integration of the partition function (see for instance eqn (14) of ref. 51) we used 400 points over the interval ymin = −0.1 nm to ymax = 20.0 nm, and a cut-off of P = 10 from eqn (22) of ref. 51. The calculation of the thermal index τ is carried out at 370 K. Please note that this temperature is unrelated to the temperatures obtained from the regression method. For further details on the model implementation please see refs 20, 51 and 52.
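The transfer-integral idea behind this integration can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation of refs 20, 51 and 52: since eqn (2) is not reproduced here, the stacking term is an assumed purely harmonic coupling (`harmonic_stacking`); the grid is uniform with rectangle-rule weights, which may differ from the scheme of ref. 51; the cut-off P is not applied; and all parameter values in the example are hypothetical. The function returns ln Z for an open chain of N base pairs, rescaling at every step to avoid numerical underflow.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def morse(y, D, lam):
    """Morse potential of eqn (1): D (exp(-y/lam) - 1)^2, with D in eV and y, lam in nm."""
    return D * (np.exp(-y / lam) - 1.0) ** 2


def harmonic_stacking(y1, y2, k):
    """ASSUMED harmonic stand-in for the stacking potential of eqn (2); k in eV nm^-2."""
    return 0.5 * k * (y1 - y2) ** 2


def log_partition_function(D, lam, k, T=370.0, y_min=-0.1, y_max=20.0, n_points=400):
    """ln Z of an open chain (eqns (3) and (4)) by repeated transfer integrals.

    D, lam: per-base-pair Morse parameters (length N).
    k: stacking constants between consecutive base pairs (length N - 1).
    """
    beta = 1.0 / (K_B * T)
    y = np.linspace(y_min, y_max, n_points)
    dy = y[1] - y[0]
    # Boltzmann weight of the first base pair, integrated with rectangle-rule weight dy.
    f = np.exp(-beta * morse(y, D[0], lam[0])) * dy
    log_z = 0.0
    for i in range(1, len(D)):
        # Kernel coupling y_{i-1} (rows) to y_i (columns).
        kernel = np.exp(-beta * harmonic_stacking(y[:, None], y[None, :], k[i - 1]))
        f = (f @ kernel) * np.exp(-beta * morse(y, D[i], lam[i])) * dy
        # Rescale and keep track of the scale in log form.
        scale = f.max()
        log_z += np.log(scale)
        f = f / scale
    return float(log_z + np.log(f.sum()))


if __name__ == "__main__":
    # Hypothetical 5-bp chain: D = 30 meV, lambda = 0.1 nm, k = 2.5 eV nm^-2.
    print(log_partition_function([0.03] * 5, [0.1] * 5, [2.5] * 4))
```

Because a deeper Morse potential lowers every Boltzmann weight away from y = 0, ln Z decreases as D increases. The thermal index τ itself is not computed in this sketch.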
For the analysis of some particular sequences we calculated the average displacement 〈ym〉 at the mth position in the sequence, which is obtained from
![]() | (5) |
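The average displacement of eqn (5) can be sketched by reusing transfer integrals in two directions. A forward pass accumulates base pairs 1 to m, a backward pass accumulates base pairs m to N, and the marginal distribution of y_m is proportional to their product. As in any such sketch, the stacking term is an assumed harmonic stand-in for eqn (2), the grid is uniform, and the parameter values are hypothetical. The default of 150 K simply mirrors the value quoted for Figs 11 and 12.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def morse(y, D, lam):
    """Morse potential of eqn (1); D in eV, y and lam in nm."""
    return D * (np.exp(-y / lam) - 1.0) ** 2


def harmonic_stacking(y1, y2, k):
    """ASSUMED harmonic stand-in for the stacking potential of eqn (2); k in eV nm^-2."""
    return 0.5 * k * (y1 - y2) ** 2


def average_displacements(D, lam, k, T=150.0, y_min=-0.1, y_max=20.0, n_points=400):
    """<y_m> for every base pair m of an open chain via forward/backward transfer integrals."""
    beta = 1.0 / (K_B * T)
    y = np.linspace(y_min, y_max, n_points)
    n_bp = len(D)
    onsite = [np.exp(-beta * morse(y, D[m], lam[m])) for m in range(n_bp)]
    kernels = [np.exp(-beta * harmonic_stacking(y[:, None], y[None, :], k[m]))
               for m in range(n_bp - 1)]
    # Forward pass: weight of base pairs 1..m with y_m fixed (rescaled against underflow).
    forward = [onsite[0] / onsite[0].max()]
    for m in range(1, n_bp):
        f = (forward[-1] @ kernels[m - 1]) * onsite[m]
        forward.append(f / f.max())
    # Backward pass: weight of base pairs m+1..N with y_m fixed.
    backward = [np.ones_like(y)]
    for m in range(n_bp - 2, -1, -1):
        b = kernels[m] @ (onsite[m + 1] * backward[0])
        backward.insert(0, b / b.max())
    # The marginal of y_m is proportional to forward * backward; grid weights cancel.
    result = []
    for f, b in zip(forward, backward):
        p = f * b
        result.append(float(np.dot(y, p) / p.sum()))
    return np.array(result)
```

For a uniform chain, the profile is symmetric, and deeper Morse potentials give smaller average displacements.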
The predicted melting temperature, resulting from the tentative set of parameters P, is then obtained from the following linear equation, ![]() | (6) |
| 5′-CGACGTGCN1N3N5ATGTGCTG-3′ |
| 3′-GCTGCACGN2N4N6TACACGAC-5′ |
| 5′-N1BN3-3′ |
| 3′-N2PN4-5′ |
The context represented by the flanking base pairs N1N2 and N3N4 will be given by an index, say α, such that BPα means a BP base pair in this particular context, and its Morse potential will be represented by DBPα. For example, in the three base pair sequence 5′-ATG-3′/3′-TGT-5′, the central GT mismatch will be labelled as GTav (BP = GT, α = av), according to the rules laid out in the ESI Table S2.†
850 parameters would be required, exceeding by far the number of available sequences. To reduce this excessive number of parameters, while still considering context dependence in a broader sense, we regroup similar context base pairs into a single index as follows:
| N1BN3/N2PN4, N2BN3/N1PN4, N2BN4/N1PN3, N1BN4/N2PN3, N1PN3/N2BN4, N2PN3/N1BN4, N2PN4/N1BN3, N1PN4/N2BN3 |
N3/N2 N4}, with the central base pair underlined for clarity. To exemplify, all the context trimers of base pair GTah (BP = GT) are grouped together as
| {C a/G g} = Cca/Gtg, Cta/Gcg, Gca/Ctg, acC/gtG, acG/gtC, atC/gcG, atG/gcC, gcC/atG |
In this way, we need to consider only 440 Morse potentials, each in its specific context. Due to symmetry considerations, context groups may contain fewer than 8 trimer contexts, or even just one, as for instance aaa/aaa (AAa). The complete set of context dependence groups is shown in ESI Table S2.†
| N1B1B2N3/N2P1P2N4 |
In a similar way as for BP context groups, we will combine all possible NN contexts into a group
and
. In this way, the 440 different BP groups result in 3084 NN context groups. The NN groups will be represented in square brackets, underlining two base pairs to distinguish them from the BP group notation. For example,
refers to the intersection of the {A c/T t} and {C c/G c} trimer groups related to CCy–CTe nearest neighbours, see Fig. 2. Note that in some cases, when one of the base pairs is canonical and thus has no trimer context of its own, the NN context reduces to a group of trimers, as for instance
for ACau–AT. Note that the group
does not contain all trimers of ACau {A g/T g}, but only the ones where the nearest neighbour is AC–TA. The NN context groups are given in ESI Table S3.†
| Fig. 2 Schematic diagram of context groups, exemplified for the same double mismatch of Fig. 1. BP context groups are shown in the upper part and respective Morse potentials are displayed in the same colour. Similarly, NN context groups are shown in the lower part and their associated stacking potential constants k are colour coded accordingly. | ||
For each tentative set of model parameters Pj we calculated the predicted melting temperatures, eqn (6), and compared them with the experimental temperatures Ti. The model parameters (Pj) were then varied until we minimized the squared differences
![]() | (7) |
The minimization was implemented numerically by the Nelder–Mead or downhill simplex method,20 using eqn (7) as the objective function and searching for its minimum in the multidimensional space spanned by the model parameters Pj. Due to the large number of possible mismatch contexts, the minimization of eqn (7) was carried out in several separate minimization rounds, as will be discussed in the next sections.
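The downhill simplex search can be sketched compactly. This is a generic Nelder–Mead implementation with reflection, expansion, inside contraction and shrink steps, not the authors' code. Clipping every trial point into a box is our assumption about how a bounded search such as eqn (9) might be enforced, reading s_i as the seed value of parameter p_i and f as a fractional tolerance. The quadratic objective in the example is a hypothetical stand-in for eqn (7).

```python
import numpy as np


def box_from_seeds(seeds, f):
    """Assumed reading of eqn (9): p_i may vary within [(1 - f) s_i, (1 + f) s_i] (s_i > 0)."""
    seeds = np.asarray(seeds, dtype=float)
    return (1.0 - f) * seeds, (1.0 + f) * seeds


def nelder_mead_box(fun, x0, lower, upper, step=0.25, tol=1e-14, max_iter=5000):
    """Downhill simplex in which every trial point is clipped into [lower, upper]."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    def clip(x):
        return np.clip(x, lower, upper)

    x0 = clip(np.asarray(x0, dtype=float))
    # Initial simplex: x0 plus one vertex displaced along each coordinate.
    pts = [x0]
    for i in range(x0.size):
        p = x0.copy()
        p[i] += step * (upper[i] - lower[i])
        pts.append(clip(p))
    vals = [fun(p) for p in pts]

    for _ in range(max_iter):
        order = np.argsort(vals)
        pts = [pts[j] for j in order]
        vals = [vals[j] for j in order]
        if vals[-1] - vals[0] < tol:
            break
        centroid = np.mean(pts[:-1], axis=0)  # centroid of all but the worst vertex
        xr = clip(2.0 * centroid - pts[-1])  # reflection
        fr = fun(xr)
        if fr < vals[0]:
            xe = clip(3.0 * centroid - 2.0 * pts[-1])  # expansion
            fe = fun(xe)
            pts[-1], vals[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < vals[-2]:
            pts[-1], vals[-1] = xr, fr
        else:
            xc = clip(0.5 * (centroid + pts[-1]))  # inside contraction
            fc = fun(xc)
            if fc < vals[-1]:
                pts[-1], vals[-1] = xc, fc
            else:  # shrink all vertices towards the best one
                pts = [pts[0]] + [clip(0.5 * (pts[0] + p)) for p in pts[1:]]
                vals = [vals[0]] + [fun(p) for p in pts[1:]]
    best = int(np.argmin(vals))
    return pts[best], vals[best]
```

With seeds (1.0, 2.0), f = 0.2 and an unconstrained optimum at (1.1, 3.0), the search returns approximately (1.1, 2.4), the closest point inside the box.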
We also refer in this work to an average melting temperature deviation
![]() | (8) |
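Scoring one tentative parameter set can be sketched as follows. We assume that the linear equation (6) has the form T′ = a0 + a1τ, with coefficients fitted by least squares, and we read eqn (8) as the mean absolute deviation. Both readings are assumptions, since the equations are not reproduced in this text. In the full procedure τ would come from the partition function; here it is simply an input array.

```python
import numpy as np


def fit_and_score(tau, t_exp):
    """Fit T' = a0 + a1 * tau (assumed form of eqn (6)) and score it.

    Returns (a0, a1, chi2, mean_dev): chi2 is the sum of squared differences of eqn (7),
    and mean_dev is the mean absolute deviation, our assumed reading of eqn (8).
    """
    tau = np.asarray(tau, dtype=float)
    t_exp = np.asarray(t_exp, dtype=float)
    a1, a0 = np.polyfit(tau, t_exp, 1)  # polyfit returns [slope, intercept]
    residuals = (a0 + a1 * tau) - t_exp
    chi2 = float(np.sum(residuals ** 2))
    mean_dev = float(np.mean(np.abs(residuals)))
    return float(a0), float(a1), chi2, mean_dev
```

Because the regression is refitted for every tentative parameter set, only the relative ordering of τ across sequences influences χ², not its absolute scale.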
$$p_i \in \left[(1 - f)\,s_i,\ (1 + f)\,s_i\right] \qquad (9)$$
600 °C², corresponding to 〈ΔT〉 = 2.35 °C. After optimization we obtained χ² = 11 000 °C² and 〈ΔT〉 = 1.20 °C. Here we performed 400 minimization rounds in parallel, which required a total of 4.3 years of single-processor-equivalent computing time at 2.6 GHz.
![]() | (10) |
The logarithmic rank, shown in Table 1, provides a simple measure to evaluate the thermal stability trend
| GA ≈ GT > GG > AG > TG > AA ≈ TT > AC ≈ TC > CA > CT > CC |
| BP | R | 〈T〉 ± std(T) (°C) | BP | R | 〈T〉 ± std(T) (°C) |
|---|---|---|---|---|---|
| GC | 6.1 | 66.1 ± 1.9 | CG | 6.3 | 65.8 ± 2.2 |
| AT | 6.8 | 64.1 ± 1.8 | TA | 7.0 | 63.8 ± 1.8 |
| GA | 8.3 | 60.2 ± 2.4 | GT | 8.3 | 59.9 ± 2.2 |
| GG | 8.5 | 59.9 ± 3.0 | AG | 8.6 | 59.7 ± 3.0 |
| TG | 8.7 | 59.2 ± 2.8 | AA | 8.9 | 57.8 ± 1.9 |
| TT | 8.9 | 57.9 ± 2.0 | AC | 9.2 | 57.1 ± 2.2 |
| TC | 9.2 | 56.8 ± 1.8 | CA | 9.3 | 56.7 ± 2.7 |
| CT | 9.4 | 56.4 ± 2.2 | CC | 9.8 | 54.7 ± 2.6 |
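The stability trend quoted above follows directly from the R column of Table 1, with smaller R corresponding to higher stability, as for the canonical pairs. The short sketch below sorts the mismatch rows by R and joins equal values with ≈. It uses only the tabulated values.

```python
# Mismatch rows of Table 1 as (base pair, logarithmic rank R), read row by row.
TABLE1_R = [
    ("GA", 8.3), ("GT", 8.3), ("GG", 8.5), ("AG", 8.6), ("TG", 8.7), ("AA", 8.9),
    ("TT", 8.9), ("AC", 9.2), ("TC", 9.2), ("CA", 9.3), ("CT", 9.4), ("CC", 9.8),
]


def stability_trend(ranked):
    """Order base pairs by increasing R (most stable first); equal R values are joined with '≈'."""
    ordered = sorted(ranked, key=lambda item: item[1])  # stable sort keeps table order in ties
    groups = []  # list of (R, [base pairs sharing that R])
    for bp, r in ordered:
        if groups and groups[-1][0] == r:
            groups[-1][1].append(bp)
        else:
            groups.append((r, [bp]))
    return " > ".join(" ≈ ".join(bps) for _, bps in groups)


print(stability_trend(TABLE1_R))
# GA ≈ GT > GG > AG > TG > AA ≈ TT > AC ≈ TC > CA > CT > CC
```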
That 15% of all possible single mismatches have higher melting temperatures than the canonical ATT/TAA has important consequences for applications such as SNP detection, which rely critically on the ability to distinguish mismatches from canonical base pairs.56,57
| Fig. 4 Calculated Morse potentials for CI-type base pairs (red bullets). The dashed grey line is the value of the Morse potential of the canonical AT base pair. See also ESI Table S4.† | ||
The CI stacking parameters k, eqn (2), are shown as a heat map in Fig. 5. The rows and columns were ordered such that the rows with the largest sum of k are displayed towards the top, and the columns with the largest sum of k towards the right. In this way, most of the largest k values are clustered towards the top-right corner of the heat map, highlighted by the dashed area in Fig. 5. Several of the stacking parameters are much larger than typically found for non-mismatched base pairs, but there is also an equally large number of very low stacking parameters, as represented by the many dark-red boxes in Fig. 5. Among the larger stacking parameters, one case that stands out is the single blue box in Fig. 5, which is GA–AG with 17.3 eV nm−2. In general, nearest neighbours involving AG mismatches are amongst those with the highest stacking interactions. We will return to this later when we analyse the context dependent (CD) results. It is interesting to note that there are very few cases of large stacking parameters involving canonical base pairs.
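The row and column ordering used for the heat map can be sketched as below. Rows are sorted by decreasing row sum and columns by increasing column sum, which pushes large entries towards the top-right corner. The matrix and labels in the example are hypothetical, and breaking ties with a stable sort is our choice.

```python
import numpy as np


def order_for_heatmap(k_matrix, row_labels, col_labels):
    """Reorder rows by decreasing row sum (largest at top) and columns by increasing
    column sum (largest at right), so large values cluster in the top-right corner."""
    k_matrix = np.asarray(k_matrix, dtype=float)
    row_order = np.argsort(-k_matrix.sum(axis=1), kind="stable")
    col_order = np.argsort(k_matrix.sum(axis=0), kind="stable")
    reordered = k_matrix[np.ix_(row_order, col_order)]
    rows = [row_labels[i] for i in row_order]
    cols = [col_labels[j] for j in col_order]
    return reordered, rows, cols
```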
| Fig. 5 Heat map of stacking interactions k of CI nearest-neighbours in the form BP1–BP2, that is, first base pair followed by second base pair. Lower case letters refer to mismatched base pairs. The matrix was ordered by row and column such that the highest values are clustered in the top-right corner of the map, represented by the dashed box. Note that the matrix is symmetric about the antidiagonal (bottom-left to top-right); for instance, aa–gg is the same as gg–aa, therefore we left the lower part empty for clarity. The actual values are shown in ESI Table S5.† | ||
| Fig. 6 Context dependent (CD) Morse potentials for (a) AA, (b) CC, (c) GG and (d) TT mismatches. Shown are only those Morse potentials that deviate by more than 30% from the seed CI potentials, shown as dashed grey lines. Also shown are the transition/transversion characteristic and the BP context group α. Colour coding is as follows: vvv (black); tvv and vtv (brown); tvt and ttv (blue); Tvt, Tvv and TTv (red to orange); TvT and TtT (green to lime). The complete set is shown in ESI Fig. S1–S4 and ESI Table S6, and the full context groups are given in ESI Table S2.† | ||
| Fig. 7 Context dependent (CD) Morse potentials for (a) AC, (b) CT, (c) AG and (d) GT mismatches. Shown are only those Morse potentials that deviate by more than 30% (panels a–c), or 50% for panel (d), from the seed CI potentials, shown as dashed grey lines. Colour coding for the transition/transversion characteristic is as follows: vvv (black); tvv and vtv (brown); tvt and ttv (blue); Tvt, Tvv and TTv (red to orange); TvT and TtT (green to lime). The complete set is shown in ESI Fig. S5–S8 and ESI Table S6.† | ||
The majority of the 440 calculated mismatch Morse potentials are quite small: 314 are smaller than 10 meV, of which 209 are smaller than 5 meV. This is to be expected, given that mismatches generally destabilize the duplex. In addition, such a low potential is essentially equivalent to a flat Morse potential, which supports studies using the PB model that considered a flat potential for mismatches.59,60
The potentials exceeded an AT-like value of 25 meV only in 19 cases, of which 5 are in the region of CG-like potentials. However, the result that stands out very clearly is that almost all the largest Morse potentials occur for mismatches that are flanked on both sides by other mismatches. The single exception is GT where the largest Morse potential is flanked by canonical base pairs, Fig. 7d.
In Fig. 6 we show the potentials for like-with-like mismatches, which are all transversions. For three of these mismatches the highest potentials are of type vvv, that is, a transversion flanked on both sides by transversions, and in one case of type tvv. In all these cases the Morse potentials are substantially larger than the seed CI parameters. In several cases the Morse potentials are very strong, as for instance in the {c c/c t} context, Fig. 6c, which exceeds 60 meV and is almost equivalent in strength to an ordinary CG base pair. Note that this does not mean that such a cgc/cgt trimer is particularly stable as a whole, but that a GG mismatch, when surrounded by a CC and a CT, has by itself a large potential. Consider for example the duplex ACGCA/TCGTT: the GGaq mismatch, {c c/c t}, with a Morse potential depth of 65.6 meV, is flanked by CCag with 0.800 meV and CTau with 0.620 meV. These results indicate that in many cases, in a trimer consisting only of mismatches, the central mismatch appears to have enough freedom to arrange itself in a very stable configuration when flanked by highly unstable mismatches. Indeed, there is only one case, the {A C/T G} context, where the GGx mismatch has an increased Morse potential of 27.4 meV while being flanked by two canonical base pairs, see Fig. 6. A very similar conclusion can be drawn from Fig. 7, where we show AC, AG, CT and GT. Here, AC and GT are the only two transition base pairs. For GT in particular, Fig. 7d, we find the only case, {A C/T G}, where the highest potential is not flanked solely by mismatches, though its value of 17.5 meV is still comparatively small. For the transversions CT and AG, Fig. 7b and c, we observe again that the vvv patterns have the highest potentials.
The analysis of 3084 different stacking potentials is difficult, not only because of the large number of parameters, but also because of the complexity introduced by the context dependence, see Fig. 2. Here we employed an analysis similar to that for the CI parameters, but using the average and standard deviation of groups of stacking parameters. We collected all parameters matching the pattern notation BP1α–BP2β, as set out in the Methods section, then calculated the average 〈k〉 and arranged these in a BP1 × BP2 matrix, as shown in Fig. 8. Similarly, we calculated the associated standard deviation std(k), which is shown in Fig. 9. The largest average stacking interaction is for GA–AG, 21.8 eV nm−2, shown by a single blue box in Fig. 8, and its standard deviation, shown in Fig. 9, is in the region of 6.7 eV nm−2.
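The grouping that produces the 〈k〉 and std(k) matrices can be sketched as follows. We assume that each context-dependent label consists of the two-letter base-pair type followed by its context index (as in GTav). We use the population standard deviation, since the text does not state which estimator was used. A helper implements our reading of the criterion used to highlight entries in ESI Table S3 (k exceeding the group average by twice the standard deviation). All labels and values in the test are hypothetical.

```python
from collections import defaultdict

import numpy as np


def group_stacking(entries):
    """Group context-dependent stacking constants by base-pair type.

    entries: iterable of (label1, label2, k), where each label is a two-letter
    base-pair type plus a context index (e.g. 'GTav'), and k is in eV nm^-2.
    Returns {(bp1, bp2): (mean k, population std of k, count)}.
    """
    groups = defaultdict(list)
    for lab1, lab2, k in entries:
        groups[(lab1[:2], lab2[:2])].append(k)
    return {key: (float(np.mean(v)), float(np.std(v)), len(v)) for key, v in groups.items()}


def flag_large(entries, n_std=2.0):
    """Entries whose k exceeds their group average by more than n_std standard deviations."""
    entries = list(entries)
    stats = group_stacking(entries)
    flagged = []
    for lab1, lab2, k in entries:
        mean, std, _ = stats[(lab1[:2], lab2[:2])]
        if k > mean + n_std * std:
            flagged.append((lab1, lab2, k))
    return flagged
```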
C/G G} and {A C/T G} were found to have a single hydrogen bond,61,62 which compares with Morse potentials of 4.66 meV and 7.73 meV, respectively. Other contexts do not indicate substantial hydrogen bonding; for instance, the context {A a/T g}, with 4.70 meV, was found not to form hydrogen bonds.63 MD of poly-dA duplexes has suggested several possible configurations for the AA mismatch with either Watson–Crick or Hoogsteen hydrogen bonds.40 In our case the AA mismatch flanked by other AAs, {a a/a a}, resulted in a small potential of 5.62 meV. The smallest Morse potential found for AA was 0.989 meV, in the context {A g/T g}, which clearly suggests the absence of any important hydrogen bond or other localized inter-strand interactions. At the other extreme, there are several situations where two hydrogen bonds would seem plausible, for example {c c/c t}, an AA mismatch flanked by CC and CT, with 44.9 meV, see also Fig. 6a. We are not aware of any experimental results for an AA mismatch in this particular context.
C/T G} we obtained 2.29 meV, which is supported by reports of negligible bonding.64,65 DFT calculations also suggest weak hydrogen bonding due to steric repulsion.58 On the other hand, there are some reports of single24 and double hydrogen bonds66 for {A C/T G}, in contrast to the other experimental results. Other contexts, such as {A A/T T} (2.26 meV) and {C C/G G} (2.62 meV), were also reported as negligibly bonded.65 In general, the Morse potentials for CC shown in Fig. 6b indicate a complete absence of any sizeable hydrogen bond strength, except for moderate potentials of 19.2 meV for {a g/g g} and 12.3 meV for {g g/g t}, in both cases flanked by a GG mismatch. Also note the common gcg motif in these cases.
C/G G}67,68 corresponding to 18.8 meV and {A A/T T}69 with 19.4 meV. For {A C/T G} the experimental findings are mixed, with weak,70 single,71 double72 and bifurcated hydrogen bonds73,74 reported. In our case we found 27.4 meV for {A C/T G}, a strong Morse potential comparable to that of an AT base pair, which would be consistent with double hydrogen bonding.
C/G G}, are known from NMR measurements not to be hydrogen bonded32 or to contain only a single hydrogen bond.62 This is consistent with the low Morse potential of 6.80 meV. In {A C/T G} a stacked mismatch with weaker bonding was measured,61,64 for which we obtained 6.47 meV. The stacking interactions found were
with 2.34 eV nm−2 and
with 2.64 eV nm−2, which are in the same range as for canonical DNA. On the other hand, for {A C/T G}, which has been reported not to show wobble conformations due to increased CH/π interactions (W2 sequence of ref. 75), we obtained 6.47 meV and moderately larger stacking interactions of 3.44 eV nm−2 and 3.93 eV nm−2, for
and
, respectively. Specific TT hydrogen pairing modes were observed at low temperature for {C C/G G}, {A C/T G} and {A A/T T};76 however, all these contexts resulted in very similar Morse potentials, 6.80, 6.47 and 6.60 meV, respectively.
C/T G},78,81,83 2.30 meV; {C C/G G},67,77,79,80 2.45 meV; {c g/t t},27 2.22 meV; and {A A/T T},82 2.82 meV. Moderate Morse potentials were observed for AC only when it is flanked by other mismatches, typically in vtv or ttv patterns, the largest being 16.9 meV for {c c/c t}, see Fig. 7a.
a/T c}, which does not correlate with quantum mechanical calculations that reported several interstrand interactions.86 The {A C/T G} context measured by NMR66 was reported to have two hydrogen bonds, though we determined a Morse potential of only 1.15 meV, with stacking interactions of 1.24 eV nm−2 and 1.45 eV nm−2 for
and
. The only context with a larger potential is {a g/a g}, with 34.4 meV, see Fig. 7b. Stacking interactions are usually not very large, with the notable exception of
with the extreme value of 71.8 eV nm−2. CG-flanked stacking, however, can be very small, as for instance 0.245 eV nm−2 for
. The measurements by Tibanyenda et al.27 involve a {c g/t t} mismatch, for which we obtained 2.22 meV, but the authors were not conclusive about its hydrogen bonding.
C/T G}, for which we obtained 18.8 meV, have been previously studied by X-ray diffraction87–89 and were reported to be in an A(syn).G(anti) or A(anti).G(syn)90 conformation with two hydrogen bonds. In some circumstances a looped-out structure has been shown by NMR.91 In another context, {C C/G G}, with a Morse potential of 18.2 meV, a double hydrogen bonded A(anti).G(anti) conformation was observed at neutral pH.92 In both cases, the calculated Morse potentials are consistent with a double hydrogen bond. Tandem GA–AG stacking is a very special case, which we will discuss later.
C/T G}, 17.5 meV, was found in a context in which it is flanked by canonical base pairs (TtT). Since most studies on mismatches have been performed for this type of context, there is a substantial body of research related to the {A C/T G} mismatch, especially concerning its interaction with mismatch repair enzymes such as MutS.6,93 X-ray diffraction94 and NMR83,95–98 established the existence of two hydrogen bonds, which seems consistent with the Morse potential of 17.5 meV found for this GT context. The {A C/T G} context also has some large stacking interactions, depending on its neighbours. For instance, for the step
we found 6.72 eV nm−2. This step occurs in the sequences used by Isaacs and Spielmann,97 and the average displacement profile is shown in Fig. 10, together with the corresponding canonical profile. Fig. 10 bears some qualitative similarities to the helical parameters calculated from the molecular dynamics trajectories of Isaacs and Spielmann,97 see Fig. 5 of ref. 97. They also noted that in sequences containing GT, all base pairs display lower kinetic stability, resulting in larger displacements than their canonical counterparts, which we also observe in Fig. 10. The large stacking of step
correlates with a larger stacking overlap seen in the structure of this sequence, shown in Fig. S9.† Apart from {A C/T G}, all remaining contexts show Morse potentials in the range of 5–10 meV, though stacking interactions can be as high as 37.6 eV nm−2 for
. An important source of GT mismatches in genomic DNA is the deamination of 5-methylcytosine; the resulting mismatches are repaired by thymine DNA glycosylase (TDG) through base excision.99 The repair efficiency of TDG depends on the base pair flanking GT on the 5′ side, and is much lower when this neighbour is an AT base pair.16–18 This is the {A C/T G} context, which has the highest Morse potential depth, suggesting that stronger hydrogen bonding could play a role in slowing TDG repair activity.
Fig. 10 Average displacements for sequences with GTal mismatches {A C/T G} (red curves) and corresponding canonical base pairs (dark grey curves). The calculation was carried out at 180 K, which has no relation to the melting temperatures. Sequences from ref. 97. | ||
c/T c} and {A g/T g}, which have Morse potentials of 4.21 meV and 0.800 meV, respectively. For this sequence, the stacking interactions are also very small, 0.293 eV nm−2 for steps
. The molecular dynamics for this particular case suggested100 that the mismatched base pairs rearranged to gain stability through hydrogen bonding and increased stacking. In particular it was observed that GG base pairs interact with one of the C bases of the adjacent CC mismatch, in a hydrogen bonding triad.100 The extremely low Morse and stacking potentials involving the CC mismatch suggest that there is no interaction between the two C bases in this situation.
. The Morse potentials in this case are of moderate intensity, 16.3 meV for both context groups, see also Fig. 7. A stacking stability of this magnitude was not observed in our previous work. Until now, the largest stacking potential found with the mesoscopic model was 12.5 eV nm−2 for inosine–guanosine stacked onto CG, which was correlated to its inosine(syn)–guanosine(anti) configuration.48 A very large stacking potential suggests a correlation with the sheared G(anti)–A(anti) configuration in which they are often observed.108,109 For the
context, for which the sheared stacking has been observed by NMR,102 we obtained 17.3 eV nm−2. We determined a value of 25.4 eV nm−2 for
for which sheared stacking has been observed by X-ray diffraction.110 For
the value was 18.3 eV nm−2.111 GA–AA mismatches with sheared stacking have been observed in quadruple mismatches,112 involving the context
, with moderately increased stacking of 6.84 eV nm−2. AA–AA mismatches were also reported in this work;112 however, these were in a context that is not covered by the CD-type parameters. For the CI parameters we obtained a stacking parameter of 9.23 eV nm−2.
While the stacking potential of the GA–AG double mismatch is very large, it is not the largest. The largest stacking was 71.8 eV nm−2, for
, which suggests that a similar sheared stacking might be taking place. A considerable number of double mismatches resulted in stacking potentials of the same magnitude as that observed for GA–AG. In ESI Table S3† we highlight all stacking parameters that exceed their average values by twice the standard deviation, which may be useful as candidate sequences for further experimental studies. However, we are not aware of any experimental results for this or other mismatch configurations with large stacking parameters. Note that there is also a substantial number of very low stacking potentials, for instance AA–CC, CC–TT and CA–GC.
C/T G}. For six types of mismatch, the highest potentials correspond to triplet mismatches: triplet transversions (vvv) for AA, CC, GG, CT and AG, and a mixed triplet of type tvv for TT. For AC, a transition mismatch, the three highest potentials are flanked by transversions (vtv). To our knowledge, this unexpected pattern of triple transversions, with such large Morse potentials and hence likely strong hydrogen bonding, has not previously been reported. We are not aware of any experimental studies of these triplet mismatches against which we could cross-check our findings.
There have been a few experimental studies on binding affinities for some types of triple mismatches, where the affinity depends on the mismatch stability. Using the average displacement for specific sequences, it is possible to attempt to correlate these with the reported binding affinities. A study of the Rad4 nucleotide excision repair complex, which recognizes diverse DNA lesions,11,113 is one of very few comparing the binding efficiency of different triplet mismatches, ccc/ccc, tat/tat and ttt/ttt. Fig. 11 shows the average displacement profiles for these three mismatches, with higher Rad4 binding specificity correlating with larger opening profiles. Triplet mismatches have also been studied with respect to MutS recognition, and it was found that certain types of mismatches are better recognized than others.14 In Fig. 12 we show a few examples containing triplet mismatches, using the sequences from ref. 14. The best recognition was for AC triplets, with very low Morse potentials of 1.43 meV, while the poorest recognition was for AG triplet mismatches, with moderate Morse potentials of around 15.9 meV. Therefore, similarly to the Rad4 binding affinities, better MutS recognition appears to be correlated with larger hydrogen bond displacements and lower Morse potentials.
Fig. 11 Average displacements of the sequences with central triple mismatches ccc/ccc (red bullets), ttt/ttt (blue squares) and tat/tat (green boxes), and their reported Rad4 binding specificities.11 For comparison, a sequence without a mismatch is also shown (grey bullets). The calculation was carried out at 150 K, which has no relation to the melting temperatures. Sequences, based on ref. 11, are TGACTCGACATCCMMMGCTACAA/ACTGAGCTGTAGGCMMMGATGTT; only the central part around the mismatched region MMM/MMM is shown.
Fig. 12 Average displacements of the sequences with triple mismatches, aaa/ccc (red bullets), aaa/aaa (blue squares), aaa/ggg (green boxes) and their reported MutS recognition.14 For comparison, a sequence without a mismatch is also shown (grey bullets). The calculation was carried out at 150 K, which has no relation to the melting temperatures. Sequences used are K (red bullets), L (blue squares), M (green boxes) and WT (grey bullets), from Table 1 of ref. 14.
Trinucleotide repeat expansions are an important source of inherited neurological diseases. They are mostly formed by repeats of the type (CNG)n, with N = A, C, G or T, which may form DNA hairpins with central NN mismatches.114,115 Interestingly, repeats of (GNC)5 are less frequent or completely absent in genomic DNA.115,116 In Fig. 13a we show the average displacement profiles for (CNG)5, which are generally very stable, with small strand displacements at the mismatch positions. The stability trends of (CNG)n are the same as those observed in other melting experiments.117–119 However, for (GNC)5 repeats with N = A, C or T (Fig. 13b), the displacements are very large. The exception is (GGC)5 (N = G), which shows very low displacements and is known to form stable hairpins.120 Another important type of trinucleotide repeat is (GAA)n,114 shown in Fig. 13c for n = 5. It was calculated with CI-type parameters, as our CD parameters do not cover the mismatch context for this particular sequence. In this case, the central part of the sequence is stabilised by the large stacking interaction of the AA–AG step, 8.37 eV nm−2 (CI); it is therefore possible that this repeat forms stable hairpins.117 Our stability results for the trinucleotide repeats appear to be consistent with their presence in genomic DNA when the repeat has low displacements. They are likewise consistent with their absence when the repeat contains highly unstable mismatches with large displacements.
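It helps to spell out which bases face each other when a repeat pairs with itself. The sketch below uses an idealised, perfectly in-register fold that ignores the loop. The same pairing pattern arises when two copies of the repeat anneal antiparallel. In both (CNG)n and (GNC)n the mismatch is the same N·N, but its flanking pairs differ: C·G on the 5′ side and G·C on the 3′ side for (CNG)n, and the reverse for (GNC)n. Context-dependent parameters can therefore assign them different stabilities. The function names are hypothetical. This is bookkeeping of opposing bases, not a structure prediction.

```python
WATSON_CRICK = frozenset({("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")})


def fold_repeat(unit: str, n: int) -> list[tuple[str, str]]:
    """Pair the 5' half of (unit)n with its 3' half read backwards.

    Idealised, in-register fold; the loop is ignored and, for odd lengths,
    the central base is left unpaired. Each tuple is (top, bottom) with the
    bottom strand read 3'->5' under the top strand.
    """
    seq = unit.upper() * n
    half = len(seq) // 2
    return list(zip(seq[:half], seq[::-1][:half]))


def mismatch_contexts(unit: str, n: int) -> list[tuple[str, str]]:
    """Return each interior stem mismatch with its two flanking pairs as top/bottom 3-mers."""
    pairs = fold_repeat(unit, n)
    contexts = []
    for i in range(1, len(pairs) - 1):
        if pairs[i] not in WATSON_CRICK:
            window = pairs[i - 1 : i + 2]
            top = "".join(p[0] for p in window)
            bottom = "".join(p[1] for p in window)
            contexts.append((top, bottom))
    return contexts


print(mismatch_contexts("CAG", 5))  # [('CAG', 'GAC'), ('CAG', 'GAC')]
print(mismatch_contexts("GAC", 5))  # [('GAC', 'CAG'), ('GAC', 'CAG')]
```

In this idealised register, `fold_repeat("GAA", 5)` pairs no Watson–Crick bases at all (G·A, A·A and A·G alternate). It is consistent with the AA–AG step mentioned above and with the need for context-independent parameters for this repeat.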
The majority of the measured melting temperatures confirmed the destabilizing nature of mismatches. However, a substantial 15% of single mismatches had higher melting temperatures than the least stable canonical AT base pair, a finding with important implications for applications such as PCR primer and probe design.121 The mesoscopic calculation, used to interpret the measured melting temperatures, revealed a number of unexpected results. One of these is the common occurrence of stable mismatches when they are flanked on both sides by other mismatches. In several tandem configurations we found very large stacking interaction potentials. This happens for the well-known GA–AG tandem mismatch, which presents a sheared stacking configuration. From our results, we speculate that other tandem mismatches may also possess this unusual stacking configuration. In most cases, the Morse potentials, which represent the hydrogen bonds in the model, correlate with known hydrogen bond configurations from NMR or X-ray diffraction studies. However, for CT mismatches we obtained very low Morse potentials, which are at odds with the double hydrogen bond configurations reported elsewhere. The reasons for the discrepancy with this particular mismatch are unclear. For triple mismatches we found good correlations with Rad4 binding affinities and MutS recognition, which suggests that our results could be used for a more extensive analysis of this kind. Another potential application is understanding the stability of trinucleotide repeats, such as (CNG)n and (GNC)n, which appears to correlate with their frequency in genomic DNA.
| Abbreviation | Meaning |
|---|---|
| PB | Peyrard–Bishop |
| NMR | Nuclear magnetic resonance |
| NN | Nearest-neighbour |
| BP | Base pair |
| MD | Molecular dynamics |
| SNP | Single-nucleotide polymorphism |
Footnotes
† Electronic supplementary information (ESI) available: All sequences and melting temperatures used in this work are given in Table S1. Tables S4 and S5 show Morse and stacking potentials, respectively, for canonical base pairs and context independent mismatches. Context groups are given in Table S2. Context dependent Morse and stacking potentials are shown in Tables S6 and S3, respectively. Fig. S1–S8 show the context dependent Morse potentials for mismatches. Fig. S9 shows some stacking steps of the sequence from ref. 96. See DOI: 10.1039/d0sc01700k
‡ Present address: RNA Biology of Fungal Pathogens Unit, Department of Mycology, Pasteur Institute, France.
§ Present address: School of Human Sciences, University of Derby, Kedleston Road, Derby, DE22 1GB, UK.
This journal is © The Royal Society of Chemistry 2020