Jacob B. Swadling(a), Kunihiko Ishii(b), Tahei Tahara(b) and Akio Kitao*(a)
(a) School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 Ookayama, M6-13, Meguro, Tokyo 152-8550, Japan. E-mail: akitao@bio.titech.ac.jp; Fax: +81-3-5734-3372; Tel: +81-3-5734-3373
(b) RIKEN, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
First published on 3rd January 2018
Deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) have remarkably similar chemical structures, yet they play significantly different roles in modern biology. In this article, we explore the possible conformations of DNA and RNA hairpins to better understand the fundamental differences in their structure formation and stability. We use large parallel temperature replica exchange molecular dynamics ensembles to sample the full conformational landscape of these hairpin molecules and to identify the stable structures formed by the hairpin sequence. Our simulations show that at room temperature RNA adopts a narrower distribution of folded structures than DNA, which forms both hairpins and many unfolded conformations. RNA forms twice as many hydrogen bonds as DNA, which results in a higher melting temperature. We see that local chemical differences lead to emergent molecular properties, such as an increased persistence length in RNA that is only weakly temperature dependent. These findings provide fundamental insight into how RNA forms complex folded tertiary structures that confer enzyme-like function in ribozymes, whereas DNA retains structural motifs that facilitate functions such as translation of sequence.
Nucleic acids are ubiquitous in modern biological phenomena. DNA stores information in cells, which enables each cellular constituent to be synthesised, assembled and regulated. RNA is involved in the building of ribosomes, in protein synthesis, and in regulatory mechanisms. Viruses are primitive entities containing DNA or RNA, while viroids are merely RNA fragments.
The structural model of DNA proposed by Watson and Crick3 was a decisive event that triggered a dramatic development in molecular biology. Since then, and particularly over the last 20 years, the significance of nucleic acids in the living world has become even more evident. The ability to determine the sequence of several hundred nucleotides, either by purely chemical methods4 or by chemical and enzymatic methods,5 has opened the way to understanding the genome. The sequencing of the plasmid pBR322 was the first major success of these methods.6
The B-form double helix, the structure most commonly associated with DNA, accounts for most of its behaviour. Nevertheless, DNA does not always adopt this canonical structure; it can also form alternatives such as Z-DNA, triple-helix DNA, quadruplex DNA and slipped-strand DNA.7 One type of non-canonical DNA that has been largely overlooked, despite its active role in cell biology, is single-stranded hairpin DNA.
There are at least three different settings in which proteins with specific DNA hairpin-binding activities occur. (i) Prokaryotes and their viruses: single-stranded phages use DNA hairpins in nearly all steps of their life cycle, for example at the origin of replication in E. coli.8 (ii) Cruciform DNA, a type of DNA that contains hairpin motifs, has been demonstrated at the double-strand origin (dso) of rolling-circle replication (RCR) and at N4 phage promoters. In eukaryotes, cruciform-binding proteins have recently been identified and have been suggested to play a major role in genome translocation9 and replication initiation.10 (iii) Finally, the evolution of functions involving single-stranded DNA is implicated in horizontal gene transfer, the stress response and genome plasticity.11
RNA hairpins originate by two mechanisms: (i) transcription of an inverted-repeat DNA by DNA-dependent RNA polymerase, after which the RNA folds into a hairpin loop structure; and (ii) folding of an RNA molecule back on itself to act as a template for RNA-dependent RNA polymerase, which synthesises the second strand of the stem. The second mechanism, which produces perfect long double-stranded RNA hairpins, is not widespread in nature and is most likely restricted to a ‘copy-back’ mechanism of replication in certain viruses.12,13
Functionally, RNA hairpins can regulate gene expression in cis or in trans, i.e., an RNA hairpin within an RNA molecule can regulate just that molecule (cis) or it can affect other RNAs or pathways (trans). Hairpins serve as binding sites for a variety of proteins and as substrates for enzymatic reactions, and can also display intrinsic enzymatic activity.14
Alongside the vast possibilities offered in molecular biology, in biotechnology and soon in genetic therapy, we should not forget the role of genetic engineering in the physicochemical study of nucleic acids. The discovery that certain RNAs act as enzymes, which extends the concept of enzymes to non-protein structures (ribozymes), has thrown new light on the origins of life and has given new impetus to methods for modelling the tertiary structures of RNA.15
Structurally, DNA and RNA are very similar, but they have distinct and divergent roles in common biological processes. The most naturally prevalent nucleic acids contain either a ribose sugar (RNA) or a deoxyribose sugar (DNA). The primary structure of a nucleic acid is conventionally written as a sequence of bases from left to right (5′ to 3′), such that each phosphodiester bond links the 3′ carbon of the sugar on the left to the 5′ carbon of the sugar on the right.1
Since only one of the DNA strands is transcribed into RNA, the latter no longer exhibits the regular complementarity of the bases on each strand that allows very long double helix structures to be formed. However, the phosphodiester chain of RNA can fold on itself and create double helix regions separated by single stranded loops of varying sizes. The difference between the geometry and the structure of DNA and RNA is accentuated still further by the replacement of deoxyribose with ribose. All the regions of RNA with a double-helix structure take on the A-form and the ribose has the C3′-endo conformation. Moreover, the 2′-OH group can form a hydrogen bond with the O4′ atom of the neighbouring 3′ ribose, which stabilises and stiffens the structure.1
The function and dynamics of nucleic acids are intimately tied to the conformational states at a given temperature. Accurately characterising the complete conformational space of biomolecules is a problem of fundamental importance in physical chemistry and computational biology.
Understanding the structural and dynamical differences between DNA and RNA not only gives us information on their function in modern biology, but also provides clues about the nature of the first biomolecules at the origins of life.16,17
Several previous studies have used temperature replica exchange methods to study the folding of short RNA hairpins18,19 and DNA hairpins.20 These studies have largely focused on short sequences (around 8 nucleotides in length) and, to our knowledge, have not directly compared analogous DNA and RNA hairpin sequences.
In this study, we make the first direct comparison between single-stranded DNA and RNA hairpin loops with analogous 29-nucleotide sequences, using the enhanced sampling provided by replica exchange simulations. To our knowledge, it is the largest computational study of DNA and RNA hairpin loops to date. Our aim is to understand how the minor local structural difference between DNA and RNA can lead to major global differences in structure and dynamics, and how these local chemical distinctions give rise to emergent properties that underlie the divergent roles of DNA and RNA in modern biology.
Fig. 1 The ideal structure of the (A) RNA and (B) DNA hairpin loop secondary structural motifs, as predicted by the mfold web server.24
The starting structures of single-stranded DNA and RNA (Fig. 1) were built with the Nucleic Acid Builder (NAB),23 a program that is part of AmberTools. NAB produces a single strand in the B-form geometry, which we refer to as elongated or unfolded. Elongated/unfolded structures (each of 29 nucleotides) with the sequences 5′-UUUAACC(U)₁₈GGUU-3′ and 5′-TTTAACC(T)₁₈GGTT-3′ were constructed for RNA and DNA, respectively, where U, T, A, G and C denote the nucleobases uracil, thymine, adenine, guanine and cytosine.
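As a quick consistency check on the two sequences and on the stem numbering, the sketch below rebuilds both strands and pairs positions 4–7 with positions 29–26. The helper name `stem_pairs` and the 1-based, 5′-to-3′ numbering are assumptions for illustration; the numbering is chosen so that it matches the C7–G26 pair discussed alongside eqn (10).

```python
# Hypothetical helper: rebuild the 29-nt hairpin sequences and check that the
# predicted four-base-pair stem (positions 4-7 with 29-26) is complementary.

RNA_SEQ = "UUUAACC" + "U" * 18 + "GGUU"   # 5'-UUUAACC(U)18GGUU-3'
DNA_SEQ = RNA_SEQ.replace("U", "T")        # 5'-TTTAACC(T)18GGTT-3'

WATSON_CRICK = {("A", "U"), ("U", "A"), ("A", "T"), ("T", "A"),
                ("G", "C"), ("C", "G")}


def stem_pairs(seq, first=4, last=29, n_pairs=4):
    """Return 1-based (i, j, base_i, base_j) tuples for a stem that closes
    position `first` with position `last` and proceeds inward."""
    return [(first + k, last - k, seq[first + k - 1], seq[last - k - 1])
            for k in range(n_pairs)]


for name, seq in (("RNA", RNA_SEQ), ("DNA", DNA_SEQ)):
    pairs = stem_pairs(seq)
    ok = all((bi, bj) in WATSON_CRICK for _, _, bi, bj in pairs)
    print(name, len(seq), pairs, "complementary" if ok else "mismatch")
```

The 5′ tail (positions 1–3) and the 18-base loop (positions 8–25) are simply the residues left over once the stem is assigned.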
The nucleic acid structures were solvated with ∼10000 SPC/E water molecules25 in a truncated octahedral box, and Na+/Cl− ions were added to give an ionic concentration of 0.4 M. The total number of atoms per replica is 32398 for DNA and 32522 for RNA.
Simulations were performed with GROMACS26,27 and the ff14SB force field.28 All bonds were constrained with the LINCS algorithm,29 and Newton's equations of motion were integrated with the leap-frog algorithm using a 2 fs time step. Long-range electrostatic interactions were treated with the smooth particle-mesh Ewald (SPME) method, using a real-space cut-off of 10 Å. Temperature was controlled by velocity rescaling with a stochastic term,30 which ensures that a proper canonical ensemble is generated, and pressure by the Parrinello–Rahman method.31
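For readers reproducing the set-up, the settings above map onto GROMACS `.mdp` options roughly as follows. This is a sketch, not the authors' input file: the option names are standard GROMACS parameters, but every value marked ASSUMED (van der Waals cut-off, coupling group, coupling times, reference pressure, compressibility) is not given in the text, and the helper functions are hypothetical.

```python
# Hypothetical sketch: write one GROMACS .mdp file per REMD replica using the
# settings described in the text. Values marked ASSUMED are not in the paper.
from pathlib import Path

BASE_SETTINGS = {
    "integrator": "md",                # leap-frog integrator
    "dt": "0.002",                     # 2 fs time step, in ps
    "constraints": "all-bonds",        # all bonds constrained
    "constraint-algorithm": "lincs",
    "coulombtype": "PME",              # smooth particle-mesh Ewald
    "rcoulomb": "1.0",                 # 10 A real-space cut-off, in nm
    "rvdw": "1.0",                     # ASSUMED: same cut-off for van der Waals
    "tcoupl": "V-rescale",             # velocity rescaling with stochastic term
    "tc-grps": "System",               # ASSUMED: a single coupling group
    "tau-t": "0.1",                    # ASSUMED coupling time, in ps
    "pcoupl": "Parrinello-Rahman",
    "tau-p": "2.0",                    # ASSUMED, in ps
    "ref-p": "1.0",                    # ASSUMED 1 bar
    "compressibility": "4.5e-5",       # ASSUMED water value, in bar^-1
}


def mdp_text(temperature_k):
    """Return .mdp contents for one replica at the given temperature (K)."""
    settings = {**BASE_SETTINGS, "ref-t": f"{temperature_k:.2f}"}
    return "".join(f"{key:<22}= {value}\n" for key, value in settings.items())


def write_replica_mdps(temperatures, outdir="remd"):
    """Write outdir/replica_XXX/md.mdp, one directory per temperature."""
    for i, t in enumerate(temperatures):
        path = Path(outdir) / f"replica_{i:03d}" / "md.mdp"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(mdp_text(t))
```

Each replica directory then needs its own run input from `gmx grompp`, and the ensemble can be run with `gmx mdrun -multidir ... -replex N`, where N is the number of steps between exchange attempts (a value not stated in the text).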
Short 20 ns simulations of each nucleic acid were run at 500 K in explicit water to obtain a compact, globular form, which was then used to build the Replica Exchange Molecular Dynamics (REMD) starting structure. Our previous work shows that RNA rapidly forms a compact, globular conformation within a few nanoseconds.17 The conformations with the lowest radius of gyration were selected as REMD starting structures (see Fig. S1 in the ESI†). Each replica was first equilibrated with an independent 100 ns molecular dynamics simulation at its own temperature, and REMD was then run for 1 μs per replica (a cumulative total of 108 μs of simulation time).
REMD is a widely adopted method for studying protein and RNA folding.32 It consists of M non-interacting copies (replicas) of the original system, each in the canonical ensemble at one of M different temperatures Tm (m = 0, …, M − 1), arranged so that there is always exactly one replica at each temperature. The trajectory of each replica is computed independently using MD. Periodically, adjacent replicas (replicas i and i + 1) attempt to exchange temperatures, and each attempt is accepted or rejected according to the Metropolis criterion, which preserves the Boltzmann distribution at every temperature. The idea of the method is to make configurations reached at higher temperatures available to the simulations at lower temperatures, and vice versa. The resulting ensemble samples both low- and high-energy configurations, because a configuration trapped in a local energy minimum at low temperature can escape after being heated. REMD therefore samples more efficiently than a single fixed-temperature MD trajectory, which is much more easily trapped in local energy minima.
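The exchange step can be written out explicitly. For two replicas i and j at temperatures T_i and T_j with potential energies E_i and E_j, the Metropolis criterion accepts a swap with probability min{1, exp[(β_i − β_j)(E_i − E_j)]}, where β = 1/(k_B T). The sketch below implements this for the canonical ensemble only: it omits the pressure–volume term that applies to constant-pressure replicas, and the function names and the alternating even/odd neighbour scheme are illustrative assumptions.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance probability for swapping the configurations of
    replicas i and j (potential energies in kJ/mol, temperatures in K)."""
    beta_i, beta_j = 1.0 / (K_B * t_i), 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_neighbour_swaps(energies, temperatures, offset, rng=random):
    """Try swaps between neighbours (k, k+1) for k = offset, offset+2, ...

    energies[k] is the potential energy of the configuration currently held
    at temperatures[k]. Returns, for each temperature slot, the index of the
    configuration it holds after the attempts.
    """
    order = list(range(len(temperatures)))
    for k in range(offset, len(temperatures) - 1, 2):
        p = exchange_probability(energies[order[k]], energies[order[k + 1]],
                                 temperatures[k], temperatures[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = order[k + 1], order[k]
    return order
```

Alternating `offset` between 0 and 1 on successive attempts lets every neighbouring pair be tried, so a configuration can random-walk across the whole temperature ladder. A swap that moves the lower-energy configuration to the lower temperature is always accepted.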
To obtain the desired temperature range with a uniform exchange probability between adjacent replicas, we used the “Temperature generator for REMD-simulations” web server.33 We used 108 temperature replicas spanning 273.00–502.46 K (see Table S1 in the ESI† for the full list of temperatures); the number of replicas was also chosen with the architecture of the available computing resources in mind. The average exchange probability was 30% in all simulations. Fig. 2 shows that a single replica traverses the whole temperature range, so high-energy conformations are accessible to the low-temperature replicas, indicating well-mixed, converged sampling of conformational space. Fig. S2 in the ESI† shows the average number of hydrogen bonds formed, bootstrapped over simulation time frames; the convergence of these values provides further evidence, together with Fig. 2, that 1 μs of simulation per replica is sufficient.
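A common first approximation to a temperature ladder, useful for checking how many replicas a range needs, is geometric spacing, in which every adjacent pair has the same temperature ratio. This is a swapped-in technique, not the web server's algorithm: the server uses its own system-dependent estimate, so the sketch below will not reproduce the temperatures in Table S1 exactly, and the function name is hypothetical.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures (K) with a constant ratio t[k + 1] / t[k]."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


# Same end points and replica count as used in this study.
temps = geometric_ladder(273.00, 502.46, 108)
print(f"{temps[0]:.2f} {temps[1]:.2f} {temps[-1]:.2f}")
```

Geometric spacing gives roughly uniform acceptance when the heat capacity is nearly constant over the range; near a melting transition, where energy fluctuations grow, tools such as the web server place replicas more densely.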
End-to-end distance is a useful quantity for characterising the formation of hairpin loop structures, and it can be measured by FRET spectroscopy. FRET is a well-established photophysical phenomenon in which energy is transferred from a donor fluorophore to an acceptor molecule (chromophore or fluorophore) over distances typically between 1 and 10 nm. Because the energy transfer efficiency depends on the distance between the donor and the acceptor (the FRET pair), FRET is popularly known as a “molecular ruler”. Owing to its non-invasive nature, it is often applied to follow changes during molecular interactions as a function of time. Its advantages include high sensitivity, a short observation timescale (nanoseconds) and a working distance range that matches the scale over which most biomolecular processes occur, making it an ideal experimental technique for understanding nucleic acid folding.36 The experimental FRET intensity can be compared directly with a simulated “virtual intensity”, which we calculate from the molecule's end-to-end distance. Below, we derive the virtual intensity from end-to-end distances and describe how it relates to the experimentally measured FRET intensity.
The FRET efficiency varies with the inverse sixth power of the distance R between the donor and acceptor attached to the molecule(s), and can be determined from the following equation:36
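$$E = \frac{1}{1 + (R/R_0)^6} \tag{1}$$
where R₀ is the Förster distance, the donor–acceptor separation at which the transfer efficiency is 50%.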
(2) |
By combining the above equations we can calculate the virtual intensity Ivirtual from simulation by measuring the end-to-end distance using the following equation:
(3) |
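Eqns (2) and (3) are not reproduced in this version of the text, so the sketch below covers only the distance dependence: it evaluates the standard Förster relation E = 1/[1 + (R/R₀)⁶] for every simulated end-to-end distance and averages over the frames at each temperature. The Förster radius R₀ = 5 nm is a hypothetical placeholder, the function names are illustrative, and converting E into the paper's I_virtual would still require eqn (3).

```python
import numpy as np


def fret_efficiency(r_nm, r0_nm):
    """Forster transfer efficiency for donor-acceptor distance(s) r (nm)."""
    r = np.asarray(r_nm, dtype=float)
    return 1.0 / (1.0 + (r / r0_nm) ** 6)


def mean_efficiency_per_temperature(distances_by_t, r0_nm=5.0):
    """Average E over all frames at each temperature.

    distances_by_t: {temperature_K: 1-D array of end-to-end distances (nm)}.
    r0_nm = 5.0 is a HYPOTHETICAL Forster radius, not a value from the paper.
    """
    return {t: float(fret_efficiency(d, r0_nm).mean())
            for t, d in sorted(distances_by_t.items())}
```

Averaging E over frames, rather than evaluating E at the mean distance, matters because E is strongly non-linear in R: a broad high-temperature distance distribution gives a different mean efficiency from a narrow one with the same mean distance.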
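Consider a chain of N segments of length a, each making a small angle θ with the previous one (i.e. lying on a cone of vertical semi-angle θ around the previous segment). The mean value ⟨h⟩ of the projection of the end-to-end vector onto the first segment is given by
$$\langle h \rangle = a \sum_{k=0}^{N-1} x^{k} = a\,\frac{1 - x^{N}}{1 - x} \tag{4}$$
where x = cos θ. Because x < 1, ⟨h⟩ tends to a finite limit for an infinitely long chain (N → ∞); this limit is the persistence length L_p:
$$L_p = \frac{a}{1 - x} \tag{5}$$
For small angles, x = cos θ ≈ 1 − θ²/2, which gives
$$L_p = \frac{2a}{\theta^{2}} \tag{6}$$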
Note that the persistence length does not depend on the length L along the curve, but is an intrinsic property of the polymer in a given medium.
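Eqn (5) suggests a direct estimator from simulation: take consecutive segment vectors along the chain, average the cosine x between neighbouring segments, and compute L_p = a/(1 − x). The sketch below does this for a single conformation. Which atoms define the segments (for example one phosphorus atom per nucleotide) is an assumption, since the bead choice is not stated here, and the function name is hypothetical.

```python
import numpy as np


def persistence_length(coords):
    """Estimate L_p = a / (1 - x) from an (N + 1, 3) array of bead positions.

    a is the mean segment length and x the mean cosine between consecutive
    segment vectors (the bead choice is an ASSUMPTION left to the caller).
    """
    bonds = np.diff(np.asarray(coords, dtype=float), axis=0)
    lengths = np.linalg.norm(bonds, axis=1)
    unit = bonds / lengths[:, None]
    x = float(np.mean(np.sum(unit[:-1] * unit[1:], axis=1)))
    return float(lengths.mean()) / (1.0 - x)
```

Averaging x over all frames at one temperature before applying the formula gives one L_p per replica; applying it frame by frame instead yields a distribution of persistence lengths of the kind shown in Fig. 9B.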
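The radius of gyration R_G is defined as the square root of the mean square of the distance ρ between the atoms and the centre of gravity of the chain:
$$R_G = \langle \rho^{2} \rangle^{1/2} = \left( \frac{1}{n} \sum_{i=1}^{n} \rho_i^{2} \right)^{1/2} \tag{7}$$
where n is the number of atoms and ρ_i is the distance of atom i from the centre of gravity.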
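A direct implementation of this definition follows. With no masses supplied, every atom is weighted equally, as in the definition above; passing atomic masses gives the mass-weighted variant that many analysis tools report. The function name is illustrative.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """R_G = sqrt(<rho^2>), where rho is each atom's distance from the centre.

    masses=None weights all atoms equally; otherwise both the centre and the
    mean square distance are mass-weighted.
    """
    xyz = np.asarray(coords, dtype=float)
    w = np.ones(len(xyz)) if masses is None else np.asarray(masses, dtype=float)
    centre = np.average(xyz, axis=0, weights=w)
    rho2 = np.sum((xyz - centre) ** 2, axis=1)
    return float(np.sqrt(np.average(rho2, weights=w)))
```

Averaging this value over all frames at each replica temperature gives the kind of R_G melting curve discussed next.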
By calculating the average radius of gyration of the hairpins at each temperature, we constructed the melting curves of the nucleic acids shown in Fig. 3. The melting curves show how the effective size of the nucleic acids changes with temperature: higher temperatures lead to larger sizes of around 1.7 nm, and lower temperatures to smaller sizes of approximately 1.5 nm. Fig. 3 shows that RNA has a higher melting temperature than DNA.
The virtual intensity is a simulated counterpart of the fluorescence intensity, which is often used in experiments to measure folding rates and melting temperatures. It is calculated from the end-to-end distances at each replica temperature, as described in the Methods section. Fig. 4 shows the normalised virtual intensity at each temperature. The simulated melting curves for DNA and RNA display the two-state (sigmoidal) trend commonly seen in melting curves of biomolecules.21 The curves in Fig. 4 show a difference in melting temperature of 6.84 K between DNA and RNA: for the sequences simulated here, RNA has the higher melting temperature.
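The text does not say how the melting temperatures, or the 6.84 K difference between them, were extracted from Fig. 4. One simple, model-free option is the temperature at which the min–max-normalised curve crosses 0.5, interpolating linearly between neighbouring replicas; fitting a two-state sigmoid is a common alternative. The sketch below implements the midpoint approach and is not necessarily the method the authors used; the function name is hypothetical.

```python
import numpy as np


def midpoint_temperature(temps, signal):
    """Temperature at which a melting curve, min-max normalised to [0, 1],
    first crosses 0.5 (linear interpolation between adjacent replicas).
    Works for both rising and falling curves."""
    t = np.asarray(temps, dtype=float)
    y = np.asarray(signal, dtype=float)
    y = (y - y.min()) / (y.max() - y.min())
    above = y >= 0.5
    k = int(np.argmax(above[1:] != above[:-1]))  # index just before a crossing
    if above[k] == above[k + 1]:
        raise ValueError("curve never crosses the midpoint")
    return float(t[k] + (0.5 - y[k]) * (t[k + 1] - t[k]) / (y[k + 1] - y[k]))
```

Applying it to the DNA and RNA curves and subtracting gives a ΔT_m estimate. With 108 replicas over 273.00–502.46 K, the spacing is only about 2 K on average, so interpolation error is small compared with the statistical noise in the curves.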
The temperature dependence of the distribution of end-to-end distances for DNA and RNA is shown in Fig. 5. The distributions change with temperature in the same qualitative way as Maxwell–Boltzmann distributions of molecular kinetic energies: as the temperature increases, the average end-to-end distance increases (from 1.9 to 2.5 nm) and so does the standard deviation, as shown by the broader distributions at higher temperatures.
Fig. 5 Normalised density distributions of end-to-end distances for (A) DNA and (B) RNA at three temperatures: 300.00, 387.76 and 475.97 K.
Fig. 6 shows the average number of intramolecular hydrogen bonds at each temperature. Hydrogen bonds were identified using cut-offs on the hydrogen–donor–acceptor angle and the donor–acceptor distance, with OH and NH groups regarded as donors and O and N atoms as acceptors. On average, RNA forms more than twice as many hydrogen bonds as DNA: at 300.00 K, RNA has 26 and DNA has 12. The additional hydrogen bonds can be attributed to the 2′-OH group of the ribose in RNA, which can act as a hydrogen-bond donor; in DNA this group is replaced by an –H, which can neither donate nor accept a hydrogen bond. The increase can also be ascribed to the greater stability of adenine–uracil pairs compared with adenine–thymine pairs in DNA.38 The larger number of hydrogen bonds in RNA, together with the added stability of adenine–uracil base pairs, counters the electrostatic repulsion between phosphate backbone groups, allowing RNA to adopt a smaller, more compact conformation, as reflected in its smaller radius of gyration in Fig. 3.
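The geometric criterion described above can be sketched as a single test on three atoms. The text does not state the cut-off values used; the defaults below (donor–acceptor distance ≤ 0.35 nm, hydrogen–donor–acceptor angle ≤ 30°) are ASSUMED, taken from the defaults of the GROMACS `gmx hbond` tool, and the function name is hypothetical.

```python
import numpy as np


def is_hydrogen_bond(donor, hydrogen, acceptor, r_max=0.35, angle_max_deg=30.0):
    """Geometric H-bond test with coordinates in nm.

    Accepts if the donor-acceptor distance is <= r_max and the
    hydrogen-donor-acceptor angle (measured at the donor) is <= angle_max_deg.
    Default cut-offs are ASSUMED, not values reported in the paper.
    """
    d, h, a = (np.asarray(v, dtype=float) for v in (donor, hydrogen, acceptor))
    da = a - d
    dh = h - d
    r = np.linalg.norm(da)
    cos_angle = np.dot(dh, da) / (np.linalg.norm(dh) * r)
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(r <= r_max and angle <= angle_max_deg)
```

Counting the donor–hydrogen–acceptor triples that pass this test in each frame, and averaging over frames at each temperature, gives curves of the kind shown in Fig. 6.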
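$$\Delta G(R) = -k_B T \left[ \ln P(R) - \ln P_{\max} \right] \tag{8}$$
where P(R) is the probability of observing the reaction coordinate(s) R and P_max is the maximum of P(R), so that the most probable state lies at ΔG = 0.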
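Eqn (8) can be evaluated from a histogram of the sampled conformations at one temperature: P(R) is the fraction of frames falling in each bin of the reaction coordinates (here taken as the radius of gyration and the end-to-end distance, the two coordinates discussed for Fig. 7), and P_max is the largest bin probability. The default of 50 bins per axis and the function name are assumptions.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def free_energy_surface(rg, e2e, temperature, bins=50):
    """Delta G = -k_B T [ln P - ln P_max] on a 2-D (R_g, end-to-end) histogram.

    Returns (dG, rg_edges, e2e_edges); bins that were never visited get +inf.
    """
    counts, rg_edges, e2e_edges = np.histogram2d(rg, e2e, bins=bins)
    p = counts / counts.sum()
    with np.errstate(divide="ignore"):  # log(0) -> -inf for empty bins
        dg = -K_B * temperature * (np.log(p) - np.log(p.max()))
    return dg, rg_edges, e2e_edges
```

For scale, k_BT at 300 K is about 2.49 kJ mol⁻¹, so a bin at ΔG = 2 kJ mol⁻¹ is populated roughly 45% as often as the most probable bin.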
An initial inspection of these free energy landscapes shows that DNA and RNA occupy visibly different portions of conformational space. In terms of end-to-end distance and radius of gyration, DNA can adopt relatively small globular structures (Rg of 1.25 nm and end-to-end distance of 0.1 nm) and long elongated forms (Rg of 3.5 nm and end-to-end distance of 11 nm), as well as a range of lengths and sizes in between. RNA, on the other hand, occupies a comparatively small portion of conformational space, with Fig. 7 displaying a range of accessible structures indicative of a folded polymer with a short length and a small size. In terms of free energy, the most stable conformations lie between ΔG = 0 and 2 kJ mol−1 (the black regions of the free energy surfaces in Fig. 7). This low-energy region is visibly much smaller in Fig. 7A than in Fig. 7B, indicating that RNA can adopt many more low-energy (ΔG = 0–1 kJ mol−1) conformations than DNA. Conversely, DNA can adopt many more higher-energy conformations, between ΔG = 2 and 6 kJ mol−1 (the purple regions), than RNA; these correspond to longer end-to-end distances and larger radii of gyration. This trend continues at higher temperatures.
The free energy landscapes of both DNA and RNA resemble those traditionally associated with folded proteins. Proteins share similar properties with nucleic acids, and historically more effort has been devoted to understanding them. The landscapes resolved for proteins often show a single area of high density surrounded by less dense regions,39 which appears as a “funnel”-type landscape with a low-energy minimum. Closer inspection of the free energy landscapes reveals subtle differences in the density around the minimum. The low-energy, dense region (red to black in Fig. 7) forms a much smoother basin for DNA, whereas RNA has a comparatively rugged energy minimum. In effect, RNA has a propensity to form a variety of metastable conformations separated by low barriers, whereas DNA has a single low-energy conformation and therefore less diversity among its low-energy conformations.
RNA has the ability to form a variety of low energy, secondary structures through Watson–Crick hydrogen bonding, non-Watson–Crick hydrogen bonding, and π−π stacking between adjacent and non-adjacent bases.
In Fig. 8, we show the mean smallest distance between residue pairs. We also give the probability of binding between residue pairs within the stem region of the hairpin for DNA in eqn (9) and for RNA in eqn (10). Together, the mean distance matrix in Fig. 8 and the probabilities in eqn (9) and (10) indicate the types of hairpin loop structures formed by these two analogous nucleic acid sequences. For the most part, the probabilities in this stem region are higher for RNA than for DNA, with the GC pairs appearing more favourable. The off-centre elements of Fig. 8 are not symmetrical in these systems, which is unsurprising given the ideal structures predicted in Fig. 1: the stem does not pair the two terminal residues, because there is a tail of three residues at the 5′ end.
Fig. 9 (A) Average persistence lengths of DNA and RNA at each temperature and (B) distribution of persistence lengths at 273.00 K.
Table 1 Populations of the three structural clusters for DNA and RNA
Cluster | DNA population | RNA population |
---|---|---|
1 | 0.2942 | 0.6263 |
2 | 0.3265 | 0.2850 |
3 | 0.3793 | 0.0891 |
The contribution of each cluster to the distribution of radius of gyration is shown in Fig. 11, together with the total distribution. The integrated contributions correspond directly to the cluster populations given in Table 1.
The structure with the lowest root-mean-square deviation (RMSD) from the centre of each cluster is shown in Fig. 12. The three structures displayed for DNA and for RNA in Fig. 12 are representative structures of each cluster, which we refer to as the ‘best members’.
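Selecting a best member can be sketched as follows, assuming the frames have already been superposed on a common reference and assigned to clusters. The clustering algorithm and the exact definition of the cluster centre are not given in this section; here the centre is taken to be the average structure of the cluster's members, which is one common choice, and the function name is hypothetical.

```python
import numpy as np


def best_member(frames, labels, cluster):
    """Index of the frame with the lowest RMSD from the mean structure of
    `cluster`.

    frames: (n_frames, n_atoms, 3) array, ASSUMED already superposed on a
    common reference; labels: one cluster label per frame.
    """
    frames = np.asarray(frames, dtype=float)
    members = np.flatnonzero(np.asarray(labels) == cluster)
    centre = frames[members].mean(axis=0)
    rmsd = np.sqrt(((frames[members] - centre) ** 2).sum(axis=2).mean(axis=1))
    return int(members[np.argmin(rmsd)])
```

For medoid-based clustering methods the centre is itself a member structure, in which case the best member is simply that medoid.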
Fig. 12 “Best member” structures, i.e. the structures with the lowest RMSD from the cluster centre, for DNA (A–C) and RNA (D–F). Cluster 2 of DNA (B) is the structure closest to the ideal structure shown in Fig. 1, and Cluster 3 of DNA (C) is an unfolded/elongated structure. Atoms of the base residues in the four stem-region pairs are displayed with van der Waals radii, and the phosphate backbone is displayed as a tube.
The DNA clusters are far more distinct (non-overlapping) in their radius of gyration distributions (Fig. 11) than the RNA clusters. This suggests that the formation of the folded and unfolded structures described above is coupled to the radius of gyration in DNA, but less so in RNA. The ideal structures of the hairpin loop sequences studied here were identified theoretically from base pairing and secondary structure prediction41 using the mfold web server,24 and are shown in Fig. 1. Both the DNA and RNA hairpin loops have a loop region of 18 uracil or thymine bases, stabilised by a four-base-pair stem, and each structure has a tail of three bases at the 5′ end of the strand.
The best member conformations shown in Fig. 12 give a much more realistic view of the conformations formed by these nucleic acid sequences than the theoretical ideal structure. They provide atomistic structural models that can add insight to experimentally derived structures from X-ray crystallography and complement FRET studies. As shown by the structure of DNA Cluster 3 (Fig. 12C and Table 1), there is a significant population of completely unfolded DNA even at room temperature, which may be related to the fact that DNA Cluster 1 is less stabilised by hydrogen bonds. RNA, on the other hand, is confined to more compact structures that are stabilised by base pairing.
In eqn (9) and (10), we tabulate the probability of binding between residue pairs in the stem region of the hairpin loop (see Fig. 1 for the numbering scheme). The values were calculated as the probability of the two residues being within 1.2 nm of one another; for example, the probability of C7 and G26 binding in RNA is 18.42%. Overall, the binding probabilities in RNA are higher than those in DNA, which corroborates the trend in hydrogen bonding in Fig. 6 and the difference in melting temperature observed in Fig. 3.
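The binding probability described above can be computed as the fraction of frames in which the smallest atom–atom distance between two residues falls below 1.2 nm. Using the minimum over atom pairs mirrors the “mean smallest distance” plotted in Fig. 8, but whether the authors used all atoms or a subset is an assumption here, as is the function name.

```python
import numpy as np


def pair_probability(coords_i, coords_j, cutoff_nm=1.2):
    """Fraction of frames in which the minimum distance between any atom of
    residue i and any atom of residue j is below cutoff_nm.

    coords_i: (n_frames, n_atoms_i, 3); coords_j: (n_frames, n_atoms_j, 3).
    """
    a = np.asarray(coords_i, dtype=float)[:, :, None, :]
    b = np.asarray(coords_j, dtype=float)[:, None, :, :]
    min_dist = np.sqrt(((a - b) ** 2).sum(axis=-1)).min(axis=(1, 2))
    return float(np.mean(min_dist < cutoff_nm))
```

Restricting the input frames to the members of one structural cluster gives per-cluster probabilities of the kind reported in eqn (11)–(16).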
(9) |
(10) |
(11) |
(12) |
(13) |
(14) |
(15) |
(16) |
In eqn (11)–(16), we give the binding probabilities of residue pairs in the stem region for each of the structural clusters of DNA and RNA identified above. As shown in Fig. 11, the structures in Cluster 1 are smaller than those in Cluster 3, based on the radius of gyration. This agrees with the binding probabilities for Cluster 1 (eqn (11) and (12)), which are higher than those for Cluster 2 (eqn (13) and (14)) and Cluster 3 (eqn (15) and (16)). The binding probabilities follow the trend Cluster 1 > Cluster 2 > Cluster 3.
Eqn (12) shows significantly large “off-diagonal” probabilities, which suggest the existence of multiple energy minima within the confined native state of the RNA hairpin, represented by Cluster 1. In contrast, the DNA hairpin has a broader conformational distribution even in the native state, as shown in Table 1. One possible reason for this is that DNA forms fewer base pairs, as shown in eqn (11), and is therefore less stabilised enthalpically than the RNA hairpin.
Given the binding probabilities, the identified cluster structures and the distribution of persistence lengths, it is likely that hairpin formation in RNA is stabilised by base pairing in the stem region, whereas hairpin formation in DNA is driven by the mechanical rigidity of the polymer. This is consistent with the difference in melting temperature between DNA and RNA: more hydrogen bonds must be broken within the stem region of RNA, so higher temperatures are needed to break up the bonding network, which leads to an increase in both end-to-end distance and radius of gyration.
The shorter persistence length of DNA generates the kind of helical structure we are more used to seeing in duplex DNA. The helical structure observed in the best member structures for DNA in Fig. 12B and C is a right-handed B-form, as can be seen from the inclination of the base pairs to the helix axis (−1.2°), the number of base pairs per turn (10.5) and the widths of the major and minor grooves (22 and 12 Å).45 RNA, with its longer persistence length, does not display any A- or B-form helical structure, but rather adopts many shorter kinks and grooves (Fig. 12D–F).
The stiffness of the DNA polymer allows it to form well-defined helical structures, ideal for forming complementary duplex structures and conserving base sequence, as well as higher-level supercoiled tertiary structures.46 The RNA structure, on the other hand, is ideal for forming complex folded conformations that carry out catalytic biochemical processes, i.e. RNA enzymes (ribozymes).15
A popular theory in origins-of-life studies is the “RNA world hypothesis”,47 which proposes that early biology was based entirely on RNA, with DNA and proteins emerging later. The theory is attractive because RNA can both replicate genetic information and catalyse biochemical reactions, the roles played by DNA and proteins in modern biology. Our findings suggest that, if an RNA world did once exist, DNA may have evolved a shorter persistence length in order to maintain a well-defined B-form helical structure that makes the duplex form more stable, whereas RNA and proteins, with longer persistence lengths, are better able to form complex folded tertiary structures that can perform biochemical reactions, such as the phosphodiester bond cleavage reactions of the hammerhead ribozyme.15
Footnote
† Electronic supplementary information (ESI) available: Replica temperatures, convergence data and system set-up. See DOI: 10.1039/c7cp06355e
This journal is © the Owner Societies 2018