SerraNA : a program to infer elastic constants from local to global using nucleic acids simulation data

The resistance of DNA to stretch, twist and bend is broadly well estimated by experiments and is important for gene regulation and chromosome packing. However, their sequence-dependence and how bulk elastic constants emerge from local ﬂuctuations is less understood


Introduction
For genomes to function properly, chromosomes need to fold into a hierarchy of structures, causing, for example, expression correlation of genes located within the same topological domain. 1 Besides, it is widely known that DNA looping is a fundamental structure for gene regulation that facilitates long-range communication between a promoter and its distal regulatory elements. 2,3 Moreover, DNA can be subjected to forces up to tens of pN approximately in cells due to the activity of protein motors. 4 And finally, on the shortest scale, DNA distortion has been detected as determining the formation of diverse DNA:protein complexes like nucleosomes, some transcription factors or bacterial nucleoid association proteins. 5 Therefore, it is important to measure the mechanical response of DNA to bending, stretching and torsion, which is well established to have average values close to 50 nm for the persistence length, 6-10 between 1100-1500 pN for the stretch modulus 11,12 and ranging from 90 to 120 nm for the torsion elastic constants 8,13,14 (for a good summary of experimental values see Lipfert et al. 15 ).
What is less clear from experimental data is the spread of elastic properties depending on sequence and which local elements build up the bulk flexibility of long DNA fragments. There have been several attempts to deduce the particular values associated to a sequence from cyclization probabilities, although these methodologies are not unambiguous and require the use of theoretical models. 16,17 In addition, it has been very difficult to identify the mechanisms through which some short sequence motifs, like A-tracts, originate extraordinary bending. 18 On these matters, molecular dynamics (MD) simulations at atomic resolution have become an impressive source of new important information, 19 that have provided (i) a systematic analysis of elasticity at the dinucleotide levels, 20,21 (ii) an evaluation of the influence of nearest flanking basepairs up to the our knowledge, there is no other program that estimates bulk flexibility constants from ensembles obtained by numerical simulations and that uncovers systematically how these properties emerge from local fluctuations. We also believe that for the first time, dependence of elasticity on sequence is exhaustive evaluated in this paper, particularly at the tetranucleotide level thanks to analysing the ABC trajectory database. 22 This paper describes the theoretical background behind the LDEM and it provides estimations for the different elastic constants by using MD simulations over a series of DNA fragments between 32 to 62 bp. In addition, SerraNA is used to determine how bulk elastic constants emerge from local fluctuations using the persistence length as an example.

The Length-Dependent Elastic Model (LDEM)
Geometric description for different fragments lengths. The two bending angles, roll and tilt, and the rotational angle twist at the bp-step level are adapted to evaluate the relative orientation of a pair of bp spaced by an increasing number of nucleotides. The vertical displacement, which is associated with stretch, is characterized by end-to-end distance but a fragment's contour length is also calculated for a more comprehensive description of the polymer structure (see below and The spatial configuration of a bp i is specified by giving the location of a reference point (r i ) and the orientation of a right-handed orthonormal reference triad (T i ) following the mathematical procedure of the 3DNA program, 28 whereŷ i points to the backbone of first strand,x i points to the major groove andẑ i marks the molecular direction at that particular point ( Figure 1b). Then, the CEHS scheme is applied for obtaining the molecular twist and the roll/tilt contributions to bend 27,37 (Figure 1). The algorithm is used to calculate the mid-step triad T mst between bp i and j that define an oligomer whose length ranges from 2 bp to N (Figure 1b). N is the total number of bp in the DNA fragment minus the two for each end, which have been discarded in order to avoid temporary loss of base pairing and other end effects.
The bending angle θ is obtained directly from the direction correlation (θ = cos −1 (ẑ i ·ẑ j )) and the corresponding bending axisrt is calculated byrt =ẑ i ×ẑ j (Figure 1b). Next, T i and T j are Figure 1: Schematic diagrams of the algorithm implemented in SerraNA for calculating the geometric parameters at different fragment lengths. (a) Vertical displacement is characterize by endto-end distance (in red) and contour length (in blue). (b) Twist and bend angles between bp i and j are obtained via the mid-base triad (T mst ) positioned at the mid-point. (c) Bending angle θ and bending axisrt are defined by directional vectorsẑ i andẑ j . Co-planar vectorsŷ i ,ŷ j ,x mst and y mst define twist angle Ω (d) and roll and tilt bending angles (e). (f)-(h) T mst between bp (in red) separated by 4, 8 and 12 bp, respectively. Roll and tilt consist on bending towards grooves and backbone, respectively, at the fragment midpoint for the different sub-fragment lengths. Angles and T mst are highlighted in blue. rotated aroundrt by half of θ for obtaining T i = R rt (+θ /2)T i and T j = R rt (−θ /2)T j , where the transformed x-y planes are now parallel with each other and their z-axes coincide (see Figure   1c and 1d). T mst is directly built by averaging and normalizing T i and T j . The corresponding 3 rotations (tilt τ, roll ρ, twist Ω) are defined as: where φ is the angle betweenrt and theŷ mst (Figure 1e). Note that roll and tilt variables in lengths longer than a dinucleotide denote bending towards grooves and backbone direction, respectively, according the T mst i.e. the fragment midpoint (see Figure 1).
For each DNA sub-fragment, end-to-end distance (L) and contour length (L CL ) are defined as: For completeness, the three rigid-body translation variables at the dinucleotide level (shift X i,i+1 , slide Y i,i+1 and rise Z i,i+1 ) are calculated by: and the extrapolation to longer scales can be designated by: where added-shift X 0 , added-slide Y 0 and added-rise Z 0 can be interpreted structurally as the three pseudo components of L CL . The length-dependent model of DNA elasticity. Under the assumption that distribution of values adopted by a variable X is fully Gaussian and non-correlated with the rest of deformation parameters, the corresponding elastic constant K for a particular length can be easily derived from its variance Var(X) estimated during a MD trajectory: 25,30 where fragment length or sub-lengths are specified by N dinucleotide steps with rise b = 0.34 nm.
However, the four distortion variables chosen to describe DNA flexibility on this model (roll, tilt, twist and end-to-end) are non-orthogonal. This effect is specifically taken into consideration by   determining elastic constants as the diagonal terms of the inverse covariance matrix V −1 or elastic matrix F: 29 Correspondingly, the diagonal terms of V −1 can be understood as the reciprocal of the partial variances, (1/Var p (X)). Var p (X) is a measure of the residual variance associated with a deformation after removing the linear effects caused by other variables. 25,38 All terms from the different Fs calculated using all possible sub-fragments are printed in the 'elastic_parameters.out' output file for a complete dynamic description of the NA molecule (see Figure 2).
Estimation of bulk twist elastic constant. The twist elastic constant for a singular sub-fragment k (C k ) is the diagonal term of F k corresponding to twist, F k being the elastic matrix associated to that particular DNA sub-fragment. Then, the twist elastic modulus as a function of length where N ranges from 11 bp-steps to N * , N * being the maximum sub-fragment length considered.
By default SerraNA discards the ten longest sub-fragment lengths for counting N * in order to have at least ten different values in averaging C N , but this is an option that can be modified in the program (see Figure 2).
Estimation of the long-range persistence length with its dynamic and static contributions.
SerraNA calculates the persistence length A of a particular DNA fragment by means of (i) the linear fitting of the directional correlation decay or (ii) the inverse-covariance matrix method.  . Values reported here are averages over all possible sub-fragments with a particular length, discarding the ten longest stretches, and the corresponding standard deviations are given as shade areas.
Mimicking the worm-like chain model (WLC), A is quantified by the linear approximation of the directional correlation decay between two bp tangent vectors,ẑ i andẑ j separated by an increasing number of bp steps N with a distance rise b = 0.34 nm along the DNA: 25 assuming a sufficiently weakly bending rod and where N ranges from 1 to N * nucleotides, N * being the longest sub-fragment considered on the fitting (see above paragraph) ( Figure 4a). The static and dynamic contributions to θ 2 can be partitioned by A s and A d are combined using 1/A = 1/A s + 1/A d 39 to obtain A again, which should be compatible with the direct linear fit to the full bending angle correlation decay.
The inverse-covariance method provides a second estimation of the dynamic persistence length (A d ) by directly combining the diagonal terms of F corresponding to the tilt and roll elastic constants (A τ and A ρ , respectively) for any pair of bp ( Figure 3): Then, the global A d emerged from the entire DNA fragment is calculated following the methodology used for C (see above): where A d,N are averages at a particular sub-fragment length with N bp-steps ranging from 11 to N * , as the crossover from local to global dynamics occurs within the first DNA-turn (Figure 3d).  partial variances associated with tilt and roll (1/Var p (τ) and 1/Var p (ρ), see above) after removing their linear correlations with the other deformation variables of F. A d is combined with the previously calculated A s to obtain a second prediction of persistence length (A ) using the expression In like manner, A is stiffer than A as this value dismisses contributions from twist and stretch.
Estimation of bulk stretch modulus. In a similar way to twist, stretch moduli for all subfragments k (B k ) are acquired from the corresponding F k 's diagonal term associated to the endto-end distance. As described before, 25  built using NAB module implemented in Amber16, 41 AMBER parm99 force-field 42 together with parmbsc0 and parmbsc1 corrections. 26,43 Fragments are named as 32mer, 42mer, 52mer and 62mer for the rest of the article. The 32-bp oligomer was also constructed using parmOL15 44,45 (named 32ol15 from now on). Structures were solvated in 200 mM Na + and Cl − counter-ions 46 and in TIP3P octahedral boxes 47 with a buffer of 1.2 nm. Systems were energy-minimized, thermalized (T = 298 K) and equilibrated using standard protocols. 48,49 The final structures were subject to 1 µs of productive MD simulation at constant temperature (298 K) and pressure (1 atm) 50 using periodic boundary conditions, particle mesh Ewald 51 and an integration time step of 2 fs. 52 Principal component analysis was done with pyPcazip 53 and fast Fourier transforms were done with an in-house program written in python.
Trajectories obtained from BIGNASim and ABC simulation databases. An extra simulation 1 µs long for a DNA oligomer containing a random sequence of 32 bp (ATGGATCCAT AGACCA-GAAC ATGATGTTCT CA) was obtained from the BIGNASim database 54 and analyzed together with the above (labelled as 32random from now on). Simulation was obtained with bsc1 parameters, 26 TIP3P water model and with neutralizing Na + . 46 Elastic properties for all distinct 136 tetranucleotides were obtained by analyzing MD simulations from the ABC consortium, 22 which are constituted of 39 oligomers of 18 bp, modeled for 1 µs, using parmbsc0 force-field, 43 SPC/E water 55

Results and Discussion
SerraNA is a program written in Fortran that is freely accessible at https://github.com/ agnesnoy/SerraNA under GNU Lesser General Public Licence and whose general workflow is shown in Figure 2. The program builds upon the LDEM described by Noy and Golestanian 25 and it streamlines the procedure of calculating the persistence length, twist and stretch modulus of a DNA molecule or other double-stranded, helicoidal nucleic acids using an ensemble generated by MD or MC simulations.
Torsion elastic modulus. Elastic profiles as a function of length for the whole set of simulations are presented in Figure 3. The calculated torsional modulus for all oligomers shows a crossover from the relatively soft value of around 30-60 nm at the single base-pair level to a large-scale asymptotic value between 90 and 100 nm (see Table 1), which is in agreement with previous study. 25 While softer values at short length scales are consistent with fluorescence polarization anisotropy measurements, 58,59 small-angle X-ray scattering (SAXS), 31 Table 1), achieving an overall good convergence on the microsecond-long trajectories ( Figure S1) Persistence length. Persistence length (A), as well as its static (A s ) and dynamic (A d ) components, were deduced following the principles of the WLC model. Persistence lengths calculated by the fitting of directional decays are in general higher than the corresponding experimental data 40 and coarse-grained modelling 35 (see Table 1 25 Our calculations indicate A s is much stiffer than A d , even though A s is the main source of variability (A s = 576 ± 191 nm; A d = 64.7 ± 1.4 nm; see Table 1 and Figure   4). This trend was already observed on MC simulations 35  imprecise. These sources of error are exposed by the broad confidence intervals of A s compared to A d (Table 1) and the relative lack of convergence in some of A s measurements e.g. for the 62mer ( Figure S1). Another example is the discrepancy of A obtained by two different DNA force-fields (BSC1 and OL15), which is mainly caused by A s and not A d (see Table 1), being complicated to judge whether error comes from force-fields or the linear fit.
The inverse-covariance method yields an increased dynamic persistence length A d of 68 nm, and a resulting persistence length A of 60 nm, as it only considers fluctuations not correlated with other deformation variables (ie partial variances, see methods). A d is calculated through the combination of roll and tilt elastic constants, A ρ and A τ , which produce periodic and antisymmetric profiles as a function of fragment length due to bending anisotropy towards grooves and backbone (see Figure 3). On lengths containing half and complete helical turns, A ρ and A τ are equivalent because grooves and backbone face equitably towards both bending axes, whereas, at intermediate lengths, there is an imbalance between them (see Figure 1).
Stretch Modulus. Stretch modulus deduced from all unconstrained simulations present a nonmonotonic dependence on length similar to the one previously described by Noy and Golestanian 25 (see Figure 3e). Base-stacking interactions cause stiffening at short scales up to 7 bp length as elastic constants present similar values associated with contour-lengths (see Figure S2). For longer sub-fragments, cooperativity emerges due to coordinated motion, softening the stretch modulus in two stages: (i) towards a plateau that would correspond to the regime captured by force-extension experiments 11,12 after incorporating an internal mode 13 bp long 25 and (ii) towards much more flexible magnitudes originated by long-ranged end-effects. 25 Principal component analysis reveals a mode that essentially captures vibration from edges and that produces a proportionate influence over the different oligomers containing gradually more bp (see Figure 5). This fact shows that the characteristic length of the stretching-end mode is longer than five DNA-turns, still not reached for our atomistic simulations. In contrast, L increments are uniformly distributed along the molecule in the simulation where DNA is actively pulled (see Figure 5), which shows that the end-stretching motion is just a vibrational mode not relevant for extracting the intrinsic stretch modulus of DNA.
We estimate stretch modulus via linear fitting of Var p (L) just using the central 18 bp, since they constitute the molecular domain significantly unaltered by end-effects (see Figure 5). Results give an overall average of 1779 ± 88 pN (Table 1), which is reasonably close to the experimental value ca. 1500 pN. 12 From local to global elastic behaviour. By analyzing elastic and structural length-dependence, SerraNA can also reveal how global elastic constants build up from the dynamics of smaller scales.
For example, Figure 6 compares the length-evolution on bending angles of the more bendable fragment (52mer) with the less one (62mer). Interestingly, bending is comparable between the two DNA a Elastic constants obtained using OL15 force-field are in italics. Persistence lengths on sequences over 100 bp, from which short fragments has been extracted from (see Methods), are in underlined text when they come from MC simulations 35 and in bold when they come from experiments. 40 b Overall averages and standard deviations for elastic constants obtained through the inverse-covariance method (twist C and persistence length A and A d ) are calculated using the different sub-fragment lengths between 11 bp-steps to N * , N * being the maximum number of bp considered (see Methods). c Overall values and confidence levels at 70% for persistence lengths following the WLC model (A, A s and A d ) and stretch modulus (B) are obtained through linear fits (see Methods).
sequences at the single bp-step level (7.1±1.5 and 7.2±1.1 degrees, respectively), but are able to cause distinct values at the longer scale of 38 bp (35.6±1.6 and 33.0± 0.7 degrees). The main difference at intermediate lengths (8,16 and 28 bp) is the higher degree of periodicity, which is in phase with DNA helicoidal shape, presented by the curved oligomer compared to the straight one (see Figure 6). Our data suggests that for creating a regular pattern characteristic of the curved fragment, a frequency with an exact number of cycles per DNA-turn at the single bp-step level is needed (3 cycles per DNA-turn for the 52mer in front of 3.5 for the 62mer, see Figure 6)   Figure 7). In general, we can observe a high degree of variability with flexible sequences twice as soft as rigid ones for all elastic constants.
The static persistence length is the most variable parameter in sequence space, spanning almost two orders of magnitude: from <25 nm in the case of TGGG, TGCA and CATG to >1000 nm for AATT (see Tables S1-S10). In general we observe that the majority of the tetramers are very flexible and just 13 sequences (9%) have values >200 nm. The less curved tetramers involve central AA or AT steps, with ATAT and AAAA being the top two ( Figure 7). This is in agreement with previous studies and with the idea of A-tracts being so stiff that they impair nucleosome wrapping 67 but facilitate looping and gene regulation when they are placed in phase. 67  bp constitute an intermediate length in the transition from local to bulk behaviour (see Figure 3), so the levels of correlation between bp-step fluctuations tend to diverge (see Figure 2c), making sequence-dependence analysis very convoluted. Broadly, the most rigid sequences for this parameter are the ones with a central YR step (Figure 7), which strikingly is the most flexible bp-step type at the dinucleotide level (see Figure S3) in agreement with previous studies. 20,21,48 We also observe that sequences with a bimodal behaviour in the central step 72 don't show any special flexible feature. These facts demonstrate the remarkable importance of flanking bases in building up overall fluctuations and the very complex interplay between dinucleotide steps. 73,74 Stretch modulus at the tetra-bp length are relatively high (see Figure 7 and Table S1-S10) compared with experiments at the long-range scale. The remarkable similarity between other distance definitions (ie end-to-end, contour length or added-rise, Figure 7 and S4) suggests stretch stiffness at this length is mainly influenced by the strong stacking interactions. There is also an important degree of variability among sequences with some steps like AGGG, AGGA and AAGG presenting stretch modulus <1400 pN, which are twice as flexible as others such as CGAC, TTGC and CCGG (>2700 pN). In general, we observe YYRR and RRYY steps be the most rigid and RRRR the most flexible for this parameter, being determined mainly by the vertical component (added-rise) but also with some influence from lateral displacements, in particular from slide direction.
The analysis of ABC database makes clear that there is a flexibility dependence on the sequence of DNA and that reasonably extends to sequences larger than 4 bp. Regarding tetranucleotides elastic constants, rigidity tends to increase in regions composed by RRYY, YYRR, RRRY and YRRY, whereas sequences made with YRYR, RRYR are in general flexible, although this classification strongly depends on the type of elastic parameter. Figure 7: Elastic constants at the length of 4 bp for the whole set of 136 tetra-nucleotide sequences obtained from ABC simulation database. Total persistence length together with its static and dynamic components (A, A s and A d , respectively) are calculated using the directional decay at the tetramer length. Twist (C), stretch modulus (B) and the second estimation of dynamic persistence length (A d ) are obtained directly from the inverse-covariance matrix for tetranucleotides. Vertical axis indicates middle steps, and horizontal axis flanking bases. Horizontal and vertical lines organize sequences according purine (R) or pyrimidine (Y) type. Sequence duplication is excluded through the use of white squares. AATT's A s is off the palette with a value of 1267±144 nm (see Table S6).

Concluding Remarks
In this article we present SerraNA, which is an open code that describes the elastic properties of nucleic-acids molecules with a canonical helicoidal shape (B-or A-form) using ensembles obtained from numerical simulations. We apply the program to analyze a series of atomistic MD simulations over DNA fragments and compare the extracted elastic values with available experiments.
We find reasonably good agreement on stretch and torsional modulus between computational estimations (97±3 nm and 1778±88 pN) and experimental values (around 100 nm and 1500 pN, respectively). The calculation of stretch modulus is especially challenging because of the end-stretching vibration that masks the thermal fluctuations characteristic of the experimental stretch modulus at the range of kbp. As atomistic simulations are done over relatively short DNA molecules (tens to a hundred of bp), SerraNA approximates the calculation of this elastic parameter using only the two central DNA turns.
In the case of persistence length, simulations provide a slightly more rigid measure (57±3 nm) than the generally accepted value of 50 nm, although it's hard to discern whether it is due to intrinsic problems of force-fields, to non-identical ionic conditions with experimental buffers or to trouble in measuring the static persistence. Modulations on the tangent-tangent decay caused by DNA intrinsic shape and the relative shortness of the simulated DNA fragments makes the calculation of the static component of persistence length peculiarly complicated. Moreover, our simulations indicate that DNA curvature is the main source of variability on bendability between sequences (510±210 nm), compared with 64.6±1.5 nm caused purely by thermal fluctuations.
When we analyze the whole set of tetra-bases sequences from ABC database, we observed again a higher degree in variation on the static persistence length (s.d. 159 nm) in contrast to dynamic persistence length (s.d. 5.9 nm) (see Table S11).
SerraNA also indicates how global elasticity emerges from local fluctuations by analyzing the change of mechanical properties as the length of considered fragments is systematically increased.
Because the crossover from single base-pair level to bulk elastic behavior occurs typically within one helical turn of DNA, relatively short DNA fragments, like the ones simulated here, are already useful for uncovering this effect. In the case of persistence length, our results show that periodic patterns in phase of the DNA helical turn are particularly advantageous for developing significant bendability at longer scales.
Finally, the systematic analysis of the whole set of 136 tetranucleotides reveals big differences with some sequences doubling others in all elastic parameters and, as a consequence, indicates the importance of sequence in determining DNA elastic properties. In general, we observe a much more complicated dependence for the different type of tetra steps compared with the dinucleotide level, highlighting the relevance of flanking sequences and the complex interplay between the different bp-steps. We expect that the use of SerraNA will help to clarify further how DNA elasticity can be modulated as a function of sequence, having important implications in understanding fundamental processes like DNA-protein recognition, DNA looping or packing inside the cell. 75 In particular, we anticipate using SerraNA in a range future experimental investigations 76 which will help us to unravel new physical properties of DNA at the single-molecule level. 77,78