Complexation of single stranded RNA with an ionizable lipid: an all-atom molecular dynamics simulation study

Complexation of a lipid-based ionizable cationic molecule (referred to as DML: see main text) with RNA in an aqueous medium was examined in detail by means of fully atomistic molecular dynamics simulations. The different stages of the DML-RNA association process were explored, while the structural characteristics of the final complex were described. The self-assembly process of the DML molecules was examined in the absence and in the presence of nucleotide sequences of different lengths. The formed DML clusters were described in detail in terms of their size and composition and were found to share common features in all the examined systems. Different timescales related to their self-assembly and their association with RNA were identified. It was found that beyond a time period of a few tens of ns, a conformationally stable DML-RNA complex was formed, characterized by DML clusters covering the entire contour of RNA. In a system with a 642-nucleotide sequence, the average size of the complex in the longest dimension was found to be close to 40 nm. The DML clusters were characterized by a rather low surface charge, while a propensity for the formation of larger size clusters close to RNA was noted. Apart from hydrophobic and electrostatic interactions, hydrogen bonding was found to play a key-role in the DML-DML and in the DML-RNA association. The information obtained regarding the structural features of the final complex, the timescales and the driving forces associated with the complexation and the self-assembly processes provide new insight towards a rational design of optimized lipid-based ionizable cationic gene delivery vectors.


Introduction
Gene therapies have surged as a new path towards treating diseases by addressing their genetic origins. 1,2 The potential of gene-based drug formulations for targeted expression of a beneficial protein, for removal through editing or for silencing of defective genes presents also new opportunities for vaccine applications and personalized medicine. 3 In such nucleic acidbased therapies, the genetic cargo is carried to the targeted sites though specialized delivery systems aiming at providing protection against enzymatic degradation, suppressing immune response and activating mechanisms for an effective intracellular delivery of the payload. 4,5 In this direction, recent efforts have concentrated on the development of non-viral vehicles, [6][7][8] which are free from side-effects such as the risk of genetic integration and increased immunogenicity, which are usually met in viral vectors. 9 Issues associated with relatively high toxicity levels and low potency of synthetic carriers are currently a subject of intense research efforts, towards the fabrication of systems with optimized properties. 10 In this endeavor, lipid-based formulations have emerged as rapidly developing carriers in nucleic-acid therapeutics, due to their ability to accommodate long nucleic acid cargos (e.g., mRNA or pDNA), their low immunogenicity, their multiple dose capabilities and their versatility in design and fabrication. [11][12][13][14] In particular, the development of pH-responsive ionizable cationic lipids, for which the surface charge can be modulated in a controllable manner, allowed efficient binding with the oppositely charged polynucleotides, prolonged significantly the circulation time of the formed complexes and increased their therapeutic efficacy by facilitating the endosomal escape and the final release of their cargo. 11,[15][16][17][18][19][20] Formulations based on such nanoparticles have already been FDA approved, 21 or entered the clinical development phase. 18 Although there are several past and ongoing efforts for rational design of ionizable lipid-based nanocarriers for gene therapy applications, 14,16,19,22,23 a fundamental understanding of the physicochemical processes relevant to their association with nucleic acids, which may lead to the optimization of the formed complexes in terms of their structural characteristics and their thermodynamic stability, is still limited. 23,24 To this end, computer simulations may offer a detailed description of the mechanisms involved in the complexation process, and thus shed light on the role of the different factors which control the physicochemical properties of the formed complexes.
Several simulation studies have focused on the complexation process of small-interfering RNA (siRNA) with lipid nanoparticles. 23,[25][26][27][28] The structure and function of liposome carriers used for siRNA delivery have been studied through all-atom 27 and coarse-grained models 25 as well as by using multiscale approaches. 27 The role of specific sites on a siRNA molecule in its binding to a lipid bilayer surface has been explored through all-atom molecular dynamics (MD) simulations. 28 Findings revealed a rather weak and unstable binding, where the siRNA adsorption is driven by the attractive interactions of overhanging terminal nucleotides with choline moieties of the lipid molecules. The complexation and dissociation of siRNA and propylamine (PMAL) has been studied using a coarsegrained model for PMAL, in order to examine the delivery efficiency of siRNA when PMAL is used as a carrier. 26,27 Furthermore, since dendrimers have been widely used as cationic vectors for the transport and release of biopharmaceutical substances into the cell, many experimental [29][30][31] and computational studies [30][31][32][33] concentrated on their modification, aiming at improving their affinity for nucleic acids and their efficiency for the release of their cargo in the intracellular space. A comprehensive review by Marquez-Miranda, V. et al., presents several computational techniques developed for the study and design of dendrimers as carriers for DNA 34 and RNA. 35 Amongst various cationic polymers, poly(ethylene imine) (PEI)-based carriers are frequently explored for nucleic acid delivery purposes. Various simulation techniques, based on both all-atom and coarse-grained models have been used for studying the PEI-DNA and -siRNA condensation mechanism. 25,29 A recent review provides an extensive description of computational methods, used to study RNA structure and dynamics. 36 Heterogeneous time scales are met in the RNA processes, including very fast, 37 but also very slow timescales, 38 with its folding landscape involving many metastable states. For this reason, different models are used depending on the features that need to be described. 39 Atomistic MD simulation is a valuable tool, which allows for a more detailed characterization of conformational, kinetic and thermodynamic properties of RNA molecules, taking always into consideration limitations of the accuracy of existing force fields used for the description of such systems. Until recently, only small-to medium-sized RNA molecules and relatively short time scales could be explored 40 and enhanced sampling techniques [41][42][43][44] had been employed, in order to accurately describe their properties. An alternative approach, which is based on a simplified description of molecules, is the coarsegrained models. 45,46 These models can provide information at longer length and time scales and can sufficiently describe large-scale motions, mechanical properties, native structures and thermodynamics. Various coarse-grained models have been used for RNA molecules, 36 with the MARTINI scheme being widely used. 47 Due to the inherent difficulties related to the complexity of the models and the limited availability of powerful computational resources, only a small number of studies have so far attempted a detailed description of long RNA molecules in the all-atom representation. In the present work, we employed atomistic molecular dynamics simulations to study the associative behavior between RNA (with short and long nucleotide sequences) and a lipid-based ionizable agent in an aqueous environment and at neutral pH conditions. Our main goal was to obtain detailed information regarding the different stages involved in the complexation process and to explore the potential of the formed complex to be part of a delivery mechanism for the nucleic acid cargo. The time evolution of the system towards an energetically more stable state was recorded, whereas very fast processes, at the nanoscale, related to the complexformation, were captured from the atomistic MD simulations. The physicochemical processes, which are described through this study, may serve as a starting point for a controlled formation of such complexes, which can be part of lipid-based formulations for gene delivery purposes.
The rest of the paper is organized as follows: Section 2 provides a description of the examined systems and the simulation method. Analysis methods and results follow in Section 3, divided into subsections according to the system and the examined properties. Finally, in Section 4, concluding remarks are presented.

Systems and simulation protocol
The compound that was selected as a complexation agent with the examined RNA sequences, bears a diketopiperazine core, two methyl ester end-groups and two linoleic ethyl ester endgroups (henceforth referred to as DML). Its chemical structure is shown in Fig. 1.

I. Description of the system
In total, 3 systems were simulated. An aqueous solution of DML molecules, where their self-assembly propensity was examined without the presence of RNA, was used as the reference system (it will be referred to as DMLW henceforth). The other two systems were comprised of a single RNA nucleotide sequence either of 30 nucleotides (referred to as DMLRW) or of 642 nucleotides (referred to as DMLHRW), the DML complexation agent, and water. The RNA nucleotide sequence in the DMLHRW system was based on the fragment of the Herceptin antibody (light chain), 48,49 while the DMLRW system was a part of this (see ESI, †). The reverse translation of the antibody, in order to obtain the nucleotide sequence of the DMLHRW system, was based on the consensus codons (see ESI †). The DMLRW system was used to collect information regarding the generic characteristics of the complexation behavior between an oligonucleotide RNA and DML particles, which was then compared to the analogous behavior in the presence of the long RNA sequence in the DMLHRW system.
Details about the composition of the examined systems are given in Table 1. The ionization state of the DML molecules (due to the ionization of the backbone nitrogens, see Fig. 1) was taken to correspond to neutral pH conditions, rendering a degree of ionization 50 of 0.5 for each ionizable nitrogen atom. On this basis, three kinds of DMLs, in terms of the protonation state of the two ionizable nitrogens, were considered in the simulations: electrically neutral (DML0), doubly protonated (DML2) (i.e. bearing a charge of +2|e À |, where e À symbolizes the electron charge) and singly protonated (DML1) (i.e. bearing a charge of +1|e À |). In both systems including RNA, the negative charge of the nucleotides was counterbalanced by that of the protonated DMLs, while in the DMLW system, chlorine counterions were used to ensure electrical neutrality.

II. Simulation method
Fully atomistic molecular dynamics simulations were performed using the GROMACS package. 51 Both, RNA and DML molecules were modeled through the CHARMM36 52,53 all atom force field. 54 This force field has recently been used for all-atom simulations of RNA 36 and lipids. 55,56 For DML, charge distribution was described by the Gasteiger method. 57 Solvent was described explicitly by the SPC/E model for water. 58,59 Nonbonded interactions were parameterized through a spherically truncated 6-12 Lennard-Jones potential and standard Lorentz-Berthelot mixing rules. A cut-off distance of 1 nm has been applied. The particle-mesh Ewald (PME) algorithm was used for the evaluation of the electrostatic interactions. The cut-off distance for PME was 1 nm, the PME-order was 4 and 0.12 the Fourier spacing. Our choice to use the SPC/E water model, combined with a VDW cut-off of 1 nm and PME-based electrostatics was based on a previous work, 60 where it was demonstrated that a combination of PME-based electrostatics with SPC/E water and a VDW cut-off of 1 nm, provided reliable results in systems interacting with Lennard-Jones pairwise potentials, in a computationally efficient manner, thus making it appropriate for large scale simulations such as those performed in the present work.
Molecular dynamics simulations were performed in the isothermal-isobaric (NPT) statistical ensemble. The pressure was kept constant at P = 1 atm, using the Berendsen barostat, 61 while the stochastic velocity rescaling thermostat 62 was used for keeping temperature at ambient conditions (T = 300 K). The integration time step was 1 fs and the particle-mesh Ewald (PME) algorithm was used for the evaluation of the electrostatic interactions.
The initial geometry of RNA was based on a Monte Carlo procedure 63 (selected as the minimum energy configuration among 10 000 generated), whereas lipids were uniformly distributed in the simulation box. This starting configuration was energy-minimized, while prior to the production runs all the systems were equilibrated for at least 50 ns. Within this timescale, energetic parameters and conformational properties (i.e., size distribution of the complexes, average dimensions of RNA, and average orientation of the DML molecules -see Fig. SI-1 in the ESI †) were stabilized. Ensuing the equilibration, production runs of 100-200 ns were performed. It should be noted that the secondary and tertiary structural changes of RNA and their dynamics take place at a much longer timescale than the present simulations can probe and thus possible changes in interactions with the lipid clusters formed, cannot be captured. However, as it will be described in the sections to follow, these are not expected to influence the physicochemical processes relevant to this study, since the latter take place at much shorter timescales (of the order of a few tens of ns).

Analysis methods and results
The hydrophobic nature of the diketopiperazine-derived lipids per se 22 is expected to lead to the formation of aggregates in an aqueous environment. It is therefore interesting to explore (a) how the presence of RNA affects the clustering behavior of the To monitor the cluster formation and to characterize the formed clusters, and to examine structural features of the formed complexes in the RNA-based systems, appropriate analysis methods were utilized. In this context, standard analysis tools of the GROMACS package 51 were employed. Quantities like pair radial distribution function (rdf), density and charge distributions and hydrogen bonding were calculated. For the examination of the clustering behavior, we followed the procedure described in ref. 51 and 64 as outlined below.
Initially the center of mass of each DML molecule (CM) was determined and considered as a potential center of a cluster. The condition that was checked in order to determine whether a neighboring DML molecule belonged to a cluster centered at the selected DML was whether the CM-CM distance between the two DMLs was smaller than a critical distance (2.5 nm in our case). The selection of this critical distance was based on the separation corresponding to the first minimum in the DML-DML center of mass pair radial distribution function (see Section 3.a-III, Fig. 6a). Each time a DML molecule was found to belong in a cluster, it was excluded from the list of neighboring similar molecules and the algorithm was repeated for the remaining DML particles. This procedure ensured that each DML molecule belonged only to one cluster. The same algorithm was then repeated for all such molecules and the number of clusters, the average number of DMLs per cluster and the size of the clusters were recorded.
Since the vicinity of RNA is of particular interest, in the RNAbased systems, a ''filtering'' process was also applied, keeping only those DMLs for which their centers of mass were located at a distance no further than 4.0 nm from RNA. This distance was determined from the pair radial distribution function between the RNA and the DML particles (see Fig. 6b in Section 3.a-III).
In addition, to monitor the dynamics of the clustering process and the attainment of a stabilized state in the cluster formation, a series of snapshots at different time spots throughout the trajectory were analyzed.

3.a Conformational properties and static cluster analysis
I. DML molecules in water (DMLW system). A system of DML molecules, bearing a degree of protonation of 50%, was simulated in an aqueous solution. Chlorine ions were used for the neutralization of the solution, as described in Table 1. Fig. 2 shows an equilibrated configuration of a single cluster that was formed by DML molecules in the DMLW system. The timescale relevant to cluster formation was found to be less than 30 ns. An estimation of the size of the cluster can be provided by the average radius of gyration hR g i, defined by where r i is the position of the ith atom, R C is the position of the geometric center of the cluster and N is the number of atoms. The relevant calculation rendered a value of hR g i = 16.4 AE 0.4 Å. To further elaborate on the features of the cluster, additional analysis involving the mass and the charge distribution with respect to the geometric center of the cluster was performed. The density profile is presented in Fig. 3a and a corresponding charge profile in Fig. 3b. A peak in the density distribution is observed at 0.3 nm from the center of the cluster, which indicates a rather compact  structure. A density plateau follows, up to almost 1.5 nm, beyond which the density decreases sharply. An outer surface layer of B1 nm width is formed (i.e., from the point at which the density starts dropping to the point at which it vanishes). Concerning the charge distribution, it appears that only very close to the center of mass of the cluster does a negative charge accumulate. Positive charge is distributed throughout the cluster and peaks at a distance of B1.23 nm, moderately lower compared to its radius of gyration. Close to the surface of the cluster (i.e., at a distance close to hR g i), only a weak positive charge, B0.35 |e À | is detected.
II. Aqueous solution of oligonucleotide RNA and DMLs (DMLRW system). In the DMLRW system, the RNA sequence is comprised of only 30 nucleotides and thus is much easier (in terms of required computational resources) to handle, compared to the larger DMLHRW system. Therefore, it can be used as a working model to explore general trends regarding the physicochemical behavior of the various components in the solution.
As was also observed in the DMLW system, DML molecules were found to self-assemble rapidly. A fast movement of the DML particles towards RNA was also recorded. Both processes were completed within a time period of 50-60 ns. A more detailed analysis related to the clustering kinetics will be discussed in Section 3b. A representative configuration, where DML clusters become associated with RNA, is shown in Fig. 4.
Evidently, instead of observing association of individual DML molecules with the oppositely-charged nucleotides, the former prefer to organize in clusters, which then associate with RNA. A histogram of the cluster sizes as a function of the number of DML molecules contained in each cluster is presented in Fig. 5 (it should be noted here that all DML particles in the solution were taken into account). Apart from the single-DML or double-DML state (i.e., when 1-2 DML molecules become detached for short or for longer times from a larger cluster), clusters including significantly larger numbers of DML particles are formed as well.
Excluding the small clusters (i.e., with occupancy o3), a weighted average rendered a mean number of DML particles per cluster equal to 8.9. The largest cluster detected included 17 DML molecules. Additional quantification of the size of the clusters is given by their mean radius of gyration, which was calculated to be hR g i = 15.4 AE 0.1 Å. This average size is comparable to the hR g i of the clusters formed in bulk water (see Section 3a-I). It is also worth mentioning that the surface of RNA was found to be almost entirely covered by two DML clusters, as it is illustrated in the snapshot of Fig. 4. A more comprehensive analysis of this behavior will be presented in the section to follow for the DMLHRW model.
For computational efficiency purposes (particularly for the DMLHRW system), in this work we opted for omitting the inclusion of counterions in the systems including RNA, having in mind that the presence of counterions would not play a significant role in the conditions of interest (i.e., ionic strengths of the order of 0.1 M as in the relevant experiments), since at that low salt concentration no efficient screening of electrostatic interactions between the lipids and between the lipids and RNA is expected. This effect has been demonstrated in past simulation efforts examining protein-RNA binding 65 where at salt concentrations of 0.1 M no structural changes have been practically observed in an RNA-protein complex or any destabilization of the initial RNA conformation. To check this hypothesis for our models, we have also simulated a system analogous to DMLRW at an ionic strength of 0.15 M, without noticing any significant differences regarding the associative phenomena that will be described in this manuscript. A corresponding snapshot is presented in the ESI † (Fig. SI-2a), which provides a qualitative comparison, demonstrating the formation of DML clusters and the twisting of RNA around them. The comparison of radial distribution functions arising from the centers of masses, between RNA-DML and DML-DML molecules in the two systems, is presented in Fig. SI-2b in the ESI. † The similarity of the curves corroborates the aforementioned hypothesis.
III. Aqueous solution of long RNA and DML (DMLHRW system). In the DMLHRW system, we examined in more detail whether the different conformational characteristics of the long  RNA sequence affected the aggregation propensity of DML molecules, their cluster characteristics and their associative behavior with RNA. Based on the behavior observed in the DMLRW model, in the DMLHRW system we opted for including a larger number of DML molecules compared to that roughly estimated to be necessary, for the coverage of the contour of RNA. In this manner we could perform our calculations under ''equilibrium'' conditions as far as it concerns the number of DML particles that may be exchanged, between the clusters formed close to RNA and those dispersed in the aqueous phase. All the results presented in this section originate from analysis of the part of the trajectory in which the system had reached a steady conformational state. This aspect will be further discussed in Section 3b. During the simulations in the DMLHRW model, no definite shape or other secondary-structure formation was observed for RNA in the examined time window.
To obtain information regarding the relative arrangement of the components in the studied system, we examined appropriate pair radial distribution functions between RNA and DML molecules and between only DML molecules. The radial distribution function is essentially a pair property, which is proportional to the conditional probability to find particles of one kind at a specific distance from those of the same or of another kind. 66 Here, each molecule is taken as a single particle represented by its center of mass. A peak in the rdf at a distance r, indicates higher probability to find neighboring particles separated by this distance. Fig. 6 shows the pair rdfs between DML molecules and between RNA and DML molecules, where such peaks can be observed. The intensity of the peak, which corresponds to DMLs is higher and its width is smaller compared to the analogous features of the DML-RNA rdfs. These characteristics indicate a DML-DML intermolecular arrangement with higher spatial stability, which is consistent with the cluster formation. They also imply that DML-DML interactions are more favorable than those between DML molecules and RNA. The peak in the DML-RNA rdf indicates also an associative behavior between the two molecular species and therefore implies complexation between the formed DML clusters and RNA. To check whether there was any specific trend regarding this association with respect to the protonation state of the DML particles, we computed and compared DML-specific RNA rdfs (i.e., doubly protonated, singly protonated and neutral). This comparison is shown in Fig. 6b where the DML-RNA rdf curve (shown in Fig. 6a), is also included.
Although all curves are of similar shape and width, the peak heights of the rdfs corresponding to the protonated DML particles are higher compared to that describing the neutral molecules. This can be attributed to the enhanced electrostatic interactions between RNA and the protonated DMLs, resulting in a better localization of the latter with respect to RNA. The difference in peak-height is smaller between single and doubly protonated DMLs, while the respective all-DML (i.e., non-chargespecific) rdf is an average of the charge-specific curves.
A characteristic snapshot of the DMLHRW system at an equilibrated state is presented in Fig. 7. It is shown that apart from DML clusters dispersed in the bulk aqueous phase, a significant number of DML particles have formed clusters, which practically cover the entire RNA backbone, forming thus a potentially protective layer around it. The average end-to-end distance of the RNA in the complex was found to be approximately  43.2 nm. Application of the filtering procedure described earlier, allowed the calculation of the percentage of DML molecules of the three kinds in the vicinity of RNA. It was found that this proportion ranged between 45 and 50% for all kinds of DML particles. Fig. 8 compares the histograms of the cluster sizes resulting by analyzing either all clusters, or only those close to RNA. In the case where all the clusters are taken into account, together with the single-DML state or the doubly populated aggregates, clusters comprising a considerably larger number of DML particles are present as well. A close view of the main curve, focusing on clusters constituted by more than 3 DML molecules, is shown in the inset of Fig. 8. A broad distribution of DMLs per cluster, between 5-20 molecules, is observed. Peaks are detected at DML numbers of about 6 and between 10 and 12, while clusters of even bigger size are formed. Taking into account the values beyond 3, the total number of formed clusters in the solution is B55 and a weighted average on the number of DMLs in a cluster provides a mean cluster occupancy of 9, with the most populated cluster comprised of 22 DML molecules.
Comparison to the distribution corresponding to the clusters close to RNA shows that oligomeric clusters are mostly distributed in water rather than close to RNA (note the difference in the area under the corresponding peaks). Moreover, it appears that on average, the non-oligomeric clusters, which are formed around RNA are somewhat smaller (DMLs per cluster B7.2) compared to their analogues in the bulk water phase (i.e., afar from RNA). Considering only clusters populated by at least 4 DML particles, integration of the distribution corresponding to the clusters close to RNA, renders a number close to 35. If we take into account the average number of DMLs per cluster, it follows that approximately 0.4 DMLs are needed to cover one nucleotide. Based on this crude estimation, the RNA backbone will be covered by 252 DMLs; the number of DML molecules was taken to be significantly larger compared to this estimation in the DMLHRW system.
A more detailed analysis is given in Fig. SI-3 of the ESI, † where sub-clusters formed by the 3 different kinds of DMLs are shown separately. In the corresponding histograms all clusters are taken into account. Qualitatively similar distributions are observed for the three kinds of sub-clusters. However, aggregations of protonated DML molecules in the formed clusters (i.e., charged sub-clusters) appear to be less populated compared to those of neutral molecules. Fig. 9 depicts a characteristic configuration of a large cluster comprised of 21 DMLs in total, 11 of which are neutral, 4 are singly protonated and 5 are doubly protonated. The radius of gyration of such highly populated clusters (taking into account clusters comprised of 19 to 22 DMLs) was found to be 16.9 AE 0.5 Å. For comparison, the radius of gyration of an averagesized cluster (comprised of 7-10 DMLs) was found to be 14.8 AE 0.6 Å. Notably, the average occupancy of the clusters formed in the DMLHRW system, and their average dimensions (as measured by their corresponding radii of gyration) are very close to those observed in the oligonucleotide model (see Section 3a-II).
To further characterize the formed aggregates in terms of their internal structure and their charge, we have calculated the density and the charge distribution, both for a highly populated and an average sized cluster. Due to their almost spherical shape, the analysis was based on spherical shells of increasing radius, starting from the geometric center of each cluster. The so-calculated density and charge profiles are presented in Fig. 10a and b, respectively. Focusing on Fig. 10a, for the large cluster a high intensity peak is observed in density at very short distances (i.e., B1-2 Å), which implies a rather compact core. This is followed by a broader peak around 1 nm whereas, at longer distances the density decreases gradually to zero. However, in the averagesized cluster no compact core is observed, while the density profile exhibits a more uniform distribution. This behavior is closer to that observed in the DMLW model (Fig. 3a), in terms of the existence of a plateau region of average density. In the charge profiles, intense fluctuations are present in the interior of the cluster (i.e., for distances of up to B1 nm) in both the clusters, with most of the minima reaching values close to 0. In addition, it can be noted that close to the exterior of the clusters a rather low positive charge is observed. Combining the information  from the density and the charge profiles, it appears that there is a tendency of the highly populated clusters to grow around the protonated (and somewhat more massive) DML particles. Therefore, apart from the hydrophobic interactions, electrostatic forces play a central role in the characteristics of the formed clusters.
Along with Coulombic interactions, other interactions such as hydrogen bonding are also expected to drive the associative behavior between the clusters of the DML particles and RNA. 66 To examine the propensity of the DML molecules to form hydrogen bonds with the nucleotides, we have examined hydrogen bond formation between DMLs and between DMLs and RNA. Detection of hydrogen bonding was based on the following geometric criteria associated with an acceptor A and a donor D bonded to a hydrogen (H) atom: (1) the distance r between D and A, r(DÁ Á ÁA) r 3.5 Å, and (2) the angle HDA r 301.
The so-estimated time-average number of hydrogen bonds are shown in Table 2. Values are statistical averages over the parts of the trajectory beyond which the system has reached a steady conformational state. The DML-DML hydrogen bonds shown in Table 2 refer to all clusters, whereas the corresponding numbers, if only clusters close to RNA are taken into account, are practically identical within the error margins. Note here that since hydrogen bonding is a dynamic process (i.e., hydrogen bonds are created and destroyed continuously), relatively large statistical errors are expected in their averages. The results in Table 2 confirm that hydrogen bonding contributes, to both the DML-DML and to the DML-RNA association.
All the above analysis does not reveal any substantial difference between the clusters of DMLs, which are formed in water (DMLW system) and the ones in the aqueous solution with the RNA molecule present. It is interesting to mention that the largest clusters in the solution (where a total number of 1032 DML molecules are dispersed) are comprised of 20-22 DMLs, which is the size of the clusters in bulk water as well.

3b. Dynamic properties
Self-assembly and cluster formation by the DML molecules in the aqueous environment are processes realized at very short timescales. Due to the all-atom representation of the molecules and the time resolution afforded by fully atomistic molecular dynamics simulations, such processes can be monitored in detail. The kinetics of aggregation of the DML molecules can be recorded, by starting from an initial (out of equilibrium) configuration, as the one shown in Fig. 11, and following the time evolution of the system towards an energetically stable state.
To follow the temporal evolution of this aggregation process, we have recorded appropriate measures, such as radial distribution functions at different time periods, the number of formed clusters and the extent of hydrogen bonding as a function of time. Fig. 12a portrays the rdfs between RNA and DMLs in the DMLHRW system at different time intervals within the simulation run. An abrupt increase of the first-neighbor peak is observed between 5 ns and 15 ns from the initial state, which indicates a rapid organization of DML molecules close to RNA.   At later times, a gradual enhancement of the main peak in terms of intensity and width takes place, until a convergence is observed beyond 75 ns, signifying the achievement of a stabilized conformational state. Similar stages of time-evolution are also noted in the rdf curves between DMLs, as illustrated in Fig. 12b. The number of hydrogen bonds formed between RNA and DMLs was also found to be an increasing function of time at short temporal scales. At later stages a stabilization of this number was observed, as shown in Fig. 13. It can be inferred that attainment of a stable average in this case was realized at a shorter timescale compared to that for the convergence of the rdf curves. In other words, hydrogen bond formation has already been completed (i.e., reached an equilibrium), while the DML aggregation process continues. Fig. 14 shows the time evolution of the number of formed DML clusters. Since at the starting configuration no clusters were present, the number of clusters increases initially with time, it develops a broad maximum at a timescale close to 12 ns, which is followed by a gradual decrease towards stabilization, after about 70 ns. The descent of the curve after the maximum, can be attributed to the coalescence of smaller aggregates of DMLs into larger clusters, which eventually reach an equilibrium size.
As it is described in Section 3a-III the percentage of DML molecules in the vicinity of RNA is 45-50% for all three kinds of DMLs, in the stable state. Analysis of the entire trajectory reveals that this quantity is an additional measure, which shows convergence after about 75 ns from the initial state, (data not shown here).
Summarizing the above information, it can be argued that the RNA-DML association mechanism involves many steps with different characteristic timescales. The self-assembly of DML molecules in clusters is the initial process performed within the first B10 ns of the simulation run, as it is revealed from the abrupt increase of the number of clusters in Fig. 14, which reaches a maximum at about 12 ns. Association of the clusters with RNA follows, as it can be inferred from the increase of the hydrogen bonding between RNA and DMLs, which reaches an almost constant value after 40-50 ns (Fig. 13). A further restructuring of the associated clusters towards an equilibrium size is the last process, which is accomplished beyond B70 ns, (Fig. 14).
Therefore, the characteristic timescale at which conformational equilibrium is practically reached, can be taken to be close to 75 ns. While all measures were found to have converged beyond this time window, faster processes have attained   a stabilized state at even shorter timescales. All statistical analyses presented in Section 3a, involving equilibrium properties, have been performed in the trajectory beyond that characteristic timescale. The corresponding timescales in the DMLRW system for the attainment of stabilized conditions as those described in Fig. 12-14 were found to be similar.
Although the formed DML clusters seem to come to a stable state in terms of size and composition, the molecules which constitute them, may move within the cluster volume, while particle migration events between clusters cannot be excluded. Such cluster restructuring takes place within a finite timescale. To get an estimation of this timescale, one can probe the collective motion of DML molecules and estimate the time it takes for the position of the center of mass of an DML particle to become uncorrelated with that of a different DML particle. An analysis based on the distinct van Hove correlation function 32 and a corresponding discussion is presented in the ESI. † This quantity indicates a gradual loss of memory of the initial local arrangement assumed by the DML molecules. It was found that internal cluster restructuring takes place at about B50 ns. Fig. SI-4 in the ESI, † presents distinct van Hove functions for the clusters formed by the three different kinds of DML particles for the DMLRW system. It must be noticed that this property concerns the dynamic behavior within the stabilized state (i.e., beyond 75 ns); therefore it provides another characteristic time of the system. Based on the similarity of the dynamic behavior noted between the DMLHRW and the DMLRW systems, a similar timescale is expected to characterize internal cluster restructuring in the DMLHRW model, as well.

Summary and conclusions
In this work we have employed fully atomistic molecular dynamics simulations, in order to explore the associative behavior between RNA fragments and a complexation agent bearing cationic ionizable sites (DML), in an aqueous environment and at neutral pH conditions. Three kinds of DML molecules, in terms of the protonation state of the two ionizable nitrogens, were considered in the simulations: electrically neutral, doubly protonated and singly protonated. Our goal was twofold. On one hand we intended to check whether such particles were capable of forming a structurally and energetically stable construct by complexing with RNA, in order to assess their potential for RNA delivery purposes. On the other hand, we aimed at understanding the role of different aspects related to the DML-DML and the DML-RNA self-assembly processes and to the equilibrium structural characteristics of the formed aggregates.
Due to the computationally demanding task of simulating fully atomistic systems in high water content, with RNA fragments bearing several hundreds of nucleotides, we also constructed an oligonucleotide-based system. Our objective was to check whether this might serve as a working model, through which generic behavior can be studied using only limited computational resources. For comparison purposes, an RNA-free aqueous solution of the DML molecules was simulated as well.
In all the systems, DML particles were found to self-assemble within a few tens of ns, forming clusters with comparable average sizes and composition. A notable observation is that similar clusters of DMLs are formed in bulk water (DMLW system) and in the aqueous solutions with the RNA molecule (DMLRW and DMLHRW systems), with the largest ones to be populated by 20-22 molecules in all cases. Their internal structure, in terms of density and charge distribution varied depending on the size of the cluster. Large-sized clusters were characterized by a denser core compared to low populated aggregates. For clusters in all the systems though, it was observed that the surface charge was kept rather low (below 0.5 |e À |). This near-neutral surface charge of the clusters might result in a prolonged half-life of the DML-RNA complex in the circulation, 67 mitigating at the same time against increased toxicity levels related to high-surface charge densities. 68 This rather low surface charge is expected to increase at more acidic conditions (such as in the endosomal lower pH environment) due to the sufficiently high pK a (B7) of the DML particles, and thus promote the endosomal membrane disruption and release. 68 This protonation mechanism of the carrier molecules was found to be of particular importance in mRNA delivery. 16,69 In the RNA-based models, electrostatic interactions and hydrogen bonding acted as the driving forces for the association of the formed DML clusters with RNA. The number of hydrogen bonds between DML molecules and between DMLs and RNA are comparable, which indicates that hydrogen bonding contributes both to the DML-DML and to the DML-RNA association. However, on average, the DML-RNA association was found to be weaker compared to the DML-DML interactions. Such a moderate RNA-DML binding strength could potentially be beneficial for the RNA release at the final stages of the delivery process. The percentage of DML molecules, which are close to RNA, is similar for all three kinds of DMLs, i.e., 45-50% of the total number for each case. The larger-sized clusters were found to be formed close to RNA, whereas those comprising 3 or less DML molecules were preferably formed in the aqueous phase afar from RNA.
The RNA-DML association mechanism involved many steps with different characteristic timescales. Analysis of various dynamic properties indicated that instead of the observed association of individual DMLs with the oppositely-charged nucleotides, self-assembly of these molecules in clusters was the initial process performed within the first B10 ns. Association of the clusters with RNA follows, while a further enlargement of the formed clusters towards an equilibrium size is the final process, which is realized at timescales beyond B70 ns. The final, conformationally stable DML-RNA complex comprised several clusters around which RNA was wrapped, in such a manner that the entire RNA contour was covered by them. This form of the RNA-DML construct might be promising for providing protection e.g., against enzymatic attack, during an actual gene-delivery process.
Although the above-described behavior was recorded for the systems including an excess of the neutral-lipid molecules (see Table 1), and with different relative lipid/water concentrations, it appears that, at least within the examined degree of variation, the lipid-lipid and the lipid-RNA associative phenomena are not very sensitive to the stoichiometry of the systems. For instance, it appears that the percentage of participation of DML0 and DML2 lipids in the clusters is somewhat lower compared to that of DML1 molecules (see Fig. SI-3, ESI †), despite the fact that in absolute numbers, DML1 lipids are well outnumbered by their DML2 analogues (see Table 1). In addition, the percentage of all different kinds of lipids close to RNA was found to lie between 45% and 50% irrespective of their charged state and of their stoichiometry (as follows from the comparison between the DMLRW and the DMLHRW systems). This also implies that the cluster composition close to RNA is not dictated solely by the stoichiometry of the lipids with different charged states, but also by the balance of the relevant interactions (in the vicinity of RNA, electrostatic and hydrogen bonding interactions come into play as well). However, this balance (and thus features of the clustering behavior and the affinity between the lipid cluster and RNA) might be affected if very different stoichiometries from those considered here are used.
It should also be noted that in all the different properties checked between the DMLRW and the DMLHRW systems, the two models exhibited similar behavior. It can therefore be inferred that smaller in size RNA fragments may serve as computationally affordable models, for examining basic aspects of the physicochemical behavior related to the association of RNA with ionizable cationic complexation agents, similar to those studied here.

Conflicts of interest
The authors declare that there is no conflict of interest regarding the publication of this article.