Role of conformational heterogeneity in ligand recognition by viral RNA molecules †

Ribonucleic acid (RNA) molecules are known to undergo conformational changes in response to various environmental stimuli including temperature, pH, and ligands. In particular, viral RNA molecules are a key example of conformationally adapting molecules that have evolved to switch between many functional conformations. The transactivation response element (TAR) RNA from the type-1 human immunodeficiency virus (HIV-1) is a viral RNA molecule that is being increasingly explored as a potential therapeutic target due to its role in the viral replication process. In this work, we have studied the dynamics in TAR RNA in apo and liganded states by performing explicit-solvent molecular dynamics (MD) simulations initiated with 27 distinct structures. We determined that the TAR RNA structure is significantly stabilized on ligand binding with especially decreased fluctuations in its two helices. This rigidity is further coupled with the decreased flipping of bulge nucleotides, which were observed to flip more frequently in the absence of ligands. We found that initially-distinct structures of TAR RNA converged to similar conformations on removing ligands. We also report that conformational dynamics in unliganded TAR structures leads to the formation of binding pockets capable of accommodating ligands of various sizes.


Introduction
RNA molecules have long been considered primarily as passive carriers of genetic information but this conception has changed in recent years due to enhanced understanding of the roles of RNA in different cellular processes including translation and transcription, 1 regulation of gene expression, 2 and protein synthesis. 3 RNA is also implicated in various diseases, including cancers, neurological disorders, and viral infections. [4][5][6][7] This involvement of RNA presents an opportunity to target RNA by small-molecules for influencing the progression of various diseases. 8 Viral RNA molecules are a compelling target for small-molecule therapeutics since many viruses have RNA genomes, for example, human immunodeficiency virus (HIV), hepatitis C virus (HCV), influenza virus, and severe acute respiratory syndrome coronavirus (SARS CoV/CoV2). [9][10][11] The variability in functions of RNA is rooted in its ability to undergo conformational changes that often lead to complex three-dimensional folds and an ensemble of structures which determine the function of RNA. 12,13 Conformational changes in RNA may be due to changes in physiological conditions or due to binding of ligands (proteins, small ligands, and ions). [14][15][16] The conformational flexibility in RNA can also lead to formation of transient binding pockets that can be exploited for drug design. 9,17,18 While viral genomes encode for a limited number of protein targets, often considered ''undruggable'', 8 conserved and structured RNA motifs in viral genomes are viable targets to discover binding pockets for small molecules. 8,9 Although our understanding of RNA dynamics has increased via different experimental techniques, 8,[19][20][21] especially those of RNA structure determination, the role of dynamics of all structural motifs in RNA and their coupling to ligand binding has not been fully explored to date. 22 Several experimental techniques including X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy can provide information on the dynamical properties of nucleic acids, [23][24][25] but even these methods are often limited in probing all possible parameters that can fully describe the dynamics in RNA molecules. 16,26 However, computational tools can further enhance our understanding of RNA dynamics by providing additional insights at the atomic level. Moreover, these tools can potentially assist in identifying binding pockets that can be explored in drug design.
TAR RNA (Fig. 1A) from HIV-1 is a key model system to study conformational transitions in RNA molecules due to its ability to adapt multiple states. 12,25 The TAR RNA is located at the 5 0 end of HIV-1 transcripts where it interacts with the viral transactivator (Tat) protein and the host cofactor cyclin T1 to promote efficient transcription of the downstream genome and is therefore considered to be an important drug target. Several previous studies have shown that TAR RNA can bind to peptide mimics, [27][28][29][30][31][32] to small molecules, [33][34][35][36] to proteins, [37][38][39][40] and to divalent cations 41 (examples are shown in Table S1, Fig. S1, ESI †). TAR RNA has been studied using NMR methods, 25,27,[42][43][44][45][46][47][48][49][50][51] coarse-grained MD simulations, 52 electron paramagnetic resonance (EPR), 53 gel mobility, 54 combinations of NMR and MD methods, 24,44 and combinations of NMR and structure prediction software. 55 Collectively, these studies have shown that TAR RNA undergoes complex dynamics by sampling different interhelical conformations around the bulge junction, thus forming various conformational ensembles. 12,43 While detailed dynamics in TAR RNA have not been characterized in the presence of all known ligands, it has been suggested that ligands can potentially induce structural transitions in TAR by stabilizing pre-existing conformers or an ensemble of states in the apo TAR RNA structure. 16,49,56,57 Several studies have revealed that a bulge motif in TAR RNA is especially critical for its recognition by the Tat protein. 27,42,58 Therefore, the bulge motif has been exploited in the design of inhibitors to disrupt the TAR/Tat interaction (Fig. S2A, ESI †). 32,59 As shown in Fig. S2A (ESI †), peptide ligands mostly interact with the apical loop (orange in Fig. 1A), helix II (blue in Fig. 1A), and the bulge motif (red in Fig. 1A), while small molecules are scattered between helices I and II (cyan and blue in Fig. 1A). The ligands differ in size and the charge value, which results in an increased buried surface area (BSA) as the size of the ligand increases but the structural changes in TAR RNA are not correlated with the ligand size or BSA (Fig. S2B, ESI †).
In this work, we studied dynamics in TAR RNA by conducting long time-scale MD simulations that were initiated from 13 different initial conformations of TAR with ligands, and from 14 conformations without these ligands, 13 conformations after removing ligands and one conformation based on the experimental apo structure (Fig. 1B, Fig. S3 and Table S2, ESI †). By utilizing several different initial structures, we aimed to obtain a broader conformational mapping of TAR RNA which has not been carried out yet. Moreover, we reveal the effect of ligand binding on the dynamics in TAR RNA by comparing unliganded and liganded simulations.

System preparation
We have studied the dynamics in HIV-1 TAR RNA using 27 different initial conformations with/without ligands ( Fig. 1B and Fig. S3, ESI †). The initial coordinates for these conformations were obtained from the Protein Data Bank (PDB codes:  1ANR, 1ARJ, 1LVJ, 1QD3, 1UTS, 1UUD, 1UUI, 2KDQ, 2KX5,  2L8H, 5J0M, 5J1O, 5J2W, 6D2U). [27][28][29][30][31][32][33][34][35][36]42 Several of these structures had either a different type of nucleotide or a different number of atoms in the deposited structure files. We selected the 1ANR conformation as our standard set of nucleotide The apo conformation of TAR (PDB code 1ANR) is shown at the center (black cartoon) and superimposed onto other TAR RNA initial states. The initial conformations are placed in a circle such that the RMSD of the initial state relative to the apo conformation increases counterclockwise, with the 5J2W structure having the least RMSD and the 1LVJ structure having the highest RMSD. sequence and mutated or removed those nucleotides or atoms in the other 13 systems that were different from the apo structure (PDB code: 1ANR) which resulted in each TAR RNA system consistently having 29 nucleotides and 931 atoms (Table S2, ESI †). Each unliganded and liganded TAR structure  was solvated in a periodic simulation domain of TIP3P water  molecules where the overall number of atoms in various  systems ranged between 19 443 and 25 647 (Table S2, ESI †). The overall charge in simulations of unliganded systems was neutralized with 29 Na + ions while the number of Na + ions in the liganded systems varied depending on the charge of the ligand.

Simulation details
All MD simulations were carried out and analyzed using software packages AMBER, CPPTRAJ and VMD 60-62 combined with the AMBER force-field for RNA (ff99OL3) 63,64 and for peptides (ff14sb). 65 For solvent, TIP3P water model 66 and for ions the Li/Merz parameters were used. 67 The Antechamber package was used to design force fields for small molecules by using the general AMBER force-field (GAFF) with the AM1-BCC charge method. 68,69 The temperature and pressure were maintained at 300 K and 1 atm using the Langevin thermostat and the Berendsen barostat. The steepest descent minimization was performed for 1000 steps followed by 100-500 steps of conjugate gradient minimization. The periodic boundary conditions were used with a cutoff of 9.0 Å for nonbonded interactions. Each of the 27 systems was subjected to a 2 ms long MD simulation in the NPT ensemble with a 2 fs timestep, which resulted in the overall 54 ms dataset and the frames in each trajectory were saved every 20 ps.

Conformational metrics
Torsional flexibility. The overall torsional flexibility of each TAR RNA structure was investigated by computing (from MD simulation data) all backbone dihedral angles and the w dihedral angle (which describes the relative position of a nucleobase relative to the sugar group). The dihedral angles were computed for each nucleotide in each system across the entire trajectory (100 000 values per MD trajectory). The resulting values of dihedral angles were then used to compute their normalized distributions. These distributions were further compared against the typical ranges of values of dihedral angles known from experimentally determined structures of nucleic acids that were extracted from the Protein Data Bank and reported in the textbook by Tamar Schlick ( Fig. 2A and Fig. S4, ESI †). 70 Buried surface area (BSA). We calculated the BSA for each ligand in the liganded simulations. The BSA was computed using the following expression: where SASA R represents the solvent accessible surface area (SASA) of the RNA, SASA L represents the SASA of the ligand, and SASA RL represents the combined SASA of the RNA/ligand complex. The BSA values indicate the area of contact between a ligand and the TAR RNA conformation.
Root mean squared deviation (RMSD). We calculated the allatom RMSD for each system and for nucleotides in the bulge motif (U23, C24, and U25) to understand the effect of ligand binding on the overall TAR RNA structure. The alignment of each structure was performed against all non-hydrogen atoms in the initial state. The RMSD values indicate changes in the TAR RNA structure relative to a reference state. We used the initial conformations as well as the average structures computed from each simulation as our reference states.
Root mean squared fluctuation (RMSF). We computed the backbone phosphorous (P) atom based RMSF per residue to further study the flexibility of each nucleotide and DRMSF to compare the differences in dynamics between unliganded and liganded simulations. A negative value of DRMSF signifies decreased fluctuations in the presence of ligands or increased fluctuations in the absence of ligands.
Average structure. We computed the average structures of TAR RNA from each unliganded and liganded simulation using the CPPTRAJ 61 program in the Amber software. The global rotational and translational motions were removed prior to computing the average structure. We then cross-compared all average structures using the RMSD as a comparison metric.
Clustering analysis. To determine the population of similar TAR RNA conformations in MD simulations, we performed a clustering analysis using the CPPTRAJ 61 tool with DBSCAN 71 clustering algorithm. We used the P-atom based RMSD in each TAR RNA conformation as a distance metric and a minimum of 25 conformations (RMSD within B1.0-1.5 Å) were required to form a cluster. In addition to estimating clusters in individual trajectories, we performed combined cluster analysis (CCA) on the entire dataset of unliganded simulations and also on the entire dataset of liganded simulations by combining all trajectories. We used DBSCAN clustering algorithm with the minimum number of points set to 50 and the RMSD was set as a distance metric (RMSD within 1.9 Å). To conserve memory, we used every second frame of each trajectory resulting in 700 000 and 650 000 frames for combined clusters of unliganded and liganded simulations, respectively. The initial ''sieve'' value was set to 40 to form initial clusters, which means that every 40th frame was used to generate an initial cluster, resulting in 17 500 and 16 250 initial frames from unliganded and liganded simulations.
Helical dynamics. The TAR RNA structure consists of two helices (termed Helix I and II) that are linked by a flexible three nucleotide bulge motif (Fig. 1A). The dynamics in these helices were characterized using the g 1 and g 2 angles, which describe the twist of each helix around the helical axis, and by the f angle which describes the relative position of the helices. The f angle was defined between the centers of mass of the two helices. For the calculation of angles, Helix I was defined by the base pairs G17-C45, G18-C44, C19-G43, A20-U42, G21-C41, and A22-U40 (cyan in Fig. 1A), while Helix II was defined by the base pairs G26-C39, A27-U38, G28-C37, and C29-G36 (blue in Fig. 1A). The axes in helices were defined as passing through the centers of mass of the bottom and top base pairs in respective helices and the CPPTRAJ 61 program was used to compute the twist angles.
Nucleotide flipping. We characterized the flipping of nucleotides in the bulge motif (highlighted in red; Fig. 1A) using the pseudo-dihedral angle computed between the centers of mass (COM) of four groups of atoms: the nitrogenous bases of U40 and A22, or C39 and C26, sugar moiety attached to A22 or C26, sugar moiety attached to U23, C24, or U25, and the nitrogenous base of U23, C24, or U25. The definition of the flipping angle was adopted from previous studies. 72,73 In all systems, the inward (flipped-in) state of a nucleotide corresponds to pseudo-dihedral angle values between À601 and 601 but other values of the angle characterize an outward (flipped-out) state.
Binding pocket analysis. To characterize the binding pockets in TAR RNA conformations based on unliganded MD simulations, we used the MDpocket tool, which is an open-access pocket detection tool for MD trajectories. 74 Before analyzing each frame from MD simulations for pocket analysis, each trajectory was aligned to the initial structure based on the backbone P-atoms.

Assessment of torsional flexibility and ligand stability
We first assessed the overall torsional flexibility of TAR by computing all backbone or glycosidic dihedral angles from unliganded and liganded simulations ( Fig. 2A and Fig. S4, ESI †). Specifically, the distributions of these dihedral angles ( Fig. 2A) were computed from the combined data of 14 unliganded simulations and similarly from the combined data of 13 liganded simulations. We found that the values spanned by these dihedral angles are consistent with the known ranges of dihedral angles in experimental structures of nucleic acids (marked by transparent gray lines on angle distributions in Fig. 2A), thereby highlighting that the TAR RNA conformations generated with the AMBER force-field are consistent with the expected dynamics in the backbone of RNA structures.
We also assessed whether the ligands remained associated with each TAR RNA structure during MD simulations. In Fig. 2B, we show the histograms of the mean values of BSA from liganded MD simulations (darker shades) along with the initial BSA values of ligands in various TAR structures (lighter shades). These BSA data are organized into three groups based on the distribution of BSA values in MD simulations in comparison to initial BSA values. We observed that most systems exhibited similar or higher ligand BSA values (e.g. group 1 systems with PDB codes 1UTS, 1UUD, 1UUI, 2KX5, 5J1O, and 5J2W) in comparison to the initial BSA due to conformational rearrangements of ligands in the binding pocket that led to a deeper burial of some ligands in the binding pocket (e.g., see Fig. S5, ESI †). However, some systems exhibited fluctuations in nucleotides which allowed ligands to conformationally rearrange in the binding pocket and partially move out of the initial pocket, as indicated by the decreased BSA values (e.g. group 2 systems with PDB codes 1LVJ, 1QD3, and 5J0M; Fig. 2B).
A larger decrease in BSA of the ligand was observed in systems organized in group 3 (e.g. PDB codes 2KDQ, 2L8H, 6D2U, and 1ARJ) indicating that the ligands in these systems exhibited increased rearrangements in the binding pocket. For example, the ligand arginine amide in one of the TAR RNA structures (PDB code 1ARJ; marked by a red asterisk in Fig. 2B) exhibited brief dissociation for about B18 ns before binding again in the initial binding pocket (Fig. S6, ESI †). This observation is consistent with the largest dissociation constant for this ligand 27 and the smallest size of this ligand among all ligands studied (Table S1, ESI †). However, we did not observe full dissociation of any ligand during MD simulations.

Ligands rigidify TAR RNA
To assess the differences in conformations of TAR RNA in unliganded and liganded states, we first computed the RMSD values with respect to the initial structure in each simulation ( Fig. 3A; lighter and darker shades for unliganded and liganded simulations, respectively). We observed that the unliganded systems diverged from their respective initial states on average by 5.28 AE 1.12 Å and the liganded systems by 4.52 AE 1.19 Å. For several systems, we observed a significant decrease in mean RMSD values and no overlap in error bars in liganded states compared to unliganded states (group 1; Fig. 3A). About half of liganded systems fluctuated more and thus had less distinct RMSD distributions in comparison to respective unliganded systems, as characterized by overlapping error bars (group 2; Fig. 3A). Despite having less distinct distributions, these liganded simulations still showed lower mean RMSD values in comparison to unliganded simulations (group 2; Fig. 3A). Two systems (PDB codes 1UUD and 2L8H) had almost no difference in their RMSD distributions in liganded and unliganded states (group 3; Fig. 3A). These data suggest that the TAR RNA structures became conformationally more rigid when ligands were present since liganded structures deviated to a smaller extent from their initial conformations in comparison to unliganded simulations. However, one system (marked by an orange asterisk in group 3; Fig. 3A) showed increased fluctuations and a higher mean RMSD in the liganded state in comparison to the unliganded state. On further probing this structure (PDB code 1LVJ), we found that the structural deviation is likely a result of ligand rearrangements in the binding pocket that conformationally altered the TAR RNA structure from an initially bent conformation to a relaxed conformation with a higher RMSD value (Fig. S7, ESI †). The fact that the presence of a ligand caused higher perturbations in the TAR RNA structure that resulted in an unexpected conformational behavior needs to be considered in designing new inhibitors that target TAR RNA.
We further report the DRMSF values per residue based upon unliganded and liganded simulations to understand the effect of ligand binding on the flexibility of a particular residue or a motif (Fig. S8, ESI †). In Fig. S8 (ESI †), we show the difference between the RMSF values (DRMSF) of the unliganded and liganded simulations where a negative value corresponds to an increased flexibility on ligand removal. We observed that in eight systems, all of the residues became more flexible when ligands were removed (e.g. systems with PDB codes 1QD3, 1UTS, 1UUI, 2KDQ, 2KX5, 5J0M, 5J1O, and 6D2U). In four systems, we observed larger flexibility in a portion of residues in the liganded simulations (e.g. systems with PDB codes 1ARJ, 1UUD, 2L8H, and 5J1O). Importantly, one of those systems (PDB code 5J1O) had only one residue (U25) with increased flexibility in the liganded state due to the flipping out of that nucleotide during the simulation. Finally, in one system (PDB code 1LVJ) all of the residues became more rigid after ligands were removed which was consistent with the aforementioned observation that the fluctuations and mean RMSD for this specific system was higher in the liganded conformation (Fig. 3A).
Due to the significance of the bulge motif in ligand recognition, 27,58 we also separately computed the RMSD values for the bulge motif (Fig. S9, ESI †). Overall, the bulge motifs in unliganded and liganded systems deviated by 7.82 AE 1.83 Å and 5.6 AE 1.73 Å, respectively. In particular, we observed that the majority of systems had higher mean RMSD values in the unliganded states with the exception of the system with the PDB code 1LVJ that had a higher mean RMSD value of the bulge The fraction of conformations (F conf ) from a given simulation (100 000 conformations per simulation) that constitute the most populated cluster for each system in unliganded (lighter shades) and liganded (darker shades) simulations. Each bar corresponds to a unique system. The purple and orange asterisks indicate those systems in which F conf was higher in unliganded simulations than in corresponding liganded simulations. A black asterisk marks the experimental apo TAR structure (PDB code 1ANR). motif in the liganded state (Fig. S9, ESI †). This observation is also consistent with DRMSF data that showed increased flexibility of the bulge nucleotides in the system with the PDB code 1LVJ (Fig. S8, ESI †). The behavior of the bulge motif in this system is coupled with the ligand movement in the binding pocket and the local rearrangements of bulge nucleotides that result in the overall change in the conformation of TAR RNA (Fig. S7, ESI †). These data suggest that the TAR RNA structures and the bulge nucleotides in liganded systems deviated to a smaller extent from their initial conformations compared to in the unliganded systems.

Comparison of average structures and formation of conformational clusters
We further investigated the conformational variability of our data by comparing the average structures from each simulation and by performing a cluster analysis. Using RMSD as a conformational metric, we cross-compared all initial TAR RNA structures before simulations were initiated (Fig. S10A, ESI †) as well as by obtaining the average structures of TAR RNA from each unliganded (Fig. S10B, ESI †) and liganded simulation (Fig. S10C, ESI †). This cross-comparison showed that the RMSD between a pair of structures decreased in unliganded simulations (Fig. S10B, ESI †), thereby indicating that the TAR RNA structures became on average more similar to each other in unliganded simulations. In liganded simulations, the RMSD between a pair of structures also decreased on average, but to a smaller extent in comparison to the unliganded simulations and the structures were still reasonably distinct (Fig. S10C, ESI †). For example, the initial RMSD between the structures with the PDB codes 1LVJ (orange) and 2L8H (purple) was 7.91 Å (dark purple bar in Fig. S10A, ESI †). After unliganded and liganded simulations, the RMSD between these systems for average structures was 3.8 Å and 4.8 Å, respectively (dark purple bars in Fig. S10B and C, ESI †). We also observed that the average structures of all systems obtained from unliganded MD simulations adopt conformations similar to the average structure obtained from an MD simulation of the experimental apo system (1ANR) (Fig. S11, ESI †).
To further assess the fluctuations and the flexibility in TAR RNA, we computed the RMSD values in the course of each simulation with respect to the average structure of the corresponding simulation (Fig. S12, ESI †). We observed that the unliganded systems diverged from their respective average structures by 3.07 AE 0.97 Å and the liganded systems by 2.31 AE 0.57 Å. The majority of the systems had a decrease in mean RMSD values and smaller magnitude of fluctuations in the liganded states compared to the unliganded states (group 1; Fig. S12, ESI †). Three systems exhibited very similar RMSD distributions in unliganded and liganded states (group 2; Fig. S12, ESI †), however, two of them still showed decreased mean RMSD values in the liganded state (PDB codes 1ARJ and 1UUD). Finally, one system (PDB code 1LVJ; Fig. S12, ESI †) showed increased fluctuations and a higher mean RMSD value in the liganded state in comparison to the unliganded state which is consistent with the observations described earlier.
Overall, this metric showed that the fluctuations in the TAR RNA structures decreased in the presence of ligands.
In addition to comparing the average structures from each simulation, we performed clustering analysis to detect similarities among structures within each simulation and to understand the effect of presence of ligands on conformational variability in TAR RNA. In Fig. 3B, we present the fraction of conformations in the most populated clusters derived from each unliganded and liganded simulation along with more comprehensive details on the distributions of clusters in Fig. S13 and S14 (ESI †). We observed a larger variation in conformational clusters in unliganded simulations in comparison to liganded simulations. For example, only two unliganded systems (PDB codes 1UUD and 2KDQ) had a cluster that contained at least 75% of structures while eight liganded simulations (PDB codes 1QD3, 1UUD, 1UUI, 2KDQ, 2KX5, 5J0M, 5J2W, and 6D2U) had a cluster of this type (Fig. 3B). The only exceptions were the unliganded systems with the initial structures based on PDBs 1LVJ (orange histograms in Fig. 3B) and 2L8H (purple histograms in Fig. 3B) that contained clusters with a higher fraction of conformations in the most populated cluster than in corresponding liganded simulations. Importantly, all liganded systems with peptide ligands (PDB codes 2KDQ, 2KX5, 5J0M, 5J1O, 5J2W, and 6D2U), except for the initial structure with the PDB code 5J1O, had the most populated cluster containing the majority of conformations (Fig. 3B). This analysis further supports the observation that ligands in general rigidify the TAR RNA structure by restricting its motion within an ensemble of structures that constitute the most populated cluster.
We also performed the combined cluster analysis (CCA) to further investigate conformational clusters in TAR simulations that were initiated with distinct initial structures. In Fig. S15 (ESI †), we show F conf for the top clusters from the datasets of unliganded and liganded simulations. The CCA of unliganded simulations revealed three clusters that contain more than 5% of the total number of configurations each and are composed of multiple systems (Fig. S15A, ESI †). The CCA of liganded simulations, that was performed at the same value of the RMSD metric for constructing clusters as for the CCA of the unliganded simulations, revealed only one cluster which contains most of the systems (Fig. S15B, ESI †). These observations show that a number of our simulations, that were initiated from distinct initial structures, have conformations that are similar to each other.

Ligands alter helical dynamics in TAR RNA
The TAR RNA structure is comprised of two helices (termed Helix I and II in Fig. 1A) and the dynamics in these helices are described using the angles, g 1 and g 2 ( Fig. 4A and B), which describe the twist of each helix around the helical axis and by the interhelical bending angle (f) which describes the relative positioning of helices (Fig. 4C). We observed that the initial g 1 values for all systems were between 291 and 341 except for one system (PDB code 2L8H) where the angle was 66.51. Similarly, the g 2 values were between 271 and 381 for all systems except one system (PDB code 1QD3) where it was 161. Based on the angle distributions from MD simulations, we observed that g 1 was mostly confined between À901 and 901 for the unliganded and liganded systems (Fig. 4A). Most of the liganded systems showed a decrease in the width of populated angles in comparison to the analogous systems in the unliganded form with the exception of the system with the PBD  code 2L8H (depicted in purple color in Fig. 4) which exhibited a higher twisting in the Helix I with g 1 angles between 701 and 1151. The system with the PBD code 1ARJ in the liganded form showed values similar to the unliganded conformation given conformational rearrangements in the ligand and a decreased BSA ( Fig. 2 and Fig. S6, ESI †). The average g 1 values for the unliganded and liganded systems were estimated to be 19 AE 401 and 20 AE 331, respectively. Overall, the presence of ligands decreased the standard deviation in g 1 by 71, thereby leading to narrower g 1 distributions.
We also observed that Helix II showed more flexibility compared to Helix I in both liganded and unliganded simulations. In the unliganded simulations, g 2 was mostly distributed between À901 and 1051 with the exception of the structures with PDB codes 5J0M, 2KDQ, and 5J2W that spanned additional conformations between À1051 and À1401, between À1201 and À1301, and between À1051 and À1451, respectively. The g 2 angle in the liganded systems was mostly confined between À1051 and 1201 but even though the width of distributions were similar, the number of states that were populated decreased. The average g 2 values for the unliganded and liganded systems were estimated to be 12 AE 511 and 11 AE 461. Overall, the presence of ligands decreased the standard deviation in g 2 by 51.
We observed that f is mostly confined between 251 and 1051 for the unliganded systems, with the exceptions of structures with the PDB codes 1QD3 and 2KDQ, which occupied states with angles between 251 and 801, and 1UUD which occupied states between 451 and 851. The distributions of f in the liganded systems became narrower (ranging between 351 and 751) with the exception of the structures with the PDB codes 1LVJ, 1ARJ, and 1UTS which spanned angles between 251 and 1051, 251 and 1001, and 451 and 1101, respectively. Overall, the structural bending in TAR RNA decreased in the liganded systems, except for the systems with initial states based on the PDB codes 1UUD, 1UTS, 1LVJ and 1ARJ which have distributions similar to their unliganded systems. The average f for the unliganded and liganded systems was estimated to be 70 AE 141 and 60 AE 111, respectively. While previous NMR analysis has suggested high amplitude bending and twisting motions in TAR RNA helices, 24 our data further suggest that the conformations of helices in TAR RNA are altered and stabilized on ligand binding.

Ligands stabilize nucleotide flipping in TAR RNA
Beyond global motions in TAR structures, we further probed local motions in key motifs such as the bulge region, which is considered important for the viral replication process because the rearrangements in nucleotides in this region (U23, C24, and U25) determine the orientation of helical motifs (Helix I and II) in TAR. 27,42,58 Specifically, the outward flipped conformations of nucleotides C24 and U25 facilitate coaxial stacking of Helices I and II, thereby rigidifying the TAR structure. In Fig. 5, we show the nucleotides used in defining the flipping angle (y) and the time-traces of y for three bulge-nucleotides, as obtained from unliganded and liganded simulations. The inward flipping of a nucleotide is characterized by y values between À601 and 601, and the outward flipping for all other y values.
The first bulge-nucleotide U23 (Fig. 5A) was initially in a flipped-in state in most structures, except in two structures (PDB codes 1QD3 and 1UTS) where it was in a flipped-out state (as marked by a filled circle on the y-axis at t = 0 in time-traces of y in Fig. 5A). We observed that U23 flipped out during five unliganded simulations (PDB codes 1ARJ, 1UUI, 2L8H, 5J2W, and 6D2U) in which U23 was initially in a flipped-in state (traces with lighter shades in Fig. 5A and Fig. S16, ESI †). In most of these systems U23 eventually returned to its initial position after briefly transitioning to a flipped-out conformation. For example, in one of the unliganded simulations (PDB code 2L8H; light purple time-traces in Fig. 5A) U23 flipped outward at t = B0.9 ms, maintaining the flipped out state for B0.5 ms, and then flipping inward, resuming its initial position.
We observed a similar conformational behavior in systems with PDB codes 6D2U (light brown time-traces in Fig. 5A) and 5J2W (light cyan time-traces in Fig. S16A, ESI †). In another unliganded system (PDB code 1UUI), U23 flipped out at t = B1.25 ms and remained in a flipped-out state until the end of the simulation (light yellow time-traces in Fig. 5A). However, in one system (PDB code 1UTS) we observed conformational transitions in both unliganded and liganded simulations (cyan time-traces in Fig. 5A), where U23 flipped inward from an initially outward conformation with y = 130, retaining the inward position for almost the entirely of liganded simulation and flipping outward after B1 ms in the unliganded simulation. In all liganded simulations, U23 transiently flipped outward only in one system (PDB code 1ARJ; Fig. S16A, ESI †). Overall, we observed that the presence of ligands significantly decreased conformational transitions in U23. This conformational behavior is consistent with observations from experiments, where a smaller pool of ligands (arginine amide and a linear as well as a cyclic peptide) were tested. 75 However, we consistently observed conformational stabilization of U23 for several ligands with different binding affinities.
The second bulge-nucleotide C24 (Fig. 5B) was initially in a flipped-out conformation in all systems, except in three systems (PDB codes 1ANR, 1LVJ, and 1QD3). We observed that C24 flipped inward in three unliganded simulations (PDB codes 1ARJ, 5J1O, and 1UTS; lighter shade time-traces shown in red, blue, and cyan in Fig. 5B). For example, C24 flipped inward in one of the unliganded systems (PDB code 1ARJ; Fig. 5B) and remained in the inward conformation until B1.15 ms. However, in other unliganded systems (PDB codes 5J1O and 1UTS), C24 flipped inward at the beginning of simulations where it either remained flipped inward during the entire simulation (PDB code 5J1O; Fig. 5B) or flipped outward and then flipped back inward toward the end of the simulation (PDB code 1UTS; Fig. 5B). For two unliganded systems where C24 was initially in an inward flipped conformation (PDB codes 1LVJ and 1QD3), it flipped outward toward the end of simulations (Fig. 5B and Fig.  S16B, ESI †). In liganded simulations, we observed that C24 flipped inward during two simulations (PDB codes 1ARJ and 1UTS; darker red and cyan time-traces in Fig. 5B) and flipped  Fig. 5B). Overall, we observed that C24 showed conformational transitions both in unliganded and liganded simulations, but less frequently in liganded simulations. The third bulge-nucleotide U25 (Fig. 5C) was initially in a flipped-in conformation in most structures except in three structures (PDB codes 2KDQ, 5J2W, and 6D2U). We observed U25 to be significantly flexible in both unliganded and liganded simulations since it flipped inward or outward in most of the systems. For example, U25 flipped outward from an initially inward conformation in several unliganded simulations (PDB codes 1UUI, 1UTS, 2KX5, 2L8H, and 5J1O; Fig. 5C and Fig. S16C, ESI †). It also flipped inward from an initially outward conformation during unliganded simulations of several systems (PDB codes 2KDQ, 5J2W, and 6D2U) in which it remained in the inward conformation until the end of each simulation (Fig. 5C). In several liganded simulations (PDB codes 1ARJ, 1UTS, 2L8H, and 5J1O), U25 flipped outward from an initially inward conformation (Fig. S16C, ESI †), while in other systems (e.g. PDB code 6D2U) it flipped inward from an initially outward conformation (Fig. 5C). Overall, we observed that all three bulge nucleotides (U23, C24, and U25) can conformationally transition between inward and outward states although ligands decrease the frequency of these transitions.  Given that the knowledge of binding pockets is useful in developing novel inhibitors, 8 we probed all unliganded simulations for the presence of binding pockets that may form as a result of conformational dynamics. In Fig. 6A and Fig. S17 (ESI †), we show several binding pockets (depicted as cyan surfaces overlaid on the initial structure) that appear in various regions of each unliganded TAR RNA structure (labeled B, L, H1, and H2 for the bulge region, loop region, and helices I and II, respectively; see also Fig. 1A).
We also assessed whether the density of pockets observed in specific regions in a structure could accommodate ligands in conformations observed in liganded TAR RNA structures. We found that the observed pockets were sufficiently large in size to encapsulate the ligand known to bind to that specific TAR RNA structure. For example, TAR RNA is known to bind to smallmolecule ligands (acetylpromazine and RBT158) in the bulge region, where we observed binding pockets (labeled B for PDBs 1ANR and 5J1O in Fig. 6A) large enough to accommodate each ligand (Fig. 6B). Furthermore, TAR RNA is also known to bind other ligands (neomycin B and JB181) in the Helix I and the apical-loop/bulge regions, where we observed binding pockets (labeled H1 for PDB 6D2U and L/B for PDB 1UUD in Fig. 6A) large enough to accommodate respective ligands ( Fig. 6C and  D). We observed that unliganded simulations with different initial structures showed several similar binding pockets as well as previously unknown binding pockets (in the apical loop or Helices I and II) that accommodated ligands known to experimentally bind to other conformations (Fig. S18, ESI †).

Discussion
In this work, we have carried out long time-scale MD simulations (totaling 54 ms) of the HIV-1 TAR RNA structure in unliganded and liganded states. Specifically, these simulations were conducted with initial coordinates derived from the experimentally resolved apo structure of TAR RNA (PDB code 1ANR) as well as 13 other structures of TAR RNA that were bound to a variety of ligands including small-molecules and peptides. To increase the pool of unliganded simulations, we also conducted simulations of 13 liganded TAR RNA structures by removing ligands and retaining the initial coordinates for RNA atoms. We aimed to probe conformational heterogeneity in ensembles of TAR RNA structures in unliganded and liganded states to understand the predisposition of the unliganded TAR conformations to ligand binding and the effect thereafter.
We initially assessed the overall torsional flexibility of TAR RNA by computing distributions of all backbone dihedral angles from unliganded and liganded simulations ( Fig. 2A and Fig. S4, ESI †) and found these distributions to be consistent with the range of values from experimentally known structures of nucleic acids. This observation supports the ability of the interatomic potential in adequately capturing the dynamics in TAR RNA structures. By computing the buried surface area (BSA) of each ligand, we also assessed the stability of ligands during long time-scale MD simulations and found that in most TAR RNA structures ligands remained bound throughout simulations except in a few cases where ligands conformationally rearranged and/or partially dissociated.
We then probed the global dynamics in TAR RNA by comparing all unliganded and liganded conformations from MD simulations using global RMSD, DRMSF, and clustering analyses ( Fig. 3 and Fig. S8, S12, ESI †). The primary observation from the RMSD analysis was that the mean RMSD of unliganded conformations was higher than the mean RMSD of liganded conformations, thereby suggesting decreased conformational fluctuations in TAR RNA on ligand binding. The analysis of DRMSF further supported this observation since the magnitude of fluctuations in nucleotides was smaller in the  liganded simulations than in the unliganded simulations. These observations are consistent with the notion that RNA molecules are stabilized by binding of ligands because liganded TAR RNA structures were conformationally more rigid compared to unliganded structures. However, one of the small molecules, acetylpromazine, resulted in distinct perturbations in the overall structure of the liganded TAR RNA in comparison to other liganded systems and in comparison to the corresponding unliganded simulation. In the presence of acetylpromazine, the TAR RNA structure transitioned between two distinct (bent and stretched) conformations (Fig. S7, ESI †). We have previously also shown that the (un)binding process of acetylpromazine is associated with the flipping of nucleotides in the binding pocket. 76 The main structural difference between acetylpromazine and other small molecules is the presence of a sulfur moiety in one of the benzoic rings (Fig. S1, ESI †), which could be an important design feature for future development of inhibitory compounds.
The bulge motif which connects two helices in the TAR RNA structure (Fig. 1A) also became more rigid in the presence of ligands, which decreased the twisting and bending fluctuations in TAR RNA helices. The clustering analysis further supported these observations by showing a higher fraction of similar conformations in liganded structures in comparison to unliganded structures. In fact some liganded structures with peptide ligands exhibited a single cluster containing more than 90% of the RNA conformations, implying that these liganded simulations exhibited small conformational variability in the presence of peptides since the TAR RNA conformations were similar to each other (e.g. PDB code 2KX5; Fig. 3B and Fig. S14, ESI †). We also observed that various simulations have similar conformations that form combined clusters by performing combined cluster analysis (Fig. S15, ESI †) and by comparing average structures from each simulation ( Fig. S10 and S11, ESI †). These observations further support the previously proposed hypothesis 12,26,43 that despite being a highly flexible molecule, TAR RNA potentially adopts a set of conformations forming an ensemble of structures that can recognize various ligands.
We further probed the local dynamics in bulge nucleotides (U23, C24, and U25), the conformational flipping motions in which facilitate ligand binding 27,42 as well as alter global dynamics in TAR RNA. As opposed to the notion that the binding of ligands may prevent conformational transitions in nucleotides, we observed that the bulge nucleotides can transition between inward and outward conformations in both unliganded and liganded states although the frequency of transitions significantly decreases in the presence of ligands. Overall, we found bulge nucleotides C24 and U25 to be more flexible than U23.
Importantly, as a result of conformational heterogeneity in TAR RNA structures and coupling between local and global dynamics, we observed the formation of ligand binding pockets near several structural motifs (bulge region and helices I/II). This observation is consistent with the suggestion that TAR RNA may adopt conformations with pre-existing binding pockets where ligands can fit. 12 We observed that these binding pockets form consistently in all unliganded simulations with enough volume to accommodate different ligands ( Fig. 6 and Fig. S17, S18, ESI †), including larger ligands (e.g. peptides). Moreover, we observed the formation of binding pockets in other structural motifs (e.g. the apical loop) in TAR RNA which are potentially useful to future inhibitor design. As an example, Patwardhan et al. 59 showed that the amiloride ligands can bind to nucleotides in the apical loop where we observed several binding pockets.

Conclusions
Although RNA molecules are known to undergo conformational changes during various cellular processes, the conformational dynamics in RNA molecules, with and without ligands, have been studied only to a limited extent. We used explicit-solvent MD simulations to study the dynamics in a model RNA system, the HIV-1 TAR RNA, which is known to recognize several types of ligands including small-molecules and peptides. We observed that the ligands rigidified TAR RNA structures by interacting with the bulge nucleotides and decreased the overall bending and twisting motions in helical motifs in TAR RNA. Therefore, we found that ligands overall decreased conformational heterogeneity in TAR structures. While RNA is considered a highly flexible molecule, we observed that TAR RNA structures on average became more similar to each other in the unliganded and liganded simulations compared to their initial conformations. We also observed that the conformational transitions leading to flipping of nucleotides in RNA molecules likely occur irrespective of the presence of ligands although the frequency of these transitions decreases on ligand binding. As a result of conformational heterogeneity, we also showed that unliganded RNA molecules possess ligand binding pockets that may be amenable to targeting by novel inhibitory molecules. For sharing with the scientific community, we have further made our simulation data available via the Zenodo platform (https://doi.org/10.5281/zenodo.4521164).

Author contributions
L. L. and H. V. designed the research. L. L. performed the research and analyzed data. L. L. and H. V. wrote the article.

Conflicts of interest
There are no conflicts to declare.