Chemical synthesis and NMR spectroscopy of long stable isotope labelled RNA †

We showcase the high potential of the 2′-cyanoethoxymethyl (CEM) methodology to synthesize RNAs with naturally occurring modified residues carrying stable isotope (SI) labels for NMR spectroscopic applications. The method was applied to synthesize RNAs ranging in size from 60 to 80 nucleotides. The presented approach makes it possible to selectively modify larger RNAs (>60 nucleotides) with atom-specifically 13C/15N-labelled building blocks. The method harbors the unique potential to address structural as well as dynamic features of these RNAs with NMR spectroscopy, but also with other biophysical methods, such as mass spectrometry (MS) or small angle neutron/X-ray scattering (SANS, SAXS).

Solution and solid-state nuclear magnetic resonance (NMR) spectroscopy have proven to be highly suitable to address structural and dynamic features of RNA. [1][2][3][4] A prerequisite for applying state-of-the-art NMR experiments is the introduction of a stable isotope (SI) labelling pattern using 13C/15N-labelled RNA or DNA precursors. [5][6][7][8] The most widespread method uses labelled (2′-deoxy)-ribonucleotide triphosphates and enzymes to produce the desired RNA or DNA sequence enriched with 13C and 15N nuclei. 1,5 This approach yields sufficient amounts of RNA and DNA for NMR spectroscopic applications. This well-established method allows nucleotide-specific labelling by mixing SI-labelled with unlabelled d/rNTPs. Especially in larger RNAs (>60 nt), such nucleotide-specific SI-labelling can still lead to significant resonance overlap. For this reason, the PLOR (position-selective labelling of RNA) method was recently introduced, which holds the promise to site-specifically label RNA using SI-labelled ribonucleotide triphosphates and T7 RNA polymerase. 9 An alternative method was concurrently developed, making use of the synthesis of 2′-O-tri-iso-propylsilyloxymethyl (TOM)- or 2′-O-tert-butyl-dimethyl-silyl (TBDMS)-SI-modified phosphoramidites and solid phase synthesis. [10][11][12][13] The approach works well for medium-sized RNAs up to 50 nts, and the synthetic access to the SI-labelled building blocks is well established. 10,12 Thus, the fully chemical SI-labelling protocol can be regarded as an expedient expansion of the established enzymatic procedures, allowing the number and positioning of SI-labelled residues in a target RNA to be freely chosen. In our hands, however, the standard solid phase synthesis methods are not well suited to produce larger amounts (>50 nmol) and purities higher than 95% for RNAs exceeding 60 nts.
Due to this restriction, large RNAs are only accessible via enzymatic ligation strategies using T4 RNA/DNA ligase, which make extra optimization steps necessary or introduce new problems, such as finding the optimal ligation site or issues regarding up-scaling and yield of the ligation product. [14][15][16] Thus, an improved synthetic procedure to directly address SI-labelling of larger RNAs (>60 nt) at amounts suitable for NMR would be highly desirable.
We report the synthesis of SI-labelled RNAs ranging in size from 60 to 80 nts, capitalizing on the 2′-cyanoethoxymethyl (CEM) RNA synthesis method. 17,18 As these CEM building blocks are not commercially available, all phosphoramidites were produced in-house, and we further synthesized 13C-/15N-labelled unmodified and naturally occurring modified RNA phosphoramidites (Fig. 1a and b). In detail, we focused on the synthesis of 8-13C-adenosine (1), 6-13C-5-D-cytidine (2), 8-13C-guanosine (3) and 6-13C-5-D-uridine (4) building blocks. Modified RNA building blocks include a 1,3-15N2-dihydrouridine (5) and a 2,8-13C2-inosine (6) CEM phosphoramidite. A detailed description of the synthetic procedures is given in the ESI. † We used these monomer units to produce SI-labelled RNAs exceeding the size limitation of 60 nucleotides for NMR by up to 20 nucleotides. The RNAs reported here were synthesized on a 1.3 μmol scale and on a 1000 Å controlled pore glass (CPG) solid support with 0.1 M CEM amidite solutions, i.e. a 13-fold excess of amidite was used in each coupling step. The deprotection steps followed the recommendations of Ohgi et al., but the RNA was desalted via size-exclusion chromatography and not precipitated as suggested in the original work. 17 A detailed description of the chemical RNA synthesis can be found in the ESI. † As a first target, we picked a retroviral messenger RNA, which was earlier investigated using solution NMR spectroscopy. D'Souza and co-workers found that the murine leukaemia virus (MLV) pseudoknot (PK) is a cis-acting regulation element, which can induce alternative protein expression by the ribosomal frameshifting and the read-through mechanism. 19 The MLV PK undergoes an exchange between two conformations. At low pH, A17H+ interacts with C23 and G53, leading to the release of a steric constraint that prevents S1-L2 tertiary interactions. At physiological pH, this state is an excited state with a population of only 6%.
The pH-dependent tertiary structure transition to this excited state is a functional switch allowing the ribosome to bypass the gag stop codon (Fig. 2a and b).
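To put the 6% excited-state population into energetic terms, the following minimal sketch converts a two-state population into a free-energy difference via ΔG = -RT ln(p_ES / p_GS). The function name and the temperature of 298.15 K are assumptions made for illustration; the text above does not state the temperature at which the population was determined.

```python
import math

# Molar gas constant in kJ/(mol*K)
R_KJ = 8.314462618e-3


def two_state_delta_g(p_excited, temperature_k=298.15):
    """Free energy (kJ/mol) of the excited state relative to the ground state.

    Assumes a simple two-state equilibrium; temperature_k is an assumed
    value, not one taken from the text.
    """
    if not 0.0 < p_excited < 1.0:
        raise ValueError("population must lie strictly between 0 and 1")
    # Equilibrium constant: excited-state over ground-state population
    k_eq = p_excited / (1.0 - p_excited)
    return -R_KJ * temperature_k * math.log(k_eq)


delta_g = two_state_delta_g(0.06)
print(f"Excited state lies {delta_g:.1f} kJ/mol above the ground state")
```

Under these assumptions the excited state sits about 6.8 kJ/mol above the ground state, which illustrates how sparsely populated it is at physiological pH.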
We introduced a stable isotope label by replacing A17 with its 8-13C-labelled counterpart (Fig. 2a). A high-quality crude product was obtained, as deduced from the anion-exchange (AIEX) chromatogram (Fig. 2c left). Characterization of the AIEX-purified MLV PK by top-down Fourier-transform ion cyclotron resonance (FT-ICR) mass spectrometry (MS) 20,21 revealed a high sample homogeneity (Fig. 2c right), the correct sequence, and localized the SI label to A17 (ESI, † Fig. S1). We exploited the SI-labelling pattern to study the RNA via NMR spectroscopy (Fig. 2d and e). The imino proton spectrum is in agreement with a previously acquired spectrum. 19 We also conducted a 13C CPMG relaxation dispersion experiment, but the data could not be properly fitted to give the parameters of the exchange process due to the small amplitude of the dispersion profile (data not shown).
The next target is part of the box C/D ribonucleoprotein (RNP) particle, for which a structural model was recently obtained using an integrative structural biology approach. 22 The RNP catalyzes a post-transcriptional modification of rRNA, the 2′-O-methylation. In the core of the complex resides a guide RNA, which recruits the substrate RNAs for processing by the fibrillarin enzyme, which introduces the 2′-O-methyl group using S-adenosyl-L-methionine (SAM) as the co-factor. With 72 nt, the partially symmetrized guide ssR26 represents a challenging target for solid phase synthesis. The guide RNA was synthesized using the 8-13C-A amidite 1 and, even more noteworthy, a 3-15N-labelled 2′-O-TBDMS uridine amidite (Fig. 3a). The crude product gave an AIEX chromatogram with a main product peak (ESI, † Fig. S2). Thus, the CEM method tolerates a limited number of TBDMS RNA building blocks, further increasing the versatility of the presented approach, as many commercially available RNA modifiers use the standard 2′-O-TBDMS protecting group. So far, we have introduced up to four 2′-O-TBDMS building blocks into 60 nt RNAs assembled via the CEM method.
The isolated yield of the full-length SI-labelled 72 nt RNA was 95 nmol (7%). Top-down FT-ICR MS confirmed the sequence and located the SI labels to residues U19 and A38 (ESI, † Fig. S3). The structure of the 72 nt ssR26 RNA was then probed using solution NMR spectroscopy (Fig. 3b-d).
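The reported 7% can be reproduced from the 95 nmol of product if the synthesis scale is taken as 1.3 µmol, the value consistent with the stated percentage. The sketch below does this arithmetic and, as a purely illustrative extra, back-calculates the average per-coupling yield that an overall yield of this size would imply for a 72-mer. The function names are hypothetical, and the per-step figure is only a lower bound on the coupling efficiency, because the isolated yield also includes losses during deprotection and purification.

```python
def isolated_yield(product_nmol, scale_umol):
    """Fraction of the synthesis scale recovered as purified product."""
    return product_nmol / (scale_umol * 1000.0)  # convert umol to nmol


def average_stepwise_yield(overall_yield, length_nt):
    """Geometric-mean yield per coupling implied by an overall yield.

    Assumes length_nt - 1 couplings (the first nucleoside is taken to be
    pre-loaded on the solid support) and attributes all losses to coupling,
    so the result underestimates the true coupling efficiency.
    """
    couplings = length_nt - 1
    return overall_yield ** (1.0 / couplings)


overall = isolated_yield(95.0, 1.3)
stepwise = average_stepwise_yield(overall, 72)
print(f"isolated yield: {overall:.1%}")
print(f"implied average yield per coupling: {stepwise:.1%}")
```

Even when every loss is charged to the coupling steps, the implied average stays above 96% per step, which shows how strongly the length of the RNA amplifies small per-step inefficiencies.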
The imino proton region of the 1H-NMR spectrum indicated a well-structured RNA with several resonances in the nonstandard Watson-Crick base pair chemical shift region (Fig. 3b). A 1H-15N correlation spectrum disclosed that U19 is forming a base pairing interaction, very likely a G·U wobble base pair based on the chemical shift signature (Fig. 3c). Making use of the 8-13C-A38 label, a 1H-13C HSQC spectrum of ssR26 revealed conformational heterogeneity, as two 1H-13C correlation peaks were observed (ESI, † Fig. S4). We were then interested in the consequences induced by the addition of the substrate RNA. Upon the addition of 2.2 equivalents of substrate RNA carrying a 1-15N-adenosine label, the spectra displayed significant changes. A 1H-15N HSQC spectrum nicely reflected the structural transition of U19 to a standard A-U Watson-Crick base pair (Fig. 3c). The base pairing partner A10′ and the intermolecular A10′-U19 base pair could further be unambiguously confirmed by an HNN-COSY experiment (Fig. 3d). 23 A 1H-13C correlation NMR experiment showed that the conformational heterogeneity of the apo ssR26 RNA was resolved and that the binding of the substrate strand leads to a homogeneous folding state (ESI, † Fig. S4).
As the final example, the highly relevant class of tRNAs was selected. This RNA species acts as a translator by converting the mRNA information into an amino acid sequence. [24][25][26] Still, certain aspects of its structure and function are not clear. The functions and roles of modified residues, such as dihydrouridine (DHU) or more complex modifiers (e.g. uridine 5-oxyacetic acid), are not yet fully elucidated on a molecular level.
A so far largely unexplored aspect of modified RNA residues is their influence on the folding landscape. SI-labelled variants of the modified RNA residues are mandatory to characterize their influence on an RNA's folding landscape by NMR. 27 Two examples, the 1,3-15N2-DHU (5) and the 2,8-13C2-inosine (6) RNA amidites, are introduced here. DHU is a dynamic hotspot, as it does not form π-stacking interactions and preferentially populates the C2′-endo conformation. 28 Here, we report the synthesis of a DHU-modified tRNA (Fig. 4a). A high-quality crude product peak was observed and, after purification, the exact mass, the sequence and the location of the SI labels (D16 and D17) were confirmed by FT-ICR MS (Fig. 4b and ESI, † Fig. S5). A 0.15 mM sample in 400 μL was obtained, corresponding to a total yield of 60 nmol (4.6% yield). 1H NMR spectra and 1H-15N HSQC spectra confirmed the SI labelling (Fig. 4c and d). A folding event of the DHU-modified tRNA was triggered by the addition of 10 equivalents of magnesium(II) ions, which could be nicely followed via NMR spectroscopy (Fig. 4c and d).
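The tRNA yield figures follow from the NMR sample itself. As a small sketch, the conversion below takes the sample volume as 400 µL (the volume consistent with the stated 60 nmol at 0.15 mM) and the synthesis scale as 1.3 µmol; the helper names are hypothetical.

```python
def amount_nmol(concentration_mm, volume_ul):
    """Amount of substance in nmol; mmol/L times uL gives nmol directly."""
    return concentration_mm * volume_ul


def percent_of_scale(product_nmol, scale_umol):
    """Product amount as a percentage of the synthesis scale."""
    return 100.0 * product_nmol / (scale_umol * 1000.0)


trna_nmol = amount_nmol(0.15, 400.0)
trna_percent = percent_of_scale(trna_nmol, 1.3)
print(f"{trna_nmol:.0f} nmol, {trna_percent:.1f}% of the synthesis scale")
```

Because millimolar times microlitre equals nanomole, no unit conversion factor is needed in the first helper.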
Notably, we changed the N4-acetyl group of the 2′-O-CEM-cytidine to a phenoxyacetyl moiety, as the mild alkaline deprotection conditions (2 M ammonia in methanol, 37 °C, 20 h) that keep the DHU residue intact left some of the N4-acetyl groups untouched. We plan to use 15N relaxation dispersion NMR to probe the influence of the DHU residues on the tRNA's folding landscape. In analogy, we plan to address functional dynamics induced by inosine, as a recent work reports a destabilizing effect of this modification. 29
In this work, we report a synthetic access to SI-labelled RNAs using chemical solid phase synthesis. The minimal steric demand of the 2′-CEM protecting group and a clean deprotection procedure give high-quality products, which can be purified using denaturing AIEX chromatography to yield RNAs up to 80 nt with the >95% purity required for NMR. We synthesized atom-specific 13C-labelled 2′-O-CEM amidites 1-4 and modified SI-labelled RNA building blocks (5, 6) and incorporated them into various RNAs. The nucleic acids thus obtained proved to be suitable for NMR investigations of their structure and dynamics. We foresee several areas of application for such SI-labelled RNAs. The most obvious use is the unambiguous resonance assignment in larger RNAs using atom-specific SI-labelled residues. 30 Further, the isolated spin pair 1H-X (X = 13C/15N) topologies of the building blocks 1 to 6 minimize the scalar coupling interactions and relaxation pathways and thus make the application of relaxation-based NMR experiments, such as relaxation dispersion or CEST experiments, straightforward for probing functional dynamics in RNAs. 10,12,13 Besides, we anticipate a potential use of the large SI-labelled RNAs in recently reported mass spectrometric methods to localize protein binding sites, which give valuable information for the 3D structure modelling of large RNP particles. 31,32 We currently also focus on the synthesis of per-deuterated RNA building blocks. The building blocks will be beneficial for NMR but also for SAXS/SANS studies in an integrative structural biology approach, as the chemical RNA synthesis allows full control over segmental deuteration. This will offer the possibility to define relative domain orientations in such larger nucleic acids by contrast-matched SANS or SAXS, as recently suggested for proteins. 33
To conclude, we are confident that SI-labelling of RNA via the 2′-O-CEM methodology is a competitive new approach with respect to existing chemical and enzymatic protocols to modify nucleic acids for biophysical investigations.

Conflicts of interest
There are no conflicts to declare.