Open Access Article
Aaron M. Fleming*, Judy Zhu, Vilhelmina K. Done and Cynthia J. Burrows*
Department of Chemistry, University of Utah, 315 S. 1400 East, Salt Lake City, UT 84112-0850, USA. E-mail: burrows@chem.utah.edu; afleming@chem.utah.edu
First published on 23rd August 2023
Nanopore direct RNA sequencing is a technology that allows sequencing for epitranscriptomic modifications with the possibility of a quantitative assessment. In the present work, pseudouridine (Ψ) was sequenced with the nanopore before and after the pH 7 bisulfite reaction that yields stable ribose adducts at C1′ of Ψ. The adducted sites produced greater base call errors in the form of deletion signatures compared to Ψ. Sequencing studies on E. coli rRNA and tmRNA before and after the pH 7 bisulfite reaction demonstrated that using chemically-assisted nanopore sequencing has distinct advantages for minimization of false positives and false negatives in the data. The rRNA from E. coli has 19 known U/C sequence variations that give similar base call signatures as Ψ, and therefore, are false positives when inspecting base call data; however, these sites are refractory to reacting with bisulfite as is easily observed in nanopore data. The E. coli tmRNA has a low occupancy Ψ in a pyrimidine-rich sequence context that is called a U representing a false negative; partial occupancy by Ψ is revealed after the bisulfite reaction. In a final study, 5-methylcytidine (m5C) in RNA can readily be observed after the pH 5 bisulfite reaction in which the parent C deaminates to U and the modified site does not react. This locates m5C when using bisulfite-assisted nanopore direct RNA sequencing, which is otherwise challenging to observe. The advantages and challenges of the overall approach are discussed.
Sequencing RNA with the goal of locating and ideally quantifying the modifications has been approached by many methods.5 The RNA can be reverse transcribed to a cDNA followed by high-throughput sequencing; modification-specific chemical reactions (e.g., CMC alkylation of pseudouridine) can be used for the introduction of a signature, such as a stop, in the cDNA revealed during sequencing. Immunoprecipitation of RNA targeting a specific modification can generate an enriched population of strands that is converted to cDNAs for sequencing to locate the modification.6 The RNA can undergo limited nuclease digestion followed by LC-MS analysis for sequencing modifications.7 Recently, direct sequencing of RNA for modifications has become possible with new technologies. The PacBio platform can directly sequence RNA and has the potential for locating modifications as was demonstrated for N6-methyladenine in RNA;8 however, it is nanopore direct RNA sequencing that has been applied to the greatest extent for modification-aware sequencing.6,9,10
The nanopore sequencer is a two-protein platform that uses an ATP-dependent helicase to deliver the RNA 3′ to 5′ into a lipid-bilayer-embedded protein nanopore under an electrophoretic force (Fig. 1A).11 As the nucleotides (nts) pass the central constriction zone of the nanopore protein they deflect the ionic current in a sequence-dependent fashion. In RNA, approximately 5 nts, referred to as a k-mer, contribute to the current level.12 The current vs. time data are then base called using a recurrent neural network trained on canonical nucleotides to reveal their identities. Thus, RNA modifications can yield signatures in the raw ionic current vs. time data and/or in the base-called data.13 There are many advances that this technology enables for epitranscriptomic studies but some challenges remain to be resolved.
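The statement that roughly 5 nts contribute to each current level can be made concrete with a minimal Python sketch. It assumes a simplified picture in which every 5-nt window containing a given nucleotide is influenced by it; the function name `kmers_covering` and the example sequence are hypothetical and are not taken from any nanopore software.

```python
def kmers_covering(seq, pos, k=5):
    """Return (start, k-mer) for every k-nt window of seq that contains index pos (0-based)."""
    if not 0 <= pos < len(seq):
        raise ValueError("pos must be an index inside seq")
    windows = []
    first_start = max(0, pos - k + 1)      # leftmost window that still contains pos
    last_start = min(pos, len(seq) - k)    # rightmost window that still fits in seq
    for start in range(first_start, last_start + 1):
        windows.append((start, seq[start:start + k]))
    return windows


# Hypothetical 9-nt sequence with the nucleotide of interest (U) at index 4.
for start, kmer in kmers_covering("GGACUCCAG", 4):
    print(start, kmer)
```

Under this simplification, a single modified nucleotide can perturb up to five consecutive current levels, which is one reason signatures are often inspected at positions flanking the modified site as well as at the site itself.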
Many different epitranscriptomic modifications have been sequenced with the nanopore system.14–17 The present discussion will focus on the uridine isomer pseudouridine (Ψ) because the prior work showcases the strengths and challenges of this approach (Fig. 1B).16,18–25 Pseudouridine is the most common modification in all RNA and is the second most commonly found in eukaryotic mRNA.26 Many writer enzymes catalyze the isomerization reaction; humans have 13 of these enzymes26 that can install Ψ in nearly any sequence context.27 In nanopore direct RNA sequencing for Ψ, this modification is “miscalled” as a C with the highest frequency, and miscalls to the other bases occur with lower frequencies.16,18–25 As a consequence, natural U/C sequence variations will present as Ψ in base-call data analyses. These base miscall signatures have allowed sequencing for Ψ directly in rRNA,17,18,25 mRNA,20,23,25 tRNA,14,15 and vRNA19 that can have lengths >5000 nts, demonstrating the ability for long-read modification-aware sequencing with this method. A challenge to using base-called data for quantitative analysis of Ψ is that the frequency of “miscalls” is sequence-context dependent.20,23,28 The sequencer also has a high overall error for RNA sequencing;29 therefore, a well-matched control devoid of the target modification is needed to make comparisons.12,17,23
Inspection of the raw data from the nanopore sequencer (i.e., ionic current vs. time traces) can provide data to bypass some of these challenges.18,19 The current levels do change for Ψ; however, the changes are sequence-context dependent similar to the base-call data, which is expected because the current level data is used for base calling, and to compound the issue, other U modifications can yield similar current-level differences.19,28 Our work and others revealed that the helicase stalls at Ψ sites that are found when analyzing the dwell time for this modification 10–11 nts before the nucleotide reaches the nanopore protein central constriction.19,24 We proposed using a consensus of base call, ionic current, and dwell time data as an approach for greater accuracy in Ψ sequencing, which we expanded to 16 of the 17 different chemical modifications found in E. coli rRNA.17,19
In the present report, we outline an alternative approach to use nanopore direct RNA sequencing to locate Ψ and differentiate these sites from a U/C sequence variation, to identify Ψ in sequence contexts that fail to give a strong miscall, and to provide a positive signal that can differentiate Ψ from other U modifications. The approach employs the bisulfite reaction at pH 7 to form stable Ψ adducts that upon nanopore direct RNA sequencing are revealed as an insertion–deletion (indel) signature.30 Additionally, the bisulfite reaction at pH 5 is the gold-standard method for sequencing 5-methylcytosine in DNA via deamination of the parent C base to uracil while the methylated base fails to react; therefore, we used this reaction to sequence for 5-methylcytidine (m5C) in RNA, which is challenging to achieve without the assistance of this reaction.16 The advantages and challenges of using modification-specific chemistry for nanopore direct RNA sequencing are discussed.
In the E. coli 16S and 23S rRNAs, there exist 10 Ψ sites, and in the human 5.8S, 18S, and 28S rRNAs there exist 103 Ψ sites (Fig. S4, ESI†).35,36 Nearly all of these have been verified by mass spectrometry to be present at high occupancy.35,37 In the first analysis, the base calling errors for all 113 Ψ sites were determined and plotted (Fig. 2A left). The error ranged from 17% to 100% that is consistent with previous studies reporting a large range of base calling errors associated with Ψ.19,23,25,28 Ribosomal RNA has many modifications, some of which are clustered closer than 5 nts apart, and the clustered modifications could present higher base calling errors than Ψ isolated in the same unmodified sequence contexts. Next, Ψ sites isolated in k-mers that do not also possess other modifications (n = 87) were inspected to find roughly the same range of base calling errors (Fig. 2A right). These data demonstrate that when the high occupancy Ψ sites in rRNA are sequenced with the nanopore, the base calling error is highly variable with dependency on the sequence context. The sequences that give low error would be false negatives when conducting de novo sequencing for Ψ.
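As an illustration of how a per-site base calling error can be derived from aligned reads, the following minimal Python sketch treats every call that differs from the reference base (including deletions) as an error. This is one common convention and an assumption here; the exact definition used by the analysis software in this work may differ, and the counts shown are invented for illustration.

```python
def base_call_error(counts, ref_base):
    """Fraction of reads at one site whose call differs from the reference base.

    counts maps each observed call (e.g. "A", "C", "G", "U", "del") to a read count.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("site has no coverage")
    return 1.0 - counts.get(ref_base, 0) / total


# Hypothetical counts for a site whose reference base is U.
site_counts = {"U": 80, "C": 900, "A": 5, "G": 5, "del": 10}
print(f"{base_call_error(site_counts, 'U'):.2f}")
```

A site with low error under this definition would be indistinguishable from an unmodified U, which is how a false negative arises in de novo analysis.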
Our next analysis compared the nanopore base miscalls for Ψ in cellular RNA against our previously published synthetic RNAs.28 The previous report synthesized RNA by in vitro transcription (IVT) with U or Ψ in all 5′-VV(U/Ψ)VV-3′ (where V ≠ U) sequence contexts.28 The sequences analyzed had the U/Ψ sites spaced >20 nts apart to interrogate the sequencer performance on individual modifications as they pass through the sensor rather than closely spaced modifications that could influence the signals from one another, an approach that has been reported in the literature.16,38 Out of the total 113 pseudouridinylation sites in the rRNAs from E. coli and humans, there exist 28 in 5-nt k-mer contexts that could be compared to the prior data set.28 The 28 sites in the rRNA used for the comparison were first corrected for the modification abundance on the basis of prior quantitative MS data (Fig. S4, ESI†).35,37 A plot of the base calling error for the synthetic 5-nt Ψ sequence contexts (y-axis) vs. the sequence-matched biological Ψ sites (x-axis) revealed a weak but positive trend between the datasets (Fig. 2B). The synthetic RNA strands predict a higher base calling error for Ψ than was found in the biological RNA.
We reasoned that the base calling error differences between the synthetic RNA with Ψ and the biological rRNA Ψ sites arise because the 5-nt k-mer fails to reproduce the full complexity of the sequence between the helicase and the nanopore. The distance between the helicase active site and the k-mer centered in the central constriction zone of the nanopore is ∼11 nts (Fig. 1A); this distance is based on the helicase signature being 10–11 nts from the nanopore sensor for Ψ as previously reported.17,19,24 There exist two Ψ sites in the E. coli 23S rRNA (955 and 1911) in which the Ψ is in a sequence context spanning from the helicase active site to the 3′ side of the k-mer with only A, C, or G nucleotides, thus allowing the synthesis of these standards by IVT. We synthesized RNA to include 13 nts 5′ to the Ψ that should span slightly past the helicase active site and 3 nts 3′ to the Ψ to span 1 nt past the central constriction zone of the nanopore sensor where the k-mer resides (Fig. 2C).
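The design of the extended standard (13 nts 5′ of the site, the site itself, and 3 nts 3′ of it, for 17 nts in total) can be expressed as a simple window extraction. The Python sketch below is illustrative only; the function name `standard_window` and the placeholder sequence are hypothetical, and the sequence is not the 23S rRNA context.

```python
def standard_window(seq, site, upstream=13, downstream=3):
    """Return the subsequence covering `upstream` nts 5' of `site`, the site, and `downstream` nts 3' of it.

    seq is written 5'->3' and site is a 0-based index into seq.
    """
    start = site - upstream
    end = site + downstream + 1            # slice end is exclusive
    if start < 0 or end > len(seq):
        raise ValueError("window extends past the ends of the sequence")
    return seq[start:end]


# Placeholder sequence (not a real rRNA context) with the site of interest at index 18.
placeholder = "AAAAA" + "GCAGCAGCAGCAG" + "U" + "CAG" + "AAAAA"
window = standard_window(placeholder, site=18)
print(window, len(window))
```

With the default arguments the returned window is always 17 nts long and the site of interest sits at index 13 of the window.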
The two sequence contexts selected are shown with either a green or red dot in Fig. 2B, in which the green one (23S Ψ955) represents a case that the synthetic 5-nt k-mer standard reproduced the biological data well, while the red one (23S Ψ1911) represents a case that the synthetic standard poorly reproduced the biological data (Fig. 2C). For the 23S Ψ955 data, extending the standard to include the sequence between the nanopore and helicase reproduced the base calling error obtained in the biological data similar to the 5-nt sequence (Fig. 2D, green; 5-nt standard = 93%, 17-nt standard = 94%, and biological = 92%). A point regarding 23S Ψ1911 is that there is an m3Ψ in this sequence at position 1915 that would have passed through the central constriction by the point at which 1911 is centered in this region; we acknowledge the synthetic system is not a perfect reproduction of the biological data. Nonetheless, when the sequence was extended, the 17-nt synthetic standard more closely represented the biological base calling error than the 5-nt standard (Fig. 2D, red; 5-nt standard = 77%, 17-nt standard = 60%, and biological = 64%). This example demonstrates that some sequence contexts require nanopore data with a much longer synthetic sequence standard that fully spans the region between the helicase and nanopore central constriction zone to reproduce biological nanopore data accurately. The utility of IVT to generate long control RNA with and without modifications has been used by us and others for comparisons to locate epitranscriptomic modifications;12,17,23 in the present work, an alternative approach for locating Ψ while minimizing the false positives and negatives is pursued. Use of chemical reagents to selectively label modifications can give rise to new sequencing signatures, and this method is employed in other studies for sequencing epitranscriptomic modifications.22,39–45
[…]:1 at pH 5 and 9:1 at pH 7.47 This method was employed by the He laboratory in the development of BID-seq and the Yi laboratory in the development of PRAISE sequencing for the quantitative sequencing of Ψ in human RNA using Illumina sequencing of amplified cDNA as the final read-out.39,40 At pH 7, bisulfite reaction conditions are reported that result in near quantitative conversion of Ψ to a bisulfite adduct leading to quantitative sequencing in singly and multiply modified sites in mammalian transcriptomes.39,40 The success of this reaction in our laboratory and others led us to consider using bisulfite adducts at Ψ as a means to differentiate the modification from the parent U by a substantial change in structure so that nanopore direct RNA sequencing would readily identify Ψ, independent of sequence. This would demonstrate the feasibility of using modification-specific chemical reactions as a tool for inducing signatures at modification sites that are hard to identify in the nanopore data.
First, IVT-generated RNA strands with 14 model U/Ψ sites in different sequence contexts spaced >20 nts apart were synthesized and nanopore sequenced (Fig. S1, ESI†). These RNAs were successfully analyzed using the method described above that is standard for the nanopore direct RNA sequencing field.19 The Ψ-containing RNA strands were then subjected to the bisulfite reaction at either pH 5 or 7 under the reported conditions to generate stable Ψ-(SO3−) adducts consistent with our previous reports on this reaction (Fig. S5, ESI†).30,42 The alignment reference for the pH 5 reaction replaced the C nucleotides with U nucleotides because of the deamination that occurs under these conditions, while the data for the pH 7 reaction used the original alignment reference with C nucleotides. Determination that the adducts could pass through the helicase and nanopore sensors was first demonstrated using FastQC analysis of the passed reads from the sequencer to determine the mean read lengths, G:C content, and quality of the data. The Ψ-(SO3−) adducted RNA strands produced a similar population of read lengths as the U- and Ψ-containing RNAs, the read accuracy average decreased from Q20 for the U-containing RNA to Q12 for the Ψ- and Ψ-(SO3−) RNAs, and the %GC for the modified RNAs differed from the U-containing RNA in the expected direction based on our knowledge of the reactions (Fig. S6, ESI†). Next, we found that using minimap2 as the aligner software failed to map to the reference sequences. However, when the aligner software was changed to BWA MEM the mapped reads increased from the previous 0% to 42% (Fig. S2, ESI†). This change in the aligner program has been used to increase the mapped nanopore direct RNA sequencing reads for hypermodified tRNA to the reference sequences.14,15 For Ψ-containing synthetic RNA, the bisulfite reaction at pH 5 or 7 produced similar results, and those for pH 7 will be discussed.
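The reference adjustment described for the pH 5 reaction, in which C nucleotides are replaced with U nucleotides, can be sketched in Python as follows. This is an assumed, minimal implementation for illustration and not the procedure used in this work; the function names are hypothetical, and the `target` parameter is included only because alignment references are often written in the DNA alphabet (T instead of U).

```python
def convert_c_to_u(sequence, target="U"):
    """Replace every C in a reference sequence with `target` ("U" for RNA, "T" for a DNA-alphabet reference)."""
    return sequence.upper().replace("C", target)


def convert_fasta(lines, target="U"):
    """Yield FASTA lines with sequence lines converted and header lines left unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line                     # header line: keep as-is
        else:
            yield convert_c_to_u(line, target)


# Hypothetical two-line FASTA record.
record = [">example_reference\n", "GACCUGACGC\n"]
print("\n".join(convert_fasta(record)))
```

Because the converted reference contains only three distinct nucleotides, its sequence complexity is lower than that of the original reference.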
The base calling error was analyzed using ELIGOS2, which reports an error of specific bases (ESB) for sites of interest;16,48 this computational tool also inspects base calling data for a modified RNA against a matched unmodified RNA to provide a statistical prediction of whether a modification resides at a target position or not. We used both features of the algorithm. Other programs exist for running base-calling error analysis to locate RNA modifications;20,25 however, they were not used in the present studies. When inspecting the ESB values from three nucleotides before and after the U, Ψ, or Ψ-(SO3−) sites (X in Fig. 3A and Fig. S7, ESI†), the ESB values were observed to increase from a low for the U sites, to a midrange value for Ψ sites, to a maximal value for Ψ-(SO3−) sites (Fig. 3A and Fig. S7, ESI†). The ESB values for Ψ in the contexts studied ranged from 0.2–0.9, while the Ψ-(SO3−) adducts gave ESB values ranging from 0.8–0.9; this suggests the sequence context effect on the error has been minimized for Ψ-(SO3−) adduct compared to Ψ, and in the sequence contexts studied, the adduct produced high base call errors generally in the form of deletion signatures that are reported as indels.
The U-containing RNA reads were mixed with known ratios of Ψ-containing or Ψ-(SO3−)-containing RNA (50%, 33%, 20%, or 10%) followed by ELIGOS2 analysis to determine the P-value of significance for a modification at the target sites (Fig. S8, ESI†). This examination tests whether the bisulfite adduct allows monitoring of sub-stoichiometric levels of Ψ in the contexts studied. The P-values were negative-log transformed for visualization and plotted as box and whisker plots (Fig. 3B). At 50% occupancy of Ψ or Ψ-(SO3−), ELIGOS2 returned high values of significance for the presence of the modifications. When the level of the modifications was 20% or 33%, the significance levels reported for Ψ-(SO3−) were much greater than those for Ψ. At 10% modification, neither was predicted to be significantly modified, which is about the same level as the reported error for nanopore direct RNA sequencing.29 The bisulfite adducts to Ψ on these synthetic sequences increased the ability to detect Ψ down to ∼20% occupancy, which is lower than is possible for direct sequencing of Ψ based on the analysis (Fig. 3B).
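The read-mixing experiment and the negative-log transformation can be sketched in Python as shown below. This is a hedged illustration under stated assumptions: reads are represented by placeholder identifiers, the helper names `mix_reads` and `neg_log10` are hypothetical, base 10 is assumed for the logarithm, and the statistical test itself (performed by ELIGOS2 in this work) is not reproduced.

```python
import math
import random


def mix_reads(unmodified, modified, fraction_modified, n_total, seed=0):
    """Randomly draw reads so that `fraction_modified` of `n_total` come from the modified pool."""
    rng = random.Random(seed)              # fixed seed so the mixture is reproducible
    n_mod = round(n_total * fraction_modified)
    n_unmod = n_total - n_mod
    if n_mod > len(modified) or n_unmod > len(unmodified):
        raise ValueError("not enough reads in one of the pools")
    mixed = rng.sample(modified, n_mod) + rng.sample(unmodified, n_unmod)
    rng.shuffle(mixed)
    return mixed


def neg_log10(p_value):
    """Negative base-10 logarithm of a P-value, for plotting significance on a linear axis."""
    if not 0 < p_value <= 1:
        raise ValueError("P-value must be in (0, 1]")
    return -math.log10(p_value)


# Placeholder read identifiers standing in for U-containing and modified reads.
u_reads = [f"u{i}" for i in range(1000)]
mod_reads = [f"m{i}" for i in range(1000)]
for fraction in (0.50, 0.33, 0.20, 0.10):
    mixture = mix_reads(u_reads, mod_reads, fraction, n_total=500)
    n_mod = sum(read.startswith("m") for read in mixture)
    print(fraction, n_mod, len(mixture))
```

Sampling without replacement from each pool keeps individual reads from being counted twice in a mixture.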
Next, E. coli rRNA was allowed to react with bisulfite at pH 7 followed by nanopore sequencing and data analysis as described above. The 10 well-established Ψ sites were inspected for base calling error before and after the reaction, and the error increased after the reaction (Fig. 3C and Fig. S9, ESI†). For example, 16S Ψ516 and 23S Ψ1911 before the reaction were base called to yield a nearly 1:1 ratio of C:U bases, and after bisulfite reaction, the fraction of U calls decreased to <10%, the C base calls stayed the same or increased, and the indel frequency increased to ∼30%. Similarly, 23S Ψ2457 before the reaction was base called as A, C, G, U, and indels, and after the reaction the U call decreased to <20% and the indels increased to >40%. The 23S Ψ2457 site before and after reaction was base called with all possible options (A, C, G, U, and indels), and one possible reason for this could be the dihydrouridine at 2449 that would influence the data because it is positioned between the nanopore and helicase when Ψ2457 is in the nanopore (Fig. 1A). The other seven Ψ sites in the E. coli rRNAs were inspected, and those for which enough data were present gave similar results (Fig. S9, ESI†). These observations demonstrate on a biological sample that Ψ-(SO3−) adducts yield larger errors than Ψ or U.
The E. coli genome possesses 7 operons for expression of the rRNA strands, in which there are 62 known sequence variations of which 19 are U/C variants.17 The sites of U/C variation give high predicted values for Ψ occupancy when using base-calling error against a reference sequence with U at the variation sites instead of C (Fig. S10, ESI†). Fig. 3D shows the variant sites in the 16S rRNA at positions 90, 93, and 208. Positions 93 and 208 of the 16S rRNA had U:C ratios that did not change before or after the reaction. The difference in ratios between the two sites is expected based on the frequency of this variation across the seven rRNA operons and the expression levels of the rRNAs in the cell when the RNA was harvested.17 Next, position 90 was predominantly called as a C, as expected,17 with lower levels of G and indels found that were similar before and after the reaction. This site resides centered in a homopolymer run of U nucleotides, which are known to be error prone when sequenced with the nanopore,29 as observed. Overall, this demonstrates the utility of using the bisulfite reaction for differentiation of U/C variant sites vs. Ψ sites because the variants failed to react with bisulfite at pH 7 while Ψ did react and showed a change in the base calling profile (Fig. 3C and D).
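The before/after comparison used to separate U/C variants from Ψ sites can be illustrated with a short Python sketch. The decision rule (a gain in indel fraction of at least 0.15 after the reaction) and the read counts are assumptions chosen for illustration; they loosely follow the qualitative behavior described above and are not values or criteria from this work.

```python
def call_fractions(counts):
    """Convert per-site call counts into fractions of total coverage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("site has no coverage")
    return {call: n / total for call, n in counts.items()}


def responds_to_bisulfite(before, after, min_indel_gain=0.15):
    """Return True when the indel fraction rises by at least `min_indel_gain` after the reaction.

    The threshold is an illustrative assumption, not a validated cutoff.
    """
    gain = call_fractions(after).get("indel", 0.0) - call_fractions(before).get("indel", 0.0)
    return gain >= min_indel_gain


# Invented counts: a site behaving like a reactive pseudouridine ...
psi_before = {"C": 50, "U": 48, "indel": 2}
psi_after = {"C": 62, "U": 8, "indel": 30}
# ... and a site behaving like an unreactive U/C sequence variant.
variant_before = {"C": 40, "U": 58, "indel": 2}
variant_after = {"C": 41, "U": 57, "indel": 2}

print(responds_to_bisulfite(psi_before, psi_after))
print(responds_to_bisulfite(variant_before, variant_after))
```

A rule of this kind relies on the two samples being aligned to the same reference with comparable coverage at the site, so that the change in profile reflects the chemistry and not the depth.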
In a final study to demonstrate the utility of chemical labeling to enhance nanopore sequencing signatures, we sequenced the 363-nt tmRNA (a.k.a. 10Sa or SsrA) made by E. coli cells that functions both as a tRNA and an mRNA for labeling proteins translated without a stop codon.7 This RNA allows marking these failed proteins with an 11 amino acid tag on the C terminal end to target them for proteolysis. Prior MS and gel analysis found the tmRNA to have three modifications in the T-loop of the tRNA portion of this RNA.7 There resides a highly modified 5-methyluridine (m5U) at position 341, at 342 is a Ψ at high occupancy (>90%), and at 347 resides a lower occupancy Ψ (∼10%) predicted by gel analysis following a CMC-induced reverse transcriptase stop (Fig. 4A). Nanopore direct RNA sequencing of the tmRNA before and after the pH 7 bisulfite reaction provided base-calling data to confirm the presence of both Ψ sites in the T-loop. The high occupancy Ψ342 site was called as a C ∼90% of the time before the labeling reaction, and after the reaction this site was called as a C or indel in nearly a 1:1 ratio; these data support the high penetrance of Ψ at position 342. As for position 347, before the reaction, a U was called with ∼90% frequency, and after the reaction indels were observed with an ∼25% frequency with the remainder composed of U calls. This predicts ∼25% occupancy of Ψ at this site. This nanopore experiment predicts more Ψ at position 347 than the gel-based analysis by ∼2-fold.7 Reasons for the differences observed include the fact that the CMC reaction to locate Ψ requires treating the strands at high pH (10) and 37 °C for 3 h, which results in highly degraded RNA due to its sensitivity to strand break formation under these conditions; this is why CMC sequencing for Ψ is poorly quantitative and often irreproducible, and why bisulfite-based sequencing for these sites has prevailed as the better chemical reaction.39 Secondly, the E. coli previously studied and those in the present study are different strains grown under slightly different conditions. These differences likely explain the ∼2-fold difference in the occupancy of Ψ347 in the E. coli tmRNA. This result at position 347 supports the conclusion of control studies that found this approach could detect Ψ down to ∼20% (Fig. 3B). A final point regarding the pH 7 bisulfite reaction is that the other nucleotides in this region (A, G, C, U, and m5U) were sequenced similarly before and after the reaction, demonstrating the high degree of selectivity of the bisulfite reaction for Ψ at pH 7.
Fig. 4 Nanopore direct RNA sequencing before and after the pH 7 bisulfite reaction can detect low and high occupancy Ψ sites in E. coli tmRNA. (A) The hairpin loop containing the only modifications in the tmRNA.7 (B) The base-calling data for the nucleotides comprising part of the hairpin stem and the entire loop where the modifications reside in this RNA before and after the pH 7 bisulfite reaction.
For C before the reaction we found a low error of ∼10% (Fig. 5A), and after bisulfite assisted deamination the error remained below 20% (Fig. 5B). The slightly higher base-call error after the pH 5 bisulfite reaction was comprised of ∼10% of C calls as a result of incomplete reaction, and ∼10% indels that were present before the reaction. For m5C, the base-call error was <25% before the bisulfite reaction (Fig. 5C), as expected,16,17,25 and after the reaction, the error was >70% (Fig. 5D). After the reaction, the m5C sites did not yield quantitative levels of error that likely results from low conversion of m5C deamination to m5U under these conditions. For hm5C, before the bisulfite reaction the base-call error was <40% (Fig. 5E), and after the reaction the sites gave >80% base-calling error to readily reveal these sites in RNA (Fig. 5F). The bisulfite reaction at pH 5 for m5C and hm5C is a viable way to differentiate these two modifications from the parent C nucleotide (Fig. S9, ESI†). These data suggest quantification of the reaction will be challenging without further optimizations. Studies on biological RNA were not explored.
At present, two methods have been proposed to synthesize RNA standards null in modifications for comparison to biological datasets; one approach requires the ligation of a solid-phase-synthesized RNA to longer IVT-generated RNA strands, one sequence at a time,23,52 and the other uses cellular RNA for reverse transcription and PCR to yield DNA with the T7 promoter sequence to then re-synthesize RNA via IVT without modifications for comparison.12 Both approaches have their strengths and weaknesses. Synthesis of one mRNA modification at a time is very low throughput but can provide a well-defined control. The use of cellular RNA for cDNA synthesis to remake the RNA without modifications provides the best reproduction of a cell's RNA landscape that includes all the RNAs present and the cell-specific alternative splice forms of mRNA. However, this approach introduces reverse transcription errors particularly at some natural RNA modifications which are the sites of interest in epitranscriptomic studies (e.g., RNA base editing).53,54 Another approach for identification of RNA modifications in nanopore data is to study the RNA from cells with and without writer knockouts.21,38 This works well if the writer is known and knocking out the writer does not cause other biological impacts that interfere with the analysis.55,56
An appealing alternative is the application of chemical reagents to selectively modify target nucleic acid modifications, altering the nanopore signature, and it has been proposed by us and others.22,45,48,57,58 The bisulfite reaction at pH 7 to furnish stable adducts to Ψ is advantageous because the reaction conditions are fairly mild resulting in low degradation of the RNA (e.g., see Fig. 4B),39,47 unlike other approaches.27 The Ψ-(SO3−) adduct can minimize the sequence dependency in the nanopore base-calling errors (Fig. 3), which in turn minimizes the need for synthesizing such large libraries of benchmarking RNA strands for comparison to biological data. We studied E. coli rRNA as a test case for the bisulfite reaction and found the Ψ-(SO3−) adducts in many of the 10 Ψ sites produced enhanced base-calling error signatures (Fig. 3B and Fig. S9, ESI†); however, not all were able to be studied because the reaction did result in fewer successful sequence alignments as previously stated, which may result from the few clustered Ψ sites in the E. coli rRNAs. We did attempt the pH 7 bisulfite reaction on human rRNA followed by nanopore sequencing, but the results failed to align, likely as a consequence of the high density of Ψ adducts in this RNA failing to traverse and/or be analyzed by the nanopore sequencer; further studies were not conducted on these RNAs. The second demonstration of the reaction was to locate the high and low occupancy Ψ residues in E. coli tmRNA (Fig. 4). A key demonstration in this study was the low occupancy Ψ at position 347 before reaction was read predominantly as a U nucleotide, that is a false negative. This is likely because this Ψ is in a pyrimidine-rich k-mer (5′-ACΨCC) and resides at low occupancy. On the other hand, the bisulfite adduct created an indel signature that could be readily observed (Fig. 4B). 
This analysis demonstrates bisulfite adducts to Ψ can yield new signals to follow modification in k-mers where Ψ continues to code like a U. Application of this chemistry and nanopore sequencing on mRNA could result in success for detection and possibly quantification of Ψ sites via Ψ-(SO3−) adducts that change with cellular cues.17,25 Lastly, the bisulfite reaction at pH 5 can be used to locate both Ψ and m5C/hm5C (Fig. 3–5);42 however, a limitation to this approach is that the C nucleotides are converted to U nucleotides, which reduces the sequence complexity of the reads resulting in challenges for reference alignment.42,46
The Ψ-(SO3−) adduct additionally can differentiate U/C sequence variation sites that will give a mixture of U and C calls similar to a bona fide Ψ site (Fig. S10, ESI†). Naturally existing U/C variations will be false positives for Ψ. These natural variations are refractory to reacting with bisulfite and do not change when comparing sequencing data before and after the reaction (Fig. 3D); in contrast, Ψ sites will have altered base-calling behavior after the reaction to reveal them as actual modification sites (Fig. 3C). The present studies inspected the established U/C sequence variations found between the 7 operons for the rRNA strands in E. coli that have a 99.6% similarity.59 The recent release of the complete human genome sequence shows that humans have on average ∼400 ribosomal DNA sequences spread across multiple chromosomes that have sequence similarities ranging from 99.4–99.7%;60 the key point is that human ribosomal RNA will harbor many false positive Ψ sites as a consequence of the U/C variations that naturally exist in the sequence. Other sources of T/C sequence variation include C-to-U editing in mRNA,61 and natural U/C variations in coding portions of the genome.62 The bisulfite reaction at pH 7 to label Ψ provides a method to differentiate sequence variations from RNA modifications when inspecting nanopore direct RNA base-calling data.
The use of chemical tools for site-selective labeling of RNA modifications can introduce challenges. The reactions must be highly selective and not cause degradation of the fragile RNA strand, and bisulfite chemistry at pH 7 fits these requirements; however, our data on the E. coli rRNA do show decreased alignment success, suggesting that some degradation or too many adducts on the strand resulted in decreased alignment to the reference (Fig. S3; ESI†). The data reported found that the Ψ-(SO3−) adduct when passing through the nanopore sensor yields current levels that the base-calling algorithm fails to call, which was observed as the high indel frequencies reported (Fig. 3). We were curious what the features of the raw current vs. time traces were that led to the loss in ability to base call these sites. Our analysis of two different sequence contexts found that when the Ψ-(SO3−) adduct passed through the nanopore sensor the current levels were noisier in ∼50% of the events compared to those of the unadducted nucleotides (Fig. S11; ESI†). This is problematic because the available software for the analysis of nanopore data filters these data out (Fig. S11; ESI†), likely as a consequence of these traces presenting raw data similar to that coming from failing nanopores. Future work in the software domain will be required for additional analysis of these adducts to Ψ and likely other larger adducts naturally or synthetically placed on the RNA nucleotides. The noisy feature may also have a benefit for development of machine learning tools to look at nanopore direct RNA sequencing data to find Ψ, especially in sequence contexts in which Ψ is called as a U (Fig. 4, E. coli tmRNA Ψ347).
Chemical tools for site-specific labeling of Ψ expand past bisulfite to include the carbodiimide CMC that yields a stable N3 adduct to Ψ after a two-step reaction conducted under alkaline conditions known to degrade RNA.27,63 The CMC-Ψ adduct is likely too big (MW: bisulfite = 80; CMC = 252) to be successfully sequenced with the nanopore, in addition to the RNA degradation issue. Acrylonitrile and methyl vinyl sulfone are alternative Ψ alkylators for which the acrylonitrile adducts have been analyzed by nanopore sequencing;22,64 however, these reagents can also react with inosine3,64 and 4-thiouridine,65 and therefore, they are not site-selective reagents. There exists a library of chemical tools with varying degrees of selectivity for reacting with RNA modifications;66 these may become part of the toolbox for mapping RNA modifications in nanopore direct RNA sequencing data. As demonstrated in the present studies, site-selective chemical modification of Ψ provides data that minimizes or eliminates false positives at U/C sequence variation sites and false negatives at Ψ sites that code like U. Future advances in computational tools will need to occur to fully release the potential of using chemical tools to advance nanopore sequencing for epitranscriptomic modifications.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3cb00081h
This journal is © The Royal Society of Chemistry 2023