Minimal RNA self-reproduction discovered from a random pool of oligomers

The emergence of RNA self-reproduction from prebiotic components would have been crucial in developing a genetic system during the origins of life. However, all known self-reproducing RNA molecules are complex ribozymes, and how they could have arisen from abiotic materials remains unclear. Therefore, it has been proposed that the first self-reproducing RNA may have been short oligomers that assemble their components as templates. Here, we sought such minimal RNA self-reproduction in prebiotically accessible short random RNA pools that undergo spontaneous ligation and recombination. By examining enriched RNA families with common motifs, we identified a 20-nucleotide (nt) RNA variant that self-reproduces via template-directed ligation of two 10 nt oligonucleotides. The RNA oligomer contains a 2′–5′ phosphodiester bond, which typically forms during prebiotically plausible RNA synthesis. This non-canonical linkage helps prevent the formation of inactive complexes between self-complementary oligomers while decreasing the ligation efficiency. The system appears to possess an autocatalytic property consistent with exponential self-reproduction despite the limitation of forming a ternary complex of the template and two substrates, similar to the behavior of a much larger ligase ribozyme. Such a minimal, ribozyme-independent RNA self-reproduction may represent the first step in the emergence of an RNA-based genetic system from primordial components. Simultaneously, our examination of random RNA pools highlights the likelihood that complex species interactions were necessary to initiate RNA reproduction.


Introduction
The rst genetic system before the emergence of life may have been based on RNA, because RNA can simultaneously carry genetic information and catalyze chemical reactions. 1,2 This "RNA World" hypothesis is supported by the observation that all genetically encoded proteins are synthesized by RNA in the ribosome. 3 A crucial aim in the quest for an RNA-based genetic system is to nd self-reproducing RNA molecules. 4-6 A potential mechanism for RNA reproduction is template-directed polymerization of nucleotides, i.e., replication, as observed in extant life. However, despite signicant progress in improving nonenzymatic or ribozyme-catalyzed RNA polymerization, 7-10 the self-replication of these systems remains challenging. Previous studies have therefore explored alternative, simpler mechanisms for RNA self-reproduction through the assembly of oligonucleotides. 6,11,12 In view of recent clarication, we use the term "reproduction" to denote RNA copying in general by distinguishing canonical "replication" that follows templatedirected polymerization chemistry. 6,12 RNA reproduction has been demonstrated for ligase and recombinase ribozymes. [13][14][15][16] For example, a ligase ribozyme derived from the R3C ligase ribozyme 17 catalyzes the joining of two RNA substrates as a template to form a sequence identical to itself. 13 This ribozyme is the simplest self-reproducing RNA known to date in terms of its length (61 nt) and the number of components (two fragments: 13 and 48 nt). However, the ribozyme is still relatively large and was rationally designed, and it remains unclear how such a ribozyme and its components could have been prevalent in prebiotically accessible RNA mixtures, which were likely dominated by shorter (up to ∼20 nt) and random oligonucleotides. 18,19 The ligase ribozyme also requires 5 ′ -triphosphate activation, which necessitates an additional set of complex reactions. 
20 Consequently, it has been proposed that the template-based self-reproduction of short RNA molecules independent of complex ribozymes may have emerged rst in the RNA World. 21,22 Self-reproduction of short nucleic acids has been studied mainly using DNA. Previous studies demonstrated the autocatalytic reproduction of chemically modied DNA oligonucleotides through template-directed ligation, although the reproduction was severely hindered by two tightly bound templates (or a template and its identical product aer ligation). [23][24][25] A recent study employed temperature cycling to overcome such template inhibition in the reproduction of chemically activated DNA. 26 Despite these efforts with DNA, the self-reproduction of short RNA oligomers is currently missing. Moreover, temperature cycling would be incompatible with RNA because, unlike DNA, RNA easily degrades at high temperature, which is accelerated by divalent metal ions 27,28 that commonly enhance ribozyme catalysis 28,29 as well as template-directed RNA synthesis. 30,31 Template-directed ligation of short RNA is achieved in the laboratory using terminal activation such as with 2 ′ ,3 ′ -cyclic phosphate (>p), 30 which readily forms in prebiotically plausible environments, 32,33 while recombination occurs directly-or in combination with spontaneous >p formation through hydrolysis of RNA, termed a/a ′ mechanisms. 31,34 Notable are recent studies that demonstrated that pools of short random RNA can undergo diverse intermolecular ligation and recombination, presumably in a templated manner. 31,35 In these populations, RNA products that form efficiently or that self-amplify are expected to be enriched. Thus, a close examination of the enriched products may lead to the discovery of efficient RNA reproduction via template-directed ligation or recombination. The identication of such reproducing RNA would also provide insights into the likelihood of the emergence of selfreproduction out of random chemistry.
In this study, we rst examined spontaneous ligation and recombination reactions in pools of short random RNAs and found that they can be detected more quickly than previously demonstrated. We observed the enrichment of RNA families with common motifs in multiple RNA pools. Subsequent analyses of the most enriched products and their variants led us to nd a short (20 nt) RNA oligomer that can self-reproduce via template-directed ligation of two 10 nt substrates. The RNA contains a 2 ′ -5 ′ phosphodiester bond, a linkage usually generated during non-enzymatic RNA synthesis. [36][37][38] Partly due to the non-canonical linkage, the RNA circumvented its dimerization and displayed a potential for exponential reproduction in an isothermal environment, although the restricted formation of an active complex with the substrates limited the amplication. Its autocatalytic properties and structures are somewhat similar to those of the previously developed self-reproducing ligase ribozyme. 13 These results demonstrate the rst example of minimal RNA self-reproduction independent of a ribozyme and also help understand the dynamics of primordial, random RNA pools.

Results and discussion
Incubation of short random RNA pools
We investigated reactions in fully random 20 nt RNA (N20), which was previously shown to undergo both ligation and recombination if pre-activated with >p. 35 We prepared N20 and N20>p pools containing ∼3 × 10^14 molecules to cover all possible ∼10^12 sequences of 20 nt with redundancy (∼300 copies). Previous studies detected ligation and recombination in 16-20 nt random RNA pools (5-100 μM) only after incubation for months or longer in ice. 31,35 However, we found that, in the presence of a high concentration (100 mM) of MgCl2, which promotes >p-mediated template-directed ligation and recombination, 30,31 both N20 and N20>p pools (50 μM) generated detectable >20 nt products after just a 2-day incubation, as visualized by denaturing polyacrylamide gel electrophoresis (PAGE) (Fig. 1A). Note that degraded fragments in the initial pools may also have contributed to the reactions.
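The coverage figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `coverage` is hypothetical, and it assumes an idealized pool in which every 20 nt sequence is equally likely.

```python
# Sequence-space coverage of a fully random 20 nt pool (idealized, uniform pool).
LENGTH = 20             # oligomer length in nucleotides
POOL_MOLECULES = 3e14   # approximate pool size stated in the text

def coverage(length, molecules):
    """Return (number of distinct sequences, mean copies per sequence)."""
    distinct = 4 ** length          # four possible nucleotides per position
    return distinct, molecules / distinct

distinct, copies = coverage(LENGTH, POOL_MOLECULES)
print(f"{distinct:.2e} distinct sequences, about {copies:.0f} copies each")
```

Under these assumptions the pool holds roughly 1.1 × 10^12 distinct sequences at a few hundred copies each, in line with the ∼10^12 sequences and ∼300 copies stated in the text.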
To examine sequences enriched in the random RNA pools, we excised the elongated products (ca. 21-45 nt) in both N20 and N20>p pools from a denaturing polyacrylamide gel and subjected them to RT-PCR and high-throughput sequencing (HTS). The RT-PCR was performed using the SMARTer technology, via poly-A tailing followed by template switching during reverse transcription. We detected PCR products for both N20 and N20>p pools only if they were pre-incubated for two days, confirming recombination and ligation during the incubation (Fig. 1B). From the HTS data, we analyzed 374 357 and 412 461 reads of 21-45 nt products that were detected at least twice for the N20 and N20>p pools, respectively. The majority of the products derived from the N20 pool were 24-39 nt (95%) with a sharp drop-off above 39 nt (Fig. 1C), indicating that they were generated primarily by recombination, because a single recombination of two 20 nt RNAs could lead to a 21-39 nt product. On the other hand, the products in the N20>p pool were predominantly 24-40 nt (98%) with a sharp peak at 40 nt (Fig. 1C), suggesting that both recombination and ligation operated in the pool. It should be noted that recombination could occur either directly or indirectly via ligation on >p of a hydrolyzed RNA. 31 The nucleotide compositions in the <40 nt products of the random RNA pools displayed a slight enrichment in G on both sides of a putative ligation junction between a cleaved RNA>p (<20 nt) and a 20-mer (Fig. 1D and S1,† in the direction indicated by black arrowheads), which was more evident in the N20 pool than in the N20>p pool. The results contrast with the previous studies that incubated random RNAs in ice and without MgCl2, where cytosine and/or uracil were particularly enriched as putative phosphate donors. 31,35 The predicted secondary structures of the products tended to be more stable than those of random sequences of the same sizes and nucleotide compositions (Fig. S2†), consistent with a previous study. 35
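The length argument above can be made explicit with a small enumeration. This is a simplified sketch under stated assumptions, not the authors' analysis: it assumes that a single recombination joins a 5′ fragment of k nt (1 ≤ k ≤ 19, ending in >p after hydrolysis) to an intact 20-mer, and that ligation joins two intact 20-mers. The function names are hypothetical.

```python
def recombination_lengths(n=20):
    """Product lengths from joining a k-nt 5' fragment (1 <= k < n) to an intact n-mer."""
    return [k + n for k in range(1, n)]

def ligation_length(n=20):
    """Product length from end-to-end ligation of two intact n-mers."""
    return 2 * n

rec = recombination_lengths()
print(min(rec), max(rec), ligation_length())  # prints: 21 39 40
```

In this simple model, a 40 nt product can only arise from ligation of two full-length 20-mers, which is why a sharp peak at 40 nt in the N20>p pool points to ligation, while the 21-39 nt range is consistent with recombination.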

Identication of enriched RNA families
If the RNA products were synthesized by previously identified ligation or recombination mechanisms, 30,31,34 20 nt sequences in the original pools should remain intact at the 5′ or 3′ end of the products, consistent with the enrichment of specific nucleotides at the putative junctions (Fig. 1D and S1†). Thus, we grouped the most abundant 10 000 products from each pool of N20 and N20>p into families based on sequence similarity around the 5′ or 3′ terminus. Within each family, products differ from the most abundant sequence by seven or fewer edits over the 21 terminal nucleotides used for grouping. When grouping N20-derived products by their 3′ ends, we observed a highly enriched family, named N20-f1, that comprised ∼1.5% of all analyzed products. This family was 2.4-fold more abundant than the second most enriched family (Fig. 2A). The N20-f1 family consists of 93 sequences that were well aligned at the 3′ end (Fig. 2B). More than 80% of them contained common nucleotides at positions 1, 2, 4, 13-15, 17, and 19-25 from the 3′ end (indicated by the black lines), while nucleotides at other positions were relatively random. Likewise, when grouping the N20>p-derived products by their 3′ ends, we found an enriched family with a similar set of sequences, N20>p-f1 (Fig. 2B). Although N20>p-f1 was the most abundant family in the pool, its frequency was comparable to that of other low-rank families, and it comprised ∼0.6% of the analyzed products (Fig. 2A). Seventeen sequences were found in both N20-f1 (18%) and N20>p-f1 (43%). The enrichment of specific families was less clear when grouped by the 5′ end (Fig. 2A). Other high-ranked families are described in Fig. S4;† some of them have similar nucleotide compositions to N20-f1 and N20>p-f1. We also note that in the same analyses using the synthetic sequences (Fig. S2†), unsurprisingly, the most enriched families represented only ∼0.2% for each set, and their components did not align at all.
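The family grouping described above can be sketched as a greedy clustering by edit distance. The text does not specify the exact algorithm, so the following is an assumption-laden illustration: it uses Levenshtein distance, seeds each family with the most abundant unassigned sequence, and compares only the 21 nucleotides at the 3′ end. The function names and the toy sequences are hypothetical, not data from the study.

```python
def levenshtein(a, b):
    """Edit distance (substitutions, insertions, deletions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]

def group_by_3prime_end(counts, window=21, max_edits=7):
    """Greedily group sequences whose 3'-terminal window is within max_edits of a seed."""
    ordered = [s for s, _ in sorted(counts.items(), key=lambda kv: -kv[1])]
    assigned, families = set(), []
    for seed in ordered:                 # most abundant unassigned sequence seeds a family
        if seed in assigned:
            continue
        members = [seed]
        assigned.add(seed)
        for s in ordered:
            if s not in assigned and levenshtein(seed[-window:], s[-window:]) <= max_edits:
                members.append(s)
                assigned.add(s)
        families.append(members)
    return families

# Toy read counts (made up for illustration only).
toy_counts = {
    "GG" + "ACGUACGUACGUACGUACGUA": 10,
    "CCC" + "ACGUACGUACGUACGUACGUU": 5,
    "A" + "U" * 21: 3,
}
families = group_by_3prime_end(toy_counts)
print([len(f) for f in families])
```

Grouping by the 5′ end would follow the same pattern with `seed[:window]` and `s[:window]` in place of the 3′-terminal slices.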
RNA sequences in N20-f1 and N20>p-f1 displayed a common stem-loop structure at positions 11-27 nucleotides from the 3′ end, with five consecutive base pairs and a seven-base loop (Fig. 2C). The stem-loop region contained the majority of the commonly observed nucleotides, as represented in the most dominant sequence in N20-f1, named f1-1. Secondary structural prediction showed the same stem-loop structure at the same positions in 68% and 52% of ≥27 nt sequences in N20-f1 and N20>p-f1, respectively. In addition, only 7% of the RNAs in either family could form more than five base pairs in the stem region, underscoring the dominance of the specific stem-loop structure.
The enrichment of RNA families with shared nucleotides and structures in the random RNA pools encouraged us to investigate how these sequences could have been synthesized. As they were observed in both N20 and N20>p pools, they should form via recombination. The conserved 3′ region in the RNA of varying lengths, in conjunction with the current understanding of recombination mechanisms, suggests a two-step a/a′ recombination, wherein hydrolysis forms >p at the 3′ end of one RNA, followed by ligation of the 5′-OH of another RNA to the >p. 31 If the ligating RNA is 20 nt long, as in the original pools, the probable recombination junction was between the observed C and U at positions 20 and 21 from the 3′ end. We first tested whether f1-1 (29 nt) can form through this mechanism by splitting f1-1 into the first 9 nt carrying >p (i.e., fragment A) and the remaining 20 nt (i.e., fragment B) (Fig. 3A) so they could undergo ligation, the second step of a/a′ recombination. In a 2-day incubation of A and B, we detected f1-1 with ∼0.2% yield (Fig. 3B and C). It is important to note that this reaction may not strictly reflect what happened in the original random RNA pools because other RNAs could have been involved.
We also tested recombination directly by attaching 11 nt random nucleotides to A (A_N11). Incubation of A_N11 with B did generate a distinguishable product whose length is similar to, but slightly longer than, f1-1 (Fig. S5A and S5B†). Sequence analysis of the product revealed that it was predominantly f1-1 with a G inserted between positions 20 and 21, named f1-1_G (Fig. S5C†). We confirmed that the addition of a G at the 3′ end of A (A_G) (Fig. 3A) significantly enhanced its ligation with B (Fig. 3B and C). We also examined the effect of other nucleotides A, U, or C at the same position (A_A, A_U, or A_C) for ligation with B. The fragment A_A exhibited improved ligation, but less efficiently than A_G, whereas A_U and A_C did not show enhanced ligation (Fig. S6†). These variant RNAs were not detected in the products derived from the N20 and N20>p pools, despite only a single nucleotide difference from f1-1 and high capacity for synthesis, highlighting the difficulty of understanding reactions in random RNA mixtures based on an examination of only a small number of isolated RNAs.

Discovery of a minimal self-reproducing RNA
We noticed that the common stem-loop structure in N20-f1 and N20>p-f1 (Fig. 2C) and their variants with the G insertion could catalyze the ligation between the 5′ and 3′ regions of themselves as a template, i.e., self-reproduction (Fig. S7A,† 4A, and B). In particular, nucleotide pairings around the ligation junctions upon ternary complex formation could enhance the ligation by positioning the termini of the two RNA substrates more proximally. We tested this hypothesis using the stem-loop regions of f1-1 and its variant with G at the ligation site, named T and T_G, respectively (Fig. S7A† and 4A). We incubated 20 μM each of the 5′ regions with >p (A or A_G) and the 3′ region (B_S, the first 10 nt of B) for 2 days in the absence or presence of 20 μM T or T_G. Whereas T improved ligation between A and B_S only slightly (∼1.4-fold) (Fig. S7B and S7C†), T_G enhanced ligation between A_G and B_S far more noticeably (∼21-fold) (Fig. 4C and D), demonstrating possible self-reproduction. We also tested the same reaction using A_A, A_U, and A_C and corresponding templates (T_A, T_U, and T_C, respectively) instead of A_G and T_G (Fig. S8†). Although T_A and T_U catalyzed ligations between A_A or A_U and B_S, their spontaneous ligations relative to the template-directed reactions were more productive than that of A_G and B_S. The fragment T_C did not affect the ligation between A_C and B_S.
Ligation between >p of A_G and B_S could generate two possible phosphodiester bonds, either 3′-5′ or 2′-5′ linkages (Fig. 4A). Using ribonuclease (RNase) T1, which selectively cleaves G3′-p-5′N linkages of unpaired nucleotides, we determined that the ligation catalyzed by T_G primarily formed a 2′-5′ linkage (Fig. S9†). Next, we prepared T_G containing a 2′-5′ linkage at the ligation junction and named it T_G′. We confirmed that T_G′ catalyzed the same ligation reaction to generate more of itself (Fig. 4C and S9†), demonstrating true self-reproduction (Fig. 4B), although the extent of catalysis was approximately half that of T_G (Fig. 4D). Whereas previous studies found that RNA containing a fraction of 2′-5′ linkages can assist nonenzymatic RNA polymerization 39 and retain functions as aptamers or ribozymes, 40 our study further showed that such RNA can also self-reproduce. A time course experiment revealed the gradual appearance of T_G′, with the reaction slowing after a 2-day (48 h) incubation (Fig. 4E, F and S10†). The yield of T_G′ increased with the concentration of initial T_G′, demonstrating its autocatalytic ability. The ligation between >p of A_G and B_S was confirmed by control reactions performed in the absence of >p or B_S, which showed negligible T_G′ reproduction (Fig. S11†). We also found that the self-reproduction of T_G′ was substantially enhanced at a high concentration of Mg2+ (100 mM MgCl2) and temperatures around 22°C (Fig. S12†), the condition used for incubating the original random RNA pools (Fig. 1A). Next, we examined the formation of higher-order complexes among A_G, B_S, and T_G′ by native PAGE after co-incubating one, two, or three of these RNAs containing fluorescently labeled T_G′ (FAM-T_G) or A_G (FAM-A_G) for 6 h (Fig. 5A). In this experiment, A_G contained a monophosphate (-p) instead of >p at the 3′ end to preclude ligation to B_S (Fig. S11†).
When incubating only T_G′, we found that the majority of T_G′ existed as a T_G′ monomer, with only a fraction (∼11%) forming a T_G′·T_G′ dimer (Fig. 5B). A T_G′·T_G′ dimer is presumably a simple self-complementary template dimer (Fig. S14†), but two T_G′ molecules may also interact by forming a kissing loop. The prevention of the formation of a T_G′·T_G′ dimer was partly due to the 2′-5′ linkage, which significantly reduced the dimerization of T_G′ (Fig. S13†), consistent with previous studies showing the diminished thermal stability of RNA duplexes in the presence of 2′-5′ linkages. 40,41 The amount of T_G′·T_G′ increased to 23-27% in the presence of either A_G or B_S. However, in the presence of both A_G and B_S, the total amount of the T_G′·T_G′ dimer and a T_G′·A_G·B_S ternary complex decreased to ∼3.8%. When incubating the three RNA molecules with FAM-A_G, we detected the formation of a comparable amount of the T_G′·A_G·B_S complex. In addition, we found that the majority (∼80%) of A_G was bound to B_S, and thus most of the substrates were not freely available, which could explain the low percentage of the T_G′·A_G·B_S complex formation and the limited self-reproduction of T_G′ (Fig. 4E). The high availability of T_G′ as a monomer implies its potential to undergo non-linear amplification by circumventing the strong association of two self-complementary T_G′ molecules that form after ligation of A_G and B_S (Fig. 4B). A common way of examining such a possibility for a template (or an autocatalyst) is to fit the initial rate of its own production to the model of self-reproduction: 13,14,23,24,42

$$\frac{d[\mathrm{T}]}{dt} = k_a[\mathrm{T}]^{p} + k_b$$

where [T] is the template concentration (here, that of T_G′), and k_a, k_b, and p represent the autocatalytic rate enhancement, the background reaction rate, and the reaction order, respectively. We doped varied concentrations of T_G′ into a mixture of fixed concentrations of A_G and B_S and investigated the enhancement of the initial reaction rate (Fig. 4F and 5C).
The concentrations of T_G′ were chosen so that the fraction of the T_G′·A_G·B_S complex was sufficiently small compared with the total amount of substrates 42 (cf. Fig. 5B), as in a previous study. 13 As expected, the initial rate of T_G′ formation increased with the initial concentrations of T_G′. Furthermore, the initial rates can be fit well (R^2 = 0.996) with the self-reproduction equation by assuming p = 1, corresponding to exponential growth. This result indicates the potential of T_G′ to undergo exponential self-reproduction. We estimated k_a = 0.0011 ± 0.000069 h^−1 and k_b = (0.0045 ± 0.00070) × 10^−6 M h^−1. The autocatalytic efficiency (k_a/k_b) 42 of T_G′ (2.4 × 10^5) is comparable to or lower than that of a much larger recombination or ligase ribozyme, 13,14 while higher than that of DNA-based self-reproduction systems, 23-25 with the caveat that they have smaller reaction orders (p ≈ 0.5).
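To make the fitting step concrete, the sketch below evaluates the initial-rate model with p = 1 (rate = k_a·[T] + k_b) and recovers k_a and k_b by ordinary least squares. It is a hedged illustration only: the template concentrations are hypothetical placeholders, the "rates" are generated from the reported parameter values rather than taken from the experimental data, and the helper names are invented for this example.

```python
K_A = 0.0011       # h^-1, reported autocatalytic rate constant
K_B = 0.0045e-6    # M h^-1, reported background rate

def initial_rate(template_conc, k_a=K_A, k_b=K_B, p=1.0):
    """Initial rate of template formation: k_a * [T]^p + k_b."""
    return k_a * template_conc ** p + k_b

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical doped template concentrations in M (not the experimental values).
template_conc = [0.0, 1e-6, 2e-6, 4e-6]
rates = [initial_rate(t) for t in template_conc]   # synthetic, noise-free rates

k_a_fit, k_b_fit = fit_line(template_conc, rates)  # slope = k_a, intercept = k_b when p = 1
efficiency = k_a_fit / k_b_fit                      # autocatalytic efficiency, in M^-1
print(f"k_a = {k_a_fit:.4g} h^-1, k_b = {k_b_fit:.3g} M h^-1, k_a/k_b = {efficiency:.2g} M^-1")
```

With the reported constants, the ratio k_a/k_b evaluates to about 2.4 × 10^5 M^−1, matching the autocatalytic efficiency quoted in the text. For p ≠ 1 the model is no longer linear in [T], and a nonlinear fit would be needed instead.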
The RNA molecule T_G′ shares many similarities with the previously engineered 61 nt self-reproducing ligase ribozyme, 13 although they catalyze different ligation chemistries (Fig. S14†). The ribozyme catalyzes the attack of the 3′-OH of an RNA substrate on a 5′-triphosphate of another substrate in a template-directed manner and generates a ligated product identical to the ribozyme. Its self-reproduction was limited because of the strong association of the two substrates, as is also observed in T_G′ (Fig. 5A). Nevertheless, both systems exhibited a high apparent autocatalytic reaction order (∼1) in an isothermal environment as a consequence of the weak self-binding of the templates, compared to other nucleotide-based template-directed self-reproduction systems that showed an order of ∼0.5. 23-25 This could be partly attributed to the intramolecular structural formation of a template, G:U wobble pairs that can facilitate template-directed ligation while supporting dissociation of a duplex, 43 and multiple thermodynamically unfavorable bulges in a dimer, 44 all of which are commonly observed in both T_G′ and the ligase ribozyme (Fig. S14†).
The limited self-reproduction of T_G′ resulted from multiple factors. First, the 2′-5′ linkage, while reducing the dimerization of T_G′, decreased the ligation efficiency (Fig. 4D). Second, T_G′ did not efficiently form an active complex with the substrates A_G and B_S because most of the two substrates bound to each other and were not freely available (Fig. 5A). These limitations may be overcome if strong chemical activation is adopted instead of >p or in environments that periodically experience low pH, high temperatures, or low MgCl2 concentrations, which destabilize RNA-RNA interactions (e.g., the association of substrates). 45-47 Alternatively, as demonstrated for a self-reproducing ligase ribozyme, 48 directed evolution with T_G′ as the parent RNA may also identify highly efficient reproduction of oligonucleotides in a constant environment. It was shown that only a slight difference, including two critical mutations, was sufficient to convert the original ligase ribozyme 13 (Fig. S14†) into a continuously self-reproducible RNA. 49 Thus, it is conceivable that there may be a short RNA oligonucleotide capable of unlimited self-reproduction in a sequence space accessible from T_G′ by natural selection.

Conclusions
We demonstrated a form of minimal RNA self-reproduction driven by prebiotically plausible chemistry, providing a potential missing link between abiotic oligomers and the eventual emergence of a genetic system. The 20 nt RNA, T_G′, accelerated >p-dependent ligation between two 10 nt substrates, A_G and B_S, as a template for generating identical T_G′ molecules (Fig. 4C and S9†). Such self-reproduction of RNA could have occurred in the RNA World because RNA of these lengths can be generated nonenzymatically, 18,19 and >p can also be readily formed by spontaneous RNA hydrolysis or with prebiotically plausible reagents. 32,33 Although >p is eventually hydrolyzed to monophosphates, in situ reactivation back to >p 33 could extend the self-reproduction of T_G′, which is currently limited (Fig. 4E). The self-reproduction was also supported by a 2′-5′ phosphodiester bond, which is thought to have been prevalent in primordial RNA pools as generated in typical non-enzymatic RNA synthesis. 36-38 Short RNA molecules capable of self-reproduction by template-directed ligation, as shown in the present study, have been proposed as the earliest stage toward the evolution of complex replication ribozymes. 21,22 Our results complement this view and help delineate the development of RNA-based genetic systems during the origins of life. Our results also give insights into the dynamics of short random RNA mixtures. From a completely random pool of 20-mers, we identified a discrete class of related, enriched sequences of which f1-1 appeared to be a canonical representative. The fragment T_G′ is a truncated version of f1-1_G, a single-mutation variant of f1-1. Both f1-1_G and f1-1 were accessible products in both N20 and N20>p pools explored in the present study. However, while f1-1 was highly enriched in both random RNA pools along with many related sequences (e.g., N20-f1 and N20>p-f1), f1-1_G was undetected even at a low frequency.
On the other hand, biochemical analyses revealed the superiority of f1-1_G to f1-1 for its formation through simple ligation of two substrate fragments (Fig. 3B and C). This discrepancy may imply the involvement of other RNA species in the synthesis of f1-1 in the random RNA pools. In the chaos of a primordial soup, a complex ecology of chemical reactions likely gave rise to enriched sets of species. 50,51 A previous study also reported the inefficient synthesis of some products isolated from random RNA pools. 35 Altogether, our results highlight the difficulty of inferring dominant reactions in random RNA mixtures from the analyses of isolated sequences. Nevertheless, the information obtained from examining the random RNA products was valuable in the discovery of the minimal self-reproducing RNA, which exhibited its highest activity in the original environment to which the random RNA pools were exposed (Fig. S12†). Future experiments exploring the synthesis of f1-1, f1-1_G, or T_G′ in combination with random RNA mixtures would give more insights into the likelihood of the emergence of self-reproduction in a primordial RNA soup.

Data availability
The data supporting the ndings of this study are available from the corresponding author upon reasonable request.

Author contributions
R. M. and N. I. designed the project. R. M. performed experiments, analyzed data, and wrote the paper with comments from N. I.

Conflicts of interest
There are no conicts to declare.