Open Access Article
Gregor Bajc a, Anja Pavlin a, Małgorzata Figiel b, Weronika Zajko b, Marcin Nowotny b and Matej Butala *a
a Department of Biology, Biotechnical Faculty, University of Ljubljana, 1000 Ljubljana, Slovenia. E-mail: matej.butala@bf.uni-lj.si
b Laboratory of Protein Structure, International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
First published on 18th November 2024
DNA molecules are a promising data storage medium for the future; however, effective de novo synthesis of DNA using an enzyme that catalyzes the polymerization of natural nucleoside triphosphates in a user-defined manner, without the need for multiple injections of polymerase, remains a challenge. In the present study, we demonstrated that the bacteriophage abortive infection system reverse transcriptase AbiK from Lactococcus lactis facilitates such an approach. We employed surface plasmon resonance to monitor the polymerization of the DNA strand with a user-defined sequence of multiple segments through a sequential buffer exchange process. Using this method, we synthesized synthetic DNA with segments of random length and a sequence consisting of only three of the four natural nucleotides. The information is encoded using the absence of one nucleotide in each segment. We demonstrated that synthetic DNA can be stored on the chip, and when the DNA is released from the chip, the second strand can be synthesized and read by sequencing. Our setup facilitates a writing speed of one nucleotide in less than 1 s and holds enormous potential for synthesizing DNA for data storage.
Here, we present an innovative approach based on the catalytic activity of AbiK, a reverse transcriptase of the phage abortive infection system of Lactococcus lactis. AbiK, a plasmid-encoded protein comprising 599 amino acids (molecular size, 71.4 kDa; isoelectric point, 7.98), is active in its homohexameric form.14–16 It catalyzes the protein-primed, DNA template-independent synthesis of long ssDNA strands with random sequences, which confer resistance to phage infection.17 The first nucleotide is covalently bound to the priming tyrosine in the amino-terminal region of the finger subdomain, and subsequent nucleotides are incorporated with equal frequency.16,18 To our knowledge, previous studies have not demonstrated the ability of AbiK to synthesize oligonucleotides or polynucleotides in a controlled manner. In the present study, we used surface plasmon resonance (SPR) for sequential buffer exchange to control the DNA polymerization activity of AbiK in real time using natural nucleoside triphosphates. The results indicate that this system can be used to achieve the template-independent synthesis of ssDNA carrying a custom order of segments, each with a sequence composed of three of the four nucleotides. This encoding method stores information in the absence of one of the four nucleotides in each segment. We illustrate the technique by writing the word “DNA” using a codec in which the letters of the alphabet are written in nucleobases.
000 response units. The data were analyzed using the Biacore T200 evaluation software (GE Healthcare, Chicago, Illinois, USA).
Fig. 2 Encoding information in DNA using the bacteriophage abortive infection system reverse transcriptase AbiK. (a) Schematic representation of the workflow depicting the synthesis of DNA with the sequence encoding the word “DNA” twice, the release of single-stranded DNA (ssDNA) from AbiK by proteinase K treatment followed by the temperature inactivation of the protease, the synthesis of the second DNA strand by the Klenow fragment, temperature inactivation of the Klenow fragment, and the nanopore sequencing of the synthesized DNA (created with BioRender, https://www.biorender.com/). (b) Sensorgram depicting the polymerization of ssDNA encoding “DNA” and “DNA”, followed by the injection of dNTPs or dGTP (G). Four different nucleotide mixtures containing three of the four nucleotides (dATP, dTTP, dGTP, or dCTP; mixtures: ACG, TCG, ATC, or ATG) were injected over immobilized AbiK (∼20 000 RU) in a sequence encoding the word “DNA”. A dilution of the single nucleotide dGTP was injected in the middle and at the end of the custom sequence. A mixture of all four nucleotides (dNTPs) was injected at the beginning and end of the experiment. All nucleotides or nucleotide mixtures were injected over AbiK at a concentration of 100 μM and a flow rate of 40 μL min−1 for the number of seconds indicated under the marked nucleotide or nucleotide mixture in the sensorgram. The experiment was performed in a running buffer containing 20 mM Tris (pH 8.3), 140 mM NaCl, 2 mM MgCl2, and 0.005% P20. The colored rectangles on the sensorgram represent selected sequences of the polymerized DNA. The detailed nucleotide sequence of DNA fragment 1 is shown in panel (c) and the sequences of the other fragments in ESI† Table S2. The positions of the nucleotide sequences encoding the letters in fragments numbered 1–4 are delineated next to the sensorgrams. (c) Nucleotide sequence of synthetic DNA fragment 1 that carries nucleotides encoding the word “DNA”. The different sections of DNA are color-coded according to the nucleotide triplet: ATC in green, TCG in blue, ACG in orange, and ATG in red. The nucleotide A shown in black is likely a result of a sequencing error.
To store the information in synthetic DNA, we created a codec in which each letter of the alphabet is encoded by a unique combination of three DNA segments, with each segment containing only three of the four natural dNTPs (ESI† Table S1). Next, we prepared four 100 μM mixtures containing three dNTPs in the running buffer, in which either dATP, dCTP, dGTP, or dTTP was missing. Each mixture of nucleotides was injected at a flow rate of 40 μL min−1 as a substrate for AbiK to generate an ssDNA with the desired sequence encoding the word “DNA”. The word was written twice, and a homopolymeric DNA segment containing only guanine nucleobases followed the word, which facilitated the binding of an 18-mer poly-C oligonucleotide and subsequently the synthesis of the complementary strand using the Klenow fragment of Escherichia coli DNA polymerase I. At the beginning and end of the experiment, a mixture of all four nucleotides (100 μM each) was injected to generate short flanking sequences.
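The codec idea can be sketched in a few lines of Python. The four mixtures and the three-segments-per-letter rule come from the text; the letter assignment below is purely hypothetical, because ESI Table S1 is not reproduced here. The sketch assumes that two neighboring segments within a letter must lack different nucleotides (otherwise they would merge into one indistinguishable segment), which leaves 4 × 3 × 3 = 36 usable triples, enough for 26 letters.

```python
from itertools import product
from string import ascii_uppercase

# Three-nucleotide mixtures from the text, keyed by the nucleotide that is absent.
MIXTURES = {"A": "TCG", "C": "ATG", "G": "ATC", "T": "ACG"}


def build_codec():
    """Map each letter to a triple of 'absent nucleotide' symbols.

    HYPOTHETICAL assignment: the real mapping is in ESI Table S1.
    Triples whose neighboring segments lack the same nucleotide are skipped.
    """
    triples = [
        t for t in product("ACGT", repeat=3)
        if t[0] != t[1] and t[1] != t[2]
    ]
    return {letter: "".join(t) for letter, t in zip(ascii_uppercase, triples)}


def encode(word, codec):
    """Return the ordered list of mixtures to inject for `word`."""
    return [MIXTURES[absent] for letter in word.upper() for absent in codec[letter]]


codec = build_codec()
injections = encode("DNA", codec)  # nine mixtures, three per letter
```

Note that this sketch does not check the boundary between two letters: if one letter ends with the same mixture that the next letter starts with, the two segments would merge, so a real scheme would need a separator or a constrained assignment.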
The SPR technology measures the change in refractive index at the surface of the SPR chip upon the binding of molecules, which is expressed in response units (RUs). We immobilized approximately 20 000 RU of AbiK, which corresponds to ∼20 ng of protein per square millimeter of the SPR chip surface. This estimate is based on a previous finding that 1000 RU corresponds to a density of 1 ng mm−2 for globular proteins.19 To test whether the immobilized enzyme retains DNA synthesis activity and whether it accepts each of the four natural nucleotides, we separately injected 100 μM dATP, dCTP, dTTP, or dGTP, with a wash step between injections. The injection of each nucleoside triphosphate resulted in an increase in the measured RU (∼80 to 250 RU per nucleotide), which was due to an increase in the local mass on the surface of the SPR chip, indicating that the nucleotides had been incorporated into the DNA chain (Fig. 1a). After the dNTPs were injected, the response remained stable, suggesting that the nucleoprotein complex detached only minimally from the chip. The RU signal decreased by 50% after the injection of benzonase (20 U mL−1 for 600 s), indicating nuclease activity.
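The RU-to-density conversion above is simple arithmetic and can be written out as a short Python sketch. The conversion factor (1000 RU per ng mm−2) and the 71.4 kDa monomer mass are taken from the text; the function names and the molar estimate are illustrative additions, not part of the reported analysis.

```python
RU_PER_NG_MM2 = 1000.0       # 1000 RU ~ 1 ng/mm^2 for globular proteins (from the text)
ABIK_MONOMER_DA = 71_400.0   # AbiK monomer mass, 71.4 kDa (from the text)


def surface_density_ng_mm2(ru):
    """Convert an SPR response (RU) to surface density in ng per mm^2."""
    return ru / RU_PER_NG_MM2


def monomers_pmol_mm2(ru, mass_da=ABIK_MONOMER_DA):
    """Illustrative estimate of immobilized monomers in pmol per mm^2."""
    ng = surface_density_ng_mm2(ru)
    # ng divided by g/mol gives nmol; multiply by 1000 for pmol.
    return ng / mass_da * 1000.0


density = surface_density_ng_mm2(20_000)   # about 20 ng/mm^2
monomers = monomers_pmol_mm2(20_000)       # about 0.28 pmol/mm^2
```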
We used SPR to examine whether the AbiK enzyme generates homopolymeric segments of ssDNA. We injected 18-mer homopolymeric oligonucleotides containing cytosine, adenine, or thymine monomers (oligo-C18, oligo-A18, and oligo-T18, respectively) both before and after the injection of the selected dNTP. We speculated that if AbiK synthesized ssDNA carrying a custom sequence of homopolymeric segments, the oligonucleotides would not anneal until a complementary strand was synthesized by AbiK. Indeed, the present findings suggest that oligo-C18, oligo-A18, or oligo-T18 hybridized to the ssDNA generated by AbiK only when the enzyme was prompted to generate a homopolymer with a sequence complementary to the injected oligonucleotide (Fig. 1b–f). This was signified by an increase in the RU after the injection of a selected oligonucleotide. The injection of benzonase (20–150 U mL−1) eliminated most of the DNA synthesized by AbiK; however, AbiK remained active after benzonase treatment, as indicated by the fact that the binding of oligo-A18 was observed only after dTTP was injected (Fig. 1b and f). Overall, these results suggest that the SPR system can be used to deliver selected deoxynucleotides to AbiK immobilized on the chip, which catalyzes the polymerization of homopolymeric nucleotide segments in a user-defined manner.
To demonstrate that information can be stored in DNA synthesized by AbiK, we designed an assembly to synthesize ssDNA encoding the word “DNA” and read the stored information (Fig. 2a). Notably, AbiK was prompted to synthesize a sequence encoding the word “DNA” twice, flanked by a homopolymeric guanine nucleobase sequence, which enabled data retrieval. The sensorgram indicates that AbiK on the chip incorporated the nucleotides with each injection, suggesting that the information was encoded in the DNA (Fig. 2b). However, on the basis of the sensorgrams alone, we cannot determine whether the activity of the AbiK enzyme molecules immobilized on the chip was homogeneous, that is, whether all enzyme molecules came into contact with nucleotides and used the incoming nucleotides to extend the DNA strand. Therefore, we released the synthetic ssDNA from the SPR chip by AbiK proteolysis and generated the second strand using the large (Klenow) fragment of E. coli DNA polymerase I and an oligo-C18 primer (Fig. 2a). Nanopore sequencing revealed that seven out of 89 sequences showed significant matches in the BLAST searches against a non-redundant database. These seven sequences were excluded from further analysis and the remaining sequences were considered as new fragments produced by AbiK. The DNA produced by AbiK was indeed heterogeneous, with regions of the DNA fragments carrying nucleotides for the corresponding letters of the alphabet ranging from a few hundred to 1012 bp (ESI† Table S2, ESI Data 1). Forty-two percent of the DNA fragments encoded at least one letter of the alphabet, i.e. they contained at least three different segments arranged in the correct order, with each segment consisting of only three of the four natural nucleotides. In the DNA fragments, multiple ∼10–400-bp-long stretches of DNA consisting of only three of the four nucleotides were detected (Fig. 2c and ESI† Table S2). Moreover, DNA regions in which the segments signified the injection patterns (Fig. 2a) and represented specific letters (ESI† Table S2) or even the whole word “DNA” (Fig. 2c) were observed. Nevertheless, not all ssDNA strands formed by AbiK on the chip were the same, as signified by the different lengths of DNA reads, different lengths of segments containing three nucleotides, inconsistencies with the injection pattern and some “bleeding” of the nucleotides between various segments (ESI† Table S2, ESI Data 1).
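To make the read-out step concrete, the following Python sketch shows one simple way to find stretches of a read that contain only three of the four nucleotides. It is not the analysis used in the study. It assumes an upper-case read over A, C, G and T, uses a greedy left-to-right split, and applies a hypothetical minimum length of 10 nt chosen to match the shortest stretches mentioned above. It does not tolerate sequencing errors or nucleotide “bleeding”: a single stray base ends the current segment.

```python
def three_letter_segments(read, min_len=10):
    """Greedily split `read` into runs that use at most three of A, C, G, T.

    Returns (start, end, absent_nucleotide) for every run that is at least
    `min_len` long and uses exactly three distinct nucleotides.
    `min_len` is an assumed threshold, not a value from the study.
    """
    runs = []
    start, seen = 0, set()
    for i, base in enumerate(read):
        # A fourth distinct nucleotide closes the current run.
        if base not in seen and len(seen) == 3:
            runs.append((start, i, seen))
            start, seen = i, set()
        seen.add(base)
    if read:
        runs.append((start, len(read), seen))

    segments = []
    for s, e, bases in runs:
        absent = set("ACGT") - bases
        if e - s >= min_len and len(absent) == 1:
            segments.append((s, e, absent.pop()))
    return segments


# Toy read: a run lacking T followed by a run lacking A.
toy_read = "ACG" * 4 + "TCG" * 4
found = three_letter_segments(toy_read)
```

The sequence of absent nucleotides in `found` could then be compared with a codec table to recover letters; an error-tolerant version would need to allow a small fraction of off-pattern bases per segment.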
The present study provides proof of principle that the reverse transcriptase AbiK of the bacteriophage abortive infection system facilitates custom DNA synthesis. We used the AbiK enzyme, which, unlike TdT, shows no preference for the incorporation of some nucleotides over others and catalyzes protein-primed ssDNA synthesis;18,21 thus, only the dNTPs needed to be injected into the system, whereas the enzyme was immobilized on the solid phase. Furthermore, we used a unique DNA writing approach for information storage based on the absence of nucleotides, by injecting mixtures containing three of the four natural dNTPs. Using the SPR system, we directly monitored the activity of the AbiK enzyme. To our knowledge, this is the first report showing that DNA synthesized on the SPR chip can be released and sequenced. It is likely that a similar biophysical technique, such as biolayer interferometry (BLI), could be used analogously to SPR for the sequential buffer exchange and real-time monitoring of DNA polymerization by AbiK. The SPR method was preferred because it uses a continuous flow across AbiK immobilized on the sensor chip, whereas BLI relies on sequential dipping of the sensor tip in buffers containing mixtures of three selected nucleotides. We assumed that the dipping process could lead to nucleotide carryover from one buffer to another.
We hypothesize that other proteins belonging to the bacteriophage class 1 of the so-called unknown group and abortive infection (UG/Abi) reverse transcriptase family can also be used, but among those that were biochemically characterized, namely AbiK, AbiA, and Abi-P2, AbiK is the preferred enzyme because it incorporates all four natural dNTPs equally and synthesizes over 1000 nucleotide-long ssDNA products in vitro.18,22 In contrast, AbiA synthesizes ssDNA mainly containing adenosine and cytosine bases with a length of only about 150 nucleotides, while Abi-P2 produces ssDNA of only a few tens of nucleotides.16,18
However, controlling the AbiK system such that ssDNA is elongated at one-base resolution remains a challenge. Furthermore, the immobilized AbiK enzyme molecules do not uniformly take up the incoming dNTPs; this “heterogeneity” effect needs to be addressed. In addition, scaling up the system with an alternative solid support will increase the efficiency and throughput, making the system more cost-effective and feasible for large-scale DNA synthesis applications. We believe that, with further development, AbiK and—most likely—other reverse transcriptase proteins from the abortive bacteriophage infection system could offer significant potential for DNA data storage and other nanobiotechnological applications.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4lc00755g
This journal is © The Royal Society of Chemistry 2025