Meng Yu abc, Xiaohui Tang ac, Zhenhua Li ac, Weidong Wang c, Shaopeng Wang d, Min Li d, Qiuliyang Yu e, Sijia Xie *abc, Xiaolei Zuo *d and Chang Chen *abcf
aInstitute of Medical Chips, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025, Shanghai, China. E-mail: sijia.xie@shsmu.edu.cn; chang.chen@shsmu.edu.cn
bSchool of Microelectronics, Shanghai University, 201800, Shanghai, China
cShanghai Industrial μTechnology Research Institute, 201800, Shanghai, China
dInstitute of Molecular Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 200127, Shanghai, China. E-mail: zuoxiaolei@sjtu.edu.cn
eShenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 518055, Shenzhen, China
fState Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, 200050, Shanghai, China
First published on 18th March 2024
With the explosion of the digital world, the rapidly increasing global data volume is expected to reach 175 ZB (1 ZB = 10¹² GB) in 2025. Storing such a huge amount of data would consume enormous resources. Fortunately, the deoxyribonucleic acid (DNA) molecule has been found to be the most compact and durable information storage medium known so far. Its high coding density and long-term preservation properties make it one of the best data storage carriers for the future. High-throughput DNA synthesis is a key technology for “DNA data storage”, which encodes a binary data stream (0/1) into long quaternary DNA sequences consisting of four bases (A/G/C/T). In this review, the workflow of DNA data storage and the basic methods of artificial DNA synthesis technology are outlined first. Then, the technical characteristics of different synthesis methods and the state of the art of representative commercial companies are presented, with a primary focus on silicon chip microarray-based synthesis and novel enzymatic DNA synthesis. Finally, the recent status of DNA storage and new opportunities for future development in the field of high-throughput, large-scale DNA synthesis technology are summarized.
Generally, data can be categorized into “hot data” (frequently accessed on a daily basis, for example, logs and emails) and “cold data” (characterized by long-term storage requirements, such as archives, surveillance videos, and backup files, etc., whose significance may only be obvious over time), based on their access frequency. Moreover, the statistics indicate that more than 60% of all data are going to become archival so that it is natural to observe that some “hot data” are transformed into “cold data” eventually. Before humans established the digital world, such cold data were stored in media such as paper, films and later on, tapes in specific warehouses under delicate environmental conditions (temperature, humidity, UV light exposure, etc.) to prevent damage or degradation of the physical form of these media. Although these cold data are rarely needed, they carry crucial archival and evidential information that can be highly valuable in certain circumstances where historical investigation is expected. Currently, in order to store (and to transfer, when necessary) these huge amounts of “cold data” with the growing trend, tapes and hard disk drives are commonly utilized thanks to their advantages of low cost, large scale, and environmental compatibility. However, these magnetic recording siblings can only serve for merely a few decades, while “cold data” are usually required to be stored for 50 and even more than 100 years.3 In order to preserve data integrity and mitigate the potential risk of data loss resulting from storage device failures, it is imperative to routinely transfer and back up data. The maintenance and periodic transference of “cold data” to new devices gradually become burdensome and risky in terms of data preservation. 
In recent years, the development of cloud technology has greatly eased the transport and transfer of big data among devices, even remotely, although its reliability and potential cost for long-term data storage are still unclear. The extended lifespan of DNA would largely remove the need for such periodic data migration. In addition, storing massive amounts of data consumes huge quantities of energy. According to reports, data centers consumed 205 terawatt hours (TW h) of electricity in 2018; this number is expected to be 210 TW h in 2023 and is estimated to reach 1929 TW h in 2030.4–7 Therefore, investigating promising data storage technologies with higher capacity, lower cost, and long-term stability remains an urgent need for today's information society. Fortunately, it has been found that DNA may be a “game-breaker”.8
The most advanced transistor was realized by IBM in 2021, with a size down to 2 nm achieved using the 2 nm process.9 In a digital circuit, a bit is stored by a functional unit (e.g., a flip-flop or a latch) made of several transistors. Thus, the more densely the transistors are fabricated on a chip, the higher the storage capacity the chip will have in principle. 3D memory chips were developed to further increase storage capacity by vertically stacking multiple layers. For example, Micron released a 232-layer 3D NAND with a chip size of about 70.1 mm² that holds 10¹² bits of data.10 DNA is the storage medium for genetic information in biology. It is a macromolecular polymer composed of deoxyribonucleotides (hereafter referred to as nucleotides). Nucleotides are composed of phosphate, deoxyribose, and their attached bases. There are four types of bases in DNA: adenine (A), guanine (G), cytosine (C) and thymine (T). The sequence of these four bases on a single nucleotide strand of DNA holds the biological genetic information. A single nucleotide, which has a size of about 0.3 nm, is considered the most basic unit of biological genetic information. Currently, the digital coding density of DNA can reach a maximum of 2 bits per base.11–15 According to evaluations in the literature, the data density of flash memory can reach 10¹⁶ bits per cm³ and that of hard drives 10¹³ bits per cm³, while DNA can in principle reach up to 10²² bits per cm³.16–20 From this perspective, a nucleotide as the basic element for digital data storage is potentially a more effective candidate than the transistor, given that the size of the latter is unlikely to be reduced further unless sub-nanometer, large-scale semiconductor fabrication techniques become available. 
Of course, the overall capacity of DNA storage is limited by the length of a generated DNA strand, the number of copies required for each strand considering the practical aspects during synthesis, storing and sequencing, as well as the speed and throughput for DNA synthesis and sequencing, which will be discussed in detail later in this review. Additionally, if only the physical size is considered, DNA seems a much lighter-weighted medium than the conventional media such as tapes or hard disks. In this sense, the physical transfer of massive data in DNA would be much easier. Although it still requires intense development to facilitate an integrated and mature process flow of DNA data storage and transfer, in the foreseeable future, it is optimistic to predict that transferring data with DNA will be an easier and more resource-efficient way than using the current storage media. Besides, its high chemical stability, relatively low maintenance cost, and almost non-commutative nature of the format for billions of years also make DNA an excellent candidate for storing the “cold data” of human society.19,21 Given the rapid iteration in the field of information technology as well as biotechnology, it seems positive that DNA, the information carrier from ancient times, would be provided a way to be compatible with future technologies. Although the whole storage process is currently rather complex and costly, the technical difficulties are expected to be solved in the near future. Using DNA for data storage is considered to be the next-generation revolution in the storage of the “cold data”.
Furthermore, the rapid development of semiconductor technology has accelerated the iterative improvement of storage media. Memory chips based on integrated circuit technology have been developed for flash memory, memory cards, U-disks, hard disk devices, etc., whose capacity and processing speed have been greatly improved. Under this prerequisite, the combination of DNA storage with semiconductor technology is expected to create more superb possibilities for data storage. Similar to the data storage processes of semiconductor devices, DNA data storage also includes encoding, writing, preservation, reading, decoding, etc. Compared with the data storage mechanism of electronic devices, the digital data are “written” by DNA synthesis technology to couple nucleotides one by one according to designed sequences and then “read out” by DNA sequencing technology. The core of this pioneering massive data storage technology is to achieve high-throughput and high-speed data writing and reading. To realize this, a fast and massive data writing approach, namely DNA synthesis, is a critical aspect of this technique.
In the field of DNA data storage, there are already a large number of previous works on topics such as coding/decoding algorithms,14,17,22–24 error correction mechanisms,15,20,25 preservation methods,19,26,27 and overall reviews.12,18,28–35 Complementary to these articles and reviews, here, our discussion mainly focuses on the field of hardware manufacturing that is committed to the use of silicon-based chip technology to develop micro and nano scale integrated chips for high-throughput artificial DNA synthesis platforms: The history and current situation of DNA synthesis technology are introduced. Several high-throughput array-based DNA synthesis technologies are described in detail, and the state-of-the-art representative companies are listed. The advantages and limitations of each technology are comprehensively compared from the perspective of DNA data storage-oriented applications. Besides, we outline the evolution of next generation enzymatic DNA synthesis technology, as well as the new opportunities it brings to DNA data storage, together with the semiconductor chip technology. Finally, we analyze the performance gap between DNA data storage and current storage devices from the perspective of synthesis and sequencing, and propose future directions for the development of this technology.
As mentioned above, data storage with DNA involves several steps (Fig. 1): encoding, writing, preservation, reading, and decoding.8,12,18,28–35
Specifically, “encoding” refers to the conversion of the binary digital data (0/1) of the initial data into DNA base sequences (A/G/C/T) arranged in specific order according to the corresponding coding algorithms, such as: Huffman-code,14,46 Fountain code,23 Reed Solomon (RS) code,15,45 and yin-yang code (YYC).47 Usually, the final base sequences don't only contain valid data fragments, but may also include assisting fragments such as (1) primers located at both ends of a strand. They are used for the polymerase chain reaction (PCR) when data amplification or random access is needed. To prevent a higher melting temperature for the PCR process and a larger loss of storage density, the length of this region is usually limited to less than 25 nt (nt represents the unit of oligonucleotide strand length), but cannot be much shorter without sacrificing diversity in addresses. When storing 1 GB of data with 200 nt long DNA molecules, the primer fraction ratio is estimated to be 6.5%.12 There are encoding schemes which skip the primers or only include them on one end of the synthesized sequences48–50 while some of them may potentially sacrifice the data storage capacity by occupying the valuable data payload space. (2) Address fragments. They help mark the locations of the data; (3) error correction code areas. They are for adding logical redundancy to deal with errors such as deletion, insertion, and substitutions that could happen during the process of synthesis and sequencing. They play an important role in retrieving lost sequences and correcting errors for reliable DNA-based data storage. Normally, the length of currently synthesized DNA is limited to 200 nt with high accuracy, so the total data have to be segmented and jigsawed into a whole piece later on. 
Therefore, the assisting regions in each segment bring a lot of redundancy as they occupy the space for data payload (about 20 nt at both the front and back ends), limiting the storage capacity and also decreasing the storage density (by about 15% per address region).23,49,51 In this sense, using a long nucleotide strand for DNA data storage would in principle ease the coding and decoding, and reduce the complexity of data generation and retrieval.
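The encoding step described above can be illustrated with a minimal Python sketch. It assumes the simplest possible mapping of 2 bits per base and a hypothetical strand layout (forward primer, 8 nt address, payload, reverse primer); the function names, the primer sequences and the address length are illustrative assumptions, and practical schemes such as Huffman, fountain, RS or yin-yang codes add sequence constraints and error-correction redundancy that are not modeled here.

```python
# Hypothetical direct mapping of 2 bits per base (no error correction,
# no GC-content or homopolymer constraints).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes into a base sequence, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(seq: str) -> bytes:
    """Convert a base sequence back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def to_strands(payload: str, payload_nt: int, fwd: str, rev: str) -> list:
    """Split a long base sequence into strands of the assumed layout:
    forward primer + 8 nt address + payload segment + reverse primer."""
    strands = []
    for n, start in enumerate(range(0, len(payload), payload_nt)):
        address = encode(n.to_bytes(2, "big"))  # 16-bit index -> 8 nt
        strands.append(fwd + address + payload[start:start + payload_nt] + rev)
    return strands


if __name__ == "__main__":
    seq = encode(b"Hi")
    print(seq)                      # base sequence for the two bytes
    print(decode(seq))              # round trip back to bytes
    print(to_strands(seq, 4, "AC", "GT"))
```

In this layout every strand spends its primer and address bases on overhead rather than payload, which is the density penalty discussed above for short strands.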
Following the sequence encoded in the previous step, “writing” refers to the process of coupling base monomers through chemical and biological reactions, resulting in the assembly of segments one by one. The obtained single-stranded DNA (ssDNA) fragments are known as an “oligo pool” equivalent to a database. This is a key step in actually turning digital data into molecular structures. The level of DNA synthesis technology determines the overall quality of data storage.
“Preservation” usually involves the assembly of synthesized ssDNA into more chemically stable dsDNA, which is subsequently purified and stored in vitro23,45 (in the forms of frozen powder or solution, or capsuled in micro/nanoparticles15,48) at a low temperature or preserved in vivo39,52–54 (in plasmids55 or artificial chromosomes56 in bacteria, or integrated in the genome of a living organism57–59) by genetic technology including transfection, clustered regularly interspaced short palindromic repeat-associated (CRISPR–Cas) or recombinases for long-term preservation.60,61 Recent research studies have mentioned the use of ssDNA attached to the dsDNA62 or in the storing sample49 for better data access. However, ssDNA is prone to coil conformation due to the lack of the double helix secondary structure, and it suffers from damage by environmental conditions (e.g., temperature, humidity, UV irradiation, oxidation, etc.) more easily than dsDNA. For in vitro preservation, although ssDNA can be stored as synthesized for up to 2 years63–66 under optimized conditions, dsDNA is still the “safer” option when longer preservation time is expected. Nevertheless, when using ssDNA as the data storage medium, the technical benefits of not first having to go through the complex process of preparing ssDNA into dsDNA may well outweigh its disadvantages in long term stability. 
Owing to the higher efficiency of DNA replication in living organisms, the information encoded into DNA stored in vivo can be replicated with greater accuracy and speed than through in vitro preservation.29,67–69 In addition, it has been shown that information preserved in vivo enables dynamic data access and editing of information with single-base resolution.52,70 However, further investigation is still needed for better understanding and optimizing the compatibility, stability, and functionality of the input DNA.70–72 In addition, specific laboratory environments are often necessary for ensuring the genetic stability as well as the viability of the living organisms, unless considering using the tenacious candidates.73 For DNA data storage applications, the most favorable method should be selected according to the frequency of access to data required in different scenarios. Generally, for DNA data storage, the amount of each sequence of the synthesized DNA is at a trace scale, because large-scale parallelization is needed to increase the speed and data density of the “writing step”. In order to ensure the effectiveness and reliability of the data, PCR amplification technology is necessary to increase the concentration of the synthesized products and backup the data as well.74–76
“Reading” means that the stored DNA molecules are extracted by biochemical methods, and the target base sequences are identified one by one to obtain the written coding data. DNA sequencing technology is used to read and splice the base sequences carried by DNA fragments in the oligonucleotide pool. In early studies, data stored in DNA required sequencing all of the molecules. Later on, PCR-based random-access techniques were developed,17,45 allowing random access to a portion of the data without sequencing all of the DNA in the oligonucleotide pool. As a new trend, array-based technologies for DNA data storage may ease the workload for the PCR because the synthesized DNA is confined or immobilized to the designed location on the array and can be addressed directly via the chip. Approaches such as spot-specified digital microfluidics,77 sequencing-by-synthesis,78 DNA microdisks,79 and SlipChip80 have contributed to a further step towards high manipulability and rapid access. The DNA is conjugated to the surface of the chip and is not damaged or lost during replication, which also allows for easy access and handling, reducing the need for PCR primer selection and large-scale PCR amplification.
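As a rough illustration of PCR-based random access, the sketch below treats amplification as an idealized filter: only strands flanked by the requested primer pair are "amplified" and returned. This is a strong simplification under stated assumptions (exact matching, no amplification bias, the reverse primer site written in the same orientation as the strand), and the pool contents and the function name are hypothetical.

```python
def random_access(pool, fwd_primer, rev_primer):
    """Return the inner sequences of strands flanked by the given primer pair.

    Idealized stand-in for PCR selection: a strand is selected only if it
    starts with the forward primer and ends with the reverse primer site.
    """
    selected = []
    for strand in pool:
        if strand.startswith(fwd_primer) and strand.endswith(rev_primer):
            inner = strand[len(fwd_primer):len(strand) - len(rev_primer)]
            selected.append(inner)
    return selected


# Hypothetical oligo pool holding two "files", each with its own primer pair.
pool = [
    "ACGT" + "AAAACCCC" + "TTGA",
    "ACGT" + "GGGGTTTT" + "TTGA",
    "CATG" + "ACACACAC" + "GGAT",
]

if __name__ == "__main__":
    print(random_access(pool, "ACGT", "TTGA"))  # strands of the first file only
```

Only the selected subset would then need to be sequenced, instead of the whole pool.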
Finally, “decoding” is the reconversion of base sequences into digital data and further restoration to the original format of the data. In the whole workflow of a storage cycle, biological and chemical reactions take on the function of writing/reading data.
Using DNA for data storage has several attractive advantages: (1) high storage density. Considering a coding density of 2 bits per base,13 DNA would have a theoretical data density of about 6 bits per nm,33 given that a nucleotide is ∼0.3 nm long. If we only consider the nature of the DNA molecule and put aside the complexity of practical aspects including data retrieval, 1 gram of DNA can store about 4.5 × 10⁷ GB of data, given that only a single copy of each unique DNA sequence is present in the mixture, while current technology stores only 10 terabytes (TB) on a 600 g HDD, a difference of 6 orders of magnitude.17,33,81 To allow the data to be fully retrieved, using as few as 10 copies per sequence in the mixture would result in a storage density of 17 EB g⁻¹,17 which is still a significant improvement over current HDDs. (2) Long preservation time and durability. Under suitable conditions (e.g., at room temperature in a dry atmosphere, or as lyophilized powder), DNA can remain stable for thousands of years and withstand temperatures as low as −196 °C (liquid nitrogen) and as high as 250 °C (silica).82–85 For magnetic and silicon-based storage devices, the requirements for humidity, temperature and magnetic fields in the environment are stringent, and the lifetime usually does not exceed 50 years.15 However, long-term storage of DNA molecules also faces some risks.68 For example, the stored data may be contaminated by bacterial or human DNA.67 In addition, natural DNA is highly susceptible to degradation by microorganisms and nuclease enzymes in the natural environment, while environmental factors can cause strand breaks, hydrolytic damage and UV-induced cross-linking, all of which can lead to partial data loss. 
Mirror-image DNA has the same storage density as natural DNA, but also has a unique bio-orthogonality, which prevents it from being easily degraded by microorganisms and nucleases, and it is successfully utilized in orthogonal information storage.86–88 (3) Low maintenance cost and environmental friendliness. The inherent durability of DNA renders it highly amenable to preservation. Compared with the regular maintenance of conventional long-term storage equipment which consumes a lot of electricity, energy, and land resources, the energy to store the DNA is almost negligible.15,28,89 In addition, the data stored in DNA can be easily backed up by PCR technology. A helpful comparison of the main performance indicators for various storage media was given by Linda C. Meiser et al. in 2022 (Fig. 2).31 Although current DNA synthesis methods cannot completely avoid using toxic chemicals, even in the case of enzymatic synthesis, DNA is still a more friendly option for data storage media compared with its opponents, as DNA is biodegradable90 and requires less heavy metals and rare elements for synthesis.31
Fig. 2 A comparison of the various storage methods in terms of lifetime, capacity and cost. The cost of mainstream media is derived from the average consumer market price. The data survey was carried out during the writing period of ref. 31. Reproduced from ref. 31 with permission from Copyright 2022 Springer Nature.
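The density figures quoted in the text for DNA and HDDs can be checked with a few lines of arithmetic. The sketch below only uses the numbers stated in this review (2 bits per base, ∼0.3 nm per nucleotide, about 4.5 × 10⁷ GB per gram of DNA at a single copy per sequence, and 10 TB on a 600 g HDD); the variable names are illustrative, and 1 TB is taken as 1000 GB.

```python
import math

# Figures quoted in the text; variable names are illustrative.
BITS_PER_BASE = 2
NUCLEOTIDE_LENGTH_NM = 0.3
DNA_GB_PER_GRAM = 4.5e7        # single copy of each sequence
HDD_CAPACITY_GB = 10_000       # 10 TB, assuming 1 TB = 1000 GB
HDD_MASS_G = 600

# Linear density along the strand, in bits per nm.
linear_density = BITS_PER_BASE / NUCLEOTIDE_LENGTH_NM

# Gravimetric density of the HDD and its ratio to DNA.
hdd_gb_per_gram = HDD_CAPACITY_GB / HDD_MASS_G
ratio = DNA_GB_PER_GRAM / hdd_gb_per_gram
orders_of_magnitude = math.log10(ratio)

if __name__ == "__main__":
    print(f"linear density: {linear_density:.1f} bits per nm")
    print(f"HDD: {hdd_gb_per_gram:.1f} GB per gram")
    print(f"DNA/HDD ratio: {ratio:.2e} ({orders_of_magnitude:.1f} orders of magnitude)")
```

The ratio comes out at a few million, consistent with the stated difference of about 6 orders of magnitude.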
In 2020, the world's leading enterprises including Microsoft Research, Illumina, Western Digital, Twist Bioscience, etc., founded the international organization “DNA Data Storage Alliance”. As the association is growing, the total number of members has now exceeded 40. It brings together the world's state-of-the-art information technology, DNA artificial synthesis, DNA sequencing, and integrated circuit manufacturing industries.
Their mission is to create and promote an interoperable storage ecosystem based on manufactured DNA as a data storage medium. The alliance launched its first version of white paper in 2021,3 outlining the background, strategy, and technical development of DNA data storage. It seems promising that the establishment of the consortium will accelerate the cross-fertilization and breakthrough progress of data encoding technology, high-throughput DNA synthesis technology, and sequencing technology, and will vigorously promote the process of DNA data storage technology.
However, current DNA data storage technology still faces several challenges: (1) low throughput and speed. At present, the throughput of synthesis and sequencing technology is far from sufficient for data storage, particularly that of synthesis. Enzymatic synthesis offers a higher speed for “data writing” than the chemical approach. It has been demonstrated that the coupling time of enzymatic synthesis can be reduced to 10–20 s, while that of chemical phosphoramidite synthesis is usually in the range of 4–10 min.91,92 Lee et al. gave an estimate of 40 s per cycle for enzymatic synthesis, which is six times faster than phosphoramidite synthesis. However, this rate is still much slower than that of state-of-the-art electronic devices.93,94 (2) Difficult data access. Unlike conventional storage devices, it is not yet feasible to access random parts of the data or modify them in DNA molecules on a single device. (3) Workload in large-scale data reproduction. Although the PCR is undoubtedly a powerful tool for nucleic acid amplification and is generally acknowledged to be a high-fidelity process, it introduces bias related to, e.g., the GC content of the strand, which may cause loss of data strands with a high GC content during amplification. This leads to significantly skewed proportions of the sequences when the number of PCR cycles is large,95,96 and affects both data storage capacity and retrieval efficiency. Also, it is still difficult to amplify highly repetitive sequences by the PCR.97 (4) High complexity and costs of integration. Most of today's DNA data storage strategies are realized on separate devices and at separate locations for synthesis, preservation, replication, and sequencing, making the process complex and time-consuming. 
Besides, although the average cost of sequencing per megabase of DNA in 2021 was only $0.006 (calculated based on the production cost of sequencing one million bases, including equipment, reagents, administration, and overhead costs), significantly decreased from $5292.39 in 2001, according to the National Human Genome Research Institute (NHGRI),98 the cost of DNA synthesis is still orders of magnitude higher than the cost of sequencing. According to the estimation by Meiser et al., storing 1 MB of encoded data in DNA would cost around $800 to $5000 (ref. 23, 28 and 31), of which the cost of DNA synthesis makes up the major proportion.12,99 In contrast, tape storage costs just $16 per TB.33 Antkowiak et al. gave a detailed estimation for each step of the DNA data storage workflow in 2020.100 The high cost greatly prevents DNA storage from becoming a commercial product.13,28,100 Nevertheless, DNA data storage is still considered to be one of the most promising long-term storage solutions for the future, as the cost of synthesis and sequencing has kept falling dramatically and consistently over the years.
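To put the cost figures above on a common scale, the following sketch converts the quoted DNA storage cost per MB into a cost per TB and compares it with tape. It assumes decimal units (1 TB = 10⁶ MB) and uses only the numbers given in the text; the function and constant names are illustrative.

```python
MB_PER_TB = 1_000_000                 # assumption: decimal units
DNA_COST_PER_MB_USD = (800, 5000)     # range quoted in the text
TAPE_COST_PER_TB_USD = 16             # figure quoted in the text


def dna_cost_per_tb(cost_per_mb_usd):
    """Scale a per-MB DNA storage cost to a per-TB cost."""
    return cost_per_mb_usd * MB_PER_TB


def cost_gap_vs_tape(cost_per_mb_usd):
    """Ratio of DNA storage cost to tape cost for the same data volume."""
    return dna_cost_per_tb(cost_per_mb_usd) / TAPE_COST_PER_TB_USD


if __name__ == "__main__":
    for cost in DNA_COST_PER_MB_USD:
        print(f"${cost}/MB -> ${dna_cost_per_tb(cost):.1e}/TB, "
              f"{cost_gap_vs_tape(cost):.1e} times the tape cost")
```

Under these assumptions the gap is on the order of 10⁷ to 10⁸, which illustrates why synthesis cost is regarded as the main obstacle to commercialization.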
Fig. 4 A four-step cycle for the synthesis of oligonucleotides by the solid-phase phosphoramidite chemistry method. ① Deprotection. The DMT group at the 5′ end of an oligonucleotide monomer is removed, and the hydroxyl group is exposed to start the reaction. ② Coupling. The desired free nucleotide monomer is attached to the 5′ end hydroxyl group of the previous monomer. ③ Capping. The unreacted 5′ end hydroxyl groups of the oligonucleotide are sealed to prevent unwanted strand extension. ④ Oxidation. The oxidation reagent oxidizes the linkage bonds between the coupled monomers to a more stable state. The cycle is repeated until the target sequences are achieved. Reproduced from ref. 94 with permission from Copyright 2014 Springer Nature.
Step 1: deprotection. The 5′ end of each phosphoramidite monomer carries a protecting group (e.g., dimethoxytrityl (DMT)), which prevents the monomer from coupling with another monomer prematurely. In the deprotection step, the protecting group is removed under certain chemical conditions (e.g., an acidic environment created by trichloroacetic acid (TCA)), exposing the 5′ end hydroxyl group that allows the subsequent reactions to occur.
Step 2: coupling. The 5′ end hydroxyl group of the monomer in step 1 couples with a newly added 5′ end protected monomer by forming a phosphite triester.
Step 3: capping. The uncoupled 5′ end hydroxyl group of the monomer in step 1 is acetylated in the presence of N-methylimidazole to prevent unwanted coupling (e.g., strand extension) in the next cycle.
Step 4: oxidation. The unstable phosphite triester bond between the successfully coupled monomers in step 2 is oxidized to the more stable phosphate by an oxidizing agent (e.g., a mixture of I2 in pyridine/H2O/THF). The strand is then ready for the next four-step nucleotide extension cycle.
Between each reaction step, there is a washing process commonly by acetonitrile which rinses off the excess reagents to start the next reaction. By cycling the above four steps, bases are added one per cycle in the designated order until the target oligonucleotide sequences are achieved.
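The four-step cycle can be mimicked by a small Monte Carlo sketch. It is a toy model under several assumptions that are not taken from the source: deprotection and oxidation always succeed, each coupling succeeds independently with a fixed probability, capping permanently terminates any strand that failed to couple, and the first nucleotide is already attached to the solid support. Bases are listed in order of addition, and all names are illustrative.

```python
import random


def synthesize(target, coupling_eff, n_strands, rng):
    """Simulate n_strands growing in parallel towards the same target."""
    strands = [target[0]] * n_strands   # first base pre-attached to the support
    capped = [False] * n_strands
    for base in target[1:]:             # one four-step cycle per added base
        for i in range(n_strands):
            if capped[i]:
                continue                # capped strands no longer extend
            # Step 1, deprotection: assumed to always succeed.
            if rng.random() < coupling_eff:
                strands[i] += base      # steps 2 and 4: coupling, then oxidation
            else:
                capped[i] = True        # step 3: capping of the unreacted strand
    return strands


def full_length_fraction(strands, target):
    """Fraction of strands that match the full target sequence."""
    return sum(s == target for s in strands) / len(strands)


if __name__ == "__main__":
    target = "ACGTACGTACGTACGTACGT"     # hypothetical 20 nt target
    products = synthesize(target, 0.99, 20_000, random.Random(0))
    print(f"full-length fraction: {full_length_fraction(products, target):.3f}")
```

In this model the full-length fraction falls as the target gets longer, because a strand must succeed in every one of its coupling cycles.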
Since the 1980s, based on the aforementioned synthetic principles, the carrier linkers, reagents, and deprotection methods used in synthesis have been continuously optimized. Established industrial DNA solid-phase chemical synthesis uses controlled pore glass (CPG) or polystyrene (PS) beads filled in a synthesis column as the solid carrier for nucleotide strands.109,110 Automated synthesizers drive the synthesis reagents in a unidirectional flow into the synthesis columns by controlled air pressure or a peristaltic pump. Solenoid valves can precisely eject milliliter (mL) and even microliter (μL) levels of reagents. In the meantime, multiple synthesis columns are utilized simultaneously, allowing for parallel and crossover synthesis. After synthesis, the target products are separated from the carrier, and the terminal protecting group is removed under alkaline conditions. The treatment is generally concentrated ammonia, while liquids such as methylamine are also used. Finally, depending on the needs of the application, direct elution and purification steps are performed for subsequent use. Typically, purification techniques such as polyacrylamide gel electrophoresis (PAGE) or high-performance liquid chromatography (HPLC) can effectively remove the wrong strands from the initial products. The products within a single column correspond to the same target sequence, namely the yield of a single column is between 2 nmol and 1000 nmol. In the case of large-scale column-based synthesis, the throughput of common automated parallel synthesis ranges from 96–1536 oligonucleotide sequences.101,111,112
Chemical synthesis methods are already well developed and have been industrialized for many years. However, several types of errors may occur during synthesis: (1) deletions (0.1%). These are caused by a failure to couple a nucleotide when deprotection is insufficient or the coupling efficiency is low, leading to a missing base in the designed sequence; (2) insertions (0.1%). Here, additional, unwanted nucleotides are added. When the terminal-protected phosphoramidites undergo unintended deprotection due to cross-contamination of the reagent or incorrect treatment, unwanted active sites are created on the strand to be extended, leading to an insertion error; (3) substitutions (0.5%). These occur when another base is coupled instead of the intended base, mostly induced by reagent contamination or incomplete washing between the synthesis steps. Ultimately, because of the limitation of chemical reactivity, the usual stepwise efficiency of chemical synthesis is only 95%–99.5%.100,113 In addition, depurination may occur during deprotection, meaning that excess acid causes loss of purine bases, leading to hydrolysis of the DNA strands and ultimately to a decrease in the yield and purity of the targeted products.114–116 As depurination damage may occur in each synthesis cycle, it may accumulate as the sequence extension continues. Thus, the longer the synthesized sequence, the greater the probability of depurination. This also greatly limits the yield of long products by chemical synthesis.109 Overall, a high error rate leads to a low yield (yield = (coupling efficiency)^(length − 1)), where the coupling efficiency indicates the proportion of correct monomers added at each synthesis cycle. Clearly, the longer the strand, the lower the yield obtained due to the accumulation of errors. A small numerical increase in the coupling efficiency leads to a great increase in the yield of the full-length product. 
For example, when the desired strand length is 200 nt, the full-length yield obtained with a 99.3% coupling efficiency is only 24.7%, while a 99.9% coupling efficiency corresponds to 81.9%. The length of nucleotide strands synthesized by phosphoramidite chemistry is usually limited to 200 nt, and no commercial product has yet been announced to exceed 300 nt in practice without DNA assembly.92,117 Therefore, to synthesize a long target nucleotide sequence chemically, the sequence has to be split into short fragments of 50–100 bases, which are synthesized separately and then assembled. At present, in addition to producing short fragments such as primers and probes, the synthetic biology industry has begun to explore synthetic genomes and DNA data storage, placing higher demands on the length and quality of synthetic DNA. Several practical concerns may hinder further industrialization of chemical DNA synthesis for data storage. For example, phosphoramidite chemistry requires demanding laboratory environment maintenance (e.g., strict humidity and inert atmosphere control), expensive reagents (e.g., phosphoramidites, whose price is up to US$81 g⁻¹), and long synthesis cycles, and it generates large amounts of hazardous waste (e.g., acetonitrile, which contaminates water and soil, and pyridine and furan, which are harmful to the mammalian nervous system) during the synthesis process.118 In large-scale chemical synthesis, the purchase of raw materials, the waste stream, and the post-processing (e.g., purification and assembly) all contribute to tremendous costs that strongly limit the industrialization of DNA data storage. Although similar cost and environmental concerns also exist in the manufacturing of semiconductor memory chips, the technical limitations of chemical synthesis are becoming more and more critical.
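The yield relation given above, yield = (coupling efficiency)^(length − 1), can be evaluated directly. The short sketch below reproduces the two 200 nt examples from the text; the function name is illustrative.

```python
def full_length_yield(coupling_eff, length_nt):
    """Expected fraction of full-length strands after (length - 1) couplings."""
    return coupling_eff ** (length_nt - 1)


if __name__ == "__main__":
    for eff in (0.993, 0.999):
        y = full_length_yield(eff, 200)
        print(f"coupling efficiency {eff:.1%}: full-length yield {y:.1%}")
```

Because the efficiency is raised to the power of the number of couplings, a change of a few tenths of a percent per step dominates the outcome for strands of a few hundred nucleotides.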
A main advantage of enzymatic DNA synthesis is that it is carried out under mild aqueous conditions which effectively reduces DNA damage such as depurination and results in fewer by-products, making the achievement of longer target nucleotide strands possible. TdT-based enzymatic DNA synthesis has been extensively pursued by numerous companies (a detailed description is provided in Section 5). As reported, the coupling efficiency of enzymatic DNA synthesis can exceed 99%, which is comparable to phosphoramidite chemistry,125 and a coupling efficiency of about 99.9% calculated from the overall yield of 85% for 300 nt full-length sequences has been advertised.126 In terms of synthesis speed, the time of coupling a single base by phosphoramidite chemistry is about 4–10 min, while enzymatic synthesis by the TdT–dNTP conjugate can take only 10–20 s.91,92,127 Notably, the synthesis speed of different enzymatic synthesis pathways varies greatly.91 It's clear that enzymatic synthesis has the potential to produce longer strands with high accuracy at faster cycle times compared with chemical synthesis.92,122 Currently, the TdT-based enzymatic synthesis routes are being enthusiastically investigated and the mechanism of the enzyme is well studied. However, there are several issues that are necessary to be considered carefully: how to add nucleotides in a controlled and precise manner? What moieties are used to modify the monomers? How much do the unreacted initiators contribute to the deletion error rate? What is the probability of side reactions occurring in the synthesis process? What level of scale can be achieved for target products?122 How to improve the enzyme activity of the native TdT on 3′-end blocked dNTPs? In addition, further exploratory improvements in enzyme engineering and optimization of enzyme cycle reactions, etc., are still needed for large-scale industrialization. For example, Lu et al. 
demonstrated a two-step cyclic synthetic route using an engineered TdT from Zonotrichia albicollis (ZaTdT), with an average stepwise coupling efficiency of 98.7% for single-nucleotide extension. The catalytic activity of this engineered enzyme was 3-fold higher than that of the normal TdT enzyme.128 Verardo et al.129 from DNA Script recently reported their approach to large-scale industrialization of TdT-based enzymatic DNA synthesis, which will be discussed in detail later in this review. This is a significant step towards the industrialization and parallelization of enzymatic synthesis.
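The advertised figure of about 99.9% coupling efficiency, derived from an 85% overall yield for 300 nt sequences, can be reproduced by inverting the yield relation. The sketch below assumes the same simple model as for chemical synthesis (length − 1 independent couplings); the function name `stepwise_efficiency` is hypothetical.

```python
def stepwise_efficiency(overall_yield: float, length_nt: int) -> float:
    """Average per-cycle efficiency implied by an overall full-length yield.

    Assumes yield = efficiency ** (length_nt - 1), i.e. identical,
    independent coupling cycles (a simplification).
    """
    if not 0.0 < overall_yield <= 1.0:
        raise ValueError("overall_yield must lie in (0, 1]")
    if length_nt < 2:
        raise ValueError("length_nt must be at least 2")
    return overall_yield ** (1.0 / (length_nt - 1))


if __name__ == "__main__":
    eff = stepwise_efficiency(0.85, 300)
    print(f"85% overall yield at 300 nt implies {eff:.3%} per cycle")
```

Under this model the implied efficiency comes out slightly above 99.9%, in line with the advertised value.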
There are two strategies for improving the throughput of DNA synthesis: one is simply to increase the number of channels in the column-based synthesis described above and expand the scale of parallel synthesis; the other is to increase the synthesis density and miniaturize the system as a whole. Miniaturized array-based synthesis allows more sequences to be synthesized in parallel in a limited space while reducing the volume of liquid consumed. The amount of product at a single site in the array is much lower than that in a column. Furthermore, array-based synthesis costs only $0.00001–$0.0001 per base, whereas column-based synthesis costs $0.05–$0.10 per base, which is 2–4 orders of magnitude higher.94,101 Array-based DNA synthesis is oriented towards synthetic biology applications such as gene splicing and library building, which require many different sequences in trace amounts (e.g., fmol). The automation and continuing miniaturization of the instruments further enhance the throughput of array-based synthesis, making it a more suitable platform for artificial DNA synthesis for data storage.
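To give a feeling for what the per-base prices quoted above mean for data storage, the sketch below compares the two cost ranges and estimates the synthesis cost of a payload. The 1 MB payload, the ideal density of 2 bits per base and the absence of redundancy are illustrative assumptions, and all function names are hypothetical.

```python
import math

# Per-base cost ranges in US$, as quoted in the text.
ARRAY_COST_PER_BASE = (0.00001, 0.0001)
COLUMN_COST_PER_BASE = (0.05, 0.10)


def orders_of_magnitude(cheaper: float, dearer: float) -> float:
    """Base-10 logarithm of the price ratio."""
    return math.log10(dearer / cheaper)


def bases_needed(payload_bytes: int, bits_per_base: float = 2.0) -> int:
    """Bases required for a payload, ignoring redundancy and primers."""
    return math.ceil(payload_bytes * 8 / bits_per_base)


def cost_range(n_bases: int, per_base: tuple) -> tuple:
    """Lowest and highest synthesis cost for n_bases."""
    return n_bases * per_base[0], n_bases * per_base[1]


if __name__ == "__main__":
    low = orders_of_magnitude(ARRAY_COST_PER_BASE[1], COLUMN_COST_PER_BASE[0])
    high = orders_of_magnitude(ARRAY_COST_PER_BASE[0], COLUMN_COST_PER_BASE[1])
    print(f"price gap: {low:.1f} to {high:.1f} orders of magnitude")
    n = bases_needed(1_000_000)  # hypothetical 1 MB payload
    print("array :", cost_range(n, ARRAY_COST_PER_BASE))
    print("column:", cost_range(n, COLUMN_COST_PER_BASE))
```

Even under these optimistic assumptions, only the array-based price range keeps the cost of writing a megabyte within a few hundred dollars.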
Fig. 5 shows the density of synthetic arrays required to achieve high-speed writing of large amounts of data. The total amount of data written per unit area can be calculated using the following equation:
C = Eυριt
Assuming that the amount of data shown in the figure needs to be written over an area of 1 square centimeter (cm²), and if one base encodes 2 bits, a single synthesis site effectively encodes a strand of 100 nt, and bases are synthesized at a rate of 1 base per second, then, to achieve TB-level (1 TB = 2⁴⁰ B) data writing in one day, the scale of the array sites needs to reach the submicron level. However, current coding density can reach at most 2 bits per base pair.11–15 Moreover, the “encoding” and “decoding” steps also lead to errors. To restore the original data, in addition to the information-containing fragments, a certain length of data redundancy sequence needs to be added to the synthesized DNA strand. This requires the synthetic sequence to be longer than the effective coding sequence.35 In sum, a much higher array density is required to achieve TB-level data writing per day. To achieve such high-density arrays, micro- and nanochips based on integrated circuit fabrication are the optimal strategy.
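The order of magnitude of this estimate can be reproduced with a short calculation. The sketch below is a simplified reading of the assumptions stated above, not the review's exact equation: every site is assumed to synthesize continuously for the whole day, no redundancy is added, and the sites sit on a square grid. The function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
TERABYTE = 2 ** 40  # bytes, as defined in the text


def required_site_density(data_bytes: float, bits_per_base: float,
                          bases_per_second: float, seconds: float,
                          area_cm2: float = 1.0) -> float:
    """Synthesis sites per cm^2 needed to write data_bytes in the given time."""
    bits_per_site = bits_per_base * bases_per_second * seconds
    return data_bytes * 8 / bits_per_site / area_cm2


def pitch_um(density_per_cm2: float) -> float:
    """Centre-to-centre spacing (in micrometres) of a square grid with this density."""
    return 1e4 / math.sqrt(density_per_cm2)


if __name__ == "__main__":
    density = required_site_density(TERABYTE, 2, 1, SECONDS_PER_DAY)
    print(f"sites per cm^2: {density:.2e}")
    print(f"grid pitch: {pitch_um(density):.2f} um")
```

Under these idealized assumptions the result is about 5 × 10⁷ sites per cm², i.e. a grid pitch of roughly 1.4 μm, so the individual sites must be about a micron or smaller; lower coding density and added redundancy push the requirement further.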
Here, we aim to list and evaluate the diverse technological routes of utilizing integrated micro and nanoscale chips for DNA artificial synthesis. By weighing the pros and cons of each unique route, we hope that this review could provide a basic perspective on the trends in high-throughput DNA synthesis.
Fig. 7 Schematic diagram of inkjet printing synthesis platform. (a) A program controls the motion of the inkjet print heads and prints trace amounts of phosphoramidite reagents on the slide surface.143 The slides are packed with tens of thousands of reaction chambers. Each of them can carry out a conventional four-step synthesis of phosphoramidite chemistry. Reproduced from ref. 143 with permission from Elsevier, Copyright 2013. (b) Twist's silicon-based DNA synthesis platform. There are thousands of clusters on the chip, each consisting of 121 surface sites, performing different sequence synthesis.146
Agilent is the forerunner in commercializing inkjet printing technology and has been a leader in the synthesis of long oligonucleotides over the past decade. In 2001, Timothy R. Hughes et al. first used inkjet printing technology to synthesize 25000 oligonucleotide strands with a length of 25 nt on a single 25 mm × 75 mm chip (glass wafer) with a coupling efficiency of 94–98%.144 Currently, Agilent's advanced SurePrint platform uses a proprietary, non-contact industrial inkjet printing process in which oligo monomers are deposited uniformly onto specially prepared glass slides, enabling high-fidelity, high-throughput parallel DNA synthesis of up to 244000 oligonucleotide strands simultaneously. It has industry-leading fidelity of up to 1:2400 and allows synthesis of long oligos of about 230 nt.142
Emily Leproust et al. (co-founder of Twist Bioscience) have developed “a proprietary semiconductor-based synthetic DNA manufacturing process” that uses a high-throughput silicon-based platform to miniaturize the chemical reactions required for DNA synthesis. This miniaturized platform can reduce the reaction volume by a factor of one million, while increasing the throughput by a factor of 1000 and even up to 696000. The chip synthesizes oligonucleotides in specially treated micron-sized through-holes and uses high-speed inkjet print heads to deliver trace amounts of reagents. The reaction volume is dramatically reduced from 15 μL in a 96-well plate to 10 pL on the silicon-based platform.145
Since inkjet printers consume little reagent and produce relatively little waste when running, they make the synthesis method more environmentally friendly. The chip is stated to be able to synthesize 9600 genes on a single silicon chip, whereas traditional synthesis methods using 96-well plates can produce only one gene in the same physical space. Each chip contains thousands of discrete clusters, and each cluster contains 121 individually addressable surfaces,146 each capable of synthesizing one unique oligonucleotide sequence, enabling high-throughput synthesis of millions of oligonucleotide sequences in the length range of 120–300 nt with yields exceeding 0.2 fmol. The average error rate reaches 1:3000 nt. Twist has announced a milestone technical achievement in the successful synthesis of 200 nt oligonucleotide strands on a chip for DNA data storage. In 2021, Twist announced its ability to synthesize DNA on a silicon chip with sites spaced 1 micron apart. This is so far the highest synthesis site density for inkjet printing synthesis.147
Recently, Verardo et al. integrated an inkjet printing system with enzymatic DNA synthesis and have achieved a synthesis length of 21 nt with an estimated cycle efficiency of 98.9%, while allowing for parallelization of more than 2000 sites.129 Thanks to inkjet printing, the reagent consumption per cycle was reduced to low quantities at the micromolar level. They also optimized the nozzle setup to improve the printing performance. Additionally, they developed a low-viscosity ink that managed to avoid damaging the enzyme activity, and optimized additives in the ink to prevent evaporation and minimize secondary structure formation of the ssDNA.
At present, DNA synthesis by inkjet printing is the mainstream choice for commercial products. Further increasing the synthesis throughput requires reducing the synthesis site size as well as the nozzle size. However, to ensure precise delivery of low-volume reagents and avoid splashing during ejection, the nozzle size cannot be reduced indefinitely, which limits the increase of synthesis density. Generally, in piezoelectric printing, the droplet volume can be precisely controlled down to the pL level by adjusting the drive voltage and pulse waveform. The distribution and spacing of the nozzles affect the uniformity and actual volume of the droplets. When the spacing is small, satellite droplets can be produced and interfere with the neighboring sites. Surface tension influences reagent spreading, and the viscosity of the reagents affects the average droplet size or can even prevent droplet generation. In addition, problems such as poor droplet directionality, nozzle clogging, the wettability of the inner nozzle wall and the nozzle-to-substrate height have to be dealt with. Developing and maintaining smaller and more complex printing equipment may be a challenge both technologically and financially.
Evonetix is the representative company for heat-controlled chip-based DNA synthesis technology. Andrew J. Ferguson et al. managed to control the synthesis process by precisely adjusting the temperature of reaction sites, combined with a microfluidic system. They developed silicon chips by the semiconductor MEMS (micro-electromechanical system) technology to provide thousands of independent temperature-controlled reaction sites for high-throughput parallel DNA synthesis, including error checking to improve yields and the eventual assembly of the as-obtained dsDNA.136,148 The entire reaction processes are carried out at the reaction sites (called “virtual wells”) in a continuously flowing liquid system with thermosensitive reagents. Each heating site on the chip has a diameter of 100 μm and a space of 300 μm resulting in approximately 10 heaters per square millimeter.
Under the control of computer programs, thousands of sites can be independently activated and warmed to start the independent DNA synthesis cycles, respectively. The closed-loop thermal control system allows liquid in each virtual well to reach different temperatures within the same circulation system and avoids the thermal diffusion on each site that happens with an array of conventional heaters. Temperature sensors at the sites feed the actual temperature back to the computer system, and, then, an algorithm compares it with the target temperature to determine whether it needs to be warmed up or cooled down. This requires very precise scaling circuitry and algorithmic programming. To achieve both “warming & cooling” functions, the material with controlled thermal resistance is installed underneath the site, which draws heat from the site to achieve a cooling effect.149 As shown in Fig. 8a, firstly, the circuitry controls the generation of heat at the activated sites. The heat transfers to the liquid above and, as a result, the temperature-sensitive protecting groups are removed.137 Subsequently, a new monomer can be added to each oligonucleotide strand at the activated sites. The cycle of heating and extension is repeated until the target oligonucleotide strands are synthesized. After that, with the help of precise flow pumps and electromagnetic fields, the short ssDNA fragments are selectively released by heating and are transferred to the partner strands with complementary base sequences immobilized on the substrate. In this way, long dsDNA can be automatically assembled on the chip. In addition, mis-matched double strands are identified, once the oligos are annealed because they have a lower denaturation temperature than the desired DNA. Subsequently, unwanted DNA strands are removed by applying precise, sequence-dependent temperature followed by flushing liquid (Fig. 8b). 
The error correction and purification processes can minimize erroneous fragments in the product and help to provide a higher yield. Finally, the successfully matched oligos continue to assemble into longer dsDNA by complementary pairing at their termini (Fig. 8c).
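The closed-loop idea described above (a sensor reading is compared with the target temperature to decide whether a site is warmed or cooled) can be illustrated with a very simple on/off controller. This is only a sketch of the general principle and not Evonetix's algorithm; the class `Site`, the deadband, the temperature steps and all temperatures are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One temperature-controlled reaction site (hypothetical model)."""
    target_c: float  # set-point requested by the synthesis program
    actual_c: float  # latest reading from the on-site temperature sensor


def control_action(site: Site, deadband_c: float = 0.5) -> str:
    """Compare the sensor reading with the target and choose an action."""
    error = site.target_c - site.actual_c
    if error > deadband_c:
        return "heat"
    if error < -deadband_c:
        return "cool"
    return "hold"


def settle(site: Site, heat_step_c: float = 2.0, cool_step_c: float = 1.0,
           max_ticks: int = 1000) -> int:
    """Repeat measure-compare-act until the site holds; return ticks used."""
    for tick in range(max_ticks):
        action = control_action(site)
        if action == "hold":
            return tick
        if action == "heat":
            site.actual_c += heat_step_c
        else:
            site.actual_c -= cool_step_c
    raise RuntimeError("site did not settle within max_ticks")


if __name__ == "__main__":
    # One activated site and one idle neighbour, controlled independently.
    sites = [Site(target_c=90.0, actual_c=25.0), Site(target_c=25.0, actual_c=25.0)]
    for s in sites:
        ticks = settle(s)
        print(f"target {s.target_c} C reached at {s.actual_c} C after {ticks} ticks")
```

A real system would replace the fixed temperature steps with the measured thermal response of each site, but the structure of the loop (measure, compare, act) is the same.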
Fig. 8 Schematic of thermally controlled oligonucleotide synthesis.103 (a) Thermally controlled strand extension process. The temperature-sensitive protecting group is removed by heating the selected site (site 1). The protecting group may alternatively be Boc, Fmoc or Bsmoc, and more examples can be found in ref. 137. Then, free oligonucleotide monomers are added onto the strand terminus. The cycles of heating and extension are repeated until the desired ssDNA fragments are obtained. (b) Thermally controlled cleavage and error-correction process. Deprotection and cleavage occur at different temperatures. The ssDNAs are released from site 3 by heating and then migrate toward partner strands with complementary base sequences, which are immobilized on site 1; the mismatches can be cleaved by applying a precise temperature during annealing and are eventually washed away with the flowing liquid. (c) Thermally controlled assembly process. On heating site 5, the short dsDNAs are released and combine with another dsDNA (site 6) by complementary base pairing to assemble a longer strand; on heating site 4, short-stranded DNA continues to assemble at site 6. These processes continue to produce the desired long dsDNAs with high yield. Reproduced from ref. 103 with permission from Springer Nature, Copyright 2023.
It is claimed that this technology platform is compatible with both chemical and enzymatic DNA synthesis methods. However, it also faces some challenges. For example, appropriate protecting groups must be selected according to the type of activating agent used in the heating step. When the activator is acidic (e.g., trifluoroacetic acid), tert-butyloxycarbonyl (Boc) or trityl (Trt) is mostly used. When the activator is basic (e.g., morpholine or piperidine), 1,1-dioxobenzo[b]thiophene-2-ylmethyloxycarbonyl (Bsmoc) is preferable.137 It is challenging to develop highly temperature-sensitive protecting groups. Another serious difficulty is how to control the thermal behavior of micron-scale reaction sites on the chip independently and precisely. To ensure efficient synthesis, the generated heat should be distributed evenly over the reaction site without being conducted to the gap region or the adjacent sites. To conduct heat away efficiently, Evonetix has developed a cooling system that consists of a flowing coolant, a thermoelectric cooler and a copper substrate glued to the back side of the chip. Besides, there are other technical difficulties: which microfluidic system should be chosen, and what is the optimal flow rate? How can the behavior of DNA be controlled under different thermal conditions? How can precisely assembled silicon wafer modules be manufactured while avoiding the risk of wafer explosion at weakly bonded areas during heating? How can the chip be prevented from corroding when it is immersed in strong acid/alkali reagents at high temperature?
Mask-based photolithography synthesis relies on the transmission of light through specifically designed physical masks placed over the synthesis surface. Light passes only through the transparent areas of the mask and is projected onto the substrate at defined locations.154 Affymetrix's commercial product GeneChip™ is a representative of mask-based photolithographic in situ synthesis (Fig. 9a).155 This technique typically produces 20–25 nt oligonucleotide strands and more than 10⁶ feature sites per chip. With the development of the photolithography process, the feature size has evolved from 50 μm to 20 μm, 18 μm, 11 μm and eventually down to 5 μm on a 1.28 cm × 1.28 cm chip in 2005.139,156 Subsequently, simulations showed that a further reduction of the feature size to 1 μm, with densities up to 1 × 10⁸ cm⁻², is promising provided that diffraction is reasonably controlled.139 However, a unique mask is needed for almost every cycle of strand extension. For long sequence synthesis, the mask photolithography method therefore requires a large number of custom-made masks, which dramatically increases the cost of synthesis.
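Why the number of masks grows with the strand length can be seen from a simplified layer-by-layer scheme: at each position of the strand, each of the four bases needs its own mask exposing exactly those sites that receive this base. The sketch below follows this assumption (real mask sets may be optimized differently); the function name `masks_for_layout` and the site names are hypothetical.

```python
BASES = "ACGT"


def masks_for_layout(probes: dict) -> list:
    """Return one (position, base, exposed_sites) entry per required mask.

    probes maps a site name to the sequence to be built at that site.
    A mask is only needed if at least one site receives the base at
    that position.
    """
    masks = []
    longest = max(len(seq) for seq in probes.values())
    for position in range(longest):
        for base in BASES:
            exposed = frozenset(
                site for site, seq in probes.items()
                if position < len(seq) and seq[position] == base
            )
            if exposed:
                masks.append((position, base, exposed))
    return masks


if __name__ == "__main__":
    layout = {"site1": "ACG", "site2": "AGG", "site3": "TCA"}
    masks = masks_for_layout(layout)
    for position, base, exposed in masks:
        print(position, base, sorted(exposed))
    print("masks needed:", len(masks), "upper bound:", 4 * 3)
```

In this scheme the mask count is bounded by four times the strand length, so a 25 nt probe set can require up to 100 masks, which illustrates why long sequences become costly with physical masks.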
Fig. 9 Schematic diagram of two photochemical methods of DNA synthesis. (a) Mask-based photolithography synthesis.155 Top: Mask-based photolithography. UV light passes through a lithographic mask that acts as a filter to either transmit or block the light from the chemically protected microarray surface (wafer). The sequential application of specific lithographic masks determines the order of sequence synthesis on the surface. Bottom: Chemical synthesis cycle. UV light removes the protecting groups (squares) from the array surface, allowing the addition of one nucleotide. The sequential synthesis cycles result in multiple 25-mer probes on the array surface. Reproduced from ref. 155 with permission from Elsevier, Copyright 2015. (b) Maskless photolithography synthesis.152 Top: DMD. The 365 nm UV light from an LED is uniformly projected onto the DMD. Digital micromirrors in the “ON” state reflect the light onto the surface of selected synthesis sites. Bottom: The cycles of phosphoramidite synthesis with a Bz-NPPOC protecting group at the 5′ end that is used in this method. Reproduced from ref. 152 with permission from Oxford University Press, Copyright 2021.
Similar to modern projection techniques, maskless lithography synthesis uses a programmable and addressable digital micromirror device (DMD) instead of chrome masks (Fig. 9b).152 Light is reflected onto the designated synthesis sites for the deprotection reaction by adjusting the angle of each micromirror.106,109,135,157 Roche has developed a series of commercial products with densities exceeding 10⁶ oligonucleotides per cm², using a DMD to control the angle of aluminum micromirrors. LC Sciences' μParaflo™ synthesizer uses a DMD to irradiate the photocatalyst to produce acid, which then removes the acid-sensitive protecting group DMT. Their microfluidic platform delivers extremely small amounts of reagents. A single 1.4 cm² chip contains 4000 reaction chambers, corresponding to a density of 2857 chambers per cm², and each chamber requires only 270 pL of reagents, remarkably reducing reagent consumption.158,159 Moreover, compared with mask-based photolithography synthesis, digital photolithography synthesis does not require expensive, specifically designed, case-specific masks, so the synthesis cost can be reduced.
The photochemical synthesis method ensures that each reaction chamber can synthesize nucleotides independently by controlling the light accurately, which brings great possibilities for increasing the throughput. So far, a standard DMD (1024 × 768 micromirrors) can synthesize up to 786432 sequences simultaneously.160 Higher resolution enables more precise control of light to reduce diffraction and cross-contamination, contributing to a significant boost in synthetic throughput. However, the maskless lithography synthesis method has other drawbacks that may diminish its efficiency, yield and fidelity. The physical properties of light limit the minimum size and spacing of individual synthesis sites, hindering further miniaturization of the synthesis system. For example, light diffraction and scattering from the micromirrors may cause loss of contrast at pixel edges, resulting in unintentional exposure and cross-contamination of neighboring sites. Unfortunately, diffraction is inevitable in all optical systems, which limits the gap size between pixels to the micron scale. In the case of a micromirror array, when the micromirror spacing is reduced to around 1 μm, the average total error rate increases sharply to 21.8% per bp.152,160 Another defect is local flare caused by refraction along the light path, by bubbles in the solvent, or at the interface between solvent and surface, all of which may lead to unintended exposure.161 Furthermore, high-resolution systems are more sensitive to global scattering and require more complex equipment, which increases the equipment size and development cost. Efforts to optimize the micromirror distribution, light irradiation time, capping time, protecting groups and solvents may help enhance the quality of photochemical synthesis.154,162
CustomArray, now part of GenScript, developed independently controllable microelectrode arrays based on complementary metal–oxide–semiconductor (CMOS) integrated circuit chips. These microelectrodes are coated with a porous reaction layer (sucrose) to improve the quality of nucleotide synthesis.164 To confine the diffusion of protonic acid from the activated electrode sites to the neighboring ones, an opposite potential is applied to the electrodes around the synthesis sites to trigger an electrochemical reduction reaction that produces bases to neutralize the excess acid.165 Their 12 K microarray chip has a circular electrode diameter of 44 μm and can synthesize 12472 oligos. The 90 K chip offers a synthesis throughput of 92918 oligos and oligonucleotide libraries up to 170 nt in length with an error rate of less than 0.5%, and the electrode size is further reduced to 22 μm. On the company's website, it is announced that this is the highest-density commercial oligo synthesis chip at present, with a throughput of 8 million oligos per chip, and that the number may potentially reach 200 billion.166 In addition, the cost is affordable at less than $0.2 per base and the yield of each oligo is up to 1 fmol.166,167 Their chip products are starting to be used in DNA data storage research, which may bring the cost of data storage down to $50 per TB.24,168
Similar approaches were recently studied by Microsoft Research and the University of Washington. They achieved parallel synthesis of arbitrary DNA sequences at the submicron scale, increasing the synthesis density by three orders of magnitude compared with existing products. The electrodes are 650 nm in diameter with a pitch of 2 μm. At this electrode density, 2.5 × 10⁷ oligonucleotide strands can theoretically be synthesized in parallel on a 1 cm² area, which meets the electrode density required for data storage speeds of megabytes per second that we estimated in Fig. 5. In addition, the synthesis length is up to 180 nt, three times that of previous electrochemical microarray-based DNA synthesis methods.116,132 Furthermore, the total cumulative error rate, including deletions, insertions and substitutions, ranges from 4% to 8%, which is still within the 15% tolerance of DNA data storage technology employing an error-correcting system.14,100,133,169 They designed a special electrode array (Fig. 10c) to resolve the crosstalk problem among adjacent reactors. The synthesis sites (circular anodes) are at the bottom of a nanowell structure, where the deprotection and coupling steps occur. Each anode is surrounded by four diamond-shaped cathodes to which an opposite potential is applied. As reported, the deprotection step involved the addition of methanol to acetonitrile in a ratio of 1:9, resulting in the generation of alkaline species that consumed the protonic acid at the cathodes and completed the electrochemical half-reaction. The alkaline methoxide anion effectively confines the acid chemically within the synthesis site region, preventing unwanted deprotection at sites that are supposed to be “off” during a synthesis cycle. Additionally, the deep nanowell provides a physical barrier that limits acid cross-contamination.
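The theoretical density quoted for the 2 μm electrode pitch follows directly from geometry. A minimal sketch, assuming that the sites sit on a regular square grid (the function name `sites_per_cm2` is hypothetical):

```python
def sites_per_cm2(pitch_um: float) -> float:
    """Sites per cm^2 on a square grid with the given centre-to-centre pitch."""
    if pitch_um <= 0:
        raise ValueError("pitch_um must be positive")
    sites_per_cm = 1e4 / pitch_um  # 1 cm = 10 000 um
    return sites_per_cm ** 2


if __name__ == "__main__":
    for pitch in (2.0, 1.0):
        print(f"pitch {pitch} um: {sites_per_cm2(pitch):.2e} sites per cm^2")
```

A 2 μm pitch gives 2.5 × 10⁷ sites per cm², matching the value above, and a 1 μm pitch corresponds to the 1 × 10⁸ cm⁻² discussed for photolithographic features.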
Fig. 10 An overview of electrochemical DNA synthesis. (a) Schematic diagram of the electrochemical synthesis of a nucleotide strand on an electrode. ① A positive potential is applied to the electrode, producing a protonic acid that removes the DMT protecting group and exposes the “–OH” to start the next cycle. ② A free phosphoramidite monomer with a protecting group (DMT) at its 5′ end is coupled to the “–OH” on the electrode/previous nucleotide. ③ The newly formed phosphite backbone linkage is oxidized to the more stable phosphate by an oxidizing agent. ④ The capping reagents seal off “–OH” groups that are not coupled to the monomer, making them unavailable for subsequent reactions. (b) An example of the reaction of redox pairs at the electrodes during the electrochemical deblock step. The anode undergoes an oxidation reaction that generates protons; the cathode undergoes a reduction reaction that consumes protons. Reproduced from ref. 133 with permission from AAAS, Copyright 2021. (c) (I) The cathodes (diamond-shaped) are connected together (dashed line), while four anodes (circular) of the same color are connected together (solid line). (II) SEM image of a nanoscale electrode array. The 650-nm anodes with a pitch of 2 μm are sunk in a 200-nm deep well and surrounded by four counter electrodes. (III) A fluorescent image of the array in (II) after parallel synthesis of two different sequences with different fluorophores. The clear demarcation of the different fluorescence signals proves that the acid generated by the electrodes is strictly confined and demonstrates independently controlled parallel synthesis. Reproduced from ref. 133 with permission from AAAS, Copyright 2021.
Further shrinking electrode feature size and shortening electrode pitch are effective solutions to greatly increase synthesis density and throughput. Typically, electrode sizes based on advanced semiconductor manufacturing technologies can reach submicron or even nanometer scale. It is relatively feasible to prepare ultra-dense micro/nanoelectrode arrays. In other applications, researchers have succeeded in narrowing down the diameter of micro-electrodes to 100–200 nm or even 10 nm, and the pitch of electrodes to 750 nm.170–172
However, the risk of acid diffusing to neighboring electrodes rises as the electrode density increases.165,173 This results in unwanted deprotection on the surface of adjacent electrodes, which increases error rates and reduces synthesis yields. Currently, the biggest technical challenge in electrochemical DNA synthesis is to strictly confine the acid produced around the activated microelectrodes and prevent it from diffusing to the adjacent electrodes. A compromise between synthesis density and ion diffusion must be sought until a breakthrough technology that resolves this conflict appears.116,132,163,174
Although the aforementioned array-based DNA synthesis technologies have improved the throughput by several orders of magnitude over traditional column-based synthesis, they are not yet ready for applications such as DNA data storage. Each of these technologies faces its own challenges in substantially increasing throughput, reducing costs and speeding up synthesis while ensuring appropriate coupling efficiency: (1) for inkjet printing, the size, complexity and cost of piezo printheads are considered the crucial limit. (2) For thermal synthesis techniques, the size and spacing of the heater units and the precise control of heat are obstacles. (3) Photochemical methods struggle with the inherent diffraction and refraction of light. Novel developments in applied physics, such as plasmonics, may bring disruptive technological innovation to overcome the physical constraints on chip size. It has been demonstrated that metallic nanostructures can generate localized surface plasmons when they couple to electromagnetic waves, resulting in thermoplasmonic effects such as localized heating,175 or subdiffraction-limited spatial resolution in optics,176 which might shed light on further shrinking the working units of chips for thermal or photochemical DNA synthesis. (4) Electrochemical synthesis has to overcome protonic acid diffusion and crosstalk between electrodes (Table 1). For example, technologists have proposed new ways of generating ions or protons, e.g., by using ion-releasing materials as a working electrode that releases protons directly instead of via a redox reaction in solution, which may further localize the protons and enable even higher synthesis density.177
Synthesis method | Merit | Challenge | Ref. |
---|---|---|---|
Inkjet printing | • Low volume reagent and cost; • Parallel synthesis without sacrificing fidelity; • Highly commercialized. | • Relatively low throughput; • Limited positioning accuracy; • Large surface tension of a liquid droplet. | 109, 143–145 and 148 |
Thermal | • High throughput up to 100 million; • Simultaneous assembly and error correction. | • Emerging heat-sensitive reaction system; • Difficulty in controlling local heat transfer; • Highly complex process; • Relatively immature, more R & D is expected. | 32, 104, 137 and 138 |
Photolithography | • The photolithography resolution is high; • Highest commercialized synthesis density; • Technology is relatively mature. | • Inherent light diffraction, scattering and other phenomena; • Difficult to reduce the size further; • Expensive masks and complex system. | 94, 135, 136, 153, 166, 186 and 207 |
Electrochemical | • Utilize highly integrated circuit manufacturing techniques; • Ultra-high synthesis throughput; • Further miniaturization possibilities. | • Acid diffusion crosstalk; • High integration and process requirements. | 26, 92, 133, 134 and 164 |
The initial research focus was on the modification of the 3′-O-end of the nucleotides. A reversible “terminator” group is added to the nucleotide monomer, which interrupts the synthesis and prevents the next nucleotide from being coupled (Fig. 11a).91,119,122,152,179 A variety of reversible terminators are available, such as 3′-O-NH2-dNTP (DNA Script and Nuclera),180–183 3′-O-azidomethyl-dNTP184,185 (Molecular Assemblies), 3′-O-(2-nitrobenzyl)-dNTP (Camena Bioscience),186 3′-phosphate187 (Codexis188) or photo-cleavable nucleotides189,190 and other protecting groups.191,192
Fig. 11 Three typical TdT-based enzymatic DNA synthesis routes. (a) TdT-mediated synthesis with modified dNTPs.91 A normal dNTP is modified with a 3′-O-blocking group. Step 1: extension: TdT catalyzes the coupling of the specific dNTP to the primer. The blocking group effectively prevents the next cycle of extension. Step 2: deprotection: the 3′-O-blocking group of the extended primer is removed to start the next nucleotide coupling. Reproduced from ref. 91 with permission from Frontiers, Copyright 2021. (b) Scheme for TdT–dNTP conjugate mediated two-step oligonucleotide extension.92 A short DNA primer is immobilized on a solid-phase support for the conjugate to begin synthesis. The TdT–dNTP conjugate consists of a TdT tethered to a dNTP via a cleavable linker. Step 1: extension, the tethered dNTP of the TdT–dNTP conjugate is incorporated into a primer. After incorporation, the 3′ end of the primer remains covalently bound to TdT and is inaccessible to other TdT–dNTP molecules. Step 2: deprotection, the linkage between TdT and the incorporated nucleotide is cleaved, releasing the primer and allowing subsequent extension. Reproduced from ref. 92 with permission from Springer Nature, Copyright 2018. (c) Schematic depiction of the competitive reaction between TdT and apyrase (AP) enzymes. First, oligonucleotide initiators (N, gray) are tethered to a solid support. During synthesis, TdT catalyzes the addition of nucleoside triphosphates to the 3′ end of the initiators, whereas apyrase terminates the coupling cycle to prevent excessive extension. A wash step after synthesis is necessary to remove byproducts.127 Reproduced from ref. 127 with permission from Springer Nature, Copyright 2019.
In 2020, DNA Script launched the world's first desktop DNA printer, “SYNTAX”, based on enzymatic synthesis, enabling automated synthesis, purification and quantification. The platform uses resin as a solid-phase carrier, to which a cleavable linker DNA molecule (called “initiator DNA (iDNA)”) is attached. Each synthesis cycle consists of two steps. First, the engineered TdT adds a modified monomer (3′-O-NH2 dNTP) to the iDNA.180,181 Then, the terminator group at the 3′ end of the monomer is removed so that the next extension can begin. The two steps are repeated until the target sequence is synthesized. In detail, 3′-OH modified monomers (also known as reversible terminator dNTPs) are employed to prevent the enzyme from adding more than one monomer to a DNA strand in each cycle. The reversible terminators can be removed by a mild acid buffer reagent without contaminating the final product. Meanwhile, DNA Script engineered the TdT in vitro to catalyze rapid and selective addition of monomers to iDNA while maintaining high fidelity and coupling efficiency, which also allows longer products to be synthesized. They successfully synthesized 360 nt oligonucleotide strands with a stepwise efficiency of up to 99.5%. However, the engineered polymerase currently takes about 5 minutes to add a modified monomer, whereas native TdT is efficient with unblocked nucleotides and can add a monomer within 10–20 seconds.119,193 Better engineered enzyme variants for blocked nucleotides are thus desired to improve the efficiency of controlled enzymatic synthesis. Recently, Verardo et al. reported the first demonstration of multiplex enzymatic synthesis of DNA with single-base resolution, showing the possibility of microspatial control of nucleic acid sequence and length, and achieving longer lengths than nonmultiplexing studies. They also used a silicon MEMS platform to synthesize 50 nt sequences with a spot density of up to 10560 on a 75 mm × 25 mm slide, and the cycling efficiency was estimated to be 98.9% based on a 21 nt synthesized sequence.129
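The relationship between stepwise efficiency and the fraction of full-length product can be illustrated with a short calculation. The sketch below is not taken from the cited works; it assumes that every cycle has the same efficiency, that failures are independent, and that the number of cycles equals the strand length (the function name is illustrative).

```python
def full_length_fraction(stepwise_efficiency, n_cycles):
    """Fraction of strands that succeed in every one of n_cycles couplings.

    Assumption (not from the source): a constant, independent
    per-cycle efficiency, so the fraction is efficiency ** n_cycles.
    """
    if not 0.0 <= stepwise_efficiency <= 1.0:
        raise ValueError("stepwise_efficiency must lie in [0, 1]")
    if n_cycles < 0:
        raise ValueError("n_cycles must be non-negative")
    return stepwise_efficiency ** n_cycles


# 99.5% per step over 360 cycles (the 360 nt strands quoted above).
print(f"360 nt at 99.5%: {full_length_fraction(0.995, 360):.3f}")

# 98.9% per step extrapolated to 50 cycles; the text estimated this
# efficiency on a 21 nt sequence, so the 50 nt figure is only indicative.
print(f"50 nt at 98.9%: {full_length_fraction(0.989, 50):.3f}")
```

Under these assumptions, only about one strand in six would be full length at 360 nt even with 99.5% stepwise efficiency, which is why small gains in coupling efficiency matter so much for long oligonucleotides.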
Molecular Assemblies developed a novel enzymatic method of DNA synthesis. Their Fully Enzymatic Synthesis™ (FES™) technology uses a high-performance engineered TdT, i.e., calf thymus TdT sourced from engineered E. coli.194,195 More than 25% of the enzyme's amino acid sequence is altered, enabling industry-leading coupling efficiency, accuracy and speed at high temperature.194,195 This enzymatic synthesis technology allows rapid, on-demand synthesis of ultra-long sequence-specific DNA, with an in-process purification step that selectively removes erroneous sequences during synthesis rather than after the entire reaction is completed, ensuring yield while saving time. Michael J. Kamdar, President and CEO of Molecular Assemblies, announced on the company website in 2018 that they were the first industrial group to complete DNA data storage by enzymatic DNA synthesis.196,197,198 They recently delivered the first enzymatically synthesized oligonucleotides produced with FES™.199
Nuclera's eDrop™ platform is based on thin-film transistor (TFT) materials and is driven by electrowetting effects. Software-controlled electronic signals modify the interaction between droplets and surfaces. Their high-resolution device provides precise control of droplet manipulation, enabling a high level of parallelization and the generation of small droplet volumes (tens to hundreds of nanoliters). They combine an engineered TdT with nitrite-mediated deprotection of the 3′-O-aminooxy reversible terminator, which converts the aminooxy moiety (–ONH2) to the hydroxyl moiety (–OH) in the presence of acid. At the basal pH of the system, the nitrite deprotection solution is inactive while the modified TdT is active. When the system pH is lowered to the point where the deprotection solution is active and the reversible terminator is removed, the modified TdT is inactive. Subsequently, when the system pH returns to basal, the active modified TdT adds protected monomers to the 3′-terminus of the oligo strands at the selected sites. This strategy prevents extension of the released “–OH” groups prior to addition of the next nucleotide.182 Based on the advantages of this platform, Nuclera is investigating novel enzymatic DNA synthesis techniques that promise to automate efficient DNA synthesis. Similar to DNA Script's monomer deprotection strategy, the reaction is fast and highly specific because of the enhanced nucleophilicity of the 3′-O-aminooxy group. Unwanted side reactions such as acid-induced depurination are a risk, but they can be suppressed by pH adjustment and the addition of certain salts (e.g., Mg2+, Na+, spermine).114 Another concern is that nonspecific nitrosation of the bases leads to a much longer deprotection time,180 which would slow the synthesis of long DNA strands. This can be solved by using non-nitrosation methods instead.129,200
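The mutually exclusive activity windows described above can be pictured as a small state model. The following sketch is purely illustrative and is not Nuclera's control logic: the pH values, class and function names are hypothetical, and the chemistry is reduced to two boolean conditions.

```python
from dataclasses import dataclass

# Hypothetical pH set points; the text only speaks of "basal" and "lowered" pH.
BASAL_PH = 7.5
DEPROTECT_PH = 5.0


@dataclass
class Site:
    sequence: str = ""
    blocked: bool = False  # True while the 3'-O-aminooxy terminator is present


def tdt_active(ph):
    """Assumed: the modified TdT works only at or above basal pH."""
    return ph >= BASAL_PH


def nitrite_active(ph):
    """Assumed: nitrite deprotection works only at or below the lowered pH."""
    return ph <= DEPROTECT_PH


def expose(site, ph, base=None):
    """Apply one reagent exposure to a site at the given pH."""
    if nitrite_active(ph) and site.blocked:
        site.blocked = False      # -ONH2 converted to -OH; TdT is inactive here
    elif tdt_active(ph) and not site.blocked and base is not None:
        site.sequence += base     # exactly one protected monomer is added
        site.blocked = True
    return site


def synthesize(target):
    """Alternate extension (basal pH) and deprotection (low pH) for each base."""
    site = Site()
    for base in target:
        expose(site, BASAL_PH, base)
        expose(site, DEPROTECT_PH)
    return site


print(synthesize("ACGT").sequence)
```

Because the two conditions never hold at the same pH in this model, a freshly deprotected 3′-OH cannot be extended until the pH has been raised again, which is the point made in the paragraph above.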
Ansa Biotechnologies aims to commercialize artificial enzymatic DNA synthesis by controlling the release of the TdT enzyme from the 3′ end of the DNA, based on their published technique.92 Instead of modifying the enzyme itself, they conjugate each TdT with a single dNTP using a reversible linker.201 This offers much faster synthesis than several other enzymatic methods that use free monomers: the coupling time for TdT with a reversible terminator is around 60 min,179,189 while for template-dependent polymerase-mediated oligo synthesis it is 1 min.184 The use of TdT–dNTP conjugates reduces the reaction time to 10–20 seconds.92 During synthesis, the conjugate adds the monomer to the exposed 3′ end of the primer, and the enzyme remains attached to the 3′ end to prevent the addition of other monomers (Fig. 11b).92,125 This strategy requires a larger amount of enzyme, but it is stated to consume only ∼11 μM of the enzyme–substrate conjugate in each cycle, whereas the monomer concentration needed for the phosphoramidite method is 9000 times higher.92 In this sense, it is still cost-effective compared with other techniques that require high concentrations of expensive modified nucleotides.125 The company recently announced the successful de novo synthesis of the world's longest DNA oligonucleotide, with a length of 1005 nt. The proportion of full-length product reached 28%, indicating an industry-leading average stepwise yield of approximately 99.9% during synthesis.202
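To put the quoted coupling times and concentrations side by side, the arithmetic can be sketched as follows. This is a rough illustration only: it counts coupling time alone (deprotection, washing and cleavage are ignored), the 100 nt target length is a hypothetical example, and the function names are not from the source.

```python
# Coupling times quoted in the text, in seconds per added nucleotide.
COUPLING_SECONDS = {
    "TdT + reversible terminator": 60 * 60,
    "template-dependent polymerase": 60,
    "TdT-dNTP conjugate (fast end)": 10,
    "TdT-dNTP conjugate (slow end)": 20,
}


def total_coupling_hours(length_nt, seconds_per_coupling):
    """Cumulative coupling time for length_nt additions, in hours."""
    return length_nt * seconds_per_coupling / 3600.0


def implied_phosphoramidite_mM(conjugate_uM=11.0, fold=9000.0):
    """Monomer concentration implied by '9000 times higher' than ~11 uM."""
    return conjugate_uM * fold / 1000.0


length_nt = 100  # hypothetical target length, chosen only for illustration
for method, seconds in COUPLING_SECONDS.items():
    hours = total_coupling_hours(length_nt, seconds)
    print(f"{method}: {hours:.2f} h of coupling for {length_nt} nt")

print(f"Implied phosphoramidite monomer concentration: "
      f"{implied_phosphoramidite_mM():.0f} mM")
```

With these inputs the coupling time for 100 additions ranges from about 100 h down to well under 1 h, and the 9000-fold ratio corresponds to roughly 99 mM of monomer.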
Because enzymes generally have high catalytic activity, native TdTs add unblocked nucleotides extremely quickly, which makes it a huge challenge to control the addition of individual bases. For the synthesis of specific long sequences, enzyme engineering and monomer modification strategies are necessary to ensure coupling efficiency. However, for applications with high fault tolerance (e.g., data storage), a modification-free strategy is feasible. Kern Systems, founded in 2019, used an enzyme competition strategy:127,203 two enzymes, TdT and apyrase, have opposite effects on nucleotides (Fig. 11c).127 TdT adds new nucleotides to the DNA strand, while apyrase degrades the remaining nucleotides, preventing them from being added redundantly to the existing strand. TdT, apyrase, and short oligonucleotide initiators are mixed at different concentrations in a synthesis system. By tuning the concentration ratio of the two enzymes, TdT adds a limited number of nucleotides with the same base to the extending strand in each synthesis round, until this batch of nucleotides is degraded by apyrase. At the same time, the company uses a non-exact synthesis co-coding technology that relies on redundancy and error correction mechanisms. This strategy suits DNA data storage well, because the validity of the data does not depend solely on the exact sequence, which relieves the pressure to ensure synthesis accuracy. This strategy results in longer synthesized strands, which may compromise volumetric storage density, and it places higher demands on the coding system because consecutive identical bases are not permitted in coding. It is nevertheless an inspiring solution for applying enzymatic synthesis to DNA data storage.
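One way to see why such a scheme tolerates an uncontrolled number of identical bases per round is to store information only in the transitions between different bases. The sketch below is an illustrative toy codec under that assumption, not the coding scheme used by Kern Systems or ref. 127: each byte becomes six ternary digits, each digit selects one of the three bases that differ from the previous base, and decoding first collapses every run of identical bases. The starting base and all function names are hypothetical.

```python
BASES = "ACGT"


def bytes_to_trits(data):
    """Write each byte as six base-3 digits (3**6 = 729 >= 256)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits):
    """Inverse of bytes_to_trits for an error-free digit stream."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def encode(data, start="A"):
    """Map each digit to a base that differs from the previous base."""
    prev, seq = start, []
    for trit in bytes_to_trits(data):
        prev = [b for b in BASES if b != prev][trit]
        seq.append(prev)
    return "".join(seq)


def collapse_runs(read):
    """Reduce every run of identical bases to a single base."""
    out = []
    for base in read:
        if not out or out[-1] != base:
            out.append(base)
    return "".join(out)


def decode(read, start="A"):
    """Recover the bytes from a read whose run lengths are arbitrary."""
    prev, trits = start, []
    for base in collapse_runs(read):
        trits.append([b for b in BASES if b != prev].index(base))
        prev = base
    return trits_to_bytes(trits)


ideal = encode(b"Hi")
stretched = "".join(base * 3 for base in ideal)  # mimic multi-base additions
print(ideal, decode(stretched))
```

The toy codec has no redundancy or error correction, so a missed or spurious transition would corrupt the output; a practical system would add both, as the paragraph above notes.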
In addition, Camena Bioscience uses a proprietary enzyme combination to achieve template-free DNA synthesis with trinucleotide building blocks. Their novel de novo synthesis and gene assembly technology, gSynth™, is able to synthesize 300 nt DNA strands and has successfully produced a 2.7 kb plasmid. For the synthesis of 300 nt fragments, the average yield of full-length gSynth™ product is 85.3%, indicating a coupling efficiency higher than 99.9%. For phosphoramidite synthesis, by contrast, the yield is only 22.3%, which corresponds to a coupling efficiency of only 99.5%.126,186,204
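The efficiencies quoted here and for the 1005 nt oligonucleotide above follow from taking the n-th root of the full-length fraction. The sketch below reproduces that arithmetic under stated assumptions: the efficiency is treated as constant and is expressed per nucleotide, so the number of steps is set equal to the strand length. For a method built from trinucleotide blocks, the efficiency per coupling event would be lower than this per-nucleotide figure. The function name is illustrative.

```python
def stepwise_efficiency(full_length_fraction, n_steps):
    """Average per-step efficiency implied by a full-length fraction.

    Assumption: all n_steps have the same, independent efficiency,
    so efficiency = fraction ** (1 / n_steps).
    """
    if not 0.0 < full_length_fraction <= 1.0:
        raise ValueError("full_length_fraction must lie in (0, 1]")
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    return full_length_fraction ** (1.0 / n_steps)


# (label, full-length fraction, length in nt) as quoted in the text.
cases = [
    ("gSynth, 300 nt", 0.853, 300),
    ("phosphoramidite, 300 nt", 0.223, 300),
    ("TdT-dNTP conjugate, 1005 nt", 0.28, 1005),
]

for label, fraction, length_nt in cases:
    efficiency = stepwise_efficiency(fraction, length_nt)
    print(f"{label}: {efficiency * 100:.2f}% per nucleotide")
```

Under these assumptions the three cases come out at roughly 99.95%, 99.50% and 99.87% per nucleotide, consistent with the rounded values given in the text.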
Table 2 gives an overview of current artificial DNA synthesis technologies. Compared with the phosphoramidite chemistry method, enzymatic synthesis uses aqueous reagents, most of which are non-toxic, requires less purification, and enables faster synthesis and large-scale synthesis of longer strands. Some reagents, such as the cacodylate buffer, still require careful disposal as they can be toxic to organisms at a high dose. Nevertheless, enzymatic synthesis generally offers a more cost-effective and environmentally friendly approach. Although enzymatic synthesis is still at an early stage, where “high-throughput” is mostly realized at multi-well plate scale, it is foreseeable that further optimization of enzymatic methods combined with semiconductor microchip technology will drive large-scale oligo synthesis technology into the next generation. There are already pioneering works: light-mediated template-free TdT enzymatic synthesis provides a new strategy for accurate stacking of single nucleotides;179,189 Lee et al.205 proposed a strategy for DNA synthesis on array chips based on the enzymatic principle, which achieved point control of TdT enzymatic activity by coupling on-chip synthesis with a DMD platform. In 2023, Smith et al. reported a polymerase-nucleotide conjugate which works as a protecting group and is electrochemically cleavable. When the conjugate detaches from the surface at mild oxidative voltages, the oligonucleotide with an extendable 3′-end becomes spatially available for subsequent nucleotide coupling catalyzed by TdT. They preliminarily demonstrated controllable single-base enzymatic synthesis on a microelectrode array.206 These works are an important step toward high-throughput enzymatic DNA synthesis using high-density array chip platforms. Han Sae Jung et al. from Harvard University have developed a control system for real-time pH monitoring. It uses a CMOS integrated circuit that applies voltages independently to electrode sites in a “concentric circle” layout, confining electrochemically generated acid to the vicinity of the anode to prevent diffusion and enabling real-time monitoring of the acid distribution on and around the activation site. This system enables parallelized pH-regulated enzymatic oligonucleotide extension and could potentially be extended to high-throughput enzymatic DNA synthesis in dense arrangements and mild aqueous media.207 However, it is not clear whether this system can achieve proper pH control in the high-density arrays required for DNA array-based synthesis, and further validation is needed. In addition, rapid amplification using biological cells, their enzymes or organelles is also a trend toward fast and accurate DNA synthesis. The booming revolution in synthetic biology and biotechnology will surely promote the development of large-scale, high-throughput DNA synthesis for data storage.
Synthesis method | Platform | Technique | Control mechanism of the synthesis cycle | Core components | Reagent | Additives | Equipment | Representative company |
---|---|---|---|---|---|---|---|---|
Solid-phase phosphoramidite | Column | Columnar liquid flow | Each oligo is synthesized in a separate column having an independent synthesis cycle | pH-sensitive 5′ end terminal protected dNTP | Organic based | NA | Solenoid/pneumatic valve liquid circuit | Twist Bioscience; IDT |
Solid-phase phosphoramidite | Micro-array | Inkjet printing | Oligos share a synthesis cycle, while monomer and catalyst are applied to individual sites all over the array in order by an inkjet printer | pH-sensitive 5′ end terminal protected dNTP | Organic based | NA | Piezo printheads and precision step-shift mechanical system | Agilent; Twist Bioscience |
Solid-phase phosphoramidite | Micro-array | Thermal synthesis | Oligos share a synthesis cycle, while the deprotection step is controlled synchronously on individual sites by temperature change | Temperature sensitive 5′ end terminal protected dNTP | Organic based | NA | Addressable heating control circuit system | Evonetix |
Solid-phase phosphoramidite | Micro-array | Photo-degradation | Oligos share a synthesis cycle, while the deprotection step is controlled individually by light irradiation | Photolabile 5′ end terminal protected dNTP | Organic based | NA | Lithographic mask and optical control system, or digital | Affymetrix; LC Sciences; Roche |
Solid-phase phosphoramidite | Micro-array | Photo-acids | Oligos share a synthesis cycle, while the deprotection step is controlled individually by light irradiation, light-induced acidogenesis and a pH-sensitive protecting group on the monomer | pH-sensitive 5′ end terminal protected dNTP | Organic based | Light sensitive | Micromirror control system | Affymetrix; LC Sciences; Roche |
Solid-phase phosphoramidite | Micro-array | Electrochemical-acids | Oligos share a synthesis cycle, while the deprotection step is controlled individually by voltage, electrochemical acidogenesis and a pH-sensitive protecting group on the monomer | pH-sensitive 5′ end terminal protected dNTP | Organic based | Redox | Integrated circuit-controlled addressable electrode arrays | CustomArray; Twist Bioscience; Intel Corporation; Microsoft |
Enzymatic synthesis | Column/array | pH/thermal/photographic/kinetics | Column: similar to column based platform for solid-phase phosphoramidite method; array: similar to the micro-array based platform for solid-phase phosphoramidite method. | pH/thermal/photon sensitive 3′ end terminal protected dNTP, engineered TdT; or dNTP–TdT conjugate; or TdT and apyrase | Aqueous | Redox, or temperature/light sensitive | Workstations with solenoid valves or pipettes | Molecular Assemblies; DNA Script; Thermo Fisher; Nuclera |
Novel ideas and approaches are expected to further boost the development of technological platforms for DNA data storage. Optical or electrical recognition methods, which are utilized in DNA sequencing, can be integrated into the DNA synthesis process to achieve error correction during synthesis. For example, Xu et al. achieved synthesis and sequencing on the common electrode. The fluorescence spectra after the coupling of the expected bases are analyzed in real-time to check whether they are consistent with the expected spectra. A polymerase is used to catalyze primer extension during sequencing, which in turn causes charge redistribution, resulting in measurable current spikes.80 Combining real-time sequencing with in situ synthesis on a chip is expected to reduce the errors as well as reagent consumption. However, there may be a mismatch between synthesis and detection time. Furthermore, gene synthesis routes in the field of synthetic biology may further inspire the array-based synthesis technology. The synthesis process may be improved by monitoring various indicators such as electrical properties, temperature, magnetism or light influenced by the changes in the structure and length of ssDNA through physical or chemical analysis techniques. It is promising that in the future, the collaborative innovation of both academia and industry will give birth to new technological solutions, and drive the DNA data storage technology to become mature gradually and move further towards commercialization.
This journal is © The Royal Society of Chemistry 2024