Eray U. Bozkurt, Emil C. Ørsted, Daniel C. Volke and Pablo I. Nikel*
The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kongens Lyngby, Denmark. E-mail: erayub@biosustain.dtu.dk; emchr@biosustain.dtu.dk; chdavo@biosustain.dtu.dk; pabnik@biosustain.dtu.dk; Tel: +45 93 5119 18
First published on 15th October 2024
Covering: up to August 2024
Enzymes play an essential role in synthesizing value-added chemicals with high specificity and selectivity. Since enzymes utilize substrates derived from renewable resources, biocatalysis offers a pathway to an efficient bioeconomy with reduced environmental footprint. However, enzymes have evolved over millions of years to meet the needs of their host organisms, which often do not align with industrial requirements. As a result, enzymes frequently need to be tailored for specific industrial applications. Combining enzyme engineering with high-throughput screening has emerged as a key approach for developing novel biocatalysts, but several challenges are yet to be addressed. In this review, we explore emergent strategies and methods for isolating, creating, and characterizing enzymes optimized for bioproduction. We discuss fundamental approaches to discovering and generating enzyme variants and identifying those best suited for specific applications. Additionally, we cover techniques for creating libraries using automated systems and highlight innovative high-throughput screening methods that have been successfully employed to develop novel biocatalysts for natural product synthesis.
Enzyme engineering enables the discovery or creation of biocatalysts that can mediate new reactions, accept alternative substrates, or operate under different process conditions. The recent developments in computational and experimental techniques have accelerated the process of enzyme engineering and expanded the repertoire of biocatalytic transformations.22–25 Nowadays, enzyme engineering is a key driver of biocatalysis and the bioeconomy, i.e., the sustainable production of chemicals and materials from renewable biological resources.26–28 Some of the current strategies for the development of high-performance biocatalysts involve screening natural diversity to isolate enzymes with the desired activity, engineering biocatalysts to expand their substrate range, redesigning mechanisms to create novel reactivity, and computationally designing enzymes from scratch. Biobased products and processes require tailored enzymes that fit their specific needs, e.g., activity (rates), stability, sensitivity, and selectivity. The discovery, design, and optimization of enzymes for bioproduction require a systematic and comprehensive approach.29–31 Yet, one of the main challenges in enzyme engineering is the screening of enzyme variants, i.e., the selection of the best-performing enzymes from a large and diverse pool of mutants.32
In this review, we discuss the growing demand for improved enzymes as biocatalysts for a number of bioprocesses and the increasingly important role of high-throughput screening (HTS) methods in meeting these needs. We explore possible sources of enzyme variants, including state-of-the-art genome mining and gene diversification techniques. Additionally, we examine recent examples of machine learning (ML) and de novo enzyme design, highlighting their current limitations. The review also considers the expanding toolbox for HTS library generation and how it can complement in silico design and selection in de novo biocatalyst engineering. We then outline strategies for screening and isolating novel enzymatic activities, focusing on both in vivo and in vitro HTS approaches. By showcasing recent progress and improvements in these methods, this review provides a roadmap for researchers in natural product synthesis, biocatalysis, and metabolic engineering who are interested in integrating HTS strategies into their workflows.
Genome mining bypasses cultivation by directly analyzing microbial genetic data (metagenomes). This approach unveils a hidden treasure trove of enzymes encoded within hitherto unculturable microbes and hence facilitates the discovery of novel biocatalysts with unique properties. Dedicated software packages (e.g., antiSMASH38) help navigate these vast datasets, predicting functionalities based on biosynthetic gene cluster similarities. Additionally, sequence comparison tools (e.g., BLAST) aid in selecting the most promising candidates,39–43 while EnzymeMiner44 complements these approaches by automating the search for soluble enzymes across a wide range of organisms; the latter tool filters candidates based on user-defined criteria (e.g., activity and stability), thus refining the selection process for enzymes that are not only novel but also industrially viable.
While extensive databases like UniProt,45 BRENDA,46 and RetroBioCat47 offer a wealth of information on known enzymes, information on how these enzymes interact with non-natural substrates can be limited. This is a key challenge in enzyme engineering because the engineering campaign typically relies on starting with an enzyme that already possesses some activity. Therefore, to explore the vast sequence space found in metagenomic data and find enzymes with the desired activity, bioinformatics and HTS can be coupled to overcome the starting activity hurdle and aid in biocatalyst development for industrial applications.48,49
AlphaFold2, an AI-driven tool developed by DeepMind50 that builds upon the foundation of genome mining and bioinformatics, has marked a paradigm shift in our understanding of protein structures and their complex interactions. AlphaFold2 has revolutionized the field by accurately predicting the three-dimensional structures of proteins solely from their amino acid sequences. The implications are significant because this tool allows for rapid modeling of proteins with unknown structures, thereby accelerating the discovery and engineering of novel enzymes and biocatalysts.51
The impact of AlphaFold2 extends beyond mere structure prediction and has catalyzed a deeper comprehension of protein dynamics and function. By elucidating the intricate folding patterns that dictate protein activity,52 this tool has opened new avenues for enzyme engineering, making possible the rational design of proteins with enhanced or novel functionalities. This feature is particularly significant for industrial biocatalysis, where the ability to predict and manipulate protein structures can lead to more efficient and sustainable processes.53
The recent introduction of AlphaFold3 has taken these capabilities a step further by incorporating the prediction of protein–ligand interactions into its repertoire.54 This improvement is a leap forward in computational biology because it enables the prediction of how proteins interact with small molecules, which is crucial for understanding enzyme–substrate relationships. The predictive power of AlphaFold3 facilitates the exploration of enzyme–substrate interactions with non-natural substrates, and thus provides a powerful tool for enzyme engineers seeking to tailor biocatalysts for specific industrial applications. While many of the tools discussed are effective at predicting protein structure, significant progress is still needed in predicting enzyme activity. Such predictions require further refinement, and potential candidates generated by these tools must undergo rigorous testing to confirm their activity. In this context, a recent review by Gantz et al.55 examines the interplay between HTS systems and ML in protein engineering.
Focused mutagenesis (also termed site-directed mutagenesis) is one effective strategy for reducing library size. Unlike random mutagenesis, focused mutagenesis diversifies predetermined locations of a gene, selected based on detailed knowledge of the protein's structure and function. Site-saturation mutagenesis is the most common type of focused mutagenesis, where each targeted position is substituted with all 19 alternative amino acids, resulting in focused, highly mutagenized, smaller, and smarter libraries. To shrink the library size even further, codon degeneracy can be exploited. For example, allowing only G or T at the wobble position (NNK degeneracy) halves the number of codons per position from 64 to 32 while still covering all 20 amino acids; compounded over five saturated positions, this decreases the library size 32-fold, making the screening more efficient.57 Saturation mutagenesis can be performed in iterative rounds, with the best variant from each round used as the template for the next.
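The effect of codon degeneracy on library size compounds quickly across saturated positions. A short back-of-the-envelope calculation illustrates why NNK libraries are so much easier to screen than fully randomized NNN libraries (a sketch; the helper function is ours, not from any mutagenesis toolkit):

```python
# Library sizes for site-saturation mutagenesis with different codon schemes.
# NNN: all 4 bases at every codon position -> 4 * 4 * 4 = 64 codons.
# NNK: wobble position restricted to G or T -> 4 * 4 * 2 = 32 codons,
#      still encoding all 20 amino acids.

def library_size(codons_per_position: int, n_positions: int) -> int:
    """Number of DNA variants when n_positions codons are fully randomized."""
    return codons_per_position ** n_positions

for n in (1, 3, 5):
    nnn = library_size(64, n)
    nnk = library_size(32, n)
    print(f"{n} position(s): NNN = {nnn:>13,}  NNK = {nnk:>13,}  "
          f"reduction = {nnn // nnk}x")
```

At a single position the saving is only two-fold, but because the factor multiplies per position, an NNK library over five simultaneously saturated positions is 32-fold smaller than its NNN counterpart.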
A recombination approach can also be used for creating smarter libraries. Homologous recombination is a natural evolutionary mechanism and key to running sequence diversification campaigns by DNA shuffling, which enables the exploration of genetic sequence spaces inaccessible by methods like epPCR and focused mutagenesis. The genes targeted for recombination (often a family of related genes) are first randomly broken into fragments using DNase I, and then fragments of the desired size are selected and purified.57 Recombination takes place when fragments from different parental genes anneal at regions of high sequence similarity. DNA shuffling has repeatedly been shown to create proteins with enhanced properties. However, since reassembly relies on sequence similarity between DNA fragments, parental genes tend to be reconstructed owing to their high sequence identity. To facilitate combinatorial DNA shuffling, different techniques have been proposed. Random chimeragenesis on transient templates (RACHITT) utilizes single-stranded, uracil-containing transient templates for library preparation.59 Donor fragments are first hybridized with the transient templates; heteroduplex strand fragments are stabilized on the transient template across their full length, rather than via small overlaps. Next, uracil-DNA-glycosylase treatment, followed by PCR, forms homoduplex double-stranded DNA, resulting in a shuffled DNA library with an increased number of crossovers.57
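Why shuffled libraries are dominated by parental reconstructions when crossovers are rare can be seen with a toy model: treat each gene as a string of segments and let every reassembled gene draw each segment independently from a random parent (a deliberate idealization that ignores sequence-dependent annealing bias; all names here are ours):

```python
import random

def shuffle_chimera(n_segments: int, n_parents: int, rng: random.Random):
    """Pick each segment from a random parent; return (choices, crossovers).

    A crossover is counted wherever two adjacent segments come from
    different parents. A chimera with zero crossovers is a reconstructed
    parental gene -- the unwanted outcome DNA shuffling tries to minimize.
    """
    choices = [rng.randrange(n_parents) for _ in range(n_segments)]
    crossovers = sum(1 for a, b in zip(choices, choices[1:]) if a != b)
    return choices, crossovers

rng = random.Random(0)
trials = 10_000
parental = sum(1 for _ in range(trials) if shuffle_chimera(4, 2, rng)[1] == 0)
print(f"parental reconstructions: {parental / trials:.1%}")
```

With two parents and four segments, 2 of the 2^4 equally likely segment patterns are purely parental, so roughly 12.5% of such a library carries no crossover at all; techniques like RACHITT aim to push this fraction down by increasing the crossover count.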
The choice of DNA library preparation method largely depends on the accessible screening technique. An essential distinction between DNA library preparation approaches is the pooling status of the library. Unlike methods utilizing liquid handlers and microfluidics, random mutagenesis generates a pooled library of variants. Consequently, the screening process must be compatible with a pooled library setup. Typically, screening of pooled libraries involves a two-stage process: an initial screening that favors the selection and enrichment of certain variants over others, followed by a confirmatory screening to verify the effectiveness of these variants.
In recent years, ML has emerged as a powerful tool in the field of rational design of biocatalysts.60 The application of ML techniques has significantly accelerated the identification and optimization of enzymes for diverse catalytic reactions and applications in biocomputing.61 ML algorithms can analyze vast datasets comprising protein structures, sequences, and reaction mechanisms, while extracting meaningful patterns and correlations that can guide the design process.
ML techniques, e.g., deep learning and support vector machines, have been employed to predict enzyme–substrate interactions, substrate specificity, and catalytic activity. These models leverage the wealth of available biological and chemical data to make accurate predictions,62 streamlining the search for optimal biocatalysts. The integration of ML with experimental approaches has led to more efficient and targeted enzyme engineering, ultimately contributing to the development of tailored biocatalysts for specific applications.
In addition to leveraging ML, de novo enzyme design represents a paradigm shift in the rational engineering of biocatalysts.63–65 Unlike traditional rational enzyme design, where focus is placed on modifying the existing active site or binding pocket of the enzyme to improve its interaction with the substrate or the transition state, de novo design involves the creation of entirely new enzymes from scratch. This innovative approach is grounded in a deep understanding of protein structure, function, and the principles governing enzymatic catalysis.66 The de novo enzyme design process typically begins with the identification of a target reaction and the determination of its key catalytic features. Computational tools, e.g., molecular dynamics simulations and quantum mechanical calculations, are then employed to model potential enzyme structures capable of catalyzing the desired reaction. Iterative cycles of design and optimization, guided by both computational predictions and experimental validation, refine the engineered enzymes for improved efficiency and specificity.
Advances in de novo enzyme design have opened avenues for tailoring biocatalysts to meet specific industrial and environmental needs. The ability to custom-design enzymes for non-natural reactions or challenging substrates has broad implications for biotechnology and offers sustainable solutions for various processes ranging from pharmaceutical synthesis to biofuel production. Moreover, the integration of ML techniques with the principles of de novo enzyme design is transforming the biocatalysis landscape, enabling the development of highly efficient and tailored catalysts with unprecedented precision and speed. Yeh et al.67 demonstrated these advantages by employing artificial intelligence to guide the creation of novel luciferase enzymes. By optimizing idealized protein scaffolds, they achieved high substrate specificity and catalytic efficiency for oxidative chemiluminescence. This work showcases the potential of deep learning in enzyme design, even without natural templates.
An excellent example of the use of Opentrons for DNA assembly automation was described by Storch et al.82 The authors reported the implementation of a homology-based cloning method called Biopart Assembly Standard for Idempotent Cloning (BASIC). This assembly method utilizes 21-bp single-stranded overhangs to combine DNA parts. With this method, 88 independent variants, each expressing three genes, were created and tested. Automation decreased the hands-on time from more than 5 hours to 1.5 hours. The authors also calculated that the operating cost was as low as $1.5 per DNA construct, which removes the cost barrier for many practical applications. Despite its many advantages, this method is still not fully integrated and requires operator intervention: four different scripts need to be initiated at the beginning of each step, and the configuration of the plates needs to be arranged. Although the introduction of a thermocycler module for OT-2 robots by Opentrons will add to the value of this method and shorten the hands-on time, alternative methods provide more flexibility in terms of experimental design. DNA-BOT, despite its low cost and hence greater accessibility, is yet to be fully integrated with cloning pre-steps, e.g., oligonucleotide design, and post-steps, e.g., sequence verification. Therefore, pre- and post-DNA cloning steps remain the bottleneck of the workflow.
Comprehensive integration of the procedure is crucial for maintaining high throughput, since the overall throughput of a workflow is limited by its slowest step. Therefore, transitions between steps and the inclusion of pre- and post-DNA assembly steps should also be considered to achieve more robust workflows. The application developed by Nava et al.,86 termed DNAda, offers a comprehensive platform for DNA assembly (Fig. 1). DeviceEditor,87 a j5 algorithm-based, user-friendly software for DNA assembly, lies at the core of the DNAda platform. The j5 algorithm is a computational tool that creates automated protocols for DNA assembly. Since its introduction in 2012, this algorithm has been one of the key technologies supporting computer-aided design of DNA constructs.88 DNAda is the first example of integrating this algorithm into automated laboratory workflows.
Fig. 1 DNAda, an end-to-end, automated DNA assembly platform. The sequence begins with in silico combinatorial DNA design using DeviceEditor, which generates primers for PCR amplification of each part and construct. The system also provides instructions for the Echo acoustic liquid handler to arrange oligonucleotides and templates into compatible plates for each design. DNAda can analyze PCR results with an integrated gel electrophoresis system and offers guidelines to redo any unsuccessful amplifications. After successful amplification, the software directs the Echo liquid handler to purify DNA using magnetic beads and coordinates the components for DNA assembly. The process continues with a yeast-assisted homologous recombination protocol, followed by plasmid DNA extraction. DNAda then assists in transforming Escherichia coli, selecting colonies, and managing post-procedure steps, including stock preparation and submission for verification of all constructs. Adapted from Nava et al.86
The DNAda platform utilizes j5-designed DNA assemblies and provides streamlined automation instructions for each step of the DNA assembly process. The workflow starts by designing an in silico combinatorial DNA assembly using DeviceEditor, which designs oligonucleotides for PCR amplification of each construct and outputs a purchasing order sheet. Furthermore, the DNAda application provides automation instructions compatible with an Echo acoustic liquid handler (Beckman Coulter). Machine-compatible plates, loaded with the appropriate oligonucleotides and templates, can be incorporated to assemble each construct designed in silico. Additionally, if amplification results are analyzed using a compatible gel electrophoresis device [Zero Agarose Gel (ZAG) DNA electrophoresis; Agilent Technologies], the DNAda application can interpret the results of PCR and provide instructions to repeat the failed amplifications. The high level of automation not only decreases hands-on time but also minimizes user intervention in sample handling and tracking.
After successful amplification, the program proceeds with Echo liquid handler instructions for DNA purification using magnetic beads and mixing of parts for DNA assembly. The user then performs the yeast-assisted homologous recombination protocol and plasmid extraction. After plasmid extraction, the DNAda application assists with transformation into Escherichia coli, colony picking and, most importantly, post-steps (e.g., stock preparation and submission for verification).
The DNA assembly methods in the examples above still have flexibility limitations. Restriction-based methods, e.g., Golden Gate assembly, require all DNA fragments to be free of "unintended" restriction sites, which constrains the versatility of the method. Even though homology-based cloning methods, e.g., SLIC, SLiCE, or Gibson assembly, do not have stringent sequence constraints, they are not entirely without limitations.89 Single-stranded DNA (ssDNA) generated by these methods may form secondary structures, which significantly decreases the assembly efficiency of certain constructs and introduces a degree of bias. To address these issues, in an outstanding example of end-to-end automation at biofoundry scale, Enghiad et al.89,90 developed a novel cloning strategy based on artificial restriction enzymes. The strategy uses PfAgo (Pyrococcus furiosus Argonaute) as a programmable artificial restriction enzyme that creates defined ends, unlike homology-based methods where exonuclease activity generates 20–40 nt single-stranded overhangs; therefore, the PfAgo-based approach handles repeated sequences better and can theoretically construct any sequence. This workflow represents a fully automated, end-to-end platform integrating 23 machines. The workflow starts with entering the sequences to be assembled, after which the system requires no further human intervention. The developed software designs primers and guides for PfAgo cloning. The software is also capable of making smart decisions that mirror daily wet-lab practice. For example, if a part is smaller than 80 bp, the software adds the part as overhangs by PCR, if feasible. Furthermore, for verification of plasmid assembly, the platform performs confirmatory digestion, with restriction sites chosen so that the resulting bands are distinct and observable.
The platform can be divided into three main modules: (i) upon receiving the oligonucleotide parts, the process from diluting oligonucleotide stocks to PCR purification is performed without human intervention with the help of complex automation equipment, e.g., centrifuges, robotic arms, and liquid handlers; (ii) once the required DNA sequences have been amplified, the cloning module spans from mixing the correct sequences to be joined to transformation of the assembled products into E. coli cells using integrated liquid handlers and thermocyclers; and (iii) the last module consists of isolating single colonies, purifying plasmids, verifying cloning by digestion followed by gel electrophoresis, and preparing glycerol stocks for verified plasmids. The workflow has been extensively validated by assembling 101 plasmids of different sizes, ranging from 5 to 18 kb, with up to 12 DNA parts, for six different species. The combination of the three automated modules amounts to 20 000 pipetting steps that would otherwise need to be performed by a group of researchers. In this workflow, the operator only has to make decisions based on the experimental design, repeating steps if necessary.
Shih et al.93 proposed a framework for how microfluidics could accelerate the build part of the Design-Build-Test-Learn (DBTL) cycle. The authors showcased a proof-of-concept example of using hybrid microfluidics for synthetic biology by adopting three common cloning techniques, i.e., Golden Gate, Gibson assembly, and yeast assembly. This engineering effort was validated by automatically constructing 32 separate plasmids. The microfluidic device consists of three regions. The first one, the DNA assembly region, is where droplets carrying inserts and vectors are mixed; in this step, the reagents necessary for the cloning technique are also added. Next, the droplets are incubated for varying times, depending on the cloning technique, with built-in temperature control maintaining the conditions required for DNA assembly. The plasmids are then mixed with cells, and the droplets are delivered to the next region, where an electric potential is applied to the droplets containing assembled DNA and cells, thus allowing plasmid assembly and electroporation of the assembled plasmids on a chip. The system shows potential for automating and scaling up DNA assembly and made possible the design of up to 16 unique constructs per run. While effective for assembling plasmids and performing transformations, the ability to handle multiple DNA assemblies simultaneously was not verified. Users need to manage preparatory and follow-up steps to maintain the throughput offered by the device.
An ideal HTS setup should cover the overall process. When a step is not amenable to automation or needs significant user intervention, it creates a bottleneck hindering the overall throughput of the workflow. A comprehensive example is provided by Linshiz et al.94 The introduced platform facilitates seamless integration of the design, build, and test phases, beginning with automated in silico DNA library design and culminating in the evaluation of these libraries. The "DNA Constructor" software was created for optimizing combinatorial DNA library protocols. For DNA assembly, a new method named isothermal hierarchical DNA construction (IHDC) was developed. The advantages of IHDC are that it is independent of temperature cycling and needs reduced process control, making it particularly well-suited for microfluidic environments. In addition, the adaptability of IHDC for DNA cloning was validated using Gibson assembly, Golden Gate assembly, and yeast DNA assembly methods. Heat-shock transformation was performed utilizing a Peltier external temperature controller. Furthermore, Linshiz et al.94 evaluated basic DNA library assessment methods by coupling their setup to an external camera, such that fluorometric and colorimetric assays can be easily monitored.
Overall, the automation era of biology brings standardization to molecular cloning protocols and minimizes the bias introduced by the user. Employment of automation techniques allows higher throughput per round of experiment and significantly decreases hands-on time. Furthermore, the much smaller volumes required per sample enable efficient cost-minimization. Liquid handlers are a vital asset for the next foundational steps of automated biotechnology due to their greater adaptability with current DNA assembly protocols, relative ease of integration with other equipment, and higher throughput compared to microfluidics systems to date. However, there is a limit to miniaturization of liquid handlers due to machine capabilities and evaporation in open-air systems. Microfluidics systems as a closed platform have advantages in terms of down-scaling and therefore cost effectiveness. Both systems have unique advantages and drawbacks, and the future of these technologies may lie at the intersection of microfluidics and robotic systems, with liquid handlers integrated into microfluidics setups to allow manipulation of liquid in microfluidics channels.
MAGE is an efficient and rapid tool that allows simultaneous gene manipulation at multiple loci. Initially described for E. coli,105 the technique was quickly adapted for use in a wide range of prokaryotic and eukaryotic organisms. MAGE utilizes the λ-Red phage proteins to facilitate recombination between short single-stranded DNA and the genomic locus of interest. The workflow starts by growing the cells at 30 °C until the culture reaches mid-logarithmic phase. A temperature-inducible expression system regulates the expression of λ-Red proteins: when the cells are transferred to 42 °C, the expression of λ-Red genes is initiated, and the mismatch repair (MMR) system of the host is inactivated.106 Upon introducing single-stranded, mutation-carrying DNA oligonucleotides via electroporation, the oligonucleotides integrate into the lagging strand of the replication fork during DNA replication, yielding mutated genomes.47 In this way, MAGE allows specific gene manipulations while maintaining a stable genome by using only DNA oligonucleotides, creating genetic diversity that can be tested in a search for better production strains.
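The need for iterative MAGE cycles can be illustrated with a simple probabilistic model. Assuming each cycle converts each targeted locus independently with a constant per-cycle efficiency (a strong simplification; the 10% efficiency and six loci below are purely illustrative values, not taken from the original studies):

```python
def fraction_fully_edited(p: float, n_cycles: int, n_loci: int) -> float:
    """Expected fraction of cells carrying edits at all targeted loci,
    assuming each MAGE cycle converts each locus independently with
    a constant per-cycle efficiency p (a simplifying assumption).
    """
    per_locus = 1.0 - (1.0 - p) ** n_cycles  # P(locus edited after n cycles)
    return per_locus ** n_loci               # all loci edited simultaneously

# Illustrative numbers: 10% per-cycle efficiency, six targeted loci.
for cycles in (5, 15, 30):
    print(f"{cycles:>2} cycles: {fraction_fully_edited(0.10, cycles, 6):.3%}")
```

Because the fully edited fraction rises steeply with cycle number but falls sharply with the number of loci, MAGE campaigns typically interleave cycling with screening or selection to enrich multi-locus variants.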
Since the MAGE system was first described in 2009, it has proven useful for many purposes, including (but not limited to) changing the native genetic code of microbial hosts,107,108 incorporating non-standard amino acids,109 inserting histidine tags,110 and optimizing the production of high-value chemicals, e.g., lycopene,111 curcumin,112 β-carotene,113 and L-DOPA, with improvements in product titer of up to 38.2-fold.112 For instance, while improving curcumin production, Kang et al.112 constructed an E. coli strain library in which the 5′-untranslated region sequences of six genes involved in the curcumin pathway were randomized. The strain library was first screened by visual inspection on agar plates for color formation, and LC-MS measurements were then carried out for selected variants. Strikingly, the highest-performing strains expressed significantly less of two of the six enzymes, which led to improved production of curcumin. In another study, L-DOPA production from glucose was improved by tuning the ribosome binding sites of 15 genes utilizing MAGE. In yet another example, β-carotene production in Saccharomyces cerevisiae was improved by creating a strain library in which the promoters, open reading frames, and terminators of four genes in the β-carotene pathway were targeted. The library displayed different levels of β-carotene and diverse colorimetric phenotypes, proving the effectiveness of MAGE.114
Fig. 2 High-throughput technologies for selecting superior biocatalysts. (A) High-throughput screening (HTS) workflows implemented to facilitate the generation and screening of enzyme variants. (B) Techniques applied for read-out detection in HTS of enzymatic reactions. TF, transcription factor. (C) Comparison of different detection methods that enable versatile HTS workflows. Chromatographic separation, coupled with various detection methods (e.g., LC-MS, HPLC, GC-MS), has a relatively low throughput, but products can be measured and quantified directly. Fluorescent and colorimetric assays in a microtiter plate format can be scaled down to the nL level; automation is possible when using an acoustic liquid handler. Microfluidic droplet generation is compatible with various manipulation and sorting strategies for different readouts, as shown in panels (D)–(F). FACS enables extremely high-throughput screening with tiny reaction volumes. (D) Fluorescence-activated droplet sorting (FADS)-based screening platform by Holstein et al.119 A cell-free expression (CFE) system was employed to screen a large (10^14) protease library. To this end, the DNA encoding the variants was encapsulated together with an isothermal DNA polymerase and PCR components to amplify the template in a completely in vitro setup. This approach increases the gene copies in each droplet to 30 […]
Once in vivo mutagenesis has been applied to enhance library complexity, a pool of mutants is conventionally screened by plating on selective agar plates and selecting colonies based on size, fluorescence, or other phenotypic characteristics. One of the most commonly adopted strategies for enzyme selection is the use of biosensors.133 Biosensors play a crucial role in these systems, serving as molecular sentinels that detect and quantify the presence of specific genetic or phenotypic changes. The use of cell surface display technologies, which present the biosensor on the exterior of the cell, allows for direct interaction with target molecules in the extracellular space. The integration of in vivo mutagenesis with HTS technologies, e.g., FACS and pico-droplet encapsulation, permits rapid analysis of large numbers of mutants at a scale and speed that were once inconceivable.134–136 These automated high-throughput methods also significantly reduce the labor and costs associated with traditional mutagenesis techniques, e.g., manual screening of agar plates.137
Fig. 3 Growth-coupled selection for metabolic modules. (A) To apply growth-coupled selection, a wild-type microbial strain is rewired through targeted gene deletions that disrupt the native metabolic network, rendering the strain auxotrophic for specific metabolites. These selection strains are then used to test synthetic modules, e.g., enzymatic reactions of interest and their variants, which provide the essential metabolite(s). Successful complementation restores the growth of the synthetic auxotroph. (B) Adaptive laboratory evolution (ALE) can be implemented to improve growth rates and biomass yields. These parameters are a proxy of the synthetic module activity, as they are coupled to each other under selective growth conditions. (C) ALE cycles can be iterated as necessary, and the evolved clones are then subjected to multi-omic characterization and genotyping. This sequential process allows for the identification and reverse-engineering of beneficial mutations in a naïve (non-evolved) microbial host. Adapted from Cros et al.140
Fig. 4 High-throughput selection of enzyme variants encapsulated in alginate droplets with nanoliter reactors. (A) A DNA library encoding PpAAR racemase variants is generated using error-prone PCR amplification, and fluorescent E. coli cells expressing a superfolder GFP gene (sfGFP) are transformed with the library and encapsulated in nanoliter reactors (NLRs). These NLRs are formed by laminar jet break-up in a calcium chloride (CaCl2) solution. The encapsulated cells proliferate in a stringent selective medium, where only those with a functional enzyme can grow. Large-particle flow cytometry is used to rapidly enrich NLRs containing large colonies, effectively separating active and inactive enzyme variants. Clones with active enzyme variants are transferred to rich LB medium for growth before being re-encapsulated for another round of enrichment. The top candidates are sorted into microtiter plates, where the expression of the PpAAR racemase variants is analyzed to determine specific activity. (B) Deletion of argA creates an arginine auxotrophy that can be rescued by L-ornithine. L-Ornithine can be obtained from externally supplied D-ornithine through PpAAR activity. The gene encoding PpAAR is expressed from a plasmid that includes tetracysteine (TC) and hexahistidine (H) tags, subsequently used for fluorescence-based quantification of PpAAR racemase levels in cell-free extracts and for protein purification, respectively. The PpAAR variants are transported to the periplasm, where L-lysine and D-ornithine compete for the active site; the selection stringency can be adjusted by titration with the antimetabolite L-lysine. Adapted from Femmer et al.141
Consistent selective pressure can drive the evolution of auxotrophic selection strains and improve the flux capacity of targeted enzymes. By integrating strategic genetic engineering, growth-coupled selection, and adaptive laboratory evolution (ALE, Fig. 3B), enzyme variants or metabolic pathways can be efficiently identified and refined. Upon genotypic and phenotypic characterization (e.g., aided by multi-omic analyses; Fig. 3C), these methodologies enable harnessing optimized biological systems for high-throughput, growth-coupled selection in natural product biosynthesis.115,142
A recent study by Femmer et al.141 demonstrated the miniaturization of a growth-based enzyme evolution process in which growth behavior correlates with the desired improvements in enzyme performance. Here, the authors engineered an arginine-auxotrophic E. coli strain for directed evolution of the periplasmic broad-substrate racemase PpAAR, which catalyzes the racemization of D-ornithine (Fig. 4B). The product of PpAAR, L-ornithine, is a precursor of arginine synthesis and can therefore complement the auxotrophy. The stringency of the selection can be fine-tuned by including an antimetabolite, L-lysine, which competes with D-ornithine for the PpAAR active site. A HTS platform that leverages alginate-based nanoliter reactors (NLRs) was implemented to cultivate library clones. By creating conditions where only the bacteria with the most efficient racemase survive, the researchers could identify mutations that enhance enzymatic activity and sort the NLRs through fluorescence-assisted particle sorting.
This method is particularly useful because it enables selection of beneficial traits without the need to screen each individual variant directly, which would be impractical given the large number of possibilities. Instead, the growth of the microorganism itself indicates a successful mutation. This connection facilitates rapid screening and identification of enzyme variants with enhanced performance. The results demonstrated the platform's potential as a simple and effective strategy for in vivo enzyme evolution by quantification of residual substrate143 or enzyme activity, either surface-displayed144,145 or intracellular.146 The utility of this method is further highlighted by the fact that it can be readily adapted to other screening campaigns by using different antimetabolites.
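The enrichment logic behind growth-coupled selection can be illustrated with a toy simulation (all activities, counts, and the growth model below are hypothetical, not taken from the cited studies): variants whose activity exceeds the antimetabolite-imposed stringency threshold outgrow the rest within a few selection rounds.

```python
def grow(population, stringency, generations=10):
    """One round of growth-coupled selection (hypothetical model).

    population: dict mapping variant activity (arbitrary units) to cell count.
    stringency: antimetabolite-imposed activity threshold; net growth rate
    scales with the activity in excess of this threshold.
    """
    new = {}
    for activity, count in population.items():
        rate = max(activity - stringency, 0.0)  # net growth rate per generation
        new[activity] = count * (1.0 + rate) ** generations
    return new

# A mostly inactive library spiked with a rare active variant (hypothetical).
library = {0.05: 9_990, 0.50: 10}

for _ in range(3):  # three consecutive rounds of selective growth
    library = grow(library, stringency=0.2)

total = sum(library.values())
active_fraction = library[0.50] / total
print(f"active variant fraction after 3 rounds: {active_fraction:.3f}")
```

Even though the active variant starts at 0.1% of the population, it dominates after a few rounds, which is the property that makes growth a usable proxy for enzyme activity.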
Selecting improved variants can also be automated using a biosensor. Biosensors are particularly valuable in HTS because they streamline the traditionally labor-intensive and time-consuming process of enzyme selection. Additionally, biosensors facilitate rapid and sensitive detection of specific biological molecules, making them ideal for identifying and selecting enzymes with desired characteristics.147 In HTS, a large number of variants of enzymes or other proteins can be tested simultaneously for their ability to catalyze a reaction, bind to a substrate, or inhibit a process.148 By integrating biosensors into microfluidic systems or other automated platforms, thousands of enzyme reactions can be monitored at once, greatly accelerating the process of identifying the most effective enzymes for industrial processes, therapeutic applications, or research purposes.
Li et al.149 described the utilization of a biosensor to enhance erythritol production in the oleaginous yeast Yarrowia lipolytica through a picodroplet-based co-culture system. The methodology revolves around the use of fluorescence-activated droplet sorting (FADS) combined with a transcription factor (TF)-based biosensor. This system facilitates high-throughput isolation of yeast mutants that overproduce erythritol, an important sugar alcohol in the food industry. In the absence of erythritol, the biosensor circuit expresses a repressor protein, inhibiting the expression of the reporter gene. In the presence of erythritol, the repressor protein undergoes a conformational change and can no longer bind to the promoter region, allowing expression of the GFP reporter gene. The three-step droplet operation process is illustrated in Fig. 5. Initially, droplets encapsulating yeast mutants were generated and incubated to support erythritol production (Fig. 5A). Next, a fluorescent erythritol-biosensing E. coli strain was pico-injected into the droplets (Fig. 5B). Finally, the droplets with the greatest fluorescence, which correlates with the highest erythritol production, were sorted via FADS (Fig. 5C). To optimize this process, the temperature and pH within the droplets were controlled, thus separating the erythritol production and detection phases and suppressing background biosensor expression using pH-sensitive erythromycin.
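The repressor-release logic of such transcription-factor biosensors can be sketched as a Hill-type dose-response curve (all parameter values below are illustrative assumptions, not measurements from the study):

```python
def biosensor_output(ligand, k_d=5.0, hill=2.0, leak=0.05, v_max=1.0):
    """Normalized GFP output of a repressor-based biosensor (hypothetical).

    ligand: erythritol concentration (arbitrary units).
    k_d:    concentration giving half-maximal derepression.
    hill:   cooperativity of ligand binding.
    leak:   background expression with the repressor bound.
    """
    occupancy = ligand**hill / (k_d**hill + ligand**hill)
    return leak + (v_max - leak) * occupancy

for c in (0.0, 1.0, 5.0, 25.0):
    print(f"erythritol = {c:>5.1f}  ->  GFP = {biosensor_output(c):.2f}")
```

The `leak` term models the background expression that the authors had to suppress: without it, low-producing droplets would still fluoresce and contaminate the sorted population.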
Fig. 5 Fluorescence-activated droplet sorting (FADS) platform for high-throughput screening of Yarrowia lipolytica mutants with improved erythritol production. (A) In a first step, single Y. lipolytica mutants are encapsulated and collected in a breathable Teflon tube, where they are incubated at 30 °C for 36 hours. During this incubation, the yeast cells proliferate and release erythritol into the droplets. (B) The second step involves injecting an E. coli biosensor strain into the droplets. The droplets are then incubated at 37 °C for 12 hours, allowing the E. coli biosensor to proliferate and detect erythritol produced by Y. lipolytica within the droplets. (C) Droplets with high fluorescence intensity (linked to enhanced GFP, eGFP, and thus indicating elevated erythritol concentrations) are sorted using FADS (see also Fig. 2D). This process allows for the identification and isolation of high-yield mutant strains. Adapted from Li et al.149
After four rounds of iterative mutagenesis and screening, a high-performing strain, Yarrowia lipolytica S4-9, was identified and isolated. This strain showed a significant increase in erythritol yield (17%) and production rate (26%) in 5 L bioreactor cultures compared to the parent strain. The success of the study illustrates the potential of the FADS-TF co-culture system for HTS of extracellular products, paving the way for its application in screening other valuable metabolites as more biosensors become available. The methodology and results collectively highlight the efficiency and effectiveness of the microfluidic-biosensor co-culture system in enhancing industrial bioproduction and provide valuable information on how background expression of biosensors can be eliminated.
The platform described in the study advances the level of automation for HTS in bioproduction. The use of microfluidic droplets, combined with biosensors and FADS, enables the implementation of a largely automated process that can easily screen 10⁶ samples per hour, identifying and isolating high-performing strains with minimal human intervention. However, while the system automates screening and selection of yeast mutants, human oversight is still necessary at certain stages. For instance, setting up the initial parameters, preparing the mutant libraries, and analyzing the final strains for erythritol production require human expertise. Additionally, the interpretation of results and decisions for subsequent rounds of mutagenesis and screening also relies on human judgment.
Cell surface display is an advanced biotechnological strategy that enables the presentation of heterologous proteins on the exterior of a microbial cell. This is accomplished by genetic fusion of the protein of interest, referred to as the passenger protein, with a carrier protein that possesses an anchoring domain. This domain facilitates the localization of the fusion protein to the cell membrane. Cell surface display can be a powerful tool in HTS for enzyme selection due to its ability to link phenotype with genotype for extracellular processes and products. In this context, the phenotype is the observable functional trait of the enzyme, e.g., its ability to catalyze a reaction, while the genotype is the genetic makeup that encodes the enzyme.
In HTS, a vast library of enzyme variants can be displayed on the surface of a cell, typically a yeast or bacterium. The screening process involves exposing such enzyme libraries to a substrate or a reaction condition. The enzymes that can catalyze the desired reaction will bind to or convert the substrate, which can be detected by various methods. For instance, if the reaction produces a fluorescent product, the cells with the most active enzymes will glow when excited by light of a specific wavelength. These 'successful' cells can then be isolated using techniques, e.g., FACS. Once isolated, the coding genes of these cells can be sequenced to identify the mutations that led to the improved enzyme function. This method allows researchers to screen millions of enzyme variants simultaneously, making it an ultrahigh-throughput approach. The approach is particularly useful for identifying enzymes with improved properties, e.g., increased activity (rates), stability, or substrate specificity, which are important for industrial applications, e.g., biofuel production, waste treatment, and synthesis of complex chemicals.
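A minimal sketch of the gating step underlying this kind of screen (the distributions, library composition, and gate position are all hypothetical): active and inactive clones give overlapping log-normal fluorescence distributions, and a gate set above the bulk of the inactive population enriches the active fraction in a single sorting pass.

```python
import random

random.seed(42)

# Hypothetical fluorescence distributions (log-normal, arbitrary units):
# 1% active clones with a brighter signal than the inactive background.
inactive = [random.lognormvariate(0.0, 0.5) for _ in range(99_000)]
active = [random.lognormvariate(1.5, 0.5) for _ in range(1_000)]

gate = 3.0  # fluorescence threshold set above most of the inactive peak

sorted_active = sum(f > gate for f in active)
sorted_inactive = sum(f > gate for f in inactive)

before = len(active) / (len(active) + len(inactive))
after = sorted_active / (sorted_active + sorted_inactive)
print(f"active fraction: {before:.1%} -> {after:.1%} "
      f"({after / before:.0f}-fold enrichment)")
```

Because the distributions overlap, a single gate cannot be perfectly clean, which is why display-based campaigns typically apply several iterative rounds of sorting.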
Cribari et al.150 have developed a high-throughput yeast cell surface display platform that can rapidly evaluate over 10 million enzyme mutants, as shown in Fig. 6. Their novel methodology was used to enhance the activity of enzymes that degrade synthetic polymers, which is crucial for eco-friendly plastic recycling. Previously, library sizes for screening such enzymes were limited to around 10000 mutants due to the constraints imposed by polymer degradation assays. In the innovative platform, each yeast cell displays a unique mutant enzyme, and the activity of these enzymes is measured by the change in fluorescence when a synthetic probe, designed to resemble a polymer of interest, is cleaved. The most active mutants, those showing increased fluorescence, are then isolated using FACS and subsequently sequenced. To demonstrate the efficacy of this platform, the researchers conducted directed evolution experiments on a polyethylene terephthalate (PET)-depolymerizing enzyme, known as leaf and branch compost cutinase (LCC).151 They discovered mutations that significantly enhanced the enzyme's ability to degrade solid PET films. One mutation, H218Y, was highlighted for its role in improving the enzyme's binding affinity to PET, which was supported by biochemical assays and molecular dynamics simulations. Overall, the study achieved a remarkable increase in the throughput for screening polymer-degrading enzymes by three orders of magnitude.
Fig. 6 Yeast surface display platform for evolving PET-depolymerizing enzymes. A library of >10⁷ yeast clones is prepared in a single test tube, with each yeast cell displaying copies of a polyethylene terephthalate (PET)-depolymerizing enzyme variant on its surface. These cells are coated with a probe molecule resembling PET. For this tagging procedure, yeast cells are first non-specifically coated with an azide-linker construct using NHS labeling. Then, copper click chemistry is used to attach a probe to the linker, which contains an aromatic ester and a terminal biotin. High-activity enzyme mutants cleave the probe more efficiently than low-activity mutants. To detect biotin, yeast cells are stained with a fluorescent streptavidin phycoerythrin conjugate (sAv-PE). Cells displaying low sAv-PE signals are isolated using FACS, based on high levels of probe cleavage and, consequently, enhanced enzyme activity. The sAv-PE signal is normalized to the expression level of the PET-degrading enzyme by immunostaining a myc epitope tag appended to the enzyme. Adapted from Cribari et al.150
During directed in vivo evolution, mutagenesis takes place concurrently with the selection of the desired phenotypic trait. In vivo hypermutators can be used to increase the mutation rate, which naturally lies between 10⁻¹⁰ and 10⁻⁹ per base pair per generation, to as high as 10⁻⁴ within the target gene(s).152–155 A hypermutator is a synthetic tool used to enhance the mutation rate within, or in proximity to, a gene of interest.156 Different techniques have been developed for engineering hypermutator strains, as recently reviewed by Molina et al.153
The key advantage of hypermutator strains is that they can quickly generate mutational depth in the form of diversified enzyme variants that are highly fit to perform a desired activity, without mutating or activating off-target enzymes that might circumvent selection in the chosen strain. Integrating hypermutators with the high-throughput capabilities of in vivo experimentation opens the door to a multitude of evolutionary pathways, enabling researchers to approach the global maximum within the fitness landscape. Additionally, oscillating the selection pressures has emerged as an alternative technique for traversing different fitness landscapes, thus increasing the chances of reaching a global maximum for the phenotype of interest.157
In a recent study, Chen et al.158 developed an innovative ultraHTS-assisted in vivo evolution system for enzyme engineering using a hypermutator allele (Fig. 7A). The continuous in vivo evolution utilizes a thermo-inducible error-prone DNA polymerase and a temperature-sensitive mutS mutant. This setup facilitated the increased generation and fixation of mutations in an α-amylase gene cloned in the E. coli hypermutator (Fig. 7B). This system had a 600-fold higher mutation rate than the wild-type strain.
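The gap between basal and hypermutator regimes can be put into numbers with a back-of-the-envelope calculation (the 600-fold factor is from the study; the gene length and generation count are assumed for illustration):

```python
import math

basal_rate = 1e-9   # mutations per bp per generation (upper end of the basal range)
fold = 600          # mutation-rate increase reported for the hypermutator
gene_bp = 1_500     # assumed length of the target gene
gens = 100          # assumed number of generations under mutagenesis

basal = basal_rate * gene_bp * gens   # expected mutations per gene, basal strain
hyper = basal * fold                  # expected mutations per gene, hypermutator
p_mut = 1 - math.exp(-hyper)          # Poisson: fraction of lineages with >=1 hit

print(f"expected mutations per gene: basal {basal:.1e}, hypermutator {hyper:.2f}")
print(f"lineages with at least one mutation in the gene: {p_mut:.1%}")
```

Under these assumptions, a basal-rate strain leaves the target gene essentially untouched over the campaign, whereas the hypermutator mutates it in a useful fraction of lineages, which is what makes continuous in vivo evolution tractable.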
Fig. 7 A hypermutator platform for temperature-controlled continuous in vivo evolution. (A) The system is based on a plasmid-borne error-prone DNA polymerase I (Pol I*) gene under control of the temperature-sensitive λPR/PL-cI857* expression system. Pol I initiates the replication of ColE1 plasmids and thereby the mutation rate is increased downstream of the ColE1 origin of replication (ori) at temperatures above 37 °C. To further enhance the mutation rate, the mismatch repair (MMR) system was rendered temperature-sensitive by incorporating the A134V mutation into mutS, which leads to almost wild-type levels of spontaneous mutants at 37 °C, but increases the mutation rate above 43 °C. During replication of the target plasmid at 43 °C, the synergistic action of Pol I* and MutSA134V mediates a higher mutation frequency compared to lower temperatures. (B) In vivo evolution of α-amylase (bla) for enhanced starch hydrolysis. Bla was fused to the carboxyl terminus of the outer membrane chimeric protein Lpp–OmpA, which directs the chimera to the cell surface. An evolved library of Bla variants was generated by passing the hypermutator strain from mutagenesis conditions (43 °C, rich medium) to selective conditions (37 °C, minimal medium supplemented with starch as sole carbon source). Single clones of the library were co-embedded with starch in droplets and sorted on the basis of high fluorescence. (C) Using the hypermutator platform for improved resveratrol production using a resveratrol-responsive biosensor. The TtgR repressor blocks mCherry transcription in the absence of resveratrol. Upon binding resveratrol, TtgR releases the operator site and mCherry is synthesized. A ColE1 plasmid harboring the genes for resveratrol biosynthesis, i.e., stilbene synthase (STS) and p-coumarate:CoA ligase (4CL), was introduced into the hypermutator strain. A library of variants was created using iterative mutagenesis passages with consecutive FACS sorting for highly fluorescent cells. Adapted from Chen et al.158
Finally, cell-free expression (CFE), also known as in vitro protein synthesis, has been increasingly adopted over recent years for high-throughput production and characterization of proteins.159 CFE refers to protein production without the use of living cells, circumventing the conventional protein production workflow. While traditional protocols require growing cells and expressing and purifying proteins (typically consuming several days), CFE can produce proteins within a few hours. Due to the small reaction sizes and reagent volumes used, the costs are also significantly decreased. By the same token, CFE increases parallel-processing capabilities and can be easily integrated with the various detection technologies mentioned in this review. This flexibility opens the possibility of seamless integration of steps and streamlines the pipeline.
In colorimetric assays, the enzyme of interest generally produces a colored compound that can be detected through absorption. Spatial separation of library members and genotype-phenotype linkage is mostly achieved using microtiter plates. Employing liquid handlers and other robotic platforms can increase the throughput to 10⁴ variants per experiment.161 Fluorometric detection is arguably the most widespread method for evaluating the activity of enzymes. Enzyme activity needs to be linked with the production of a fluorescent molecule to track the performance of the enzyme. There are several well-established fluorescent coupling strategies in the literature that constitute the basis for any HTS setup.162 A basic and commonly used method to establish this linkage is the utilization of fluorogenic substrates.163 These have been utilized in several publications to evaluate the activity of enzymes, e.g., oxidoreductases and lyases, or in the identification of novel enzymes.119,163,164 If no direct colorimetric or fluorometric assay is available, the product of the enzyme of interest can be processed further with enzymes that use the non-fluorescent metabolite of interest and convert it to a fluorescent product. As an alternative, biosensors can also be employed to speed up screening processes. The non-fluorescent metabolite formed as a result of the enzyme activity can be detected with high specificity and sensitivity through a transcription-based, RNA-based, or DNA-based biosensor. So far, applications with aptamers, transcription factors, and riboswitches have been successfully demonstrated. As a response to metabolite binding, the biosensor generates a fluorescent output that corresponds to the enzyme activity.162 Although indirect measurements are not very sensitive and usually require confirmation after initial screening, they still provide valuable information and allow faster screening compared to direct selection methods.
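The fluorogenic-substrate linkage can be sketched with simple Michaelis–Menten kinetics (all kinetic constants, concentrations, and variant names below are hypothetical): because fluorescence is proportional to the product formed, the end-point signal ranks variants by catalytic performance.

```python
def fluorescence_endpoint(kcat, km, s0=100.0, enzyme=0.01, t_end=600.0, dt=1.0):
    """Integrate product formation for a fluorogenic substrate.

    Simple forward-Euler integration of Michaelis-Menten kinetics; the
    fluorescent-product concentration at t_end is returned, and fluorescence
    is assumed proportional to product. All units are arbitrary.
    """
    s, p, t = s0, 0.0, 0.0
    while t < t_end:
        v = kcat * enzyme * s / (km + s)   # Michaelis-Menten rate
        dp = min(v * dt, s)                # never consume more substrate than remains
        s -= dp
        p += dp
        t += dt
    return p

# Three hypothetical variants: (kcat, Km)
variants = {"wild-type": (10.0, 50.0), "improved": (25.0, 40.0), "dead": (0.1, 50.0)}
ranking = sorted(variants, key=lambda v: fluorescence_endpoint(*variants[v]),
                 reverse=True)
print("ranked by end-point fluorescence:", ranking)
```

In a real plate-based screen, the same ranking is obtained directly from the per-well fluorescence read-out at the chosen end point, without knowing the underlying kinetic constants.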
In the absence of colorimetric and fluorometric assays, however, the samples need to be processed label-free using analytical methods, e.g., GC-MS, LC-MS, and HPLC, which hinders the overall throughput.160
Microtiter plate-based setups are still used in most label-free detection approaches. A recent example of high-throughput analyte detection was reported by Albornoz et al.169 The authors focused on improving surfactin yield in Bacillus and achieved a 160% improvement by refining the media formulation. Surfactin is a promising biosurfactant due to its thermostability and has applications in a wide range of industries. The pipeline includes extensive microtiter plate experiments in which Bacillus cells were grown with combinations of media formulations. For yield measurements, they developed a fast method to perform metabolomics analysis of the samples using flow-injection mass spectrometry. The method can analyze a sample in 3 minutes, which allows completion of a 96-well plate analysis in 5 hours. To assess the yield and design the next round of experiments, they used iterative cycles of an active-learning algorithm that directed media formulations toward maximum production.
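The reported throughput is easy to sanity-check from the numbers in the text (3 minutes per sample, one 96-well plate); the extrapolation to continuous operation is our own illustration:

```python
minutes_per_sample = 3
wells_per_plate = 96

# One full plate at 3 min per injection.
plate_hours = minutes_per_sample * wells_per_plate / 60
print(f"one 96-well plate: {plate_hours:.1f} h")   # ~5 h, as reported

# Hypothetical round-the-clock operation of the same instrument.
samples_per_day = 24 * 60 // minutes_per_sample
print(f"continuous operation: {samples_per_day} samples per day")
```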
Reliable liquid handling systems are in high demand due to the need to work with reduced volumes and parallel sample processing for HTS. Downsizing samples and precise liquid manipulation were made possible by integrating liquid handling devices with microtiter plate operations. In a recent example, Ahsan et al.170 demonstrated that a combination of genome mining, CFE, and liquid handlers could efficiently screen polyester polyurethane-degrading enzymes. The authors used bioinformatic tools to identify biofilm-forming organisms growing on the metal surfaces of aircraft and trucks. Next, they used a chromogenic esterase probe, 4-nitrophenyl hexanoate (4-NPH), to detect the activity of putative enzymes at 405 nm. The putative enzymes were produced in CFE reactions, and liquid handlers automated the mixing of the CFE reactions, production of enzymes, and addition of 4-NPH. The absorbance measured at 405 nm revealed the top-performing enzymes with the highest esterase activity.
Due to the versatility of liquid handling systems, they can be employed to process samples with any read-out type, whether colorimetric, fluorometric, or analytical. In another example, Beeman et al.171 utilized liquid handling systems to prepare a pipeline for high-throughput, integrated MALDI-MS analysis of a c-MET kinase assay in a 1536-well setup. Although label-free detection methods are known to be time-consuming, advances in hardware systems have made it possible to create a high-throughput label-free detection method. A RapifleX MALDI Pharma Pulse system was employed to attain a throughput of up to 1 million variants analyzed per week. The assay principle is based upon phosphorylation of the peptide substrate, SRCtide, which leads to an 80 Da shift in the m/z spectrum. Upon phosphorylation, both SRCtide and phosphorylated SRCtide can be quantified using the MALDI-MS system to provide insights into the behavior and activity of the enzymes tested. When working with such small volumes, the speed of the liquid handling system and possible evaporation become important considerations. Acoustic liquid handling systems are overall faster than pipettor-based systems, and the authors reported that, despite evaporation at such small scales, the use of an acoustic liquid handler did not compromise the reliability of their workflow.
CFE systems promise an opportunity to further accelerate the screening process by eliminating plasmid assembly, transformation, and cell-culturing steps, hence minimizing reagent consumption.190,191 In a clever design, Holstein et al.119 used a CFE system to screen a protease library of 10¹⁴ variants in droplets (Fig. 2D), which is not possible in vivo due to transformation efficiency limitations. They encapsulated their DNA variants with an isothermal DNA polymerase and PCR reagents to achieve DNA variant enrichment in their entirely in vitro setup. This step enriched the DNA variants in the droplets up to 30000 copies, which ensures sufficient protein yields in CFE. The CFE reagents were then pico-injected into the droplets, which were incubated for 4 hours for protein production to take place. After production, the fluorogenic casein substrate was pico-injected and the droplets were again incubated for hydrolytic cleavage, which emits a fluorescent signal. The droplets were then sorted using a fluorescence-assisted droplet sorting device, the best performers were selected, and a few variants were subjected to confirmatory assays in microtiter plates, identifying a variant with 5-fold better performance.
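The in-droplet amplification step can be sanity-checked with a quick calculation: reaching ~30000 copies from a single template corresponds to about 15 doublings, assuming ideal exponential amplification.

```python
import math

start_copies = 1        # a single template per droplet
target_copies = 30_000  # copy number reported to support CFE yields

# Doublings needed under ideal exponential amplification.
doublings = math.ceil(math.log2(target_copies / start_copies))
print(f"doublings needed: {doublings}")
print(f"copies after {doublings} doublings: {2**doublings}")  # 2**15 = 32768
```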
Droplet microfluidics systems are more versatile than FACS systems because they can be integrated with other devices to allow different readout types, and manipulations can be performed on the content of the droplets. So far, FADS has been integrated and modified to work with colorimetric assays (absorbance-activated droplet sorting; AADS) and label-free detection (mass-activated droplet sorting; MADS). A sophisticated example of absorbance-based detection was reported by Gielen et al.120 As outlined in Fig. 2E, the authors improved the activity of phenylalanine dehydrogenase (PheDH), which catalyzes the NAD+-dependent deamination of amino acids. PheDH converts L-phenylalanine (L-Phe) to phenylpyruvate, which is coupled to the oxidation of WST-1 to the dye WST-1 formazan. For the screening setup, two optical fibers were placed across a microfluidic channel to measure any decrease in transmittance when a droplet passes through, thereby creating the novel absorbance-activated droplet sorting (AADS) modality. After validating the screening setup, the E. coli library harboring PheDH variants was encapsulated, and the cells were lysed inside the droplets. Enzyme variants were exposed to the substrate inside the droplets and, depending on the activity of the variant, catalyzed the formation of the formazan dye. After a period of incubation, droplets with varying levels of formazan dye passed through the microfluidic channel, where the absorbance of each droplet was measured. The authors successfully demonstrated enrichment of active PheDH variants of up to 2800-fold, at a speed of up to 300 droplets per second. Medcalf et al.192 then utilized the same target and the same setup with modifications to increase the sorting speed, achieving 1000 droplets per second. Although this sorting is slower than similar setups with fluorescence read-out, it is still a major advancement over the throughput of the widely used microtiter plates for colorimetric assays.
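The AADS detection principle can be sketched with the Beer–Lambert law (the molar absorptivity, optical path length, and sorting threshold below are assumed for illustration, not taken from the study): the transmitted intensity across a droplet yields its absorbance, and droplets above the threshold are directed to the 'keep' channel.

```python
import math

EPSILON = 37_000.0   # assumed molar absorptivity of the formazan dye (M^-1 cm^-1)
PATH_CM = 0.01       # assumed optical path across the microfluidic channel (cm)

def absorbance(formazan_molar):
    """Beer-Lambert absorbance A = epsilon * c * l for a droplet."""
    return EPSILON * formazan_molar * PATH_CM

def sort_droplet(i_transmitted, i_reference, threshold=0.1):
    """Sorting decision from measured transmittance: A = -log10(I/I0)."""
    a = -math.log10(i_transmitted / i_reference)
    return "keep" if a >= threshold else "waste"

# Hypothetical droplets with increasing dye concentration (molar).
for c in (1e-5, 1e-4, 1e-3):
    a = absorbance(c)
    i = 10 ** (-a)  # transmitted fraction predicted by Beer-Lambert
    print(f"c = {c:.0e} M  A = {a:.3f}  ->  {sort_droplet(i, 1.0)}")
```

The short optical path across a microfluidic channel is what limits sensitivity here: with a path of tens of micrometers, only droplets with substantial dye accumulation rise above the detection threshold.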
Another very promising setup, illustrated in Fig. 2F, has been published by Holland-Moritz et al.121 These authors designed a mass spectrometry-integrated droplet microfluidics setup that can gather information about the unlabeled content of droplets. In their setup, they integrated electrospray ionization mass spectrometry (ESI-MS) for the analysis of droplet content and demonstrated its utility using a transaminase that converts 1-(imidazo[2,1-b]thiazol-6-yl)propan-2-amine into 1-(imidazo[2,1-b]thiazol-6-yl)propan-2-one. Transaminases were expressed in vitro and encapsulated, and the authors tracked the decrease in substrate concentration to identify enzymes displaying high activity. Since mass spectrometry analysis destroys the sample, Holland-Moritz et al.121 split each droplet in two prior to analysis: the major part was sent for MS analysis, while a small part was retained in the microfluidic channels until the analysis was complete and a decision was made regarding the fate of the droplet. Although the throughput of the system is limited to a few droplets per second at most, and sufficient product must be present to match the detection limit of MS, the method is very valuable for gathering information about almost any reaction, working 100-fold faster than standard HPLC/MS measurements.192 The system was later improved in the same laboratory, and the authors demonstrated the utility of the strategy for sorting lysine-overproducing strains at increased sorting speed.122 In this work, two different E. coli strains were utilized: the wild type and a derivative carrying DapAE84T, an improved variant for lysine production. The strains were encapsulated in emulsions and cultured in droplets for growth and lysine production.
Similarly to the workflow of Holland-Moritz et al.,121 the droplets were split in two, and positively sorted droplets were cultured and sent for validation of the mutant genotype.
This journal is © The Royal Society of Chemistry 2025