Gahyeon Kim† (a), Dukwon Lee† (a), Ji Hun Kim† (a), Seong Do Kim† (b), Hongki Kim† (b), Jae Heon Kim† (b), Sung Sun Yim (a, b), Soo-Jin Yeom (c), Jay D. Keasling (d, e, f) and Byung-Kwan Cho* (a, b)
(a) Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Republic of Korea. E-mail: bcho@kaist.ac.kr
(b) Graduate School of Engineering Biology, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Republic of Korea
(c) School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
(d) Department of Chemical & Biomolecular Engineering, University of California, Berkeley, CA, USA
(e) Joint BioEnergy Institute, Emeryville, CA, USA
(f) Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
First published on 27th June 2025
Covering: 2020 to 2025
Natural products remain indispensable sources of therapeutic and bioactive compounds, yet traditional discovery strategies are constrained by the frequent rediscovery of known compounds. Modular biosynthetic enzymes, such as type I polyketide synthases (PKSs) and type A non-ribosomal peptide synthetases (NRPSs), offer promising platforms for combinatorial biosynthesis owing to their programmable architectures. However, practical implementation is frequently limited by inter-modular incompatibility and domain-specific interactions. This review highlights recent advances in modular enzyme assembly enabled by synthetic interfaces, including cognate docking domains, synthetic coiled-coils, SpyTag/SpyCatcher, and split inteins, which function as orthogonal, standardized connectors that facilitate post-translational complex formation. These interfaces support rational investigations of substrate specificity, module compatibility, and pathway derivatization, as well as general enzyme clustering applications beyond PKS and NRPS systems. Integrated with computational tools that predict domain compatibility and guide interface design, synthetic interfaces can support a more systematic and scalable framework for modular enzyme engineering. Embedded in iterative design–build–test–learn workflows, these approaches can accelerate the programmable assembly of biosynthetic systems and expand the chemical space accessible for natural products.
However, despite this potential, practical laboratory implementation faces several technical hurdles, including module incompatibility and limitations imposed by intrinsic enzymatic specificity, exemplified by the restrictive selectivity of ketosynthase gatekeeper domains.6 A further critical challenge is the formation of truncated mRNA transcripts resulting from the substantial size of biosynthetic gene clusters (BGCs).7 To mitigate these challenges, recent efforts have focused on engineering modular interfaces that enable coordinated module swapping and improve compatibility among enzyme modules. Such strategies incorporate synthetic interfaces, including docking domains (DDs) and communication-mediating (COM) domains, which are naturally derived but can be repurposed in non-cognate contexts, as well as engineered interaction modules such as synthetic coiled-coils, the SpyTag/SpyCatcher system, and split inteins.
The exploration and development of such synthetic interfaces are highly relevant to synthetic biology, an innovative field that applies engineering principles to biological research. A key goal of synthetic biology is the establishment of standardized, modular, and broadly applicable parts. Unlike traditional protein engineering approaches, which tend to produce highly specialized solutions with limited transferability, synthetic biology emphasizes standardized components that perform consistently across biological systems and engineering contexts.8 To this end, synthetic biology employs diverse toolkits categorized by their functional targets. At the transcriptional level, standardized genetic parts include inducible systems, promoters, and terminators, enabling precise control over the timing and strength of gene expression.9,10 At the translational level, tools such as ribosome-binding sites and codon optimization strategies improve the efficiency and accuracy of protein synthesis.11 However, current synthetic biology toolkits focus predominantly on transcriptional and translational regulation, leaving a gap in tools for controlling protein association. The development and characterization of protein interaction domains as standardized biological components is therefore critically important.
The integration of synthetic interfaces with modular enzyme assembly offers significant advantages in modularity, structural versatility, and assembly efficiency. In addition, recent advances in AI-driven enzyme engineering promise to further streamline enzyme optimization, predict functional compatibility, and facilitate the design of novel enzyme modules. These advances are increasingly organized within a rational engineering framework known as the design–build–test–learn (DBTL) cycle, which enables iterative improvement of modular biosynthetic systems (Fig. 1). In the design step, a desired compound, either an existing bioactive molecule or a newly designed structure, is deconstructed to identify biosynthetic modules suitable for its synthesis. This includes determining which PKS or NRPS domains are compatible with the intended substrates and how to configure them into a functional module architecture capable of producing the target molecule. The build step enables automation-assisted combinatorial construction of modular enzyme assemblies from well-characterized parts. In the test step, chimeric constructs are expressed and their function is quantified by analytical methods. Finally, in the learn step, AI-assisted linker optimization and modular enzyme design integrate the experimental outcomes to improve subsequent design cycles. Beyond retrobiosynthesis of desired scaffolds, the DBTL cycle can be extended to derivatization through systematic recombination of modules and synthetic interfaces. This iterative optimization facilitates rational exploration of chemical space and supports the generation of novel bioactive derivatives.
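The DBTL loop described above can be sketched as a simple search over module/interface combinations. This is a minimal illustration under stated assumptions: the module and interface names are hypothetical placeholders, `assay` stands in for the experimental build and test steps, and the "learn" step is a naive ranking of interfaces by mean titer rather than the AI-assisted modelling referred to in the text.

```python
import itertools
from statistics import mean
from typing import Callable, Dict, List, Tuple

# A design joins an upstream and a downstream module through one synthetic interface.
Design = Tuple[str, str, str]  # (upstream module, downstream module, interface)


def design_space(modules: List[str], interfaces: List[str]) -> List[Design]:
    """Design: every ordered pair of distinct modules joined by each candidate interface."""
    return [(up, down, itf)
            for up, down in itertools.permutations(modules, 2)
            for itf in interfaces]


def learn(results: Dict[Design, float], interfaces: List[str]) -> Dict[str, float]:
    """Learn: mean observed titer per interface; untested interfaces score +inf so they are tried first."""
    scores: Dict[str, float] = {}
    for itf in interfaces:
        titers = [t for design, t in results.items() if design[2] == itf]
        scores[itf] = mean(titers) if titers else float("inf")
    return scores


def dbtl(modules: List[str], interfaces: List[str],
         assay: Callable[[Design], float],
         rounds: int = 3, batch: int = 4) -> Dict[Design, float]:
    """Run a fixed number of DBTL rounds and return every measured titer."""
    results: Dict[Design, float] = {}
    space = design_space(modules, interfaces)
    for _ in range(rounds):
        scores = learn(results, interfaces)
        untested = [d for d in space if d not in results]
        # Rank untested designs by how well their interface has performed so far
        # (Python's sort is stable, so ties keep enumeration order).
        untested.sort(key=lambda d: scores[d[2]], reverse=True)
        for design in untested[:batch]:
            # Build + Test: `assay` stands in for cloning, expression and product quantification.
            results[design] = assay(design)
    return results


def toy_assay(design: Design) -> float:
    """Hypothetical assay: constructs joined by "SYNZIP" give higher titers than "DD"."""
    return 10.0 if design[2] == "SYNZIP" else 1.0


history = dbtl(["M1", "M2", "M3"], ["DD", "SYNZIP"], toy_assay, rounds=2)
print(len(history), sum(d[2] == "SYNZIP" for d in history))
```

After the first round tests both interfaces, the second round concentrates on the better-performing one; a real learn step would replace the mean-titer ranking with a predictive model.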
6-Deoxyerythronolide B synthase (DEBS) from Saccharopolyspora erythraea (formerly Streptomyces erythraeus) is one of the most thoroughly studied PKSs for understanding the architecture and mechanism of modular type I PKSs. DEBS has been investigated extensively because it produces 6-deoxyerythronolide B (6-DEB), the macrolide precursor of the clinically important antibiotic erythromycin.14 The significance of DEBS lies not only in its pharmaceutical relevance but also in its highly organized modular architecture, which exemplifies the assembly-line strategy of type I PKSs.15 This feature makes DEBS an ideal system for illustrating the principles of modularity and domain organization in PKS biosynthesis. DEBS comprises eight modules that sequentially mediate the biosynthesis of 6-DEB. These modules, covering loading, elongation, and termination steps, are distributed across three polypeptides (DEBS1–3) that maintain functional continuity via DDs at their N- and C-termini. As illustrated in Fig. 2a, each elongation module carries its own combination of catalytic activities, enabling structural variation at every cycle. This modular repetition, combined with functional variability, underlies the remarkable chemical diversity of polyketides (PKs).16,17
At the domain level, each module consists of a defined arrangement of catalytic domains, typically organized as KS–AT–(modifying domains)–ACP. These domains act in concert to extend the chain and introduce functional groups in a processive, ordered manner. Although the identity and presence of modifying domains vary between modules, the overall architecture is conserved, providing a robust framework for the rational reprogramming of PKS systems.16,17 This repeated pattern is a hallmark of type I PKSs and reflects their assembly-line biosynthetic mechanism (Fig. 2a): each module functions as a dedicated processing station, analogous to a unit on a conveyor belt, adding a single malonyl-CoA-derived extender unit (e.g., malonyl or methylmalonyl) to the growing PK chain before passing it to the next module.18 By rearranging or replacing modules, or by introducing new catalytic domains, researchers can engineer diverse structural and chemical properties into PK products. This adaptability has enabled the creation of novel bioactive compounds through domain swapping, module excision, and recombination strategies, highlighting the practical versatility of type I PKSs.19,20
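How each module's reductive domains set the oxidation state of the newly formed β-carbon can be expressed as a small rule. The sketch below is deliberately simplified: it ignores stereochemistry and α-substituents, and it marks inactive domains with a trailing "0" (a notation assumed here). The domain content encoded is that of the six DEBS extension modules, in which module 3 carries an inactive KR.

```python
from typing import List


def beta_carbon(domains: List[str]) -> str:
    """Return the beta-carbon functionality left after one extension cycle.

    Only active domains count; an inactive domain is written with a trailing "0"
    (e.g. "KR0"), so it never matches the active label.
    """
    active = set(domains)
    if "KR" not in active:
        return "ketone"        # no ketoreduction
    if "DH" not in active:
        return "hydroxyl"      # KR only
    if "ER" not in active:
        return "alkene"        # KR + DH
    return "methylene"         # KR + DH + ER: full reduction


# Domain content of the six DEBS extension modules (loading didomain and TE omitted).
DEBS_MODULES = [
    ["KS", "AT", "KR", "ACP"],               # module 1
    ["KS", "AT", "KR", "ACP"],               # module 2
    ["KS", "AT", "KR0", "ACP"],              # module 3: inactive KR
    ["KS", "AT", "DH", "ER", "KR", "ACP"],   # module 4
    ["KS", "AT", "KR", "ACP"],               # module 5
    ["KS", "AT", "KR", "ACP"],               # module 6
]

for i, module in enumerate(DEBS_MODULES, start=1):
    print(f"module {i}: {beta_carbon(module)}")
```

The ketone left by module 3 and the fully reduced methylene from module 4 correspond to the C9 ketone and C7 methylene of 6-DEB; swapping a module's reductive set in such a model is the simplest way to reason about what a domain swap should change.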
All catalytic steps are orchestrated around the KS dimer axis, enabling efficient and accurate biosynthesis of complex molecules such as 6-DEB. Dimerization of PKS modules is also central to the function of DDs, which mediate inter-protein interactions.21 In DEBS, DD pairs are present at the two inter-subunit interfaces, DEBS1–DEBS2 and DEBS2–DEBS3. These DDs ensure stable protein docking through tightly interlocking α-helices, a feature essential for maintaining structural integrity (Fig. 2a).22 The C-terminal DD helix of the preceding module and the N-terminal DD helix of the succeeding module intertwine in a module-specific manner, preventing misalignment and ensuring precise module-to-module communication. This selective docking mechanism makes DDs particularly attractive targets for protein engineering.23
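Because each C-terminal DD docks only with its cognate N-terminal partner, the DD labels on a set of subunits are enough to determine the order of the assembly line. The sketch below illustrates this with DEBS's three-subunit layout; the DD labels and the cognate-pair table are hypothetical placeholders, not real sequences or measured specificities.

```python
from typing import Dict, List, Optional, Tuple

# (N-terminal DD label, C-terminal DD label); None means no DD at that terminus.
Subunit = Tuple[Optional[str], Optional[str]]


def assembly_order(subunits: Dict[str, Subunit], cognate: Dict[str, str]) -> List[str]:
    """Order subunits by following cognate C-terminal -> N-terminal DD pairs."""
    starts = [name for name, (n_dd, _) in subunits.items() if n_dd is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one subunit without an N-terminal DD")
    by_n_dd = {n_dd: name for name, (n_dd, _) in subunits.items() if n_dd is not None}
    order = [starts[0]]
    while True:
        c_dd = subunits[order[-1]][1]
        if c_dd is None:
            break  # last subunit of the assembly line
        partner = by_n_dd.get(cognate.get(c_dd))
        if partner is None or partner in order:
            raise ValueError(f"no valid cognate partner for {c_dd}")
        order.append(partner)
    if len(order) != len(subunits):
        raise ValueError("not all subunits could be docked")
    return order


# DEBS-like layout with placeholder DD labels (listed out of order on purpose).
subunits = {
    "DEBS3": ("N_dd2", None),
    "DEBS1": (None, "C_dd1"),
    "DEBS2": ("N_dd1", "C_dd2"),
}
cognate = {"C_dd1": "N_dd1", "C_dd2": "N_dd2"}
print(assembly_order(subunits, cognate))
```

If the cognate table is scrambled, the walk skips a subunit and the function raises an error, mirroring how mismatched DDs fail to dock.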
Although recent advances have markedly enhanced our understanding of NRPS modular structures, particularly through high-resolution studies of complete mono- and di-module systems, the broader structural organization of type A NRPSs is still being explored.24–27 Surfactin synthetase (SrfA) from Bacillus subtilis stands out as a representative example, offering valuable insights into the architecture and mechanism of modular NRPSs.28,29 Although modular representations of SrfA have been proposed, a universally accepted full structural model is still lacking, underscoring the continuing challenge of elucidating the spatial organization and inter-domain dynamics of NRPS systems.30 Nonetheless, accumulated structural and functional studies of SrfA provide a foundational framework for understanding NRPS modularity, domain architecture, and catalytic mechanisms.31
Like PKSs, type A NRPSs exhibit a modular organization of sequential catalytic domains, typically arranged in a condensation–adenylation–thiolation (C–A–T) pattern, along with specialized domains such as epimerization (E) and thioesterase (TE) domains (Fig. 2b).28,32 Unlike PKSs, which elongate PK intermediates, NRPSs incorporate amino acids to produce diverse peptides. Despite these mechanistic distinctions, both systems follow an analogous assembly-line logic, sequentially extending and modifying intermediates. In modular NRPSs, inter-protein communication is mediated by COM domains, which enable efficient substrate transfer and coordinated catalysis across module boundaries.
In SrfA, this intermodular communication critically depends on COM regions located at the terminal ends of each protein subunit (Fig. 2b). Although SrfA lacks integrated modifying domains within its C–A–T core, it incorporates modifications via separate domains, most notably the C-terminal E domain.33 The E domain not only catalyzes L-to-D amino acid conversion but also contributes structurally by stabilizing COM-mediated docking.25,27,34 This COM-driven mechanism ensures precise inter-protein alignment and functional continuity in SrfA.
Unlike the DD in PKSs, which forms an intertwined helix pair between separate modules, the COM domain in NRPSs constitutes an asymmetric hybrid interface. It consists of a short α-helix segment (COM-donor, COMD) from the upstream module (typically appended to the E domain) and a shallow surface pocket (COM-acceptor, COMA) embedded within the C domain of the downstream module. Due to the inherent mutual specificity between COMD and COMA, it is more appropriate to consider the COM interface as a cognate domain pair, rather than two independent binding sites. This distinction is particularly relevant when repurposing or engineering NRPS modules for synthetic applications.
Interaction domains such as DD in PKSs and COM domains in NRPSs exhibit notable structural diversity across different biosynthetic systems. These domains are not uniform in architecture; instead, they comprise multiple structural classes that differ in helical organization, interaction interfaces, and specificity-conferring residues.35 This diversity enables selective and directional intermodular communication and provides a broad repertoire of interchangeable elements for synthetic design. As illustrated in Fig. 2a, even within a single biosynthetic pathway such as DEBS, the structural features of DDs between subunit interfaces (e.g., DEBS1-DEBS2 vs. DEBS2-DEBS3) can differ, underscoring their functional flexibility. Understanding and leveraging this variability is essential for expanding the design space of engineered assembly lines.
These modular biosynthetic systems, despite their architectural differences, operate through a conserved conveyor-belt logic in which each module executes a defined reaction step.32 This principle underpins the programmability of both PKS and NRPS systems and provides a conceptual basis for modular engineering, guiding the rational design of chimeric enzymes for the biosynthesis of novel bioactive compounds.
This updated boundary definition, rooted in evolutionary module architecture, addresses longstanding challenges in PKS engineering.38 Previously, engineering efforts frequently involved swapping or modifying KS domains positioned downstream of the ACP, disrupting precise interactions and reducing substrate transfer efficiency.39 By adopting the updated boundary, in which each ACP domain transfers its intermediate directly to the subsequent KS domain within the same module, domain recognition becomes stable and consistent, substantially improving biosynthetic efficiency and product fidelity (Fig. 3a).40
Analogous structural insights extend to NRPS systems. NRPS modules were previously defined in a similar manner, with the T domain at the module's terminus.28 Yet recent engineering data indicate that, as in PKSs, NRPS modules can be reorganized into evolutionary functional units with the T domain at the center.41 In this configuration, the T domain captures the amino acid monomer, which is modified exclusively within its own module, and transfers the fully processed intermediate directly to the module's own C domain, now positioned at the downstream boundary (Fig. 3b). Consequently, intermediate transfer and peptide elongation are confined to, and stabilized by, interactions within each module. This restructuring prevents unintended interactions and backward progression, enforcing strict biosynthetic directionality.
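Because the traditional and updated definitions differ only in where a continuous domain string is cut, the change can be illustrated as a re-segmentation. In this minimal sketch the domain strings are simplified, hypothetical examples (a loading AT–ACP followed by two PKS extension modules, and a three-module NRPS ending in TE); the function only partitions labels and says nothing about real linker positions.

```python
from typing import List


def split_after(domains: List[str], boundary: str) -> List[List[str]]:
    """Cut a linear domain list after every occurrence of `boundary`.

    Domains left over after the last boundary (e.g. a terminal TE) form a final unit.
    """
    units: List[List[str]] = []
    current: List[str] = []
    for domain in domains:
        current.append(domain)
        if domain == boundary:
            units.append(current)
            current = []
    if current:
        units.append(current)
    return units


# Hypothetical PKS: loading AT-ACP, two extension modules, terminal TE.
pks = ["AT", "ACP", "KS", "AT", "KR", "ACP", "KS", "AT", "KR", "ACP", "TE"]
print(split_after(pks, "ACP"))  # traditional boundary: units start at KS, end at ACP
print(split_after(pks, "KS"))   # updated boundary: AT ... ACP-KS units

# Hypothetical NRPS: initiation A-T, two C-A-T elongation modules, terminal TE.
nrps = ["A", "T", "C", "A", "T", "C", "A", "T", "TE"]
print(split_after(nrps, "T"))   # traditional boundary: C-A-T modules
print(split_after(nrps, "C"))   # updated boundary: A-T-C units with T central
```

Under the updated cut, each ACP (or T) stays in the same unit as the KS (or C) domain that receives its intermediate, which is the property the text credits with improved transfer fidelity.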
Ultimately, redefining module boundaries based on evolutionary and structural criteria has significant practical implications. Employing the updated boundaries has already proven beneficial: modules designed according to them have yielded previously unattainable products such as homoaureothin.5,36 Adopting evolutionary functional boundaries therefore provides a structurally and biologically coherent framework for future rational engineering, enhancing our ability to generate novel PK and peptide derivatives with predictable and efficient outcomes.
Fig. 4 Synthetic interfaces for modular enzyme assembly and biosynthetic applications. (a) Representative structural models of synthetic interfaces used for modular assembly: DD (DEBS1-CDD/DEBS2-NDD, AlphaFold3), COM (srfAA-COMD/srfAB-COMA, AlphaFold3), coiled-coil (SYNZIP 17/18, AlphaFold3), SpyTag/SpyCatcher (PDB: 4MLI), and intein (PDB: 7CFV) pair. (b) Synthetic interfaces are genetically fused to modular PKS/NRPS proteins at the N- or C-termini, enabling post-translational assembly of functional multi-enzyme constructs. Distinct interface types (green: DD; blue: SYNZIP; purple: SpyTag/SpyCatcher; pink: intein) mediate the formation of chimeric biosynthetic assemblies from individually transcribed and translated modules. (c) Functional applications of synthetic interfaces. These interfaces facilitate the characterization of substrate specificity by connecting known and unknown modules, allowing substrate preference to be inferred from product profiles. They also support combinatorial testing of module compatibility, construction of synthetic pathways through retrobiosynthetic logic, and scaffold derivatization via domain or module substitutions. Beyond PKS/NRPS systems, synthetic interfaces also enable modular clustering in metabolic pathway engineering to enhance flux and control.
PKS/NRPS systems.22,42,43 These studies led to the establishment of various model OIM systems, most notably the DEBS OIMs.
Building on these foundational findings, research over the past five years has shifted toward broadening the applicability of OIMs, transforming them from narrowly optimized, target-specific tools into robust and versatile platforms for engineering a wide range of systems, including non-model PKS/NRPS pathways and entirely unrelated metabolic pathways. Within PKS/NRPS engineering, these advances can be broadly classified into three categories, in which OIM engraftment is used to (i) generate truncated variants of native PKS/NRPS systems, (ii) enhance the productivity of PKS/NRPS systems, and (iii) assemble chimeric enzymes.
Using OIMs to truncate native PKS/NRPS systems into mini-synthase variants is a key strategy in biosynthetic pathway engineering. This approach not only enables the generation of diverse chemical derivatives of the original product but also serves as an effective means of deconstructing large PKS/NRPS systems into smaller, more manageable units for detailed characterization of module compatibility and for downstream engineering.44 In this context, both endogenous OIMs from the target PKS/NRPS system and heterologous OIMs can be used to construct inter-modular interfaces that would not otherwise occur. Notable applications include the mini-stambomycin and mini-azalomycin F variants constructed with endogenous OIMs.44,45 When combined with other modification strategies such as domain swapping and rational site-directed mutagenesis, OIM modification can further broaden the scope of derivatization.44,45 This strategy has also been extended to NRPS systems, as demonstrated by the engineering of the plipastatin NRPS with native OIMs to produce shortened module variants for product derivatization.46 A key consideration when applying this strategy to NRPSs, however, is the inherent promiscuity of the COM domain families that serve as NRPS OIMs. Given the technical difficulty of improving COM domain fidelity, a practical strategy is to leverage this promiscuity to expand the structural diversity of derivative compounds.28 Alternatively, the promiscuous interaction modules of an NRPS may be exchanged for more stringent NRPS OIMs, such as the Xenorhabdus-derived SLiM-βhD domain pairs.47,48
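Constructing a mini-synthase amounts to deleting internal modules and bridging the resulting non-native junction with an OIM pair. The sketch below enumerates the single contiguous deletions of a linear assembly line and reports the junction each one creates; the module names are hypothetical, and whether a given junction is chemically viable (for example, whether the downstream KS accepts the new intermediate) still has to be tested experimentally.

```python
from typing import List, Tuple

Design = Tuple[List[str], Tuple[str, str]]  # (kept modules, new junction to bridge)


def mini_synthase_designs(modules: List[str]) -> List[Design]:
    """Enumerate variants that delete one contiguous run of internal modules.

    modules[0] (loading) and modules[-1] (terminal, TE-bearing) are always kept,
    and at least one internal module is retained. Each design reports the
    non-native junction that an OIM pair would have to bridge.
    """
    first, *internal, last = modules
    n = len(internal)
    designs: List[Design] = []
    for start in range(n):
        for stop in range(start + 1, n + 1):
            if stop - start == n:
                continue  # would delete every internal module
            kept = [first] + internal[:start] + internal[stop:] + [last]
            # kept[start] precedes the deleted block; kept[start + 1] follows it.
            designs.append((kept, (kept[start], kept[start + 1])))
    return designs


designs = mini_synthase_designs(["LM", "M1", "M2", "M3", "M4", "TM"])
print(len(designs))
for kept, junction in designs[:3]:
    print(kept, "new junction:", junction)
```

Each reported junction is a candidate site for an endogenous or heterologous OIM pair; ranking these candidates is where the compatibility data discussed later become useful.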
OIM engraftment can also be used to enhance both the stability and productivity of existing PKS/NRPS systems. Among the available options, the well-characterized DEBS OIMs stand out as the most prominent and widely adopted tools for this strategy. Beyond substituting native interaction domains, DEBS OIMs have been used to segment intact PKS genes into smaller modules, replacing covalent inter-modular linkages with non-covalent interfaces while preserving the function of the entire system. This approach, often used in in vitro reconstitution experiments, is particularly useful for expressing large PKS genes in heterologous hosts such as E. coli and for producing the target compound from split modules.49–51 Similar strategies have been implemented with less-characterized OIMs. For instance, OIMs derived from the salinomycin and stigmatellin PKSs have been applied to split a wide array of unstable PKS systems, including those for butenyl-spinosyn, salinomycin, avermectin, and epothilone, into stably expressed module parts.7 This approach was similarly applied to the xefoampeptide-producing NRPS from Xenorhabdus bovienii, demonstrating its utility across both PKS and NRPS systems.52
OIMs can also be used to construct interfaces between heterologous modules, enabling the production of designer molecules. The construction of bimodular chimeric PKS libraries from pikromycin synthase and venemycin synthase subunits exemplifies this approach.38 In addition, OIMs can be employed to alter the starter units of PKS systems by enabling exchange of the loading module, as demonstrated by the use of curacin-derived OIMs to produce alkyne-tagged polyketides.53
Beyond PKS/NRPS engineering, there has been growing interest in extending OIMs to broader synthetic biology applications such as metabolic engineering, artificial enzyme assembly, and synthetic protein scaffolds. Because OIM pairs retain their interaction capabilities outside their native biosynthetic contexts, they can be repurposed as orthogonal protein connectors. This potential was demonstrated quantitatively by fusing OIMs to split GFP variants, showing that they can function as modular tools for general enzymatic assembly.54 The utility of OIM-mediated metabolic engineering was further demonstrated in the synthetic tri-modular reconstitution of the maytansinol biosynthetic pathway, where OIM engraftment enhanced the interaction between constituent enzymes.55 Similarly, when model OIM pairs from different evolutionary lineages were grafted into the astaxanthin biosynthetic pathway to drive a tri-modular enzyme assembly in Escherichia coli, astaxanthin production increased 2.4-fold.56 This result showed that OIMs from distinct structural families, which are typically orthogonal to one another, can be deployed simultaneously within a single metabolic pathway, demonstrating their transferability and supporting their use in assembling longer and more complex biosynthetic pathways. Given the current scarcity of reliable tools for programmable enzyme organization, tapping into the vast natural diversity of DDs presents a promising solution.56 As a point of reference, the type I cis-AT PKS family alone is predicted to encompass over 1600 distinct clusters, representing a rich and largely untapped reservoir of candidate interaction domains for synthetic biology applications.17
Developing such a library requires a multi-step approach. First, candidate OIMs must be identified and selected from natural PKS/NRPS systems. The selected domains must then be cloned into a variety of biosynthetic contexts whose products are already characterized, and the resulting constructs introduced into a heterologous expression host to evaluate whether the OIMs restore or enable production of the desired compound. Functional parameters derived from these experiments can be used to characterize each OIM quantitatively as a biopart, and these data should be catalogued systematically in a centralized, open-access database to facilitate reuse and interoperability. Furthermore, the sequences of validated DDs can be modified rationally, for example by site-directed mutagenesis, to generate a library of OIM bioparts with varying interaction strengths. Such a library would enable synthetic biologists to design more complex circuits at the post-translational level by fine-tuning and orchestrating protein–protein interactions. Continuous expansion of the library through collaboration and crowdsourced data is essential to keep pace with new innovations and applications in synthetic biology. Together, this process would create a robust, versatile collection of DDs for designing and optimizing efficient, modular biosynthetic pathways for diverse applications.
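The cataloguing step of this workflow could start from a very small data model. The sketch below is one possible structure, assuming hypothetical field names and a "relative titer" metric (product titer normalized to a native-linkage control); it flags OIM pairs that work across several test contexts and ranks them by mean performance. All record values are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Set


@dataclass(frozen=True)
class OIMRecord:
    """One test of an OIM pair in a characterized biosynthetic context (hypothetical schema)."""
    pair_id: str
    context: str           # reference construct used as the test bed
    relative_titer: float  # product titer divided by the native-linkage control titer
    source_cluster: str    # PKS/NRPS cluster the OIM pair was taken from


class OIMCatalogue:
    """Minimal in-memory catalogue; a shared database would replace this in practice."""

    def __init__(self) -> None:
        self.records: List[OIMRecord] = []

    def add(self, record: OIMRecord) -> None:
        self.records.append(record)

    def mean_titers(self) -> Dict[str, float]:
        """Mean relative titer per OIM pair across all tested contexts."""
        by_pair: Dict[str, List[float]] = {}
        for r in self.records:
            by_pair.setdefault(r.pair_id, []).append(r.relative_titer)
        return {pid: mean(vals) for pid, vals in by_pair.items()}

    def validated(self, min_titer: float = 0.5, min_contexts: int = 2) -> List[str]:
        """Pairs reaching `min_titer` in at least `min_contexts` distinct contexts, best first."""
        passing: Dict[str, Set[str]] = {}
        for r in self.records:
            if r.relative_titer >= min_titer:
                passing.setdefault(r.pair_id, set()).add(r.context)
        scores = self.mean_titers()
        hits = [pid for pid, ctx in passing.items() if len(ctx) >= min_contexts]
        return sorted(hits, key=lambda pid: scores[pid], reverse=True)


catalogue = OIMCatalogue()
for rec in [
    OIMRecord("pairA", "ctx1", 0.9, "cluster_X"),
    OIMRecord("pairA", "ctx2", 0.7, "cluster_X"),
    OIMRecord("pairB", "ctx1", 1.2, "cluster_Y"),
    OIMRecord("pairB", "ctx2", 0.1, "cluster_Y"),
    OIMRecord("pairC", "ctx1", 0.6, "cluster_Z"),
    OIMRecord("pairC", "ctx3", 0.8, "cluster_Z"),
]:
    catalogue.add(rec)

print(catalogue.validated())
```

Requiring success in more than one context is a design choice aimed at transferability: pairB performs best in a single context but fails in another, so it is not flagged as a general-purpose biopart.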
The feasibility of SYNZIPs has been demonstrated in various systems, including the reconstitution of DEBS-derived bimodular PKSs and the modularization of fungal NRPSs (Fig. 4b).56 For instance, SYNZIP-mediated interactions have circumvented constraints imposed by KS domains in PKSs, enabling the assembly of non-native modules.62 Coiled-coils have also proven instrumental in DNA-templated enzyme systems, where they act together with other domains, such as zinc fingers, to position enzymes along DNA templates, enhancing spatially confined catalysis and biosynthetic efficiency.63 Furthermore, coiled-coil-mediated assembly has been used to create hybrid biosynthetic pathways, enabling the production of novel bioactive compounds with greater structural diversity.64,65 By integrating coiled-coil-based enzyme engineering with combinatorial biosynthesis, researchers aim to expand the chemical space of natural product derivatives, potentially leading to the discovery of new pharmaceuticals.
Comparative studies of SYNZIPs and SpyTag/SpyCatcher systems highlight complementary strengths but also reveal common engineering challenges that must be addressed for optimal function. Both systems can impose structural rigidity on the engineered interface, potentially hindering enzymatic activity by restricting the conformational dynamics essential for substrate channelling and catalytic turnover.76 For SYNZIPs, excessive α-helical length or tightly packed structures can restrict conformational flexibility. To address this, flexible glycine–serine linkers have been introduced between the SYNZIP and enzymatic domains to restore the required mobility. Additionally, truncation of SYNZIP sequences, particularly the SZ1/SZ2 pair, has been shown to improve production yields, in some cases by more than 50-fold, by reducing steric hindrance while preserving binding affinity.77 Similarly, although SpyTag/SpyCatcher offers the advantage of irreversible covalent bonding, this covalency may introduce rigidity that limits the structural plasticity required during enzymatic turnover. To overcome this, hinge-like sequences have been incorporated adjacent to the ligation points to mimic the flexibility of natural linker regions.77 Thus, regardless of the binding modality, non-covalent or covalent, interface flexibility emerges as a critical determinant of successful modular reconstitution. Beyond this shared constraint, SYNZIPs and SpyTag/SpyCatcher systems have complementary features. SYNZIPs offer orthogonal and reversible interactions suited to contexts requiring conditional control, whereas SpyTag/SpyCatcher excels at permanently stabilizing multi-enzyme complexes, ensuring robustness in dynamic or heterologous environments. Strategic selection or combined use of these systems, including with DDs or COM domains, opens new avenues for the hierarchical design of reconfigurable biosynthetic pathways.
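Linker-length screening of the kind described above is often set up by generating fusion variants that differ only in the number of repeat units in a flexible linker. The sketch below assumes the common (GGGGS)n glycine–serine motif, which the text does not specify, and uses short placeholder protein sequences rather than real SYNZIP or enzyme sequences.

```python
from typing import Dict


def gs_linker(repeats: int, unit: str = "GGGGS") -> str:
    """Flexible glycine-serine linker made of `repeats` copies of `unit`, e.g. (GGGGS)n."""
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    return unit * repeats


def fusion_variants(enzyme: str, tag: str, max_repeats: int = 4,
                    tag_at_c_terminus: bool = True) -> Dict[int, str]:
    """Map linker repeat number -> fusion protein sequence (enzyme-linker-tag or tag-linker-enzyme)."""
    variants: Dict[int, str] = {}
    for n in range(max_repeats + 1):
        linker = gs_linker(n)
        if tag_at_c_terminus:
            variants[n] = enzyme + linker + tag
        else:
            variants[n] = tag + linker + enzyme
    return variants


# Placeholder sequences only; substitute the real enzyme and interface sequences.
for n, seq in fusion_variants("MAAA", "LLLL", max_repeats=2).items():
    print(n, seq, len(seq))
```

Each added repeat lengthens the construct by five residues, so titers measured across the series give a direct readout of how much interface flexibility a given enzyme pair needs.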
Future directions include AI-assisted design of xenologous interaction modules with enhanced context-specific functionality, high-throughput screening of interaction libraries under variable expression conditions, and the development of switchable interface systems responsive to environmental cues. By expanding the synthetic biology toolbox, coiled-coil-based and covalent interaction systems are poised to accelerate the rational engineering of enzyme assemblies for the discovery and production of novel natural products.
Feature | Docking domains | Synthetic coiled-coils | SpyTag/SpyCatcher system | Split inteins |
---|---|---|---|---|
Interaction type | Non-covalent (ref. 71) | Non-covalent (ref. 23) | Covalent bond (ref. 66) | Protein splicing (ref. 72) |
Affinity (KD) | 1–130 μM (ref. 21, 43, 73) | <10 nM for validated pairs (ref. 59) | Irreversible (ref. 74) | Irreversible (ref. 72) |
Orthogonality | Low (ref. 21) | High (ref. 59) | High (ref. 70) | High (ref. 75) |
Structural flexibility | High (ref. 22) | Limited (ref. 76) | Limited (ref. 76) | Moderate (ref. 75) |
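The qualitative rows of the comparison table above can be turned into a simple filter when shortlisting interfaces for a design. This sketch assumes that non-covalent interfaces count as reversible and ranks "limited" on par with "low"; both are simplifications introduced here, and the property values are copied from the table rather than from new measurements.

```python
from typing import Dict, List, Optional

# Qualitative properties transcribed from the comparison table.
INTERFACES: Dict[str, Dict[str, str]] = {
    "Docking domains":        {"bond": "non-covalent", "orthogonality": "low",  "flexibility": "high"},
    "Synthetic coiled-coils": {"bond": "non-covalent", "orthogonality": "high", "flexibility": "limited"},
    "SpyTag/SpyCatcher":      {"bond": "covalent",     "orthogonality": "high", "flexibility": "limited"},
    "Split inteins":          {"bond": "splicing",     "orthogonality": "high", "flexibility": "moderate"},
}

# Assumed ordinal scale for the qualitative labels.
RANK = {"low": 0, "limited": 0, "moderate": 1, "high": 2}


def select_interfaces(reversible: Optional[bool] = None,
                      min_orthogonality: str = "low",
                      min_flexibility: str = "low") -> List[str]:
    """Return interface types meeting the requested constraints (None = don't care)."""
    hits = []
    for name, props in INTERFACES.items():
        is_reversible = props["bond"] == "non-covalent"
        if reversible is not None and is_reversible != reversible:
            continue
        if RANK[props["orthogonality"]] < RANK[min_orthogonality]:
            continue
        if RANK[props["flexibility"]] < RANK[min_flexibility]:
            continue
        hits.append(name)
    return hits


print(select_interfaces(reversible=True, min_orthogonality="high"))
print(select_interfaces(reversible=False, min_orthogonality="high"))
print(select_interfaces(min_flexibility="moderate"))
```

A reversible, highly orthogonal requirement leaves only synthetic coiled-coils, consistent with the text's recommendation of SYNZIPs for conditional control.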
One application is the characterization of substrate specificity (Fig. 4c). In modular systems, the identity of the extender unit incorporated by a given extender module often remains unknown, posing a bottleneck to module selection and engineering. By linking a validated loading module to an uncharacterized extender module through stable, orthogonal interfaces, substrate preference can be inferred from the resulting product. Using non-cognate or poorly matched interaction domains in this context often leads to unstable module assembly or substantially reduced product titers.35 Employing orthogonal, validated linkers is therefore critical for accurate characterization.
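When the loading module and the reductive processing are known, the extender unit can in principle be read from the mass shift of the product relative to the malonyl-extended product. The sketch below assumes that all candidate units undergo identical β-carbon processing and that the reference mass has been computed separately; the offsets are monoisotopic mass differences of the α-substituent (CH2 for methyl, C2H4 for ethyl, CH2O for methoxy), and the tolerance is an arbitrary placeholder.

```python
from typing import Optional

# Monoisotopic mass added relative to a malonyl-derived extension (alpha-substituent minus H).
EXTENDER_OFFSETS = {
    "malonyl": 0.0,
    "methylmalonyl": 14.01565,   # + CH2
    "ethylmalonyl": 28.03130,    # + C2H4
    "methoxymalonyl": 30.01056,  # + CH2O
}


def infer_extender(observed_mass: float, malonyl_product_mass: float,
                   tol: float = 0.005) -> Optional[str]:
    """Assign the extender unit whose mass offset best explains the observed product.

    Returns None when no candidate lies within `tol` Da of the observed shift.
    """
    delta = observed_mass - malonyl_product_mass
    best = min(EXTENDER_OFFSETS, key=lambda unit: abs(EXTENDER_OFFSETS[unit] - delta))
    return best if abs(EXTENDER_OFFSETS[best] - delta) <= tol else None


# Hypothetical masses: the malonyl-extended product is predicted at 200.0000 Da.
print(infer_extender(214.0157, 200.0))
print(infer_extender(207.0, 200.0))
```

A None result flags an unexpected product (for example, a different reduction state or a shunt product) that needs structural follow-up rather than a forced assignment.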
Module compatibility assessment is another key step in combinatorial biosynthesis (Fig. 4c). Even with known substrate specificities, modules originating from different biosynthetic contexts often display suboptimal interaction due to incompatible domain–domain interfaces. A well-documented challenge is the inability of non-cognate KS domains to accept and extend the intermediate from upstream modules, which frequently leads to failed biosynthetic outputs.44,62 To address this, compatibility can be improved through rational engineering of domain boundaries, such as replacing KS domains in acceptor modules to match the specificity of upstream partners. In one example, cassette replacement of a poorly active KS domain with a compatible one restored activity in previously inactive PKS modules, providing a basis for compatibility mapping in combinatorial assembly.62 This strategy facilitates the rational design of synthetic pathways by delineating functional interface rules.
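The output of a combinatorial compatibility screen is naturally a donor × acceptor matrix. This sketch, using hypothetical module names, titers, and an arbitrary threshold of 20% of a control titer, converts measured titers into compatible/incompatible calls and lists acceptor modules that fail with every tested donor, which would be natural candidates for the KS cassette replacement described above.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (upstream donor module, downstream acceptor module)


def compatibility_map(titers: Dict[Pair, float], control: float,
                      threshold: float = 0.2) -> Dict[Pair, bool]:
    """Call a pair compatible if its titer reaches `threshold` x the control titer."""
    return {pair: titer >= threshold * control for pair, titer in titers.items()}


def failing_acceptors(compat: Dict[Pair, bool]) -> List[str]:
    """Acceptor modules that are incompatible with every donor they were tested against."""
    acceptors = {acceptor for _, acceptor in compat}
    return sorted(a for a in acceptors
                  if not any(ok for (_, acc), ok in compat.items() if acc == a))


# Hypothetical screen: two donors x two acceptors, control titer = 10 (arbitrary units).
screen = {("D1", "A1"): 5.0, ("D2", "A1"): 0.1, ("D1", "A2"): 0.0, ("D2", "A2"): 0.5}
compat = compatibility_map(screen, control=10.0)
print(failing_acceptors(compat))
```

An acceptor that fails with all donors points to a KS-side restriction, whereas a donor that fails with all acceptors would instead suggest a problem upstream of the junction.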
Beyond characterization, synthetic interfaces can be employed in retrobiosynthetic pathway construction and scaffold derivatization (Fig. 4c). Once compatibility constraints are clarified, synthetic pathways can be built by recombining modules to match retrobiosynthetic logic derived from a desired molecule. In this context, orthogonal synthetic interfaces become indispensable, as multiple module boundaries must be reliably bridged within a single construct. Furthermore, modular constructs can be iteratively refined to introduce structural diversity through domain swaps, enabling derivatization of bioactive scaffolds.86 This combinatorial approach expands the accessible chemical space beyond natural diversity and accelerates analog discovery.
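Once compatible junctions are known, retrobiosynthetic assembly can be framed as a search: find a chain of modules whose extender-unit specificities spell out the target sequence while every adjacent pair is a known-compatible junction. The depth-first sketch below assumes hypothetical module names and specificity labels, treats each module as usable only once, and ignores reductive processing for brevity.

```python
from typing import Dict, List, Optional, Set, Tuple


def plan_assembly(target: List[str], specificity: Dict[str, str],
                  compatible: Set[Tuple[str, str]]) -> Optional[List[str]]:
    """Depth-first search for a module chain matching `target` extender units.

    Every adjacent (upstream, downstream) pair must appear in `compatible`;
    each module is used at most once. Returns None if no chain exists.
    """
    def extend(chain: List[str]) -> Optional[List[str]]:
        if len(chain) == len(target):
            return chain
        wanted = target[len(chain)]
        for module, unit in specificity.items():
            if unit != wanted or module in chain:
                continue
            if chain and (chain[-1], module) not in compatible:
                continue
            found = extend(chain + [module])
            if found is not None:
                return found
        return None  # dead end: backtrack

    return extend([])


# Hypothetical modules, their extender-unit specificities, and validated junctions.
specificity = {"mA": "mal", "mB": "mmal", "mC": "mmal", "mD": "mal"}
compatible = {("mA", "mB"), ("mA", "mC"), ("mC", "mD")}
print(plan_assembly(["mal", "mmal", "mal"], specificity, compatible))
```

Here the first candidate (mA followed by mB) dead-ends because no validated junction leads out of mB, so the search backtracks and returns the chain through mC; each junction in the result is a place where an orthogonal synthetic interface would be installed.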
Finally, synthetic interfaces are not limited to PKS/NRPS systems (Fig. 4c). Their applicability extends to general metabolic pathway engineering, where they enable the spatial organization of unstructured enzyme cascades. For example, the mPKSeal strategy, which uses DDs derived from type I cis-AT PKSs, has been applied to assemble heterologous metabolic enzymes in E. coli for astaxanthin production, physically clustering otherwise dispersed enzymes.56 This multienzyme assembly promoted spatial co-localization and improved overall product yields. As discussed in the previous section, other synthetic interfaces such as SpyTag/SpyCatcher and split inteins have also been applied to organize enzymes spatially, offering enhanced enzyme-to-enzyme transfer and conditional pathway activation.69,81 Such improvements in pathway yield and stability through synthetic scaffolds or tag-mediated tethering have been demonstrated in engineered metabolic pathways, even outside PKS/NRPS contexts.
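Clustering a linear cascade with pairwise interfaces requires one orthogonal pair per junction, with the upstream enzyme carrying one half as a C-terminal fusion and the downstream enzyme carrying the other half as an N-terminal fusion, mirroring C-/N-terminal DD pairing. The sketch below assigns pairs accordingly; enzyme and pair names are hypothetical placeholders, and it assumes the supplied pairs are mutually orthogonal.

```python
from typing import List, Tuple


def assign_interfaces(enzymes: List[str],
                      pairs: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Give each junction in a linear cascade its own interface pair.

    Each pair is (C-terminal half for the upstream enzyme, N-terminal half for the
    downstream enzyme). Returns (N-terminal tag, enzyme, C-terminal tag) per enzyme;
    an empty string means that terminus is left untagged.
    """
    junctions = len(enzymes) - 1
    if junctions > len(pairs):
        raise ValueError(f"need {junctions} orthogonal pairs, got {len(pairs)}")
    constructs = []
    for i, enzyme in enumerate(enzymes):
        n_tag = pairs[i - 1][1] if i > 0 else ""
        c_tag = pairs[i][0] if i < junctions else ""
        constructs.append((n_tag, enzyme, c_tag))
    return constructs


# Hypothetical tri-modular cascade joined by two orthogonal pairs (A and B).
for construct in assign_interfaces(["E1", "E2", "E3"], [("A_C", "A_N"), ("B_C", "B_N")]):
    print(construct)
```

Using a different pair at each junction is what enforces the intended E1 to E2 to E3 order; reusing one pair at both junctions would allow E1 to dock directly onto E3's partner site.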
While synthetic interfaces offer new routes for reconstituting modular enzyme assemblies, several studies have shown that proofreading functions embedded in catalytic domains play a decisive role in determining the success of engineered pathways. In PKSs, KS domains selectively accept properly processed intermediates and reject non-cognate or misprocessed substrates, effectively acting as gatekeepers.6,39 Similarly, in NRPSs, C domains enforce acceptor substrate specificity at the elongation step, often preventing chain extension when paired with incompatible upstream adenylation domains.87,88
These fidelity mechanisms safeguard the accuracy of native biosynthesis but also impose significant constraints on modular recombination strategies. Misprimed intermediates can stall the assembly line, and improperly matched domains may result in inactive constructs or undesired shunt products. For example, engineered stambomycin PKS systems with redesigned docking interfaces exhibited reduced yields due to KS-domain substrate rejection and premature intermediate offloading by TE domains, despite successful module fusion. To address these issues, auxiliary enzymes, such as type II TEs in NRPS systems and trans-acting acyl hydrolases in PKSs, have been employed to remove aberrant intermediates and restore biosynthetic flow.89,90
Additional strategies focus on co-designing module boundaries and domain compatibility. Domain recombination at structurally conserved splice sites, together with orthogonal assembly platforms such as SYNZIP- or intein-mediated systems, has shown promise in mitigating gatekeeping effects.91 Together, these findings show that engineering modular biosynthetic pathways requires attention to both structural interface design and substrate-specific proofreading. Synthetic interface systems continue to evolve through expanded orthogonal libraries, tunable binding affinities, and AI-assisted linker optimization. As they do, they are expected to play an increasingly integral role in the rational design of modular biosynthesis.
To date, several computational platforms have been developed for PKS/NRPS systems. They address the inherent complexity of megasynth(et)ases, which consist of multiple interacting domains, by computationally deconstructing target molecules into biosynthetically plausible modules.86,97 Structural modeling tools, including AlphaFold2 and ColabFold, have recently been integrated into this work and further improve both the prediction of intra-module interfaces and the assessment of module compatibility.98,99 Resources such as the Generalized Retro-biosynthetic Assembly Prediction Engine (GRAPE) and Global Alignment for Natural-products Cheminformatics (GARLIC) implement retrobiosynthetic logic to map target metabolites to gene clusters.86 Platforms such as ClusterCAD provide user-friendly databases and design algorithms for modular PKS/NRPS engineering.97 These innovations address the limitations of purely structure-guided rational approaches. They enable more comprehensive reconstruction of modular biosynthetic pathways and accelerate the creation of novel secondary metabolites via synthetic biology.
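The core retrobiosynthetic step these platforms automate can be illustrated with textbook type I PKS logic. The alpha substituent of each extension unit determines the extender unit, and the oxidation state of the beta carbon determines which reductive domains the module needs. The sketch below is deliberately simplified and rests on several assumptions:

- It starts from a target that has already been split into extension units. Real tools derive that split from the chemical structure.
- It ignores starter-unit selection, KR/ER stereochemistry, and TE-mediated release.
- The names `retro_design` and `ModuleSpec` are hypothetical.

```python
from dataclasses import dataclass

# Textbook type I PKS logic: optional reductive domains (in module order) giving each beta-carbon state.
BETA_STATE_TO_DOMAINS = {
    "keto": (),
    "hydroxyl": ("KR",),
    "alkene": ("DH", "KR"),
    "methylene": ("DH", "ER", "KR"),
}
# Alpha-carbon substituent -> extender unit the AT domain must select.
ALPHA_TO_EXTENDER = {"H": "malonyl-CoA", "CH3": "methylmalonyl-CoA"}

@dataclass(frozen=True)
class ModuleSpec:
    extender: str
    reductive_domains: tuple

    def architecture(self):
        """Domain order of an elongation module, e.g. KS-AT-DH-ER-KR-ACP."""
        return "-".join(("KS", "AT") + self.reductive_domains + ("ACP",))

def retro_design(units):
    """Map each desired (alpha substituent, beta state) extension unit, in chain order, to a module."""
    return [ModuleSpec(ALPHA_TO_EXTENDER[alpha], BETA_STATE_TO_DOMAINS[beta]) for alpha, beta in units]

# Hypothetical target: a methyl-branched hydroxyl unit followed by a fully reduced unit.
for i, spec in enumerate(retro_design([("CH3", "hydroxyl"), ("H", "methylene")]), start=1):
    print(i, spec.extender, spec.architecture())
# 1 methylmalonyl-CoA KS-AT-KR-ACP
# 2 malonyl-CoA KS-AT-DH-ER-KR-ACP
```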
Whereas GRAPE/GARLIC primarily focus on identifying putative BGCs capable of producing a given PK or NRP structure, ClusterCAD marked a shift toward designing new biosynthetic pathways for de novo compound production.97 The initial release, ClusterCAD 1.0, was tailored specifically for type I modular PKSs and enabled rational design of chimeric PKSs by recombining modules from different origins. Given a target polyketide structure, the toolkit identifies a biosynthetically related intermediate from known PKS modules, which serves as a ‘truncated’ starter PKS for engineering. From this point, ClusterCAD suggests which modules must be replaced to produce the desired structure and recommends ‘donor modules’ based on sequence similarity to the starter PKS. ClusterCAD 2.0 further expanded these capabilities to include PKS–NRPS hybrids and NRPS systems, as well as a broader range of starter units.100 Using ClusterCAD, researchers constructed a variety of chimeric PKSs based on the rimocidin biosynthetic system, producing structurally diverse diols and alcohols.101 These examples highlight the growing utility of PKS-based retrobiosynthetic approaches for the rational development of new molecules.
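The ClusterCAD workflow described above has two steps: find a truncated starter PKS, then pick donor modules for the remaining positions. The sketch below is a toy built on explicit assumptions and is not ClusterCAD's algorithm. ClusterCAD compares chemical structures of intermediates and uses real sequence alignments. Here, modules are instead matched by hypothetical signature strings, and similarity is approximated with Python's `difflib.SequenceMatcher` on made-up sequences.

```python
from difflib import SequenceMatcher

# Hypothetical library: cluster -> ordered modules as (signature, toy protein sequence).
# A signature encodes extender and reductive domains, e.g. "mmal|KR" = methylmalonyl + KR.
LIBRARY = {
    "clusterA": [("mal|KR", "MKTAYIAKQR"), ("mmal|KR", "MKTAYLAKQR"), ("mal|-", "MSTAYIGKQR")],
    "clusterB": [("mal|KR", "MRTAWIAKER"), ("mal|DH-ER-KR", "MRTAWLAKER")],
}

def best_starter(target):
    """Find the cluster whose leading modules match the longest prefix of the target module list."""
    best_name, best_n = None, 0
    for name, modules in LIBRARY.items():
        n = 0
        while n < min(len(target), len(modules)) and modules[n][0] == target[n]:
            n += 1
        if n > best_n:
            best_name, best_n = name, n
    return best_name, best_n

def rank_donors(signature, reference_seq):
    """Rank library modules carrying `signature` by toy sequence similarity to a reference module."""
    hits = [(name, i, seq) for name, mods in LIBRARY.items()
            for i, (sig, seq) in enumerate(mods) if sig == signature]
    return sorted(hits, key=lambda h: SequenceMatcher(None, h[2], reference_seq).ratio(), reverse=True)

target = ["mal|KR", "mmal|KR", "mal|DH-ER-KR"]
starter, n = best_starter(target)
if starter is None:
    raise ValueError("no cluster shares a starting module with the target")
reference = LIBRARY[starter][n - 1][1]          # last retained module of the truncated starter PKS
for pos in range(n, len(target)):
    donors = rank_donors(target[pos], reference)
    print(pos, target[pos], donors[0][:2] if donors else "no donor available")
# 2 mal|DH-ER-KR ('clusterB', 1)
```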
A notable addition is the BioPKS pipeline, a recently released framework for rational pathway design. It combines PKS modules drawn from biosynthetic gene clusters with monofunctional enzymes, facilitating the production of structurally novel polyketides (Fig. 5a). Using this pipeline, retrobiosynthetic pathways have been predicted for a range of targets (Fig. 5c).102 These range from simple polyketides, accessible with chimeric PKSs alone, to complex antibiotics that require both chimeric PKSs and post-PKS enzymes. Synthetic interfaces can be applied to efficiently construct multi-protein chimeric enzymes and to co-localize post-PKS tailoring enzymes for enhanced productivity (Fig. 5b).103
By leveraging rBAN to deduce monomeric structures and antiSMASH to retrieve NRPS BGC information, tools such as Nerpa link target NRP structures to their corresponding biosynthetic gene clusters. Nerpa demonstrated successful structure-to-gene alignment for 117 BGCs across bacterial genomes.105 Building on this framework, BioCAT was introduced with enhanced alignment sensitivity and broader coverage, surpassing both GARLIC and Nerpa in associating NRPs with their likely producer organisms. BioCAT's primary aim is to maximize the number of accurate NRP–BGC pairings, prioritizing sensitivity over specificity. Importantly, BioCAT accounts for non-classical biosynthetic logic, including type B and type C NRPS pathways, which gives it greater flexibility as a retrobiosynthetic prediction toolkit.33 Despite the availability of these powerful in silico platforms, no example has yet been reported in which these tools guided the combinatorial engineering of NRPS systems toward novel bioactive compounds.
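At their core, tools such as Nerpa and BioCAT score how well the monomer sequence of an NRP agrees with the substrate specificities predicted for the A domains of a candidate BGC. The sketch below illustrates only that idea. It uses a plain Needleman–Wunsch global alignment with arbitrary match, mismatch, and gap scores. The real tools use more elaborate scoring schemes and account for features such as non-linear NRP structures. The monomer lists and BGC names here are hypothetical.

```python
def align_score(monomers, a_domain_preds, match=2, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score between NRP monomers and predicted A-domain substrates."""
    n, m = len(monomers), len(a_domain_preds)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if monomers[i - 1] == a_domain_preds[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]

def rank_bgcs(monomers, bgcs):
    """Rank candidate BGCs (name -> predicted A-domain specificities) by alignment score."""
    return sorted(bgcs, key=lambda name: align_score(monomers, bgcs[name]), reverse=True)

nrp = ["Leu", "Asp", "Thr", "Gly"]
candidates = {
    "bgc_1": ["Leu", "Asp", "Thr", "Gly"],
    "bgc_2": ["Leu", "Ser", "Gly"],
    "bgc_3": ["Phe", "Pro"],
}
print(rank_bgcs(nrp, candidates))  # ['bgc_1', 'bgc_2', 'bgc_3']
```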
Given the modular architecture of PKS/NRPS systems and the growing availability of protein engineering strategies, such as the use of DDs and synthetic coiled-coil motifs, these retrobiosynthetic toolkits are expected to play a critical role in accelerating the rational design of synthetic PKS/NRPSs. Their integration into synthetic biology workflows holds strong potential for the discovery and development of structurally novel and functionally diverse bioactive molecules.
To access a broader chemical space via PKS/NRPS pathways, future strategies must integrate megasynth(et)ase-based design tools with existing retrosynthesis platforms centered on monofunctional enzymes. Such integration would enable the generation of diverse PK and NRP derivatives that transcend the limitations of module- or domain-only recombination strategies. Moreover, current design tools are often restricted by narrow training datasets, limiting their generalizability. Open-source platforms capable of incorporating both public and user-defined datasets will be critical for improving predictive performance and facilitating broader application.
However, fully realizing PKS/NRPS synthetic biology requires understanding the natural constraints governing module–module interactions. These constraints include the structural compatibility of interaction domains, their functional performance across different biosynthetic contexts, and the compatibility of inter-modular substrate transfer. Together, they serve as foundational principles guiding engineering strategies. A particularly critical constraint lies in the intrinsic proofreading mechanisms embedded within gatekeeper domains.6,39 While these fidelity filters ensure biosynthetic precision in native systems, they often reject non-cognate intermediates during engineered recombination, limiting the flexibility of modular design. Current limitations are evident from the approximately 50% success rate observed when bimodular biosynthesis was attempted without prior design considerations.107 True ‘plug-and-play’ recombination of PKS/NRPS modules to generate designer molecules therefore has a prerequisite. Both the functional parameters of each biopart and the constraints governing module compatibility must first be systematically quantified and clearly defined.
The most straightforward approach to this challenge lies in nature itself. Recombining naturally occurring modules and interaction domains combinatorially enables systematic quantification of their interactions and elucidation of the design principles governing successful module–module communication. This approach requires high-throughput experimentation to capture the diversity of possible combinations. The ClusterCAD database illustrates the scale involved. This repository of PKS/NRPS clusters with well-characterized module intermediates contains, for type I PKSs alone, 531 loading modules (LM), 2515 elongation modules (EM), and 208 termination modules (END) with TE domains. Incorporating just five distinct synthetic interfaces at each junction of a chimeric (LM)–(EM1)–(EM2)–(END) configuration yields a number of possible combinations on the order of 10¹³.100 This design space expands further when including putative type I PKSs from additional databases, which collectively list over 1600 additional entries.17 Although these numbers far exceed practical experimental scale, they demonstrate the vast design space available for modular PKS/NRPS synthetic biology.
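The size of this design space follows from simple multiplication. The sketch below rests on two assumptions. First, each elongation slot can draw independently from all 2515 elongation modules, with repeats allowed. Second, one of five interfaces is chosen independently at each of the three junctions. Under those assumptions the count is about 8.7 × 10¹³.

```python
import math

# Type I PKS module counts for the ClusterCAD database, as given in the text above.
N_LOADING, N_ELONGATION, N_TERMINATION = 531, 2515, 208
N_INTERFACES = 5   # synthetic interface variants assumed available at each junction

def design_space(n_elongation_slots=2):
    """Count (LM)-(EM1)-...-(EMk)-(END) chimeras, with an interface choice at every junction."""
    junctions = n_elongation_slots + 1                      # LM|EM1, EM1|EM2, EM2|END for k = 2
    modules = N_LOADING * N_ELONGATION ** n_elongation_slots * N_TERMINATION
    return modules * N_INTERFACES ** junctions

total = design_space()
print(f"{total:.2e} (log10 = {math.log10(total):.2f})")    # 8.73e+13 (log10 = 13.94)
```

Each additional elongation slot multiplies the count by roughly 2515 × 5 ≈ 1.3 × 10⁴. This is why even modest increases in pathway length quickly place exhaustive testing out of reach.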
Exploring even a fraction of this immense combinatorial space requires experimental capabilities far beyond conventional laboratory methods. Biofoundries provide the necessary infrastructure, offering high-throughput automation and standardized workflows for the efficient execution, tracking, and analysis of large-scale combinatorial designs (Fig. 1). These facilities enable massively parallel experiments by integrating design, build, and test workflows through automated protocols. Each workflow comprises multiple unit operations, from PCR and transformation to colony selection, executed by specialized equipment including liquid handlers and automated thermocyclers.107 The build phase benefits particularly from automated DNA assembly workflows incorporating PCR, purification, ligation, and transformation, operations that require precise coordination for reproducible outcomes at scale. International initiatives such as the Global Biofoundry Alliance demonstrate the advanced capabilities of these platforms.108 For example, the Edinburgh Genome Foundry performs up to 2000 DNA assembly reactions weekly using methods such as Golden Gate and Gibson assembly. Similarly, iBioFAB produces 1000 TALEN constructs daily at less than $3 per construct.109,110 These examples show that automated biofoundry platforms can systematically explore modular assembly strategies at the scale necessary for comprehensive PKS/NRPS engineering.
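At this scale, much of the tracking burden is bookkeeping: every combinatorial design must be mapped to a physical well before a liquid handler can act on it. The sketch below enumerates designs and assigns them to 96-well plates as a generic CSV worklist. The part names and column layout are hypothetical and do not correspond to any particular instrument's worklist format.

```python
import csv
import io
from itertools import product

ROWS = "ABCDEFGH"
COLS = range(1, 13)
WELLS = [f"{r}{c}" for r in ROWS for c in COLS]   # 96-well plate, row-major order

def build_worklist(parts_by_slot):
    """Enumerate every combination of parts and assign each assembly to a plate and well."""
    worklist = []
    for idx, combo in enumerate(product(*parts_by_slot)):
        plate, well = divmod(idx, len(WELLS))
        worklist.append({"plate": plate + 1, "well": WELLS[well], "parts": "+".join(combo)})
    return worklist

def to_csv(worklist):
    """Serialize the worklist as CSV text (a generic format, not any vendor's schema)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["plate", "well", "parts"])
    writer.writeheader()
    writer.writerows(worklist)
    return buf.getvalue()

# Hypothetical slots: loading module, interface variant, elongation module, termination module.
wl = build_worklist([["LM1", "LM2"], ["IF1", "IF2"], ["EM1", "EM2", "EM3"], ["END1"]])
print(len(wl), wl[0])  # 12 {'plate': 1, 'well': 'A1', 'parts': 'LM1+IF1+EM1+END1'}
```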
Early progress in this direction is already evident. Researchers have demonstrated combinatorial PKS library construction involving 120 plasmids of 7 to 14 kb, each assembled from 4 to 7 DNA fragments.111 However, published experimental data also reveal a significant bottleneck in applying in vivo molecular cloning approaches to PKS/NRPS modular engineering at biofoundry scale. In one study, researchers designed 882 plasmids requiring amplification of 502 unique DNA fragments through 706 PCR reactions, each fragment carrying 60-bp homologous overlaps for yeast-based recombination. Of the 706 PCR products, only 623 were confirmed by capillary electrophoresis, enabling theoretical assembly of 715 of the 882 plasmids. Following yeast transformation and survival screening, mutation-free plasmid constructs accounted for only approximately 14% of total colonies, highlighting limitations in throughput and fidelity.
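The attrition in this study can be expressed as stage-wise retention. The sketch below assumes that the 623 confirmed products are counted against the 706 PCR reactions, since 623 exceeds the 502 unique fragments.

```python
# Stage-wise counts from the yeast-assembly study described above (denominator assumption in lead-in).
STAGES = [
    ("PCR products confirmed by capillary electrophoresis", 623, 706),
    ("designed plasmids assemblable from confirmed fragments", 715, 882),
]

def retention(stages):
    """Fraction of items surviving each build stage."""
    return {label: passed / total for label, passed, total in stages}

for label, frac in retention(STAGES).items():
    print(f"{label}: {frac:.1%}")
# PCR products confirmed by capillary electrophoresis: 88.2%
# designed plasmids assemblable from confirmed fragments: 81.1%
```

The reported ~14% mutation-free rate refers to colonies, which is a different denominator. It is therefore not chained onto these fractions.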
Cell-free protein synthesis (CFPS) integrated with automated biofoundry workflows offers a promising solution to these challenges (Fig. 1). Recent studies have shown that CFPS can achieve protein expression levels comparable to in vivo systems when using the same DNA constructs.112–114 By employing PCR-amplified linear templates, CFPS circumvents the need for yeast-based cloning, transformation, colony picking, and full validation of coding sequences, thereby accelerating the build phase and reducing resource intensity. Additionally, CFPS-enabled screening allows for rapid functional assessment of engineered constructs. Emerging sequential-phase CFPS systems further enhance capabilities by enabling protein expression followed by precursor-to-product conversion in single-well reactions, effectively creating miniaturized biocatalytic testbeds for megasynth(et)ase functionality.
The integration of biofoundry platforms represents a crucial advancement in PKS/NRPS engineering, enabling experimental throughput at unprecedented scale. Building upon this foundation, computational approaches can further amplify biofoundry capabilities by guiding experimental design and interpreting results. AI-driven methods work synergistically with biofoundry infrastructure, enhancing each phase of the DBTL cycle and maximizing the value of high-throughput experimental data.
AI-driven structural prediction tools have emerged as valuable complements to automated biofoundry platforms in modular enzyme engineering for natural product biosynthesis. Among them, AlphaFold provides high-accuracy structural predictions, including predictions of domain–domain interfaces, that can facilitate the rational design of synthetic connections.115 Although AlphaFold was primarily developed for monomeric protein structure prediction, its residue-level output can be repurposed to inform linker placement during modular recombination.115,116 In particular, predicted folding patterns and surface accessibility maps can help identify plausible domain boundaries that minimize structural disruption during module fusion. However, the utility of such predictions in complex, multi-domain systems such as PKSs and NRPSs remains largely unvalidated and requires empirical verification.
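One common way to turn an AlphaFold prediction into candidate boundaries is to look for stretches of low per-residue confidence (pLDDT) between confidently predicted domains. Such stretches often, though not always, correspond to flexible linkers. AlphaFold writes pLDDT into the B-factor column of its PDB output. The sketch below assumes a threshold of 70 and a minimum run of five residues, both arbitrary choices, and the example profile is synthetic.

```python
def plddt_from_pdb(pdb_text):
    """Per-residue pLDDT from an AlphaFold PDB file, which stores it in the B-factor column (cols 61-66)."""
    return [float(line[60:66]) for line in pdb_text.splitlines()
            if line.startswith("ATOM") and line[12:16].strip() == "CA"]

def low_confidence_windows(plddt, threshold=70.0, min_len=5):
    """Runs of >= min_len residues with pLDDT below threshold, as 0-based (start, end) with end exclusive."""
    windows, start = [], None
    for i, score in enumerate(plddt):
        if score < threshold and start is None:
            start = i                                   # a low-confidence run begins
        elif score >= threshold and start is not None:
            if i - start >= min_len:
                windows.append((start, i))              # the run was long enough to report
            start = None
    if start is not None and len(plddt) - start >= min_len:
        windows.append((start, len(plddt)))             # run extends to the C-terminus
    return windows

# Hypothetical profile: two confident domains separated by a 6-residue low-confidence stretch.
profile = [92.0] * 40 + [45.0] * 6 + [88.0] * 35
print(low_confidence_windows(profile))  # [(40, 46)]
```

A window found this way is only a candidate fusion site. Whether cutting there preserves the ACP–KS or C–A communication still has to be tested experimentally.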
RFdiffusion contributes scaffold-generation capabilities that extend beyond structure prediction. It can generate novel interface architectures, such as specialized coiled-coil or plug-and-socket configurations, that fit the geometric constraints of specific domain pairs.117–120 This approach could support synthetic interface scaffold recommendation, that is, predicting which interface type will best accommodate a given domain pair while maintaining proper orientation and communication.
ProteinMPNN complements structure-based models by optimizing amino acid sequences at domain junctions to support folding and expression compatibility.121 This is particularly useful in modular engineering scenarios where non-cognate domains are recombined and local sequence adjustments are needed to maintain interface integrity. ProteinMPNN does not guarantee functional restoration for domains with altered substrate preferences. It does, however, help refine sequence contexts to promote structural coherence across engineered boundaries.
As illustrated in Fig. 1, these computational approaches facilitate critical tasks such as AI-guided linker selection and insertion site prediction, further enhancing modular engineering efficiency. Looking ahead, next-generation AI frameworks based on graph neural networks (GNNs) and large language models (LLMs) may offer new paradigms for pathway design by interpreting target chemical structures and predicting optimal module arrangements.122 GNNs can model PKS/NRPS domains as interconnected nodes. Critical interfaces such as ACP–KS connections can then be evaluated for structural compatibility and assigned interaction scores. Such graph-based representations could improve predictive accuracy by capturing the inherent modularity of biosynthetic pathways, enabling systematic assessment of potential domain arrangements and their likely functional outcomes. Complementing this structural analysis, LLMs trained on biochemical data may help translate target product specifications into suitable modular enzyme arrangements. These predictive systems could interpret target chemical structures (e.g., from SMILES strings) and perform several tasks automatically. These include identifying TE-mediated chain-release sites, distinguishing starter units from subsequently added extender units, and recognizing required modifications along with their corresponding enzymatic domains.
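The graph view described above can be made concrete with a small data structure. Domains are nodes, and ACP–KS and intra-module links are scored edges. This sketch is not a GNN. The edge scores are hypothetical values entered by hand, where a trained model would instead predict them, and an arrangement is scored naively as the product of its edge scores.

```python
from collections import defaultdict

class DomainGraph:
    """Domains as nodes; directed edges carry a compatibility score in [0, 1].

    In a GNN-based approach these scores would be learned predictions; here they are supplied by hand.
    """

    def __init__(self):
        self.edges = defaultdict(dict)

    def add_interface(self, upstream, downstream, score):
        self.edges[upstream][downstream] = score

    def arrangement_score(self, path):
        """Naive score of a domain arrangement: product of its edge scores (0.0 if any edge is unknown)."""
        total = 1.0
        for a, b in zip(path, path[1:]):
            total *= self.edges.get(a, {}).get(b, 0.0)
        return total

g = DomainGraph()
for a, b in [("KS2", "AT2"), ("AT2", "KR2"), ("KR2", "ACP2")]:
    g.add_interface(a, b, 1.0)                 # intra-module links, assumed intact
g.add_interface("ACP1", "KS2", 0.9)            # hypothetical cognate ACP-KS interface
g.add_interface("ACP1x", "KS2", 0.3)           # hypothetical non-cognate ACP-KS interface
print(g.arrangement_score(["ACP1", "KS2", "AT2", "KR2", "ACP2"]))
print(g.arrangement_score(["ACP1x", "KS2", "AT2", "KR2", "ACP2"]))
```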
Such frameworks would be implemented as an iterative learning cycle. Linker designs predicted by AI ensembles are experimentally validated, and the resulting performance data feed back into model training pipelines. This recursive approach creates self-improving systems in which each design cycle enhances model accuracy. Crucially, experimental databases will expand through biofoundry-scale testing in the build and test phases. As they do, AI models can move beyond few-shot learning scenarios toward increasingly sophisticated prediction algorithms capable of addressing complex domain interface problems in PKS/NRPS engineering.
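The iterative cycle can be sketched as a loop that proposes designs, measures them, and updates a model. This toy rests on three stated assumptions. A simple per-design average stands in for the AI ensemble. A lookup of hypothetical activities stands in for biofoundry testing. The selection rule (explore untested linkers first, then re-test the current best) is one arbitrary choice among many.

```python
import random
from statistics import mean

def dbtl_loop(candidates, assay, cycles=3, batch=2, seed=0):
    """Toy design-build-test-learn loop over linker candidates.

    Design: test unseen candidates first, then re-test the current best.
    Build/Test: `assay` stands in for a biofoundry measurement.
    Learn: the "model" is simply the mean measured activity per candidate.
    """
    rng = random.Random(seed)
    observations = {c: [] for c in candidates}
    for _ in range(cycles):
        untested = [c for c in candidates if not observations[c]]
        if untested:
            rng.shuffle(untested)
            chosen = untested[:batch]
        else:
            chosen = sorted(candidates, key=lambda c: mean(observations[c]), reverse=True)[:batch]
        for c in chosen:
            observations[c].append(assay(c))
    model = {c: mean(v) for c, v in observations.items() if v}
    return max(model, key=model.get), model

# Hypothetical ground truth for four linker designs (unknown to the loop).
truth = {"L1": 0.2, "L2": 0.5, "L3": 0.9, "L4": 0.1}
best, model = dbtl_loop(list(truth), truth.get)
print(best)  # L3
```

With noisy real measurements, repeated tests of the same design would average out error. This is the toy analogue of a model becoming more reliable as biofoundry data accumulate.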
The integration of AI-driven design with biofoundry implementation provides new opportunities in PKS/NRPS engineering, enhancing our ability to analyze domain interactions. These computational approaches can generate interfaces with diverse interaction properties, allowing researchers to select connections whose strengths are tuned to specific module pairs. Natural systems may exhibit complex interdependencies between adjacent domains. Computational frameworks, by contrast, can inform the design of interfaces that function more independently, simplifying the engineering process. As experimental data accumulate, AI systems can evolve from data-limited models into predictive tools that draw on thousands of experimental outcomes. In doing so, they can progressively deepen our understanding of these complex biosynthetic interfaces and accelerate natural product discovery and research.
Footnote
† These authors contributed equally.
This journal is © The Royal Society of Chemistry 2025