Jesse A. Jones (a), Robert Benisch (b) and Tobias W. Giessen (a, b)*
(a) Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI, USA. E-mail: tgiessen@umich.edu
(b) Program in Chemical Biology, University of Michigan, Ann Arbor, MI, USA
First published on 2nd May 2023
Encapsulins are a recently discovered class of prokaryotic self-assembling icosahedral protein nanocompartments measuring between 24 and 42 nm in diameter, capable of selectively encapsulating dedicated cargo proteins in vivo. They have been classified into four families based on sequence identity and operon structure, and thousands of encapsulin systems have recently been computationally identified across a wide range of bacterial and archaeal phyla. Cargo encapsulation is mediated by specific targeting motifs, found in all native cargo proteins, that interact with the interior surface of the encapsulin shell during self-assembly. Short C-terminal targeting peptides (TPs) are well documented in Family 1 encapsulins, while larger N-terminal targeting domains (TDs) have more recently been discovered in Family 2. The modular nature of TPs and their facile genetic fusion to non-native cargo proteins of interest have made cargo encapsulation, both in vivo and in vitro, readily exploitable and have therefore resulted in a range of rationally engineered nano-compartmentalization systems. This review summarizes current knowledge on cargo protein encapsulation within encapsulins and highlights select studies that use TP fusions to non-native cargo in creative and useful ways.
Examples of prokaryotic protein cages include the small, 8–12 nm ferritins. These are hollow shells composed of identical protein subunits, and the shell serves simultaneously as a diffusion barrier and as a ferroxidase. This enables effective iron storage within the cage without the need to sequester proteinaceous cargo.8,9 In comparison, the much larger, 40–200 nm bacterial microcompartments (BMCs) are multi-component cages. They sequester multiple enzymes acting together, yielding complex, organelle-like metabolic compartments.4,10 The more recently discovered encapsulin nanocompartments occupy the space between these two examples with respect to size, complexity, and ability to encapsulate cargo proteins, and are the focus of this review.11–13
Encapsulins are icosahedral protein nanocages ranging from 24 to 42 nm in diameter, with triangulation numbers of T = 1, T = 3, or T = 4. They form via self-assembly of 60–240 copies of a single shell protein that adopts the HK97 (Hong Kong 97) phage-like fold (Fig. 1A–C).13–15 Notably, the eponymous feature of encapsulins is their ability to encapsulate specific cargo proteins during shell self-assembly. They do so using selective cargo loading mechanisms based on targeting domains (TDs) or targeting peptides (TPs) located at the N- or C-terminus of each cargo protein. This native, efficient, and modular cargo loading modality makes encapsulins excellent protein cargo carriers with potentially broad applications as targeted drug delivery vehicles, vaccine platforms, and bionanoreactors, among others.14,16–22 Recent genome data-mining studies have led to the grouping of encapsulins into four separate families that differ in sequence, operon configuration, overall structure, and encapsulation mechanism.23–25 Family 1 encapsulins are the most extensively studied, with experimental information on shell structure, cargo function, and cargo loading available for multiple systems. Similar studies of Family 2 encapsulins have recently begun to emerge, though they remain nascent compared with the data available for Family 1. Family 3 and Family 4 encapsulins remain putative and currently lack experimental validation.
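The link between triangulation number and shell composition follows standard icosahedral (Caspar–Klug) geometry. A shell with triangulation number T contains 60T subunits, arranged as 12 pentamers and 10(T − 1) hexamers. Allowed T values have the form h² + hk + k² for non-negative integers h and k. The short Python sketch below is illustrative only and is not taken from the cited studies. It reproduces the subunit counts of T = 1, 3, and 4 encapsulins and the pentamer and hexamer facet counts.

```python
from typing import NamedTuple


class ShellComposition(NamedTuple):
    subunits: int
    pentamers: int
    hexamers: int


def is_valid_t_number(t: int) -> bool:
    """Return True if t = h^2 + h*k + k^2 for non-negative integers h, k (not both zero)."""
    for h in range(t + 1):
        for k in range(t + 1):
            if (h, k) != (0, 0) and h * h + h * k + k * k == t:
                return True
    return False


def shell_composition(t: int) -> ShellComposition:
    """Subunit, pentamer, and hexamer counts for an icosahedral shell with triangulation number t."""
    if not is_valid_t_number(t):
        raise ValueError(f"T = {t} is not an allowed icosahedral triangulation number")
    subunits = 60 * t
    pentamers = 12             # always 12 pentamers, one at each five-fold vertex
    hexamers = 10 * (t - 1)    # all remaining subunits are organized as hexamers
    assert 5 * pentamers + 6 * hexamers == subunits
    return ShellComposition(subunits, pentamers, hexamers)


for t in (1, 3, 4):
    print(f"T = {t}: {shell_composition(t)}")
```

T = 2 is rejected because no pair (h, k) gives h² + hk + k² = 2.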
Fig. 1 Overview of encapsulin nanocompartment structure and assembly. (A) An encapsulin shell monomer from the Thermotoga maritima encapsulin system in ribbon representation (purple; PDB: 3DKT). Left: exterior view. Right: interior view (rotated 180°). The interior binding site of the Family 1 encapsulin targeting peptide is outlined in light blue. (B) Exterior views of the T = 1 encapsulin from T. maritima (left; PDB: 3DKT), the T = 3 encapsulin from Myxococcus xanthus (center; PDB: 4PT2), and the T = 4 encapsulin from Quasibacillus thermotolerans (right; PDB: 6NJ8), highlighting the different sizes and assembly states of encapsulins. The numbers of pentameric and hexameric facets making up each shell are shown at the bottom. (C) Schematic of the Q. thermotolerans encapsulin with a T = 4 icosahedral cage overlay highlighting the five-fold (left), three-fold (center), and two-fold (right) symmetry axes and pores, with magnified views below. (D) Schematic diagram of Family 1 and Family 2 core operon layouts (top) featuring the cargo (pink), respective targeting moieties (turquoise), and encapsulin shell (purple); note that Family 1 and 2 cargo genes are found up- and downstream of the encapsulin gene, respectively. For simplicity, only the upstream operon organization is shown. Figures created using ChimeraX (Goddard et al., 2018). PDB, protein data bank; TD, Family 2 targeting domain; TP, Family 1 targeting peptide.
As experimental data exist only for Family 1 and Family 2 encapsulins, this review focuses on the current understanding and use of cargo loading in these two families. The native in vivo mechanisms of cargo loading, as well as efforts undertaken to date to manipulate them, will be discussed. This review will also detail practical applications of encapsulin cargo loading in recent bioengineering efforts, along with recent studies dissecting the mechanism of cargo encapsulation. Lastly, we discuss potential future challenges and directions. These include prospective studies that may help further elucidate or manipulate encapsulin cargo loading, and the potential that such rational manipulation may hold for biocatalysis, biomedicine, and biomaterials research.
Family 1 encapsulins are mainly found in the bacterial phyla Actinobacteria, Proteobacteria, and Firmicutes, and are sorted into several operon types according to their native enzyme cargo.23 Family 1 operons generally consist of an upstream gene encoding the cargo protein followed by the gene encoding the encapsulin shell protein, with or without flanking, co-regulated accessory proteins (Fig. 1D).23,32 Cargo encapsulation strictly requires a targeting peptide (TP), sometimes also referred to as a cargo-loading peptide (CLP), which is found at the C-terminus of all cargo proteins (Fig. 1D). TPs are usually separated from the catalytically active folded domain of the cargo by a flexible, glycine- and proline-rich linker of ca. 10–50 residues (Fig. 2A). This arrangement likely minimizes steric clashes between adjacent cargo proteins within the shell, thus maximizing cargo loading capacity. A corollary of this feature is that cargo proteins are generally poorly resolved in encapsulin structures, because flexible tethering to the shell interior makes them highly mobile. Cargo loading is not necessary for shell assembly: encapsulin shells generally self-assemble very efficiently even in the absence of any cargo. This implies a cargo loading mechanism in which co-expression of cargo and shell, as ensured by a tight operon structure, allows efficient TP–shell interactions during shell self-assembly.
Fig. 2 Family 1 cargo loading is mediated by specific TP–shell interactions. (A) Schematic representation of Family 1 cargo components, including the catalytic domain (pink), proline- and glycine-rich flexible linker (dash), and targeting peptide (turquoise). (B) Cutaway view of the T. maritima T1 encapsulin shell (PDB: 3DKT) with one shell protein subunit highlighted (yellow) and the encapsulin (purple) and GGDLGIRK TP of the FLP cargo (turquoise) shown in surface representation (left) and zoomed-in view of the conserved binding pocket (hydrophobic representation) with the resolved residues of the bound TP shown in stick representation (turquoise; right). (C) Zoomed-in view of the H. ochraceum T1 encapsulin TP–shell interaction (PDB: 7OE2) highlighting the binding pocket and the GSLGIGSLR TP sequence of the FLP cargo as found in the closed (left) and open (right) pentamer conformations of the shell. (D) Cutaway view of the M. xanthus T3 encapsulin shell (PDB: 7S2T) with one shell protein subunit highlighted (yellow) and the SHPLTVGSLRR TP (turquoise) of the EncB FLP cargo shown in surface representation (left). A zoomed-in view of the TP–shell interaction is shown on the right. (E) Similar overview of the M. xanthus T3 shell interaction with the PEKRLTVGSLRR TP of the EncC FLP cargo (PDB: 7S4Q). (F) Analogous overview of the Q. thermotolerans T4 shell interaction with the TVGSLIQ TP of the IMEF cargo (PDB: 6NJ8). (G) Consensus sequences for TPs from each of the major Family 1 cargo classes after alignment via Clustal Omega 1.2.3 with 20 residues centred on the consensus peak or, when limited by sequence length, using the last 20 C-terminal residues; visualized using GraphPad Prism v9.0.2; n, number of cargo sequences used. (H) Schematic of general binding mode for Family 1 TPs. Figures created using ChimeraX (Goddard et al., 2018). TP, targeting peptide; PDB, protein data bank; FLP, ferritin-like protein; IMEF, iron-mineralizing encapsulin-associated firmicute.
Several examples now exist in the literature providing reliable structural data on TP–shell interactions (Fig. 2). Of these, the TPs of two ferritin-like protein (Flp) cargoes bound to the interior surface of their respective T1 shells have been resolved. The first is GGDLGIRK in the T. maritima system (Fig. 2B). The second is GSLGIGSLR in the Haliangium ochraceum system, determined in both the "closed" and "open" pentamer conformations that differ by a shift in the encapsulin A-domain (Fig. 2C).13,33 Further, two Flp TP–shell interactions were resolved for the T3 Myxococcus xanthus system. They revealed the TPs SHPLTVGSLRR for the EncB cargo (Fig. 2D) and PEKRLTVGSLRR for the EncC cargo (Fig. 2E), both of which bind to all available binding sites in pentameric and hexameric shell facets.34 Lastly, the TP of an iron-mineralizing encapsulin-associated firmicute (IMEF) cargo protein bound to the hexamers of its native T4 shell from Quasibacillus thermotolerans was determined to be TVGSLIQ (Fig. 2F).14 Based on the cumulative structural data, the TP binding site resides on the luminal surface of each Family 1 encapsulin shell protein subunit, in a conserved cleft between the N-terminal helix and the P-domain (Fig. 1A and 2). The rigidly bound TP segments range from 7 to 12 residues in length. In many cargo proteins, additional C-terminal residues, usually fewer than 10, follow the rigidly bound motif. However, these residues do not seem to be important for the TP–shell interaction, as they are not resolved in the structural data currently available, though further research is warranted. The TP binding pocket resides entirely within a single shell protein subunit and does not cross subunit boundaries. The maximal cargo loading is therefore set by the total number of shell protein subunits: 60 for T1, 180 for T3, and 240 for T4 shells.
However, cargo loading capacity is likely further limited by cargo protein size and oligomerization state. Bioinformatic analyses have provided further evidence that Family 1 TPs often consist of 10–20 C-terminal residues containing a GSL or double GSL motif (with exceptions, as exemplified by the T. maritima system), often immediately followed by a positively charged residue.23,35 A general TP binding mode can be derived from structural information and from the consensus TP sequences of the main cargo types (Fig. 2G). In this mode, two or three hydrophobic residues (isoleucine, leucine, or valine), spaced one or two residues apart, specifically interact with hydrophobic patches within the binding pocket. The spacers often contain glycines for flexibility. In many instances, positively charged residues (lysines or arginines) follow this motif and seem to interact less specifically with negatively charged surface patches of the shell protein (Fig. 2H).
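The sequence features described above can be checked against the five structurally resolved TPs listed earlier in this section. The Python sketch below is a simple string-matching heuristic written for illustration. It is not the alignment-based analysis behind Fig. 2G. The three patterns are simplified readings of the binding mode described in the text:

- a GSL motif,
- GSL immediately followed by K or R,
- two of I, L, or V separated by one or two residues.

```python
import re

# Structurally resolved Family 1 TPs as listed in the text (Fig. 2B-F).
RESOLVED_TPS = {
    "T. maritima Flp (T = 1)": "GGDLGIRK",
    "H. ochraceum Flp (T = 1)": "GSLGIGSLR",
    "M. xanthus EncB (T = 3)": "SHPLTVGSLRR",
    "M. xanthus EncC (T = 3)": "PEKRLTVGSLRR",
    "Q. thermotolerans IMEF (T = 4)": "TVGSLIQ",
}

# Simplified, illustrative patterns (not the published analysis).
GSL_THEN_BASIC = re.compile(r"GSL[KR]")                # GSL immediately followed by Lys or Arg
SPACED_HYDROPHOBICS = re.compile(r"[ILV].{1,2}[ILV]")  # two of I/L/V, one or two residues apart


def describe_tp(seq: str) -> dict:
    """Summarize simple sequence features of a targeting peptide."""
    return {
        "length": len(seq),
        "gsl_count": seq.count("GSL"),  # GSL cannot overlap itself, so str.count is exact
        "gsl_then_basic": GSL_THEN_BASIC.search(seq) is not None,
        "spaced_hydrophobics": SPACED_HYDROPHOBICS.search(seq) is not None,
    }


for name, tp in RESOLVED_TPS.items():
    print(f"{name:32s} {tp:14s} {describe_tp(tp)}")
```

Applied to the resolved TPs, the sketch reproduces the 7 to 12 residue range of the rigidly bound segments. It flags T. maritima GGDLGIRK as the only sequence without a GSL motif and H. ochraceum GSLGIGSLR as the double-GSL case. All five sequences contain the spaced hydrophobic pattern.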
In sum, cargo loading in Family 1 encapsulin systems results from a combination of two factors: mass action, based on the relative expression levels of cargo and shell proteins, and specific TP-mediated protein–protein interactions. The final number of encapsulated cargo proteins is additionally determined by the relative size of shell and cargo and by the cargo oligomerization state.
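The site-counting argument above can be expressed as a simple upper bound. The Python sketch below rests on the following assumptions:

- one TP binding site per shell subunit, giving 60T sites;
- every protomer of a cargo oligomer occupies its own site;
- optionally, a crude volume limit based on a hypothetical lumen volume, cargo volume, and packing fraction.

None of these parameter values come from the cited studies. The mass-action contribution of relative expression levels is not modeled.

```python
import math
from typing import Optional


def binding_sites(t: int) -> int:
    """One TP binding site per shell subunit: 60 * T sites (60, 180, 240 for T = 1, 3, 4)."""
    return 60 * t


def max_cargo_oligomers(
    t: int,
    oligomer_size: int,
    cargo_volume_nm3: Optional[float] = None,
    lumen_volume_nm3: Optional[float] = None,
    packing_fraction: float = 0.5,  # hypothetical fraction of the lumen that cargo can fill
) -> int:
    """Upper bound on encapsulated cargo oligomers per shell (illustrative model only)."""
    if oligomer_size < 1:
        raise ValueError("oligomer_size must be >= 1")
    # Site-based cap: assumes every protomer of an oligomer engages its own binding site.
    site_cap = binding_sites(t) // oligomer_size
    if cargo_volume_nm3 is None or lumen_volume_nm3 is None:
        return site_cap
    # Optional volume-based cap from hypothetical lumen and cargo volumes.
    volume_cap = math.floor(packing_fraction * lumen_volume_nm3 / cargo_volume_nm3)
    return min(site_cap, volume_cap)


print(max_cargo_oligomers(1, 1))    # monomeric cargo in a T = 1 shell
print(max_cargo_oligomers(1, 10))   # hypothetical decameric cargo in a T = 1 shell
print(max_cargo_oligomers(3, 2, cargo_volume_nm3=200.0, lumen_volume_nm3=10_000.0))
```

For example, site counting alone would cap a hypothetical decameric cargo in a T = 1 shell at six copies. When volume values are supplied, the smaller of the two caps applies.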
Fig. 3 Family 2 encapsulin systems use N-terminal targeting domains (TDs) to direct cargo to the interior of the shell. (A) Intrinsic disorder statistics plots generated using DISOPRED3 for four different Family 2 cargo types. Light blue background highlights the disordered regions, while positions with relatively high sequence similarity, potentially representing conserved interaction motifs, are shown in yellow. Adapted with changes with open access permission from ref. 23 via a Creative Commons license (https://creativecommons.org/licenses/by/4.0/). (B) SDS-PAGE gel of gel-filtration chromatography fractions containing the S. elongatus T1 encapsulin refolded in the presence of desulfurase cargo with and without its native TD. Adapted with changes with open access permission from ref. 36 via a Creative Commons license. (C) Native PAGE gel showing Coomassie stain (top) and GFP signal (bottom) of purified S. elongatus encapsulin loaded with a GFP reporter fused to different truncations of the system's native N-terminal TD. Adapted with changes with open access permission from ref. 36 via a Creative Commons license. (D) View from the shell interior along the 3-fold symmetry axis (black triangle) of the S. elongatus Family 2 encapsulin (pinks and purple) highlighting additional non-shell density attributed to the native TD (turquoise). Adapted with changes with open access permission from ref. 36 via a Creative Commons license. (E) Sequence logos of conserved motifs found within different Family 2 cargo types. Adapted with changes with open access permission from ref. 23 via a Creative Commons license. DISOPRED3 outputs were visualized using GraphPad Prism v9.0.2.
So far, little experimental evidence on Family 2 encapsulin systems has been published. However, the N-terminal targeting domain hypothesis has recently been confirmed for a cysteine desulfurase-encapsulating Family 2A system from Synechococcus elongatus.36 In vitro assays showed that the 255-residue N-terminal domain of the desulfurase cargo is necessary and sufficient for cargo encapsulation (Fig. 3B and C). Furthermore, structural analysis of the cargo-loaded shell revealed an additional low-resolution density close to the 3-fold symmetry axis of the shell, suggesting a potential TD binding region on the shell interior (Fig. 3D). One caveat of this analysis is that cargo loading was carried out in vitro using a protein refolding procedure, which could have resulted in a non-native mode of cargo encapsulation. Computational analysis of the desulfurase N-terminal TD identified conserved motifs of 20–30 residues with high sequence identity (Fig. 3E). These motifs are separated by long stretches of divergent, mostly hydrophobic residues.23,36 To explore the contribution of each conserved motif to cargo loading, different parts of the TD containing different combinations of motifs were fused to the N-terminus of a fluorescent reporter (GFP). The fusions were then co-expressed and purified. The results did not clearly identify a single motif or sub-region sufficient for maximal cargo loading. Instead, the full-length TD seems to be needed for optimal cargo encapsulation (Fig. 3C). This may have important mechanistic implications, as the Family 2 cargo loading process seems to differ substantially from that of Family 1. It potentially relies on multiple specific, discontiguous interactions mediated by conserved sequence motifs. These motifs are separated by long, flexible, hydrophobic linker regions that may themselves possess affinity for the shell interior.
For other Family 2 cargo types besides desulfurases, recent bioinformatic analyses have identified similar motif-containing N-terminal domains predicted to be mostly disordered.23 So far, only one other cargo type has been confirmed as a Family 2 cargo protein: a 2-methylisoborneol (2-MIB) synthase with a similarly long, disordered N-terminal domain.38 However, no detailed structural or mechanistic analysis of this system is currently available in the literature.36,38
Also of note is the putative existence of two-component Family 2 encapsulin shells, so far only predicted bioinformatically, in which the operon encodes two distinct encapsulin shell genes.23 As experimental data for these systems are currently lacking, it is not yet known how these encapsulins might assemble. The two gene products may assemble into functional shells containing two different types of subunits. If so, these systems might natively encapsulate defined stoichiometries of two distinct cargo proteins in the same nanocompartment, through specific interactions of two distinct TDs with two distinct shell protein binding sites. If confirmed, such systems may hold significant potential for novel, more complex bioengineering applications beyond what is currently possible with engineered Family 1 systems.
Potential benefits of encapsulating non-native cargo proteins are abundant and include improving the stability of cargo proteins under harsh conditions like elevated temperature, extreme pH, or exposure to proteases; controlling or improving catalysis; delivering a therapeutic or diagnostic payload; or a combination of the above. Below, we will first highlight efforts towards engineering TPs and modulating their targeting strength, followed by a discussion of select recent studies showcasing the progress made in employing encapsulins as bioengineering tools. Particular emphasis is placed on examples that improve cargo stability, add control over chemical reactions, or show therapeutic or diagnostic application potential (Table 1).
Table 1
| Achievement | Ref. |
|---|---|
| Use of targeting peptides to encapsulate non-native cargo | 19,20,22,35,36,41,48,51–63 |
| Improved cargo stability | 57–59 |
| Control of chemical reactions | 22,53,59,60,62,64 |
| Therapeutic or diagnostic development | 19,20,53,58,60,65 |
Fig. 4 Characterization of T1 and T3 encapsulin targeting peptides. (A) Operon design of TP-fused sfGFP and the corresponding T. maritima encapsulin used for heterologous co-expression and downstream cargo loading analysis. Different TP truncations are highlighted. (B) Comparison of normalized sfGFP fluorescence in purified encapsulins, investigating the influence of TP truncation on cargo loading and showing that the 15 C-terminal residues are sufficient for maximal cargo encapsulation. (A) and (B) adapted with permission from ref. 35. Copyright 2016 American Chemical Society (ACS). (C) Schematic of the computational flexible docking and experimental workflow used to predict and analyze the relative strength of TP–shell binding in single-residue TP mutants. (D) Heat map of computational point mutations, with the color gradient representing the Rosetta energy score (blue, improved binding; red, worse binding). (E) Experimental analysis of cargo loading for the three TP mutants highlighted in green in panel (D), showing that most single-residue substitutions lead to decreased cargo encapsulation. (C)–(E) adapted with changes with open access permission from ref. 54 via a Creative Commons license (https://creativecommons.org/licenses/by/4.0/). sfGFP, superfolder green fluorescent protein.
One study used the T1 encapsulin from T. maritima and its native TP fused to a fluorescent reporter (sfGFP) to determine the minimal TP needed for maximal cargo loading (Fig. 4A and B).35 Different truncations of the 30 C-terminal residues of the native T. maritima cargo protein were appended to the reporter. Bulk fluorescence after purification served as a readout of cargo loading. The results showed that the 15 C-terminal residues of the native cargo were sufficient for optimal cargo loading. These include the 8 rigidly bound residues that could be resolved structurally (Fig. 2B), and it is likely that only these residues directly engage the binding site. In a similar study of the T1 encapsulin from Mycobacterium smegmatis, different truncations of the 19 C-terminal residues of the native cargo were appended to an eGFP reporter.53 Here, the 12 C-terminal residues were required to attain maximal cargo loading. No structural information on the M. smegmatis TP–shell interaction is available. However, these 12 residues contain a double GSL motif, further suggesting that this motif is important for cargo encapsulation.
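The truncation experiments described above follow a simple design-and-readout logic, sketched below in Python. The following elements are hypothetical:

- the function names;
- the placeholder reporter string "<sfGFP>";
- the loading values in the usage example;
- the native tail, which is a placeholder of unknown residues (X) ending in the resolved T. maritima TP rather than the actual native C-terminal sequence.

Treating 95% of the best signal as "maximal" loading is likewise an assumption.

```python
def truncation_series(reporter: str, native_tail: str, lengths, linker: str = "") -> dict:
    """Map each truncation length n to reporter + linker + the last n residues of native_tail."""
    constructs = {}
    for n in sorted(set(lengths)):
        if not 0 < n <= len(native_tail):
            raise ValueError(f"truncation length {n} is outside 1..{len(native_tail)}")
        constructs[n] = reporter + linker + native_tail[-n:]
    return constructs


def minimal_sufficient_length(loading: dict, fraction_of_max: float = 0.95) -> int:
    """Shortest truncation whose normalized loading reaches fraction_of_max of the best value."""
    if not loading:
        raise ValueError("no loading measurements given")
    best = max(loading.values())
    for n in sorted(loading):
        if loading[n] >= fraction_of_max * best:
            return n
    raise ValueError("no truncation reached the loading threshold")


# Hypothetical inputs: "<sfGFP>" stands in for the reporter sequence, and the tail is
# 12 unknown residues (X) followed by the resolved T. maritima TP, not the real native tail.
hypothetical_tail = "X" * 12 + "GGDLGIRK"
for n, seq in truncation_series("<sfGFP>", hypothetical_tail, [8, 15, 20]).items():
    print(n, seq)

# Made-up normalized loading values, for illustration only.
print(minimal_sufficient_length({5: 0.1, 10: 0.6, 15: 0.97, 20: 1.0}))
```

With the illustrative values above, the 15-residue construct is the shortest one within 95% of the best signal. Real fluorescence data would also need replicate handling, which the sketch omits.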
In another recent study, a combined computational and experimental approach was used to probe the influence of single-residue substitutions within the TPs of the T1 T. maritima and T3 M. xanthus encapsulin systems.54 Rosetta-based force-field modelling was employed to predict the effect of mutations within the two TPs (Fig. 4C), and select TP mutants were then characterized experimentally. Computational predictions and experimental results generally agreed. This approach yielded further insights. Notably, most mutations were computationally predicted to weaken binding, in particular those within the largely conserved GSL-like motifs (Fig. 4D and E). The cumulative results of these studies highlight that TPs vary in native specificity and binding strength. They also show that TP–shell binding is significantly influenced by specific hydrophobic and ionic interactions, as well as by TP flexibility.
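A single-residue substitution scan such as the one shown in Fig. 4D starts from the full set of point mutants of a TP. The Python sketch below enumerates these mutants for the resolved T. maritima TP (8 positions × 19 substitutions = 152 variants) and labels score changes as improved or worse. Scoring itself, e.g. with Rosetta, lies outside the sketch. The sketch assumes that a more negative mutant-minus-wild-type score indicates stronger binding, following the usual Rosetta energy convention. The tolerance parameter and the example score values are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitutions(tp: str) -> dict:
    """Map labels such as 'G1A' (wild type, 1-based position, substitute) to mutant TP sequences."""
    mutants = {}
    for i, wild_type in enumerate(tp):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                mutants[f"{wild_type}{i + 1}{aa}"] = tp[:i] + aa + tp[i + 1:]
    return mutants


def classify(delta_scores: dict, tolerance: float = 0.0) -> dict:
    """Label score changes (mutant minus wild type); more negative is assumed to mean stronger binding."""
    labels = {}
    for name, delta in delta_scores.items():
        if delta < -tolerance:
            labels[name] = "improved"
        elif delta > tolerance:
            labels[name] = "worse"
        else:
            labels[name] = "neutral"
    return labels


mutants = single_substitutions("GGDLGIRK")  # resolved T. maritima TP
print(len(mutants))  # 8 positions x 19 substitutions

# Hypothetical score changes standing in for Rosetta output.
print(classify({"G1A": 0.8, "L4I": -0.3, "K8R": 0.0}))
```

Keeping enumeration and classification separate lets the same mutant library be scored by any external method before labeling.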
Fig. 5 Select engineering applications of encapsulins. (A) Cargo encapsulation within encapsulin protein shells can have a variety of beneficial effects, including increased cargo stability over time, increased thermal stability, and increased resistance against proteases. (B) Encapsulation of a ruthenium-based organometallic catalyst via a covalently modifiable HaloTag, yielding a system able to catalyze de-N-allylation of the shown pro-fluorophore both in vitro and in vivo. Adapted with changes with open access permission from ref. 53 via a Creative Commons license (https://creativecommons.org/licenses/by/4.0/). (C) In vivo encapsulation of a light-controllable minimal singlet oxygen generator (mSOG) able to generate large amounts of toxic ROS, in particular singlet oxygen, inside mammalian cancer cells, resulting in cell death. Adapted with permission from ref. 58. Copyright 2021 American Chemical Society (ACS). (D) In vitro assembly of gold nanoparticles encapsulated within an encapsulin protein shell using a synthetically modified TP. Adapted from ref. 52 with permission from the Royal Society of Chemistry. Copyright 2018 The Royal Society of Chemistry.
A recent innovative study used visible (non-UV) light to control the production of toxic reactive oxygen species (ROS) inside cancer cells.58 A TP was genetically fused to a minimal singlet oxygen generator (mSOG), a flavoprotein that produces mainly singlet oxygen upon exposure to blue light. The TP-fused mSOG was then encapsulated inside the T. maritima T1 encapsulin shell via co-expression, yielding a nanocage-based platform for the delivery of photodynamic therapeutics (Fig. 5C). Encapsulated mSOG generated higher ROS levels than either free mSOG or the encapsulin control. This increase reflects an additive singlet oxygen-generating effect of the encapsulin shell itself. That effect is likely due to the observed non-specific adsorption of endogenous flavins, such as flavin mononucleotide (FMN), flavin adenine dinucleotide (FAD), and riboflavin, to the T. maritima encapsulin.52,56 Furthermore, encapsulated mSOG caused increased cell death in a lung cancer cell culture model. This was attributed to the fact that encapsulated mSOG was taken up by the cells whereas free mSOG was not internalized. This observation is consistent with previous reports showing that free mSOG is unable to penetrate tumor cells.76 This system provides a novel, highly controllable method for the light-triggered generation of toxic ROS without potentially harmful UV light or the need for additional small-molecule substrates. As ROS generation can serve both as a therapeutic modality and as an effective bioimaging signal, the platform represents a theranostic, encapsulin-based delivery system.
This mSOG encapsulin platform was further engineered to display a designed ankyrin repeat protein (DARPin), an antibody mimic, on the shell exterior. This was achieved through genetic fusion of the DARPin to the surface-exposed C-terminus of the encapsulin shell protein.19 This enabled the targeted delivery of mSOG to human epidermal growth factor receptor 2 (HER2)-positive breast cancer cells and the subsequent light-triggered induction of toxic ROS, leading to apoptosis. The platform self-assembles in a single step when expressed in Escherichia coli. It offers specific targeting, photodynamic therapeutic ROS generation, and potential modular functionalization with different DARPins selected to bind other targets.
Another innovative example of using encapsulins as reporters or diagnostic platforms is the in vivo encapsulation of the enzyme tyrosinase, which produces melanin inside the encapsulin protein shell.60 The sequestered, concentrated melanin could then be used for imaging and tracking owing to its strong near-infrared absorption. Further, cells expressing this system did not show the growth defects caused by melanin toxicity that are usually observed with non-encapsulated melanin.
Another recent example of employing encapsulins as biocatalytic enzyme nanoreactors is the use of the DyP-peroxidase-loaded M. hassiacum T1 encapsulin together with free eugenol oxidase in an enzyme cascade, yielding lignin-like crosslinked reaction products.57 The main challenge for protein-based nanoreactors encapsulating non-native enzymes is to overcome the frequently observed decrease in catalytic activity upon enzyme encapsulation.57 This decrease is generally due to the protein shell not being optimized for the influx of the particular substrates and cofactors needed by a specific non-native sequestered enzyme. Future pore engineering efforts aimed at improving molecular flux across encapsulin shells will likely be able to address this problem and yield fully catalytically active nanoreactors.22,63,81,82
This journal is © The Royal Society of Chemistry 2023