Philip C. Simister and Stephan M. Feller
Biological Systems Architecture Group, Department of Oncology, Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford, OX3 9DS, UK. E-mail: philip.simister@imm.ox.ac.uk; stephan.feller@imm.ox.ac.uk
First published on 20th September 2011
Large multi-site docking (LMD) proteins of the Gab, IRS, FRS, DOK and Cas families consist of one or two folded N-terminal domains, followed by a predominantly disordered C-terminal extension. Their primary function is to provide a docking platform for signalling molecules (including PI3K, PLC, Grb2, Crk, RasGAP, SHP2) in intracellular signal transmission from activated cell-surface receptors, to which they become coupled. A detailed analysis of the structural nature and intrinsic disorder propensity of LMD proteins, with Gab proteins as specific examples, is presented. By primary sequence analysis and literature review the varying levels of disorder and hidden order are predicted, revealing properties and a physical architecture that help to explain their biological function and characteristics, common for network hub proteins. The virulence factor, CagA, from Helicobacter pylori is able to mimic Gab function once injected by this human pathogen into stomach epithelial cells. Its predicted differential structure is compared to Gab1 with respect to its functional mimicry. Lastly, we discuss how LMD proteins, in particular Gab1 and Gab2, and their protein partners, such as SH2 and SH3 domain-containing adaptors like Grb2, might qualify for future anti-cancer strategies in developing protein–protein interaction (PPI) inhibitors towards binary interactors consisting of an intrinsically disordered epitope and a structured domain surface.
Philip C. Simister: Philip obtained his PhD in biochemistry from the University of Bristol, UK in 2004 studying the structural biology of signalling proteins. During a post-doctoral fellowship in France (CNRS, Gif-sur-Yvette) looking at uncharacterised domains of large Arf1 guanine-nucleotide exchange factors he became introduced to intrinsic protein disorder. Switching to molecular virology as a Junior Fellow of the French National Agency for AIDS and Hepatitis Research, he contributed to the molecular understanding of the RNA-dependent RNA polymerase of the hepatitis C virus by solving its structure from an HCV strain unique in its ability to replicate in cell culture without adaptive mutations. The structural insight enabled functional studies to identify a single polymorphism critical for this strain's replication fitness. Currently, Philip is a senior researcher at the University of Oxford, whose interests include understanding, and chemically targeting, the molecular interactions of proteins relevant to cancer, such as adaptors and large multi-site docking proteins. In his spare time recently he translated into English two specialist textbooks (Molecular and Cellular Enzymology; Chemogenomics and Chemical Genetics) from the original French.
Stephan M. Feller: Stephan studied biology and then joined the laboratory of renowned cancer virologist Hidesaburo Hanafusa at the Rockefeller University, New York. After some experimental detours, Stephan discovered that the SH3-N domains of Crk adaptor proteins serve as docking modules for signalling proteins like Abl kinase, SoS and C3G. He also proposed a model for conformational regulation of protein binding of c-Crk through phosphorylation by c-Abl. This work led to an offer to start his own junior research group at the University of Würzburg, Germany, where he continued his research into signalling by protein–protein interactions. Since moving in 2001 to the Weatherall Institute of Molecular Medicine in Oxford, Stephan's group investigates signal protein complex formation with biophysical and structural biology tools. He has also developed a strong interest in understanding the molecular heterogeneity of tumours on the signalling protein level. More recently the group has started to explore design principles in the molecular architecture of large signalling complexes believed to function as molecular signal computation machines that are aberrantly activated by oncogenic receptor proteins.
LMD proteins lack enzymatic activity and are involved in protein and plasma membrane binding. Well-studied examples include the Grb2-associated binder (Gab), fibroblast growth factor receptor substrate 2 (FRS2), insulin receptor substrate (IRS), downstream of kinase (DOK) proteins and Crk-associated substrate of 130 kDa (p130Cas; also known as breast cancer anti-oestrogen resistance protein 1, BCAR1) families. The one common feature of LMD proteins is their structural composition. They have folded N-termini, consisting of a small domain module i.e. pleckstrin homology, PH (Gab1–4), phosphotyrosine-binding, PTB (FRS-2α and 2β) or Src homology 3, SH3 (p130Cas family) domains. A slight variation to this is seen with IRS-1, -2 and -4, and DOK1–7, which have two adjacent folded N-terminal domains (PH and PTB). Beyond the folded N-terminus, there is a long C-terminal extension ostensibly devoid of major structural elements, and hence LMD proteins represent lesser-studied examples of intrinsically disordered (or unstructured) proteins (IDPs or IUPs). In all cases, the C-terminal tail comprises multiple phosphorylation sites for different tyrosine kinases. These sites serve as docking points for Src homology 2 (SH2) domain-containing adaptor proteins of the Crk (CT10 regulator of kinase), Grb2 (growth factor receptor-bound protein 2) and Nck (non-catalytic region of tyrosine kinase) families, which couple to a range of enzymes, and also for the direct binding of certain SH2-containing enzymes (e.g. the SH2 domain-containing phosphatase 2, SHP2 and phosphatidyl inositol-3 kinase, PI3K).
Comprehensive reviews have been recently published covering the functional aspects of LMD proteins to which the reader is referred.1–5 In this insight review we focus on a detailed understanding of the structural nature of LMDs, in particular mammalian members of the Gab family. The structural architecture of the CagA protein from Helicobacter pylori also comes under scrutiny, as it can hijack the same pathways usually associated with normal Gab protein signalling. The details of these important physical aspects of LMDs have been largely overlooked. Furthermore, the reach of IDP research has not really covered these key proteins, and thus we provide a closer investigation of what can be discerned about their structural properties from analyses of their sequences and the literature.
A wide range of receptors has been reported to transmit signals via the Gab proteins, including receptor tyrosine kinases (RTKs), G protein-coupled receptors (GPCRs), cytokine receptors, multi-chain immune recognition receptors and integrins.1 Their normal mode of interaction with the cytoplasmic portion of the receptor is achieved by indirect coupling to adaptor proteins, the most predominant being Grb2, which is a 25 kDa protein composed of a central SH2 domain flanked by terminal SH3 domains (SH3-N and SH3-C). Its SH2 domain can bind directly to phosphorylated tyrosine residues within the receptor. The SH3-C domain interacts with the Gab tail by docking onto proline-rich segments containing an atypical SH3 domain binding motif, RxxK. Interestingly, this interaction may be transient, at least in some cases.7 Gab1 is unique in having an additional means to couple to the hepatocyte growth factor receptor c-Met, an RTK implicated in a plethora of human cancers; it can bind directly with a 13-amino-acid segment present in its unstructured tail region.8
Other adaptors and docking proteins can act as accessory proteins for Gab1 and 2 in concert with Grb2 in coupling to receptors. These include Src homology domain-containing transforming protein 1 (Shc1), linker for activation of T-cells (LAT) and Nck1. Gab3 can bind to the Grb2-related adaptor, Mona/Gads, in cells of haematopoietic lineage.9 Furthermore, Gab1 has been shown to form a tripartite docking complex with Grb2 and the LMD protein FRS2α in fibroblast growth factor receptor (FGFR) signalling.10
On the other hand, Gab2 and Gab3 knockout mice live to a normal age. Whereas Gab3-deficient mice show no obvious phenotype,16 Gab2-deficient mice, while generally healthy, have distinct impairments. The high affinity immunoglobulin-epsilon receptor, FcεRI, in mast cells requires Gab2 to activate PI3K in the allergic response. Thus, Gab2 deficiency results in impaired allergic reactions such as passive cutaneous and systemic anaphylaxis.17 Also, bone marrow-derived mast cell growth was reported to be reduced in Gab2−/− mice due to defective signalling by the c-Kit receptor.18 Other bone-related deficiencies have been observed in Gab2−/− mice including osteopetrosis (an increase in bone density) and reduced bone resorption. These result from defective osteoclast differentiation involving the RANK (Receptor Activator of Nuclear Factor κ B) receptor, which requires association with Gab2.19 Additionally, normal haematopoiesis is perturbed in Gab2 knockout mice as revealed by a poor haematopoietic cell response to early-acting cytokines.20 Somewhat surprisingly, Gab4 has not yet been analysed at all.
Several studies have been carried out with organ-specific knockouts in mice. For instance, liver-specific Gab1 knockouts have shown that Gab1 participates in the regeneration of the liver in conjunction with SHP2.21 Interestingly, Gab1 can also negatively modulate the effects of hepatic insulin signalling amplified by the docking proteins IRS1 and 2, in spite of its structural similarity.22 Both of these physiological effects are elicited by activation of the ERK pathway. Very recently, three independent studies have simultaneously reported the crucial role of Gab1 in promoting postnatal angiogenesis using Gab1 endothelial cell-specific knockout (ecKO) mice and hindlimb ischaemia models.23–25 Both the VEGF23 and c-Met24 receptors were implicated in mediating blood vessel development. The Gab1-ecKO mice were viable with no obvious vascular defects, which indicates that Gab1 in the endothelium plays no crucial role during developmental vasculogenesis. Effects on angiogenesis were not observed for conventional Gab2 knockout mice.24
The particular contributions of distinct protein binding sites, and hence downstream pathway effects, were unravelled for Gab1.26 In this important study, knock-in mutant mice were generated lacking the SHP2, Grb2 and c-Met binding sites. The most severe phenotype, embryonic lethality, occurred with Gab1ΔShp2/ΔShp2 mice, possibly by inactivating Ras signalling. The unusual, direct binding and Grb2 adaptor-linked recruitment mechanisms to the c-Met receptor are non-redundant. While either appears to be usable for normal limb muscle, placenta and liver development, as well as palatal closure, both the Grb2 and c-Met binding modes are jointly required. Thus, the need for c-Met-binding sites in Gab1 is tissue specific.26
Fig. 1 Comparison of the structural architecture of human large multi-site docking (LMD) proteins. Representative members of each family (p130Cas, Frs2, IRS1, Dok1, and Gab1) are illustrated with their corresponding SMART domain structure and previously described secondary structure elements. Protein lengths are also indicated. The human Gab family is expanded in the bottom (enclosed) schematic, with the sites of protein binding and phosphorylation displayed, as described in the key below the panel.
Sequence alignment of all four human Gab proteins illustrates the high sequence conservation in the PH domain (Fig. 2). However, it is clear that beyond the PH domain the Gab members bear less resemblance to one another. Curiously, the extreme C-terminus regains significant sequence homology, even when comparing across all family members. Other regions of higher identity correspond to conserved protein interaction sites, the first Grb2 SH3-C binding motif being notably absent in Gab4, which, disregarding its short length, otherwise resembles Gab2 the most closely.
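Colouring an alignment by percentage identity amounts to scoring each column by how many sequences share its most common residue. The following is a minimal sketch of that idea only: the function name `column_identity` and the four short aligned strings are hypothetical stand-ins (they are not Gab sequences), and the scoring rule, with gaps counted as non-matching, is an assumption that may differ from the exact scheme used by JalView.

```python
from collections import Counter
from typing import List


def column_identity(aligned: List[str], gap: str = "-") -> List[float]:
    """Return, for each alignment column, the percentage of sequences
    that carry the most common non-gap residue in that column."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    identities = []
    for column in zip(*aligned):
        # Count residues only; gaps never contribute to the top count.
        counts = Counter(res for res in column if res != gap)
        top = max(counts.values()) if counts else 0
        identities.append(100.0 * top / len(aligned))
    return identities


# Hypothetical toy alignment (illustrative only, not real Gab sequences).
toy_alignment = ["MSGG-EV", "MSGDPEV", "MTGG-EI", "MSAGPDV"]
print(column_identity(toy_alignment))
# [100.0, 75.0, 75.0, 75.0, 50.0, 75.0, 75.0]
```

Averaging such per-column values over a sliding window would be one simple way to highlight stretches of elevated conservation, such as a PH domain or a conserved C-terminal region, in a real alignment.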
Fig. 2 Sequence alignment of human Gab family members. Gab1–4 are shown coloured by percentage identity using JalView.128 The PH domain and protein interaction regions on Gab1 as shown in Fig. 1 are here mapped more precisely using the same colour code. Note: Crk refers to the binding epitopes for the SH2 domains of the Crk and CrkL proteins; p85 is the subunit from PI3 kinase; the red point indicates the site of serine phosphorylation involved in PH domain autoregulation.44 The dashed black line at the C-terminus indicates a hitherto unrecognised region of homology, which is also predicted to contain secondary structure.
Exploring Gab1's amino-acid sequence further using secondary structure prediction (PSIPRED server32) reveals that secondary structure is correctly predicted for the N-terminal PH domain segment, but the majority of the remaining chain towards the C-terminus lacks predicted structural elements. However, a small region near the C-terminus is predicted with medium-to-high confidence to contain α and β structure (Fig. 3, top panel). What is more, this region maps to the C-terminal section of closer homology observed in the alignment of Gab protein sequences (Fig. 2, black dashed line). This sequence homology is observed in all human Gab proteins and in lower metazoans, including the frog species Xenopus laevis, the zebrafish Danio rerio, and to a lesser extent in the fruit fly Drosophila melanogaster (not shown). This is interesting, as the total lengths and composition of the Gab sequences are varied, yet they begin and end in a similar structural manner. This stretch of high homology is an unexpected finding, as we know of no reports defining a structural requirement for the extreme C-terminus in the functional roles of all Gabs elucidated to date. We note, however, that this region harbours the sites of SHP2 phosphatase binding. High conservation of sequence across family members in structured proteins is generally linked to their related function and therefore it is tempting to speculate that, if not fully explained by the presence of SHP2 sites, the C-terminus may serve an additional and as yet unidentified functional role in Gab-family proteins.
Fig. 3 Prediction of structural order and disorder for human Gab1. Bottom plot: Three predictors of disorder were used to analyse the protein sequence of human Gab1 for comparison: PONDR-VLXT (yellow), RONN (blue) and MetaPrDOS (red, dashed line). Values above or below the 0.5 tendency level represent disorder and order, respectively. Top panel: PSIPRED secondary structure prediction for Gab1, represented as coloured bars along the sequence: α-helices in dark magenta, β-strands in light green. The grey triangles above indicate the positions of the Gab1-PH domain-binding sites along Gab1's tail, as determined by peptide array (see Table 1 and ref. 29).
No. | Overlapping binding sequence | Interaction partner | Local structure |
---|---|---|---|
1 | 179 DPQDYLLLI 187 | - | Low IDᵃ predicted |
2 | 219 SHQTPASSQSK 229 | - | Low ID predicted |
3 | 335 DTIPDIPPPRPPK 347 | Grb2 SH3-C | PP-II helix (by experiment, ref. 6) |
4 | 383 PSRSNTISTVDLNKL 397 | - | Low ID predicted |
5 | 415 SDRSSSLEGFHSQYK 429 | - | Low ID predicted |
6 | 491 PPPAHMGFR 499 | c-Met | Low ID predicted |
7 | 517 PPPVDRNLK 525 | Grb2 SH3-C | 3₁₀ helix (by experiment, ref. 6) |
8 | 549 PVRSPITRS 555 | Gab1 PH | Low ID predicted |
9 | 607 SNSLDGGSSPMNK 619 | - | Low ID predicted |
10 | 631 LDLDSGKSTPPRK 643 | - | Low ID predicted |
11 | 655 DERVDYVVVDQQK 667 | SHP2 SH2 | Low ID predicted |

ᵃ ID = intrinsic disorder.
Firstly, it is interesting that all three YxxM (PI3K p85 subunit SH2 binding motifs) and all six YxxP tyrosine phosphorylation sites (putative Crk/CrkL SH2 binding sites) in Gab1 fall in regions outside of these binding sequences, which may represent specific disordered ‘domains’.45 However, the two SHP2 phosphatase SH2 interaction sites are located within or close to two of these peptide stretches; the significance, if any, of these differences remains to be understood. Another feature of these data is that a further three putative self-binding sequences overlap wholly or nearly completely (the corresponding amino-acid residues are written in bold text, Table 1) with specific known protein-binding regions, i.e. the c-Met binding motif, as well as the two Grb2 SH3-C binding segments encompassing RxxK motifs, in addition to the SHP2 site mentioned.
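The motif statements above can be checked directly against the peptide sequences listed in Table 1 with a simple pattern scan. The sketch below is illustrative only: the helper name `motif_positions` and the dictionary layout are assumptions, the sequences and first-residue numbers are copied from Table 1, and the scan covers just these eleven peptides rather than the full-length protein.

```python
import re
from typing import Dict, List, Tuple

# Table 1 peptides: number -> (first residue number, sequence as listed).
TABLE1_PEPTIDES: Dict[int, Tuple[int, str]] = {
    1: (179, "DPQDYLLLI"),
    2: (219, "SHQTPASSQSK"),
    3: (335, "DTIPDIPPPRPPK"),
    4: (383, "PSRSNTISTVDLNKL"),
    5: (415, "SDRSSSLEGFHSQYK"),
    6: (491, "PPPAHMGFR"),
    7: (517, "PPPVDRNLK"),
    8: (549, "PVRSPITRS"),
    9: (607, "SNSLDGGSSPMNK"),
    10: (631, "LDLDSGKSTPPRK"),
    11: (655, "DERVDYVVVDQQK"),
}

# Lookahead patterns so that overlapping occurrences are not missed.
RXXK = re.compile(r"(?=R..K)")          # atypical SH3-binding motif
YXX_M_OR_P = re.compile(r"(?=Y..[MP])")  # YxxM (p85) or YxxP (Crk/CrkL)


def motif_positions(pattern: "re.Pattern[str]") -> Dict[int, List[int]]:
    """Map peptide number -> residue numbers where the motif starts."""
    hits: Dict[int, List[int]] = {}
    for number, (first_residue, sequence) in TABLE1_PEPTIDES.items():
        starts = [first_residue + m.start() for m in pattern.finditer(sequence)]
        if starts:
            hits[number] = starts
    return hits


rxxk_hits = motif_positions(RXXK)
tyrosine_hits = motif_positions(YXX_M_OR_P)
print(rxxk_hits)      # {3: [344], 7: [522]}
print(tyrosine_hits)  # {}
```

In this scan only peptides 3 and 7 contain an RxxK motif, matching the two Grb2 SH3-C binding segments in Table 1, and none of the listed peptides contains a YxxM or YxxP motif, in keeping with the observation that those sites fall outside the binding sequences.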
It is also of note that natural variants reported in Uniprot for the human Gab1 protein (accession code: Q13480; 90% sequence identity to mGab1), assuming they are not sequencing errors, occur in equivalent sequence regions in mGab1 that displayed no binding in our peptide array analysis.29 In contrast, a T387N somatic mutation discovered a few years ago during sequence analysis of 11 breast cancer samples46 lies in the centre of binding sequence 4 (Table 1), and being a non-conservative side-chain alteration could conceivably affect self-binding of the Gab1 tail. Whether these changes are driver mutations or simply passenger mutations, however, is not yet known.
Lastly, all but two of these PH-interacting regions (shown as grey triangles in Fig. 3, top panel) lie in areas of the human Gab1 sequence that are predominantly predicted to have low disorder according to PONDR-VLXT and, to some extent, RONN. Although Table 1 lists the sequences of the mouse Gab1 homologue, the PONDR-VLXT prediction for mGab1 (not shown) yields a profile that is nearly perfectly superimposable on that of the human protein; for instance, 8 of the 11 peptides are 100% identical (three are between 80 and 92% conserved: peptides 2, 5 and 9 in Table 1), thus the argument most likely holds. The two sequences lying significantly outside the more ordered regions are, incidentally, sequences 3 and 7. Ironically, these are the only two sequence regions of a Gab protein for which experimental data exist, having been crystallised in complex with the SH3-C domain from Grb2.6 In the co-complexes reported, these sequences harbouring RxxK motifs are not disordered but adopt distinct structures: a polyproline type II (PP-II) helix and a 3₁₀ helix6 (see also a related study47). Recent work indicates that Gab1–Grb2 SH3-C complexes may be transient.7 Therefore, the Gab1 interaction epitopes may well be available for intramolecular binding to the PH domain when Grb2 is absent. If the additional and as yet uncharacterised sites are genuine protein-binding sequences, their being more ordered is consistent with the notion that sequences with a low disorder probability within otherwise disordered protein regions may be involved in disorder-to-order transitions in the context of protein–protein interactions (PPIs).48 This has been shown for RNase E, for instance, where protein-binding segments, including one involved in self-association, correspond to low predicted disorder scores.49
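Locating stretches of low predicted disorder within an otherwise disordered tail reduces to thresholding a per-residue score profile at the 0.5 tendency level used in Fig. 3. The sketch below assumes nothing about any particular predictor's output format: the function name `ordered_segments`, the `min_len` parameter and the ten-value profile are hypothetical, chosen only to illustrate the thresholding step.

```python
from typing import List, Tuple


def ordered_segments(
    scores: List[float], threshold: float = 0.5, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges where the disorder
    score stays below the threshold for at least min_len residues."""
    segments: List[Tuple[int, int]] = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score < threshold:
            if start is None:
                start = pos  # a predicted-ordered stretch begins here
        elif start is not None:
            if pos - start >= min_len:
                segments.append((start, pos - 1))
            start = None
    # Close a stretch that runs to the end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_len:
        segments.append((start, len(scores)))
    return segments


# Hypothetical per-residue disorder scores (not real predictor output).
toy_profile = [0.9, 0.8, 0.4, 0.3, 0.45, 0.7, 0.6, 0.2, 0.1, 0.55]
print(ordered_segments(toy_profile))             # [(3, 5), (8, 9)]
print(ordered_segments(toy_profile, min_len=3))  # [(3, 5)]
```

Intersecting such segments with the residue ranges of candidate binding peptides would give a quick, if crude, way to ask which sites sit in regions of low predicted disorder.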
Theoretical prediction methods are useful for predicting α and β structures. However, despite their assumed increased abundance in intrinsically unstructured protein regions,50 PP-II helices have always proved difficult to predict with great accuracy. The same applies to 3₁₀ helices. Recent attempts have reported PP-II helix prediction accuracies of about 60% based on information derived from a global analysis of tetrapeptides within the PDB,51 or 70% using a support vector machine (SVM) learning approach.52 However, these methods have not been incorporated into commonly used secondary structure prediction software. It becomes clear that only through focused experimental studies can these important structures be identified, as seen with Gab2, when peptides representing the two Grb2 binding sites bound with distinct helical backbone conformations, discussed above.6 Again, the importance of experiment is underlined here in order to elucidate the exact structural nature of intrinsically disordered regions (IDRs). Quite how the remainder of the polypeptide chain beyond the PH domain arranges itself is therefore open to study by biophysical methods, e.g. small-angle X-ray scattering (SAXS), cryo-electron microscopy and nuclear magnetic resonance (NMR) spectroscopy.
The multiple binding sites in Gab1 discovered by peptide array led to the proposal that the long tail region might actually loop back several times on to the folded PH domain as a matter of course, after ribosomal exit (N-terminal folding nucleation [NFN] hypothesis29). A key consequence of this model is that the protein's overall shape would be more compact, and specific segments of the unfolded polypeptide chain would be capable of moving into closer spatial proximity to other more remote sequence sites in a defined way. The principal benefit of this spatial form, aside from any advantage in terms of preventing unwanted associations and resisting degradation,53 is with respect to Gab1's critical role as a signal integration platform. Related to this, the locations of the tyrosine phosphorylation sites are not random; they are clustered in specific portions of Gab1's polypeptide chain, suggestive of intrinsic order to their positions. The interaction of specific effector molecules in particular regions of Gab may thus be facilitated by the geometry dictated by the potential anchoring sites along the chain. This could be complemented by the oligomerisation of some Gab interacting proteins, e.g. as reported for the CrkL protein.54
Regulation by distal parts of a protein becomes possible by having a highly flexible, extended linker that can fold back over a long range. A few examples of IDPs participating in such regulatory processes, each with their own unique characteristics, are as follows. When the cell cycle regulator p27Kip1 is in complex with Cdk2 and cyclin A,55 its N-terminus is attached to the complex and its C-terminus can fold back onto the complex in a phosphorylation-dependent manner (see also ref. 56). Intramolecular autoregulation is also observed within proteins regulating microtubule dynamics, most notably the cytoplasmic linker protein of 170 kDa, CLIP170.57 CLIP170 contains many structured segments, but employs a flexible linker to allow long-range auto-inhibition, in this case through the folding back of a coiled-coil domain by a phosphorylation-independent mechanism.58 The disordered translocation domain of the bacterial antibiotic colicin N, from E. coli, can self-associate with its folded domain, thereby possibly affording it some protection from mammalian host proteases.59
H. pylori is a rod-shaped bacterium that infects the stomach epithelial cell layer. Infection takes place in several steps. In the early stages, H. pylori secretes proteases, which digest the cell-cell junctions. Very recently, HtrA (high-temperature requirement A), a presumed dual-function chaperone/protease, was identified as being a key virulence factor fulfilling this role. HtrA is secreted from the periplasm and able to cleave E-cadherin, the principal component of adherens junctions, as its substrate.62 This leads to the breakdown of intercellular junctions, allowing H. pylori to invade the interstitial space and to attach to cell surfaces. Briefly, attachment involves the assembly of H. pylori's type IV secretion system (T4SS). This is encoded by the cytotoxin-associated gene (cag) pathogenicity island, a region of approximately 32 genes, including cagA. The T4SS includes an extended tubular pilus, which protrudes through the bacterium's outer membrane and docks onto transmembrane α5β1 integrin receptors of the host cell.63,64 The CagL, CagY, CagI and CagA proteins are necessary for the interaction with the β1 integrin subunit,63,64 implying that they are situated on the exterior face of the pilus. CagA is additionally injected down the pilus into the cytoplasm of the epithelial cell, whereupon it undergoes phosphorylation on E-P-I-Y-A sequence motifs located towards its C-terminus by Src family kinases65 and c-Abl.66 Subsequently, CagA recruits various SH2 domain-containing proteins,65 e.g. SHP2 and Grb2,67 enabling CagA effectively to hijack signalling pathways that are normally managed by Gab proteins. This process brings about rearrangements of the actin cytoskeleton, cell scattering and elongation and hence a migratory behaviour, termed the ‘hummingbird’ phenotype (covered in a recent mini-review68), reminiscent of the cellular response to Gab activation.
CagA has been categorised as a Gab mimic based on its capability to interact with partners of Gab and elicit similar functional effects in human cells.60 A remarkable demonstration of CagA's mimicry was provided by transgenic studies in the fruit fly Drosophila melanogaster containing loss-of-function Dos (Daughter of Sevenless) mutants.61 Dos is a Gab-related docking protein, which signals downstream of different receptors, including the Sevenless receptor (Sev).69 Sev is a tyrosine kinase receptor essential for R7 UV photoreceptor cell development in the compound eye of Drosophila. Dos is coupled to Sev via Drk, the Drosophila homologue of mammalian Grb2 family adaptors. Drk binding to Dos requires two RxxK motifs and deletion of these leads to a loss of all R7 cells.70 Dos inactivation is lethal during development; few pupae are formed and flies do not develop to adulthood.69 When the CagA protein was overexpressed in flies lacking wild-type Dos, more than double the number of pupae were generated. Secondly, in homozygous dos mutants, which produce few cells and photoreceptors, the overexpression of either Dos itself or CagA is able to restore cell growth to similar levels along with the generation of equivalent numbers of eye photoreceptors.61 These experiments neatly revealed how CagA can functionally mimic Dos.
In humans, H. pylori infection is a major risk factor for gastric inflammation and cancers, largely due to this capability of promoting Gab-like signalling, but this is not all that is required. CagA can also associate with the apoptosis-stimulating protein of p53 (ASPP2) in order to subvert its tumour suppressor function. Following this interaction, the cell's balance of the pro-apoptotic transcription factor, p53, becomes perturbed, whereupon it undergoes increased proteasomal degradation in an ASPP2-dependent manner.71
Curiously, CagA lacks an N-terminal PH domain or indeed any other known folded domain, apart from three short, putative coiled-coil domains predicted by SMART. Furthermore, its sequence length is nearly twice that of Gab1 and 2 and thus CagA appears structurally unrelated to the Gab family. Nevertheless, the N-terminal region has been ascribed a role in localising CagA to the membrane.72 In fact, it would seem that CagA has a more complex mechanism of membrane attachment with selective binding to distinct membrane substructures, requiring the interplay of regions within both the N- and C-termini.73 These regions remain to be characterised in more detail.
Fig. 4 Prediction of structural order and disorder for Helicobacter pylori CagA protein. Bottom plot: Three predictors of disorder were used to analyse the protein sequence of CagA for comparison: PONDR-VLXT (yellow), RONN (blue) and MetaPrDOS (red, dashed line). Values above or below the 0.5 tendency level represent disorder and order, respectively. Top panel: The PSIPRED secondary structure prediction for CagA, represented as coloured bars along the sequence: α-helices in dark magenta, β-strands in light green.
The secondary structural prediction for CagA by the PSIPRED server32 is represented in Fig. 4 (top panel). It is evident from this output that CagA has a predicted, high tendency to form mainly α-helices throughout the sequence with some β-strands in the N-terminal half, albeit with variable confidence levels (not shown). This is in stark contrast to the Gab proteins. The widespread distribution of secondary structural elements corroborates the mixed order/disorder prediction. Importantly, these combined analyses strongly support the notion that CagA is structurally unrelated to the Gab proteins. Therefore, it is remarkable that H. pylori has found a means to mimic the cellular response promoted by Gab signalling, using a protein of dissimilar architecture. Clearly, there are other constraints on H. pylori, such as (i) the need to inject this virulence factor across the host cell membrane, a process that may require specific structural handles alongside a degree of plasticity, and (ii) to interact with the pilus,64,74 that may dictate why CagA adopts its particular form. An additional difference with respect to CagA's structure, in contrast to eukaryotic Gab proteins, is that CagA can undergo multimerisation once injected, which is necessary for its ability to cause the hummingbird phenotype.75 The functional mimicry would therefore seem to reside in the presence of the correct short linear motifs, i.e. c-Src phosphorylation sites, SHP2 and Grb2 docking sites etc. This is supported by the fact that in transgenic fruit flies, CagA mediates its effects by way of the Corkscrew (CSW) protein, the equivalent of SHP2 phosphatase in Drosophila.76 In csw-null flies, CagA failed to increase the number of R7 photoreceptors.61 It is also conceivable that the structural context of these motifs is important and CagA may recreate a similar local geometry to the Gab proteins at sites of protein–protein interactions. 
It would be interesting to investigate experimentally whether predicted helical regions in CagA do indeed form defined tertiary structure. To date, only a short fragment has been structurally analysed: a 14-residue CagA peptide possessing no regular secondary structure was crystallised in complex with the human kinase PAR1b/MARK2.77 This was unexpected since the reported co-crystallisation experiment was set up with a 120-residue portion of CagA (amino acids 885 to 1005) encompassing the E-P-I-Y-A motifs, but only the 14 amino acids showed visible electron density, implying flexible disorder for the remaining 106 residues.77 This is quite consistent with the output from our disorder analysis: all three predictors used indicate that disorder predominates in this region (Fig. 4). Furthermore, secondary structure is predicted to be the most sparse in part of this region (Fig. 4, top panel).
Gab1's role in tumourigenesis has been implied through its essential connection with c-Met receptor signalling,87 which is activated (mutated or overexpressed) in a vast array of cancers.88 Other receptor-driven tumourigenesis pathways implicate Gab1, for example, in EGFR signalling within intestinal adenomas,89 and glioblastoma cells,90 as well as in hyaluronan-mediated CD44 signalling in metastatic breast tumours.91
Therefore, the direct involvement of Gab1 and 2 in numerous oncogenic mechanisms raises the possibility that they may be useful targets for the development of novel therapeutics.
While not nearly as promiscuous in their interactions as other well-characterised hubs, such as the DNA binding proteins and tumour suppressors BRCA1 and p53, the list of Gab-interacting partners as well as the structural and functional features of Gab proteins lead to the notion that these may indeed be network hubs. As described earlier, a gab1 gene deletion gives a lethal phenotype in mice. However, deletion of gab2 and 3 produces viable mice. This may reflect their more restricted expression profile during development, since Gab2 can interact with the same effector proteins as Gab1, thus its network promiscuity is essentially equivalent.
Hub proteins are a very frequent point of dysregulation contributing to human pathologies. Their pivotal role in coordinating various inputs and outputs requires fine control, and thus compromising their integrity within the cell has multiple adverse repercussions. Given the central role docking proteins such as Gab1 and 2 play in mediating cellular signalling responses after activation of growth factor receptors, it is perhaps surprising that they are not more frequently mutated in human cancers. This seems to be contrary to the typical scenario with BRCA1 and p53, which are found to be severely mutated in many cancers. One possible explanation could be that it is practically impossible to mimic tyrosine phosphorylation by mutations in disordered tails of proteins in order to generate oncogenic signals, while it is much easier to disrupt many anti-oncogenic signals that are based on protein–protein binding events, or to activate enzymatic proteins by mutation.
In line with this, mutations in Gab-associated proteins are more frequently reported. Examples include the D61G gain-of-function mutation in SHP2, which is linked to juvenile myelomonocytic leukaemia (JMML).97,98 Deletion of Gab2 in D61G mutant mice alleviates the aberrant activity, indicating once again the important role Gab2 plays in propagating signals via the SHP2 phosphatase. Presumably, sufficient expression of Gab2 is required in this context to sustain the cancerous phenotype. PI3K is mutated in most breast cancer subtypes,98,99 many of which harness Gab signalling functions. PLCγ can also acquire gain-of-function mutations relevant to cancer.100
In any case, it is possible that additional disease-associated mutations in Gab proteins await discovery. To be effective in driving disease processes, we hypothesise that these mutations might: (i) enhance the interaction of Gab proteins with the membrane and/or receptors, perhaps by residing in the PH domain or by mimicking Gab1 Ser552 phosphorylation; (ii) decrease affinity for negative regulators, e.g. 14-3-3 in the case of Gab2, or certain tyrosine phosphatases; or (iii) promote structural organisation favourable to signalling-complex assembly, e.g. by modifying the potential anchor points listed in Table 1.
Is it possible to develop inhibitors that disrupt Gab protein–protein interactions in which one or more intrinsically disordered sequences participate? This is a general challenge confronted by every research endeavour into PPI inhibitors, confounded by the potentially variable nature of IDRs, yet emerging strategies aim to overcome this.102 It is known that PPI regions often have some preorganised structure and may transition to an ordered state upon binding104,105 although IDPs do exhibit variability in this regard.106 This is necessarily the case when the partner protein is a structured domain. In fact, 4 of 8 key PPI inhibitors reported to date target complexes comprising one disordered and a second ordered domain.107 The known docking proteins that couple to Gab proteins do so mostly by way of their globular domains (i.e. SH2 and SH3) and thus should present structured and quite conformationally invariant surfaces. The question may therefore be: can specificity be built into small-molecule inhibitors interfering with SH2 and SH3 interactions, as around 120 SH2 and 300 SH3 domains are encoded in the human genome and many of these contribute to regulating several signalling pathways? Interrupting Gab function does not need to be limited to the SH2 and SH3 domain interactions; however, without a precise view of the structural elements at play in other interactions, it may be challenging to design such inhibitors rationally.
A homologue of Arf1 is Arf6, sharing about 70% sequence identity. Crystal structures of these two GTPases show high similarity.110,111 The binding sites for BFA are superimposable, with all residues contacting BFA identical in both structures, and yet BFA has no activity towards the Arf6/ArfGEF complex. It was therefore a puzzle for many years how BFA could elicit such a differential response from these Arf proteins. Although the picture is further complicated by differences in ArfGEF specificity between Arf1 and Arf6, it is now clear that the crystallographic state of the complex represents only one conformational possibility of the otherwise well-folded Arf protein. Further studies in solution using SAXS and 2D NMR on truncated Arf variants have revealed that the switch 1/interswitch region can transition between a folded and an unfolded state, the latter never being captured in the crystallised form of the full-length protein, although it is observed in the truncated form.112 Thus, the local disorder in Arf6 most likely influences both its lack of BFA sensitivity and its ArfGEF specificity. Incidentally, the switch 1 region that can unfold is not predicted to be disordered (our finding, data not shown), underlining the importance of experimentation to determine such features on an individual protein basis.
This leads to the possibility that, despite forming such a large family, even SH3 domains may be divisible into subsets with differing solution states, which may contribute to specificity upon interaction with their protein partners. For instance, the solution dynamics of the SH3-N domain from the Drosophila adaptor protein Drk have been studied in detail by NMR, and the domain exists in roughly a 1:1 equilibrium between the folded and the unfolded states in vitro.113,114 Systematic solution studies would be necessary to determine the complete dynamical profile of SH3 domains, or other targets of interest. Alternatively, differences in the folding–unfolding characteristics of target proteins may be exploited in strategies to accommodate specificity.
One notable attempt to identify more suitable SH3 ligands, using a library of small aliphatic and aromatic hybrid peptide-peptoids, demonstrated that affinities as high as 40 nM are possible, in this case using Grb2 SH3-N.120 It would appear, therefore, that the search space has not been exhaustively covered, and it is conceivable that a fragment-based screening approach could prove to be the most effective means to develop drug-like molecules against PPIs such as those mediated by SH3 domains. Briefly, there has been some success in small-molecule inhibition of the SH3 domain from cortactin binding to AMAP1, an effector of Arf6-mediated invasion in breast cancer cells.121 However, the molecule UCS15A, a product of a Streptomyces bacterium, probably blocks the PPI not by adhering to the SH3 domain but by directly binding to the proline-rich sequence in the protein partner.122
Alternatively, there may be value in exploring peptido-mimetics for PPIs involving helical structures, as found in Gab1 and Gab2, since these can be readily made to match the geometries of polypeptide chains. Scaffolds based on terphenyl or teraryl groups are showing promise for other targets.123 It may also be possible to employ this strategy for non-helical or non-beta structures, and indeed even for less regular polypeptide conformations, so long as detailed structural information is available, i.e. a co-complex of the two proteins involved in the PPI. Even in peptide–protein complexes, the conformation of the peptide is most likely to reflect the actual conformational state found in the folded protein.124 There is evidence to suggest that regions of intrinsically unfolded proteins that participate in protein–protein interactions begin with the residual structure adopted upon binding, or at least transiently form this structure dynamically when uncomplexed49,125 (an illustrative example is provided by the transient helix appearance prior to formation of the MYPT1–PP1 complex126). Thus, the structure of a peptide–protein co-complex should be usable as a reliable starting guide for mimetic design, expanding the chemical possibilities for targeting interactions involving a disordered protein binding to a structured surface. This has been demonstrated, for instance, by current Smac (second mitochondria-derived activator of caspases) mimetics, which bind with nanomolar affinity to multiple anti-apoptotic XIAP (X-linked inhibitor of apoptosis) proteins. The apoptotic response elicited is equivalent to that induced by Smac itself both in vitro and in animal models, and hence the compounds are presently in phase I clinical trials.127
It is evident that while intrinsic disorder seems to be fully compatible with Gab functions in eukaryotes, the strong prediction of structure for the functional Gab-mimic CagA indicates that disorder is not a strict requirement for at least some Gab functions. There are significant differences in the functional repertoires of these human and bacterial proteins. For instance, intrinsic disorder may benefit Gab in mediating multiple interactions and signalling responses, whereas for CagA promiscuous binding is not necessarily the goal as it presumably performs only a subset of the roles accomplished by the endogenous Gab proteins. Its functional requirement is to elicit particular cellular responses (breakdown of intercellular junctions, cell scattering, apoptosis, etc.) that benefit the pathogen. Also, CagA must physically transition from a bacterial environment to the host cell, possibly adding further to constraints on its structure.
The NFN hypothesis for LMD protein compaction is an appealing model, which provides a convenient explanation of how Gab proteins and similarly structured LMD proteins might coordinate multiple downstream signalling events and segregate, or bring together, binding partners in a directional manner. This hypothesis by no means attempts to rebuild order and detract from the clearly established reality of protein intrinsic disorder; rather, it serves to illustrate how versatile and varied protein surfaces and geometries can be in order to carry out multiple cellular roles, e.g. ‘fly-casting’ to trap partners; moulding to different partners as in the case of hub proteins like p53; folding-back in autoregulatory processes, thereby protecting interaction sites. Thus, not only is there a wide spectrum of spatial forms within the proteome, but individual proteins themselves can also adopt a range of disordered and ordered states, which are context-dependent. This of course makes simple categorisation very tricky and potentially less meaningful. Whereas the traditional concept of protein structure encompassed globular proteins, molten and pre-molten globules, and those with inherent disorder, it is evident that disordered sequences themselves can be sub-classified into flexible disorder, constrained disorder and non-conserved disorder of no apparent function43 (though possibly fulfilling a basic role as linkers).
The fact that Gab-related proteins and many of their protein partners exist throughout the Metazoa indicates that these mechanisms arose early and probably represent one of several diverse means to deal with the complexities of signalling systems in the context of multicellularity. Interestingly, individual Gab orthologues across species share more sequence identity than non-identity in the disordered tail, and are more closely related to one another than paralogues are within a given species. This argues that the constraints on these supposedly intrinsically disordered sequences are nearly as great as for folded domains, although the PH domain is more highly conserved. It also suggests that constrained disorder is common in Gab proteins, a feature that may be related to their function as multi-site docking platforms and, consequently, to the sheer number of conserved phosphorylation and protein binding sites within their sequences. Therefore, a simplistic classification of the Gab protein family is difficult, as its members comprise: a folded domain, followed by presumed constrained and some flexible disorder, punctuated with at least two defined secondary structural elements (PP-II and 3₁₀ helices) and, potentially through compaction by NFN, a form of intrinsically disordered tertiary structure, with any number of uncharacterised putative disorder-to-order transitions. Outstanding issues surrounding the true geometric form of Gab proteins include understanding how their conformations might alter in receptor-stimulated or unstimulated cells. These questions feature among several currently under investigation in our laboratory.
Finally, while initial attempts to develop small molecules to inhibit PPIs were met with difficulty, modern fragment-based screening approaches are emerging as the strategy of choice. With this in mind, early high-throughput screening with larger molecules would most certainly not have covered a comprehensive chemical search space optimised for PPI surfaces. Furthermore, the prospect of IDPs as drug targets is becoming more realistic as experimental data become available that reveal stable interaction surfaces, such as disordered sequences that adopt transient structure, capable of being stabilised or trapped by association with small molecules, or of serving as a template for compounds based on peptido-mimetics. In view of this, the PPIs involving Gab-family proteins and their associated effectors and adaptors are certainly viable candidates. Validation of these propositions by focused experimentation is now essential.
Footnote
† Published as part of a Molecular BioSystems themed issue on Intrinsically Disordered Proteins: Guest Editor M. Madan Babu.
This journal is © The Royal Society of Chemistry 2012