Sara Birtalan‡, Robert D. Fisher‡ and Sachdev S. Sidhu§*
Department of Protein Engineering, Genentech Inc., 1 DNA Way, South San Francisco, CA 94080, USA. E-mail: sachdev.sidhu@utoronto.ca
First published on 9th April 2010
We tested the functional capacity of the natural amino acids for molecular recognition in a minimalist background of binary Tyr/Ser diversity. In phage-displayed synthetic antibody libraries, we replaced either Tyr or Ser with other residues. We find that Tyr is optimal for mediating contacts that contribute favourably to both affinity and specificity, but it can be replaced by Trp, which contributes favourably to affinity but is detrimental to specificity. Arg exhibited a limited capacity for mediating molecular recognition but was less effective than either Tyr or Trp, and moreover, was the major contributor to non-specific interactions. Nine other residue types (Phe, Leu, Ile, Asn, Thr, Pro, Cys, Ala, and Gly) were found to be ineffective as replacements for Tyr. By replacing Ser with Gly or Ala, we found that Gly is as effective as Ser for providing conformational flexibility that allows bulky Tyr residues to achieve optimal binding contacts, while Ala is less effective but still functional in this capacity. For some antigens, high affinity antibodies could be derived using only Tyr/Ser/Gly diversity, but for others, additional chemical diversity was required to achieve high affinity. Our results establish a minimal benchmark for the generation of synthetic antigen-binding sites with affinities comparable to those of natural antibodies. Moreover, our findings illuminate the fundamental principles underlying protein–protein interactions and provide valuable guidelines for engineering synthetic binding proteins with functions beyond the scope of natural proteins.
We have previously used synthetic antibody libraries to define the minimal requirements for molecular recognition.8 We arrived at a simple but functional repertoire built on a single antigen-binding fragment (Fab) framework with diversity restricted to four of the six complementarity-determining region (CDR) loops and only two amino acids (Tyr and Ser).9 The structures of several minimalist antibodies revealed that bulky tyrosines act mainly as contact residues that mediate interactions with antigens, while small serines act mainly as conformation residues that help to shape the CDR loops for antigen recognition.9–12 Herein, we take advantage of this simplified background to compare the capacity of different amino acids to act as either contact or conformation residues by assessing their ability to replace either Tyr or Ser in minimalist antigen-binding sites.
Many studies aimed at understanding the role of amino acid diversity in molecular recognition have relied on inference from in silico analysis of structural databases.13–20 In contrast, our study provides direct, empirical assessment of the functional capacity of natural amino acids for contributing to the affinity and specificity of molecular recognition.
Fig. 1 Design of libraries. The main chains of the humanized 4D5 heavy- and light-chain variable domains are coloured grey and blue, respectively. The CDR positions that were diversified are shown as coloured spheres as follows: CDR-L3, purple; CDR-H1, yellow; CDR-H2, orange; CDR-H3, red or grey. The grey positions were diversified as follows: 100b, Ala/Gly; 100c, Phe/Ile/Leu/Met. The other coloured positions were randomized with various binary combinations, as described in the main text. In CDR-H3, the red positions were replaced by all loop lengths between 6 and 17 residues in all repertoires except repertoire H3-YSGX in which all loop lengths between 4 and 17 residues were used. In the heavy-chain of repertoire H3-YSGX, position 28 was not diversified and additional positions were diversified as follows: 29, Phe/Ile/Val; 34, Ile/Met; 52a, Pro/Ser; 55, Gly/Ser. In CDR-L3 of repertoire H3-YSGX, positions 91–94 were replaced by four, five or six codons encoding Tyr/Ser and additional positions were diversified as follows: 95, Ile/Pro; 96, Phe/Ile/Val. Positions are numbered according to the nomenclature of Kabat et al.38 The figure was generated from crystal structure coordinates (PDB entry 1FVC) using the computer program PYMOL (http://pymol.sourceforge.net).
Each of these repertoires was cycled through rounds of binding selection against four human antigens: vascular endothelial growth factor (VEGF), epidermal growth factor receptor 2 (HER2), insulin, and insulin-like growth factor-1 (IGF-1). We also selected for binding to protein A, which recognizes the heavy-chain variable domain22,23 and can be used to select for correctly folded protein.11,24 We assembled large panels of unique binding clones for statistical analysis to determine which amino acids are best suited to enable antigen recognition with high specificity.
Fig. 2 Antibodies from repertoires derived by combining Ser with different amino acids. (A) Chemical composition of antigen-binding CDR-H3 loops from repertoire H3-SX, in which the CDR-H3 loops contained Ser combined with one of 12 different amino acids (Tyr, Trp, Arg, Phe, Leu, Ile, Asn, Thr, Pro, Cys, Ala or Gly), and CDR-H1, -H2 and -L3 loops contained Ser combined with Tyr. A total of 174 unique clones were analyzed (Fig. S1, ESI†). (B) Chemical composition of antigen-binding sites from repertoire All-SX, in which all four randomized CDR loops contained Ser combined with either Tyr, Trp, Phe or Arg. The following populations and numbers were analyzed: naïve (n = 125, white bars), protein A binding (n = 346, grey bars), antigen binding (n = 105, black bars, Fig. S2, ESI†). Statistically significant deviations (an unadjusted p value < 0.05) from the naïve populations are indicated with an asterisk (*). (C) Relationship between nonspecific binding and the chemical composition of antigen-binding sites. Groups of antibodies with different chemical composition (x-axis) in CDR-H3 (white bars) or the entire antigen-binding site (black bars) were assayed for mean nonspecific binding (y-axis) using a phage ELISA to measure nonspecific binding to a panel of noncognate antigens. The number above each bar indicates the number of clones in the group.
To assess the capacity of different amino acids to function as contact residues across the entire antigen-binding site, we next designed four libraries, in each of which all four randomized CDR loops were diversified with a binary combination of Ser and one of Tyr, Trp, Arg or Phe. The resulting repertoire “All-SX” yielded 105 unique antigen-binding clones after selection against the four antigens. The Phe library yielded only a single clone, which targeted IGF-1 (Fig. S2, ESI†). The Phe library was also depleted amongst the protein A-selected clones relative to the naïve repertoire (Fig. 2B), suggesting that high densities of solvent-exposed Phe residues compromise the structural integrity of the Fab protein. The Arg library was also poorly represented, as only eight clones were raised, all against a single antigen, insulin. In contrast, the Tyr and Trp libraries were well represented by numerous clones raised against four or three antigens, respectively.
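As a rough illustration of the sequence space spanned by a binary design, the sketch below counts the distinct CDR-H3 sequences available when every diversified position is an independent two-way choice (for example Ser or Tyr) and the loop length ranges from 6 to 17 residues, as given in the Fig. 1 caption. The function name is hypothetical, and the count deliberately ignores the other CDR loops, the grey positions 100b and 100c, and any codon bias, so it is a lower-level arithmetic sketch rather than the actual library size.

```python
def binary_loop_diversity(min_len: int, max_len: int, choices_per_position: int = 2) -> int:
    """Count distinct loop sequences when each position is an independent choice.

    Assumption: every randomized position offers the same number of residue
    types, and all loop lengths from min_len to max_len are included.
    """
    return sum(choices_per_position ** n for n in range(min_len, max_len + 1))


# CDR-H3 loops of 6 to 17 binary positions (lengths taken from the Fig. 1 caption).
h3_sequences = binary_loop_diversity(6, 17)
print(h3_sequences)  # 262080, i.e. 2**18 - 2**6
```

Even this restricted count is far smaller than a fully randomized loop of the same lengths, which is what makes exhaustive sampling of a binary repertoire by phage display plausible.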
To assess specificity, we used a phage ELISA to measure binding of the antigen-binding clones against a panel of proteins and calculated a mean nonspecific binding signal for each clone by averaging its ELISA signals against the non-cognate antigens (Fig. S1 and S2, ESI†).25 To determine the level of nonspecific binding associated with each amino acid, we binned the sequences into groups based on the amino acid type combined with Ser and, for each group, quantified nonspecific binding as the mean of the mean nonspecific binding signals for all clones in the group (Fig. 2C). From the H3-SX repertoire, clones containing Tyr in CDR-H3 exhibited the lowest nonspecific binding signals and those containing Trp or Phe were also relatively specific. In contrast, those containing Arg exhibited high nonspecific binding signals. From the All-SX repertoire, the clones containing Tyr and the single clone containing Phe in all four CDR loops were highly specific, while the clones containing either Arg or Trp exhibited high nonspecific binding signals. Taken together, these results show that binding surfaces containing high levels of Tyr are very specific. Surfaces containing Trp limited to CDR-H3 are also fairly specific, but high Trp content across all four randomized CDR loops causes substantial nonspecific binding. Surfaces containing Arg are very nonspecific, as this residue causes nonspecific binding whether it is present in CDR-H3 only or in all four randomized CDR loops.
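The two-level averaging described above (a mean over non-cognate antigens for each clone, then a mean over clones for each group) can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the data layout, function names, antigen panel and ELISA values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def clone_nonspecific_signal(signals: dict, cognate: str) -> float:
    """Mean ELISA signal of one clone over every antigen except its cognate one."""
    off_target = [value for antigen, value in signals.items() if antigen != cognate]
    return mean(off_target)


def group_nonspecific_binding(clones: list) -> dict:
    """Bin clones by the residue combined with Ser, then average the per-clone means."""
    per_group = defaultdict(list)
    for clone in clones:
        per_group[clone["partner"]].append(
            clone_nonspecific_signal(clone["signals"], clone["cognate"])
        )
    return {partner: mean(values) for partner, values in per_group.items()}


# Hypothetical ELISA signals, for illustration only.
example_clones = [
    {"partner": "Tyr", "cognate": "HER2",
     "signals": {"HER2": 2.0, "VEGF": 0.1, "insulin": 0.3}},
    {"partner": "Tyr", "cognate": "VEGF",
     "signals": {"HER2": 0.2, "VEGF": 1.8, "insulin": 0.2}},
    {"partner": "Arg", "cognate": "insulin",
     "signals": {"HER2": 1.0, "VEGF": 1.4, "insulin": 1.9}},
]
print(group_nonspecific_binding(example_clones))
```

Because each clone contributes one value to its group regardless of how many non-cognate antigens were tested, a clone assayed against a larger panel does not outweigh the others.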
Fig. 3 Antibodies from repertoires derived by combining Tyr with different amino acids. (A) Chemical composition of CDR-H3 loops from repertoire H3-YX, in which CDR-H3 loops contained Tyr combined with either Ser, Gly or Ala, and CDR-H1, -H2 and -L3 loops contained Tyr combined with Ser. The following populations and numbers were analyzed: naïve (n = 50, white bars), protein A binding (n = 151, grey bars), antigen binding (n = 197, black bars, Fig. S3, ESI†). Statistically significant deviations (an unadjusted p value < 0.05) from the naïve populations are indicated with an asterisk (*). (B) Chemical composition of antigen-binding sites from repertoire All-YX, in which all four randomized CDR loops contained Tyr combined with either Ser, Gly or Ala. The following populations and numbers were analyzed: naïve (n = 47, white bars), protein A binding (n = 123, grey bars), antigen binding (n = 178, black bars, Fig. S4, ESI†). (C) Relationship between nonspecific binding and the chemical composition of antigen-binding sites. Groups of antibodies with different chemical composition (x-axis) in CDR-H3 (white bars) or the entire antigen-binding site (black bars) were assayed for mean nonspecific binding (y-axis) using a phage ELISA to measure nonspecific binding to a panel of noncognate antigens. The number above each bar indicates the number of clones in the group.
When all four randomized CDR loops were constructed by combining Tyr with Ser, Ala or Gly (repertoire “All-YX”), selection for binding to protein A resulted in the depletion or enrichment of CDR sequences that contained Ala or Gly, respectively, while the abundance of CDR sequences that contained Ser was not changed (Fig. 3B). These results suggest that high densities of Ala in the antigen-binding site tend to destabilize the antibody fold relative to antigen-binding sites containing Ser residues. High densities of Gly may stabilize the antigen-binding site relative to Ser, but it is also possible that the enrichment for sequences that contain Gly is due to higher nonspecific binding of these clones (see below). Following selection for binding to antigens, the sequences of 179 binding clones revealed that the Gly library generated binders against all four antigens, while the Ala and Ser libraries each generated binders against three antigens (Fig. S4, ESI†). However, the Ala library generated a total of only six binding clones, as most of the binding clones were from the Gly or Ser libraries (Fig. 3B). Taken together, these results show that Gly is as effective as, or perhaps even more effective than, Ser for generating functional antigen-binding sites in combination with Tyr. Antigen-binding sites that contain high densities of Ala residues appear to be compromised for stability, and Ala residues are not as effective as Ser or Gly residues for mediating antigen recognition.
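The figure captions state only that deviations from the naïve population were judged at an unadjusted p value < 0.05; the statistical test is not named. One plausible way to test for depletion or enrichment of a library between two populations is a two-sided Fisher's exact test on a 2 × 2 table of clone counts. The sketch below implements that test with the standard library only; the choice of test, the function name and the counts are assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are populations (e.g. naive, selected); columns are clone classes
    (e.g. contains Ala, lacks Ala). Sums the hypergeometric probabilities of
    all tables that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # small tolerance for floating-point ties
    )
    return min(1.0, p_value)


# Hypothetical counts: 15 of 45 naive clones and 10 of 120 selected clones
# come from the library of interest.
p = fisher_exact_two_sided(15, 30, 10, 110)
print(p, p < 0.05)
```

An unadjusted threshold applied to several libraries at once will produce some false positives, so a marginal asterisk in such an analysis is best read alongside the size of the shift.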
For specificity, the addition of Ala to the CDR-H3 loops appears to be detrimental in comparison with Ser (Fig. 3C). Surprisingly, the clones that contain Ala in all four CDR loops appear to be more specific than those with Ala in CDR-H3 only, but this may be due to the limited amount of data since we only isolated six clones of this type. Adding Gly to the CDR-H3 loops increases nonspecific binding, and in this case, the clones that contain Gly in all four CDR loops exhibit even higher nonspecific binding. These results show that Ser-rich binding sites are more specific than Gly-rich binding sites and are at least as specific as Ala-rich binding sites.
Fig. 4 Characterization of anti-HER2 Fabs. (A) Sequences of the heavy chain CDR loops. Residues in grey are at positions that were not diversified in the libraries. Residues at diversified positions are coloured as follows: Tyr (yellow), Ser (red), Gly (green), Ala (blue), Trp (orange). (B) Kinetic analysis by SPR for Fabs binding to immobilized HER2. (C) Epitope mapping. HER2 was captured with the indicated immobilized antibody (x-axis) and, subsequently, phage-displayed Fab was added and simultaneous binding of the two antibodies was detected by phage ELISA (y-axis). The following Fabs were analyzed: Fab-H-YS (white bars), Fab-H-WYS (grey bars), Fab-H-WS (black bars). (D) Flow cytometric analysis of Fabs binding to NR6 cells (grey trace) or stably transfected NR6 cells expressing HER2 (black trace).
We analyzed in further detail those Fabs containing Ser combined with Tyr and/or Trp. In total, we had a panel of 43 antibodies of this type, and we used competitive phage ELISAs to determine affinities for the entire panel (Fig. S1–S4, ESI†). On average, the 12 clones that contained Trp/Ser exhibited the highest affinities (mean IC50 = 1.1 nM), the 18 clones that contained Trp/Tyr/Ser exhibited intermediate affinities (mean IC50 = 4.4 nM) and the 13 clones that contained Tyr/Ser exhibited the lowest affinities (mean IC50 = 10.4 nM). However, the average nonspecific binding activity for the group that contained Trp/Ser was significantly higher than for the other two groups.
We purified the highest affinity Fabs with antigen-binding sites that were derived from diversity restricted to only Tyr/Ser (Fab-H-YS), Trp/Tyr/Ser (Fab-H-WYS) or Trp/Ser (Fab-H-WS) (Fig. 4A). Kinetic analysis of the purified Fabs by surface plasmon resonance (SPR) agreed with the results of the competitive phage ELISA, showing that all three Fabs bind tightly to HER2 and that Fab-H-WS exhibits the highest affinity (Fig. 4B). We also conducted epitope mapping experiments to compare the three Fabs to each other and to 4D5 and 2C4, two anti-HER2 antibodies with non-overlapping epitopes.25–27 A phage ELISA was used to test whether HER2 captured with each immobilized antibody could bind simultaneously to other phage-displayed Fabs. Fab-H-YS, Fab-H-WYS and Fab-H-WS cannot bind simultaneously with one another, but each can bind to HER2 captured by either 4D5 or 2C4, suggesting that the three Fabs recognize a common epitope that is distinct from the epitopes recognized by 4D5 and 2C4 (Fig. 4C). All three Fabs also recognize antigen on cells, as evidenced by flow cytometric analyses showing that the Fabs specifically label cells expressing cell-surface HER2 (Fig. 4D).
Fig. 5 Chemical composition of CDR-H3 loops. Clones from repertoire H3-YSGX were selected for binding to the indicated antigens and were binned into groups on the basis of amino acid types found in addition to Tyr/Ser/Gly in the CDR-H3 loop. The dash (-) indicates CDR-H3 loops that contain only Tyr/Ser/Gly residues. The number above each set of bars indicates the total number of binding clones for each antigen.
All of the libraries are well represented following selection for binding to protein A, indicating that, as expected, all of the different amino acid types are well tolerated in CDR-H3. There appears to be a slight bias in favour of the positively charged Lys/His residues, but this may be due to favourable electrostatic interactions with protein A, which has a net negative surface charge (data not shown). Notably, there is also a high abundance of CDR-H3 sequences composed solely of Tyr/Ser/Gly residues, and these arise from a fraction of each library that does not contain additional chemical diversity.
For clones selected for binding to insulin or IGF-1, there is a strong bias in favour of CDR-H3 sequences that contain Lys residues, and this is likely due to the fact that these antigens have overall negatively charged surfaces (data not shown). We used competitive phage ELISAs to survey affinities and found that most of the clones recognize antigen with low affinity. In the case of IGF-1, only eight of 94 clones exhibited greater than 50% inhibition of binding to immobilized antigen in the presence of 100 nM solution-phase antigen (Fig. S5A, ESI†). The CDR-H3 sequences of these eight clones all contained additional chemical diversity beyond the basic Tyr/Ser/Gly background, and detailed analysis by competitive phage ELISA revealed that the two clones exhibiting the lowest IC50 values contain Lys residues in their CDR-H3 sequences (Fig. 6A). In the case of insulin, 12 of 54 clones exhibited greater than 25% inhibition of binding in the presence of 100 nM solution-phase antigen but none exhibited greater than 50% inhibition (Fig. S5B, ESI†), and thus, we could not determine accurate IC50 values for these low affinity interactions. However, seven of the 12 clones that exhibited greater than 25% inhibition contain Lys residues in their CDR-H3 sequences (Fig. 6B).
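The single-point screen described above reduces to a simple ratio: the ELISA signal measured in the presence of solution-phase antigen is compared with the signal measured without it. The following sketch shows that arithmetic and the 25% and 50% cut-offs quoted in the text. The function names, the optional background subtraction and the example signals are assumptions added for illustration, not details taken from the study.

```python
def percent_inhibition(signal_without_competitor: float,
                       signal_with_competitor: float,
                       background: float = 0.0) -> float:
    """Percent loss of ELISA signal caused by solution-phase antigen.

    Assumption: signals are background-corrected by simple subtraction.
    """
    uninhibited = signal_without_competitor - background
    if uninhibited <= 0:
        raise ValueError("uninhibited signal must exceed background")
    inhibited = signal_with_competitor - background
    return 100.0 * (1.0 - inhibited / uninhibited)


def screen_category(inhibition: float) -> str:
    """Sort a clone using the 25% and 50% thresholds mentioned in the text."""
    if inhibition > 50.0:
        return "greater than 50% inhibition"
    if inhibition > 25.0:
        return "greater than 25% inhibition"
    return "weak or no inhibition"


# Hypothetical optical densities with and without 100 nM competitor.
value = percent_inhibition(2.0, 0.5)
print(value, screen_category(value))
```

For a simple monotonic competition curve, a clone showing more than 50% inhibition at 100 nM competitor would be expected to have an IC50 below roughly 100 nM, which is why the single-point screen serves as a quick filter before full IC50 determination.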
Fig. 6 Sequences and affinities of antigen-binding Fabs. Heavy-chain CDR sequences are shown for the highest affinity Fabs selected from repertoire H3-YSGX for binding to (A) IGF-1, (B) insulin, (C) HER2 or (D) VEGF (Fig. S5, ESI†). Residues in grey are at positions that were not diversified in the libraries. Residues at diversified positions are coloured as follows: Tyr (yellow), Ser (red), Gly (green), additional diversity in CDR-H3 (purple). Binding parameters (kon, koff, Kd) were determined from kinetic analysis of Fabs binding to immobilized antigen by SPR. IC50 values were determined by competitive phage ELISA.
For clones selected for binding to HER2, there is a bias in favour of clones that contain only Tyr/Ser/Gly in their CDR-H3 sequences (Fig. 5), suggesting that additional chemical diversity does not significantly improve binding to this antigen. In this case, we found numerous clones that exhibited almost complete inhibition of binding in the presence of 100 nM solution-phase antigen (Fig. S5C, ESI†), and detailed analysis of 16 clones revealed IC50 values in the low to sub-nanomolar range (Fig. 6C). We purified five of these anti-HER2 Fabs and analysis of binding kinetics by SPR confirmed high affinity binding in the sub-nanomolar range. Two of the four highest affinity Fabs (H-23 and H-35) contain only Tyr/Ser/Gly in their CDR-H3 sequences, thus confirming that high affinity binding to HER2 can be achieved with only this limited chemical diversity.
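For readers less familiar with SPR kinetics, the equilibrium dissociation constant reported alongside the rate constants follows from the standard 1:1 binding relationship Kd = koff / kon. The sketch below applies that relationship and converts the result to nanomolar; the function name and the rate constants are hypothetical and are not values measured in this work.

```python
def dissociation_constant_nM(kon_per_M_per_s: float, koff_per_s: float) -> float:
    """Return Kd in nM from association and dissociation rate constants.

    Assumes simple 1:1 binding, for which Kd = koff / kon (in molar units).
    """
    if kon_per_M_per_s <= 0:
        raise ValueError("kon must be positive")
    kd_molar = koff_per_s / kon_per_M_per_s
    return kd_molar * 1e9


# Hypothetical rate constants: kon = 1e6 per M per s, koff = 5e-4 per s.
kd = dissociation_constant_nM(1e6, 5e-4)
print(f"Kd = {kd:.2f} nM, sub-nanomolar: {kd < 1.0}")
```

Two Fabs with the same Kd can differ markedly in kinetics, so the individual kon and koff values carry information that the ratio alone does not.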
For clones selected for binding to VEGF, no single population dominates, but there is a significant proportion that contains only Tyr/Ser/Gly in CDR-H3 and a slight bias in favour of CDR-H3 sequences that also contain His residues (Fig. 5), suggesting that high affinity binding to VEGF can be achieved with only Tyr/Ser/Gly diversity, but the addition of His residues may improve affinity. As in the case of HER2, many of the anti-VEGF clones exhibited substantial inhibition of binding in the presence of 100 nM solution-phase antigen (Fig. S5D, ESI†), and the IC50 values of the 14 best clones were in the low to sub-nanomolar range (Fig. 6D). SPR analysis of four purified Fabs confirmed that a Fab that contains only Tyr/Ser/Gly (V-8) or a Fab that contains only Tyr (V-11) in the CDR-H3 loop is capable of binding to VEGF with affinity in the single-digit nanomolar range. However, we found that the highest affinity anti-VEGF Fab (V-38) contains two His residues in addition to Tyr residues in its CDR-H3 loop.
In summary, the effect of additional diversity beyond Tyr/Ser/Gly diversity in the CDR-H3 loop depends on the antigen. For the negatively charged antigens IGF-1 and insulin, the addition of positively charged Lys residues into the CDR-H3 loops produces Fabs that dominate in the binding selections and recognize antigen with higher affinities than Fabs that contain only Tyr/Ser/Gly residues in their CDR-H3 loops. In contrast, extremely high affinities can be achieved for binding to HER2 by using only Tyr/Ser/Gly diversity and additional chemical diversity has no appreciable effect on affinity. The case of VEGF appears to be intermediate between these two extremes, because Tyr/Ser/Gly diversity is sufficient to achieve affinities in the single-digit nanomolar range but the highest affinity Fab contains two His residues in its CDR-H3 loop.
Our results also confirm the favourable contributions of Ser residues acting as conformation residues, but in this case, we find that Gly is comparable to Ser for facilitating antigen recognition, albeit with somewhat reduced specificity. These results are also consistent with our previous findings that Gly residues located at key positions within synthetic antigen-binding sites can be crucial for high affinity antigen recognition.11,25 We have reduced the requirements for molecular recognition beyond even our previous minimal system of binary Tyr/Ser diversity9 and have achieved affinities in the single-digit nanomolar range using only Tyr side chains presented by combinations of Tyr/Gly main chains.
Overall, our results with synthetic antigen-binding sites are consistent with studies of natural protein–protein interactions, which have revealed that interfaces are enriched for Tyr, Trp and Arg residues that are often “hot spots” of binding energy.13–16,20,29,30 But our work goes further to provide an empirical assessment of the relative fitness of the various natural amino acids for mediating contacts in molecular recognition. Moreover, while previous studies focused on large residues that mediate intermolecular contacts, our results also highlight the importance of small residues that provide space and conformational flexibility for productive binding. Clearly, cooperation between large and small residues is critical for optimal molecular recognition, as demonstrated by the remarkably tight affinities achieved by our minimalist Fabs.
For the practical purposes of synthetic library design, our results show that Tyr/Ser/Gly diversity is sufficient for generating high affinity antibodies against some antigens, but additional chemical diversity in CDR-H3 is necessary for achieving high affinity for others. Overall, our findings suggest a library design that biases diversity in favour of Tyr/Ser/Gly residues but also adds small quantities of other amino acid types, and indeed, we have recently used this strategy to construct a highly functional synthetic antibody library.11 This library has provided numerous high affinity antibodies against diverse protein antigens and has enabled the development of exquisitely specific antibodies for targeting structured RNA,31 protein post-translational modifications,32 conformational epitopes33 and integral membrane proteins.34 Our new findings will enable further optimization of antibody design to produce synthetic repertoires with recognition capacities beyond the scope of natural repertoires.
Phage from the antibody repertoires were cycled through three rounds of binding selection against antigen or protein A coated on 96-well Maxisorp Immunoplates (NUNC, Rochester, NY), as described.11,35 Clones that bound antigen in phage ELISAs were subjected to DNA sequence analysis.
Footnotes
† Electronic supplementary information (ESI) available: Sequences, affinities and specificities of antigen-binding Fabs. See DOI: 10.1039/b927393j
‡ These authors contributed equally to the work.
§ Current address: Banting and Best Department of Medical Research, Department of Molecular Genetics, and the Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario, Canada, M5S 3E1.
This journal is © The Royal Society of Chemistry 2010