Photoaffinity labeling of transcription factors by DNA-templated crosslinking

A dual-probe system can specifically capture DNA-binding proteins with an unmodified binding site.


Introduction
Transcription factors (TFs) are a major class of DNA-binding proteins that recognize and bind to specific double-stranded DNA sequences. [1][2][3] By binding to DNA, transcription factors modulate the transcription levels of target genes and play central roles in many fundamental biological processes, usually in response to various exogenous and endogenous cellular signals in both healthy and disease states. 1,[4][5][6][7] Consequently, transcription factors have been intensively pursued as drug targets in pharmaceutical research. [8][9][10][11] Characterization of TF-DNA interactions is instrumental in elucidating the regulatory mechanisms of transcription factors. Many methods have been developed to identify the DNA sequences bound by known transcription factors, 12 such as footprinting, 13,14 electrophoretic mobility shift assay (EMSA), 15,16 chromatin immunoprecipitation (ChIP), 17 chemiluminescent pull-down assay, 18,19 protein binding microarray, 20 and HT-SELEX. 21 Once a transcription factor's binding sequence is known, it can be embedded into various probes for sensitive detection, such as the bimolecular proximity assay, [22][23][24][25] proximity-ligation assay, 26,27 nuclease protection assay, 28 transcription factor beacon, 29 fluorescence recovery assay, 30 and enzyme amplification assay. 31 On the other hand, characterization of unknown transcription factors that bind to specific DNA sequences is also highly important. 32,33 As many transcription factors bind to DNA transiently with low affinity, 7 the interactions are often lost during typical affinity purification; therefore, covalent affinity probes equipped with chemical and photo-crosslinkers were developed, serving as a powerful tool to study protein-DNA interactions. 14,34-51 However, since the crosslinker is usually located in the protein-binding site of the affinity probe, it often contributes to or interferes with protein binding (Fig. 1a).
The probe's performance strongly depends on the nature and position of the crosslinker. 38,41,42 Considerable efforts have been made to minimize the crosslinker's impact by screening for the optimal position, 41,43,49,52 adjusting the crosslinker's orientation, 46,47,[52][53][54] and using smaller crosslinking groups. 47 In a recent report, Famulok and co-workers conjugated the crosslinker at the end of the aptamer so that binding interference was avoided. 40 Ideally, an affinity probe should contain a protein-binding site free of modifications, yet still be able to specifically deliver the crosslinker to the proximity of the target protein for effective labeling.
Recently we reported an affinity labeling method for identifying the target proteins of small molecules, in which the functions of target recognition and covalent crosslinking are separated into two probes. 55,56 We reasoned that this strategy may be employed in studying transcription factor-DNA interactions to circumvent the requirement for a crosslinker within the affinity probe. Our design is shown in Fig. 1b: a native, modification-free hairpin DNA containing the bait sequence (shown in red) is used as the "binding probe" (BP). Another DNA modified with a photoreactive 3′-diazirine group serves as the "capture probe" (CP), which also bears a 5′-tag customizable for subsequent analysis (e.g. a fluorophore for in-gel imaging or a biotin group for affinity pull-down). Diazirine has been widely used as the crosslinker in numerous biological applications for its small size, high reactivity, and biocompatibility; 47,57-60 it also exhibits very low non-specific protein crosslinking at moderately elevated salt concentrations. 55,56 After the transcription factor binds to BP, CP hybridizes to the BP DNA and then photo-crosslinks the BP-bound protein under light irradiation. BP is free of any modification so that the original protein-DNA interaction is maintained, while CP is able to deliver the crosslinker close to the target protein for efficient crosslinking.

Results and discussion
We initiated the study with a model transcription factor, p50, a subunit of the nuclear factor-kappa B (NF-κB) transcription factor, 61 which plays key roles in cellular immune responses to stimuli 62 and is implicated in many diseases. 63 First, a binding probe embedded with the p50-binding sequence (p50-BP) and a sequence-complementary capture probe with a 5′-fluorescein tag (FAM-CP) were prepared (Fig. 2a and see details in Fig. S1 †). The mixture of p50-BP, FAM-CP, and p50 was irradiated at 365 nm before denaturing SDS-PAGE analysis (Fig. 2b). Results show that p50 can be specifically labelled by the dual probe (lane 4). Two product bands were observed in lane 4: by comparison with the standard samples in lanes 2 and 3, they are considered to be the p50-CP conjugate (p50-CP, lower band) and the DNA duplex formed by p50-CP and the BP DNA (p50-CP/BP, upper band), as a DNA duplex may partially renature in SDS-PAGE. We have observed and experimentally confirmed this phenomenon previously. 55 Furthermore, little non-specific labeling was observed when additional BSA was added along with the p50 protein (1 eq. in lane 8 of Fig. 2c; 10 eq. in lanes 8 and 9 of Fig. S5 †). Other negative controls (without p50-BP, with a CREB-1-binding BP, and without light irradiation; lanes 5, 6, and 9) also did not give noticeable p50 capture. A FAM-CP with a DNA sequence mismatched to the p50-BP showed a low level of labeling, possibly resulting from a BP-CP duplex partially formed at the incubation temperature (0 °C). In addition, similar labeling specificity was also observed with a 5′-biotin-tagged CP (Fig. S5 †). Collectively, these results demonstrate that the observed p50 labeling requires both a specific protein-DNA interaction and photo-crosslinking mediated by a complementary capture DNA probe.
Next, for comparison, a series of "conventional probes" were prepared with the diazirine crosslinker directly conjugated at the major groove side of the DNA duplex (Fig. S2 †), either inside the p50-binding site (p50-T1, T2), immediately next to it (p50-T3), or one base away from the binding site (p50-T4; Fig. 2c). These probes were subjected to the same p50 labeling procedures as in Fig. 2b. However, in contrast to the dual probe, none of these affinity probes was able to effectively capture p50, whether in buffer, in cell lysates, or in nuclear extracts (Fig. 2c & S6 †). Intrigued by this result, we tested more transcription factors: TATA-binding protein (TBP), 64 Myc-associated factor X (MAX), 65 and CREB1. 66 Matching pairs of TF-BP/FAM-CP and several series of "conventional probes" were prepared for each transcription factor (Fig. 3). These probes were subjected to the same labeling procedures as in Fig. 2 and their performances were compared. First, all BP/CP pairs captured their respective protein targets (Fig. 3; lanes 1 and 2) and also showed specificity similar to the p50 probes (Fig. S7 †). Interestingly, although TBP is known to interact primarily with DNA's minor groove, 67 none of the "conventional probes" (with the crosslinker in the major groove) showed detectable labeling (Fig. 3a; lanes 3-8). In contrast, MAX-T2, which has the diazirine crosslinker immediately next to the binding site, was able to capture the MAX protein (Fig. 3b), whereas MAX-T1 and T3, with the diazirine inside and away from the binding site respectively, showed very little MAX capture. Although MAX and CREB1 are both leucine zipper family proteins and bind DNA's major groove very similarly, 68,69 all CREB1 probes were able to capture the CREB1 protein.
We reason that there may be two underlying reasons for these observations: (i) the diazirine crosslinker may have sterically hindered protein binding, as suggested by several crystal structures of TF-DNA complexes; 70,71 (ii) the specific structure and conformation of the "conventional probes" may not allow for productive crosslinking (e.g., the linker connecting the diazirine to DNA may be too short or lack sufficient flexibility). 43,52 With the dual-probe method, the crosslinker may have better flexibility, and its spatial position can be readily varied to access the protein target without having to be part of the binding probe. To test this, we compared the labeling of p50, MAX, and TBP with BP/CP pairs having different "n values" (n represents the number of protruding or recessing nucleobases after BP/CP hybridization; Fig. 4a). Results show that, in general, capture probes with positive n values gave higher yields than those with negative n values, possibly because protruding bases provide better protein access for the crosslinker (similar to a long and flexible linker). n = 0 appeared to be optimal in most cases (Fig. 4b).
Collectively, these results have demonstrated that the effectiveness of probes with directly conjugated crosslinkers indeed depends on the specific probe structure and the specific protein-DNA interaction, while the dual-probe strategy is more generally applicable, and it has the advantage of having a separate, tuneable, and target-binding independent probe that can effectively capture and label the protein target.
Furthermore, we tested our method with endogenously expressed proteins. Taking advantage of the method's modularity, we paired a 5′-biotin-tagged capture probe with the existing p50-BP so that any p50-BP-binding proteins can be isolated by affinity pull-down. After incubation of these probes in the lysate of p50-overexpressing HEK293T cells, light irradiation at 365 nm, and then ultracentrifugation to remove free probes (MWCO: 50 kDa), the biotinylated species were captured by streptavidin beads. After elution, Western blots with anti-biotin and anti-p50 antibodies showed protein bands matching the expected molecular weight of the p50-CP conjugate (Fig. 5a, lane 1; Fig. 5b, lane 2), which were not observed with a non-p50-binding negative control probe. These probes showed excellent capture specificity in cell lysate, with no significant enrichment of other proteins observed; a few protein bands appeared at high molecular weight in the anti-biotin blot, which may arise from endogenous biotinylated species as they also appeared with the negative control (Fig. 5a, lane 2).
Further, we investigated whether our strategy can be used conversely to select protein-binding sequences from a "DNA-encoded probe library" for a particular transcription factor target, conceptually similar to the selection of DNA-encoded small molecule libraries against protein targets. 56,72-82 Our design is shown in Fig. 6a: a "DNA-encoded probe library" contains many BP/CP pairs with different sequences. The DNA sequence of the TF-binding site (S1) in BP is encoded by the DNA sequence of the CP-hybridization site (S2). Correspondingly, the hybridization site in the complementary CP (S2′) is further encoded by a 3-base sequence (S3) at a distal location. In a library selection, the transcription factor target binds to the BP that contains the matching S1 sequence; this BP then templates target photo-crosslinking with the complementary CP to form the protein-CP conjugate. Therefore, the original target-binding S1 sequence can be decoded by reading the base sequence in the S3 site. To demonstrate this, first, a "probe library" composed of five BP/CP pairs in equal ratio was prepared; in this library, only one BP/CP pair contains the matching p50-binding site, which is encoded by a "TTT" sequence in the S3 site (see details in Fig. S8 †). This probe library was incubated with p50 and irradiated at 365 nm; the p50-CP conjugate generated was gel-purified, PCR-amplified, and then sequenced. Results show that the p50-binding-encoding "TTT" was clearly enriched at the S3 site after selection (Fig. 6b). In a second "probe library", a pair of p50-binding BP/CP, encoded by a "TGC" sequence at the S3 site, was mixed with a 100-fold excess of MAX-binding BP/CP (see details in Fig. S9 †). This library was also selected against the p50 target and again the encoding "TGC" was distinctly enriched (see the ESI for details; Fig. S10 and S11 †).
These selection results suggest that our strategy may be used as a selection method for the identification of target sequences for DNA-binding proteins.
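The decoding step described above (reading the 3-base S3 code from sequenced conjugates and asking which code is enriched) can be illustrated with a short sketch. This is not the analysis pipeline used in this work; it is a minimal illustration that assumes the S3 code sits at a fixed, known offset in every read and that the input fraction of each code in the library is known. The function names `count_s3_codes` and `fold_enrichment`, the `s3_start` parameter, the toy reads, and every code other than "TTT" are hypothetical.

```python
from collections import Counter


def count_s3_codes(reads, s3_start, s3_len=3):
    """Count the S3 code found at a fixed offset in each read.

    Assumption: the code occupies reads[s3_start : s3_start + s3_len].
    Reads that are too short or contain non-ACGT characters there are skipped.
    """
    codes = Counter()
    for read in reads:
        code = read[s3_start:s3_start + s3_len].upper()
        if len(code) == s3_len and set(code) <= set("ACGT"):
            codes[code] += 1
    return codes


def fold_enrichment(codes, input_fractions):
    """Return (fraction after selection) / (fraction in input) per code."""
    total = sum(codes.values())
    result = {}
    for code, frac_in in input_fractions.items():
        frac_out = codes.get(code, 0) / total if total else 0.0
        result[code] = frac_out / frac_in
    return result


if __name__ == "__main__":
    # Toy reads (invented for illustration, not data from this study).
    toy_reads = ["GATTTC"] * 8 + ["GAAAAC"] * 2
    # Five pairs in equal ratio would give an input fraction of 0.2 per code;
    # only two hypothetical codes are tracked here.
    toy_input = {"TTT": 0.2, "AAA": 0.2}
    counts = count_s3_codes(toy_reads, s3_start=2)
    print(fold_enrichment(counts, toy_input))
```

For a library in which the target-binding pair is mixed with a 100-fold excess of a competitor pair, the same calculation would use an input fraction of 1/101 for the target code, so even a modest fraction after selection corresponds to a large fold enrichment.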
Finally, we studied proteins recognizing DNA containing 5-methyl-C (mC) and 5-hydroxymethyl-C (hmC), two important epigenetic marks implicated in gene transcription. 83,84 We prepared binding probes containing mC and hmC sites (mC-BP and hmC-BP; Fig. 7a), respectively, and a control probe without cytosine modification (C-BP). 85 Together with the capture probe (C-CP), these probes were applied to pull-down experiments in HEK293T lysate overexpressing MeCP2, a well-known protein recognizing both of these modifications. 85,86 For mC-BP, Western blots showed specific enrichment of the MeCP2 protein (Fig. 7b, left and middle panels). Importantly, this enrichment was not observed with the control probe C-BP. mC-BP also specifically enriched another band at ~65 kDa, which was detected by the anti-MBD1 antibody; MBD1 is known to bind mC sites on DNA. 87,88 Similarly, for hmC-BP, specific enrichment of MeCP2 was also observed (Fig. 7c). The band at ~40 kDa was tentatively identified as MBD3, another protein reported to bind hmC. 86,89 In addition, pull-down experiments in lysates without protein overexpression identified several other mC- and hmC-binding proteins (Fig. S9 and S10 †).
Collectively, these results have demonstrated that our method may also be extended to study 5-methyl-C and 5-hydroxymethyl-C-binding proteins in epigenetic studies.

Conclusions
In summary, we have developed a dual-probe method for characterizing transcription factor-DNA interactions and proteins recognizing epigenetic marks. By separating target recognition and capture, affinity probes can be feasibly designed to specifically capture and label DNA-binding proteins without affecting the original protein-DNA interactions. Binding probes are completely native DNAs which can be rapidly prepared in large quantity by automated DNA synthesis, making this method potentially suitable for high throughput identification of DNA-binding proteins in genomic studies. 90,91 On the other hand, chip-based large-scale de novo DNA synthesis 92 could be used to prepare probe libraries with diverse sequences, suitable for selections to identify DNA binding sequences for transcription factors and other DNA-binding proteins. Currently our laboratory is actively exploring these opportunities.