Sequence-specific recognition of methylated DNA by an engineered transcription activator-like effector protein †

A 5mC-selective TALE repeat was created by screening a TALE repeat library in which the repeat-variable diresidues (RVDs) and their neighboring residues were randomized. The new repeat discriminated 5mC from C with high selectivity. An artificial TALE containing the new repeat activated an endogenous gene in a manner dependent on the methylation status of the genome.

The methylation-discrimination ability of TALEs has so far been evaluated only in vitro. In cells, methylated DNA regions are often bound by 5mC-binding proteins and tend to form heterochromatin, which may prevent TALEs from binding. It is therefore necessary to evaluate the function of RVD "NG" in living cells. In addition, it has been reported that the methylation-discrimination ability of RVD "NG" is insufficient to regulate TALE binding completely in a methylation-dependent manner [17]. We therefore searched for RVDs that are more specific to 5mC in living cells.
To search for non-native RVDs with the desired base preferences, comprehensive analyses covering all 400 possible amino acid diresidues have previously been performed for A, C, G, and T [18][19][20]. However, those studies did not evaluate the specificities of these RVDs for 5mC, so RVDs that discriminate cytosine methylation may exist among them.2][23] Taking these points into account, artificial TALE repeats with ideal 5mC preferences may be generated by modifying both the RVD and its neighboring residues. In the current study, a TALE repeat library containing randomized amino acids at the RVD and its neighboring residues was screened for 5mC selectivity, and a highly 5mC-selective repeat was identified.
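The combinatorial scale behind these libraries can be made concrete with a little arithmetic. The sketch below assumes that every randomized position samples all 20 standard amino acids, and counts protein-level variants for the 400 diresidue RVDs, for a library randomized at four residues (11–14), and for a two-residue library such as the "XXAA" library described later. The NNK codon figure is a hypothetical illustration only; the source does not state which codon scheme was used.

```python
# Sketch: combinatorial sizes of TALE repeat libraries.
# Assumption: each randomized position samples all 20 standard amino acids.
# The NNK codon scheme below is hypothetical; the source does not specify it.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def protein_library_size(n_random_positions: int) -> int:
    """Distinct protein variants when n positions are fully randomized."""
    return len(AMINO_ACIDS) ** n_random_positions


def nnk_library_size(n_random_positions: int) -> int:
    """DNA-level diversity if each position used an NNK codon (32 codons; hypothetical)."""
    return 32 ** n_random_positions


if __name__ == "__main__":
    print("RVD diresidues (2 positions):", protein_library_size(2))    # 400
    print("Residues 11-14 randomized:", protein_library_size(4))       # 160000
    print("XXAA library (residues 11-12):", protein_library_size(2))   # 400
    print("Residues 11-14, NNK DNA diversity:", nnk_library_size(4))   # 1048576
```

A four-residue library is thus roughly 400 times larger than a diresidue library, which is why a selection step, rather than testing variants one by one, is needed to sample it.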
For the screening of new TALE repeats recognizing 5mC, we developed a modified bacterial one-hybrid (B1H) screen that relies on the Dcm methylation system of E. coli, because this system methylates its target sites reliably. In this screen, binding of a TALE fused to the omega subunit of bacterial RNA polymerase to the reporter vector drives expression of the HIS3 reporter gene and thereby allows the host E. coli to survive [24]. The E. coli Dcm methylation system was used to methylate specifically the cytosine base in the target sequence of the reporter vector. Most E. coli strains contain Dcm methylase, which methylates the second cytosine in the sequences CCAGG and CCTGG [25]. We designed a TALE that targets the DNA sequence 5′-TCCTGGTATATCCCCC-3′, which contains a Dcm methylation site (CCTGG), and denoted it TAL Dcm (Fig. 1B). As expected, reporter vectors extracted from the selection strain were not cleaved by PspGI, a methylation-sensitive restriction enzyme that targets the sequence CCTGG (Fig. S1A, ESI†), indicating that the reporter vectors were methylated as intended. Previous reports indicated that N-terminal repeats are more sensitive to mismatches [26,27]. Therefore, to maximize the effect of the repeat corresponding to 5mC, the Dcm methylation site was placed at the 5′-end of the TAL Dcm target sequence. To select 5mC-specific TALE repeats, a TAL Dcm library was generated by randomizing four residues (the RVD and its neighboring residues) of repeat 2, which corresponds to the Dcm-methylated cytosine (Fig. 1B). B1H screening yielded several TALE repeats, but no specific sequence pattern was apparent among the obtained mutants (Fig. S2, ESI†). Therefore, the DNA-binding preference of each mutant was evaluated individually by luciferase reporter assays in which a TAL Dcm-based transcriptional activator was expressed in HeLa cells. As reporter plasmids, three TAL Dcm binding sites were inserted into the promoter of the luciferase gene, creating 3× TAL Dcm/pGL3. The plasmids were prepared in Dcm(−) and Dcm(+) E. coli strains, yielding unmethylated and methylated second cytosines, respectively, in the 3× TAL Dcm binding sites (Fig. S1B, ESI†).
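The alignment between the Dcm site and repeat 2 can be checked with a short sketch. It assumes the canonical TALE layout (the 5′ T at position 0 is read by the N-terminal region, repeat k reads position k, and the last repeat is a half repeat) and the RVD code listed in Fig. 1B. The function names and the "XXXX" placeholder for the randomized residues 11–14 are hypothetical labels for illustration.

```python
import re
from typing import Dict, List, Optional

# RVD per target base, as listed in Fig. 1B.
RVD_CODE = {"T": "NG", "C": "HD", "A": "NI", "G": "NH"}


def dcm_methylated_positions(seq: str) -> List[int]:
    """0-based top-strand indices of the internal (second) C of each CCWGG Dcm site."""
    # Zero-width lookahead so that overlapping sites would also be found.
    return [m.start() + 1 for m in re.finditer(r"(?=CC[AT]GG)", seq)]


def design_repeats(target: str, overrides: Optional[Dict[int, str]] = None) -> List[str]:
    """One RVD (or repeat placeholder) per repeat; repeat numbers are 1-based.

    Assumes target[0] is the 5' T read by the N-terminal region, so repeat k
    reads target[k] and the last entry is the C-terminal half repeat.
    """
    if not target.startswith("T"):
        raise ValueError("target is assumed to begin with a 5' T")
    repeats = [RVD_CODE[base] for base in target[1:]]
    for number, variant in (overrides or {}).items():
        repeats[number - 1] = variant
    return repeats


target = "TCCTGGTATATCCCCC"                      # TAL Dcm target from the text
sites = dcm_methylated_positions(target)         # [2]
library = design_repeats(target, {2: "XXXX"})    # hypothetical placeholder for residues 11-14
print(sites, len(library), library[:3])          # [2] 15 ['HD', 'XXXX', 'NG']
```

With this indexing, repeat 2 reads target position 2, which is exactly the cytosine that Dcm methylates, and the 15 entries correspond to the 14.5 repeats of TAL Dcm.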
Initially, we evaluated the methylation-discrimination ability of pre-existing RVDs. As expected, TAL Dcm with the C-specific RVD "HD" at repeat 2 showed significantly higher activity for C than for 5mC. In contrast, the C and 5mC reporters were activated to comparable levels when RVD "NG" was used at repeat 2 (Fig. S3A, ESI†). These results agree well with those of an electrophoretic mobility shift assay (EMSA) (Fig. S3B, ESI†), confirming that the discrimination ability of RVD "NG" is not always sufficiently high.
Next, all mutants selected by B1H screening were assessed for their ability to discriminate methylated cytosine by luciferase reporter assays. Some mutants discriminated better than RVD "NG", but their activation levels were unacceptably low (Fig. S4A, ESI†). Intriguingly, three mutants with "QSAA", "RNAA", or "RMAA" repeats, which share the consensus sequence "XXAA", showed relatively high 5mC selectivity. We therefore created an "XXAA" library and screened it by B1H. Several of the resulting TALE repeats discriminated 5mC from C well; among them, the "ASAA" repeat showed the highest activity for the 5mC reporter (Fig. 2 and Fig. S4B, ESI†). TAL Dcm with the "ASAA" repeat activated the luciferase gene in proportion to the methylation percentage of the reporter vectors (Fig. S5, ESI†), indicating methylation-dependent base recognition by the "ASAA" repeat.
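The "XXAA" consensus follows from a position-by-position comparison of the three selective hits (residues 11–14 of repeat 2). The sketch below keeps a residue only where all sequences agree and writes X elsewhere; this strict-identity rule is an assumption, since the source does not say how the consensus was derived. The follow-up library then keeps residues 13 and 14 as Ala and randomizes residues 11 and 12.

```python
from typing import List


def consensus(seqs: List[str], wildcard: str = "X") -> str:
    """Keep a residue where every sequence agrees; otherwise emit the wildcard."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    return "".join(col[0] if len(set(col)) == 1 else wildcard for col in zip(*seqs))


# Residues 11-14 of repeat 2 in the three relatively 5mC-selective mutants.
selective_hits = ["QSAA", "RNAA", "RMAA"]
print(consensus(selective_hits))  # XXAA
```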
The methylation-discrimination ability of TAL Dcm with the "ASAA" repeat in living cells was confirmed by real-time monitoring of luciferase luminescence (Fig. S6, ESI†). At every time point, luciferase activity was higher in cells transfected with the methylated reporter than in those transfected with the unmethylated reporter. In addition, luciferase activity was greatly reduced for a reporter vector containing mutated TAL Dcm binding sites, indicating that introducing the "ASAA" repeat does not impair the overall sequence specificity of the original TALE.
EMSAs also supported the methylation-discrimination ability of the "ASAA" repeat (Table 1). Specifically, TAL Dcm with the "ASAA" repeat showed 1.9-fold stronger binding to 5mC than to C, whereas TAL Dcm with the RVD "NG" repeat showed 1.2-fold stronger binding, although the dissociation constant of TAL Dcm with the "ASAA" repeat for 5mC was higher than that with the RVD "NG" repeat.
Fig. 2 Base specificity of the "ASAA" repeat. Luciferase reporter activities of TAL Dcm having "ASAA" or RVD "NG" at repeat 2 for the reporter vectors with C and 5mC binding sites (blue and orange, respectively). Luciferase activities were normalized to that of TAL Dcm with RVD "NG" for the 5mC reporter.
This journal is © The Royal Society of Chemistry 2016
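A fold preference such as 1.9 is a ratio of dissociation constants, Kd(C)/Kd(5mC), so a repeat can be more selective yet bind more weakly overall, as the "ASAA" repeat does. The sketch below assumes a simple single-site binding model with protein in large excess over probe DNA and fits Kd by a grid search; this is an illustration, not necessarily how the Table 1 values were derived, and the Kd values used to generate the synthetic titrations are made up.

```python
import math
from typing import Sequence


def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Single-site model, assuming protein is in large excess over probe DNA."""
    return protein_nM / (protein_nM + kd_nM)


def fit_kd(protein_nM: Sequence[float], bound: Sequence[float],
           lo: float = 0.1, hi: float = 1e4, steps: int = 4001) -> float:
    """Least-squares Kd via a log-spaced grid search (keeps the sketch dependency-free)."""
    best_kd, best_sse = lo, math.inf
    for i in range(steps):
        kd = lo * (hi / lo) ** (i / (steps - 1))
        sse = sum((fraction_bound(p, kd) - f) ** 2 for p, f in zip(protein_nM, bound))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


def fold_preference(kd_c: float, kd_5mc: float) -> float:
    """Values > 1 mean tighter binding (lower Kd) to 5mC than to C."""
    return kd_c / kd_5mc


# Synthetic, noise-free titrations with made-up Kd values (not data from Table 1).
conc = [5, 10, 20, 40, 80, 160, 320, 640]
kd_c = fit_kd(conc, [fraction_bound(p, 190.0) for p in conc])
kd_5mc = fit_kd(conc, [fraction_bound(p, 100.0) for p in conc])
print(round(fold_preference(kd_c, kd_5mc), 2))
```

Real EMSA data are noisy, so a proper nonlinear fit with error estimates would replace the grid search in practice.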
In mammals, cytosine methylation occurs mainly at CpG dinucleotides. Therefore, to verify whether the "ASAA" repeat can also recognize 5mC in the CpG context, we designed a TALE targeting a CpG methylation sequence (Fig. S7A, ESI†). EMSAs showed that the TALE with the "ASAA" repeat had a lower dissociation constant for the 5mC target than for the C target, whereas the TALE with RVD "NG" in place of the "ASAA" repeat had comparable dissociation constants for the 5mC and C targets (Table 2). These results indicate that the "ASAA" repeat also preferentially recognizes 5mC in the CpG context and discriminates it better than RVD "NG" does. Unfortunately, a TALE with two "ASAA" repeats failed to bind preferentially to target DNA containing two 5mC bases (Fig. S7B and C, ESI†), possibly because the affinity of the "ASAA" repeat for 5mC is not very high.
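When adapting the design to CpG sites, the repeats that face the C of a CpG can be located programmatically. The sketch below uses the same assumed indexing as a standard TALE (the 5′ T at position 0, repeat k reading position k) and the RVD code from Fig. 1B, and places a single "ASAA" repeat, since the two-repeat design did not work. The target sequence and function names are hypothetical; the actual Fig. S7A sequence is not reproduced here.

```python
from typing import List

RVD_CODE = {"T": "NG", "C": "HD", "A": "NI", "G": "NH"}


def cpg_repeats(target: str) -> List[int]:
    """1-based repeat numbers whose base is the C of a CpG lying inside the target.

    Assumes target[0] is the 5' T, so repeat k reads target[k].
    """
    return [k for k in range(1, len(target) - 1) if target[k:k + 2] == "CG"]


def place_variant(target: str, repeats: List[int], variant: str = "ASAA") -> List[str]:
    """Default RVDs everywhere, with the variant repeat at the chosen positions."""
    design = [RVD_CODE[base] for base in target[1:]]
    for k in repeats:
        design[k - 1] = variant
    return design


target = "TACGTTCCAGTCGA"                    # hypothetical target, not the Fig. S7A sequence
sites = cpg_repeats(target)                  # [2, 11]
single = place_variant(target, sites[:1])    # one "ASAA" repeat only
print(sites, single[:3])                     # [2, 11] ['NI', 'ASAA', 'NH']
```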
Finally, we explored whether the "ASAA" repeat can regulate endogenous gene expression depending on the methylation status of genomic DNA. A TALE targeting the endogenous gene encoding Ras association domain-containing protein 2 (RASSF2) was designed and denoted TAL RASSF2. RASSF2 functions as a tumor suppressor gene, and the RASSF2 protein induces apoptosis in tumor cells [28,29]. In many colorectal tumor cell lines, the promoter region of RASSF2 is highly methylated, making RASSF2 hypermethylation a potential marker for cancer diagnosis [30]. SW480 and HCT116 cells have been reported to differ in RASSF2 methylation status [28]. Bisulfite sequencing showed that RASSF2 was highly methylated in SW480 but not in HCT116 cells (Fig. S8, ESI†). TAL RASSF2, which targets a region of the RASSF2 promoter that includes CpG dinucleotides, was fused to a transcriptional activator, p300 histone acetyltransferase, so that binding of TAL RASSF2 to the target genomic region would activate RASSF2 (Fig. 3A). TAL RASSF2-p300 was expressed in SW480 and HCT116 cells, and RASSF2 mRNA levels were evaluated by RT-qPCR. With the "ASAA" repeat, significant gene activation was induced only in SW480 cells, which have a highly methylated RASSF2 promoter (Fig. 3B). This result was not due to differences in cell type or chromatin state, because TAL RASSF2 with RVD "NG" in place of the "ASAA" repeat activated the gene in both cell types (Fig. S9, ESI†). These results suggest that the "ASAA" repeat conferred selective binding to 5mC.
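Bisulfite sequencing distinguishes the two states because bisulfite deaminates unmethylated cytosine, which is read as T after PCR, while 5mC remains C. The minimal sketch below scores each CpG of a reference as percent methylated across reads; it assumes the reads are already aligned to the top strand without gaps, and the reference and reads are hypothetical. The source does not describe the authors' analysis pipeline, and real analyses also check conversion efficiency at non-CpG cytosines.

```python
from typing import Dict, List


def cpg_sites(ref: str) -> List[int]:
    """0-based indices of the C in each CpG of the reference (top strand)."""
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_levels(ref: str, reads: List[str]) -> Dict[int, float]:
    """Percent methylation per CpG from bisulfite reads aligned gap-free to `ref`.

    Unmethylated C reads as T after bisulfite treatment and PCR; 5mC stays C.
    Bases other than C/T at a CpG are treated as uninformative and skipped.
    """
    levels: Dict[int, float] = {}
    for pos in cpg_sites(ref):
        calls = [read[pos] for read in reads if read[pos] in ("C", "T")]
        if calls:
            levels[pos] = 100.0 * calls.count("C") / len(calls)
    return levels


ref = "ACGTTCGA"                               # hypothetical amplicon
reads = ["ACGTTCGA", "ATGTTCGA", "ATGTTTGA"]   # hypothetical clone sequences
print(methylation_levels(ref, reads))
```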
In conclusion, we created a highly 5mC-selective TALE repeat by Dcm methylation-dependent B1H screening of a TALE library. The new repeat showed high 5mC discrimination ability even for genomic DNA. One reason for the success of the screen for 5mC may be the expanded randomization strategy, which included residues neighboring the RVD. Although further improvement in affinity is desirable, the new repeat enables TALEs to be used as an easily designable tool for detecting 5mC at user-defined sites in living cells. For example, fluorescently labeled TALEs have been used to visualize endogenous sequences to study chromatin dynamics [31,32]; with 5mC-selective TALEs, time-lapse observation of the methylation status at specific sites can be realized.4][35] Thus, in contrast to existing 5mC identification methods, the 5mC-selective TALE repeat makes methylation status-dependent gene regulation possible. In addition, there are many kinds of modified nucleobases besides 5mC [36,37], and our strategy for obtaining new functional TALE repeats should be applicable to them. This study should provide new ways of exploring the biological functions of 5mC and other modified nucleobases.
We thank Feng Zhang for the plasmids used to construct the TALEs, Warner Greene for plasmids, Scot Wolfe for plasmids and cells for B1H screening, and Hiromu Suzuki for the SW480 and HCT116 cells. This work was supported in part by JSPS KAKENHI 16H03281 (M.I.) and 15J09770 (S.T.), JST CREST and the Naito Foundation.

Fig. 1 Schematic representation of the bacterial one-hybrid screening for 5mC-specific TALE repeats. (A) DNA recognition mode of a TALE repeat. The structure (PDB: 4GJP) shows an interaction between the RVD "NG" repeat and 5mC. Four amino acid residues (11–14), including the RVD, are shown in red. (B) The TALE-ω fusion protein targets the sequence containing the Dcm-methylated cytosine (red) in the promoter of the HIS3 reporter. The TALE contains 14.5 repeats with RVDs "NG", "HD", "NI", and "NH" for T, C, A, and G recognition, respectively. Target DNA sequences in the reporter vector are aligned to the TALE repeats. Four amino acid residues (11–14) of repeat 2 were randomized and are shown as XXXX.

Fig. 3 Methylation status-dependent activation of an endogenous gene by TAL RASSF2-p300. (A) Design of TAL RASSF2-p300 targeting the RASSF2 promoter region. The target sequence of TAL RASSF2 is highlighted in light blue; the target 5mC is colored red. (B) Relative expression levels of RASSF2 mRNA in HCT116 and SW480 cells 24 h after transfection with TAL RASSF2-p300 ("ASAA"), examined by RT-qPCR. Expression levels were normalized to those of GAPDH. Data are expressed as means ± SD; n = 3; *P < 0.05.
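The caption states that RASSF2 mRNA levels were normalized to GAPDH. One common way to do this is the Livak 2^−ΔΔCt method, sketched below under the assumption of roughly 100% amplification efficiency. Whether the authors used exactly this calculation, and which sample served as the control, is not stated, so both are assumptions; the Ct values are hypothetical.

```python
import statistics
from typing import List, Tuple


def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change by the 2^-ddCt method, assuming ~100% amplification efficiency."""
    d_ct_sample = ct_target - ct_ref              # e.g. RASSF2 Ct minus GAPDH Ct
    d_ct_control = ct_target_ctrl - ct_ref_ctrl   # same difference in the control sample
    return 2.0 ** -(d_ct_sample - d_ct_control)


def mean_sd(values: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of replicate fold changes."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical Ct values: dCt of 8 in the sample vs 10 in the control.
print(relative_expression(26.0, 18.0, 28.0, 18.0))   # 4.0
```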

Table 1 Kd values of 11.5TAL Dcm (a) having "ASAA" or RVD "NG" at repeat 2
(a) Because of the difficulty in purifying the proteins, EMSAs were performed using 11.5TAL Dcm, obtained by truncating three C-terminal repeats from TAL Dcm.
(b) Determined by EMSA.

Table 2
a Determined by EMSA.