Pu Guo a, Abdelbasset A. Farahat ab, Ananya Paul a, David W. Boykin a and W. David Wilson *a
aDepartment of Chemistry, Center for Diagnostics and Therapeutics, Georgia State University, 50 Decatur St SE, Atlanta, GA 30303, USA. E-mail: wdw@gsu.edu; Tel: +1 404-413-5503
bDepartment of Pharmaceutical Organic Chemistry, Faculty of Pharmacy, Mansoura University, Mansoura 35516, Egypt
First published on 2nd November 2021
This report describes a breakthrough in a project to design minor groove binders to recognize any sequence of DNA. A key goal is to invent synthetic chemistry for compound preparation to recognize an adjacent GG sequence that has been difficult to target. After trying several unsuccessful compound designs, an N-alkyl-benzodiimidazole structure was selected to provide two H-bond acceptors for the adjacent GG-NH groups. In the resulting compound, DB2831, flanking thiophenes provide a preorganized structure with strong affinity, and the structure is terminated by phenyl-amidines. The experimental binding results for DB2831 with a target AAAGGTTT sequence were successful and include a high ΔTm, biosensor SPR with a KD of 4 nM, a similar KD from fluorescence titrations and supporting competition mass spectrometry. MD analysis of DB2831 bound to an AAAGGTTT site reveals that the two unprotonated N of the benzodiimidazole group form strong H-bonds (based on distance) with the two central G-NH while the central –CH of the benzodiimidazole is close to the –CO of a C base. These three interactions account for the strong preference of DB2831 for a -GG- sequence. Surprisingly, a complex with one dynamic, interfacial water is favored with 75% occupancy.
As part of a major effort to re-design AT-specific heterocyclic cations for recognition of G·C bps as well as A·T bps we now have modules that strongly and specifically bind single G·C bps in an AT sequence context.2 Such new compounds have broad potential applications, for example, in targeting transcription factors (TFs) to modulate gene expression.3 Mutations, aberrant regulation, or other disorders that modify the activity of TFs lead to a number of different kinds of diseases.4 Given the variety of roles and diseases controlled by TFs, a strong interest in targeting them to modulate their activity with small molecules has developed.4,5 A problem with this approach is that TFs have evolved to bind high molecular-weight nucleic acids but not, typically, small molecules.5 Binding sites on TFs that interact strongly and specifically with small molecules are difficult to find and TFs are often defined as “undruggable”.5 We are pursuing an entirely different approach to target TF–DNA complexes by designed agents to bind to specific DNA sequences and to subsequently inhibit the promoter sequence interactions and functions of specific TFs.
As an example, expression of the TF PU.1 is frequently impaired in patients with acute myeloid leukemia (AML).3,6 We developed a series of PU.1 inhibitors based on the AT-rich 5′-flanking sequence of many conserved PU.1 promoter sequences, which have a central -GGAA-.3,7 As ligands and PU.1 do not directly compete for the same DNA binding site, inhibition of major groove binding TFs by minor groove binding small molecules is a multipart mission.3,7 To improve our PU.1 targeting ability, additional GC-specific compounds that can recognize a broad selection of DNA sequences throughout the PU.1 promoter as well as promoter sequences for other TFs (Fig. 1A and B) are needed. Binding specifically to such sequences will be very useful in biotechnology and in the design of new therapeutic agents.
Currently, for example, we lack specific binders which can target the conserved -GGAA- promoter binding site of PU.1. To bind strongly and specifically to different, multiple GC-containing sites, new types of cell-permeable DNA binding agents must be engineered. The conserved GGAA site is a logical next step in ETS protein targeting. The success of single G·C bp binders gives us a powerful combination method to design new types of compound structures to match the requirements of multiple G·C bps recognition.
The PU.1 recognition sequence has a number of sites with different GC-containing sequences. The central λB PU.1 promoter sequence is a typical 5′ AT-rich PU.1 promoter sequence. It has not previously been possible to prepare heterocyclic organic cations that target the GG sequence. Targeting the central -GGAA- component of the PU.1 promoter addresses the key site for inhibition of ETS TF binding and is essential for complete PU.1 promoter recognition. One way to target the -GGAA- binding site is to design or combine two single G·C bp recognition units in close proximity to bind the two adjacent GC bps with flanking AT sequences.
A new design concept was used to engineer DB2830 (Fig. 1B) with two directly linked thiophene-N-i-Pr-BI modules. The compound, however, has relatively weak binding with the test GG sequence, AAAGGTTT (Fig. 1B). A curvature evaluation mechanism (described below) suggested that the DB2830 structure was too curved to bind optimally in the DNA minor groove. Considering the curvature issue of the molecules and the short distance between two G·C bps, we introduced the benzodiimidazole structure in DB2831 to provide two H-bond acceptors for G-NH2 with the appropriate curvature for minor groove recognition. The compound was successful with strong and specific binding to the GG sequence. The effects of N-substituents, amidines on different aromatic groups, and chloro-substituents were also investigated by modifying the DB2831 chemical structure and are also reported here. We have used the hydrophobic N-substituents to potentially enhance the cell uptake as these compounds are targeted to inhibit the PU.1 transcription factor. The compounds are already relatively polar with two charges and have low KD values.
The new benzodiimidazole DNA recognition structure is based on the single GC binding modules in DB2429 and DB2457 (Fig. 1A), which have a thiophene-N-alkyl-BI interaction that provides a recognition unit best characterized as a σ-hole, a successful tool for designing G-NH2 minor groove binding modules.2 Compounds that incorporate the σ-hole motif (thiophene-N-R-BI) are a significant step forward in our molecular design and synthesis project for the recognition of mixed bp DNA sequences.2 The σ-hole interaction preorganizes the thiophene-N-R-BI unit for GC interaction in the minor groove. The σ-hole module is regarded as an essential component of the thiophene-N-R-BI type G·C bp binders and is part of the benzodiimidazole module.
Scheme 1 Reagents and conditions: (a) amines/EtOH, rt; (b) NaBH4, Pd(C), CH2Cl2, MeOH; (c) Na2S2O3/DMSO, 130 °C; (d) (i) LiN(TMS)2/THF, (ii) HCl/E. |
DB2830, a linked benzimidazole–thiophene, was designed to target the sequence of two adjacent G·C bps with two adjacent single G·C binding modules. This planar structure is a new development in recognition of the DNA minor groove (Fig. 1B). The compound, however, shows only moderate binding affinity for our desired DNA binding site, AAAGGTTT (ΔTm = 6 °C), because its curvature is excessive for the minor groove. To reduce the curvature of the compound for a better match to the minor groove surface, an entirely new structure with a more planar core, the benzodiimidazole-bisthiophene DB2831, was designed and synthesized (Fig. 1B). The benzodiimidazole-bisthiophene, with a central fused ring, is a new idea for DNA recognition, especially minor groove binding. DB2831 strongly stabilizes AAAGGTTT (ΔTm = 10 °C) and AAACGTTT (ΔTm = 10 °C). In addition, DB2831 only weakly stabilizes both pure AT and single G·C bp containing sequences (Table 1), illustrating the excellent sequence selectivity of the compound.
Compound | AAATTT | AAAGTTT | AAAGCTTT | AAACGTTT | AAAGGTTT | AAAGTGTTT |
---|---|---|---|---|---|---|
DB2830 | <1 | 5 | 2 | 4 | 6 | 3 |
DB2831 | 1 | 3 | 4 | 10 | 10 | 2 |
DB2833 | 1 | <1 | 1 | 4 | 4 | 1 |
DB2834 | <1 | 2 | 1 | 5 | 5 | <1 |
DB2835 | <1 | <1 | <1 | 4 | 2 | 1 |
DB2836 | 1 | 2 | 2 | 8 | 8 | 1 |
DB2838* | <1 | 1 | <1 | 1 | <1 | 1 |

a Values are ΔTm in °C, where ΔTm = Tm (the complex) − Tm (the free DNA). 3 μM DNA sequences were studied in Tris–HCl buffer (50 mM Tris–HCl, 100 mM NaCl, 1 mM EDTA, pH 7.4) with the ratio of 2:1 [ligand]/[DNA]. An average of two independent experiments with a reproducibility of 0.5 °C. Full DNA sequences as described in Fig. 1C.
We have previously observed that, in single G·C bp binding sequences, bulky N-alkyl substituents always facilitate the sequence selectivity for comparatively wider minor groove sequences.2 With this idea, an isobutyl derivative of DB2831, DB2833, was prepared to determine the N-alkyl substitution effect on binding. Surprisingly, this bulky substitution caused a marked decrease in binding affinity for DB2833 (Table 1 and Fig. 1B) and this substitution was discontinued.
Truncated compounds with terminal thiophene amidines (DB2834 and DB2835) were synthesized and tested to see how the molecular size of the compound affects sequence binding affinity and selectivity. The low ΔTm values indicate that terminal phenyl groups play significant roles for DB2831 affinity. A 2-Cl phenyl amidine proved to be an effective modification to increase the binding specificity of single G·C binders.2 For this reason, DB2836 was designed and synthesized and showed strong binding to AAAGGTTT (ΔTm = 8) with excellent selectivity. DB2838 with N–Me substituents was also prepared and tested, but that compound provided the surprising result that no binding was detected with the selected sequences. Analysis with organic solvents suggested extensive aggregation of this compound under the experimental conditions accounting for the lack of binding.
Compound | AAAGTTT | AAAGGTTT | AAAGTGTTT |
---|---|---|---|
DB2830 | 571 | 553 | NB |
DB2831 | NB | 2 | NB |
DB2833 | NB | NB | NB |
DB2834 | NB | 286 | NB |
DB2835 | NB | NB | NB |
DB2836 | NB | 62 | NB |
DB2838 | NB | NB | NB |

a Values are KD in nM. All the results in this table were obtained in Tris–HCl buffer (50 mM Tris–HCl, 100 mM NaCl, 1 mM EDTA, 0.05% P20, pH 7.4) at a 100 μL min−1 flow rate. NB means no measurable KD under our experimental conditions, see Fig. 2D and E for examples. The listed binding affinities are an average of two independent experiments carried out with two different sensor chips and the values are reproducible within 10% experimental errors. Full DNA sequences as described in Fig. 1C.
Fig. 2 (A and B) SPR sensorgrams (color) and global kinetic fits (black overlays) for DB2831 with the AAAGGTTT DNA hairpin sequence at 100 mM and 200 mM NaCl; the concentrations of DB2831 in these SPR experiments are 5–30 nM from bottom to top. (C) Salt dependence of KA for DB2831 binding as determined by SPR. The KA values were obtained by 1:1 kinetic fitting; (D and E) representative SPR sensorgrams for DB2831 in the presence of AAAGTTT, and AAAGTGTTT hairpin DNAs. The concentrations of DB2831 in these SPR experiments are 5–500 nM from bottom to top. Full DNA sequences as described in Fig. 1C. |
The SPR binding results revealed that DB2831 has an optimized size and curvature for selective recognition of, and strong affinity for, two adjacent G·C bps in an A-tract sequence.
Biosensor-SPR experiments are well-suited for the kinetic and thermodynamic analysis of many types of interactions. The main limitation of this method is mass transfer for tightly bound ligand–DNA interactions, as observed for the DB2831–AAAGGTTT complex at 100 and 200 mM NaCl (Fig. 2). Three difficulties have been observed for our compound of interest, DB2831: (i) mass transfer limits on kinetics, where the rate of transfer of components from the injected solution to the immobilized component is slower than the association reaction,12 (ii) very slow apparent dissociation rates due to rebinding during the dissociation phase, and (iii) limited time for the association reaction due to volume limitations in the injection syringe. To overcome the mass transfer problem for the DB2831–AAAGGTTT complex, we conducted SPR experiments at different salt concentrations (from 100 to 400 mM NaCl) at 25 °C (Fig. 2A, B and S1†). The equilibrium association constants (KA) obtained by 1:1 kinetic fits also point to relatively rapid dissociation of this complex at higher salt concentrations. According to counterion condensation theory,13 the logarithm of the equilibrium binding constant KA (=1/KD, from kinetic fits) should be a linear function of the logarithm of the salt concentration.13 The KA values decrease significantly as the salt concentration increases, as is typical for DNA–cation complexes.13 The slopes of the linear fits are ∼1.8, which is reasonable for a dication in DNA complex formation. The number of phosphate contacts (Z) between DB2831 and the hairpin duplex DNA can be obtained from slope/Ψ (Ψ = fraction of phosphate charge shielded by condensed counterions, 0.88 for double-stranded B-DNA),14 and this gives a Z of 2 ± 0.2. Thus, there are two phosphate contacts between DB2831 and DNA, which is a very realistic value for this dicationic molecule.
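The salt-dependence analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the KD values are synthetic, generated to follow a log–log slope of −1.8 from an arbitrary starting value, and are not the measured data. The function names `fit_line` and `phosphate_contacts` are hypothetical helpers. Only Ψ = 0.88 and the relations KA = 1/KD and Z = slope/Ψ come from the text.

```python
import math

PSI = 0.88  # fraction of phosphate charge shielded by counterions (value quoted in the text)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def phosphate_contacts(salt_molar, kd_molar, psi=PSI):
    """Return (|slope| of log KA vs log [salt], Z = |slope| / psi)."""
    log_salt = [math.log10(c) for c in salt_molar]
    log_ka = [math.log10(1.0 / kd) for kd in kd_molar]  # KA = 1/KD
    slope, _ = fit_line(log_salt, log_ka)
    return abs(slope), abs(slope) / psi

# Synthetic KD values (M) built to follow a slope of -1.8; NOT experimental data.
salt = [0.1, 0.2, 0.3, 0.4]                    # NaCl concentration, M
kd = [2e-9 * (c / 0.1) ** 1.8 for c in salt]   # arbitrary anchor at 0.1 M
slope, z = phosphate_contacts(salt, kd)
print(f"slope = {slope:.2f}, Z = {z:.2f}")
```

With a slope magnitude of 1.8, Z = 1.8/0.88 ≈ 2.05, which matches the reported Z of 2 ± 0.2. The sign of the fitted slope is negative because KA falls as salt increases, so the sketch reports its magnitude.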
The two thiophenes cause a second problem with DB2831 because of their binding to the sensorchip surface. Due to this problem, we could only work with these compounds at low concentrations with biosensor chips (Fig. 2). The sensorgrams become increasingly distorted as the concentration is increased above 30 nM for DB2831.
Removal of the terminal phenyl rings of DB2831 (DB2834 and DB2835) causes a large decrease in binding (KD = 286 nM for DB2834 and no measurable binding for DB2835). The initial compound, DB2830, also binds weakly to AAAGGTTT (KD = 553 nM), in agreement with the thermal melting results. The replacement of i-Pr with i-Bu substituents (DB2833) causes a considerable reduction in binding ability, in agreement with the Tm results. DB2836, with a –Cl modification at the ortho position of the amidines, keeps the excellent binding specificity, though the binding affinity for AAAGGTTT, KD = 62 nM, is reduced.
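To put these affinity differences on an energy scale, the KD values can be converted to standard binding free energies with ΔG° = RT ln KD. The sketch below is a hedged illustration rather than part of the reported analysis. It assumes a 1 M standard state and 298.15 K (the SPR experiments were run at 25 °C), and the helper name `binding_free_energy` is hypothetical. The KD values are those listed above for the AAAGGTTT sequence.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def binding_free_energy(kd_nanomolar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from KD, assuming a 1 M standard state."""
    kd_molar = kd_nanomolar * 1e-9
    return R_KCAL * temperature_k * math.log(kd_molar)

# KD values (nM) for AAAGGTTT as listed in the SPR results.
kd_values = {"DB2830": 553, "DB2831": 2, "DB2834": 286, "DB2836": 62}

for name, kd in kd_values.items():
    dg = binding_free_energy(kd)
    print(f"{name}: KD = {kd} nM, dG = {dg:.1f} kcal/mol")
```

On this scale, each tenfold change in KD corresponds to roughly 1.4 kcal/mol at 25 °C, so the gap between DB2831 and the truncated or more curved analogs is a few kcal/mol.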
Fig. 3 Fluorescence emission spectra for 10 nM DB2831 titrated with sequence AAAGGTTT (A) or AAAGTTT (C) in TNE100 buffer at 25 °C. The excitation wavelength is 390 nm. The slit widths are [20, 20 nm]; fluorescence binding curves between 10 nM DB2831 and the tested sequences in TNE100 buffer, used to determine the equilibrium constants. Full DNA sequences as described in Fig. 1C. |
Fig. 4 Circular dichroism spectra for the titration of representative compounds, DB2831, DB2834 with a 5 μM AAAGGTTT sequence in TNE100 buffer at 25 °C. Arrows indicate the changes. Full DNA sequences as described in Fig. 1C. |
Analysis of a range of strong binding minor groove binding compounds by this method provides a calibration value of around 140° ±5° for compounds to bind strongly in the DNA minor groove. As can be seen, DB2831 falls in this range while DB2830 is too curved to make optimum contacts with the groove. While this is a relative comparison, it is a useful number to determine before synthesis of a new compound. It is also helpful in relative comparison of compound-minor groove binding constants. It should be noted that the optimum value for curvature is a range since both DNA and the compound can make structural changes to optimize the binding energetics on complex formation.
Phenyl –CH interactions with dT and dC –CO and dA-N3 groups provide some additional stability for the ligand–DNA complex. No 180° rotational motions are observed for the phenyl groups of DB2831 throughout the 600 ns MD simulation, as expected for the optimum indexing of the compound. DB2831 tracks optimally along the minor groove with an appropriate twist to match the minor groove curvature. The strong binding results from the H-bond network between the compound and DNA as well as electrostatic and van der Waals interactions. Extensive interactions are formed by the conjugated aromatic system of DB2831 with the sugar–phosphate walls of the minor groove. There is also an extensive terminal amidine–water network linking the compound to the floor of the minor groove.
The ETS family is attractive for inhibitor development because many members of the family are well-characterized and have important functions in cell biology and human diseases. Several key promoters of the PU.1 ETS TF have AT sequences on the 5′ side of the -GGAA- central, conserved recognition site.7 The AT sequence is targeted by many known minor groove binders from netropsin to synthetic heterocyclic cations such as Hoechst dyes and heterocyclic diamidines. In an exciting development with new synthetic diamidines that have extended AT recognition sequences, we have found PU.1 inhibitors active at the cellular level, including against AML cells, as well as against an animal model of AML.3,7 With this approach to disease treatment, the application potential of the heterocyclic diamidine platform is greatly extended. To reach the full potential of diamidines, however, it is essential to expand their sequence recognition capability past pure AT sequences. To do this, an entirely new class of minor groove binders that can recognize mixed base-pair sequences, including the ETS -GGAA- promoter sequence,2 is required. To accomplish this goal, we have initiated a project to add new mixed bp DNA binding motifs with variations in solubility, chemical properties, and cell uptake properties to provide the best chance of successful cellular TF inhibition.
Our design motif is built around heterocyclic diamidines that have good solubility, cell uptake, and reasonable synthesis. DB2429, DB2457 (Fig. 1A), and analogs were successful thiophene compounds in recognition of AT sequences with a single G·C bp.2 In those compounds, the thiophene-N-alkyl-BI motif formed a preorganized sigma-hole stabilized conformation for minor groove-specific binding and that concept has formed the basis of the new compounds described here.2
With the successful preparation of new synthetic agents that recognize the AT sequence of the PU.1 promoter,7 the next most important sequence for the design of new compounds was the central, conserved 5′-GGAA-3′ and closely flanking regions. The key to strong selective binding to this sequence for minor groove agents is the GG unit that has proven very difficult to target. For this recognition, two GC binding modules must be linked very close together. After evaluation of all of the successful, single G·C bp binding modules previously prepared, the seven compounds of Fig. 8 (from DB2818 to DB2824) linked through the benzimidazole nitrogen that faces away from the amidine were prepared. The goal with these compounds was to maintain the alkyl-benzimidazole-thiophene G·C bp recognition module in an arrangement that could stack together in the minor groove to bind to the two adjacent G·C bps unit with flanking A·T bps recognized by the terminal, substituted phenyl groups. Surprisingly, all of these compounds had very poor solution properties with extensive aggregation in aqueous solutions that prevented their successful use. Extensive compound and solution variations were not successful in providing useful non-aggregated compounds.
Fig. 8 Compounds with two linked benzimidazoles, designed to form a stacked complex that binds two adjacent GC bps. |
With the failure of this design concept, new ideas were evaluated and DB2830, with two alkyl-BI thiophene amidines in close proximity, was prepared (Fig. 1B) for adjacent G·C bp binding. Although the compound was the first of our diamidines to successfully recognize adjacent GC bps, the binding was weaker than desired for cellular use. As noted in the Results section, the compound is too curved for best fit to the minor groove shape and new compound structures were designed. DB2831 is a more extended and less curved structure than DB2830. There are no known benzodiimidazole minor groove binders, but modeling studies suggested that this group could be a key element in minor groove binding, particularly at GG sites. A successful synthetic strategy was designed for DB2831 and analogs (Fig. 1B), and the compound displayed excellent affinity and specificity for the target 5′-AGGAA sequence of the PU.1 promoter. As described in the Results section, DB2831 had close to the ideal curvature for minor groove binding and represents a breakthrough in our design efforts.
A summary of the primary experimental results for DB2831 includes a high ΔTm with the test -AAGGTT- sequence. Biosensor SPR studies support the strong binding of DB2831 to the test sequence with a KD of (2 ± 2) nM. This value is confirmed by a similar KD from fluorescence titration experiments. Competition mass spectrometry results also support strong binding and very clearly demonstrate the excellent binding selectivity of DB2831 to -AAAGGTTT-. A minor groove binding mode is indicated by CD and modeling studies. The minor groove binding is also expected from the compound structure and DNA sequences to which it binds.
MD analysis of DB2831 bound to an AAAGGTTT site reveals some very interesting features of the complex that are difficult to obtain from experimental analysis. The compound fits well between the walls of the minor groove and is able to twist to match the groove curvature (Fig. 7). The fit to the floor of the minor groove is more complex. The two unprotonated N of the N-isopropyl-benzodiimidazole group form strong H-bonds (based on distance) with the two central dG-NH (dG6 and dG7) that project into the minor groove. In addition, the central –CH of the benzodiimidazole that faces into the groove is close to the –CO of dC19, which H-bonds to dG6, and forms a stabilizing interaction. These three interactions between the benzodiimidazole and two G·C bps account for the strong preference of DB2831 for the minor groove of a -GG- sequence. Throughout the MD simulation, the benzodiimidazole and two thiophene groups remain in close proximity to the floor of the minor groove. The two sulfur atoms of the thiophenes are an average of 3.3 ± 0.2 Å from the floor of the groove and the two AT base pairs that are adjacent to the -GG- sequence. As shown in the models in Fig. 7, the terminal phenyl amidines of DB2831 are much more dynamic than the thiophene–benzodiimidazole–thiophene center of the bound molecule. A complex without any interfacial water involvement is formed with two inner facing amidine –NH groups forming H-bonds with the dTO of dT9 and dT21. While this complex would seem to be the optimum, surprisingly it is found in only approximately 10% of the simulation. It clearly is not the minimum Gibbs energy state. A complex with one interfacial water with a dynamic amidine (–NH)–water–dTO interaction is the most favored with a 75% occupancy. The final complex has interfacial water molecules at both amidine groups to link the amidines to –dTO groups. This complex has approximately 15% occupancy and is closer in Gibbs energy to the complex with no interfacial water.
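The occupancy argument above can be made quantitative with a Boltzmann estimate, ΔG = RT ln(p_max/p), relative to the most populated state. The following is a minimal sketch, not the analysis used in the paper. It assumes the simulation samples the three hydration states at equilibrium and uses 298.15 K as an assumed temperature, since the simulation temperature is not stated in this section. The state labels and the helper name `relative_free_energies` are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def relative_free_energies(occupancy, temperature_k=298.15):
    """Free energy of each state (kcal/mol) relative to the most populated state."""
    p_max = max(occupancy.values())
    rt = R_KCAL * temperature_k
    return {state: rt * math.log(p_max / p) for state, p in occupancy.items()}

# Approximate occupancies quoted in the text for the three hydration states.
occupancy = {"no water": 0.10, "one water": 0.75, "two waters": 0.15}

dg = relative_free_energies(occupancy)
for state, value in dg.items():
    print(f"{state}: {value:.2f} kcal/mol above the most populated state")
```

Under these assumptions the no-water and two-water complexes both lie roughly 1 kcal/mol above the one-water complex and within about 0.25 kcal/mol of each other, consistent with the statement that the two-water complex is closer in Gibbs energy to the complex with no interfacial water.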
Surprisingly, the thiophene-benzodiimidazole-thiophene core of the complex remains in a very stable position throughout the simulation. The flexibility to allow 0, 1, or 2 interfacial waters of interaction comes from rotations about the single bonds linking the terminal phenyl groups to the thiophene and amidine groups. It seems clear that dynamic, terminal interfacial water molecules can cost a minimum amount of entropy of complex formation while allowing stronger compound–DNA interactions than in their absence. A challenge for drug design will be to determine how to incorporate this type of dynamic water interaction into design efforts.
Previously, two of the N–MeBI-thiophene-phenyl units were linked with alkyl chains to create DB2528 (Fig. 1A) and analogs for the desired recognition of two GC base pairs separated by AT base pairs.2 DB2528 strongly and specifically binds to the target sequence, AGAAACA, in agreement with the length of the three-methylene linker. Using the curvature procedure described in the Results section, we obtain a value of 124° for DB2528. DB2528, however, binds to the -AAGAAACTT binding site very strongly with a KD of 5 nM. The shape and curvature of DB2528 clearly match the DNA minor groove to allow the N–MeBI-thiophene-phenyl units to bind to both G·C bps. This is unique to compounds with a structure like DB2528: (i) all four H-bonding groups (BI acceptor for G-NH2 and amidine donor for AT H-bonds) are at the periphery of the molecular structure and interact strongly with the floor of the minor groove; (ii) the central section of the molecule, thiophene-phenyl-linker, essentially stacks with the walls of the minor groove, and this is not as distance-dependent as H-bond formation. In addition, the flexibility of the central linker allows the compound to twist to match the minor groove shape and can also alter the molecular curvature calculated by our procedure, which is based on a rigid molecular structure. Although DB2528 has a relatively large size, its excellent solubility and fluorescence properties make it an attractive compound, along with DB2831, for recognition of two G·C bps in an AT context.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d1sc04720e |
This journal is © The Royal Society of Chemistry 2021 |