Muhammad Ayaz Anwar, Dhanusha Yesudhas, Masaud Shah and Sangdun Choi*
Department of Molecular Science and Technology, Ajou University, Suwon 443-749, Korea. E-mail: sangdunchoi@ajou.ac.kr; Fax: +82-31-219-1615; Tel: +82-31-219-2600
First published on 15th September 2016
The potential roles of sex determining region Y-box 2 (SOX2) and octamer-binding transcription factor 4 (OCT4) in stem cell maintenance are increasingly discussed, either in the context of induced pluripotent stem cell (iPSC) generation or cancer stem cell growth. These proteins bind to the enhancer and drive the transcription of a multitude of other factors that facilitate stem cell propagation. Here, we elucidated the mechanism of changes in DNA shape and the precise role of the interaction with the proteins, which is necessary to manipulate this ternary complex. Besides bending the DNA, SOX2 drove the DNA into the A-form, whereas OCT4 preferentially shaped DNA into a B-like conformation. SOX2 binding expanded the minor groove with simultaneous shrinkage of the major groove. Greater fluctuation in the DNA and bound proteins was observed after disruption of the protein–protein interaction. Dynamic cross-correlation of DNA atoms was found to be variable, and the entropy of DNA atoms from DNA-wild-type-SOX2/OCT4 (DNAWT) was the lowest among the various complexes. Moreover, essential dynamics-based conformational analysis revealed vivid conformational variation both in DNA alone and in protein-bound complexes. Physical parameters such as the diffusion coefficient and dipole moment were also substantially different for DNA from the DNAWT complex. Taken together, our results establish a link between protein–protein and protein–DNA interactions, which will facilitate devising various strategies to modulate this complex in order to regulate the transcription of various proteins.
Aside from sequence-specific binding, DNA has minor and major grooves that may also be recognized by proteins in a sequence-independent manner.9 For instance, the OCT4 (octamer-binding transcription factor 4) protein, which belongs to the POU family, is composed of two distinct domains, a POU-specific domain (POUS) and the POU homeodomain (POUHD), which are connected by a flexible linker. The helix of POUS and the N-terminal part of POUHD can bind to the major groove and the minor groove of DNA, respectively, thereby establishing protein–DNA contacts for subsequent transcription.10 The characteristic 8-mer pattern for OCT4 recognition is ATGC(A/T)AAT, where POUS and POUHD bind to the first and second halves of this consensus sequence, respectively.11 This transcription factor is vital for induced stem cell pluripotency and performs its function in combination with other proteins, mostly SOX2, as determined by whole-genome chromatin immunoprecipitation analysis.12–14 The binding sites for OCT4 and SOX2 are juxtaposed in the regulatory regions of many genes whose products are necessary to generate, maintain and propagate stem cells.
The consensus sequence of the binding site for the SOX (Sry-related HMG box) family is CTTTGT,15 with minor variations among cell types. The relative positions of SOX2 and OCT4 binding sites on DNA may have 0- or 3-bp separation,10 whereas OCT4 can bind to SOX17 in a shortened motif (lacking 1 bp); however, this combination lacks stem cell induction. A mutated variant of SOX17 can bind to OCT4 on canonical motif and can induce cell transformation.16
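The two consensus sites and the 0- or 3-bp spacing described above can be expressed as a small pattern search. This is only an illustrative sketch: placing the SOX site upstream of the octamer on the same strand, and the names `COMPOSITE` and `find_composite_sites`, are assumptions made for the example and are not taken from the study.

```python
import re

# Consensus sites quoted in the text; strand and orientation are assumptions.
SOX_SITE = "CTTTGT"
OCT_SITE = "ATGC[AT]AAT"

# Hypothetical composite pattern: SOX site, a 0- or 3-bp spacer, then the octamer.
# The lookahead lets overlapping candidate sites be reported as well.
COMPOSITE = re.compile(rf"(?=({SOX_SITE})((?:[ACGT]{{3}})?)({OCT_SITE}))")

def find_composite_sites(seq):
    """Return (start, spacer_length, octamer) for each composite site in seq."""
    seq = seq.upper()
    return [(m.start(), len(m.group(2)), m.group(3)) for m in COMPOSITE.finditer(seq)]

if __name__ == "__main__":
    print(find_composite_sites("GGCTTTGTATGCAAATCC"))
```

A spacer of any other length (for example 1 bp) is not matched by this pattern, which mirrors the two spacings mentioned in the text.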
Both OCT4 and SOX2 bind to DNA in a cooperative manner, and POUS of OCT4 forms an interface with SOX2 upon binding; however, SOX2 facilitates and influences the OCT4 binding in vivo,13,17,18 and in most cases, SOX2 is the first to bind to DNA.13 In contrast, it has also been reported that OCT4 alone can bind to DNA during the initial stage of reprogramming, thus pointing to another possible mechanism of DNA recognition in a cell context-dependent manner.19 Although OCT4 and SOX2 can bind to these sites separately, in many instances, only their association activates transcription efficiently.20,21 The direct interaction between OCT4 and SOX2 is DNA dependent18 and involves the POUS helix α1 and the HMG helix α3. These observations revealed different orientations and spacing between these proteins and therefore uncovered the plausible mechanism of interaction of OCT4 and SOX2 proteins that is required for effective iPSC (induced pluripotent stem cell) induction.22,23 To further test whether members of the SOX and OCT families evolved features to cooperatively target specific enhancer elements, a global assessment of the SOX/OCT pairing profile is highly desirable. Besides, the roles of SOX2 and OCT4 are increasingly acknowledged in cancerous cells and cancer stem cells, pointing to their vital role in cell propagation. The frequent overexpression of these proteins and their target genes in various cancers is evident in the literature.24,25
Resolution of atomic structures of OCT4 and SOX2 proteins has shed light on their binding preferences;10,26,27 however, there are no studies on how the binding of one protein to DNA can facilitate the binding of the other protein, or what conformational alterations in DNA perpetuate such allosteric mechanisms. There are many studies on protein–DNA interaction, particularly, studies that are focused on protein flexibility and adaptational behavior. In contrast, how DNA modifies its structure and undergoes a conformational change (that provides a feasible framework for the whole complex to perform its functions) has long been a neglected topic. Conformational changes in DNA play a critical role in protein–DNA interactions that result in replication, transcription and DNA modification and thus regulate expression of various genes in a tissue- and condition-dependent manner. Accordingly, this study was designed to explore further details of conformational alteration in DNA allostery by means of molecular dynamics (MD) simulations.
The relative movement of the protein–DNA complex was estimated by the root mean square deviation (RMSD), which measures the structural similarity between two structures after superimposing them. The tetrameric complexes (DNAWT, DNAMUT and DNANS) were highly stable, whereas DNASYN, DNACRY, DNASOX and DNAOCT fluctuated slightly (Fig. 1A). DNASYN and DNACRY fluctuated slightly more than the systems containing SOX2/OCT4 proteins; this effect may influence the overall RMSD value in complexes (ESI Fig. S2A and S2B†). A plausible reason is the lack of any bound protein in these systems: they comprise only DNA strands held together by non-covalent H-bonds, which restrain the structure comparatively weakly and therefore do less to reduce the fluctuations. The number of hydrogen bonds (H-bonds) between the two DNA strands was quite stable over time (ranging from 55 to 57), except for DNANS, which showed on average 52.3 H-bonds per frame (Fig. 1B). This result corresponds to the potential number of H-bonds for that particular sequence and supports the ability of the force field to reproduce the empirical values. The number of H-bonds between protein and DNA was proportional to the size of the protein, i.e., SOX2 is a smaller protein with a smaller number of H-bonds (ESI Fig. S3†). Nonetheless, the number of H-bonds between DNA and wild-type (WT) proteins (SOX2/OCT4) was slightly higher than in DNAMUT and DNANS, indicating stability of DNAWT. Moreover, according to Luzar and Chandler's description of H-bond kinetics,28 the energy required to re-form the H-bonds within dsDNA was highest in DNASYN, followed by DNACRY, DNASOX, DNAWT and DNANS. DNAOCT and DNAMUT required the least energy for re-forming the DNA H-bonds (ESI Table S1†). DNASYN represents the perfect B-form of DNA and should require more energy for breaking its H-bonding network.
Finally, a list of residues that participate in H-bond interaction has been provided to highlight the role of various amino acids and base-pairs in protein–DNA interaction (ESI Table S2†).
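The RMSD used above, i.e. the deviation between two structures after optimal superposition, can be sketched as follows. This is a minimal illustration based on the standard Kabsch algorithm; the function name `kabsch_rmsd` and the use of unweighted coordinates are assumptions made here, and the analysis software actually used in the study may differ in details such as mass weighting.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    # Remove translation by centring both structures on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # The SVD of the 3x3 cross-covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0
    R = U @ np.diag([1.0, 1.0, d]) @ Vt  # applied to row vectors as P @ R
    diff = P @ R - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Applying this function between a reference frame and every frame of a trajectory would give an RMSD time series of the kind plotted in Fig. 1A.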
Root mean square fluctuation (RMSF), which provides a measure of the atomic mobility, was employed to study conformational changes in the protein and DNA at the atomic level independently (Fig. 1C and D). RMSF for DNA was evaluated per base; this assay revealed that DNAWT displayed the least fluctuation, whereas DNAMUT, DNANS and DNAOCT showed intermediate fluctuations. Other systems, DNASYN, DNACRY and DNASOX, showed greater fluctuation. It is relevant to mention that DNAWT and DNAMUT are identical except for one amino acid, yet the mutated complex showed greater fluctuation. Similarly, proteins from the DNAWT system showed intermediate fluctuation; however, proteins alone from DNASOX and DNAOCT showed less fluctuation than did proteins from DNAWT. Proteins from DNAMUT showed greatest fluctuations particularly in SOX2 and in the POUHD region of OCT4; this result is suggestive of a crucial interaction between SOX2 and OCT4. Notably, in DNANS, SOX2 and POUHD showed greater fluctuation, whereas POUS fluctuation was similar to that of DNAWT. This relative fluctuation of different complexes is critical for stable protein–DNA binding.
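As a rough illustration of the RMSF measure described above, the per-atom fluctuation about the time-averaged position can be computed as below. The sketch assumes a trajectory that has already been superimposed on a reference and stored as a NumPy array; the name `rmsf` and the array layout are choices made for this example.

```python
import numpy as np

def rmsf(traj):
    """Per-atom RMSF from a pre-aligned trajectory of shape (frames, atoms, 3)."""
    mean_pos = traj.mean(axis=0)                    # average structure
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)   # squared displacement per frame
    return np.sqrt(sq_dev.mean(axis=0))             # time average, then square root
```

Per-base values such as those in Fig. 1C would follow by averaging the per-atom result over the atoms of each base.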
In protein–DNA interactions, the role of various positively charged amino acids is crucial and has been evaluated. In the DNASOX complex, the terminal loops of SOX2 (N-terminal D39-N46 and C-terminal H105-K117) showed abrupt binding to and dissociation from DNA, unlike other amino acids that established a continuous interaction. Moreover, various residues from the N-terminal loop (K42, R43, M45), α1 (the first α-helix of the protein) (F48, M49, R53/56/57), α2 (S72, K73, G76, W79), α3 (R98) and multiple positively charged residues from the C-terminal loop maintained a continuous interaction with surrounding DNA residues. In DNAOCT, various amino acids interacted with DNA through noncovalent bonding. The prominent amino acids, which were consistent in their interactions, include R157, T163, Q164, S180, Q181, T182, R186, K195 and N196 from the POUS domain, and R242, K254, R275, N280 and Q283 from the POUHD domain. The intermediate loop connecting these domains interacted heavily with DNA because it wrapped around the DNA. Moreover, this intermediate loop interacted with other amino acids of the POUS and POUHD domains, prominently multiple residues of α2 (G172, V173) and α5 (A223, E224, L226, V227) of the POUS domain, and α1 (R240, V241, R242 and, less frequently, L245) of the POUHD domain. W277 from POUHD was also in consistent interaction with this loop owing to its large hydrophobic side chain.
In DNAWT, the C-terminal loop of SOX2 (H105-K117) formed an extensive bonding network not only with DNA but also with POUS of OCT4. The consistently interacting amino acids from OCT4 are from α1 and α2 of the POUS domain, such as K154, T163 and D166. Besides, substantial interactions were also observed with K151, I158 and G161 of the POUS domain. This loop interacted with the initial arginine residues (R40 and R43) of SOX2 as well. Whereas this loop showed only a moderately stable association in the DNASOX complex, it was highly stable in DNAWT, and no dissociation event was observed. This is mainly due to the presence of OCT4, which helps to stabilize the complex. The amino acids that interacted with DNA were largely the same as those reported for the individual complexes.
In the case of DNAMUT, the SOX2 loop (H105-K117) could not establish the interaction with K154, while the interactions with T163 and D166 were preserved. Moreover, most of the interactions observed in DNAWT were lost. This SOX2 C-terminal loop had an unstable interaction with α5 of POUS and the initial region of the intermediate loop of OCT4 (E219-T225). It is apparent that replacing a positively charged residue with a negatively charged one (R113E) wiped out the stabilizing effect of electrostatic interaction and arginine intercalation, which reduced the affinity for DNA. In the DNANS complex, the SOX2 C-terminal loop was significantly displaced from the minor groove of DNA, which abolished most of the native contacts (except T163 and D166). Similar to the DNAMUT complex, this loop established more contacts with α5 of the POUS domain and the intervening loop that connects the two POU domains. Moreover, this C-terminal loop showed a few interactions with α3 of SOX2 (R98, M102). In the DNANS complex, SOX2 and OCT4 lack specific DNA binding sites; thus, the unusual behavior of the protein–DNA interaction is not surprising.
DNAWT showed the longest relaxation time (τc = 44851.2 ps), which may correlate with a stable interaction with proteins that rigidifies the DNA core, as opposed to DNAMUT (35815.98 ps; where the protein interaction was disrupted), which showed a shorter relaxation time with the same DNA sequence. This finding points to a different kinetic and relaxation mode for the DNA from DNAWT and DNAMUT. Moreover, it was evident that there was a linear relation between the size and nature of the complex and the relaxation time in both Legendre polynomials (ESI Table S3†). DNANS has the smallest relaxation time among these complexes: less than half of the DNACRY relaxation time.
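Relaxation times of this kind are commonly derived from an orientational autocorrelation function built with first- or second-order Legendre polynomials. The sketch below is one possible way to do this and is not the procedure confirmed by the text: it assumes a time series of unit vectors `u` describing the DNA orientation, a single-exponential decay, and a simple log-linear fit. The function names are hypothetical.

```python
import numpy as np

def legendre_acf(u, order=2, max_lag=None):
    """C_l(t) = <P_l(u(0) . u(t))> for unit vectors u of shape (frames, 3)."""
    n = len(u)
    max_lag = max_lag or n // 2
    acf = np.empty(max_lag)
    for lag in range(max_lag):
        cos = np.einsum("ij,ij->i", u[: n - lag], u[lag:])
        # P1(x) = x and P2(x) = (3x^2 - 1) / 2
        acf[lag] = np.mean(cos if order == 1 else 0.5 * (3.0 * cos ** 2 - 1.0))
    return acf

def relaxation_time(acf, dt):
    """Tau from a log-linear fit of C(t) ~ exp(-t / tau) (assumed decay model)."""
    t = np.arange(len(acf)) * dt
    positive = acf > 0
    # Use the data only up to the first non-positive value, where log() is defined.
    end = len(acf) if positive.all() else int(np.argmin(positive))
    slope, _ = np.polyfit(t[:end], np.log(acf[:end]), 1)
    return -1.0 / slope
```

If the correlation function is not well described by a single exponential, a multi-exponential fit or numerical integration of C(t) would be a more appropriate estimator.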
The SOX2 protein belongs to the helix-turn-helix group of proteins that can bend DNA curvature by up to 50–90° and this bending is essential for this protein's activity;29 therefore, protein-induced DNA bending is necessary to gain an insight into DNA's structural deformation. The axis bend angle of DNA was measured by CURVES+,30 which revealed that DNAWT and DNASOX maintained the bend of ∼70°. DNA curvature in other systems quickly dropped to lower values either due to the absence of SOX2 or due to SOX2-mediated protein–protein interaction (Fig. 2B).
Another way to measure the stability of the complex is to evaluate the binding free energy between a protein and DNA. In the molecular mechanics Poisson–Boltzmann surface area (MM-PBSA) approach, the various contributions to the binding free energy are calculated by solving the Poisson–Boltzmann equation for the solvation energies and then adjusting the hydrophobic terms empirically. The vacuum-based binding free energy (ΔGvacuum) is the average interaction energy between the receptor and the ligand; to account for entropy, normal mode analysis could be performed. Interesting results were obtained: SOX2 showed a more favorable (more negative) binding free energy toward DNA than OCT4 did (Table 1). Proteins in the DNAWT complex showed higher stability in comparison with DNAMUT, and this result can be rationally explained because the mutation renders the complex unstable, thus leading to abrogation of the transcriptional activity. The DNANS complex showed positive energy, indicating thermodynamically unfavorable binding of the protein and DNA.
| Energy terms | DNASOX | DNAOCT | DNAWT | DNAMUT | DNANS |
|---|---|---|---|---|---|
| van der Waals | −175.95 (4.09) | −338.26 (6.01) | −360.54 (6.96) | −391.37 (4.89) | −151.6 (3.4) |
| Electrostatic | −9634.8 (33.8) | −19084.8 (90.7) | −20180.4 (71.3) | −19155.18 (32.4) | −9105.5 (47.5) |
| Polar solvation | 9606.3 (24.9) | 19138.4 (74.8) | 20140.26 (76.6) | 18994.48 (27.3) | 9195.5 (38.05) |
| Nonpolar solvation | −109.7 (0.7) | −224.27 (2.4) | −49.13 (0.34) | −261.06 (2.04) | −100.2 (1.9) |
| Dispersion | 225.5 (1.4) | 465.08 (4.8) | 449.8 (13.8) | 510.5 (4.4) | 227.1 (2.06) |
| Emolecular-mechanics | −9810.8 (32.14) | −19423.02 (90.8) | −20540.97 (73.8) | −19546.43 (31.8) | −9257.06 (49.7) |
| ΔGsolvation | 9722.07 (24.6) | 19379.23 (75.6) | 20091.13 (76.4) | 19243.89 (27.7) | 9322.4 (37.96) |
| ΔGbinding | −88.73 (10.4) | −43.8 (18.04) | −360.54 (6.96) | −302.54 (17.8) | 65.3 (20.4) |
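The way the rows of Table 1 relate to one another can be made explicit with a short recomputation. The sketch assumes the usual MM-PBSA bookkeeping, in which the molecular-mechanics energy is the sum of the van der Waals and electrostatic terms, the solvation free energy is the sum of the polar, nonpolar and dispersion terms, and ΔGbinding is the sum of the two. This decomposition is an assumption, although it is consistent with most columns of the table. The dictionary and function names are illustrative.

```python
# Mean values copied from Table 1 (standard errors in parentheses are omitted).
TABLE_1 = {
    "DNASOX": {"vdw": -175.95, "elec": -9634.8, "polar": 9606.3,
               "nonpolar": -109.7, "disp": 225.5, "listed": -88.73},
    "DNAOCT": {"vdw": -338.26, "elec": -19084.8, "polar": 19138.4,
               "nonpolar": -224.27, "disp": 465.08, "listed": -43.8},
    "DNAWT": {"vdw": -360.54, "elec": -20180.4, "polar": 20140.26,
              "nonpolar": -49.13, "disp": 449.8, "listed": -360.54},
    "DNAMUT": {"vdw": -391.37, "elec": -19155.18, "polar": 18994.48,
               "nonpolar": -261.06, "disp": 510.5, "listed": -302.54},
    "DNANS": {"vdw": -151.6, "elec": -9105.5, "polar": 9195.5,
              "nonpolar": -100.2, "disp": 227.1, "listed": 65.3},
}

def recompute(terms):
    """Return (E_MM, dG_solvation, dG_binding) rebuilt from the component terms."""
    e_mm = terms["vdw"] + terms["elec"]
    dg_solv = terms["polar"] + terms["nonpolar"] + terms["disp"]
    return e_mm, dg_solv, e_mm + dg_solv

if __name__ == "__main__":
    for name, terms in TABLE_1.items():
        e_mm, dg_solv, dg_bind = recompute(terms)
        print(f"{name}: E_MM={e_mm:.2f} dG_solv={dg_solv:.2f} "
              f"dG_bind={dg_bind:.2f} (listed {terms['listed']:.2f})")
```

With the values as printed, the DNASOX, DNAOCT, DNAMUT and DNANS columns reproduce their listed ΔGbinding to within rounding, whereas the DNAWT column does not, so its nonpolar solvation and ΔGbinding entries may merit a check against the original table.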
A better insight into the configurational space sampled by the DNA during the simulations can be obtained by calculating the configurational entropy of DNA atoms along the trajectories. The entropy of DNA was calculated in two ways, by the Schlitter formula and by quasi-harmonic approximation analysis (Fig. 3A and B); both methods relied on covariance matrix generation after superposition of DNA heavy atoms over the trajectory. The fluctuations from this covariance matrix were then used to estimate the entropic values. The entropies in all cases reached a plateau value, but DNASYN and DNACRY showed higher entropies on average. DNA bound to a single protein or a mutated protein and non-specific binding all showed almost identical values. DNA from the DNAWT complex was an exception: it showed the least entropy in both instances. Specific binding can shrink the configurational subspace; this notion is evident in our results and in the literature.32 In DNASOX and DNAOCT, SOX2 and OCT4 also bound to the specific regions, but in the absence of the cognate partner protein the DNA was not forced into a lower entropic state.
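For orientation, the Schlitter estimate mentioned above is an upper bound of the form S = (k_B/2) ln det[1 + (k_B T e²/ħ²) M^(1/2) σ M^(1/2)], where σ is the Cartesian covariance matrix and M the diagonal mass matrix. The following is a minimal sketch of that formula, assuming an already superimposed trajectory in nanometres, atomic masses in atomic mass units and a temperature of 300 K. The temperature, units and function name are assumptions for the example and are not stated in the text.

```python
import numpy as np

KB = 1.380649e-23        # Boltzmann constant, J/K
HBAR = 1.054571817e-34   # reduced Planck constant, J s
AMU = 1.66053906660e-27  # atomic mass unit, kg
NA = 6.02214076e23       # Avogadro constant, 1/mol

def schlitter_entropy(traj_nm, masses_amu, temperature=300.0):
    """Schlitter upper bound in J mol^-1 K^-1 from a (frames, atoms, 3) trajectory."""
    frames = traj_nm.shape[0]
    x = traj_nm.reshape(frames, -1) * 1e-9       # flatten and convert nm -> m
    x = x - x.mean(axis=0)
    cov = x.T @ x / frames                       # Cartesian covariance matrix
    m = np.repeat(masses_amu, 3) * AMU           # one mass per coordinate, in kg
    mw_cov = cov * np.sqrt(np.outer(m, m))       # M^(1/2) cov M^(1/2)
    pref = KB * temperature * np.e ** 2 / HBAR ** 2
    _, logdet = np.linalg.slogdet(np.eye(len(m)) + pref * mw_cov)
    return 0.5 * KB * NA * logdet
```

Evaluating the function on growing portions of a trajectory gives the kind of entropy-versus-time curve that levels off at a plateau once sampling has converged.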
Furthermore, DNA is a charged molecule that bends in an asymmetric manner, creating a dipole moment, which is crucial for its activity. Therefore, the dipole moment for the heavy atoms of DNA was calculated to deduce a possible link describing the effect of protein-induced variation on the DNA dipole. Binding of SOX2 could not influence the dipole (DNASYN = 76.6 ± 24.9 D, DNACRY = 90.8 ± 28.1 D and DNASOX = 89.7 ± 23.3 D); however, OCT4 dramatically reduced the dipole (60.6 ± 18 D) and this change was prominent in DNAWT (46.8 ± 22.7 D). DNAMUT (66.5 ± 24.6 D) showed an intermediate dipole moment, while DNANS (33.1 ± 14.1 D) showed the smallest dipole moment.
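A dipole moment of this kind can be computed from atomic partial charges and coordinates. Because DNA carries a net charge, the value depends on the chosen origin; the sketch below takes the centre of mass as the reference point, which is a common convention but an assumption here, as are the function name and the input units (ångström and elementary charges).

```python
import numpy as np

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*Angstrom expressed in debye

def dipole_moment(coords, charges, masses):
    """Dipole magnitude in debye, taken about the centre of mass.

    coords: (N, 3) in Angstrom; charges: (N,) in e; masses: (N,) in any unit.
    """
    com = np.average(coords, axis=0, weights=masses)
    mu = (charges[:, None] * (coords - com)).sum(axis=0)
    return float(np.linalg.norm(mu)) * E_ANGSTROM_TO_DEBYE
```

Averaging the per-frame values over a trajectory would yield a mean and standard deviation in the form quoted above (for example 46.8 ± 22.7 D for DNAWT).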
A dynamic cross-correlation matrix (DCCM) is a good indicator of how various atoms communicate with one another during the evolution of coordinates; it demonstrated substantial variations in correlation among DNA atoms (excluding H atoms) in different complexes (Fig. 4). DNA termini showed a strong positive correlation, as expected; a uniform and a higher degree of positive correlation were evident in DNANS and DNACRY, respectively. The binding region for SOX2 (which is from atom 38 to atom 178 on the 5′–3′ strand and atoms 589–752 on the 3′–5′ strand) and OCT4 (atoms 243–406 and 812–957 on 5′–3′ and 3′–5′ strands, respectively) favored a positive correlation in DNASOX, DNAOCT, DNAWT and DNANS; in contrast, these regions in DNAMUT lacked any substantial positive correlations. Moreover, the intermediate region between these two binding motifs showed a good positive correlation in DNAWT and DNANS, whereas DNAMUT again lacked any substantial positive correlation. During comparison of SOX2 and OCT4 binding with DNA, it was apparent that the binding of both proteins had distinct effects on the correlative behavior of DNA atoms; even the self-correlation among the atoms was restricted. Because the DNA is double-stranded, two diagonal lines appeared in the matrices, which also locally influenced the counterparts of each base pair. Dynamically, the binding of proteins influenced not only local but also global DNA movements.
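The cross-correlation matrix discussed above is conventionally defined as C_ij = ⟨Δr_i · Δr_j⟩ / sqrt(⟨Δr_i²⟩⟨Δr_j²⟩), where Δr_i is the displacement of atom i from its mean position. A minimal sketch of that definition, assuming a pre-aligned trajectory array and a hypothetical function name, is:

```python
import numpy as np

def dccm(traj):
    """Dynamic cross-correlation matrix from a (frames, atoms, 3) trajectory."""
    disp = traj - traj.mean(axis=0)                            # displacement from mean
    cov = np.einsum("fik,fjk->ij", disp, disp) / len(traj)     # <dr_i . dr_j>
    norm = np.sqrt(np.diag(cov))                               # sqrt(<dr_i^2>)
    return cov / np.outer(norm, norm)
```

Values range from −1 (fully anti-correlated motion) to +1 (fully correlated motion). Atoms that do not move at all would give a zero denominator and need separate handling.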
The coupled distribution of roll with twist and slide of all systems is of particular interest; it revealed that the binding of SOX2 to the DNA altered its conformation to a non-standard B-form (or close to the A-form), whereas free DNA showed a conformation close to the B-form, as did the other systems: the DNAOCT, DNAMUT, DNAWT and DNANS complexes. The distribution of DNAWT was confined to a limited space in both cases, indicating a strict conformational requirement for the DNA in terms of efficient transcription (Fig. 5). Moreover, in the roll/twist parameters there was no correlation in DNAWT, whereas all the other complexes showed a weak-to-moderate negative correlation; in the roll/slide distribution, DNAWT yielded a strong positive correlation, whereas the other systems were moderately-to-weakly positively correlated (ESI Table S6†). Furthermore, the coupled distribution between inclination and helical twist (h-twist) further illuminated the conformational variations in DNA, e.g., with OCT4 or in dual protein-bound complexes. Again, DNASOX showed a higher inclination and lower twist: a characteristic similar to A-form DNA. Only DNAWT showed a weak positive correlation, whereas the other complexes were either uncorrelated or negatively correlated. This finding further clarified the conformational changes of DNA in response to the protein interaction.
The coupled distribution of slide and X-displacement also differed: DNAWT, DNAMUT and DNANS were restricted to a limited space, whereas DNASYN and DNACRY occupied a different space. DNASOX and DNAOCT formed a bridge between these two conformational spaces, which points to a critical balance between the two proteins in achieving a stable conformation.
Significant differences were observed in protein-induced DNA bend and twist when conjoined distribution was evaluated. In the DNAWT system, DNA occupied a well-pronounced space as opposed to all the other systems. Moreover, free DNAs and DNASOX were overlapping, with a greater spread of DNASOX in the bend-twist conformational space. Moreover, OCT4 binding restricted the complex in a limited space in contrast to SOX2 binding and it was overlapping with the DNANS region, thus plausibly pointing to the underlying fact that, in the absence of SOX2, OCT4 binding to the specific DNA sites is likely unstable and may behave as a binding to a non-specific region. Moreover, the fact that the SOX2 binding could not achieve a stable space may be related to its small binding domain; in addition, the SOX2 binding triggered a configurational change of DNA to an appropriate conformation that may facilitate the stable binding of OCT4 and yield a productive conformation. Strikingly, DNAWT and DNAMUT had a weak and moderate positive correlation, respectively, but the other systems showed a weak negative correlation between the parameters (Fig. 5).
The coupled distribution of the helical rise and h-twist yielded a similar pattern, i.e., DNAWT, DNAMUT, DNANS and DNAOCT were all in a separate region overlapping one another (Fig. 5), with DNAWT in agreement with the B-form of DNA, in contrast to DNASOX, DNACRY and DNASYN. Other than DNANS, all the complexes showed a weak positive correlation, and the correlation in DNAWT was the weakest. It should be noted that twist (rotation of a bp step along the z-axis) and helical twist (helical geometry of the DNA strand) should be identical in the ideal B-DNA conformation; however, this is not the case here, which points to distorted geometries. Next, X-displacement was plotted against the propeller twist of the DNAs, which segregated the DNANS complex. Nonetheless, as seen in other distributions, DNAs from the DNAOCT, DNAWT and DNAMUT complexes occupied the same region, although the separation was well pronounced, whereas free DNAs and DNASOX occupied an entirely different region. Either no correlation (DNAWT only) or a weak negative correlation was observed (ESI Table S6†). The coupled distribution of X-displacement and twist yielded a pattern in which different systems occupied different conformational spaces: DNASYN and DNANS showed no correlation, DNAWT showed a moderate correlation, and all the other complexes showed a strong positive correlation.
Finally, minor and major grooves of DNA that are critical for its activity have been analyzed in various systems (Fig. 6). SOX2 is a minor-groove-binding protein, which is also known to bend the DNA; therefore, its binding causes the minor groove to expand while the major groove shrinks. This behavior was evident in DNASOX, DNAWT and DNAMUT and in all these cases, SOX2 was binding to its specific binding site. However, the system where SOX2 is binding to non-specific DNA and the systems that do not have SOX2 showed the usual width of minor and major grooves in DNA.
In DNAWT, SOX2 was pointing toward OCT4, with the POUS domain of OCT4 showing restricted movements; however, POUHD conserved its domain movement. The dominant direction of SOX2 in DNAMUT was perpendicular as seen in DNAWT, with POUS showing abrupt movements; however, POUHD movement was influenced by the R113E mutation in the tail part of SOX2. This result is in line with our previously shown RMSF graphs (Fig. 1D), where the mutant complex showed greater fluctuations in POUHD domains. In the DNANS complex, SOX2 was moving away from the dimerization interface, with POUS showing the same pattern as evident in the DNAOCT complex. Nonetheless, the POUHD domain of OCT4 was moving in a clockwise manner (Fig. 7). This is the only instance where the POUHD domain's movement was anomalous. In a non-specific binding, the protein should search for other sequences, which is possible when the protein is flexible and moves along the DNA to find its consensus binding sequence.
The covariance matrices that were calculated by PCA were then cross-multiplied to evaluate the overlap between the matrices, to analyze the most prominent directions of DNA motion, and to determine whether they remain well-defined throughout the trajectory. The normalized value representing the overlap between the matrices was also obtained, which was 1.0 when sampled subspaces were identical and 0 when the sampled subspaces were orthogonal to each other. The root-mean-square inner product (RMSIP) compared the overlap of the first 10 eigenvectors that captured >80% of the magnitude of the overall direction and measured the similarity of the modes that have captured the largest deformational propensity in each structure (Fig. 8). All complexes were compared with DNAWT because it represents the most appropriate conformation. It was noted that the third eigenvector of DNAWT and the first eigenvector of DNACRY and DNASYN shared the same subspace of DNA movement. DNASOX, DNAOCT, DNAMUT and DNANS all shared the passing similarity, but the contribution from the first three eigenvectors from all complexes was substantially different, indicating different directional movement or subspace sampled. In DNAMUT, the overlap of the first eigenvector from both the WT and mutant complexes was significantly smaller. The first eigenvector captured the largest fluctuation of the macromolecule, but in the mutant complex, the dominant direction of the first eigenvector was not overlapping, indicating a loss of cumulative movements of DNA due to the mutation.
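The root-mean-square inner product between two sets of principal modes is usually written as RMSIP = sqrt((1/n) Σ_i Σ_j (η_i · ν_j)²), with n = 10 in the comparison above. The sketch below illustrates that definition together with a simple way of extracting the leading eigenvectors from a trajectory; the function names and the array layout are assumptions for this example.

```python
import numpy as np

def top_modes(traj, n_modes=10):
    """Leading PCA eigenvectors (as columns) of a (frames, atoms, 3) trajectory."""
    x = traj.reshape(len(traj), -1)
    x = x - x.mean(axis=0)
    cov = x.T @ x / len(x)
    vals, vecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_modes]     # pick the largest ones
    return vecs[:, order]

def rmsip(modes_a, modes_b):
    """RMSIP between two sets of orthonormal modes stored as matrix columns."""
    overlap = modes_a.T @ modes_b                # all pairwise inner products
    return float(np.sqrt((overlap ** 2).sum() / modes_a.shape[1]))
```

A value of 1 means that the two sets of modes span the same subspace and 0 means that the subspaces are orthogonal, matching the normalization described in the text.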
Protein and DNA, alone and in complex, were stable throughout the simulation, but their fluctuations as measured by RMSF were in agreement with the fact that DNAWT has the lowest values, whereas DNACRY, DNASYN and DNASOX have the highest values, arguably owing to tightly bound proteins in the former case and completely free and/or SOX2-bound DNAs in the latter cases. The protein residues also have lower RMSF in the DNAWT complex than in DNAMUT and DNANS. In the DNAMUT complex, POUHD of OCT4 showed greater fluctuations, which may also indicate that the protein–protein interaction is propagated among the protein domains.8 Substitution of one amino acid residue can considerably influence the protein's conformational flexibility (Fig. 1C and D).33 This may also indicate the delicate balance of interactions among SOX2, OCT4 and DNA needed to tightly control the induction of target genes, which is a stringent requirement of this process, and it is also in line with previous studies that highlighted the protein–protein interaction through the tail part of SOX2 in various complexes.6,8 The number of H-bonds within dsDNA is within the potential number of H-bonds between the bases and indicates the ability of the force field to reproduce the natural phenomenon. Moreover, these analyses add a further layer of confidence to the results.
Conformationally, DNA is not limited to the standard A-, B-, or Z-forms. A growing amount of evidence suggests that protein binding causes DNA to assume an A-like or an A–B intermediate conformation in terms of DNA parameters such as roll and twist.34,35 Moreover, there are >15 parameters that can be measured using the standard DNA parametric algorithms, such as CURVES+,30 but a large number of parameters overlap and lack the decisive power for DNA classification. On the other hand, the slide and X-displacement hold this potential; for instance, it is widely accepted that slide values lower or higher than a reference of −0.8 Å are characteristic of A- or B-form DNA, respectively.36 In line with this, we observed overlapping parameters for the standard A and B conformations. In general, SOX2 binding caused DNA to acquire an A-like conformation, while the binding of OCT4 alone was unable to alter the canonical shape of the DNA. It is arguable whether OCT4 bound to DNA only sparsely in the first place, so that OCT4 binding could not influence the DNA conformation significantly. In contrast, when both proteins bind to DNA, the resultant conformation is either in between the A and B forms, or close to the B form. However, free DNAs are more like the B-form and strikingly overlap each other, which indicates the convergence of simulations irrespective of the starting conformation. Moreover, SOX2 bent DNA by up to 50–90° (Fig. 2B and other studies37,38); this phenomenon may also switch its conformation from a standard B-form to non-standard B- or A-like conformations.
In addition to experimental approaches, theoretical studies have also pointed to the conformational transition in DNA from B- to non-B and/or A-like conformation upon protein binding or an intermediate stage between A and B during DNA distortion.39–41 For instance, the binding region of TATA box-binding protein in DNA converged to an A-like conformation in longer simulations irrespective of starting conformations;42 in Escherichia coli, cyclic AMP receptor protein can induce A-form upon binding when its recognition motif has an 8-bp spacer,43 which may be necessary to shorten the motif. A polymerase-induced A conformation of DNA has also been detected during HIV reverse transcriptase binding to DNA44 that eliminates error during replication.45 Our study also predicts that DNAOCT plausibly assumes the B-form, DNASOX preferably adopts an A-like form and DNA in the DNAWT complex attempts to adopt an intermediate conformation between the A- and B-forms (Fig. 5 and ESI Table S6†). This finding may explain why for regulation of a complex process like stem cell generation, a very specific conformational requirement has been imposed during evolution to ensure precise control over the process.
SOX2 and OCT4 can bind to DNA in two predominant modes, with a 0- or 3-bp gap between their binding sites. The two motif organizations produce different binding interfaces that result in different intensities of expression of the target genes.46,47 Nonetheless, the binding-motif gap is evident in other complexes such as SOX2/OCT4 in the Zfp206 enhancer,48 the SOX2 and PAX6 (paired box gene 6) interaction,49 SOX2 and brain-specific homeobox/POU domain protein 2 (BRN2)50 and SOX1/2/3 and the POU family.51 In the present study, we used the available crystal structure data, and the results can be extrapolated to other similar complexes where SOX2 plays a vital role in transcription. Moreover, there are numerous complexes, for instance, dyskerin52 and the trimeric xeroderma pigmentosum complementation group C (XPC)-nucleotide excision repair complex,53 that are used to modulate SOX2/OCT4 activities on DNA. The involvement of many proteins in the regulation and monitoring of the activity of this SOX2/OCT4 pair points to the importance of their regulated behavior during embryogenesis and development.
DNA exhibits a dipole moment owing to its inherently asymmetric bending; SOX2 also bends the DNA upon binding, yet it could not influence the dipole moment. In contrast, OCT4 is a bulky protein that did influence the dipole moment. To confirm the results, the simulations were repeated for 100 ns in each case, and similar results were observed. Besides, the AMBER force field is good at reproducing structural features but lags behind with respect to physical parameters.54 Moreover, the number of water molecules around the DNA differs among the systems, which may also influence the dipole moment. Nevertheless, the choice of a force field is a compromise among certain structural and physical properties.
The interactions of biomolecules and ligands in solution depend not only on their binding affinity but also on their translational diffusion coefficients. DNAWT has the smallest diffusion coefficient and the longest relaxation time, which indicates lower mobility and a rigid molecular configuration (ESI Table S5†). The relaxation time of a polymer like DNA is fundamentally important because it defines the natural time constant of the molecule and characterizes how rapidly the polymer can react in response to an imposed flow. DNAs from the mutant and non-specific complexes have anomalously greater diffusion coefficients (by a factor of ∼30 and ∼300, respectively), and their relaxation times are shorter than that of DNAWT. The flexibility of DNA in the mutant and non-specific complexes allows for fast interaction with different proteins during the search for specific binding between the protein and DNA. Experimental and theoretical studies have estimated the diffusion coefficient of DNA-binding proteins that are in close proximity;32,55 in the present study, by contrast, unconstrained atomistic simulations were conducted to study the DNA behavior. Furthermore, rotation-coupled translational diffusion is a popular concept for describing protein movement along DNA, which is also influenced by the polarity of amino acid residues and the pH of the solution.56,57 The DNA in the mutant complex did not encounter the same level of hindrance from the bound protein, whereas in the non-specific complex, the DNA behaved randomly; this finding corroborates a non-specific protein–DNA interaction.57 For the other systems, the relaxation time gradually increased from free DNA to bound DNA, while the inverse pattern was observed for the diffusion coefficient.
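A translational diffusion coefficient of this kind is commonly obtained from the mean squared displacement (MSD) through the Einstein relation, MSD(t) = 6Dt in three dimensions. The sketch below illustrates that estimate under stated assumptions: it is not necessarily the tool used for ESI Table S5†, the function names are hypothetical, the input is assumed to be an unwrapped centre-of-mass trajectory, and the synthetic random walk serves only as test data with a known D.

```python
import random


def mean_squared_displacement(positions, max_lag):
    """MSD for lags 1..max_lag, averaged over all time origins."""
    n = len(positions)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the number of frames")
    msd = []
    for lag in range(1, max_lag + 1):
        total = 0.0
        for i in range(n - lag):
            total += sum(
                (positions[i + lag][k] - positions[i][k]) ** 2 for k in range(3)
            )
        msd.append(total / (n - lag))
    return msd


def diffusion_coefficient(positions, dt, max_lag):
    """Estimate D from the slope of MSD versus lag time (MSD = 6 D t)."""
    msd = mean_squared_displacement(positions, max_lag)
    times = [dt * lag for lag in range(1, max_lag + 1)]
    # Least-squares slope of a straight line constrained through the origin.
    slope = sum(t * m for t, m in zip(times, msd)) / sum(t * t for t in times)
    return slope / 6.0


def brownian_trajectory(n_steps, d_true, dt, seed=0):
    """Synthetic 3D random walk with a known D (test data only)."""
    rng = random.Random(seed)
    sigma = (2.0 * d_true * dt) ** 0.5  # per-axis step width
    pos = [(0.0, 0.0, 0.0)]
    for _ in range(n_steps):
        x, y, z = pos[-1]
        pos.append((x + rng.gauss(0.0, sigma),
                    y + rng.gauss(0.0, sigma),
                    z + rng.gauss(0.0, sigma)))
    return pos


traj = brownian_trajectory(5000, d_true=1.0, dt=1.0)
print(diffusion_coefficient(traj, dt=1.0, max_lag=10))  # close to 1.0
```

The units of D follow from the inputs: positions in nm and a time step in ps give nm² ps⁻¹, which is multiplied by 10⁻² to obtain cm² s⁻¹. Only short lags are fitted because the MSD becomes statistically noisy at long lag times.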
Binding energies were calculated by the molecular mechanics Poisson–Boltzmann surface area (MM/PBSA) method, although DNA is an atypical ligand for this approach: it is far larger than a conventional ligand, its electrostatic charge density is higher and it is double-stranded (dsDNA).47 Despite these limitations, qualitative results can be obtained by MM/PBSA with a uniform dielectric constant, which makes the comparison straightforward; however, the higher uncertainty in the computed binding free energies may be attributed to the algorithm that was used to compute the free energy.58 These uncertainties are an important factor when ranking DNA sequences by binding energy, but ranking was not the purpose of this study. In short, the DNAWT complex has the highest binding free energy, followed by DNAMUT, DNASOX and DNAOCT (Table 1). For the DNAWT, DNAMUT and DNANS complexes, if the entropic contribution is ignored, the values indicate the stability of the WT proteins in complex with their cognate DNA. Moreover, a rough comparison can be made between DNASOX and DNAOCT. SOX2 has fewer amino acid residues (79) than OCT4 does (152), but DNASOX showed greater values than DNAOCT did. Assuming an equal entropic contribution, the SOX2–DNA complex is the more stable of the two.
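The bookkeeping behind such an estimate is ΔG_bind = G_complex − G_protein − G_DNA, evaluated frame by frame and then averaged. The sketch below shows only this final step, assuming a single-trajectory protocol in which all three energy series come from the same frames and the entropic term is omitted; the function name and the numbers are hypothetical and are not taken from Table 1.

```python
from math import sqrt
from statistics import mean, stdev


def binding_energy(complex_e, protein_e, dna_e):
    """Mean binding energy and its standard error over trajectory frames.

    Each argument is a per-frame list of G = E_MM + G_polar + G_nonpolar
    (same units, e.g. kJ/mol). The entropic term is not included, and the
    standard error treats frames as uncorrelated (an assumption).
    """
    if not (len(complex_e) == len(protein_e) == len(dna_e)):
        raise ValueError("all energy series must cover the same frames")
    per_frame = [c - p - d for c, p, d in zip(complex_e, protein_e, dna_e)]
    avg = mean(per_frame)
    sem = stdev(per_frame) / sqrt(len(per_frame))
    return avg, sem


# Hypothetical three-frame example.
avg, sem = binding_energy(
    complex_e=[-100.0, -110.0, -90.0],
    protein_e=[-40.0, -45.0, -35.0],
    dna_e=[-30.0, -30.0, -30.0],
)
print(avg, sem)
```

Reporting the standard error alongside the mean makes it easier to judge whether differences between complexes exceed the uncertainty of the method.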
The direction of SOX2 motion is significantly influenced by OCT4, whereas the direction of OCT4 motion is perturbed by SOX2 to a slightly lesser extent because OCT4 has two domains that wrap around the DNA. In the DNAWT complex, their cumulative movement was substantially different from that in the DNAMUT and DNANS complexes (Fig. 7). Subspace sampling revealed significant directional discrepancies in DNA atoms; these data reinforce the concept that protein binding changes the magnitude and alters the direction of motion of the bound DNA (Fig. 8).59 Further theoretical and practical studies are underway to test these findings and to design various strategies to manipulate this complex.
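One common way to quantify the overlap between the essential subspaces of two simulations is the root mean square inner product (RMSIP) of their leading principal components, RMSIP = [(1/N) Σ_i Σ_j (v_i · w_j)²]^(1/2), where N is often set to 10. The sketch below is a minimal illustration and assumes that each mode set is already orthonormal and that both sets have the same length; the function names and the four-dimensional toy vectors are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def rmsip(modes_a, modes_b):
    """Root mean square inner product of two sets of orthonormal modes.

    Returns 1.0 when both sets span the same subspace and 0.0 when the
    subspaces are mutually orthogonal.
    """
    if len(modes_a) != len(modes_b):
        raise ValueError("both mode sets must contain the same number of modes")
    n = len(modes_a)
    total = sum(dot(u, v) ** 2 for u in modes_a for v in modes_b)
    return math.sqrt(total / n)


# Toy example with two modes in a four-dimensional space.
a = [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]
b = [(1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)]  # shares one direction with a
print(rmsip(a, a), rmsip(a, b))
```

A value well below 1 for DNA taken from two complexes would indicate that the dominant motions point in different directions, which is the kind of directional discrepancy described above.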
Besides theoretical approaches such as molecular modeling and molecular dynamics simulations, X-ray crystallography and nuclear magnetic resonance (NMR) can be employed to decipher the structural details; however, these techniques are laborious and have limitations. Atomic force microscopy is a more recent technique that holds potential to unravel the structural details without modifying the biological material. Moreover, single-molecule force microscopy can also be a good tool to study the protein–DNA interaction. In conclusion, notwithstanding DNA-mediated allostery, DNA shape alteration mediated by protein–protein interaction plays a crucial role in the organization of the complex and in the relative participation of each molecule, with SOX2 having a greater influence on DNA shape than OCT4 does. Nonetheless, an intense interplay between SOX2 and OCT4 determines the complex's biological efficiency. Moreover, the SOX2-mediated DNA bend, the SOX2/OCT4-mediated switch in DNA shape and a balance between these DNA conformations are critical factors that substantially influence the complex's biological efficiency. Further, there can be various ways to either disrupt or stabilize their interaction, such as hindering the binding of SOX2 to DNA or using peptide nucleic acids to alter the DNA structure even when the proteins are bound. However, manipulation of this ternary complex becomes feasible after in-depth study of the molecular architecture, the respective role of each component in complex formation and the atomic-level details of their interaction. Therefore, this study will be the starting point of a broad-spectrum analysis that can provide a reference to target a multitude of challenges in regenerative medicine and cancer therapy.
Seven systems were generated. DNA was extracted from the crystal structure 1GT0 (DNACRY), and a perfect B-DNA with the same sequence was created using 3D-DART (DNASYN). The crystal structure containing DNA and SOX2 (DNASOX) and that containing DNA and OCT4 (DNAOCT) were also set up, together with the whole crystal structure with wild-type SOX2 (DNAWT), the whole structure with mutated SOX2 (R113E; DNAMUT) and, finally, a non-specific complex generated by mutating the DNA to a non-specific sequence (DNANS). DNASYN and DNACRY represent DNA alone (without any protein) and were simulated as references to evaluate DNA movement.
Simulations were repeated with the same parameters, except that the starting structures were extracted after approximately 150 ns of the previous trajectory. In this way, the equilibration period could be ignored during analysis of the repeated simulations. All complexes were then energy minimized, equilibrated for temperature and pressure, and simulated for an additional 100 ns.
| Abbreviation | Definition |
| --- | --- |
| BRN | Brain-specific homeobox/POU domain protein 2 |
| DCCM | Dynamic cross-correlation matrix |
| HMG | High mobility group |
| iPSCs | Induced pluripotent stem cells |
| MDS | Molecular dynamics simulations |
| MM/PBSA | Molecular mechanics Poisson–Boltzmann surface area |
| OCT4 | Octamer-binding transcription factor 4 |
| PAX | Paired box genes |
| PCA | Principal component analysis |
| POU | Pit-Oct-Unc family |
| RCF | Rotational correlation function |
| RMSD | Root mean square deviation |
| RMSF | Root mean square fluctuation |
| RMSIP | Root mean square inner product |
| SOX2 | Sex determining region Y-box 2 |
| WT | Wild type |
| XPC | Xeroderma pigmentosum complementation group C |
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c6ra15176k
This journal is © The Royal Society of Chemistry 2016