Directional and regioselective hole injection of spiropyran photoswitches intercalated into A/T-duplex DNA.

The electron-hole injection from a family of spiropyran photoswitches into A/T-duplex DNA has been investigated at the molecular level for the first time. Multiscale computations coupled with automatized quantitative wavefunction analysis reveal a pronounced directionality and regioselectivity towards the template strand of the duplex DNA. Our findings suggest that this directional and regioselective photoinduced electron-hole transfer could thus be exploited to tailor the charge transport processes in DNA in specific applications.

Light-driven charge transport processes through DNA play a central role in photodamage 1 and are key to the design of DNA-based molecular wires. [2][3][4] Upon illumination, one electron is excited from a donor unit to an acceptor species, generating a hole (a positive charge) in DNA. 5 This electron-hole can migrate efficiently over long molecular distances through the DNA helix. 2,6 Given its importance for a wide range of applications, the dynamics of electron transport in DNA has been a subject of study for decades. [7][8][9] For instance, it is known that the migration of the electron-hole strongly depends on stacking 10 and on the energies of the involved nucleobases. 11 However, although several experimental 12 and theoretical 13 setups have been used to investigate electron-hole injection and migration in DNA, the initial electronic excited states remain uncharacterized in most cases.
The donor and acceptor units can be covalently bound [14][15][16] or intercalated in DNA, [17][18][19][20][21] as is the case for the spiropyran (SP) photoswitches. After light irradiation, SPs undergo heterocyclic cleavage to yield the open merocyanine (MC) form. 22,23 The SP form itself (1 in Scheme 1) does not bind DNA, but the protonated open MCH form (2) does. 24,25 While the photocleavage of SP in solution has been widely investigated, [26][27][28][29][30][31] and it is known that the protonated open MCH form is able to oxidize DNA nucleobases in cell culture, 32 the excited states of these photoswitches intercalated in DNA have never been investigated. This is the subject of this work. We address here the central question of how electron-hole injection operates from merocyanine derivatives to a duplex A/T DNA strand. We use atomistic multiscale calculations coupled with quantitative wavefunction analysis to model explicitly the hole injection from two MCH derivatives (2a and 2b) into a 12-mer (poly-dAT)₂, for which these compounds show selectivity. 24 The intercalation mechanism of the nitro (2a) and amidinium (2b) derivatives into a 12-mer (poly-dAT)₂ is described elsewhere. 25 An ensemble of geometries of 2a:DNA and 2b:DNA obtained by unrestrained classical and Born-Oppenheimer molecular dynamics simulations in explicit solvent is used to ensure an efficient sampling of the environment and of the vibrational space of the chromophore. 34,35 A total of 4000 excited states per complex were calculated within a quantum mechanics/molecular mechanics (QM/MM) framework, where the chromophore and the four surrounding nucleobases (121 atoms for 2a:DNA and 125 atoms for 2b:DNA, see Fig. 1a and b) are treated quantum mechanically. Further computational details can be found in the Computational details section.
Our theoretical protocol is validated by the good agreement of the computed absorption spectra (Fig. S1, ESI †) of 2a in water and 2b intercalated in DNA with available experimental data. 33,36 The spectra consist of two absorption bands. The lower-energy band corresponds to the intramolecular excitation from a π orbital (HOMO) to a π* molecular orbital (LUMO), with absorption maxima at 382 and 398 nm, respectively. The peaks are slightly blue-shifted (<30 nm) with respect to the experimental ones, as usual at this level of theory. 34 The bright states of 2a and 2b are the same in solution as when bound to DNA. However, upon DNA intercalation, the brightest state is surrounded by dark states with strong charge transfer (CT) character involving nucleobases, see Table 1 for the excited states of 2a:DNA and Table S1 (ESI †) for 2b:DNA.
The CT character, defined by the CT number 37 from 0 to 1, measures the intermolecular light-driven electron transfer between the donor and the acceptor. To define CT numbers, the system is split into five fragments, the chromophore (e.g. 2a) and the four interacting nucleobases: adenine (c-A) and thymine (c-T) of the coding strand (5′-3′) and the corresponding ones of the template strand (t-A and t-T, 3′-5′), see Fig. 1c.
The total density of excited states (DOES) of the lowest absorption band of the complex 2a:DNA (black line of Fig. 2a), calculated from the ensemble of structures and classified according to CT character, clearly shows that a few absorbing states (red line) are embedded among many dark states with strong CT character (blue line). Fig. 2b and c illustrate the natural transition orbitals for the brightest state and one representative CT state, respectively. The brightest state corresponds to an intramolecular excitation; the CT state is an intermolecular excitation from the probe to t-A.
The fact that the brightest state of 2a is the same in solution and in the duplex DNA indicates that, upon intercalation, an electron of 2a is excited, creating a hole in its HOMO. This orbital is surrounded by the electron-rich HOMOs of the nucleobases. Therefore, one electron from the orbitals of a nucleobase can relax to the half-occupied HOMO of 2a, transferring the hole from 2a to one of the stacked nucleobases of the dsDNA. The hypothesis that 2a acts as a photooxidant probe of the surrounding nucleobases is supported by its substantial reduction potential E°(2a) (calculated value 3.94 V, see Computational details) and by the fact that the related merocyanine 540 derivative is able to oxidize the DNA nucleobases in cell culture. 32 Finding this high density of CT states for the minimum geometry of the 2a:DNA complex prompted us to investigate the excited-electron and electron-hole populations of these CT states within the first ten excited states of the ensemble of geometries (1000 excited states per complex, see Computational details). We found that upon light absorption, 97% of the population of the excited electron is localized on 2a. Complementarily, 70% of the hole population is found on only one nucleobase and 24% is delocalized over two different fragments. In principle, the electron hole could have been transferred from the probe to any of the four nucleobases, since 2a intercalates in the middle of the site at a similar distance from all the nucleobases (Table S2, ESI †). However, the four nucleobases are not chemically equivalent, as they are located in different strands and are affected by the asymmetrical binding of 2a, recall Fig. 1b and c. As a consequence, the majority of the electron hole is localized on the template strand (3′-5′), with 38% and 33% on t-T and t-A, respectively, and 29% on the coding strand (Fig. 3, pink bars).
The strand selectivity is ascribed to the presence of the NO₂ group, a strong electron-withdrawing group (EWG), oriented towards the coding strand, and of a hydroxyl group (OH), an electron-donating group (EDG), projected towards the template strand. Due to its electronic character, the NO₂ group prevents hole injection into the nucleobases around it. Moreover, since the difference between the electron-hole populations of t-A and t-T is small, 2a promotes the electron-hole injection into the template strand, in a static picture, in both directions (5′ ↔ 3′).
The strand selectivity increases in 2b:DNA, as the template/coding ratio is 90/10 (Fig. 3a, green bars), compared to 2a (71/29). Of greater importance is the fact that in the complex 2b:DNA, the electron-hole injection goes preferentially into t-A (66%) instead of t-T (24%). This observation suggests that the electron-hole injection into the template strand propagates unidirectionally (3′-5′) in the presence of 2b. A possible explanation for such discrimination between t-T and t-A could be found in geometrical differences between the two complexes, which would promote the electron-hole injection unidirectionally in the case of 2b. In our previous work 25 and in Fig. S3 (ESI †), we showed that the two complexes share the same intercalative binding site and the same average distance differences between the probe and the two nucleobases of the template strand (Table S2, ESI †). These observations led us to exclude geometrical reasons as a discriminant for the regioselectivity and to investigate other possible reasons. The origin of the structural and electronic differences that regulate the electron-hole injection directionality into DNA can be traced back to the interactions between the π-systems of the chromophore and those of the neighbouring nucleobases. These interactions can be assessed in the minimum energy geometries with two descriptors. One is the non-covalent interaction energy between the probe and the nucleobases of each of the strands, i.e. the π-π stacking interactions (Table 2), calculated via Grimme's dispersion energy correction (D3). 38,39 The other descriptor is the energy difference between the HOMO of the probe and the HOMO of each of the nucleobases forming the binding site (Table 3 and Fig. S2, ESI †). 11 The presence of the strong nitro (2a) or amidinium (2b) EWG leads to a strong interaction with the nucleobases of the coding strand (−75.49 kcal mol⁻¹ for 2a and −79.59 kcal mol⁻¹ for 2b, Table 2).
As a consequence, the probe induces an energy split of the HOMOs of the nucleobases. In particular, the HOMOs of the nucleobases of the coding strand lie lower in energy than the HOMOs of the probes (Table 3). In contrast, the interaction with the template strand is weaker (−63.04 kcal mol⁻¹ for 2a and −72.28 kcal mol⁻¹ for 2b), and thus the HOMOs of t-A and t-T lie higher in energy than the HOMOs of the probes (Table 3). This explains why most of the electron-hole population is found on the template strand. Upon light absorption by the photoprobe, one electron from the HOMO of the template strand relaxes in energy and occupies the HOMO of the probe, injecting the hole into the template strand. Remarkably, the derivative 2a shows a stronger interaction with the coding strand than with the template one (ΔE = −12.44 kcal mol⁻¹, Table 2), while in 2b this difference is half (ΔE = −6.81 kcal mol⁻¹). This means that the HOMOs of t-A and t-T, although higher in energy, are more stabilized by 2b due to favourable π-π stacking interactions. In addition, whereas in 2a the HOMOs of t-A and t-T are close in energy (Δε HOMO = +1.82 and +1.72 eV, respectively), in 2b t-T is much more stabilized than t-A. This is why most of the total electron-hole population (66%) is found on t-A (Fig. 3b, green bar), the nucleobase with the higher HOMO level.
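The strand ratios and the preferred nucleobase quoted above follow from simple sums over the per-fragment hole populations. The short Python sketch below reproduces that bookkeeping from the percentages given in the text; the dictionary layout and function names are illustrative assumptions, and the coding strand is kept as a single entry because only its total is reported.

```python
# Hole populations (%) taken from the text. Only the strand total is
# reported for the coding strand, so it is stored as one entry.
hole_population = {
    "2a": {"t-T": 38, "t-A": 33, "coding": 29},
    "2b": {"t-T": 24, "t-A": 66, "coding": 10},
}


def strand_ratio(pops):
    """Return (template %, coding %) from per-fragment hole populations."""
    template = sum(v for k, v in pops.items() if k.startswith("t-"))
    return template, pops["coding"]


def preferred_base(pops):
    """Template-strand nucleobase carrying the largest hole population."""
    template = {k: v for k, v in pops.items() if k.startswith("t-")}
    return max(template, key=template.get)


for probe, pops in hole_population.items():
    print(probe, strand_ratio(pops), preferred_base(pops))
```

For 2a the two template nucleobases differ by only five percentage points, whereas for 2b the hole population on t-A is almost three times that on t-T, which is the regioselectivity discussed above.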
We are now in a position to propose a mechanistic model for the electron-hole injection in dsDNA by MCH derivatives containing EWGs, such as 2a and 2b (Fig. 4). It is the combination of π-π stacking interactions and the presence of an EWG that stabilizes the HOMOs of the nucleobases differently at the binding site (Fig. 4a). Upon irradiation by UV light, the brightest excited state is populated, which can decay to one of the lower-lying dark CT states. That is, one electron from the π-system of the probe (HOMO) is promoted to an excited state with π* character fully localized on the probe, creating a hole within the intercalated photoprobe (circle, Fig. 4b). The proximity in energy of the HOMOs of t-A and t-T allows the migration of one electron from the nucleobases to MCH. The probe thus oxidizes the neighbouring nucleobases, injecting the hole into the DNA (arrow, Fig. 4c) and triggering hole migration through the double strand. This process is directional because the probe oxidizes the DNA, injecting an electron hole; is asymmetric because the binding mode of the open merocyanine species, which projects the EWG towards the coding strand, promotes the hole injection into the template strand; and is regioselective because the nature of this EWG affects the energy levels of the HOMOs of the nucleobases of the template strand. As an example, in 2b the hole injection happens mainly into t-A, allowing a 5′-3′ electron-hole propagation.
In conclusion, we have portrayed how chemical modification of spiropyrans can modulate the directionality of the hole transport in DNA, arguably offering many application prospects. The quantitative direct observation of the CT states between a photooxidant and nucleobases, where the ligand is intercalated, has no precedent in the study of spiropyran photoswitches and evidences the importance of characterizing their excited states in order to prevent or enhance the photoinduced process involving DNA. Our findings open new questions on how the temporal evolution of these excited states is influenced by this selectivity and its biological implications. To answer these questions, further theoretical and experimental studies on the photodynamics of the injected electron hole are necessary.

MD simulations
The initial structures of both probes 2a and 2b intercalated into a 12-mer dsDNA (poly-dAT)₂ were obtained from umbrella sampling MD simulation studies, as described elsewhere. 25 In all cases, each complex was immersed in a cubic box extending 30 Å from the solute to the border of the box, filled with TIP3P water molecules 40 and 20 (2a) or 19 (2b) Na⁺ ions to ensure electroneutrality.

Table 3 Difference energies (Δε, eV) between the HOMOs of the probe (2a and 2b) and the surrounding nucleobases on the template (t-A, t-T) and the coding strand (c-A, c-T)
To reproduce the experimental conditions reported by Andersson et al., 24 a final NaCl concentration of 1 × 10⁻⁵ M was achieved by the addition of Na⁺ and Cl⁻ ions. Each of the systems was simulated for 100 ns without any restraint following the protocol described in ref. 25.

QM/MM MD simulations
A total of 100 equidistant snapshots from the former MD simulations were selected to carry out QM/MM MD simulations using the sander program implemented in the AMBER17 suite. 41 The system was partitioned in two regions: the QM and the MM region. Each of the probes (2a or 2b) and the four nucleobases around them (after cutting the glycosidic bond) were included in the QM region (121 and 125 atoms, respectively). The rest of the atoms were treated classically using the ff14SB 42 force field and the TIP3P 40 model for the water molecules. The QM region was treated with density-functional tight-binding (version 3, DFTB3), 43 the semiempirical method of DFT, which is internally provided within the AMBER17 suite. The interaction term between the QM and the MM regions was calculated using the electrostatic embedding scheme. 44 In the MM part, periodic boundary conditions were used and the electrostatic interactions were computed using the Ewald method 45 with a grid spacing of 1 Å. The cutoff distance for the non-bonded interactions was 10 Å and the SHAKE algorithm 46 was applied to all bonds involving hydrogen atoms. An integration step of 2.0 fs was defined. In the QM region, the PME and SHAKE algorithms were deactivated and a cutoff of 10 Å was defined for the interaction between the two regions. Each of the 100 QM/MM MD trajectories was propagated for 1 ps, with a time step of 0.1 fs at 300 K and 1 atm. In this way, the QM/MM MD sampling included the quantum mechanical effects in the phase space sampling, obtaining more accurate initial conditions for the excited state calculations.

Static TD-DFT vertical excitations
The final geometries from the QM/MM MD simulations were used to compute the first 40 singlet states using time-dependent density functional theory (TD-DFT). As above, the QM region included the probe and the four surrounding nucleobases (c-T, c-A, t-A and t-T), in this case treated with the long-range corrected functional CAM-B3LYP 47 and the def2-SVP 48 basis set. Grimme's dispersion correction D3 was included. 38,39 The rest of the system was represented as MM point charges, which were included in the Hamiltonian by means of electrostatic embedding. These calculations were performed with the TeraChem code 49,50 on Nvidia GeForce GTX 1080 Ti GPUs.
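The state counts quoted in the main text follow directly from this sampling scheme: 100 geometries with 40 states each give 4000 excited states per complex, and the 10 lowest states of each geometry give the 1000 states used for the population analysis. The Python sketch below checks this bookkeeping and shows one way to pick equidistant snapshots; the number of stored frames (`n_frames`) is a hypothetical value, since the frame-saving frequency of the classical trajectories is not stated here.

```python
import numpy as np

N_SNAPSHOTS = 100  # QM/MM MD trajectories per complex (from the text)
N_STATES = 40      # singlet states computed per geometry (from the text)
N_LOWEST = 10      # lowest states per geometry used for the populations


def equidistant_frames(n_frames, n_snapshots=N_SNAPSHOTS):
    """Indices of n_snapshots evenly spaced frames, ending on the last one."""
    stride = n_frames // n_snapshots
    return np.arange(n_snapshots) * stride + stride - 1


# Hypothetical number of frames stored for a 100 ns classical trajectory.
n_frames = 50_000
frames = equidistant_frames(n_frames)

total_states = N_SNAPSHOTS * N_STATES      # states in the full ensemble
analysed_states = N_SNAPSHOTS * N_LOWEST   # states used for hole populations
print(len(frames), total_states, analysed_states)
```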

Wavefunction analysis
The quantitative wavefunction analysis was performed in a second step, after calculating the vertical excitations and the corresponding orbitals in TeraChem. This analysis was carried out with the TheoDORE software. 37,51 Detailed information can be found in the TheoDORE documentation; 51 here we only summarize the features employed.

Vis/UV spectra
The Vis/UV spectra were convoluted as a sum of Gaussian functions (eqn (1)), where $f_{gs}$ is the oscillator strength in the ground state, $E$ and $E_{gs}$ are the energies in the excited state and in the ground state, respectively, and FWHM is the full width at half maximum. A value of 0.5 eV was used for the FWHM.
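As an illustration of such a convolution, the Python sketch below broadens a stick spectrum with Gaussians of a given FWHM. The specific line shape, in which each Gaussian is scaled so that its maximum equals the oscillator strength, is an assumption and may differ from eqn (1) by a normalization factor; the function names are illustrative.

```python
import numpy as np

EV_NM = 1239.841984  # product of photon energy (eV) and wavelength (nm)


def gaussian_spectrum(energies_ev, osc_strengths, grid_ev, fwhm_ev=0.5):
    """Sum of Gaussians centred at each excitation energy.

    Assumed line shape: f * exp(-4 ln2 (E - E_i)^2 / FWHM^2), so each
    Gaussian peaks at f and drops to f/2 at E_i +/- FWHM/2.
    """
    e = np.asarray(energies_ev, dtype=float)[:, None]
    f = np.asarray(osc_strengths, dtype=float)[:, None]
    grid = np.asarray(grid_ev, dtype=float)[None, :]
    lines = f * np.exp(-4.0 * np.log(2.0) * (grid - e) ** 2 / fwhm_ev**2)
    return lines.sum(axis=0)


def ev_to_nm(energy_ev):
    """Convert a photon energy in eV to a wavelength in nm."""
    return EV_NM / np.asarray(energy_ev, dtype=float)


# Single hypothetical transition at 3.0 eV with unit oscillator strength.
grid = np.array([3.0, 3.25])
spectrum = gaussian_spectrum([3.0], [1.0], grid)
```

In practice the stick data of all snapshots would be concatenated before broadening, so that the resulting band reflects both the intrinsic width and the conformational sampling.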

Transition density matrix, charge transfer numbers and natural transition orbitals
TheoDORE relies on the analysis of the transition density matrix ($D^{0I}$), 52 computed as shown in eqn (2). Briefly, considering the states I and J and the orbitals a and b, one element of the one-particle transition density matrix is given by:

$$D^{IJ}_{ab} = \langle \Psi_I | \hat{a}^{\dagger}_a \hat{a}_b | \Psi_J \rangle \qquad (2)$$

where $\hat{a}^{\dagger}_a$ and $\hat{a}_b$ are the creation and annihilation operators, respectively.
For the charge transfer (CT) analysis, the system was divided into five fragments: the probe and each of the four nucleobases. The CT numbers were computed using a Mulliken-like population analysis (eqn (3)), in which A and B are two different fragments, μ and ν are atomic orbitals, $D^{0I}$ is the transition density matrix and S is the overlap matrix, both matrices expressed in the atomic orbital basis.
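A fragment-based CT analysis of this kind can be sketched in a few lines of NumPy. The expression used below, Ω_AB = ½ Σ_{μ∈A} Σ_{ν∈B} [(DS)_μν (SD)_μν + D_μν (SDS)_μν], is one commonly used Mulliken-type form and is an assumption here, not necessarily the exact expression of eqn (3); the function names and the toy matrices are hypothetical.

```python
import numpy as np


def ct_numbers(D, S, fragments):
    """Fragment-resolved charge transfer numbers Omega[A, B].

    D: transition density matrix in the AO basis (hole index first).
    S: AO overlap matrix.
    fragments: list of lists with the AO indices of each fragment.
    Assumed formula: 0.5 * sum[(DS)*(SD) + D*(SDS)] over mu in A, nu in B.
    """
    DS = D @ S
    SD = S @ D
    SDS = S @ D @ S
    n = len(fragments)
    omega = np.zeros((n, n))
    for a, idx_a in enumerate(fragments):
        for b, idx_b in enumerate(fragments):
            block = np.ix_(idx_a, idx_b)
            omega[a, b] = 0.5 * np.sum(
                DS[block] * SD[block] + D[block] * SDS[block]
            )
    return omega


def ct_character(omega):
    """Fraction of the excitation with hole and electron on different fragments."""
    total = omega.sum()
    return (total - np.trace(omega)) / total


# Toy example: 4 orthonormal AOs split into two fragments.
S = np.eye(4)
fragments = [[0, 1], [2, 3]]
D_ct = np.zeros((4, 4))
D_ct[0, 2] = 1.0      # hole on fragment 0, electron on fragment 1
D_local = np.zeros((4, 4))
D_local[0, 1] = 1.0   # hole and electron both on fragment 0
```

With five fragments, as in the present work, the row sums of `omega` give the hole population on each fragment and the column sums give the excited-electron population.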

Natural transition orbitals (NTOs) 53
NTOs are built through a singular value decomposition of $D^{0I}$, given by:

$$D^{0I} = U X V^{T} \qquad (4)$$

where U is the hole orbital coefficients matrix, V is the particle orbital coefficients matrix and X is the diagonal matrix of the transition amplitudes.
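The decomposition above maps directly onto a standard singular value decomposition routine. The sketch below uses `numpy.linalg.svd` on a transition density matrix that is assumed to be expressed in an orthonormal orbital basis; the random matrix is only a stand-in for real data, and the function name is illustrative.

```python
import numpy as np


def natural_transition_orbitals(D):
    """SVD of a transition density matrix, D = U @ diag(x) @ V.T.

    Returns the hole coefficients U, the transition amplitudes x,
    the particle coefficients V and the squared amplitudes, which
    weight each hole/particle pair.
    """
    U, amplitudes, Vt = np.linalg.svd(D, full_matrices=False)
    weights = amplitudes**2
    return U, amplitudes, Vt.T, weights


# Stand-in matrix (4 occupied x 3 virtual orbitals), not real data.
rng = np.random.default_rng(0)
D = rng.normal(size=(4, 3))
U, x, V, w = natural_transition_orbitals(D)
```

The singular values are returned in descending order, so the first column of `U` and the first column of `V` form the dominant hole/particle pair, such as the pairs plotted for the brightest and the CT state.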

Redox potential calculation
The standard redox potential of the probe in a DNA environment was calculated using the Born-Haber cycle 54 and the Nernst equation, based on the Gibbs free energy difference in gas and solvated environments for both the reduced and oxidized species. The oxidized (closed-shell, singlet, net charge = +2) and the reduced (open-shell, doublet, net charge = +1) species were optimized at the CAM-B3LYP/def2-SVP level of theory. With this, we obtained the value of $G_{gas}$. Then, the geometries of both species were solvated with an acetonitrile shell (ε = 35.688), which is known to reproduce a DNA-like environment better than water, 55 and the free energies ($G_{solv}$) were calculated with the polarizable continuum model (PCM). 56 The standard redox potential $E^{\circ}$ was then calculated from the Gibbs free energy change $\Delta G_{red(solv)}$ as shown in eqn (5) and (6). These calculations were performed with Gaussian 09, version D.01. 57
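A thermodynamic cycle of this type can be sketched as follows. The two relations used, ΔG_red(solv) = ΔG_red(gas) + ΔG_solv(red) − ΔG_solv(ox) and E° = −ΔG_red(solv)/(nF) − E_ref, are the textbook forms and are assumptions here, since eqn (5) and (6) are not reproduced in this text; the function names, the optional reference potential `e_ref` and the numerical inputs are hypothetical.

```python
FARADAY = 96485.33212                  # C per mole of electrons
HARTREE_TO_J_PER_MOL = 2625.4996395e3  # 1 hartree in J/mol


def delta_g_red_solv(g_gas_ox, g_gas_red, dg_solv_ox, dg_solv_red):
    """Born-Haber cycle for the reduction free energy in solution.

    All inputs in hartree. Assumed relation:
    dG_red(solv) = [G_gas(red) - G_gas(ox)] + dG_solv(red) - dG_solv(ox)
    """
    return (g_gas_red - g_gas_ox) + dg_solv_red - dg_solv_ox


def redox_potential(dg_red_solv_hartree, n_electrons=1, e_ref=0.0):
    """Nernst relation E = -dG / (n F), optionally shifted by a reference.

    e_ref = 0.0 gives an absolute potential in volts; a reference
    electrode value would have to be supplied by the user.
    """
    dg_joule_per_mol = dg_red_solv_hartree * HARTREE_TO_J_PER_MOL
    return -dg_joule_per_mol / (n_electrons * FARADAY) - e_ref


# Hypothetical free energies (hartree), for illustration only.
dg = delta_g_red_solv(g_gas_ox=0.0, g_gas_red=-0.2,
                      dg_solv_ox=-0.1, dg_solv_red=-0.05)
potential = redox_potential(dg)
```

A one-electron reduction that releases 1 eV of free energy corresponds to a potential of 1 V on this scale, which provides a quick sanity check on the unit conversion.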

Conflicts of interest
There are no conflicts to declare.