Towards large scale hybrid QM/MM dynamics of complex systems with advanced point dipole polarizable embeddings

Hybrid DFT(Gaussian)/AMOEBA(Tinker/Tinker-HP) polarizable molecular dynamics including the QM/MM mutual polarization on large complex systems. Example of the thiazole orange dye buried in a DNA double helix, embedded in a sphere of water (16 500 atoms).


Introduction
Hybrid QM/MM approaches, combining quantum mechanical (QM) and classical atomistic models based on molecular mechanics (MM), [1][2][3][4] are nowadays widely applied to the modeling of biological and other complex systems. Such models combine a high resolution, fully atomistic description of a complex system with a focused approach that introduces a distinction between the relevant portion of the system, usually named its target, and an environment that, while playing a fundamental role in tuning the properties of the target, does not participate directly in the process of interest. This has two main advantages. First, the very high dimensionality of the computational problem is largely reduced and the set of information extracted from the calculations becomes of a similar order of magnitude to, and more pertinent to, its experimental counterpart. Second, it is very cost effective, as while retaining a description that represents the system in its full complexity, the vast majority of the system is described with fairly inexpensive MM, while the computational effort is concentrated on the target, that can be described at an accurate, though expensive, QM level of theory, which would not be applicable to the whole system. Finally, QM/MM methods become mandatory when dealing with extremely complex and large systems involved in processes which require the explicit treatment of the electronic structure problem such as enzymes or photoresponsive proteins. [5][6][7][8][9] The denition of the target of such systems is not always a straightforward task, especially when the target and the environment are linked together by a covalent bond. As a matter of fact, in biological matrices (e.g.: proteins, DNA) several covalent bonds between the QM and the classical subsystem can be involved. It is evident that in all these cases, the way the system is partitioned becomes crucial to achieve a correct description of the process of interest. For instance, peculiar strong specic interactions between a small portion of the environment and the target may require to extend the QM treatment to a larger part of the system. [10][11][12][13] The modeling of mixed QM/MM bonds has been a longlasting argument of debate in the community and several different approaches have been developed in the years to address this issue. 3,4,14,15 All these models have to solve, in some way, three main problems. First, cutting a covalent bond results in a QM atom with an unrealistic, highly reactive, unsaturated valence conguration, which can introduce strong artifacts in the simulation. Such an atom has to be capped in order to complete its valence. Second, the QM density at the boundary can be overpolarized when an electrostatic embedding is applied, which can be aggravated when a polarizable embedding is used as also the classical part can overpolarize. Third, the bonding terms in the classical MM force elds involving atoms from both subsystems have to be carefully selected in order to avoid double-counting of interactions.
Three different classes of boundary schemes can be found in the literature. A rst class is made by link-atom schemes, where an additional (usually hydrogen) atom is introduced in the QM subsystem to cap the missing valence. Such an additional atom is not part of the real system and the charge next to the QM subsystem may be blurred or redistributed. [16][17][18] The second one is made by localized orbital schemes, 1,15,19 which employ hybrid orbitals placed at the boundary to cap the QM region, keeping some of them frozen. Finally, the third class uses a capping potential to terminate the QM region. [20][21][22][23] In this work we will focus on a specic version of this third class, the pseudobond scheme. [20][21][22] This implementation is integrated in our polarizable QM/ MM method, based on the AMOEBA polarizable (and multipolar) force eld, [24][25][26][27][28][29] which interfaces the Gaussian and Tinker packages to achieve molecular dynamics (MD) on the adiabatic ground state energy surface of the QM/MM system. This new implementation is extensively tested on a set of oligopeptides, for which we discuss the energy conservation along hybrid ground state trajectories. Three test systems of increasing complexity are considered to show the stability of the method: an alanine dipeptide, a 6 aminoacid oligopeptide (SAPPAS) and a 20 aminoacid oligopeptide (PEP). We then detail the applicability of our method to the computation of excitation properties on a large solvated system encompassing a chromophore intercalated in DNA for which we will compute electronic excitations within a sequential approach based on the hybrid QM/ MM molecular dynamics.
2 Polarizable QM/MM with the AMOEBA force field We will rst present the details of the pseudobond scheme adopted in the new implementation designed to treat QM/MM boundary regions in the hybrid QM/AMOEBA self-consistent eld (SCF)-based method, implemented in a development version of Gaussian 09 (ref. 30) interfaced with the MM/MD Tinker and Tinker-HP 31,32 suite of programs. A brief recall of the very fundamental equations of the method is then presented. A more complete derivation of the QM/AMOEBA equations can be found in the literature, 33-37 as well as a complete description of the polarizable (and multipolar) AMOEBA FF. [27][28][29]

The pseudobond scheme
When the target and the environment are chemically bonded, the QM/MM treatment of the system requires one to cut at least one covalent s bond A-B. A and B refer to boundary atoms of the environment and the active part, respectively. In the pseudobond (PB) approach the A PB -B bond is formed by replacing the A atom with a one-free-valence boundary atom (A PB ) to complete the open valences of the bonds that cut across the QM/MM boundary. A PB is then included in the QM subsystem and customized effective core potentials (ECPs) [38][39][40][41][42] are used to treat A PB . In the PB approach the ECP is designed to simultaneously reproduce the properties of the bond that is replaced (charge, bond strength/distance, etc.) and results in a minimal perturbation on the original system. From now on, the QM-MM covalent bond is called a pseudobond (PB) and the atom selected as the boundary atom and included in the (usually closed-shell) QM subsystem is the PB atom.
In this work we apply the recently proposed PBs by Yang and co-workers. 22,43 The use of PBs for QM/AMOEBA hybrid simulations has been previously reported and implemented in the LICHEM code. 44,45 Here, we employ the same approach as developed by Kratz et al. 44 The ECPs are of the form with a and b being empirical parameters optimized for specic types of pseudobonds, like C ps -C, C ps -N and C ps -C]O, where the subscripts ps indicate the atom bearing the pseudopotential. In this formulation the seven-valence-electron boundary atom has its own basis set, namely the STO-2G basis set with four parameters instead of six since the same exponents are assigned to the s and p basis functions. 21,22 A remarkable advantage of the PB approach is that it does not require any additional term to be implemented in the electronic structure code, as long as such a code is able to handle ECPs, which is usually the case. On the other hand, a signicant effort has been put into the design of a exible QM/ MM interface, capable of handling the double quantum and classical nature of the PB atom. Such a duality requires in fact the handling of its associated electronic density within the electronic structure code (Gaussian 09), and the inclusion of bonding interactions involving quantum atoms in the MD code (Tinker), which normally ignores such class of atoms in the energy computation.

Variational energy functional of the QM/AMOEBA embedding
The QM/AMOEBA variational energy functional is dened as 46 where E QM is the internal energy of the QM subsystem, and E MM that of the classical subsystem treated with the AMOEBA force eld, while E Coup represents the coupling between the quantum and the classical subsystems. For each energy component in eqn (2), we write explicitly its functional dependence on the variational degrees of freedom that are simultaneously optimized during the SCF procedure, namely the one-body electronic density matrix P and the induced dipole moments m at the classical polarizable sites. The QM/MM coupling term is further divided into two different contributions which represent, in order, the electrostatic interaction between the AMOEBA static multipoles (up to quadrupoles) and the QM density and the polarization interaction of the QM density and the AMOEBA induced dipoles 33,34 Here, dispersion/repulsion interactions between the classical and quantum subsystems are included in the E MM term, and they are treated using the AMOEBA "soer" buffered 14-7 functional form. 27 Finally, the last term in eqn (2), E MM , includes all the electrostatic interactions between the classical sites and all the bonding interactions of the classical FF. The bonding interactions which are accounted for in E MM include also those bonding terms arising from the QM and MM atoms connected across QM/MM boundary regions. To compute these bonding terms between classical and quantum atoms connected through a PB atom, the PB atom itself and the QM atoms involved are treated as classical atoms, using the atomic parameters dened in the classical FF for the atom type required. In Fig. 1 the bonding interactions are depicted in a schematic manner, representing respectively: bond stretching, angle bending decomposed into in-plane and out-of-plane components, bond-angle cross terms, and torsional rotation, including torsion-torsion coupling terms.
The E MM term does not depend on either the electronic density or the induced dipoles, and for this reason it does not enter the SCF procedure. It is worth noting that the PB atoms do not need any classical multipoles or polarizabilities since the description of the PB atoms is included in the nuclear and electronic part of the QM subsystem and thus any electrostatic and polarization interactions involving the PB atoms are included in the hybrid terms of eqn (3).
Taking advantage of the variational formulation, the coupled QM/AMOEBA equations needed to plug the QM/MM scheme in the quantum chemistry method of choice have been derived, and an effective QM/AMOEBA Fock (or Kohn-Sham, depending on whether Hartree-Fock or DFT theory is concerned) matrix can be obtained as the gradient of the energy functional in eqn (2) with respect to the density matrix. 33 Analytical derivatives of the QM/AMOEBA energy are obtained by differentiating the energy functional of eqn (2) with respect to both classical and quantum nuclear coordinates. The hybrid QM/MM forces can then be used to perform QM/MM Born-Oppenheimer molecular dynamics (BOMD). 34 An extended Lagrangian approach (XL-BO), in the formulation proposed by Niklasson and co-workers, 47-50 is applied to improve the guess for the electronic density at each integration time step of the dynamics to speed-up the convergence of the SCF procedure. 34

Excitation properties
The implementation has also been extended to the simulation of electronic excitations, using a time-dependent DFT (TDDFT) scheme. 33 The whole spectrum of the excitations of interest is determined in a single step calculation, by solving for the poles of the TDDFT response function. An additional contribution in the response equations is introduced by the inclusion of the polarizable embedding operator in the Hamiltonian, which is computed through the transition densities corresponding to the different excitations. Such a term takes into account the dynamic response of the polarizable environment.
The detailed derivation of the TDDFT equations in a polarizable environment can be found elsewhere. 51 Here we only report the nal modied TDDFT equations: 52 ÃB and observe that the matricesÃ andB incorporate the effects of the polarizable embedding in two ways. First, the integrals that they contain are computed using molecular orbitals (MOs) determined in a QM/MM SCF computation. Secondly,Ã andB include an explicit term involving the classical induced dipoles, namely: 33 where the a, b and i, j indices refer to occupied and virtual MOs, respectively. Such classical induced dipoles are computed at each step of the iterative procedure usually employed to solve the Casida equations.
To account for the further environment effects due to the relaxation of its polarization in response to change in the QM density in the excitation process, a so-called "state-specic" correction, 53,54 completely equivalent to the so-called "corrected Linear Response" (c-LR) scheme originally developed for polarizable continuum models, 55 was implemented in the QM/ AMOEBA method. 33 Within this framework, a relaxed density matrix is calculated for the excited state of interest and the corresponding excitation energy is corrected for the interaction of the corresponding electric eld and the induced dipole moments within the environment.

Numerical tests
The numerical stability of the pseudobond approach has been tested by analyzing the energy conservation along short trajectories of three oligopeptides: an alanine dipeptide (ALA), an hexapeptide (SAPPAS, amino acidic sequence: Ser-Ala-Pro-Pro-Ala-Ser) and a twenty aminoacid oligopeptide composed of each of the natural amino acids (PEP, amino acidic sequence: Ala-Gly-Val-Leu-Ile-Pro-Ser-Thr-Cys-Phe-Tyr-His-Trp-Asp-Asn-Glu-Gln-Met-Lys-Arg).
The oligopeptides are arbitrarily partitioned in a QM portion, modeled with DFT, and an MM one, modeled with the AMOEBA polarizable potential. The QM portion consists of one of the alanine residues for ALA, two central prolines for SAPPAS and a four aminoacid sequence (tyrosine-histidine-tryptophan-aspartate) for PEP. The boundaries between the QM and the classical portions in the oligopeptide are treated with the adequate PB, depending on whether the PB formed on the QM side is a C PB -N or C PB -C]O bond, where C PB indicates the carbon atom bearing the pseudopotential, and the parameterized basis set, and included in the QM portion as the boundary atom. Simulations where a QM/MM partition is performed within the PB scheme will be referred to in the rest of the manuscript with the tag QM/MM(PB).
When the PB approach is applied, the xed multipoles and the atomic polarizability of the classical atoms directly bonded to the PB atom (MM 1 sites according to Fig. 1) are set to zero and the point charges are distributed on the MM 2 atoms to balance the total charge of the system, as suggested by Zhang and coworkers. 20,21 ALA is the simpler and smaller system, and for this reason it is employed to fully explore the capabilities and the limits of the PB approach. For such a purpose two sets of simulations are performed on ALA: (i) gas-phase dynamics and (ii) dynamics in a small droplet of water, treated with the AMOEBA potential. For each set of simulations the QM/MM(PB) dynamics is compared with a full MM trajectory, where the ALA is described completely with the AMOEBA potential, and a full QM one, where the ALA is full DFT. The total energy conservation is compared between the different models used to describe the ALA system and also their effects on some structural parameters, whose adequate description could be crucial in the modeling of a more realistic chemical problem.
Energy conservation is discussed for the QM/MM(PB) dynamics of SAPPAS and PEP only in a water droplet (AMOEBA water), and compared with results obtained from a trajectory performed with a full MM oligopeptide. Those two systems, even though small compared to real-life MD applications, are indeed more complex than the simple ALA dipeptide. For this reason, and because the AMOEBA FF is parameterized only for simulations in solution, gas-phase tests on SAPPAS and PEP are of little interest to draw signicant conclusions on the physics of the PB scheme.
All the trajectories are run for 2 ps at the B3LYP/6-31G(/ AMOEBA) level of theory, in the NVE ensemble, with an integration time-step dt ¼ 0.5 fs and initial velocities sampled from a Maxwell-Boltzmann distribution at 80 K. The integrator applied is the velocity Verlet. The low temperature is chosen to minimize the noise in the energy uctuations. 34

Alanine dipeptide
In QM/MM(PB) simulations of the ALA system the QM/MM partition occurs through the a-carbon atom of the amide bond between the two alanine residues, which is then dened as the PB atom. The pseudobond formed in this way is a carbon-carbon s-bond between the PB atom and the neighboring sp 2 carbon of the carbonyl group in the QM portion.
In Fig. 2 the partitioned system is depicted and the total energy E tot time-series of full QM, QM/MM(PB) and full MM trajectories is reported. Either in vacuo (@ QM(VAC), @ PB(VAC) and @ MM(VAC)) and in the small droplet of classical AMOEBA water (@ QM(WAT), @ PB(WAT) and @ MM(WAT)) the dri in the total energy is barely noticeable, even in the very small scale of the energy uctuations due to the low temperature ($AE10 À5 -10 À4 hartree (E H )). Both the dris and the oscillations in the total energy appear slightly larger in the simulation in solution. This is due to the increased complexity in the dynamics, which stems from the presence of the solvent. Overall, the dri is still remarkably small ($10 À4 E H ). Furthermore, no signicant differences can be observed in the behaviors of the full QM, QM/ MM(PB) and full MM simulations, meaning that the introduction of the PB scheme does not affect the total energy conservation signicantly.
The "in vacuo" trajectories are analyzed more in detail from a structural point of view. To have a larger sampling the dynamics have been extended to 4 ps. The energy conservation of the elongated dynamics is still consistent with what was observed for the shorter ones and the equivalent analysis is reported in the ESI (Section S1 †).
The three compared trajectories all start from the same geometry but exhibit different initial velocities, and thus they can, in principle, evolve in different regions of the PES. Since they evolve for a short time, and because of the relatively "simple shape" of the PES for the ALA system and the low temperature of the simulation, this effect is not expected to play a major role in the structural differences that can be detected.
First, two dihedral angles are analyzed as they are involved in bonding terms between quantum and classical atoms connected through the PB atom within the QM/MM(PB) dynamics. These dihedrals are reported in Fig. 3 and named CNCC (green) and NCCN (orange). We also report their distributions along the full QM (red), QM/MM(PB) (blue) and full MM (yellow) dynamics in the gas-phase. For the CNCC dihedral, where two of the involved atoms are in the MM region, the full MM and QM/ MM(PB) distributions are almost overlapping.
The scenario is quite different for the NCCN dihedral, where the two involved atoms are in the QM region (see Fig. 2 (Top) Representation of the ALA system. The PB atom is represented in black, and the MM portion of the peptide is represented as ball-and-sticks, while the quantum part is represented as van der Waals spheres. (Bottom) Total energy conservation along the different 2 ps NVE trajectories where: (i) the ALA dynamics is performed in vacuo, treating the dipeptide as full QM (@ QM(VAC)), with the PB approach (@ PB(VAC)) and as full MM (@ MM(VAC)); (ii) the ALA dynamics is performed in a droplet of water (@ QM(WAT), @ PB(WAT), @ MM(WAT)). The water solvent and the classical portions of the amino acids are treated with the AMOEBA potential, while the QM systems are treated at the B3LYP/6-31G level of theory. The total energy is reported in milli-hartree (mE H ) and shifted by the average value computed along the trajectory (hE tot (t)i t ), while the time on the x axis is reported in pico-seconds (ps). ALA @ QM and ALA @ MM have been shifted by À0.1 mE H and +0.1 mE H respectively, to avoid the superposition of the different time series.  full MM distributions are mostly overlapping, even though the intensities of the single bins can vary a lot between them, and they also exhibit a similar shape, with two separate peaks in both cases centered around the intervals 70-80 and 90-100 .
The large difference between the full QM results and the other distributions can be certainly attributed to the mismatch between the DFT and the AMOEBA potentials, AMOEBA being tted on a higher level of QM accuracy. 27 The comparison also shows that the overall quality of the QM/MM dynamics is not signicantly deteriorated with respect to the one based on the AMOEBA FF by the PB approach. To conclude the analysis of dihedrals in the QM/ MM(PB) trajectory, two further ones, namely, one involving only QM atoms (OCNC, blue) and one only involving MM atoms and the pseudoatom (CCNC, cyan) are analyzed. In both cases the comparison between the distributions of such angles in the three different trajectories shows a good agreement, especially for the OCNC dihedral, where the QM/MM(PB) distribution strongly resembles a hybrid between the full MM and the full QM one. The OCNC case, however, points out that the effect of the PB, even though reasonably small, is not completely negligible in the close proximity of the QM/MM boundary.
A similar analysis has been carried out on the length of the bond which corresponds to the PB length dened in the QM/ MM(PB) simulation, compared to the length of the corresponding bond in the full QM and full MM dynamics. The resulting distributions (Fig. 3) show a Gaussian behavior, with the full QM case (red) having a C-C bond which is on average longer than that of the corresponding PB (in blue), with an average difference of $0.05Å. The full MM value (yellow) lies in between the QM/MM(PB) and the full QM. Considering the full QM simulation as a reference, the PB approach is reasonably in accordance. The differences in the bond lengths are well motivated by the effect of the smaller basis set for the valence electrons and the pseudopotential used to treat the core electrons.
It is worth reminding here that these analyses on the QM/ MM boundaries represent a limiting case for the method since, as usual in QM/MM, the QM region of interest, where structural parameters can play a fundamental role, is always kept as far as possible from the QM/MM boundary region, in order to minimize the effects of the cut.

Two larger cases: SAPPAS and PEP oligopeptides
The energy conservation within the PB approach is tested also on the SAPPAS and PEP systems on trajectories performed for such systems embedded in small water droplets.
For the SAPPAS hexapeptide two PB atoms are dened, one on each of the two alanine units (see Fig. 4, top panel). In one case the PB replaces the a-C of a C-N bond in the quantum part, while in the other a C-C]O bond is replaced, as for ALA. In Fig. 4 (Bottom panel) the SAPPAS total energy conservation along the full MM trajectory (yellow) is compared with that of the QM/MM(PB) dynamics (blue). The QM/MM(PB) energy time-series (SAPPAS @ PB(WAT)) slightly deviates from the full MM starting a positive dri aer 0.5 ps of simulation. It suddenly disappeared with an opposite sign dri. Around 2 ps the dri has been removed. A longer 6 ps trajectory is analyzed in the ESI (Section S2 †), showing a negative dri reappearing aer 3 ps. The dri appears linear in time and of very small entity, with a negligible largest energy deviation from the average of $0.3 mE H (less than 0.2 kcal mol À1 ).
Finally, as a last numerical test, the PEP oligopeptide (see Fig. 4, top panel) trajectory is analyzed. The QM/MM(PB) total energy conservation (blue) is reported in comparison with that of the full MM dynamics (yellow) shied by À0.2 mE H to avoid the superposition of the two time series. In this case, as in the ALA dynamics, no signicant effect on the total energy conservation can be seen when the PB approach is applied. For PEP the uctuation in the total energy is generally larger than those of ALA and SAPPAS, due to the larger dimension of the system (WAT simulations involve water boxes of: 630 molecules for ALA, 473 molecules for SAPPAS and 1053 molecules for PEP). Also for PEP a comparison between longer simulations (7.5 ps) is reported in the ESI (Section S2 †).

Thiazole orange absorption when intercalated in DNA
To further test the applicability and reliability of our method, we have investigated the electronic excitations of the Thiazole Orange dye (TO) 56,57 intercalated in a short double helix of DNA solvated in water (see Fig. 5).
In a previous study by some of the authors 58 the vibronic structure of the absorption band of TO in the biological matrix was simulated applying a sequential scheme based on the QM/ AMOEBA MD performed with the Gaussian 09/Tinker interface. The TDDFT including explicit effects of the polarizable environment on the excitation properties was employed. In that study, the system was partitioned in such a way that only the TO dye was treated at the QM level, while the rest of the environment was represented with the AMOEBA FF. In the trajectories used to compute excitation energies, only the TO was allowed to move in a fully classical and frozen environment. 58 Neither the role of the QM/MM partition scheme employed nor the effects of the frozen environment on the excitation properties have been investigated.
Aiming at shedding more light on these aspects, a new set of simulations is compared to the previous results, applying the PB scheme to assess how the excitation properties are affected by the denition of a larger QM subsystem which goes beyond the TO dye, including also the two pairs of nucleobases (NBs) nearest to the dye. This modied QM/MM partition scheme entangles together the effects on the dynamics of the system and on the resulting excitation properties. The PB approach is applied to the QM/MM boundaries, which are placed between the nucleobase and the sugar connecting the base to the DNA backbone (see Fig. 5). We refer to the new denition of the QM subsystem including the NBs as QM/MM(PB), while the previous one only including TO as QM/MM.
In order to better investigate how the environment relaxation affects the overall results, we performed a third type of MD simulation. Now, all the systems, not only the QM subsystem, is allowed to move; we label the test performed under these conditions as movE.
In Table 1 the notation used to dene the different QM/MM partition schemes is reported, together with the composition of the QM and MM subsystems and whether the MM one is frozen or it is moving during the QM/MM MD simulation. In Fig. 5 a pictorial representation of the whole system and of the different schemes applied in this work is reported.  Table 2 and in the text). H and P mean the hole and particle, respectively.
The structures coming from the QM/MM, QM/MM(PB) and movE dynamics are nally used to compute the excitation energies using the same QM-MM partition adopted for the dynamics. Since the present work does not aim at computing the whole absorption band shape of the embedded TO dye, instead of performing several hybrid QM/MM trajectories, and sampling the initial system conguration from a classical dynamics, a reduced set of starting geometries is extracted from the same classical trajectory used in ref. 58, obtaining two starting congurations, far in time from each other as schematically depicted in Fig. 6. The trajectories produced from the two initial congurations selected are named MD 1 and MD 2 .
The B3LYP/6-31G/AMOEBA level of theory has been used for all molecular dynamics simulations. 2 ps runs in the NVE ensemble with a time step dt ¼ 0.5 fs have been performed to keep consistency with the simulation proposed in ref. 58. No periodic boundary conditions were applied (i.e. water droplet), and the initial atomic velocities are assigned through the sampling of a Maxwell-Boltzmann distribution at 300 K.
First the effect of a larger QM subsystem is discussed. In these simulations the classical environment is kept frozen to be consistent with the previous study. The environment, composed of the DNA helix, counterions and water molecules is made up of about 16 500 atoms, most of which are due to the large solvation box (see Fig. 5). In ref. 58 the B3LYP functional has been employed also for the TDDFT calculations which follow the dynamics. When the new QM/MM(PB) partition is applied, the use of the B3LYP functional results in the appearance of charge-transfer (CT) intruder states, delocalizing the TO pp* transition of interest on the QM NBs. This effect has been investigated computing TDB3LYP/6-31+G(d)/AMOEBA excitations on the snapshots extracted from MD 1 every 80 fs (see Fig. 6) and the results are reported in the ESI (Section S3 †). Excitations computed on the same MD 1 snapshots but restricting the QM subsystem at the TO dye only, where the CTs cannot take place in the TDDFT calculation, have been used for comparison and the nature of the excitation has been analyzed in terms of the corresponding Natural Transition Orbitals (NTO). 59 In Section S4 of the ESI † we report the comparison between the B3LYP, CAM-B3LYP, M062X, PBE0, uB97X and uB97XD functionals, showing that the emergence of the intruder CT-like states can be cured by choosing a range-separated hybrid functional. 60 For this reason the CAM-B3LYP exchangecorrelation functional has been selected to compute excitation properties.
The results are reported in Table 2 in terms of the average excitation energies and the corresponding transition dipoles from MD 1  This analysis, based on the two sets of trajectories MD 1 and MD 2 and fostered by the third set of simulations reported in Section S5 of the ESI, † indicates that, as expected, allowing the local environment due to the closest NBs to move together with TO reduces the differences introduced by the different initial congurations of the environment.
If we now want to estimate the contribution of the QM-MM partition, separating it from the structural effect related to the NB geometry relaxation, a different comparison has to be done. To do that, we selected a single TO-environment conguration (namely one of the two initial congurations extracted from the classical MD trajectory), and we use it in combination with different embeddings: (i) no environment, dened as vacuum, (ii) the QM-MM partition scheme, where only TO is QM, (iii) the QM/MM(PB) scheme, where also the four closest NBs are treated at the QM level and (iv) substituting the PB approach   Table 1: QM/MM, QM/MM(PB) and movE. Each trajectory, for a total of six simulations, has been sampled every 80 fs, removing the first 480 fs as an equilibration time, to extract systemenvironment configurations to perform TDDFT calculations.
with link atoms, namely the QM/MM capping H. As usually done when the link atom is used, the electrostatic parameters of the classical MM atom that has been substituted by the hydrogen atom have been distributed between the classical atoms connected to it. The hydrogen link-atom has been compared here with the PB scheme only for single point calculations, to provide a wider range of methodologies to the reader, also considering the large diffusion of the link-atom approach. 3,4 The excitation energy, together with the NTOs of the corresponding states, has been reported in Fig. 5. Adding an environment on the selected structure has generally a nonnegligible effect on the excitation properties. The 3 eV excitation energy found for the vacuum case is red shied by roughly the same amount (0.15 eV) in the three different embeddings. As a result, we can observe how the QM-MM partition itself does not signicantly affect the excitation properties of TO. This observation is conrmed by the comparisons of the effects of different environments, i.e. for instance the (ii) and (iii) embeddings, on the many structures extracted from the MD 1 (see Section S7 of ESI †). Additionally, the same analysis has been performed modeling the classical environment with the AMBER99sb 61 force eld for DNA and the TIP3P model for water, 62 nding that within a classical, non-polarizable description of the environment, the computed properties are more sensitive to the denition of the QM and MM subsystems (see Section S8 of ESI †).
A third set of simulations has been performed, where also the environment has been allowed to move together with the QM subsystem (moving environment scheme, movE). In this test, a selection of the water solvent has also been performed: two selection radii are tested, rst including all the solvent molecules (and the counterions) within a radius of 15Å from each TO atom, and a second selection with a radius of 23Å. All classical atoms (roughly 4200 for the small radius and 8400 for the larger one) are moving and treated as polarizable with the AMOEBA potential. As no signicant differences in the computed excitation energies have been found between the two selection radii (<10 À2 eV), only the results on the larger radius simulations are shown in Table 2, while the comparison between the two tested radii is reported in the ESI (Section S6 †). As observed for the QM/MM(PB) simulations, also in this case the relaxation of the environment has a larger effect on MD 1 , with a nal red shi in the averaged excitation energy of 0.09 eV if the QM/MM(PB) value is used for comparison, and 0.2 eV with respect to that in the QM/MM scheme.
As a key parameter to describe the relation between the internal structure of the TO dye and its excitation properties, the bond length alternation (BLA) dened in ref. 58, and highlighted in Fig. 8 with the TO structure, has also been computed on the structures obtained from all six hybrid trajectories. Their average values are reported in Table 2. Also the correlation coefficients computed between the BLAs and the excitations Table 2 Average values of the properties computed along the MD 1 and MD 2 trajectories in the different QM/MM schemes are presented. Excitation energies (eV) and electronic transition dipole moments (atomic units) are computed at the TDCAM-B3LYP/6-31+G(d)/AMOEBA level. Each trajectory is sampled every 80 fs (see Fig. 6) to extract a set of system-environment configurations employed to compute the vertical excitations. The average BLA value, evaluated on the snapshot sampled to compute the excitation energy, is reported in pm, together with the correlation coefficient C corr computed between the excitation energies and the measured BLAs  . 7 TO pp* excitation energies (red lines) and BLA (blue lines) fluctuations along the QM/MM, QM/MM(PB) and movE simulation schemes adopted for the MD 1 and MD 2 trajectories. On the x-axis is reported the snapshot number while on the y-axis is reported the difference between the considered properties at the actual snapshot and the average value of the property (either the excitation energy or BLA) along the trajectory. All average values are reported in Table 2.
computed on the sampled snapshots are reported. Negative correlation coefficients highlight an anticorrelation between the two quantities, which can also be observed from the time series reported in Fig. 7. Particularly for the QM/MM simulations a non-negligible anticorrelation can be observed, with a coefficient larger in absolute value than 0.5. For MD 2 the correlation coefficient becomes signicantly smaller also in the QM/ MM(PB) scheme. The MD 1 and MD 2 BLA values start to be very close to each other only in the movE. In this case, in fact, the constraints due to the xed environment, together with the differences in the system congurations related to the initial structures are reduced.

Conclusions
In this work we demonstrated the ability of our hybrid QM/MM MD framework, coupling Gaussian and Tinker/Tinker-HP, to perform QM/MM Born-Oppenheimer molecular dynamics simulations on large (bio)molecular systems including a mutual polarization between the QM and MM regions and treating QM sites that are covalently bonded to the classical subsystem through a pseudobond approach. The main computational aspects of this approach, both concerning its implementation and its application to ground state, DFT/AMOEBA MD simulations, have been discussed together with a detailed analysis of the impact of the boundary regions. Structural effects have been studied on the QM/MM dynamics of oligopeptides, partitioning the aminoacid sequences into QM and MM portions, linked by covalent bonds. The effects of different QM-MM partitioning schemes have been investigated on the dynamics and the resulting electronic excitation of a dye embedded in a DNA fragment. The comparison shows that an extended QM subsystem going beyond the dye has small effects on the excitation properties if the structure of the whole system is kept the same. This indicates that very large QM subsystems are not required if accurate polarizable force elds such as AMOEBA are used. We also show that a non-polarizable description of the environment using a classical force eld may lead to computed properties which are more sensitive to the denition of the QM and MM subsystems. Overall, the importance of a proper handling of the coupled dynamics of the QM subsystem and the environment is also highlighted. If a discussion on the computational cost of such treatment has not been explicitly included in this work, we can add, as a nal remark, that (i) Molecular Dynamics with polarizable force elds using distributed multipoles such as AMOEBA can be achieved efficiently on very large systems using modern algorithmics; 32,63,64 (ii) the proper treatment of the polarizable environment represents only a small overhead of a few percent on the already expensive SCF procedure of the QM subsystem that dominates the total cost of the simulation.
These results demonstrate that large scale hybrid QM/MM molecular dynamics simulations of complex systems using advanced point dipole polarizable embeddings are now possible offering enhanced accuracy and new perspectives of applications in various elds. Finally, this work paves the way for a production implementation of a polarizable QM/MM scheme to appear in the distribution versions of Gaussian and Tinker/Tinker-HP, for which a more detailed technical description will be given in order to detail the general performance of the model.

Conflicts of interest
There are no conicts to declare.