Two-photon absorption of fluorescent protein chromophores incorporating non-canonical amino acids: TD-DFT screening and classical dynamics

Two-photon spectroscopy of fluorescent proteins is a powerful bio-imaging tool characterized by deep tissue penetration and little damage. However, two-photon spectroscopy has lower sensitivity than onephoton microscopy alternatives and hence a protein with a large two-photon absorption cross-section is needed. We use time-dependent density functional theory (TD-DFT) at the B3LYP/6-31+G(d,p) level of theory to screen twenty-two possible chromophores that can be formed upon replacing the amino-acid Tyr66 that forms the green fluorescent protein (GFP) chromophore with a non-canonical amino acid. A proposed chromophore with a nitro substituent was found to have a large two-photon absorption cross-section (29 GM) compared to other fluorescent protein chromophores as determined at the same level of theory. Classical molecular dynamics are then performed on a nitro-modified fluorescent protein to test its stability and study the effect of the conformational flexibility of the chromophore on its two-photon absorption cross-section. The theoretical results show that the large cross-section is primarily due to the difference between the permanent dipole moments of the excited and ground states of the nitro-modified chromophore. This large difference is maintained through the various conformations assumed by the chromophore in the protein cavity. The nitro-derived protein appears to be very promising as a two-photon absorption probe.


Introduction
Fluorescent proteins (FPs) are the family of homologues of the green fluorescent protein (GFP) of Aequorea victoria initially discovered in the 1960s. 1 FPs found great utility as spectroscopic tools after the cloning of the GFP gene 2 and the demonstration that they can be expressed in other organisms while maintaining their fluorescent properties. 3,4 The unique light-absorbing and fluorescence ability for FPs is due to the formation of a chromophore by a post-translational modification of three precursory amino acids within the protein shell ( Fig. 1). 5,6 The coarse tuning of the colour of fluorescence is generally mediated by the alteration of the precursory amino acids which lead to different chromophores upon maturation. Changes in the micro-environment of the chromophore can also strongly affect the colour of the fluorescence, as seen in some yellow variants of the GFP. 7 In addition to fluorescence wavelength, the protein environment generally offers fine tuning for all photophysical properties of the FP. A full palette spanning red to blue fluorescent proteins has been synthesized 8,9 with the red-shifted chromophores being of great interest due to their lower cell-toxicity. 10 Two-photon microscopy of FPs offers many advantages over its conventional one-photon counterpart. It is less phototoxic, as photons of longer wavelength (and less energy) are absorbed and provides better focus and less out-of-focus bleaching enabling it to have deeper penetration into thick tissues. 11,12 These advantages arise because the two-photon absorption (TPA) probability, the so-called TPA cross-section, is directly proportional to the square of incident light intensity. This property, however, causes TPA to have less sensitivity compared to its one-photon counterpart. Thus, fluorophores with large TPA cross-sections are preferred -a major motivation for this work. TPA is governed by different quantum mechanical selection rules as compared to one-photon absorption (OPA) and so structural modifications in the protein environment can significantly affect the TPA of a FP with minimal effect on its OPA. For example, in the series of red FPs, their measured TPA crosssections range from 15 GM (for mTangerine 13 ) to 119 GM (for tdTomato 13 ) for the lowest-energy excitation although they all share the same chromophore. 14 In addition to the known general difficulty of measuring absolute TPA cross-sections, 15 measurements in biological systems like FPs are more challenging due to the need for additional calibration. Drobizhev et al. comprehensively explained and cited the discrepancies in the reported absolute TPA crosssections of FPs. 14 Generally, the TPA spectrum of a FP has two regions of strong absorption: one is at (approximately) double the wavelength of the OPA peak and an additional (strong) band of absorption corresponds to a shorter wavelength. In the FPs with anionic chromophores, the TPA peak is blue-shifted with respect to the corresponding OPA peak (at half the wavelength). This has been rationalized by the enhancement of a vibronic transition in the two-photon process. [16][17][18][19] The additional band that is absent from the corresponding OPA spectrum was first theoretically predicted 20 to be present in the TPA spectra of all FPs and later confirmed through experimental measurements. 14 Theoretical investigations showed that the peak at longer wavelength is caused by the excitation to the first excited state (S 0 to S 1 ), while the other short-wavelength peak is due to a transition to a higher electronic state (S 0 to S n ). TPA corresponding to the higher-energy transitions has been shown to be amplified due to a resonance enhancement effect. [20][21][22] Being in the near-IR region, the S 0 to S 1 absorptions are of more practical relevance and thus are the focus of the present work.
Some theoretical studies of the TPA properties of FPs include the whole protein via combined quantum mechanics/ molecular mechanics (QM/MM) approaches. List et al. studied the steric factors and chromophore protein-interactions that result in the enhancement of the TPA peak corresponding to the S 0 -S 1 transition in DsRED, 23 a red FP. 19 They attributed the TPA enhancement to the increase in the difference between the permanent dipole moments of the excited and ground states. Another study on GFP succeeded in qualitatively reproducing most of the experimental features of the TPA spectrum. 24 Although the TPA cross-section of a chromophore can be largely altered by the protein environment, studying the isolated chromophore can be a good starting point to predict or understand the TPA properties of the protein. 20 A study of the chromophore and close-by residues of a yellow variant of GFP employed RICC2 and TD-DFT with CAM-B3LYP to discern the effect of p-p stacking on TPA. 25 Although there was good qualitative agreement between the two methods, the values of the TPA cross-sections had to be scaled for comparison. In a benchmark study, Salem and Brown evaluated the use of several functionals by comparing the TPA of isolated FP chromophores as computed via TD-DFT to averaged experimental data and higher-level CC2 computations. 22 Results showed that the B3LYP functional can provide a semi-quantitative description of the major TPA peaks. Recently, equation-of-motion coupledcluster with single and double substitutions (EOM-CCSD) 26,27 was formulated for TPA and applied to chromophores of GFP and photoactive yellow protein. 28 TPA transition moment values computed with this method are comparable to TD-DFT values for similar model chromophores. These studies support the use of computation to design rationally new chromophores.
While many FPs have been engineered and a subset scrutinized computationally, they have, in general, been built from the canonical 20 amino acids. However, the protein engineering toolbox has been rapidly expanding as protein chemists have developed methods for incorporating non-canonical amino acids (ncAAs) into proteins. [29][30][31][32][33] Incorporating ncAAs can generate proteins with novel properties. A number of FPs containing ncAAs, which have been incorporated into the chromophore, have been engineered and experimentally characterized for their OPA and fluorescence properties; [34][35][36][37][38][39][40][41][42] to the best of our knowledge, TPA has not been explored for FPs containing ncAAs. A notable example for OPA is the Gold FP (GdFP), 36 which is represented by model 20 in Fig. 2 where Trp57 and Trp56 in enhanced cyan FP (ECFP) have been replaced by 4-amino-Trp. These substitutions lead to a strongly red-shifted emission compared to ECFP. Site-specific substitutions of ncAAs for Tyr66 in GFP have also lead to novel chromophore structures with spectral properties notably different from the wild-type GFP. 35,38 As examples for residue-specific mutations, two tyrosine analogues (3-amino-L-tyrosine and 3-fluoro-L-tyrosine) have been incorporated into the DsRed-Monomer FP, leading to shifts in fluorescence wavelengths but, more importantly, increases in quantum yield. 39 While incorporation of ncAAs can directly influence the chromophore structure, ncAAs inserted outside the central chromophore can indirectly change its excitation and/or emission behavior. 43 Although using ncAAs in FP design clearly opens up new possibilities, the use of ncAAs is difficult. Thus, any newly designed FP must function better or differently than one that can be engineered using the 20 canonical amino acids. In this work, we use TD-DFT to screen a variety of possible chromophores that can result from the replacement of the Tyr residue of the tripeptide precursor with one of the ncAAs previously used in protein synthesis. The property at focus is the TPA of the chromophore. The most promising candidate is further simulated in a proposed protein environment using molecular dynamics (MD) to study the protein stability and the steric effects of the protein on the TPA of the chromophore.

Chromophores
A template chromophore model was obtained from GFP by breaking the chromophore connections to the rest of the protein and capping with H-atoms. The p-conjugated system necessary for TPA is preserved. Our previous work showed that methyl capping is only crucial when the chromophore has an extended conjugation beyond that in the GFP chromophore. 22 Since we only include chromophores derived from a GFPchromophore template, capping with hydrogen atoms in this present study yields nearly the same TPA values as with methyl groups. We selected candidate amino acids from those compiled by Liu et al. 30 based on the following two criteria: (1) having an aromatic ring that is necessary for extended conjugation and (2) excluding bulkier systems that are less likely to fit in the chromophore cavity (without engineering the protein to accommodate the larger sized moiety). For comparison, however, we considered 2 models (19 and 21) with two-cyclic rings that are comparable to that in the GdFP (model 20). The protein with the 2-naphthyl moiety (model 19) was previously shown to be nonfluorescent probably because the cavity needed to be further manipulated to accommodate the chromophore, as suggested by the authors. 35 Assuming a point mutation at Tyr 66, the phenol ring of the chromopohore is replaced with the corresponding moiety in the given ncAA yielding the chromophore models in Fig. 2.
For chromophores 1, 2 and 16, both the E and the Z isomers were considered and labelled a and b, respectively.

Ab initio computations
Following a previous protocol, 22 the chromophore models were optimized in the gas phase using the PBE0 functional 44,45 (optimized coordinates are given in Table S1, ESI †). Excited-state properties were computed with TD-DFT 46 within the response theory framework using the B3LYP functional 47 and the conductor-like polarizable continuum model (commonly referred to as PCM) [48][49][50][51] with parameters for water -except where noted. The basis set 6-31+G(d,p) [52][53][54][55][56] in cartesian form, i.e., 6 d-functions, was used in all computations. OPA oscillator strengths (and similarly the transition dipole moments from ground to excited states) were computed via linear response 57 while the two-photon transition matrix elements and the transition dipole moments between excited states were evaluated from the single and double residues of the quadratic response function, 58 For linearly polarized light, the transition moment for TPA is where the elements of the two-photon transition matrix are given by: In eqn (2), m a and m b refer to the dipole moment operator in a given cartesian direction (a, b = x, y and z), o n is the energy gap from the ground state, |0i, to a given state |ni, o is the photon energy and | f i is the final excited state. From the TPA transition moment and excitation energies (o f ) produced by GAMESS, the TPA cross-section is calculated in macroscopic units by: where a is the fine structure constant, a 0 is the Bohr radius, c is  (3)) and the broadening factor, G, affect the resulting values of the TPA cross-sections. In a recent study, Beerepoot et al. 65 discussed the various forms of eqn (3) and gave recommendations on presenting TPA data for computational studies.

Molecular dynamics simulation
To test the stability of a protein upon the introduction of a selected ncAA, MD trajectories were generated following the same protocol for two FPs: (1) a reference EGFP (PDB ID: 2Y0G) with the corresponding anionic chromophore which we refer to as control and (2) the same protein after replacing the chromophore with model 22 (see Fig. 2) that is assumed to be formed by replacing Tyr 66 with a Tyrosine-derived ncAA. We refer to this modified protein as nitro. The crystal structure for the control and its modified nitro version were prepared using the pdb4amber and reduce programs 66 in Ambertools 14. The 2Y0G crystal structure is missing 12 residues from the protein termini and hence these are unlikely to affect the dynamics of the b-barrel or the chromophore environment. The missing residues are not considered and the protein is renumbered, so that the chromophore is formed by residue 63. The all-atom forcefield AMBER ff12SB 67,68 was used to parameterize both protein models except for the chromophore residue. The chromophore in both cases includes all atoms between the LEU 62 and the VAL 64 so that more linker atoms are considered than in the attenuated models used for DFT screening (see Tables S2 and S3, ESI †). Although previously validated parameters are available for the control chromophore, 69 we adopted a general procedure to parameterize both nitro and control chromophores and it can be easily extended to test other residues of interest. The parameters generated here serve the purpose of determining the protein stability and conformational freedom of the chromophore. We used ANTE-CHAMBER 70 to generate parameters for the nitro chromophore that are consistent with the General Amber Force Field (GAFF). 71 We assigned similar parameters to the control chromophore. Charges were derived using the online R.E.D. server development tool 72 following the default scheme for amino acid fragments. All parameters are given in the ESI † (see Tables S2 and S3 for atom types and charges). All crystallographic water molecules were removed, including those in the vicinity of the chromophore to enable extra conformational freedom. Each protein model was solvated with approximately 71 000 TIP3P water molecules in a cuboid solvation box with an edge length of 20 Å. To neutralize the negatively charged protein, 7 Na + ions were added to each model followed by 64 Na + and Cl À ions to reach a salt concentration of 0.15 M.
The MD simulations were done with the AMBER Molecular Dynamics package 73 following a standard protocol that consists of minimization, heating, density equilibration and production. Minimization was done first with restraints on the protein atoms and then repeated without restraints. Heating was applied gradually for 20 ps with restraints on the protein atoms. Density equilibration was achieved in four 50 ps runs while gradually relieving the restraint. This was followed by a production run at constant pressure for 99 ns. Langevin dynamics were employed globally throughout the simulations. Details of the simulations are provided in terms of Amber input files in the ESI. † Trajectories were analyzed via CPPTRAJ. 74

Results and discussion
Computing the TPA of FPs involves several levels of complexity. In addition to the intrinsic nature of the chromophore, there are other factors that affect the TPA of a FP. One factor is that the protein shell can change the conformation of the chromophore to enhance or diminish its two-photon absorption cross-section. 19,22 This factor can be accounted for via TD-DFT which can capture the change in TPA associated with various conformers in a semi-quantitative fashion. 22 Another level of complexity is added by the proteinchromophore interactions, or the electric field due to the protein around the chromophore, which can greatly influence the TPA cross-section. 14,19 In the present work, we compute TPA crosssections for isolated chromophores ignoring the protein shell (Section 3.1). We then account for part of the influence of the protein shell by running a classical MD simulation for an EGFPbased protein with the chromophore predicted to have the largest cross-section (nitro) and compare it to an analogous simulation for its native form (control). The motivation is to obtain insight into the relative stability of the protein after introducing the new moiety (Section 3.2) and to account for part of the influence of the protein shell on the chromophore, through studying its flexibility over the trajectory and computing TPA for different conformations of the isolated chromophore (Section 3.3).

TPA cross-sections
The TPA cross-sections for the lowest-energy transition (S 0 -S 1 ) of all the GFP-derived chromophores with natural amino acids have been previously computed. 22 Their TPA cross-sections at the B3LYP/ 6-31G+(d,p) level of theory in PCM H 2 O range from 1 GM (for the BFP 75,76 chromophore) to 7 GM (for the CFP 76 chromophore) when scaled according to eqn (3) in the present study. As discussed by Beerepoot et al., 65 the TPA values reported previously 22 are too large by a factor of 4. An equivalent range of TPA cross sections is determined for molecules 5 through 18 in this study. TPA crosssections, as computed from eqn (3), are given in Table 1 Table S4 in the ESI. † Proteins with bromo, methoxy and amino substituted chromophores (models 10, 13 and 18, respectively) have been previously synthesized and shown to be fluorescent. 35 Their measured OPA energies are: 3.31 eV, 3.15 eV and 2.85 eV, respectively. 35 Compared to the values 3.37 eV, 3.29 eV and 3.14 eV in Table 1, TD-DFT using B3LYP/6-31+G(d,p) captures the proper trend within the expected error range. For model 20, the computed energy of 2.69 eV is very close to the measured absorption peak at 2.66 eV for the corresponding GdFP. 36 The trend in Table 1 looks very promising because, in general, the chromophores with the largest computed TPA cross-sections are the most red-shifted ones.
Although the computation was done in the response theory framework, comparison to a truncated sum-over-states expression gives insight into the factors contributing to the TPA cross section. In a 2-level model (2LM) approximation, the TPA crosssection is proportional to the square of the difference between the permanent dipole moments of the excited and ground states (h1|m|1i À h0|m|0i) 2 and that of the transition dipole moment from the ground to the excited state (h0|m|1i 2 ). The dipole elements for the chromophore models were determined in the gas phase, as the corresponding PCM computations were difficult to converge in DALTON. This change in medium does not affect the analysis, as the trend of TPA cross-sections for the first bright transition is the same whether computed with PCM or in the gas phase (Table S7, ESI †). The dipole elements and the corresponding cross-sections (s 2LM ) calculated directly using eqn (1)-(3) are given in Table 2 for the models where the first gas-phase excitation corresponds to the first PCM one. There is a significant discrepancy between the absolute s values computed via response theory (see Table S7, ESI †) and the corresponding 2LM ones (Table 2). However, the trend is the same (with the exchange of order for models 16a and 18). Since all studied molecules are nearly planar (symmetry was not enforced during geometry optimization), there is no contribution from dipole elements along the z-axis. Most of the contribution comes from the dipole moments along the x-axis which runs through the p-conjugated system. The nitro-derivative, molecule 22, has both large dipole difference and transition dipole moment which explains the large cross-section obtained via response theory computation.

Protein stability
The root-mean-square deviation (RMSD) for the backbone atoms and the root-mean-square fluctuation (RMSF) for the protein residues are shown for the two trajectories in Fig. 3. As expected due to the modification introduced to the nitro model, its RMSD is larger than the control. The RMSD deviation, however, is still within range of the crystal-structure resolution of 1.5 Å. A comparison of the average bond lengths in the chromophores from the MD simulations with the DFT-optimized values, and, for the control, those from X-ray crystallography show no significant deviations (see Tables S5 and S6 in ESI †). Fig. 4 shows a superposition of an average structure for each model generated from the trajectory between 60 and 99 ns. Residue 153 with the largest RMSF is a loop residue outside the b-barrel structure and thus is more flexible than the residues composing the b-sheets (see Fig. 4). In addition to the duly conserved 3D structure in the modified model of the protein, the unique neutral form of the nitro chromophore should make it, in principal, less sensitive to changes in the surrounding microenvironment. Table 2 The (non-zero) dipole elements (in atomic units squared) of the 2-state model computed at the B3LYP/6-31G+(d,p) level in the gas phase. Dm a is the a-component of the difference between the permanent dipole moment of the first excited and the ground states, i.e., (h1|m a |1i À h0|m a |0i).  Fig. 3 The RMSD of the protein in reference to the original minimized structure (left) and the RMSF of the protein residues (right) over the simulation period of 99 ns. Fluctuation for residue 153 is highlighted (see Fig. 4).

Conformational analysis of the chromophore
We monitored the conformational flexibility of the chromophore using three characteristic angles: the angle corresponding to the methine bridge, as well as the tilting and the twisting angles of the nitro-benzylidine moiety with respect to the imidazolinone ring (see Fig. 5). The three angles were recorded for 49 511 snapshots over the course of 99 ns. The minimum, maximum and average values for these angles are shown in Table 3. The methine bridge shows the least flexibility with more than 14% of the snapshots having the average angle of 1341 and 95% of the snapshots having an angle within 1291 and 1391. The twisting and tilting angles show more flexibility where 6.5% and less than 3.5% snapshots have the average angles for each, respectively. In 95% of the snapshots, the twist angle ranges from 1601 to 2001 and the tilt angle ranges from À301 to 301. Hence, we generated 117 conformers by varying the twist and tilt angles by 51 within these ranges and fixing the methine bridge at the average angle of 1341. For each conformer, we computed the first excitation energy, OPA and TPA at the same level of theory used in screening the chromophore models, that is, TD-B3LYP/6-31+(d,p) in PCM with parameters for H 2 O. The trends for the TPA cross-section and OPA oscillator strength are illustrated in Fig. 6. The trend in the excitation energy is similar to that for the TPA cross-section. Nevertheless, the TPA trend is not driven by the change in energy, as the largest energy difference in the set of conformers is less than 0.1 eV (Fig. S1, ESI †). Further, the same TPA trend is generated even if the same excitation energy is used to calculate the TPA cross-section for all conformers. For the OPA oscillator strength, a uniform parabola can be noticed when the tilt angle is fixed to the planar value, 01, and the twist is varied, or the twist is fixed to 1801 and the tilt is varied. In such cases, the oscillator strength decreases upon deviation from planarity. The decrease is the same whether the tilt or the twist is varied. As the fixed angle deviates from the planar value, the curve is skewed. On the other hand, the TPA cross-section increases when the twist angle deviates from 1801 and decreases when the tilt angle deviates from a planar or near-planar value. The TPA value is significantly more sensitive to the tilt angle than it is to the twist angle. To further investigate the reason for such trends in the TPA cross-sections, we computed the difference between the first excited state permanent dipole moment Fig. 4 An overlay of average structures of nitro (red) and control (black) models generated from the interval between 60 and 99 ns. Residue 153 (highlighted green) is the non-terminus residue with the largest RMSF fluctuations. Fig. 5 The nitro chromophore model showing the three angles monitored in the conformational study: the methine bridge is the angle between atoms 11 -9 -1, the tilt angle is the dihedral between atoms 11 -9 -1 -2 and the twist is the dihedral between atoms 12 -11 -9 -1. The full z-matrix is given in Table S8 in the ESI. †  6 The variation of TPA cross-section (left) and OPA oscillator strength (right) with the tilt and twist angles while fixing the methine bridge at 1341 (see Fig. 5 for the definition of the angles). The tabulated values are given in Table S9 in the ESI. † and the ground state dipole moment for each conformer (h1|m|1i À h0|m|0i) at the TD-B3LYP/6-31G+(d,p) level of theory in the gas phase. The square of the x-components of the dipole difference are plotted in Fig. 7. The resemblance in trend and the magnitude of the difference as the tilt and twist angles change strongly confirms that the TPA cross-section variation is driven by the difference between permanent dipoles. These results could further guide the protein engineering of the chromophore cavity to optimize its TPA, where a (near) planar value is needed for the tilt angle and deviation of the twist angle from planarity is desirable.

Conclusion
A group of 22 proposed FP chromophore models derived from ncAAs are screened for their TPA cross-sections. TD-DFT employing the B3LYP functional was used in the screening. Most of the studied molecules exhibited poor intrinsic TPA cross-sections similar to the naturally occurring GFP-derived chromophore models. Molecules with multiple rings, as expected due to the extended conjugation, showed relatively large TPA cross-sections. The chromophore with the largest cross-section, however, was the one derived from a nitro-based ncAA. Two-state model analysis suggests that the increase in TPA is due to both its large transition dipole moment (h0|m|1i) and the significant difference between the permanent dipole moments of its first excited and ground state (h1|m|1i À h0|m|0i). To investigate this model further, MD simulations were run on both a native EGFP (control) and a nitro-derived EGFP (nitro). Comparison with the control showed that the protein was stable after the replacement of the hydroxyl group with the nitro substituent. A conformational analysis was then performed to study the change of TPA with a range of the conformations visited by the chromophore in the protein cavity. Results show that a large TPA cross-section (24)(25)(26)(27)(28)(29)(30)(31)(32) is maintained through the various conformations and that the TPA fluctuation is, again, driven by the change in the difference between the permanent dipole moments of its first excited state and its ground state (h1|m|1i À h0|m|0i).
In this proposed model, we accounted for two degrees of complexity, that is, the nature of the chromophore and the effect of the protein environment on the chromophore conformation. There still remains the consideration of the chromophoreprotein interactions and the electric field due to the protein shell as both can affect the TPA cross-section. The sensitivity to the surrounding electrostatic environment of the chromophore is due to the dependence of the TPA cross-section on h1|m|1i À h0|m|0i. The red FPs share the same chromophore that has an intrinsic TPA cross-section of about 5 GM, as computed previously at the same level of theory used in the present work (TD-B3LYP/ 6-31+G(d,p)). 22 However, due to the protein shell, some red FP proteins reach an experimental TPA cross-section of 139 GM; 14 a 27-fold amplification. This amplification has been attributed to the sensitivity of the difference between permanent dipoles (h1|m|1i À h0|m|0i) to the electric field of the protein. 14 The nitro model, having most of its TPA driven by a large difference between permanent dipoles, seems to be a promising FP target especially if properly engineered to amplify its large intrinsic cross-section. Fig. 7 The effect of changing the tilt and twist angles while fixing the methine bridge at 1341 (see Fig. 5) on the square of the difference between the x-components of the permanent dipoles of the first excited and the ground states (h1|m x |1i À h0|m x |0i) of the nitro chromophore (model 20) computed at the TD-B3LYP/6-31G+(d,p) level of theory in the gas phase.
The tabulated values are given in Table S9 in the ESI. †