Is density functional theory accurate for lytic polysaccharide monooxygenase enzymes?

The lytic polysaccharide monooxygenase (LPMO) enzymes boost polysaccharide depolymerization through oxidative chemistry, which has fueled the hope for more energy-efficient production of biofuel. We have recently proposed a mechanism for the oxidation of the polysaccharide substrate (Hedeg{\aa}rd&Ryde, Chem. Sci. 2018, 9, 3866). In this mechanism, complexes with superoxide, oxyl, as well as hydroxyl (i.e. [CuO2]+, [CuO]+ and [CuOH]2+) cores were involved. These complexes can have both singlet and triplet spin states, and both spin-states may be important for how LPMOs function during catalytic turnover. Previous calculations on LPMOs have exclusively been based on density functional theory (DFT). However, different DFT functionals are known to display large differences for spin-state splittings in transition-metal complexes, and this has also been an issue for LPMOs. In this paper, we study the accuracy of DFT for spin-state splittings in superoxide, oxyl, and hydroxyl intermediates involved in LPMO turnover. As reference we employ multiconfigurational perturbation theory (CASPT2).


Introduction
Atmospheric oxygen is believed to have been introduced in our atmosphere 2.0-2.5 billion years ago. 1 Nature has since devised numerous ways to exploit O 2 to perform biochemical transformations, often by the use of transition metals. Copper is one of the metals employed for O 2 activation by many enzyme families. 2 A relatively new member of the O 2 -activating enzymes is lytic polysaccharide monooxygenase (LPMO). The LPMOs were discovered in 2010 3,4 and were shown to boost polysaccharide depolymerization through oxidative chemistry. This was a paradigm shift in our understanding of how highly stable polysaccharides, such as cellulose, are decomposed, which previously was believed to be solely hydrolytic.
The discovery of LPMOs have fueled the hope for production of biofuels from cellulosic biomass, [5][6][7][8][9] which could reduce the cost of biofuel production, because cellulose is cheap and estimated to be the most abundant polysaccharide on Earth. 10 The overall oxidation reaction of the LPMOs involves O 2 and two reduction steps (cf. Scheme 1). However, it should be noted that this reaction may evolve through initial gen-R−H + O 2 + 2 H + + 2 e -LPMO R−OH + H 2 O Scheme 1: Reaction catalyzed by LPMO, where R−H denotes a polysaccharide. eration of peroxide as it was recently shown that both O 2 and H 2 O 2 can be employed as co-substrate. 11,12 Further, the substrate-free LPMO can activate O 2 and produce H 2 O 2 . [13][14][15] The active site responsible for this chemistry is shown in Figure 1 in a form where O 2 is bound to Cu together with a part of the substrate. The figure also shows the two substrate carbon positions (C1 and C4) oxidised by LPMOs; the positions are at the glycoside link between the sugar units. The active site itself is comprised of the copper ion, coordinated by two histidine residues, one of which coordinates with both the imidazole sidechain and the (terminal) amino group. This active site is conserved among all known LPMOs, which otherwise show a rather large sequence variation. 16,17 This variation is evident even close to the active site. In the AA9 family of the LPMOs (which is the focus of the current article), there is a nearby tyrosine residue that can act as Cu ligand (depending on the copper oxidation state), whereas some other LPMO families lack this residue.
Regardless of whether O 2 or H 2 O 2 is used as co-substrate, the mechanism employed is still to a large degree unknown: most structural information is obtained for the LPMO resting state without substrate or co-substrates bound. [18][19][20][21][22][23][24][25][26][27] A few crystal structures of O 2 -bound LPMOs 28,29 or LPMOs complexed with substrates 30,31 have been reported. However, no structural data have been obtained for oxygen-bound intermediates complexed with substrates. Thus, the intermediate shown in Figure 1 was obtained from a combined quantum mechanics and molecular mechanics (QM/MM) optimization. 32 These calculations further showed that the species in Figure 1 is best described as superoxide ion (O 2 · -) bound to a Cu(II) ion.
A key part of the LPMO mechanism is believed to be the abstraction of a hydrogen atom from the C−H bond in the C1 or C4 positions of the polysaccharide substrate. Despite efforts from both theory and experiments, the active species that abstracts the hydrogen from the polysaccharide substrate has been the matter of controversy. Some suggestions for the mechanism employs a superoxide for the C−H abstraction, 6,19,20,33 but other suggestions have involved hydrogen abstraction from an oxyl (O · -) species. 6,8,34,35 Studies on model systems have also suggested hydroxy 8,36 and hydroperoxy complexes 37 as the reactive species.
Rather few studies have addressed the mechanism of LPMOs with quantum mechanical (QM) methods. 14,32,34,[38][39][40][41] We and a few other groups have recently initiated investigations of the LPMO mechanism employing both QM-cluster and QM/MM calculations. 32

Computational Details
We study three different intermediates. All structures were taken from our previous QM/MM calculations 32 in which they were optimized in triplet and singlet spin-states, respectively. All were optimized with the same size of QM region; an example is provided in Figure 1  Therefore, we used the structure and energies of the closed-shell singlet state.
Owing to the large computational cost, the CASPT2 calculations were performed on truncated systems (called model 1), compared to the QM system used in our QM/MM calculations 32 (which are called model 2). The truncated model 1 for all three studied states are shown in Figure 2 and selected bond lengths are given in   The DFT calculations were done with the def2-TZVPP basis set on all atoms. 59 We employed four different functionals, namely TPSS, 60 TPSSh, 61 B3LYP 62-64 and M06L. 65 These functionals were chosen because they represent four commonly employed functionals employing different design strategies: the TPSS and M06L functionals are both meta-GGA but while the former was constructed to satisfy exact physical constraints without empirical parameters, the latter is heavily parameterized (and includes transition metal compounds in its parameterization). The TPSSh functional is a hybrid formulation of TPSS, adding 10 % exact Hartree-Fock exchange, while the B3LYP functional is an empirical hybrid functional that has been a standard method in quantum chemistry since its development in the early 90's.
For non-hybrid functionals the DFT calculations were sped up by expanding the Coulomb interactions in an auxiliary basis set (the resolution-of-identity approximation), 66,67 employing standard def2-TZVPP auxiliary basis sets. The empirical D3 dispersion corrections were included with the Becke-Johnson damping. 68 All DFT calculations were performed with the Turbomole 7.1 software. 69 We additionally carried out a series of coupled-cluster (CC) calculations employing CC with singles, doubles and perturbative triples, CCSD(T) as implemented in Turbomole 7.1.
The calculations employed the cc-pVTZ basis set 52 for Cu and def2-SV(P) 70

Results
In this study, we compare energy splittings between the singlet and triplet spin states (∆E ts we employed a CAS (12,12) active space for both the singlet and triplet spin states. The chosen active-space orbitals are shown in Figure 3 in combination with natural occupation numbers (shown below the orbitals).
The active space includes one bonding ligand orbital that is located between Cu, the three nitrogen atoms of the two histidine ligands and O -2 . This orbital has an occupation number close to two (1.997) for both states. It could be interpreted as O 2 -antibonding orbital, albeit with large amplitude on histidine. We decided to include this orbital due to its large amplitude on O -2 in both singlet and triplet species. The active space is further comprised of the five 3d orbitals, of which four are doubly occupied with occupation numbers between 1.988 and 1.989 in both the singlet and triplet states. The fifth Cu 3d z 2 orbital interacts with the oxygen π orbitals and they show a pair of partly occupied orbitals: in the triplet state, they both have occupation numbers around 1.0 and are of rather pure 3d and oxygen π * character, respectively (see Table 2 where the combined weight of the constituting atomic orbitals are given). With four of the five 3d orbitals doubly occupied, and the last 3d orbital singly occupied, an Cu(II) interpretation seems reasonable. In the singlet state, the singly occupied orbitals have occupation numbers of 1.241 and 0.757, respectively, and are more mixed between Cu 3d and O 2p character (see Table 2). The remaining five orbitals have low occupation numbers (0.005-0.012) in both triplet and singlet spin states. They are included as a second shell of d orbitals, which has previously shown to be important to obtain accurate CASPT2 energies. 72 Overall, the orbitals for the singlet and triplet states are similar, which is desired for ensuring accurate spin-state splittings. We also note that a similar active space was employed in the study of a protein with an active site resembling the one studies here. 46 For the singlet state, a closer investigation of the underlying CASSCF wavefunction reveals that two configurations have large weights, 0.61 and 0.37, respectively. The four 3d orbitals and the O -2 orbital (with occupation number 1.997) are doubly occupied in both configurations, and they differ instead in the occupation of the two orbitals with occupation numbers 1.241 and 0.757. In the first configuration, the former is doubly occupied and the latter unoccupied, whereas the opposite is true for the second configuration; in combination with the fact that the two partially occupied orbitals are a mixture of copper 3d orbitals and oxygen 2p orbitals (cf.   the large mixing of copper 3d orbital and ligand (oxygen) 2p orbitals for these two orbitals leads to an interpretation where Cu(II) and a Oradical are spin-coupled to a singlet.
The singlet-triplet splittings from DFT and CASPT2 are shown in Table 3. The triplet is found to be 21 kJ/mol more stable than the singlet state with CAS(14,16)PT2, independent of the basis set. On the other hand, the DFT calculations predict that the singlet and triplet are essentially degenerate: TPSS, TPSSh and B3LYP predict the singlet to be 4-7 kJ/mol more stable than the triplet, whereas M06-L predicts the triplet to be 1 kJ/mol more stable than the triplet. We note that the splitting was also predicted to be quite small with the larger model 2 and enlarging the model stabilizes the triplet by 8-14 kJ/mol, making the triplet slightly more stable in all cases (cf. Table 4). Hence, we can expect -21 kJ/mol to be an lower limit for the singlet-triplet splitting, which is likely to increase for a larger model, assuming CASPT2 behaves similar to DFT in this regard. Since the splitting is already low, both singlet and triplet states may participate in the mechanism. We can conclude that the singlet-triplet is indeed small as predicted by DFT, and the DFT methods are in this regard in reasonable agreement with CASPT2 for the oxyl species.
The hydroxyl state In the CASPT2 calculations, we employed a CAS (14,16) active-space with orbitals and corresponding natural occupation numbers shown in Figure 5. We use the same active space as for the [CuO] + species, including five Cu 3d orbitals, five correlating Cu 4d orbitals, a pair of bonding and anti-bonding orbitals on the OH ligand as well as two lone-pair orbitals with two correlating O 3p orbitals. The two O 2p orbitals and four of the Cu 3d orbitals were found to be doubly occupied with occupation numbers close to two for both the singlet and the triplet. The fifth Cu 3d orbital (3d z 2 ) and one O 2p orbital were singly occupied in the triplet state with occupation numbers of 1.00, indicating a OH radical bound to a Cu(II) coupled to a total triplet state. The singlet state is again somewhat more complicated: the two orbitals having occupation numbers with significant deviation from 2.0 indicates that the singlet is not closed-shell. The two partially occupied orbitals (with occupation numbers 1.688 and 0.310, respectively) have large character of both 3d-and 2p atomic orbitals, although to varying degree (see Table 2). Investigating the underlying wave function shows (as expected) that more than one configuration contribute to the ground-state wavefunction: two configurations have large weights of 0.80 and 0.14, respectively. In the configuration with the largest weight, the four 3d orbitals as well as the three OH-based orbitals with occupation numbers 1.975, 1.980 and 1.688 are double occupied. For the configuration with smaller weight (0.14), the electrons in the orbital with occupation number 1.688 are now interchanged with the orbital with occupation number 0.310, which then becomes doubly occupied. Since the two orbitals with occupation numbers 1.688 and 0.310 are shared between Cu 3d and OH orbitals, we again suggest that the singlet state has one electron on the Cu center, which is bound to a OH radical, spin-coupled to a singlet i.e. Cu(II) and OH.
The calculated singlet-triplet splittings are given in Table 3 and it can be seen that CASPT2 predicts the singlet to be the more stable state by 92-93 kJ/mol. All DFT methods agree with this conclusion, but they overestimate the stability of the triplet significantly.
The results depend quite strongly on the DFT functional, but the results shows differences of 31-62 kJ/mol from CASPT2 and are thus significantly off. The TPSS functional gives results closest to CASPT2 (with a difference of 31 kJ/mol), whereas B3LYP gives the largest difference (62 kJ/mol).
The effect of enlarging the model system was again investigated by DFT and the results are compiled in Table 4. The larger model gives between 26 kJ/mol (TPSS) and 10 kJ/mol (B3LYP) lower singlet-triplet splitting energies, i.e. always favoring the triplet state. However, even by assuming the largest of these truncation errors (i.e. the one obtained with the TPSS functional), the singlet is still expected to be significantly more stable than the triplet.
In our previous study, 32 the reaction in which [CuOH] 2+ abstracts C−H from the substrate was found to involve both the singlet and triplet potential-energy surfaces and the reaction energy showed a rather strong dependence on employed the exchange-correlation functional. 32 The present results suggest that part of the reason for this dependency is that the employed DFT functionals struggle to describe the electronic structure of the [CuOH] 2+ singlet correctly.

Discussion
In this section, we compare the three intermediates in terms of calculated spin-densities and relate these findings to the findings from the previous section. We also comment on the expected accuracy for CASPT2 for spin-state splittings of transition metals. gives per definition all zero). In addition, Table 5 includes the total Mulliken 3d occupations from the copper atom, which for all intermediates and spin-states are around 9, also in good correspondence with a Cu(II) ion.
Similar to the spin-densities obtained from CASSCF wavefunctions, the DFT calculations show large spin-density on both copper and the oxygen ligands. Together with the total doccupations around 9 to 9.5, this is in accordance with a Cu(II) interpretation. However, the variation among the three intermediates is larger for DFT than for CASSCF. We note that the open-shell singlet spin-states ( 1 [CuO 2 ] + and 1 [CuO] + ) calculated with DFT brokensymmetry calculations -contrary to a CAS wave function -give rise to a (non-physical) spindensity. While this is a known artifact from the broken-symmetry approach, the resulting spin densities are nevertheless reported in Table 5 to confirm that we obtained a brokensymmetry state. Table 5: Mulliken spin-densities employing either a CAS wavefunction (with ANO basis set) or DFT (with def2-TZVPP basis set). For Cu, the total d-occupations are given in parentheses. The atom labels are shown in Figure 2. TPSS also has the smallest maximum error (31 kJ/mol), whereas B3LYP gives the largest error (63 kJ/mol).
Although CASPT2 is known to be a highly accurate quantum chemical model, several studies have shown that it occasionally overestimates the stability of high-spin states. [73][74][75] This was recently ascribed to the fact that CASPT2 can be less accurate for the correlation originating from the metal 3s3p orbitals. 76 This would imply that for [CuO 2 ] + and [CuOH] 2+ (for which the singlet was most stable with CASPT2), the singlet may be even lower than obtained here, while the splitting for [CuO] + (where the triplet was most stable) could be smaller than calculated here. The effect was estimated to 8-12 kJ/mol in the benchmark study by Pierloot et al., 76 but it should be noted that the employed benchmarks set did not include any LPMO active-site model (or any other copper system). It should further be noted that a Dunning type basis set was recommended in Ref. 76, prompting us to apply both ANO and Dunning-type basis sets. However, the obtained differences between the two types of basis sets were very small for our LPMOs models (less than 1 kJ/mol).
A remedy for inaccuracies in the 3s3p correlation with CASPT2 was recently suggested in which the 3s3p correlation was obtained employing a coupled cluster wavefunction. 77 Yet, due to the multiconfigurational nature of the complexes in this paper, we have refrained from this procedure.

Conclusion
In this work, we have investigated three intermediates of the LPMO enzyme with CASPT2,  32 The first species is expected to be a precursor for generation of the two later intermediates, which probably are involved in the abstraction of a hydrogen atom from the polysaccharide substrate.
To obtain a first indication of the accuracy of DFT, we have calculated spin-state energy splittings of the three species, which can attain both singlet and triplet spin states; both spin states are suspected to be involved in the catalytic turnover.
Comparing the obtained energies (cf. and produce a qualitatively wrong (closed-shell) description of the singlet state.
These results show that care must be taken when employing DFT for LPMOs. In future studies, we aim at employing more accurate methods also for reactions involving [CuOH] 2+ in abstraction of hydrogen from the polysaccharide substrate.

ANO Dunning
Cu