Diffusion Monte Carlo evaluation of disiloxane linearization barrier

The disiloxane molecule is a prime example of silicate compounds containing the Si-O-Si bridge. The molecule is of significant interest within the field of quantum chemistry, owing to the difficulty in theoretically predicting its properties. Herein, the linearisation barrier of disiloxane is investigated using a fixed-node diffusion Monte Carlo (FNDMC) approach, which is currently the most reliable {\it ab initio} method in accounting for an electronic correlation. Calculations utilizing the density functional theory (DFT) and the coupled cluster method with single and double substitutions, including noniterative triples (CCSD(T))are carried out alongside FNDMC for comparison. Two families of basis sets are used to investigate the disiloxane linearisation barrier - Dunning's correlation-consistent basis sets cc-pV$x$Z ($x = $ D, T, and Q) and their core-valence correlated counterparts, cc-pCV$x$Z. It is concluded that FNDMC successfully predicts the disiloxane linearisation barrier and does not depend on the completeness of the basis sets as much as DFT or CCSD(T), thus establishing its suitability.


INTRODUCTION
The simplest molecule containing the Si-O-Si bond is disiloxane or Si 2 H 6 O. Also called disilyl ether, its structure is a single Si-O-Si bond terminated by three H atoms at each end (H 3 Si-O-SiH 3 ). There have been numerous studies investigating the Si-O-Si bond, particularly owing to its importance in the modelling of silica compounds, which are the most abundant constituent of the Earth's crust. Most importantly, silica compounds range in function from glasses to quartz crystals, both of which comprise large sectors in industry. In some studies, disiloxane has been used as a sealant and as a component in cosmetics, or as a prototype region of a zeolite or clay substrate for applications ranging from catalysis to prebiotic synthesis. 1 Experimental evaluations of the Si-O-Si bond indicate an anharmonic bending potential with a low linearisation barrier, which makes it quite difficult to attain sufficient accuracy in such measurements. 2,3 Despite the significant volume of previous studies dedicated to the Si-O-Si bond, 1,4-10 the properties of Si-O-Si bond obtained in most of these studies are not consistent with each other. In ab initio studies, in particular, multiple calculation methods have resulted in different values for the bond angle and length, 9,11,12 as well as the linearisation energy 4,5,8 and Si-O-Si potential energy surface, 1,9 among other factors These properties and the bond geometry itself are shown to be sensitive to the choice of basis set and level of theory (the former more than the latter) according to at least one previous study. 8 To narrow down the possibilities, it would be ideal to employ the most reliable methods at our disposal. Quantum Monte Carlo (QMC), currently the most reliable of the many-body calculation methods, is expected to provide a reasonable and reliable prediction. 13 We used the fixed-node diffusion Monte Carlo (FNDMC) method, which has been widely and successfully applied to several molecular systems. [14][15][16][17][18][19][20][21] Although FNDMC results are also affected by the choice of basis sets, 18,22 we note that the basis-set dependence is considerably different from that for an SCF-based method, such as the density functional theory (DFT) and molecular orbital (MO) methods. In SCF-based approaches, the choice of basis set affects both the amplitude and the nodal positions of the corresponding many-body wavefunctions (although the methods do not explicitly employ a many-body wavefunction condition). With FNDMC, by contrast, the choice only affects the nodal positions. The amplitude can be automatically adjusted such that its shape may approach that of the exact solution as closely as possible under a restriction with a fixed nodal position. 13,23 A typical example is the description of electron-nucleus cusps. 24 Even when using such a poor basis set that cannot describe the singularity of the cusp based on its analytical form, an initial guess is applied for further numerical evolution driven by the FNDMC, making the amplitude at the nucleus positions singular with a cusp. 13,23 Owing to this self-healing property of the amplitude, the dependence on the choice of basis sets in FNDMC becomes considerably weaker. That is, the bias arising from the choice is reduced more than that for SCF-based ones, which is a difficulty faced by the present systems. 8 Note that "cuspless" basis sets cause a singularity of the local energy in FNDMC, thus resulting in numerical instability. However, this sin-gularity can be easily remedied by introducing the cusp correction proposed by Ma et al. 24 In this study, we applied FNDMC to investigate the basis-set dependence of the linearization energy of a disiloxane molecule in comparison with other ab initio methods including DFT and CCSD(T), as well as empirical measurements from earlier studies conducted on disiloxane. 2,3,25 The FNDMC depends on the basis set implicitly through its fixed nodal surface provided by the Slater determinant whose orbitals are expanded in terms of the basis set, whereas DFT and CCSD(T) strongly depend on the choice of basis set. The cc-pVxZ [26][27][28] and cc-pCVxZ 29 basis sets (x = D, T, Q), which are standard correlation methods, were examined in the present study and then applied to all the ab initio calculations to investigate the basis-set dependence of linearisation energy evaluation.

MODEL AND METHODOL-OGY
Our target property is simply a linearisation barrier of disiloxane Si 2 H 6 O between "linear" and "nonlinear" (bent) structures. As can be seen from Figure 1, the linear structure possesses an eclipsed D 3h symmetry 7 and a non-linear structure is a bent conformation of the C 2v symmetry. 30 For the two fixed structures, the barrier, ∆E barrier , is defined as the energy difference between the linear and nonlinear structures: where E linear and E delinear are the (electronic) total energies of the linear and non-linear structures, respectively.
Linear The accuracy of computed ∆E barrier values can be calibrated by referring to experimentally observed ∆E barrier values. To compare our computational results with the experimental results, however, we note that the experimental ∆E barrier involves the zero-point energy (ZPE) contribution, whereas our computational ∆E barrier value considers only the electronic contribution. The ZPE contributions can also be evaluated at each level of theory, although their accuracy depends on methods adopted and the basis sets. To investigate the basis-set dependence of ∆E barrier , we simply consider the electronic ∆E barrier . Thus, the experimental ∆E barrier values are corrected by adopting a theoretical estimate from Koput. 6 We address this issue in more detail later.
In the present study, the above two energies E linear and E delinear were computed through various combinations of (1) ab initio methods and (2) basis sets (1) Our methods applied are DFT with B3LYP exchange-correlation functional 31 (DFT-B3LYP), CCSD(T), 32 and FNDMC 13 with trial wavefunctions generated from DFT-B3LYP. Within the framework of DFT, DFT-B3LYP is a standard method for covalent systems. Within the correlated methods, CCSD(T) is known to be the "gold standard" in quantum chemistry. A variety of methods and basis sets were chosen in line with the previous results 8 indicating dependence of the Si-O-Si bond description on such choice. FNDMC is known to be comparable with CCSD(T); however, for the first time, we applied FNDMC to an evaluation of ∆E barrier .
(2) Our basis sets are a family of Dunning's correlation-consistent basis sets (cc-pVxZ; x = D, T, Q) [26][27][28] and their core-valence correlated variants (cc-pCVxZ; x = D, T, Q). 29 The cc-pVxZ and cc-pCVxZ basis sets were originally developed for correlated methods such as CCSD(T), but have been found to be appropriate even for DFT-B3LYP and FNDMC in properly reproducing the dynamic electron correlation. 26 In particular, the polarisation functions included implicitly in cc-pVxZ are essential for properly describing the Si-O-Si bond. To describe the correlation effects more precisely, the present study considers the dynamic correlation between core and va-lence electrons by applying the cc-pCVxZ basis sets, which can reproduce the core-valence correlation by minimising the difference in the correlation energies between all-electron and valenceonly (using pseudopotentials) calculations. 33 The number of basis functions entailed in each basis set is shown in Table 1. As shown, accounting for the core-valence correlation is more expensive for increasingly complete orbital and polarisation functions, with a quadruple zeta-level basis set adding close to 130 more basis functions to the standard correlation-consistent basis set. The number of basis functions is expected to largely correlate with the reliability of the calculations, particularly for the core-valence variants pertaining to all-electron calculations. In the present study we obtained the linear and delinear structures using the second-order Møller-Plesset perturbation theory (MP2) with the cc-pVQZ basis set (MP2/cc-pVQZ level), and previous studies conducted at the MP2/cc-pVQZ level of theory 7,11 have established a reliable favourability toward a delinear structure and provide good agreement for the Si-O-Si angle in accordance with experimental results. 30 Modelling of the Si-O-Si angle in particular has been heavily emphasised in previous theoretical studies for both disiloxane 8 and pyrosilisic acid. 9,10 This necessitates even larger numbers of basis functions, which means that the cc-pCVQZ basis set would necessitate the highest number of basis functions by far. The two optimised structures at the MP2/cc-pVQZ level of theory were used in this work to calculate all ∆E barrier values, i.e., common to all levels of theory, where the linear structure has a Si-O-Si angle of 180 • and the delinear structure has an optimised Si-O-Si angle of 146.8 • comparable with the experimental value of 144.1 • . 30 The other structural parameters are given in the Supporting Information, and are also in good agreement with experiments.
The present study adopted no pseudopotential calculations, but rather all-electron calculations with a total of 42 electrons for a single disiloxane molecule. This molecular system is not so large that the CCSD(T) calculation within the frozen core approximation is feasible, despite the computational cost of CCSD(T) scaling as N 7 , where N is the number of electrons in the system. By contrast, the DFT cost scales with N 3 (or less) and is therefore unimportant. Similar to DFT, the FNDMC cost scales as N 3 , although the prefactor of FNDMC is much larger than that of DFT. This is because a vast number of random sampling points are required to obtain a sufficiently small error bar (sub-chemical accuracy of ∼ 0.1 kcal/mol) to calibrate a small ∆E barrier (∼ 0.5 kcal/mol). Recent parallel computers, however, enable us to apply FNDMC to evaluate such a tiny ∆E barrier , because its algorithm is intrinsically parallel.
In our FNDMC calculations, we adopted Slater-Jastrow type wavefunctions as their fixed-node trial wavefunctions. 34 Molecular orbitals entering the single Slater determinants were generated by DFT-B3LYP with various types of basis sets. The Jastrow factor consists of one-, two-, and threebody terms 35 including 88 variational parameters in total. These parameters were optimised through a variance minimisation scheme, 36 and only the two-body term holds the electron-electron cusp condition. 37 The electron-nucleus cusp condition, which is a short-range one-body correlation effect, is satisfied by the cusp correction scheme applied to the Gaussian basis sets 24 instead of imposing the cusp condition on the one-body term. Note that the Jastrow factor does not change the (fixed) nodal surfaces and is responsible for the numerical stability in FNDMC. However, the quality of the nodal surfaces determines the accuracy of the FNDMC energies in terms of the fixed-node variational principles. 38 Accordingly, the FNDMC accuracy depends implicitly on the basis set adopted, which is used to expand the molecular orbitals entering the Slater determinant. In addition to the fixed-node error, another source of bias in actual FNDMC calculations arises from the short-time approximation with finite (small) timesteps. 39 To remove this bias, it is common to use multiple timesteps to make the linear regression and obtain calculation results for δt → 0. This regression is an approximation of the theoretical δt = 0 result. The present study considers both linear and quadratic extrapolations.
The software package Gaussian09 40 was used to conduct the MP2 (geometry optimisation), DFT-B3LYP, and CCSD(T) calculations, whereas the CASINO 41 code was used to apply the FNDMC calculations. An electron-nucleus cusp correction scheme 24 for Gaussian orbitals was utilised in the all-electron FNDMC calculation in CASINO. In addition, cc-pCVxZ (x = D, T, Q) basis sets for Si and O as well as the corresponding cc-pVxZ versions for the H atoms during the same calculations were obtained from the online Basis Set Exchange library. 42 In FNDMC for both structures, the number of target population was 11,520, and the number of steps in the imaginary-time evolution was set to 2,000 and 500,000 for equilibrated and accumulated phases, respectively. In addition, we carried out FNDMC calculations with different timesteps of 0.01, 0.005, and 0.001 to remove the short-time bias (see SI for more details).

Geometry and ZPE contribution
Several studies have been conducted on the geometry of the disiloxane molecule, although most ab initio methods tested did not manage to replicate the available experimental results. Rather than the Si-O bond length, the Si-O-Si bond angle and linearisation barrier have been found to be greatly dependent on the choice of basis set used to represent the wavefunction. 8 This study focuses on the linearisation barrier of disiloxane, taking the same optimised geometries for all calculations, partly owing to the notorious difficulty in optimising the geometries in FNDMC.
The computational and experimental linearisation barriers of disiloxane, ∆E barrier , defined in Eq. (1) are listed in Table 2 and plotted in Figure 2 as a visual guide. It has been commonly observed experimentally that the non-linear (bent) structure of disiloxane is energetically more favourable than a linear structure, which translates to a positive linearisation barrier. All three experimental results seem to agree on this point, and 25 in particular reports a higher linearisation barrier (at 1.1 to 1.4 kcal/mol) than the other two results, 2,3 which report a barrier of approximately 0.3 kcal/mol. Both of these measurements are more recent than the first and achieve good consilience with the DFT predictions from the highest quality basis sets. 8 Therefore, it is reasonable to infer that a linearisation barrier of approximately 0.3 kcal/mol is a reliable value for disiloxane.  It needs to be stressed, however, that the experimentally measured linearisation barrier may not be comparable to the ground state values calculated from the first principles. The ground state energy obtained through ab initio calculations are physically unobtainable experimentally because of the ZPE, i.e. the difference between the ground state and the lowest energy vibrational state. Even at absolute zero temperature, the lowest energy level achievable is a vibrational state 1 instead of the electronic ground state 0 : Consequentially, any energetic barriers measured in the experiments is at best the difference between the lowest vibrational states ∆ 1 , whereas energetic barriers calculated through ab initio methods are from the electronic ground states ∆ 0 . Therefore, a comparison between energetic barriers obtained from the theoretical calculations and experimental measurements must also account for the difference in ZPE. The energetic barrier of a transition from quantum state A to state B is calculated as follows: The difference in ZPE ∆ZPE between the initial and final states A → B is usually considered insignificant. However, barrier height discrepancies on the order of 0.1 kcal/mol might well be caused by this term.
The preceding theoretical study by Koput 6 estimates a ∆ZPE value of −20 cm −1 ≈ −0.06 kcal/mol, which in line with Equation 3 should result in a higher ground state linearisation barrier than the experimentally measured result. Therefore, this study treats the ground state linearisation barrier of 0.36 kcal/mol as a reasonably accurate "exact" linearisation barrier for a point of comparison with ab initio calculations. This value is referred to as "ZPE-corrected Raman" in Ta

DFT-B3LYP results
Both the cc-pVxZ and cc-pCVxZ basis sets show positive linearisation barriers (except for cc-pCVTZ), energetically favouring the non-linear over the linear conformer of disiloxane. The simplest basis set, i.e. the double zeta cc-pVDZ basis set, results in the highest barrier of all (0.80 kcal/mol), in certain agreement with the far infrared absorption spectra of gaseous disiloxane. 25 The core-valence correlated counterpart, i.e. the cc-pCVDZ basis set, also produces a high linearisation barrier of 0.78 kcal/mol. Triple zeta basis sets (cc-pVTZ and cc-pCVTZ) show a significant reduction in the barrier height, eventually converging to the quadruple zeta set results. These results reinforce the conclusions drawn in preceding studies [5][6][7][8]11 that the basis sets at the double zeta level are insufficient to properly describe the disiloxane molecular properties, and that the triple zeta level is the minimally reliable level for the description of the atomic orbitals. The trends of the linearisation barrier, meanwhile, is nearly identical for both standard and core-correlated variants (Figure 4), with higher quality sets producing good agreement with two of the three available experimental results. 2,3 The results for DFT-B3LYP in this study, particularly the converged results for the quadruple zeta basis sets cc-pVQZ and cc-pCVQZ, seem to indicate that previous agreement between the B3LYP results and experimental data 8 is due to coincidence instead of convergence toward the complete basis set limit. Indeed, for the largest basis sets used in this study, the linearisation barrier calculated by B3LYP is at best approximately 0.1 kcal/mol smaller than the ZPE-corrected measurement. It does not seem that the B3LYP exchangecorrelation functional provides adequate accounting of the electron correlation for properly describing a disiloxane molecule.

CCSD(T) results
CCSD(T) calculations were conducted with the same geometries and basis sets as the B3LYP calculations. The results of these calculations are shown in Table 2. The trends of the linearisation barrier agree almost perfectly with B3LYP calculations, with a universal shift in the energetic favourability toward the non-linear conformer denoted by the larger linearisation barriers. CCSD(T) calculations seem to converge to values closer to the ZPE-corrected Raman linearisation barrier of 0.36 kcal/mol. The cc-pVxZ basis sets converge to a linearisation barrier of approximately 0.44 kcal/mol, whereas the core-valence correlated set cc-pCVxZ converges to a barrier height of approximately 0.35 kcal/mol, in excellent agreement with the benchmark value. We also achieved good agreement with another CCSD(T) calculation in an earlier study 8 reporting a linearisation barrier of 0.48 kcal/mol using the cc-pVTZ basis set. These results show that accounting for the electron correlation is indeed necessary to properly model the Si-O-Si bond in disiloxane, and supports the conclusion of earlier studies citing cc-pVTZ and CCSD(T) as the minimum reliable level of description for the disiloxane molecule. 7,8

FNDMC results
The linearisation barrier calculated by FNDMC is also shown in Table 2 and Figure 2. As with CCSD(T), the linearisation barrier values from FNDMC converge to a value close to the ZPEcorrected Raman benchmark adopted in this study of 0.36 kcal/mol for the largest basis sets. Also observed is the independence of the linearisation bar-rier calculated from the basis set used to form the initial trial wavefunction using DFT-B3LYP relative to the other two calculation methods. The influence of the different basis sets on the end result of the FNDMC calculations directly translates into how they affect the trial wavefunction nodal surfaces. The amplitudes of the trial wavefunction, meanwhile, do not affect the end result at (τ → ∞), which, in turn, limits the dependence of the end result on the basis sets used to form the trial wavefunction.
The quality of the trial wavefunction nodal surface (how close it resembles the true ground-state wavefunction nodal surface) is reflected in the total energy values of the FNDMC calculation (the fixed-node variational principle) . 13 Therefore, the expected values of the total energy in the allelectron FNDMC calculations (such as those conducted in this study) are good indicators of the quality of the trial wavefunction nodal surfaces, because all-electron FNDMC calculations retain the variational principle with respect to the total energy of the ground-state. These absolute values are shown in Table 3, sorted in accord with the quality of the nodal surface (from low quality, high total energy, to high quality, low total energy), and displayed in Figure 3. It can be observed that the cc-pCVTZ produces a nodal surface of slightly better quality than cc-pVQZ, indicating that the electron correlation between the core and valence electrons can be a significant factor for improvement beyond the triple zeta level of atomic orbital description.

Effects of basis sets
Disiloxane linearisation barrier dependence on the basis set was observed for all cases. In agreement with previous studies, 8 this dependence is more significant than the methodologies used in the ab initio calculations, particularly considering the double zeta-level basis sets cc-pVDZ and cc-pCVDZ. Triple zeta basis sets seem to offer the minimum level of description to reliably predict the linearisation barrier, whereas quadruple zeta basis sets result in an extremely good prediction, especially for the CCSD(T) and FNDMC calculations. Figure 2 clearly shows very similar trends for each calculation method with a substantial dependence on the basis set indicated, particularly towards the smaller sets. Convergence of the disiloxane linearisation barrier is generally observed for all three calculation methods, albeit not necessarily converging to the same value.
Previous theoretical studies suggest that this converging trend is attributed to the increasing addition of polarisation functions 8 within the basis sets used. Adding polarisation functions serves to better reproduce dynamical correlations in the system. This is in line with previous theoretical works with semi-empirical methods implying that calculations without electron correlation favour the linear structure, thereby resulting in negative values of linearisation barrier. This is also reflected in preceding geometry optimisation calculations without polarisation functions, resulting in Si-O-Si angles close to 170 • . 4,5 Therefore, it is expected that both cc-pVxZ and cc-pCVxZ basis sets should converge to a reliable predicted linearisation barrier value because polarisation functions are systematically included with an increasing description of the atomic orbitals.
From Figure 4, a comparison can be observed between the correlation consistent cc-pVxZ basis sets and its core-valence correlated counterpart cc-pCVxZ. The B3LYP and CCSD(T) linearisation barrier trends closely follow one another, with CCSD(T) cc-pCVxZ in particular converging to a nearly identical value as that of the ZPE-corrected Raman. Meanwhile, the FNDMC results show a slight difference in the trends between the two families when observing the respective expecta-  (1) tion values of the linearisation barriers. It should be noted, however, that the stochastic nature of FNDMC diminishes the significance of the energy trends (which need to be considered probabilistically). Figure 3 shows that FNDMC is overall less dependent on the basis set used compared to both B3LYP and CCSD(T). Although previous studies have recommended treating disiloxane with the cc-pVTZ basis set at a minimum, cc-pVDZ is shown to generate a sufficiently high quality nodal surface for use in DMC calculations, giving a linearisation barrier in good agreement with the experimental values. Even for the smallest basis sets tested in this study, FNDMC shows relatively more accurate values of the linearisation barrier, and therefore less dependence on the basis set used to form the trial wavefunction per the initial expectations.

Effects of methodologies
In line with previous studies, we find the dependence on the methods weaker than that on the basis sets. With increasing levels of basis sets, DFT-B3LYP and CCSD(T) values of the linearisation barriers converge to approximately 0.1 and 0.3-0.4 kcal/mol, respectively. This slight difference is in accord with the variance in experimental measurements 2,3,25 and is clearly less significant than the variance as a result of the basis sets.
Previous expectations on the trend in methodologies are derived from previous studies, 4,43 particularly the study by Koput in 1990, 4 in which the inclusion of the electron correlation proved vital to predicting the energetic favourability of the non-linear structure of disiloxane because the SCF calculation produced a near-linear structure of disiloxane. This and other studies 43 gave rise to the general expectation that the inclusion of an electron correlation is important to properly model the structure of disiloxane. As previously mentioned, it can be observed that the B3LYP exchangecorrelation functional is insufficient to properly recover the electron correlation and thereby predict the disiloxane linearisation barrier. Meanwhile, CCSD(T) and FNDMC calculations both converge towards the ZPE-corrected Raman value, indicating a proper accounting of the dynamical correlation.
FNDMC has the added advantage of being able to reliably predict the energetics of disiloxane with smaller basis sets. With better scalability for application in high performance computing (HPC) systems and the availability of pseudopotentials, FNDMC is a promising alternative method to CCSD(T) for describing the Si-O-Si bonds, par-ticularly for larger systems.

CONCLUSION
The Si 2 H 6 O linearisation barrier was calculated using three separate ab initio methods, DFT-B3LYP, CCSD(T), and FNDMC, with six different basis set choices in line with expectations derived from previous theoretical studies on disiloxane. Similar with previous studies, we observed that the systematic inclusion of polarisation functions, along with an increasing level of description for atomic orbitals, eventually result in a reliable prediction of the disiloxane linearisation barrier. All calculation methods eventually produced converged values with increasing level of basis sets for the linearisation barrier, with 0.1 kcal/mol for DFT-B3LYP and 0.3-0.4 kcal/mol for CCSD(T), respectively, in line with previous theoretical studies, 8 and similar expectation values of the barrier for FNDMC calculations. The agreement between the experimental measurements and the DFT-B3LYP results at 0.3 kcal/mol are shown to be likely accidental. ZPE-corrected experimental measurements are in good agreement with the CCSD(T) and FNDMC results, with the ground state linearisation barrier taken at 0.36 kcal/mol. FNDMC is shown to be least dependent on the choice of basis set among the three calculation methods applied, in line with initial expectations owing to the nature of FNDMC.