Error-consistent segmented contracted all-electron relativistic basis sets of double- and triple-zeta quality for NMR shielding constants

We present property-tailored all-electron relativistic Karlsruhe basis sets for the elements hydrogen to radon. The modifications described herein use at most four additional primitive basis functions and re-optimized contraction coefficients of the inner-most segment. Thus, the shielding constants are improved while maintaining the compactness of the basis set. A large set of 255 closed-shell molecules was used to assess the quality of the developed bases throughout the periodic table of elements.


Introduction
All-electron relativistic quantum chemistry has witnessed tremendous progress in the last decade. [1][2][3][4][5][6] Especially the development of the (one-electron) exact two-component theory (X2C) 7-14 made all-electron relativistic calculations widely applicable. Here, the basis set expansion in one-particle wave functions is carried out and exploited right from the beginning. The electronic and positronic states are decoupled by block diagonalization of the four-component Dirac matrix leading to an electrons-only two-component Hamiltonian. Thus, the X2C Hamiltonian is only available in matrix form and not within an operator formalism like e.g. in Douglas-Kroll-Hess (DKH) [15][16][17] theory. In quantum chemistry, Gaussian-type orbitals (GTOs) are ''usually'' chosen for the basis expansion. In this way, the tasks are reduced to the integral evaluation and algebraic operations, which are efficiently carried out on contempory computer architectures. The employed basis sets are characterized by the fixed exponents of the Gaussian functions. For economic reasons, a number of so-called primitive Gaussian-type orbitals (pGTOs) are combined to form a contracted Gaussian-type orbital (cGTO). The conceptionally easiest way is to use the same set of pGTOs to construct every cGTO. Well-known examples of this general contraction scheme for relativistic theories are the atomic natural orbitals basis set, ANO-RCC, [18][19][20][21] and the bases of Dyall, [22][23][24][25][26][27][28][29] which are also employed in ''fully'' relativistic fourcomponent calculations. On the contrary, atomic shells can also be described by segments. Here, the first cGTO describes the 1s shell and the inner part of the 2s shell. The part of the 2s shell beyond the first radial node is then fitted by the second cGTO and so on. This segmented scheme is exploited in the Karlsruhe X2C 30 and Sapporo bases. [31][32][33] In X2C and familiar theories such as the normalized elimination of the small component (NESC), 34,35 the Hamiltonian is evaluated in the primitive basis space since the contraction coefficients of the electronic and the positronic states differ significantly. The Hamiltonian is contracted after the relativistic decoupling to be used in the self-consistent field (SCF) procedure. Thus, the contraction coefficients have to be determined for this approach. For the Karlsruhe X2C bases, exponents and contraction coefficients are optimized simultaneously within the one-component X2C approach employing the finite-nucleus model. Basis sets are usually developed by minimizing the total energy with respect to the exponents and the contraction coefficients. 30,[36][37][38][39][40][41][42][43] In relativistic quantum chemistry, the finite nucleus model based on a Gaussian charge distribution 44 is chosen instead of the point charge model to remove the singularity at the nucleus. For basis set optimizations, the finite nucleus approach is mandatory; otherwise the exponents become extremely large, but just model the non-physical singularity of the point-charge approach.
Bases developed in this way can be safely used for energyrelated properties such as bond energies, geometries and harmonic frequencies. 41 The chemically most important region of the atoms are the valence shells. Consequently, so called valence basis sets use a smaller number of basis functions in the core region and thus reduce the computational effort. Typically one cGTO is used for each inner shell (or segment) and X cGTOs are used for the valence shells. Further, sets of higher angular quantum number are needed to describe polarization and -in case of post-Hartree-Fock methods -correlation effects. Often their type and number is chosen according to a strict correlation consistent scheme. 45,46 The Karlsruhe basis sets 30,[41][42][43] follow these prescriptions concerning contraction and polarization schemes less strictly. They rather focus on consistent errors in structure parameters, bond energies and selected properties in molecular calculations for the entire periodic table.
Electric properties like higher-order multipole moments, polarizabilities and hyperpolarizabilities require additional diffuse functions. 47 These basis extensions can still be determined variationally, for instance, by maximizing atomic Hartee-Fock polarizabilities in analytical derivative theory. In contrast, magnetic properties such as nuclear magnetic resonance (NMR) shielding and spin-spin coupling constants depend on the density in the vicinity of the nuclei. Thus, additional steep basis functions may be needed, 48,49 in particular for relativistic calculations, where the density is subject to spatial contraction. In this work, we examine the present shortcomings of the Karlsruhe x2c-XVPall (X = S, TZ) 30 bases for the calculation of NMR shielding constants and present modifications to overcome these.

Design and optimization of the basis set extensions
Bases optimized for NMR shifts cannot be obtained based on the variational principle and the Newton-Raphson algorithm only. Instead a large even-tempered (ET) basis set 30 with a factor of ffiffiffiffiffi 10 4 p between its exponents up to tight functions was chosen as a reference herein. The property-tailored basis sets, x2c-SVPall-s and x2c-TZVPall-s, were constructed in several steps focusing on the valence and the core region.
At first, errors in the valence region of the existing all-electron relativistic Karlsruhe basis sets with respect to the ET bases were determined for a large set 50 of more than 250 molecules. This set is based on a previous test set, which was prepared earlier 41,51 and contains nearly every element in its common oxidation state. The corresponding structures were optimized at the BP86/def-SV(P) 36,38 level using ECPs for the heavy elements. Herein, we only study closed-shell compounds. The Cartesian coordinates of the corresponding molecules are given in the ESI. † All calculations were performed with our recent implementation to calculate NMR shielding tensors 52,53 in TURBOMOLE [54][55][56] employing the finite nucleus model for both the scalar potential and the vector potential. PBE functional 57 was selected. Symmetry was not exploited to check for basis set deficits being reflected by slightly different results for symmetry equivalent nuclei. The SCF convergence thresholds in all calculations were 10 À8 E h . All DFT calculations use grid 4a for the exchange-correlation potential, which was developed in the course of this work. It has been noted previously 58 that all-electron relativistic DFT calculations may require extended grids for numerical integration. So, grids optimized in nonrelativistic or effective core potential (ECP) calculations may not be sufficient. We have investigated the impact of the number of radial points in X2C-DFT calculations of atoms and molecules (details are given in the ESI †) and found that extended grids (indicated by the appendix ''a'') can be obtained by determining the number of radial grid points, r, for an atom based on the atom number, Z, according to where n is a parameter describing the radial grid for the Gauss-Chebyshev method. For the grids 1-5, n is given by 1, 2, 3, 6, 8. 59 The results for the x2c-TZVPall basis sets, the higher polarized variant, x2c-TZVPPall, and the variant for two-component calculations, x2c-TZVPall-2c, that accounts for spatial splitting of inner shells into subshells l AE s (induced by spin-orbit coupling) by additional GTOs, are shown in Fig. 1. It comes as no surprise that the error increases from light to heavy elements and so does the standard deviation. Additional polarizing functions lead to only minor improvements, while the spin-orbit extensions result in a significantly better agreement with the results obtained with the ET bases. However, the standard deviation for the 6p group is still rather large. The x2c-TZVPall-2c basis set employs roughly 30% more functions to describe the spin-orbit splitting. Most of these functions are of p and d type. While the extensions clearly improve the results, the increased size is somewhat disadvantageous. Nevertheless, the spin-orbit extensions of the x2c-TZVPall-2c bases provide a reasonable starting point for the choice of more economic extensions. The additional functions are employed as primitives to ensure a higher flexibility. For the heavy elements, functions with a larger exponent (280 o z o 330 for the 6p group) are added to flexibilize the description of the density in the outer-core region. The exponents were optimized with the test set above in several cycles reducing the mean absolute error in the NMR shieldings and form the x2c-TZVPall-s basis set. Exponents for cerium to ytterbium were obtained by systematically increasing that of lanthanum and for luthenium by decreasing the exponents of hafnium.
These extensions mainly correct the outer-core and the valence region. To improve the description of the inner-most shells tight p functions were introduced. In line with ref. 49, a single additional p function was added for each element. Therefore, the exponent of the inner-most primitive function of the parent x2c-SVPall or x2c-TZVPall set was scaled with a factor of 6.5 (test calculations have shown that a factor between 5 and 8 is well suited). Moreover, the outer-most primitive was excluded from the segment and utilized as augmenting function to increase the flexibility. Thus, the contraction coefficients of the new segment need to be re-optimized at the X2C level of theory. The X2C Hamiltonian, h X2C , is evaluated in the primitive space and contracted after the relativistic decoupling. Hence, the relativistic one-electron Hamiltonian, h, reads where the matrix C consists of the coefficients, c im , for the contracted function, f i , featuring the primitive functions, w m . The derivative of the electronic energy 60 with respect to the contraction coefficients (l = c im ) then becomes Here, P and G are the one-and two-electron density matrix, respectively. Z denotes the energy-weighted density matrix. All quantities except the relativistic Hamiltonian, h l , are identical to the non-relativistic limit and are readily available. 36,37 The derivative of the matrix elements of the contraction matrix, C l , is either zero or one. Thus, analytical gradients can be implemented by proper modification of the contraction routines, which also form the trace of the Hamiltonian with the density matrix. The contraction coefficients are optimized in atomic calculations utilizing a quasi-Newton algorithm based on the variational principle. The electronic state of the atoms 30 was prepared by exploiting symmetry constraints in a restricted open-shell Hartree-Fock (ROHF) 61 calculation with SCF and density thresholds of 10 À10 E h and 10 À10 , respectively. The unrestricted Hartree-Fock (UHF) method was used for the lanthanides. The molecular orbital coefficients of the decontracted p shell formed the starting point for the optimization. Table 1 shows the resulting basis set pattern. All exponents of the x2c-TZVPall-s basis set differ at least by a factor of 1.5 to ensure a solid SCF convergence behavior. Furthermore, the x2c-TZVPPall-s is constructed by augmenting the x2c-TZVPall-s basis with the usual polarization functions. 41 Test calculations have shown that the contraction pattern of the p functions of the x2c-TZVPall-s set of sulfur and chlorine (421111) needs to be modified compared with aluminum, silicon and phosphorus (511111) in order to significantly improve the shielding constants. This also holds for the x2c-SVPall-s bases where the pattern is changed from (5111) to (4211).
The existing auxiliary bases 30 for the resolution of the identity fitting approximation for the Coulomb term (RI-J) 38,62-65 can be used. Here, only the total density needs to be fitted. This does not hold for the RI approximation to the exchange integrals 66,67 (RI-K) or the application of the RI method to post-Hartee-Fock or post-Kohn-Sham theory, 51,68 where the products of the basis functions have to be approximated.

Documentation of accuracy and comparison to other relativistic basis sets
For the evaluation of the accuracy, the same test set as above is used. Further, the developed basis sets are compared to the contracted Sapporo triple-z 31,33 (Sapporo-TZP-2012 and Sappro-DKH3-TZP-2012, here abbreviated as Sapporo-TZP) and Dyall's uncontracted triple-z valence [22][23][24][25][26][27]29 (Dyall-VTZ) set. Decontraction was done as suggested by Dyall to decrease linear dependency, e.g. for La the second d function from the SCF set was omitted, and Dunning's uncontracted cc-pVTZ bases 46 were used for the lighter elements (1s to 3p, and 3d). All calculations were performed as specified in Section 2 and statistically evaluated. The individual shielding constants are given in the ESI. † There, the shielding constants and the corresponding deviations with different bases are further compared to the range of the experimental NMR shifts and the corresponding double-z basis sets are discussed. Equivalent shielding constants were counted only once. We calculated the mean absolute error (MAE) and the standard deviation of the difference of NMR shielding constants, Ds, for each group instead of a percentwise error, |Ds/s ET |, as done in ref. 49, since the denominator is very large for heavy elements. The results are listed in Table 2.
Even for the light elements (1s to 2p) the standard deviation and the mean absolute error are significantly reduced. Here, the additional tight p function and the re-optimized contraction are the main improvement. The Dyall-VTZ and the Sapporo-TZP basis sets do not feature such tight p functions and show somewhat larger errors. The ANO-RCC-unc bases perform well except for the 4s group, where a large error due to calcium occurs. The ANO-RCC-unc basis set does not employ very tight p functions. Here, the high flexibility allows for an accurate description of the density in the core and the valence region.
For the heavier elements, the x2c-TZVPall-s set shows a rather small standard deviation and is thus error balanced. The main improvement here is achieved by the higher flexibility. In contrast, the Sapporo bases show again larger errors and deviations for the d elements as well as for the 6p group. There the error amounts to 657 ppm while the other sets show errors of 189 ppm (ANO-RCC-unc), 84 ppm (x2c-TZVPall-s) and 111 ppm (Dyall-VTZ). The errors of the latter are smaller than the impact of different functional classes (PBE vs. PBE0). The large error of the Sapporo-TZP set cannot be addressed to a single molecule as all except for RnF 2 show errors larger than 500 ppm. For the 3d and 4d elements the error mainly stems from Cu, Mo, Tc and Ru compounds. Table 2 Mean absolute error and standard deviation of various triple-z basis sets and the uncontracted ANO-RCC bases compared to the large eventempered reference. The number of symmetry non-equivalent nuclei for each group is further given (n g ) In general, the Dyall-VTZ and our tailored basis sets are in rather good agreement. It should be noted that the Sapporo and Dyall's basis sets feature considerably more functions. In total, the Sapporo basis set employs more than 9000 functions for the elements up to Rn whereas x2c-TZVPall and the x2c-TZVPall-s use roughly 6000. For Rn, the x2c-TZVPall-s set uses (24s26p15d8f) functions which are contracted to [11s10p7d3f], that is less than for the x2c-TZVPall-2c set, (24s35p17d8f) contracted to [11s11p9d3f]. The Sapporo-TZP and the uncontracted Dyall-VTZ set consist of (27s21p15d12f2g) functions contracted to [10s9p7d4f1g] and (30s26p17d11f), respectively. To compare with, the ANO-RCC-unc and the even-tempered set employ (25s22p16d12f4g) and (47s35p25d18f8g) functions. Computation times with the different bases are given for W(CO) 6 in Table 3, which is one of the largest molecules of the test set. Note that the x2c-TZVPall and the x2c-TZVPall-2c set is identical for light elements such as carbon and oxygen. Therefore, the employment of the x2c-TZVPall-s results in higher computational demands for W(CO) 6 but also yielding more accurate results. The CPU times of the Karlsruhe basis sets are similar and differ by only one minute. The use of the Sapporo bases results in an increase of the computation time by a factor of 1.7 whereas the Dyall basis only leads to an increase by a factor of 1.2. This is due to the X2C response (ca. 70% of the total time) in the primitive space. Upon application of the diagonal local approximation to the unitary decoupling, 52,69,70 the two-electron integrals are the most demanding part of the NMR shielding calculation and contracted basis sets will become more advantageous. Then, calculations employing the Sapporo and the Dyall basis set take a factor of 1.6 longer compared to the x2c-TZVPall-s bases.
To allow for a simple comparison of the accuracy and the size of the basis set throughout the periodic table of elements, we propose a weighted overall error, Z, which considers the range of the shieldings, w g , and the number of symmetry non-equivalent nuclei, n g , of each group, g. The following nuclei are chosen for the determination of w g : H (1s), Be (2s), O (2p), Mg (3s), S (3p), Ca (4s), Cr (3d), Se (4p), Sr (5s), Mo (4d), Sn (5p), Ba (6s), W (5d), Pb (6p). The weighted overall error then follows as where n tot is the total number of symmetry non-equivalent nuclei in the NMR spectra. For instance, W(CO) 6 contributes to the 2p group (C, 0) and the 5d group (W). Eqn (5) ensures that the larger MAE of the heavy elements can be compared to ones of the lighter nuclei. The weighted overall errors for different basis sets and the ratio of the contracted basis functions in the spherical atomic orbital basis with respect to the ET reference are displayed in Fig. 2. The additional functions do not significantly increase the size of the basis set but clearly improve the performance of the x2c-TZVPall bases. The x2c-TZVPall-s basis set also leads to slightly better results compared to the uncontracted Dyall-VTZ bases while employing only half of the functions for the test set. The Sapporo-TZP basis set shows the largest error and its size lies in between the x2c-TZVPall-s and the Dyall-VTZ set. The uncontracted ANO-RCC bases are close to the ET reference in terms of both size and accuracy.

Comparison to other NMR-tailored bases
We further compare our tailored basis sets to the pcSseg bases 48,49 of Jensen, which are available only for elements up to krypton. Thus, only the corresponding subset of molecules are considered. Computational methods are the same as above, however, a percent-wise error as proposed in ref. 49 serves as the decisive quantity here. The errors are still measured with respect to the ET bases. Concerning the size, the x2c-TZVPall-s lies in between pcSseg-1 and pcSseg-2, and x2c-SVPall-s set is of nearly the same size as pcSseg-1. Moreover, the pcSseg-3 and pcSseg-4 bases employ functions with a higher angular momentum such as g and h functions. The percent-wise error of each group is listed in Table 4 while the individual shielding constants are given in the ESI, † and the weighted overall errors are displayed in Fig. 3. The weighted overall errors were computed as described in the last section. The pcSseg bases do not systematically approach the ET limit with increasing size for all groups due to the pcSseg-3 set.  The weighted overall error (see eqn (5)) is compared to the total number of contracted basis functions in the spherical atomic orbital (AO) basis.
The pcSseg-0 bases lead to large errors for the 2p and the 3d group. There, compounds of copper and fluorine are not well described. Moreover, the x2c-SVPall-s, the pcSseg-1 and the pcSeg-2 basis set show considerably large errors for copper. For the light elements (1s to 3p), the errors of our tailored triple-z bases mainly lie in between pcSseg-1 and pcSseg-2, as expected by its construction and size. For the heavier elements, errors of x2c-TZVPall-s clearly drop below pcSseg-2 and even below pcSseg-4 for the 4s and 4p group despite the missing g and h functions. Here, the additional primitive p of the 4p group or d function of the 3d group results in a significant improvement due to the additional flexibility. The x2c-SVPall-s basis set lies in between the pcSseg-0 and pcSseg-1 ones considering performance for the 1s up to the 3s group. However, the error of the 3d group is halved compared to the pcSseg-1 set. For the 4p group, it is the same as obtained with the pcSseg-2 bases. Overall, the x2c-TZVPall-s bases perform significantly better than the pcSseg-3 basis set but use less than half of the functions for the test set. The x2c-SVPall-s bases show a similar accuracy as the pcSeg-1 basis set and are of very similar size (total of 12292 functions vs. 12187).

Conclusions
We presented segmented contracted all-electron relativistic Karlsruhe basis sets for NMR shielding calculations optimized at the scalar X2C level employing the finite nucleus model for the scalar and the vector potential. The basis sets employ at most four additional functions and thus do not significantly enlarge the parent basis. The quality was assessed with a large set of 255 molecules.

Conflicts of interest
There are no conflicts to declare.  . 3 Comparison of NMR-tailored basis sets for molecules consisting of the elements H to Kr. The weighted overall error is compared to the total number of contracted basis functions in the spherical AO basis.