Nonseparable Exchange–Correlation Functional for Molecules, Including Homogeneous Catalysis Involving Transition Metals

! ! ELECTRONIC!SUPPLEMENTARY!INFORMATION!! for$a$paper$in$PCCP$entitled$ Nonseparable Exchange–Correlation Functional for Molecules, Including Homogeneous Catalysis Involving Transition Metals Haoyu S. Yu, Wenjing Zhang, Pragya Verma, Xiao He, and Donald G. Truhlar* Department of Chemistry, Chemical Theory Center, Inorganometallic Catalyst Design Center, and Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455-0431, USA.


Introduction
Kohn-Sham (KS) density functional theory has been very successful for electronic structure calculations in both physics and chemistry. 1The accuracy of KS calculations depends on the quality of the exchange-correlation functional.The quest for quantum mechanical methods that can be accurately applied to study atomic, molecular, and material properties has resulted in the design of exchange-correlation functionals with a variety of ingredients, costs, and accuracies, where the accuracy may depend strongly on the kind of property that is calculated.Exchange-correlation functionals that depend only on spin-up and spin-down electronic densities (r a and r b ) are known as local spin density approximations (LSDAs), and ones that depend on both the spin densities and spin density gradients are called gradient approximations (GAs, in particular GGAs and NGAs).More complicated functionals include ingredients calculated from the orbitals (which are functionals of the density), in particular spin-up and spin-down local kinetic energy densities (as in meta-GGAs and meta-NGAs), nonlocal Hartree-Fock exchange (as in hybrid functionals), and/or nonlocal correlation (as in doubly hybrid functionals, which have both nonlocal exchange and nonlocal correlation.).(One can also include nonlocal correlation without including nonlocal exchange.)Functionals depending only on local variables, such as spin densities, their gradients, and spin-specific local kinetic energy densities, are often called local (especially in the chemistry literature, while the physics literature often labels them as semilocal if they include density gradients or spin kinetic energy densities).
Even though the meta and nonlocal functionals can give more accurate results than GGAs and LSDAs, GAs are still of great interest for four reasons.First, GAs are widely implemented in many programs because of their ease of coding.Second, GAs often have better self-consistent field (SCF) convergence and smaller grid requirements than meta functionals.Third, calculations employing GAs are less expensive than calculations involving nonlocal functionals, with the difference being more pronounced for extended and large systems and when geometries are optimized.
The fourth reason for special interest in GAs is that local functionals often have better performance than hybrid functionals, on average, for systems with high multi-reference character.Multi-reference character is the extent to which a wave function is inherently multi-configurational so that a single Slater determinant does not provide a good starting point (reference function) for approximating the complete wave function.Although KS theory does not calculate the wave function of the interacting system, it does use a Slater determinant to represent the density, and calculating the exchange from the Slater determinant, as in Hartree-Fock exchange, can introduce static correlation error, a result of which is that it is often more challenging to obtain good approximations for multi-reference systems when Hartree-Fock exchange is included.(The unknown exact exchange-correlation energy functional includes nonlocal effects and does not have static correlation error, but the problem just mentioned is not completely solved by currently available functionals.)Multireference systems are sometimes called strongly correlated systems.Many open-shell systems and transition-metal systems have multi-reference character, and hence the ability to treat multi-reference systems is critical to the ability to treat many catalytic reaction mechanisms.Systems without high multireference character are called single-reference systems.
Most GAs have a form that separately approximates exchange and correlation, as first introduced by Langreth and Mehl 2 and usually called a generalized gradient approximation 3 (GGA); however, it has been shown that a nonseparable gradient approximation 4 (which has more flexibility at the cost of satisfying less exact constraints) is capable of performing well for a broader set of properties.The original NGA, called N12, was designed to give good predictions both of solid-state lattice constants and of cohesive energies and molecular atomization energies; it also gives good predictions of molecular bond lengths. 4Here we show that we can get improved performance for barrier heights (which are important for studies of both uncatalyzed and catalyzed reactions) by relaxing the accuracy for lattice constants, which are not needed for molecular (as opposed to solid-state) processes.By diminishing the emphasis on obtaining good lattice constants we can obtain an exchange-correlation functional that may be more useful for treating many large and complex homogeneous and enzymatic catalysts that do not require the calculations on solid-state material.
A second goal of this work is to obtain improved results for compounds containing metal atoms, including transition metal compounds with high multi-reference character, by incorporating a greater amount of representative data for metal-ligand bond energies in the training set of a density functional.A third goal of the present work is to obtain a very smooth exchange-correlation functional by enforcing an unsmoothness penalty as part of the optimization process.
Combining these three goals, we have designed a new exchange-correlation functional called gradient approximation for molecules, or GAM, and this new functional is presented here.The GAM functional is an NGA, and so it depends only on spin densities and spin density gradients.The parameters of the GAM functional are optimized against a broad set of molecular and solid-state data in a new database called Database 2015, which is also presented here.We will show that the resulting GAM functional yields good results for main group bond energies, chemical reaction barrier heights, transitionmetal bond energies, weak interaction energies between noble gas atoms, and bond lengths of diatomic molecules.
Section 2 describes the computational details.Section 3 describes Database 2015, for which complete information is given in the ESI.† Section 4 describes previously available functionals to be used for comparison.Section 5 describes the design and optimization of the GAM functional.Section 6 gives results; Section 7 provides discussion; and Section 8 summarizes the main conclusions.

Computational details
All the calculations in this paper were performed by a locally modified version 5 of the Gaussian 09 program. 6Ultrafine grids (''99,590'') are used to evaluate the exchange-correlation energies of our new GAM functional.We use the stable = opt keyword in Gaussian 09 6 to find the stationary solution to the Kohn-Sham equations by allowing symmetry breaking in the wave function if the symmetry-constrained solution is unstable.The periodic boundary condition (PBC) algorithm 7 in Gaussian 09 6 is used to calculate the lattice constants, cohesive energies, and semiconductor band gaps in our new Database 2015, which will be explained in the next section.
Besides testing the new functional on the training subset of Database 2015, we made several tests outside the training set.First we tested the new functional against subdatabases for semiconductor band gaps (SBG31) and solid-state cohesive energies (SSCE8), which are in Database 2015 but outside the training set.We also tested our functional against other data that is not in the training set.This data includes a recently published database WCCR for transition metal coordination reactions 56 (renamed here as WCCR10 for consistency with our general naming scheme), the enthalpies of binding of O 2 and N 2 to the metal-organic framework Fe 2 (dobdc), the binding of C 2 H 4 to Pd(PH 3 ) 2 , Ag 2 dimer and FeC bond dissociation energies, transition metal dimer bond distances, and the Ar 2 potential energy curve.
This journal is © the Owner Societies 2015 For the WCCR10 database we use the same basis set (def2-QZVPP) and geometries as used in the original paper; these geometries, which were optimized by functional BP86, 33,34 are provided in the ESI † of the WCCR paper. 56or calculating the binding enthalpies of O 2 and N 2 bound to Fe 2 (dobdc), we used an 88-atom cluster model of the experimental structure of Fe 2 (dobdc) containing three iron centers.The details of this cluster and rationale for its design are described in our earlier work. 8This cluster has three iron atoms, and here we studied binding at the central iron, which best represents the immediate environment around iron in the actual MOF.During optimization, the cluster of the MOF was frozen and the guest molecules (O 2 or N 2 ) were allowed to relax.The binding enthalpies were calculated using the formula given in eqn (1) of ref. 8.
The binding energy of the Pd(PH 3 ) 2 C 2 H 4 complex were computed using four basis sets.In all four basis sets, Pd atom has 18 active electrons and 28 core electrons that are replaced by an effective core potential.Basis set BS1 denotes the Stuttgart-Dresden-Dunning (SDD) basis set for Pd 9 and the cc-pVTZ basis set for P, 10 C, and H. 11 Basis set BS2 denotes the def2-TZVP basis set for Pd 12 and the cc-pVTZ basis set for P, C, and H. Basis set BS3 denotes the def2-TZVP basis set for Pd, the cc-pV(T+d)Z basis set for P, 13,14 and the cc-pVTZ basis set for C and H. Basis set BS4 denotes the def2-TZVP basis set for Pd, the maug-cc-pV(T+d)Z basis set for P, 15 the maug-cc-pVTZ basis set for C, 15,16 and the cc-pVTZ basis set for H.

Database 2015
Database 2015 is our new database for optimizing and testing density functionals.Compared to Database 2.0 that we used in previous work 26 the following changes are made: We divide the previous bond energy databases according to two types of classification: (i) whether the molecule contains only main-group nonmetal atoms or it also contains maingroup-metal atoms or transition-metal atoms; (ii) whether the molecule has singe-reference character, i.e., can be well described by a single configuration wave function, or multireference character, i.e., cannot be so described.Then we added additional data to the underpopulated classes.Accordingly we have six new subdatabases for bond energies.Theses subdatabases are as follows (their shorthand names are in parentheses, where the final number in the shorthand name of a subdatabase is the number of data): single-reference main-group-metal bond energies (SR-MGM-BE9), single-reference main-group-nonmetal bond energies (SR-MGN-BE107), single-reference transition-metal bond energies (SR-TM-BE17), multi-reference main-group-metal bond energies (MR-MGM-BE4), multi-reference main-group nonmetal bond energies (MR-MGN-BE17), multi-reference transition-metal bond energies (MR-TM-BE15).A new subdatabase called NGDWI21 has been added for noble-gas-dimer weak interactions.It comprises both homodimers and heterodimers.
The above points summarize the main changes made to our previous database, 26 called Database 2.0.A complete list of the subdatabases included in Database 2015 is given in Table 1, which also shows the number of data in each category (the inverse weight column of this table will be explained in Section 5).The database is divided into primary subdatabases, and some of the primary subdatabases are further divided into secondary subdatabases.Complete details of the new database and its layers of subdatabases, including geometries and references for the included data and also the basis sets we use for calculations on the various subdatabases, are given in the ESI.†

Functionals for comparison
We compare our results to 22 previously available exchangecorrelation functionals.Since GAM depends only on spin densities and spin density gradients, we compare our results mainly to GAs, in particular to 14 GGAs and the one previously available NGA.In a practical sense, three of the GGAs are corrected to second order in the density gradient expansion for exchange, and the other 11 are not.Altogether we compare to 20 local functionals of four types and to two hybrid functionals.The local functionals are an LSDA, namely GKSVWN5; [27][28][29] 14 GGAs, namely SOGGA, 30 PBEsol, 31 PBE, 32 BP86, 33,34 PW91, 35 BLYP, 34,36 mPWPW, 37 revPBE, 38 BPW91, 34,35 RPBE, 39 HCTH407, 40 SOGGA11, 41 OLYP, 36,42 and OreLYP; 36,42,43 an NGA, namely N12; 4 and four meta-GGAs, namely TPSS, 44 revTPSS, 45 M06-L, 46 and M11-L. 47For context we also compare to two popular hybrid functionals, namely a global-hybrid GGA, B3LYP; 42,48,49 and a range-separated hybrid GGA, HSE06. 50,51All these functionals are listed in Table 2 with the type, the percentage of Hartree-Fock exchange, the year, and the original reference.A more complete comparison of gradient approximations to more advanced functionals of the meta-GGA, meta-NGA, and hybrid type is found elsewhere 26 and will not be repeated here, where our emphasis is on gradient approximations for the four reasons stated in the introduction, so comparisons to more advanced functionals are limited here to providing context.

Design and optimization of the GAM functional
The general functional form of GAM is the same as N12, 4 which has the flexibility to approximate both exchange and correlation effects in terms of spin density r s and reduced spin density gradient x s .In order to design a good functional, we use a broad molecular and solid-state database to optimize the parameters of the functional, and we also add smoothness constraints to our optimization.We will discuss the functional form in Sections 5.1-5.3 and the optimization of the functional in Section 5.4.

Functional form
The exchange-correlation energy E xc of the GAM functional is the sum of nonseparable exchange-correlation component E NSGA nxc and an additional term that is nominally treated as a correlation energy E c .Typically one writes the first component as E x , however, we label it as E NSGA nxc to show that it is a nonseparable approximation involving both exchange and correlation.Since we optimize the functional empirically and do not enforce the factorizable form on the first term, the first term also represents part of correlation energy, and similarly the second term is not purely correlation.Both terms must also include an empirical contribution required to account for the difference of the exact electronic kinetic energy from that computed from the orbitals of the Kohn-Sham determinant.The philosophy used in designing the functional form is consistent with the statement of Tozer and Handy that ''The functionals represent exchange and correlation effects in a combined manner.Individual exchange and correlation terms cannot be isolated''. 52Our total exchange-correlation functional is where

Nonseparable exchange-correlation functional form
In eqn (2), the nonseparable energy density is written as where F x is the exchange enhancement factor, which in the present paper is defined as where r s stands for the spin density, u xs and v xs are finite variables defined by x s stands for reduced spin density gradient, for which we use the definition of Becke: 34 e UEG xs stands for the uniform electron gas energy, which is calculated by 27,28 e UEG A GGA exchange functional can be written like eqn (4) but where the enhancement factor F x depends only on the reduced a X is the percentage of nonlocal Hartree-Fock exchange.When a range is given, the first value is for small interelectronic distances, and the second value is for large interelectronic distances.Details of the functional form that joins these regions of interelectronic separation are given in the references.b GVWN5 denotes the Ga ´spa ´r approximation for exchange and the VWN5 fit to the correlation energy; this is an example of the local spin density approximation (LSDA), and it has the keyword SVWN5 in the Gaussian 09 program.Note that Kohn-Sham exchange is the same as Ga ´spa ´r exchange, but Slater exchange (not tested here) is greater by a factor of 1.5.c PW91 formally satisfies the gradient expansion for exchange to second order but only at such small values of the gradient that for practical purposes it should be grouped with functionals that do not satisfy the gradient expansion to second order.
spin density gradient x s .For an NGA we allow the enhancement factor to depend also on the spin density r s .

Additional correlation functional form
In eqn (3), the correlation functional has two parts.One is the contribution E cab from opposite spins, and the other is the contribution E css from same spins.These two contributions are defined by where b i and c i are unitless parameters to be determined, g cab and g css are unitless parameters given the same values as in N12, 4 namely, g cab = 0.006 and g css = 0.2, x avg 2 is defined as the average of x a 2 and x b 2 , and e UEG cab and e UEG css represent the correlation energy of the uniform electron gas.The uniform-gas functions are taken from the Perdew-Wang parameterization 53 and the ansatz of Stoll, which is used to separate the correlation energy into same-spin and the opposite-spin contributions. 54,55

Functional optimization
In eqn ( 5), (10), and (11) above, we see that a ij , b i , and c i are linear parameters of the functionals, which will be optimized.We do not force the uniform-electron-gas limit to hold when we optimize the functional.In order to make our functional smooth, smoothness constraints are added to the optimization, which will be explained in detail in the last paragraph of this section.The values of m, m 0 , n, and n 0 are chosen as 3, 3, 4, and 4 respectively.We found that the performance of the functional is not significantly improved by increasing these values, which shows that one cannot obtain improved functionals simply by adding more parameters.Therefore, in order to design good density functionals we must pay more attention to the mathematical form of the functional and the diversity of the database we are optimizing against, instead of concentrating on the number of parameters.
We optimize our functional against 27 primary databases, including 24 molecular energy databases, two molecular structure databases, and one solid-state structure database.We optimize the GAM functional self-consistently by minimizing the following unfitness function: where R n is the root mean squared error of database n, I n is the inverse weight of database n, and the product of l and (a + b + c) is the smoothness constraint, which is explained by The purpose of this constraint is to ensure that the density functional is a reasonably smooth function of the spin densities and their gradients.We varied the value of l from 0.001 to 0.1, where the range is selected such that l(a + b + c) has about the same magnitude as R n =I n .We made fits with various values of l, and we monitored the smoothness of the resulting exchange-correlation functionals by plotting them, by examining the magnitudes of the linear coefficients of the exchangecorrelation functional (they should not be too large in magnitude or having severely oscillating signs), and by checking whether there is any difficulty in achieving self-consistent-field convergence on difficult cases (we had made a list of cases where previous functionals sometimes showed SCF convergence difficulties).After balancing the performance of the functional and the smoothness of the enhancement factor (as judged by the three criteria just mentioned), we finally chose l to be 0.001, which gives what we judged to be the best combination of overall accuracy, convergence, and smoothness of the exchangecorrelation functional.
In order to design a functional with good across-the-board performance, we include various molecular and solid-state properties in our training set; these properties include maingroup bond energies, transition metal bond energies, transition metal atomic excitation energies, barrier heights, ionization potentials, proton affinities, electron affinities, lattice constants, etc.In Table 1, the inverse weight of each primary database is given.The smaller the inverse weight is, the more emphasis we put on that primary database.The inverse weights were chosen as follows: first we calculated the mean unsigned errors (MUEs) of 80 exchange-correlation functionals (previously published functionals developed in many different groups) for all the molecular subdatabases in Database 2015; this shows how well previous exchange-correlation functionals typically perform for each kind of data.The average of these MUEs for a given subdatabase were used as our initial inverse weights.Then we modified the inverse weights iteratively to improve performance on the various subdatabases where we wished to reduce the error.Our goal in this process was to obtain good across the board performance for as many subdatabases as possible, not to simply reduce the overall mean unsigned error.
This journal is © the Owner Societies 2015 Whereas the N12 4 functional involved 20 optimized linear coefficients and the constraint that it reduced to PBEsol at low density, the new GAM functional involves optimizing 26 linear coefficients in eqn ( 5), (10), and (11) with no constraints.We use the same values as N12 for the nonlinear parameters o xs , g xs , g cab , and g css .A key element in the optimization is the choice of weights.We do not choose them to minimize the overall error but rather to try to get small errors across the board, i.e., relatively small errors for each of the subdatabases, to the greatest extent possible.The final choice of weights was determined after considerable trial and error and is a subjective decision that cannot be justified by any numerical argument.
Table 3 lists the values for the optimized and inherited parameters of the GAM functional.

Tables of results
In Tables 4 and 5 we compare the performance of the new functional to that of 22 existing functionals for molecular energetic data, and in Table 6 we do the same for molecular bond distances.
Table 7 gives the performance for solid-state databases, but since B3LYP calculations with periodic boundary conditions are very expensive, we only compare 21 density functionals for the solid-state lattice constant and energetic data of Table 7. Table 8 compares the performance of GAM to that of eight density functionals for the WCCR10 database of Weymuth et al. 56 Table 9 is a test for the binding of dioxygen and dinitrogen to Fe 2 (dobdc), which is also called Fe-MOF-74, where we compare to experiments of Bloch et al. 57 Table 10 presents results for the binding of ethylene to Pd(PH 3 ) 2 , where we compare the results of GAM to the best estimate computed using BCCD(T) 58 in our earlier work. 67Table 11 presents results for the bond distance of homonuclear transition metal dimers, where we compare the results of GAM and N12 with 5 functionals in a recent paper. 80

Molecular energy database
Tables 4 and 5 show the mean unsigned error (MUE) for molecular energy database ME417 and its subdatabases.Note that we always compute MUEs without weighting the data; it is a straight average over the absolute deviations from the reference data of the database.In order to compare the properties that are especially important for molecular catalysis, we also combine some of existing subdatabases to form three larger subdatabases, in particular main-group bond energy (MGBE137), which includes the SR-MGM-BE9, SR-MGN-BE107, MR-MGM-BE4, and MR-MGN-BE17 subdatabases, transition metal bond energy (TMBE32), which includes the SR-TM-BE17, MR-TM-BE13, and MR-TMD-BE2 subdatabases; and barrier heights (BH76), which includes the HTBH38/08 and NHTBH38/08 subdatabases.

Molecular structure database
Table 6 shows the mean unsigned error for DGL6 and DGH4 subdatabases.The last column shows the mean unsigned error for MS10, which is the overall mean unsigned error of these two subdatabases.

Solid-state databases
Table 7 shows the mean unsigned errors for solid-state databases, which include LC17, SBG31, and SSCE8.Lattice constants are related to nearest neighbor distances (NNDs), but the ratio of the lattice constant to the nearest-neighbor distance depends on the crystal structure.For our larger lattice constant database, SSS47, 26 we calculated an average value for this ratio of 2.15, and we use this as a typical conversion factor for discussion purposes.Therefore we also report the results for LC17 by dividing by 2.15 so that the reader can more easily compare the errors to the errors in molecular bond distances.These results are labeled NND and are discussed as mean unsigned errors in nearest neighbor distances, but the reader should keep in mind that a slightly different result would be obtained if we first converted to NND and then averaged.The rationale of having both columns in Table 7 is that the LC17 column can be directly compared to physics literature papers that report average errors in lattice constants, while the NND column allows a more physical comparison to average errors in molecular bond lengths.

WCCR10 database
We show mean unsigned errors for the WCCR10 database of transition metal coordination reactions in Table 8.The mean unsigned error of GAM against WCCR10 is 6.60 kcal mol À1 .This is larger than the mean unsigned error of GAM against TMBE32 subdatabase, i.e. 6.03 kcal mol À1 , but still reasonable since it is the second best among the functionals tested in Table 8.

Separation of O 2 and N 2 on Fe 2 (dobdc)
The performance of the newly designed functional, GAM was also tested for its ability to predict the separation of a mixture The TMBE32 database consists of SR-TM-BE17, MR-TM-BE13, and MR-TMD-BE2. c The BH76 database consists of HTBH38/08 and NHTBH38/08.d The ME417 database consists all the 24 subdatabases above and the ME400xAE consists all the subdatabases except AE17.
This journal is © the Owner Societies 2015 The functionals are listed in the same order as in Tables 4 and 5. a The functionals are listed in the same order as in Tables 4 and 5.
b The values in this column are obtained by dividing the previous column by 2.15 (a standard factor determined in previous work -see text) so that the results may be compared more physically to errors in molecular bond lengths. of O 2 and N 2 on a metal-organic framework (MOF), in particular Fe 2 (dobdc).The binding enthalpies for O 2 and N 2 bound to Fe 2 (dobdc) were computed using the GAM exchange-correlation functional with the def2-TZVP basis set, and then compared to experimental values 57 of isosteric heat of adsorption.The values for the Fe-O 2 and Fe-N 2 interacting systems in their ground spin states computed at 201 K were 10.8 and 3.9 kcal mol À1 , respectively, while the corresponding experimental values reported in ref. 57 are 9.8 kcal mol À1 and 8.4 kcal mol À1 , respectively.Later, a more accurate experimental adsorption enthalpy for N 2 bound to Fe 2 (dobdc) under different experimental conditions and with different procedures was reported to be 5.5 kcal mol À1 . 59able 9 also presents the binding enthalpies of higher energy spin states of the Fe-O 2 interacting system, which are predictions for which no experimental data is available (these calculations were necessary to be sure that the reported binding energy corresponds to lowest-energy spin state of the system).

Binding energy of Pd
The binding of C 2 H 4 to Pd(PH 3 ) 2 was computed using GAM and compared the best estimate of binding energy performed in our earlier work. 67The binding energies calculated using various basis sets are reported in Table 10.

Bond length of homonuclear transition metal dimer
1][22][23] Table 11 shows the bond length and mean unsigned error calculated by each functional.
7 Other results and discussion

Convergence
In order to design a smooth functional, we add a smoothness constraint to the parameter optimization of the GAM functional.The linear coefficients optimized for GAM all have magnitudes in the range 0.1-23, which is a reasonably narrow range so there is no excessive cancellation between terms.The convergence of the new functional has been tested against our common Database 2015.In the common Database 2015, there are 483 data including ME417, MS10, SBG31, SSCE8, and LC17.If we also count the fragments that are used to calculate these data, there are more than one thousand data calculated by the GAM functional.Among the over-a-thousand data, only FeCl shows some SCF convergence issues; all other calculations converged without any problems.The smoothness of the exchange-correlation enhancement factor of the GAM functional has also been examined in plots, and it will be discussed in Section 7.3.

Performance of the GAM functional
Tables 4 and 5 show that the GAM functional gives especially good results for main group bond energies, transition metal bond energies, reaction barrier heights, molecular structures, and noble gas weak interactions.Furthermore, the GAM functional provides reasonably good results for the test sets including semiconductor band gaps, solid-state cohesive energies, and transition metal coordination reactions.Table 4 shows that among LSDA, all the GGAs, and the previous NGA, the new functional GAM gives the smallest overall mean unsigned error for the entire molecular energy database ME417; the mean unsigned error is only 4.51 kcal mol À1 .We also show the overall error of ME400xAE, which is the average error for the molecular energy database when we exclude absolute atomic energies, and in this case too, the GAM functional gives the smallest error among GGAs, LSDA, and NGA.We emphasize that we could reduce these total errors more, if that were our goal, but that is not our goal.Our goal is rather to obtain good performance across a broad range of databases.In order to have a functional that is especially good for studying molecular catalysis, the functional should be good for main-group bond energy (MGBE137), which includes the SR-MGM-BE9, SR-MGN-BE107, MR-MGM-BE4, and MR-MGN-BE17 subdatabases, for transition metal bond energy (TMBE32), which includes the SR-TM-BE17, MR-TM-BE13, and MR-TMD-BE2 subdatabases, for barrier heights (BH76), which includes the HTBH38/08 and NHTBH38/08 subdatabases, and for molecular structure (MS10), which includes DGL6 and DGH4 subdatabases.In Tables 4 and 5 we calculate the average error for each of these four categories by averaging the errors from each subdatabase.Among LSDA and all GGAs and NGAs, the GAM functional ranked the best for the MGBE137, TMBE32, and BH76 subdatabases.If we consider all the functionals in Tables 4 and 5, the GAM functional ranks the second best for TMBE32 subdatabase, for which M06-L is the best with an error 0.48 kcal mol À1 smaller than the GAM; the GAM functional ranks the second best for the a The various basis sets used are: BS1 = SDD (Pd), cc-pVTZ (P, C, H); BS2 = def2-TZVP (Pd), cc-pVTZ (P, C, H); BS3 = def2-TZVP (Pd), cc-pV(T+d)Z (P), cc-pVTZ (C, H); BS4 = def2-TZVP (Pd), maug-cc-pV(T+d)Z (P), maug-cc-pVTZ (C), cc-pVTZ (H).b The best estimate was calculated in an earlier work using BCCD(T) and is described in ref. 67.MGBE137 subdatabase, for which M06-L is the best with an error 0.28 kcal mol À1 smaller than the GAM; and the GAM functional ranks the fifth best for BH76 subdatabase, for which M11-L is the best followed by M06-L, B3LYP, and HSE06.We note that M06-L is a meta functional, and therefore it should be better than a simpler gradient approximation, but we gave several reasons for optimizing a gradient approximation in the introduction.
In addition to the databases mentioned above, the GAM functional also provides good results for 3d transition metal atomic excitation energies, which are very hard for most available density functionals, but which we have recently shown 60 can be very important for understanding metal-metal bonding.The GAM functional ranks the fifth best for the 3dAEE7 subdatabase, behind M06-L, B3LYP, PBE, and RPBE.
Next we consider noble-gas weak interactions.From Tables 4  and 5 we can see that all the functionals tested except GAM give a mean unsigned error larger than 0.081 kcal mol À1 for the NGDWI21 subdatabase, for which GAM only gives 0.019 kcal mol À1 .The average value of all the noble gas weak interaction energies in our database is 0.160 kcal mol À1 , which means that most functionals give an average error that is larger than 50% of the average of the reference values.The GAM functional gives the best results for NCCE30 subdatabase as compared to all tested GGAs and N12.
The GAM functional also provides the second best results for MR-TMD-BE2 (Cr 2 and V 2 , which are known to be very hard cases for density functional theory) among all functionals tested.
Table 6 shows that the relative performance of GAM for molecular structures is not quite as good as for energies.The GAM functional ranks the 13th for MS10 subdatabase with an MUE of 0.018 Å, which is 0.002 Å larger than the average MUE of all functionals tested in Table 6.However, a more fair comparison in this case is to compare to the 12 GGAs excluding PBEsol and SOGGA (we exclude PBEsol and SOGGA at this point since their design is understood to make them better for structures than for energies, and so we do not consider them to be general-purpose functionals).As compared to the remaining group of 12 GGAs, only HCTH407 does better for DGL6 and only four of the 12 do better for MS10.The less than stellar performance of GAM on MS10 is primarily due to a large overestimation of the bond length of Ag dimer; this bond length behaves differently than other bond lengths in MS10, and success for this bond length is highly correlated to performance on lattice constants, which we downplayed.This downplay is evidenced in Table 7, which shows that the GAM functional does not give good results for the solid-state lattice constant database with a mean unsigned error of 0.046 Å for the quantities we nominally call nearest neighbor distances (NND -see Section 6.3); this error is 0.010 Å larger than the average mean unsigned error for NND.As discussed in the introduction, this results from a strategic decision to emphasize molecular energies over lattice constants in the creation of GAM.The ''M'' (for ''molecules'') at the end of GAM is primarily to indicate our awareness that we still do not have a universally good functional, which is so far unattainable by any functional containing only density and density gradient ingredients.Nevertheless, despite not being universal, the performance of the new functional developed here is very good if we consider molecules rather than solid-state lattice constants.
Next we turn to data not used for training.Table 7 shows that the GAM functional also shows reasonably good results for the solid-state energies databases.Among the 17 LSDA, GGAs, and NGAs, the GAM functional ranks the sixth best for the SBG31 database and fifth best for the SSCE8 database.These databases were not used for training.
In Table 8, the GAM functional ranks the second best among all functionals being tested, where the functionals tested are those chosen by the previous 56 authors.The WCCR10 database includes ten transition metal coordination reactions.The molecules involved in these reactions are very large and very different from the training sets in Database 2015.The performance against these large molecules is slightly worse than that for the transition metal molecules in our training set, but within a reasonable range.
Table 9 presents the results for the performance of GAM on MOFs.We find that GAM gives good results when compared to experiments for the separation of O 2 and N 2 on Fe 2 (dobdc), with a magnitude of the deviation from experimental adsorption enthalpies of 1.0 kcal mol À1 for O 2 and 1.6 kcal mol À1 for N 2 .It should be noted here that our training set has no data on MOFs  or any other type of nanoporous materials.This average deviation from the latest experimental values is under 3 kcal mol À1 and is within experimental error.This indicates that the GAM functional shows good agreement with experimental data that are not used for training.
In Table 10, results for the binding of C 2 H 4 to Pd(PH 3 ) 2 are presented.This datum is outside the training set.This is a difficult case for functionals; for example, BLYP gives a binding energy of 10.2 kcal mol À1 as compared to the best estimate of 17.6 kcal mol À1 .Table 10 shows good stability with respect to This journal is © the Owner Societies 2015 changes in the basis set and that the GAM functional deviates from the best estimate by 6.5 kcal mol À1 with the largest basis set used.This is comparable to the 6.3 kcal mol À1 mean unsigned error for single-reference transition metal bond energies of molecules in the training set, and therefore it is an example where we obtain comparable performance inside and outside of the training set.
A very recent paper, which we considered only after our training set weights were final, reported bond distance for eight transition metal dimers, only one of which (Ag 2 ) is in our training set.We therefore use the bond distances of the seven others as a test against data quite different from that used for training.These seven dimers, Cu 2 , Au 2 , Ni 2 , Pd 2 , Pt 2 , Ir 2 , and Os 2 , include two 3d metals, one 4d metals, and four 5d metals.(No 5d data were used for training.)The GAM functional gives the third best results among all the functionals tested in Table 11, with an MUE of 0.05 Å; the only functionals that do better are LSDA, which is much better for bond lengths than for molecular energies, and N12, the only previous NGA.This is very encouraging performance well outside the training set.
We also tested our new functional against the experimental bond dissociation energies of Ag 2 and FeC, which are 38.0 kcal mol À1 and 88.32 kcal mol À1 respectively. 61,62The GAM functional predicts these bond dissociation energies to be 39.21 kcal mol À1 and 86.49kcal mol À1 .Li et al. 62 have tested the bond dissociation energy of FeC with various functionals, and in Table 12 we add our new result to their comparison.The results in Table 12 show that the GAM functional is the second best among all 18 functionals being tested, and that many of the previous functionals have large errors for this difficult case.
Recent studies pointed out that some density functionals give unstable results for large basis sets. 63Fig. 1 shows the potential energy curve of Ar 2 with our new GAM functional and the aug-cc-pVQZ and aug-cc-pV6Z basis sets.Fig. 1 shows that our results are very close to the reference values 64 and there is no slow convergence issue with respect to the basis sets.Moreover, the excellent agreement with the reference plot shows that the GAM functional provides good results for noble gas weak interactions.This is consistent with Tables 4 and 5 showing that the GAM functional is the best for the NGDWI21 subdatabase among all the functionals tested in the present paper.

Exchange-correlation enhancement factor of the GAM functional
The enhancement factor is defined by the following equations: Ð dr e UEG x (r)F xc (r s , s) s ¼ jrrj where E xc is the total exchange-correlation energy, e UEG x is the exchange energy density of a uniform electron gas, r stands for the total density, s is the unitless reduced density gradient, and r s is the Wigner-Seitz radius.For illustrating the enhancement factor in this section, we only consider the spin-unpolarized cases, which means that r = 2r a = 2r b .The exchange-correlation enhancement factor of the GAM functional is plotted in Fig. 3.We choose four values of r to plot the enhancement factor F xc , and these values correspond to r s = 0.5, 2.5, 4.5, and 6.5 in atomic units.The region of s is chosen from 0 to 3, which is the key range for real systems.The enhancement factor for all 14 GGAs, for the previous NGA, namely N12, and for GAM are shown in Fig. 2 and 3.
A key design element of the NGA functional form is that, unlike GGAs, we do not attempt to separately fit exchange and correlation.Therefore, unlike a GGA, we do not have a pureexchange enhancement factor that depends only on s.However, Fig. 2 and 3 show that after we add correlation to exchange, the extent of dependence on r for closed-shell systems is not qualitatively different in GAM and in the GGAs.

g
xs and o xs are unitless parameters taken to have the same values as the ones in N12, 4 namely g xs = 0.004 and o xs = 2.5, and the a ij are unitless parameters to be determined.Since both r s and x s range over [0,N), the dependent variables u xs and v xs range over [0,1].

Fig. 1
Fig.1Ar-Ar potential curve, the bonding energies are calculated with GAM/aug-cc-pVQZ and GAM/aug-cc-pV6Z level of theory.The reference is from the Tang-Toennies model.

Table 1
Databases in included in Database 2015 a,b a Databases 1-27 were used with various inverse weights in training, and databases 1-29 constitute Database 2015.Database 30 is from T.Weymuth et al. (ref.56), and -like databases 28 and 29 -it was used only for testing.bIn the name of a database or subdatabase, the number at the end of the name or before the solidus is the number of data.For example, ME417, SR-MGM-BE9, IsoL6/11, and DGH4 contain respectively 417, 9, 6, and 4 data.cInverse weights with units of kcal mol À1 per bond for databases 1-7, kcal mol À1 for databases 8-24, and Å for databases 25-27.dTM denotes transition metal.eNA denotes not applicable.This journal is © the Owner Societies 2015

Table 2
Exchange-correlation functionals tested in this paper

Table 5
MUE (kcal mol À1 ) for the molecular energy database and its subdatabases: GAM compared to meta and hybrid functionals a The MGBE137, TMBE32, BH76, ME417, and ME400xAE notations are explained in footnotes to

Table 4 .
Table 6 MUE (kcal mol À1 ) for the molecular structure database and its subdatabases a The MS10 database consists of DGL6 and DGH4 subdatabases.

Table 7
Mean unsigned errors for lattice constants and nearest neighbor distances in Å, band gaps in eV, and cohesive energies in eV per atom

Table 8
Mean unsigned errors for the WCCR10 database in kcal mol À1 a a The GAM results are from the present calculations, but all other results in this table are from ref.56.

Table 9
Binding enthalpies (kcal mol À1 ) of O 2 and N 2 bound to the 88-atom cluster model of Fe 2 (dobdc) calculated using GAM a

Table 10
Binding energies (kcal mol À1 ) of C 2 H 4 bound to Pd(PH 3 ) 2 calculated using GAM and various basis sets

Table 11
Homonuclear transition metal dimers: equilibrium bond lengths (Å) and mean unsigned errors as compared to experiment The experimental bond length is taken from ref.80. a

Table 12
Errors for bond dissociation energy (kcal mol À1 ) of FeC