Performance of GFN1-xTB for periodic optimization of metal organic frameworks.

Tight-binding approaches bridge the gap between force field methods and Density Functional Theory (DFT). Density Functional Tight Binding (DFTB) has been employed for a wide range of systems including proteins, clays and 2D and 3D materials. DFTB is 2-3 orders of magnitude faster than DFT, allowing calculations containing up to ca. 5000 atoms. The efficiency of DFTB comes via pre-computed integrals, which are parameterized for each pair of atoms, and the requirement for this parameterization has previously prevented widespread use of DFTB for Metal-Organic Frameworks. The GFN-xTB (Geometries, Frequencies, and Non-covalent interactions Tight Binding) method provides parameters for elements up to Z ≤ 86. We have therefore employed GFN-xTB to periodic optimizations of the Computation Ready Experimental (CoRE) database of MOF structures. We find that 75% of all cell parameters remain within 5% of the reference (experimental) value and that bonds containing metal atoms are typically well conserved with a mean average deviation of 0.187 Å. Therefore GFN-xTB provides the ability to calculate MOF structures more accurately than force fields, and ca. 2 orders of magnitude faster than DFT. We therefore propose that GFN-xTB is a suitable method for screening of hypothetical MOFs (Z ≤ 86), with the advantage of accurate binding energies for adsorption applications.


Introduction
Interest in Metal-Organic Frameworks (MOFs) has been increasing since they were first synthesized in the 1990s.MOFs are a group of nanoporous materials made by combinations of building blocks -metal nodes and organic linkers -assembled into a specific network. 1 The modular nature of MOFs creates an effectively infinite number of possible structures, of which several tens of thousands have been synthesized.MOFs have been synthesized from all corners of the periodic table, giving rise to a wide chemical and physical diversity of MOF structures and properties.The Computation-Ready Experimental (CoRE) 2,3 database has compiled more than 12000 crystal structures of MOFs and the Cambridge MOF subset 4 currently stands at 88,000 structures, of which ca. 8,000 are porous.While many interesting MOFs and MOF applications have been discovered serendipitously, it is normally desired to design a MOF that possesses some given property.][7][8][9][10] One standard procedure is a combinatorial enumeration followed by molecular simulation on every hypothetical structure. 11,12Inevitably, the important aspect determining the success of this approach is in the accuracy and efficiency of the molecular simulation methods.
Calculation methods that are used in the molecular simulation can be grouped into quantum mechanics or classical simulation methods.The level of theory / computational method required depends on the property of interest.For properties that require electronic structure and information about the exact binding sites, quantum mechanical methods are used.However, due to the high computational cost of quantum mechanical methods, the size of the system to model needs to be carefully chosen.Quantum mechanics based on wave function theory can only be applied to small cluster models of MOFs capturing the region of interest and limited to a few tens of atoms. 13Consequently, small cluster models are unable to describe long range dispersion interactions, In addition, cutting the MOF to create a feasible cluster model requires capping the cut points with e.g.hydrogen atoms or methyl groups, which introduces electronic effects not present in the parent periodic structure.5][16][17][18][19] While possible on handfuls of chosen MOFs, the expense of these quantum mechanical approaches means they cannot support screening large subsets of MOFs.
Classical mechanics such as Force Fields (FFs) could be a shortcut.Force fields are employed for properties that don't need information about electronic structure and can be defined by the relevant conditions of temperature or pressure.In these cases, force fields offer a welcome reduction in cost.Existing force fields which have been routinely applied in MOF simulations are either the general force fields -such as UFF 20 , TraPPE 21 and DREIDING 22 -or specific force fields designed for a particular group of MOFs 23,24 , such as MOF-FF 25 and QuickFF [26][27][28][29] .Specific force fields are typically more accurate than general force fields, but are unsuitable for screening studies outside their parameterization.The general force field approaches are good at handling the chemical diversity of MOFs for reproducing experimental results such as crystal structures, bulk moduli, sublimation energies and fluid properties 30 .However, they are not ideal for explaining structural transformation over external stimuli and for explaining donor-acceptor interactions in open metal site materials. 31Another problem is general force fields fail to model adsorption isotherms at the lower pressure region. 32nce the diverse inorganic fragments that comprise MOFs cannot be replicated by general force fields, a common approach is to reparametrize these force fields.The initial development of force fields for MOFs were force fields designed to accurately describe individual popular MOFs, such as extended MM3 for MOF-5, 26 Cu paddlewheel−based MOFs 27 and for the Flexible Metal-Organic Framework MIL-53(Al). 23Subsequently, force fields have been developed to accommodate a group of MOFs, BTW-FF 23 for 6 popular MOFs associated with Cu, Zn, and Zr metal nodes; Becker at al 29 optimized a polarizable force field for CO2 and CH4 Adsorption in M-MOF-74 (M = Co, Cr, Cu, Fe, Mg, Mn, Ni, Ti, V, and Zn).QuickFF 28 is a program that automates the derivation of force fields for MOFs.UFF4MOF 33,34 took a different approach, aiming for universality rather than transferability and reached over 99% coverage of the 2014 CoRE database. 3other approach frequently used is combining potentials from multiple generic force fields.Hamon et al. modelled adsorption of H2S into MIL MOFs by combining UFF for the inorganic part of the framework and the DREIDING inter-atomic potential for the organic part. 35The model matches the experimental adsorption isotherms quite well in the whole pressure range, confirming structural transitions occur in some pressure regions.Vandenbrande 24 et al. reviewed five different force fields, three combined generic force-field (UFF/TraPPE, Drieding-UFF/TraPPE, MM3-MBIS) and two ab initio derived force fields (SAPTFF and MEDFF), for methane adsorption in Zr-based MOFs (UiO-66, UiO-67, DUT-52, NU-1000, and MOF808), 24 found that UFF/TraPPE gives an acceptable agreement with the experiment in the UiO-66 framework for pressure between 30 and 80 bar.Meanwhile, other combined generic force fields do not accurately reproduce single molecule adsorption energies.The two ab-initio derived force fields gave a remarkable accuracy of the individual adsorption energies.
Due to the motivation of reproducing experimental data and the requirement of having enough experimental data to parameterize and test, most force field development to date has mostly focused on fitting to a limited number of popular MOFs.The lack of transferability and the uncertain level of accuracy is the main drawback of force fields.This especially hampers screening efforts, particularly of hypothetical framework materials, and especially for MOFs using less frequently employed metals such as the lanthanides and actinides.
An intermediate method, Density Functional Tight Binding (DFTB) comes as a compromise between accuracy and efficiency.DFTB is a semi empirical quantum mechanical method, and approximates DFT by using pre-calculated parameters, a minimal basis, and including only nearest-neighbour interactions.DFTB has been extensively used on large biomolecules with accuracy comparable to DFT but is at least 2 orders of magnitude faster, not been used for MOFs because of the difficulty in parameterizationneeding to develop parameters for each pair of elements leads to ~100 2 required parameters. 36,37onsequently, while DFTB has been used extensively for Covalent Organic Frameworks, [38][39][40][41] parameters have not been available for most MOFs.
Addressing this parametrization problem, Grimme et al. developed a semiempirical TB method, GFN-xTB 42 , designed to produce reasonable Geometries, Frequencies, and Non-covalent interactions for diverse chemical system consisting of elements from the periodic table, Z≤86.GFN-xTB is targeted to facilitate systems more than 1000 atoms; it has been successfully applied to proteins with 3000 atoms.The open-source release of GFN-xTB 43 only recently added the capability of undertaking periodic optimizations, and previous benchmarking of GFN-xTB for MOFs has employed capped cut-out structures. 44The implementation of GFN-xTB in the AMS package 45 and DFTB+ 46 does permit periodic optimizations.In this paper, we evaluated the performance of GFN-xTB as implemented in AMS to optimized geometry structures of MOF from the 2014 and 2019 CoRE Databases.

Computational Details
Geometry optimization was performed for each structure in the CoRE 2014 3 and 2019 2 database (version 1.1.0).The computation is carried out using GFN-xTB in Amsterdam Modelling Suite (AMS) by Software for Chemistry and Materials (SCM). 45Two separate geometry optimizations were undertaken on each structure, the first optimization only allows the atomic position to move, while the second optimization relaxed both the lattice and atomic positions.
Most of the 9895 CoRE structures are amenable to geometry optimization in this way, there are some exceptions to the simple protocol above, as follows: 1. Z ≤ 86.GFN The MOF structure may be fundamentally not stable upon solvent removal -in this case, structural collapse in the calculation mirrors the structural collapse in experiment; In addition, structural collapse may be observed as an artefact of the structure curation process. 2,3In typical MOF syntheses, metal salt and ligand solutions are mixed together in polar, high boiling point solvents.Experimentally reported metal organic frameworks may contain solvent molecules in their pores, missing charge balancing ions (CBIs) or missing hydrogen atoms and overlapping atoms.before being imported into the CoRE database, curation is performed, however the solvent is removed with an imperfect method.As a consequence, some of the solvent still remains in the pore and/or an essential part of the MOF structure itself is accidently removed.This has been observed by other authors employing the CoRE database. 47These structures are excluded from the analysis.(478 structures) 5. Structures that partially converged.Some structures converged (change in energy < 1e-02 Hartree), but still had residual gradients larger than the default criterion of 1e-05 Hartree.These structures were manually examined to confirm convergence and extract the structure where the energy had converged.(173 structures) Refcodes for all structures in each case above are provided in the supplementary information.
The textural properties, such as gravimetric surface area and volumetric surface area, are calculated before and after optimization.They are calculated using zeo++ 48 , using the high accuracy (ha) settings and using probe molecules with diameter 1.86Å representing N2 and with a trial number of 2000.

Lattice Parameters
From the successful optimizations, 74.51% of all cell parameters are within 5% of the reference value, 87.40% are within 10% and 95.13% are within 20% as shown in Figure 1.As the lattice optimizations allowed complete relaxation of cell vectors and angles, one source of large errors in lattice parameters is breathing MOFs.Breathing in MOFs is a behaviour of reversible flexible framework occurs upon guest adsorption, temperature, pressure or other stimuli, which allows a large change in the unit cell. 48For example, Refcode FERWAC_clean 49 begins with a=9.97 Å, b=15.74Å, c=15.74Å, but optimized to a narrow pore structure with a=7.80Å b=15.31Å,c=15.31Å.Similar effects are observed in e.g.refcode GUSNEN01_SL 50 , QOVWOO_SL 51 , QUQGAL_clean 52 , QOSJIT_clean 53 , FOFCOU_clean 54 , QOSJOZ_clean 53 , QOVWOO01_clean 55 , CEFDAU_clean 56 and SABWAU_SL 57 .Breathing MOFs are responsible in some structures with both +20 to +30 and -20 to -40 deviations in lattice parameters (e.g. the a parameter of FERWAC_clean 49 changes from 9.97Å to 7.80Å after optimization).From the successful optimizations, 74.51% of all cell parameters are within 5% of the reference value, 87.40% are within 10% and 95.13% are within 20%.As the lattice optimizations allowed complete relaxation of cell vectors and angles, one source of large errors in lattice parameters is breathing MOFs.Breathing in MOFs is a behaviour of reversible flexible framework occurs upon guest adsorption, temperature, pressure or other stimuli, which allows a large change in the unit cell. 49For example, Refcode FERWAC_clean 50 begins with a=9.97 Å, b=15.74Å, c=15.74Å, but optimized to a narrow pore structure with a=7.80Å b=15.31Å,c=15.31Å.Similar effects are observed in e.g.refcode GUSNEN01_SL 51 , QOVWOO_SL 52 , QUQGAL_clean 53 , QOSJIT_clean 54 , FOFCOU_clean 55 , QOSJOZ_clean 54 , QOVWOO01_clean 56 , CEFDAU_clean 57 and SABWAU_SL 58 .Breathing MOFs are responsible in some structures with both +20 to +30 and -20 to -40 deviations in lattice parameters (e.g. the a parameter of FERWAC_clean 50 changes from 9.97Å to 7.80Å after optimization).
Another source of large negative errors is structures which were starting to collapse, when the atoms move closer to other atoms they might bind to other atoms and form a different connectivity.While breathing MOFs is indicated by a narrowing in one side of cell parameter, collapsing structures shows a large negative deviation in all cell parameters.As an example, refcode ACUBAB_clean 59 which begins with a=8.88Å, b=8.88 Å, c=23.78Å and optimized to a=7.43Å, b=7.68Å, c=15.50Å.This happens mostly to structures with linkers that are relatively small compared to the size of metal node (example refcode ACUBAB_clean and C6DT02320G_c6dt02320g2_clean).
The large positive deviations arise from structures with partial occupancy cif files.Automatic conversion of cif files (which permit fractional occupancy) to input files which do not permit fractional occupancy, sometimes produces erroneous output, where parts of structures are duplicated.While a lot of partial occupancy cif files could be detected and have been removed from the evaluation, some still remained.The identification of partial occupancy files is done by calculating the atom-atom distances with threshold closer than 1 Å with neither atom being a hydrogen atom.However, for some structures, multiple possible atom positions with partial occupancy are located at a distance greater than 1Å.One example is refcode WUXYUL_clean 60 .

Atomic position
RMSD of atomic position is calculated to assess the quality of optimized structures.To calculate the RMSD atomic position of lattice optimized structures, the unit cell is scaled back to the original cell parameters.The average RMSD for all atoms are 0.489Å when atom positions are optimized with the lattice fixed at its experimental value and 0.617Å when the lattice is simultaneously optimized as shown in Table 1.Oxygens attached to metal atoms have slightly higher RMSD in both cases, but the positions of the metal atoms themselves are slightly better conserved with RMSDs of 0.439Å and 0.560Å for fixed and lattice-optimized structures respectively.Atoms in the organic linker have a higher deviation than the atoms in metal node since the organic linkers have more flexibility and degrees of rotational freedom around the inorganic building unit.

Atom bonding
As simple consideration of atom positions fails to account for linker flexibility and rotation, a more robust consideration of error in the calculated structures employs bond lengths, the distribution and mean deviations w.r.t. the reference CoRE structures are shown in Figure 3 and Table 2.The analysis is separated into three groups; the green line in figure 3 features metal containing bonds with average bond length deviation 0.120Å with the lattice fixed and 0.175Å with lattice optimization.The orange line represents bonds without a metal atom and have average deviation 0.093Å and 0.150Å respectively.Finally, all bonds altogether is shown by the blue line and exhibits an average deviation of 0.187Å and 0.236Å for fixed and lattice-optimized structures respectively.2. Mean RMSD of GFN-xTB optimized bond lengths with respect to the CoRE structures.
The sharp peaks centred around 0.063Å in Figure 3, show that typical organic bonds are very well replicated.Metal containing bonds are also well replicated, though to a lesser extent, with 76.63% of M-M bonds being within 0.250Å of the reference crystal structure value.The cases where optimized M-M bonds are outside this threshold involve a change of atomic connectivity during the optimization, (b) (a) typically caused by the removal of structural ligands.For example, the structure AGARUW 61 is a La 3+ containing MOF, where 1D chains of La 3+ ions are deca-coordinated by a mixture of coordinating and chelating carbonato ligands and water.In the cleaned structure in the CoRE database (refcode AGARUW_clean 61 ), the water ligands are removed, resulting in a significant rearrangement of the coordination environment of each La 3+ ion and a consequent reduction in one of the La-La distances from 4.923Å to 2.874Å.Most force fields developed specially for MOFs have placed special attention on ensuring the correct geometries of common sets of inorganic building units.We checked the metal clusters used in popular MOFs such as M2O paddlewheels found in HKUST-1, trimeric oxo-centered M3O in MIL-100 series, M4O octahedron in MOF-5, and finally the 12-connected M6O cuboctahedron in UiO-66.Very good results are shown for all metal nodes, the deviation of optimized structure is presented in Figure 4.The two key distances in building blocks that affect the lattice parameters of the optimized structure are the M-O and MM bond lengths, which in a layer-pillar MOF, affect the a, b and the c lattice dimensions respectively.Cu-based paddlewheel is highly conserved, with Cu-Cu bond length deviating only -0.030Å from the experimental value.Other M2O paddlewheels, Ni2O and Zn2O exhibit a higher deviation, contracting 0.3Å and 0.15Å in bond length between the two metals.For all paddlewheels, the M-O bonds are very well conserved with all deviations less than ±0.1Å.Structures for Ni2O and Zn2O paddlewheels are supplied in the supporting information.
Both M3O and M4O experimental structures are also reproduced very accurately shown in figure 4b and  4c respectively.We examined In, Fe, and Al-based trimeric oxo-centered metal nodes, In-In have a deviation of -0.087Å, Fe-Fe -0.163Å (refcode HAKSIY 66 ) and Al-Al -0.173Å (refcode JALCAD 67 ).Again, the M-O bonds are very conserved with all deviations less than ±0.1Å.Zn-based octahedron unit found in MOF-5 (e.g.refcode EDUSIF 64 ) exhibits a Zn-Zn deviation of -0.0699Å and Zn-O error of -0.04Å.
The zirconium oxide building block, from the well-known UiO-66 MOF (e.g.refcode RUBTAK03 65 ) is similarly well-conserved.GFN-xTB has corrected the inner oxygen positions from being symmetric, to having two types of oxygen atoms -one capped with hydrogen and another one without hydrogen.The distinct zirconium-oxygen bonds optimized to a longer bond for Zr-OH (refer to O2 in figure 4d) and a shorter bond for Zr-O (refer to O1 in figure 4d).This correction is justified as the crystal structure represents an average of oxygen positions, where 50% of the oxygen atoms bear a hydrogen atom, but which oxygen atoms (i.e. the orientation of the building block) is not pre-defined.

Geometrical Properties
To further interrogate the quality of the optimized structures, gravimetric and volumetric surface areas were calculated and compared to the surface areas of the original CoRE structures.Overall, the surface areas of the optimized structures follow a similar distribution to the experimental structures.In the volumetric surface area of lattice optimized structures, there is a conspicuous dip around 800-1200 m 2 /cm 3 .This occurs for one of two reasons: Firstly, after optimization, structures shrink marginally, as measured by comparing initial and final unit cell volumes, which results in 20-25% of structures in this region having a pore size that is no longer adequate to fit the nitrogen probe molecule (r=1.86Å), or structures undergo distortion resulting in a smaller pore -e.g. a wine-rack distortion (e.g.REFCODE TURDIV_clean 67 ).To further interrogate the quality of the optimized structures, gravimetric and volumetric surface areas were calculated and compared to the surface areas of the original CoRE structures.Overall, the surface areas of the optimized structures follow a similar distribution to the experimental structures.In the volumetric surface area of lattice optimized structures, there is a conspicuous dip around 800-1200 m 2 /cm 3 .This occurs for one of two reasons: Firstly, after optimization, structures shrink marginally, as measured by comparing initial and final unit cell volumes, which results in 20-25% of structures in this (b) (a) region having a pore size that is no longer adequate to fit the nitrogen probe molecule (r=1.86Å), or structures undergo distortion resulting in a smaller pore -e.g. a wine-rack distortion (e.g.REFCODE TURDIV_clean 68 ).

Conclusion
We tested GFN-xTB on MOFs for its ability to do periodic geometry optimization, we observe both the performance on fixed lattice optimization and complete relaxation of lattice optimization.The results confirm that optimized structures using GFN-xTB conserve the experimental structures very well.After optimization, 74.51% of all cell parameters are within 5% of the experimental value.
We also confirm that GFN-xTB could also reproduce the detailed atomic structures shown by the RMSD of atomic position being 0.489 Å for all atoms and 0.439 Å for metal atoms when the lattice was fixed, while for the optimized lattice, the RMSD values are 0.617 Å for all atoms and 0.560 Å for metal atoms.Bond lengths are better conserved with an RMSD of 0.120 Å for all bonds and 0.187 Å for metal containing bonds for fixed lattice optimization.While for relaxed lattice optimization 0.175 Å for all bonds and 0.236 Å for metal containing bonds.
This accuracy, combined with computational efficiency and the wide coverage of the periodic table (Z ≤ 86) make GFN-xTB a useful method for screening studies.The emphasis on accurate non-covalent interactions in the method, make GFN-xTB especially useful in screening for applications where binding energies of adsorbate molecules are sought, as force fields often have difficulty in this area.

Figure 1 .
Figure 1.Error in cell parameters, calculated as (XGFN − XExp) (a, b, c).Right hand side shows the 95.13% of cell parameters within ±20% of the experimental value.

Figure 2 Figure 2 .
Figure 2. RMSD of GFN-xTB optimized atom positions with respect to the CoRE structures.(a) shows optimization allowing only atom position movement.(b) shows optimization where both the lattice and atomic positions were allowed to relax.

Figure 3 .
Figure 3. RMSD of GFN-xTB optimized bond lengths with respect to the CoRE structures.Blue shows all bonds, orange shows without metal atom bond, and green shows all metal containing (M−X).(a) shows optimization allowing only atom position movement.(b) shows optimization allowing lattice parameter and atom movement.

Figure 4 .
Figure 4. (a) Gravimetric and (b) volumetric accessible surface areas of CoRE structures, as provided (blue) and after optimization using GFN-xTB in AMS with fixed lattice parameters and relaxed lattice parameters (orange and green respectively).

Structures that collapse during geometry optimization
-xTB is parameterised only for Z ≤ 86.Therefore, all structures with atoms of elements with Z > 86 were excluded.(191 structures) 2. Very large structures.Structures with total number of atoms greater than 1200 were considered too computationally expensive for a full optimization, especially given the number of "soft" deformations expected in such structures.In this case, we limit the optimization to 100 geometry optimization steps.(334 structures) 3. Partial occupancy CIF files.Structures with partial occupancies: Calculations can only be undertaken on full atoms.Structures where the total number of atoms is greater than defined by the stoichiometry are excluded from the evaluation in this work.(1407 structures) 4.
. There are several possible causes of structural collapse:

Table 1 .
Mean RMSD of GFN-xTB optimized atom positions with respect to the CoRE structures.