Juul S.
De Vos
a,
Sander
Borgmans
a,
Pascal
Van Der Voort
b,
Sven M. J.
Rogge
*a and
Veronique
Van Speybroeck
*a
aCenter for Molecular Modeling (CMM), Ghent University, Technologiepark-Zwijnaarde 46, 9052 Zwijnaarde, Belgium. E-mail: Sven.Rogge@UGent.be; Veronique.VanSpeybroeck@UGent.be
bCentre for Ordered Materials, Organometallics and Catalysis (COMOC), Ghent University, Krijgslaan 281 (S3), 9000 Ghent, Belgium
First published on 16th March 2023
Covalent organic frameworks (COFs) are a versatile class of building block materials with outstanding properties thanks to their strong covalent bonds and low density. Given the sheer number of hypothetical COFs envisioned via reticular synthesis, only a fraction of all COFs have been synthesized so far. Computational high-throughput screenings offer a valuable alternative to speed-up such materials discovery. Yet, such screenings vitally depend on the availability of diverse databases and accurate interatomic potentials to efficiently predict each hypothetical COF's macroscopic behavior, which is currently lacking. Therefore, we herein present ReDD-COFFEE, the Ready-to-use and Diverse Database of Covalent Organic Frameworks with Force field based Energy Evaluation, containing 268687 COFs and accompanying ab initio derived force fields that are shown to outperform generic ones. Our structure assembly approach results in a huge amount of computer-ready structures with a high diversity in terms of geometric properties, linker cores, and linkage types. Furthermore, the textural properties of the database are analyzed and the most promising COFs for vehicular methane storage are identified. By making the database freely accessible, we hope it may also inspire others to further explore the potential of these intriguing functional materials.
Computational high-throughput screenings offer a valuable alternative to accelerate materials discovery.35,36,51,52 Such screening studies select a database of material geometries and perform several calculations on each material to predict their macroscopic behavior. Especially for MOFs, high-throughput screenings are abundant, especially in the fields of gas adsorption53–60 and separation61–66 processes. Recently, also the discovery of MOFs with targeted electronic and catalytic properties has gained attention.67–70 A limited number of screening studies has been performed to shed light on their mechanical stability.40,56 All these high-throughput screening studies start from one of the many constructed MOF databases. These databases can be divided into two major categories, depending on the origin of the structures they contain. On the one hand, experimental databases are built from already synthesized materials, containing either the synthesized structure or the structure as obtained after computational structure optimization and/or guest removal. The CoRE MOF database,54 for instance, contains a subset of 5109 MOF structures identified from the Cambridge Structural Database (CSD)71 and was later expanded to include 14142 structures.72 Moghadam et al. implemented an automated screening algorithm in the CSD Python API to instantly identify a MOF when it is added to the CSD.73 At the time of its first publication, this subset contained 69
666 MOF materials.73 Recently, the QMOF database was established, containing a subset of 15
713 materials from the previously mentioned databases for which DFT calculations can be carried out efficiently.68 On the other hand, hypothetical databases contain in silico generated structures. They broaden the material space and provide a large and diverse set of structures. Although energy minimization approaches exist to generate plausible geometries, such as the Automated Assembly of Secondary Building Units (AASBU) method,74 hypothetical databases generally rely on geometric procedures that connect SBUs with one another to form a periodic material without any optimization. These geometric procedures can be divided into two classes: bottom-up and top-down methods, depending on how the SBUs are assembled. In the bottom-up approach, SBUs are naturally grown until a periodic framework is formed. Wilmer et al. applied this method to a set of 102 SBUs to generate a database of 137
953 MOFs,53 whereas a database of 324
500 MOFs was generated from 66 SBUs and 19 functional groups by Fernandez et al.75 However, later research showed that the topological diversity of the structures constructed using this bottom-up approach is limited.76 Top-down approaches typically result in a larger variety of topologies. These methods initially define the topological net, after which the SBUs are deliberately placed on the net's nodes. Multiple software packages, such as Zeo++,77 AuToGraFS,78 Weaver,79 TOBASCCO,80 and ToBaCCo,55,56 use such top-down approach to generate hypothetical frameworks. The latter has been used to construct a database of 13
512 MOFs starting from 78 SBUs and 41 topologies.55
Also for COFs, several high-throughput screenings have emerged, although they mainly focus on gas adsorption and separation processes, such as methane storage,81–83 hydrogen storage,84,85 and carbon capture,86–88 among others.89–91 These high-throughput COF screenings all rely on one of the four large COF databases constructed to date. The two experimental databases, the CoRE89 and CURATED86 databases, focus on the boronate ester and imine COFs, which are abundantly present in literature. They naturally underrepresent other linkage types that are less frequently observed experimentally, although these linkage types may result in materials with unique features. Therefore, Martin et al.82 and Mercado et al.83 created two databases of hypothetical COFs to explore different regions of material space. However, because they focused on a large diversity of linker cores, they included only a limited number of linkage types, some of which are rarely realized experimentally. Moreover, when considering those linkage types that are experimentally relevant, these hypothetical databases are significantly lacking. As each linkage type provides unique properties, a suboptimal representation of these subclasses in a database will result in a largely untapped potential for COF materials. For example, boronate ester COFs have outstanding crystallinity,6 and, whereas imine linked COFs already possess improved stability,6 enamine COFs have an even higher stability.92 Furthermore, triazine and hydrazone COFs provide coordination centers for transition metals that can be adopted in catalysis.93,94 Yet, both enamine and hydrazone COFs are absent in the current hypothetical databases and are underrepresented with respect to imine COFs in the experimental ones.
To tap into the potential of these and other COF linkage types, we present in this work a hypothetical database describing a diverse set of linkage types, including both frequently observed linkages and linkages that are not often synthesized experimentally. Furthermore, an unprecedented feature of our database is that a system-specific force field is generated for each material, starting from the cluster force fields of the underlying building blocks. These are derived using our in-house developed QuickFF routine.38,39 As such, each structure of the database can directly be used in high-throughput studies.
Besides the material's geometry and partial atomic charges derived for the COFs in the aforementioned experimental databases, the four databases lack the necessary information to model the interatomic interactions. To model these interactions, one can resort to computationally expensive ab initio methods, such as density functional theory (DFT), which largely limits the length and time scales that are feasible to simulate,67,68,70 or a less expensive generic force field with reduced accuracy.56,95 To obtain a description of the interatomic interactions with a higher accuracy than generic force fields but a lower computational cost than ab initio techniques, some studies derived system-specific force fields for a smaller set of materials.40 Such force fields are attractive to perform high-throughput screenings and to remove structures with a low synthetic likelihood from the database. For instance, many structures in the hypothetical databases of Martin et al.82 and Mercado et al.83 contain largely deformed SBUs and are likely unphysical. To identify the structures that are experimentally the most feasible, the synthetic likelihood of a large database of structures has been predicted with DFT energies96 and force field free energy calculations.95 It has been shown that while free energy calculations are necessary to predict the most favorable configuration out of a set of polymorphs, energy metrics are sufficient to predict the synthetic likelihood of hypothetical materials.95 However, this requires one to augment the versatile COF database with a relatively inexpensive yet sufficiently accurate method to describe the interatomic interactions.
Therefore, we present in this work the ReDD-COFFEE database: a ready-to-use database of 268687 COFs. ReDD-COFFEE is an acronym for Ready-to-use and Diverse Database of Covalent Organic Frameworks with Force field based Energy Evaluation. In our database, an optimized atomic structure and an ab initio derived force field is provided for each material. Essential to generate system-specific force fields for the huge amount of structures within this database, is the building block approach. In this approach, cluster force fields are fitted to quantum mechanical reference data for a limited number of smaller building blocks and then assembled to derive force fields suitable for the periodic structures. This procedure was introduced earlier through our QuickFF procedure,38,39 but is now adopted for a much larger number of materials. By deriving cluster force fields for the underlying building blocks, the number of required ab initio calculations is greatly reduced, while a high accuracy of the interatomic potential is maintained.40–50 All COFs are assembled starting from a limited set of 279 SBUs, which are selected to result in a diverse database, as elaborated in Section 4.2. In contrast to other hypothetical databases, we have explicitly included a large number of linkage types in our database. The SBUs are combined taking physical constraints into account and the structures that have the lowest synthetic likelihood are removed with a deformation energy filter. We firstly demonstrate that our system-specific force fields achieve a higher accuracy than generic force fields. Next, we show that our database has a great diversity in terms of geometric properties, linker cores, and linkages compared with the already existing COF databases. Furthermore, we establish property–property relations in between textural properties and compare them with other nanoporous materials, i.e., MOFs53,55,68 and zeolites.97 Finally, the applicability of the database to identify COFs for vehicular methane storage is demonstrated by identifying promising candidates and determining property–property relations for adsorption properties on a diverse subset of the database containing 10
000 structures. All 268
687 optimized structures and force fields of the ReDD-COFFEE database are publicly available at https://doi.org/10.24435/materialscloud:nw-3j and are ready-to-use for further high-throughput screenings.
In step 2 of Fig. 2, a first selection of reasonable SBU configurations is made based on the geometric consideration whether the points of extension can be oriented towards the neighboring nodes correctly. This is quantified by the root-mean-square deviation (RMSD) between the unit vectors pointing (i) from the center of the SBU towards the points of extension, and (ii) from the node towards its neighbors in the topology. In the ideal case, these two sets of vectors would be overlapping once the SBU is positioned on the node using the transformation obtained from Kabsch algorithm.108 However, for some configurations, such as configuration 2 of SBU1 and configuration 3 of SBU2 in Fig. 2, the transformation results in a less optimal configuration and a geometric mismatch between the SBU and the topological node is introduced. To avoid that these SBU configurations would be inserted in the topology, only those configurations that minimize the RMSD are selected and passed on to step 3. Furthermore, to avoid structures with too large a mismatch, a second filter checks if the minimal RMSD for each node is lower than the threshold RMSDmax = 0.11 Å (see Section S1.5 of the ESI† for the rationalization of this value). If this is not the case, for instance when trying to insert a tetrahedral SBU on the square vertex V11 in Fig. 2, the (topology, SBUs) combination is rejected. After this step, 403581 combinations in 1196 topologies are retained.
This challenge motivated us to introduce our additive top-down approach, in which the nodes of the topology are decorated one-by-one. In each step, the optimal configuration of the SBU of the selected node, i.e., the configuration that minimizes both the RMSD and the deformation energy of the linkages with the already inserted SBUs, is identified. By following a breadth-first iteration through the topological graph, all nodes at a certain distance from the starting vertex are decorated before continuing with the next layer. As such, the number of linkages with neighboring building blocks is increased for each SBU as compared to a random iteration or a depth-first iteration. As indicated in Fig. 2, step 3 can be omitted for the starting node, as no neighboring SBUs are present yet and the deformation energy Edef is the same for all configurations. The final SBU configuration for vertex V11 is thus chosen from the set of configurations that minimizes the RMSD in step 2. In the next step of the breadth-first iteration, a neighboring node to the ones already occupied is decorated with an SBU. In the example of Fig. 2, the vertex V21 is selected once a configuration for the SBU that is put on vertex V11 is determined. From the SBU configurations that minimize the RMSD in step 2, the deformation energy Edef of the linkage from this SBU with the already inserted SBU on vertex V11 is calculated. Finally, the SBU configuration that minimizes the deformation energy Edef is selected and inserted on the vertex V21. As such, the deformation energy Edef has to be calculated only once for every SBU configuration, instead of for every material configuration, and the computational complexity of the structure assembly process is reduced from exponential to linear in terms of the number of nodes in the topology. For linear linkers, which only have two points of extension lying on the same axis as the center of the SBU, there is still a degree of freedom that step 2 can not describe: the rotation around their axis. By using the deformation energy Edef in step 3, this degeneracy is lifted to the point that only the internal symmetries of the linker remain.
The RACs are calculated between two sets of atoms, which are defined for each chemical environment (ii–iv). Initially, the linkages (ii) that hold together the COF are identified by scanning the material graph for linkage patterns. By removing these linkages from the material graph, the linker cores (iii) that constitute the COF are retrieved. The functional groups (iv) are detected as parts of the linker cores that are attached to its skeleton with exactly one bond and do not exist of a single hydrogen atom. Once the start and scope atom lists are determined for each chemical environment (see Section S3.2 of the ESI†), the difference and product RACs are defined in eqn (1) and (2), respectively.
![]() | (1) |
![]() | (2) |
Each RAC investigates one out of six properties P: the atom identity (I), connectivity (T), Pauling electronegativity (χ), covalent radius (S), nuclear charge (Z), or polarizability (α). The depth d indicates the number of bonds that must be present between the considered atoms in the start and the scope list. Together with the definition of the start and scope atom lists, which depend on the chemical environment, this depth specifies the atom pairs over which to iterate. With two types of RACs, six properties, and a maximum depth of three bonds, a total of 48 descriptors are obtained for each environment (linkages, linker cores, and functional groups).
Once all materials are featurized, either with structural parameters or with RACs, the diversity metrics that determine how well a material set covers the material space can be defined. The material space is in this context described by the union of all available databases: the four existing databases and our database presented here. Before calculating the diversity metrics, the material space is subdivided into a specific number of bins, here chosen to be 1 000, defined through k-means clustering. These bins partition the material space into subclasses. The variety V of the considered database checks if each of the subclasses is sampled by measuring the number of bins that are examined. To make sure that each subclass is sampled equally, the balance B indicates the evenness of the distribution of materials among the sampled bins. In the ideal case, all covered subclasses are represented with the same number of structures. Lastly, the disparity D quantifies again the fraction of material space that is covered by the considered database, using a distance-based approach instead of the clustering into bins. Therefore, it is also a measure of the spread of the bins. The mathematical definition of these diversity metrics is given in Section S3.3 of the ESI.† Together, the variety V, the balance B, and the disparity D summarize the diversity of a subset of the material space for each investigated domain.
The accuracy of these cluster force fields is to a large extent defined by the choice of termination. A larger termination mimics the environment of the SBU in the material in more detail but is only applicable for a smaller set of materials as the termination has to represent the material correctly. Herein, we have chosen to define a single termination for each observed linkage section (blue in Fig. 1). The cluster termination is therefore independent of the SBU environment and can be adopted in each material containing that SBU. As such, there are as many clusters as there are combinations of linker cores and linkage sections. These terminations are depicted in Fig. S3 of the ESI.†
Once a cluster force field is obtained for each SBU in the material, the parameters of the periodic force field are derived. As the cluster approximation has the smallest impact on covalent terms that are fully embedded in the SBU, these terms are directly adopted from the cluster force field. In contrast, the overlap terms that span multiple SBUs are most accurately described in the cluster force field of the SBU with the majority of the atoms. Hence, these parameters are obtained by taking a weighted average over the respective terms in both constituent SBUs. For each term, the weights associated with each cluster involved in the term are proportional to the number of atoms embedded in the SBU core of the cluster. Also bond charge increments of bonds spanning two SBUs are obtained following this procedure. Charge neutrality is satisfied by using bond charge increments to derive the partial charges in the periodic material.113 The averaging is only allowed when all covalent bonds involved in the term have the same bond order in both clusters involved, which is mostly the case in this paper. When this is not the case, the parameters are directly obtained from the building block that mimics the environment correctly.
Inserting the SBUs in a periodic material introduces topological constraints to its environment, which can impose strain between the SBUs that is not present when considering an isolated cluster. The equilibrium geometry of the SBU in the material is therefore not necessarily the same as in the cluster model. This is for example illustrated in COF-108,2 where the tetrahedral TBPM building block, with point group Td, is inserted in the bor topology and placed on a vertex with point group D2d. Due to the geometric mismatch, which results in an RMSD of 0.09 Å before optimization, the SBUs are slightly reshaped with respect to their equilibrium structure.
To quantify the energy penalty for inserting the SBUs in a suboptimal environment, the deformation energy Edef is introduced as the energy difference between the periodic material and the sum of the energies of the isolated clusters it is composed of. Only interactions that are present both in the periodic material and the isolated clusters are included. Although they differ only slightly from the cluster force field parameters, the force field parameters of the periodic structure are also used for the calculation of the energies of the different clusters to ensure a consistent comparison. More specifically, the covalent and non-covalent interactions in the periodic material are divided into interactions between atoms within the same SBU (Eper,intracov and Eper,intranon-cov) and in different SBUs (Eper,intercov and Eper,internon-cov). In the periodic material, the interactions within the same SBU, i.e., Eper,intracov and Eper,intranon-cov, are also described in the cluster model by Eclust,intracov and Eclust,intranon-cov, respectively. The covalent interactions between different SBUs in the periodic material, i.e., Eper,intercov, exactly correspond to the overlap terms between the core and the terminations of the appropriate clusters Eclust,intercov. To avoid that these overlap terms would be counted in both corresponding clusters, they are weighted with the same rescaling factors as in the generation of the periodic force field. However, no cluster counterpart for the long-range non-covalent interactions between different SBUs Eper,internon-cov can be defined, as these interactions are not integrated into the isolated cluster model. They are therefore neglected in the calculation of the deformation energy. As the topological constraints scale with the number of linkages between SBUs, N, the deformation energy is normalized with this number, resulting in the definition of Edef in eqn (3).
![]() | (3) |
Since the deformation energy indicates the energy change upon inserting an SBU in a topology, it describes how easily the SBUs can accommodate the mismatches introduced due to topological constraints, i.e., if they are sufficiently flexible. High values of the deformation energy indicate that the SBUs are too rigid to accommodate the introduced mismatches and result in highly contorted structures. As described in Section 2.2, COFs with a high deformation energy have a low synthetic likelihood and are therefore discarded from the database via the third filter.
The initial periodic structure is generated by placing the SBUs on the nodes of the topology and its periodic force field is derived from the cluster force fields of the underlying building blocks, as described in Sections 2.2 and 2.4, respectively. For 2D COFs, a 1 × 1 × 2 supercell is used to include two layers in the simulation cell that can better describe the specific type of stacking. The initial structure is a perfect eclipsed structure with an interlayer distance of 10 Å. This interlayer distance is chosen sufficiently large to avoid overlap between the layers and decreases during the optimization. The structures are relaxed with Yaff,125 following a three-step procedure. During the structure assembly process, nearby SBUs may contain atoms that almost overlap. The potential well of the Buckingham potential that describes the van der Waals interactions of the QuickFF derived force field would diverge to minus infinity when two such atoms approach each other. To push these atoms apart and generate a more physical geometry, the first step of the optimization procedure is to optimize the atomic positions with the UFF force field for 50 steps. The van der Waals interaction of the UFF force field is described by a Lennard-Jones potential, which is repulsive at short distances.124 Subsequently, the system-specific QuickFF force field is applied to fully optimize the atomic positions. Finally, the unit cell parameters are added to the degrees of freedom that are relaxed. For each optimization, the conjugate gradient optimizer as implemented in Yaff is adopted. A real-space cutoff rcut of 11 Å is used for the nonbonded interactions. Tail corrections are used to estimate the van der Waals interactions beyond this cutoff distance. Furthermore, the electrostatic interactions are calculated with an Ewald summation, with scaling factor α = 0.26 Å−1 and reciprocal space cutoff kmax = 0.26 Å−1. A truncation model is used to smooth these interactions. The textural properties of the optimized structures are derived with Zeo++ (v0.3).127 To calculate the accessible surface area and volume, a probe radius of 1.84 Å, coinciding with the kinetic radius of nitrogen,128 and 3000 Monte Carlo samples are adopted. In Section S5 of the ESI,† benchmark studies for all simulation parameters are provided.
Molecular dynamics (MD) simulations are performed to compute the powder X-ray diffraction (PXRD) patterns and single crystal structures at operating conditions50 using the Yaff software package125 with the same force field parameters as determined for the optimizations. The MD simulations sample the (N, P, σa = 0, T) ensemble at operating conditions, i.e., a pressure of 1 atm and a temperature of 100 K for COF-300 and LZU-111, a pressure of 1 atm and a temperature of 89 K and 298 K for the distinct phases of COF-320, respectively, and a pressure of 1 bar and a temperature of 300 K for the calculation of the PXRD patterns. They are controlled by a Nosé–Hoover chain thermostat129–131 with three beads and a relaxation time of 100 fs, and a Martyna–Tuckerman–Tobias–Klein barostat132,133 with a relaxation time of 1000 fs, respectively. The velocity Verlet integration scheme is used, with a timestep of 0.5 fs. For the computation of the crystal structures, 9900 snapshots are collected from an MD trajectory of 500 ps, where the first 5 ps are considered as equilibration run. An MD trajectory of 200 ps is created to compute the PXRD patterns and 50 snapshots are extracted from the last 100 ps. 3D COFs in the test set are simulated using a 2 × 2 × 2 supercell, whereas a 1 × 1 × 5 supercell is adopted for the 2D COFs, resulting in simulation cells of ten layers to account for the inherent freedom in layer movement. All PXRD patterns are computed using the pyobjcryst python package, based on the ObjCryst++ Object-Oriented Crystallographic Library.134 A Cu Kα wavelength of 1.54056 Å is adopted. The peak shape is computed with a pseudo-Voigt shape function, where the mixing parameters are set to η0 = 0.5, η1 = η2 = 0, and a fixed width W of 0.02° in Caglioti's formula together with the U and V parameters equal to 0°.
In Section 4.4, we perform grand-canonical Monte Carlo (GCMC) simulations135 using RASPA136 to calculate methane storage capacities at pressures of 5.8 bar and 65 bar. A cutoff radius of 14 Å is used for all interactions and tail corrections are applied, as required by the TraPPE model.137 The simulation cell contains as many unit cells as needed to ensure a distance of twice the cutoff radius in every direction. The simulations are run for 10000 cycles, from which the first 5000 are discarded for equilibration. As outlined in Section S5.3 of the ESI,† this is sufficient to let the system equilibrate. For the subset of 10
000 COFs that are contained in our test set, 1600 structures are discarded from the GCMC analysis as their pore sizes are prohibitively large, preventing the calculation of the interactions between the framework and the methane molecules at 65 bar within a reasonable timeframe. As explained in Section 4.4, it is expected that these structures have a low volumetric deliverable capacity and are therefore unviable candidates for vehicular methane storage. Six additional COFs are discarded as they require too much memory.
Starting from the ab initio optimized geometry, both the QuickFF and UFF force fields are used to relax the SBU clusters. Subsequently, the adjustments of the internal coordinates, i.e., the bonds, bends, dihedral angles, and out-of-plane distances, are measured. The QuickFF force field successfully reproduces the ab initio optimized geometry, with RMSD errors on the bonds, bends, and out-of-plane distances as small as 4.73 × 10−3 Å, 7.18 × 10−1°, and 4.12 × 10−2 Å, respectively (see Fig. S26 in Section S2.3 of the ESI†). These are substantially lower than the RMSD errors obtained for the UFF optimized structures, which amount to 3.56 × 10−2 Å, 2.87°, and 4.50 × 10−2 Å, respectively. The most challenging internal coordinates to describe with force fields are the dihedral angles of non-planar COF building blocks, as they are primarily dictated by long-range electrostatic and van der Waals interactions. The RMSD error of these internal coordinates in the QuickFF relaxed structure is relatively large, namely 9.40°, which is still improved substantially compared to the UFF relaxed structure, for which the RMSD error is 22.27°.
For these optimized clusters, the force field vibrational frequencies are derived and compared with the ones obtained from the ab initio Hessian. Similar to the internal coordinates, also the vibrational frequencies are described more accurately by the QuickFF force field (see Fig. S27 in Section S2.3 of the ESI†). The RMSD error on the frequencies is lower than the errors obtained for the UFF force field in all frequency regions. While they are still comparable in the low-frequency regime (<500 cm−1), the QuickFF force fields outperform UFF by an order of magnitude for the higher frequencies (>500 cm−1), as the QuickFF parameters are fitted to reproduce the ab initio Hessian.
This quantitative comparison between the force field and ab initio derived values of both the optimized cluster geometry and the vibrational frequencies provides an indication of the accuracy our QuickFF force fields can reach. However, we are more interested in the capacity of our periodic force fields to predict experimentally measured, structural properties. For COFs, a key macroscopic descriptor is the PXRD pattern, capturing the atomic structure of the material, typically used due to the challenging synthesis of single crystal COFs.5 In Fig. 3c, the weighted profile residual Rwp is used as a heuristic metric to compare the calculated PXRD patterns of a diverse set of seven COFs with their experimentally measured pattern. For each material, a static and dynamically averaged PXRD pattern is derived both with the QuickFF and UFF force fields, following our procedure outlined in ref. 50. Dynamical averaging at operando conditions is necessary for COFs to account for the inherent temporal character of experimental measurements.50 In contrast to the static approach, during which the PXRD pattern is calculated for the optimized structure, the dynamic approach starts from an MD trajectory performed at operating conditions. The resulting PXRD pattern is an ensemble average of the pattern calculated for different snapshots from this trajectory.
For the large majority of the examined materials, the Rwp metric is lower for the QuickFF than for the UFF force field for both the static and dynamically averaged patterns, indicating a better agreement of the former with the experimental pattern. Of the materials shown in Fig. 3, the static PXRD is predicted better by UFF than QuickFF only for PI-COF-4 and ACOF-1.12,26 However, once experimental operando conditions are taken into account to obtain higher accuracy, our system-specific force fields again outperform the generic ones. As an example, the PXRD patterns for CTF-1 and COF-103 are provided in Fig. 3a and b. It can be observed that the peak that is detected at 27° and 23° in the experimental patterns of CTF-1 and COF-103,2,138 respectively, is correctly reproduced in the dynamically averaged QuickFF PXRD pattern. However, the poor description of the dihedral angles between aromatic rings in the UFF force field enlarges the unit cell parameters and shifts these peaks to lower angles. PXRD patterns for the additional COFs can be found in Fig. S28 of the ESI.†
When single crystals are available, the structure of a COF can be determined with an even higher resolution compared to analyzing the PXRD pattern. While the synthesis of single crystal COFs is challenging, some studies have succeeded in experimentally determining the framework's atomic structure with single crystal X-ray diffraction (SCXRD) or 3D rotation electron diffraction (RED).139,140 To verify that our system-specific force fields also predict the COF structure more accurately than generic ones at these higher resolutions, additional MD calculations at operando conditions are performed to reproduce the crystal structures of COF-300 and LZU-111. Furthermore, simulations are executed to distinguish between two distinct phases of COF-320, which are observed at different temperatures and both have a specific pore structure and unit cell volume.
As can be observed in Tables S6–S9 of the ESI,† our system-specific QuickFF force fields indeed achieve an overall better agreement with experiment than the generic UFF force field in describing the crystal structures of COF-300 and LZU-111 and the distinct phases of COF-320. Especially for the bonds, a consistently better description is achieved by our force fields, with relative differences being half of the ones obtained by UFF. Whereas some bends are more accurately described by the generic force field, most of them achieve a significantly higher precision in the simulations performed with our system-specific force fields. Also the dihedral angles, which are most difficult to reproduce due to the importance of the nonbonded interactions, are better described by our ab initio force fields. Finally, the unit cell volume and the unit cell lengths are reproduced more accurately by the system-specific force fields, again with the relative difference only being half the one obtained with UFF.
The diversity of experimental and hypothetical databases is illustrated in Fig. 4, where the distributions of selected frequently observed linkages of four COF databases are compared with the one presented in this work. Typically, the COF material class is divided into several subclasses according to the formed linkage. Each of these subclasses has its unique characteristics and advantages or disadvantages for several applications.6,141 While the majority of materials in the CoRE and CURATED databases contain imine and boronate ester linkages, other linkage types are less frequently observed in these experimental databases, indicating that they are more difficult to synthesize. To broaden the scope of investigated COF structures, Martin et al.82 and Mercado et al.83 built hypothetical databases using a geometric top-down approach. They chose to focus on a diverse set of linker cores but limited the versatility of linkage types. Both hypothetical databases contain the experimentally abundant imine linkages. In addition, the database of Martin et al. also focuses on boronate ester COFs, the second most observed linkage type, and to a lesser extent on borosilicate COFs, while other linkages are not described. Mercado et al. chose not to focus on boronate ester COFs, but to describe some less frequently observed linkage types, i.e., the amide, amine, and carbon–carbon linkages. The amide and carbon–carbon linked COFs represent only 1.79% and 1.46% of the experimental CoRE and CURATED databases, respectively, while the amine linked COFs are not present in either of the two databases at all. We chose to continue on this path and include a large variety of linkage types in the ReDD-COFFEE database. We also include linkages that are less frequently observed than the abundant imine and boronate ester COFs but occupy a larger fraction of the experimental databases than amide and carbon–carbon linked COFs. These linkages cover regions in material space that are experimentally more relevant but remain largely uncharted. Furthermore, the structures are well distributed over most linkage types as the only linkage dependent criterium in the (topology, SBUs) combination is that the linkage sections of neighboring SBUs have to combine to form their specified linkage. The structures that are assembled using a boroxine, triazine, borazine, or borosilicate linkage are present less frequently than the other COF subclasses, as the natural constraint that each SBU has to be connected with a three-connected vertex for their linkage largely limits the possible topologies.
![]() | ||
Fig. 4 Distribution of frequently occurring linkage types in five COF databases. The CoRE89 and CURATED86 databases are experimental ones, while the databases of Martin82 and Mercado83 contain hypothetical structures, similar to our ReDD-COFFEE database. |
Moosavi et al. developed a systematic approach to assess the diversity of a MOF database,37 which we extended to COF databases in Section 2.3. The resulting diversity metrics, i.e., variety V, balance B, and disparity D, for each domain and each of the five COF databases are plotted in Fig. 5. The geometric properties of the structures in hypothetical databases are more diverse than the ones in experimental databases. This comes as no surprise, as in silico materials assembly algorithms provide more structural freedom than experimental synthesis procedures. Compared to the other hypothetical databases, the structures generated in this work can deviate more strongly from the perfect topological graph as provided in the RCSR, because the deformation filter only removes those structures for which the deformation energy is prohibitively large. Therefore, the structures in the ReDD-COFFEE database are assembled in a larger number of topologies than the ones in the databases of Martin et al. and Mercado et al.82,83 Furthermore, while the distribution of the linker cores in experimental databases is more balanced, their variety is higher in hypothetical databases due to the combinatorial freedom in choosing the linker cores that constitute the COF. Moreover, while the experimental materials and the structures in the databases of Martin et al. and Mercado et al. are mostly restricted to containing two different SBUs at maximum, the structures generated in this work can combine more than two building blocks in one structure. The diversity of the linkages present in the ReDD-COFFEE database is better in terms of variety and disparity, but the linkages are more balanced in the CoRE database and the database of Martin et al. There is no clear preference between the hypothetical databases on the one hand and the experimental databases on the other hand. Lastly, the diversity of the functional groups is the largest in the hypothetical databases of Martin et al. and Mercado et al. They exploited a large database of linker cores with various functional groups, which are even more diverse than the ones used in experimental structures. In contrast, since we chose to focus on describing different linkage types, we started from a limited number of linker cores, to which a limited set of functional groups were attached (see Fig. S1 of the ESI†). However, the ReDD-COFFEE database can be used to extend the scope towards more functional groups. To this end, a two-step procedure could be used. Our nonfunctionalized database could first be reduced to a set of frameworks with a high potential for the targeted application, after which they can be easily functionalized a posteriori.
While our ReDD-COFFEE database combines a high diversity in terms of geometric properties and linker core and linkage chemistry with accurate system-specific force fields, the large number of structures can also form a barrier to its adoption in high-throughput screenings. Therefore, we created a smaller set of 10000 structures, using the same descriptors that were used to derive the diversity metrics, i.e., geometric properties and RACs to characterize the linker core, linkage, and functional group chemical environments. Starting from a random initial structure, an iterative procedure is followed that selects the structure that has the largest minimal distance to the set of already selected structures in each step. As materials can only be added to the selected set, the variety V and disparity D of each domain will continue to grow as the selected set becomes larger. However, the balance B will initially show a sharp peak when the first chosen structures are added to empty bins. Upon adding further structures, the balance will start to gradually drop. The selected subset, therefore, occupies a slightly smaller region in the material space than the full database, but the structures are better balanced, as can be observed in Fig. 5. Within the full database, many structures sample similar regions in material space and one structure hardly provides additional information that cannot be learned from the other structures. Computational high-throughput screenings that require expensive calculations can, therefore, use this subset to reduce the computational cost, while still achieving a comparable accuracy with respect to screening the whole database.
The pore diameter, and in turn also the mass density, is not only influenced by the topology and the building blocks of the material, as one could expect, but also by the linkage that connects these building blocks (see Fig. 6d). A linkage type that has a larger spatial extent, such as the azine and (acyl)hydrazone linkages, increases the space between the linker cores and, therefore, also the pore diameter. This results in materials with a lower mass density as opposed to boronate ester or imide COFs, which have linkages with a smaller spatial extent. Also the linkage types that emerge during synthesis, such as the triazine and borosilicate linkages, result in higher mass densities as the linker cores of the SBUs are only separated by a single bond and are thus placed closely together. This reasoning does not take into account that interpenetrated nets can form during experimental synthesis upon increasing the pore volume, as these are described to a lesser extent in our database.32
These considerations are not unique to COFs. Similar trends in textural properties can be observed for other classes of nanoporous materials, such as MOFs and zeolites. In Fig. 7, a density map of the textural property–property relationships is visualized for our ReDD-COFFEE database, together with three MOF databases, i.e., the experimental QMOF database68 and the hypothetical hMOF53 and ToBaCCo databases,55 as well as for the database of zeolite structures of the International Zeolite Association (IZA).97 Each relationship shows the same qualitative behavior, independent of the material class. As observed in Fig. 6a and b, the highest volumetric accessible surface areas of almost 2500 m2 cm−3 are found for materials with pore fractions of 0.4 and gravimetric accessible surface areas around 4000 m2 g−1. Fig. 6c shows that these have a mass density of around 500 kg m−3. As is illustrated in Fig. 6d, a higher pore fraction results in an increased gravimetric accessible surface area. COFs that approach a pore fraction of 1.0 can reach gravimetric accessible surface areas up to 9000 m2 g−1. While Fig. 6 illustrates that the COFs in our database cover the whole range of textural properties, Fig. 7 emphasizes that most of them are present in the region with the lowest mass density among the three material classes. More precisely, as is illustrated in Fig. S45 of the ESI,† 71.11% of the structures in the ReDD-COFFEE database have a pore fraction above 0.85 and a gravimetric accessible surface area larger than 7000 m2 g−1. This results in a unique combination of exceptionally high gravimetric accessible surface areas and pore volumes, together with a very large porosity as compared to MOFs and zeolites. This endows them with a large potential in adsorption applications. However, while there are COFs that have a comparable volumetric accessible surface area as the two aforementioned materials classes, most of them cover the smaller accessible surface region. Our property–property relationships show that the choice of topology, building blocks, and, importantly, linkage type, allows for a substantial freedom in the porosity reached in the synthesized COF.
To check the performance limits of COFs for vehicular methane storage, we perform GCMC simulations on the 10000 structures in the diverse subset of the ReDD-COFFEE database defined in Section 2.3. A storage pressure of 65 bar and a depletion pressure of 5.8 bar are applied, as imposed by the MOVE program.51,150 For the electrostatic contribution, MBIS charges122 are obtained from the periodic force fields and an ab initio calculation on a single methane molecule. The choice of van der Waals interactions is thoroughly benchmarked by reproducing the experimental adsorption isotherms of COF-1, COF-5, COF-102, and COF-103 (see Section S5.3 of the ESI†).8 This benchmark study showed that UFF overestimates the methane uptake, while it is underestimated when the MM3 van der Waals host–guest interactions are adopted. Combining the DREIDING force field151 to describe the host–guest interactions and the TraPPE united atom model137 for the guest–guest interactions gives the best accuracy while maintaining a reasonable computational efficiency. Since these GCMC simulations adopt a rigid framework structure, no host–host interaction model is required.
Fig. 8a shows that the majority of the COFs in the subset (70.6%) meet the ARPA-E target of 0.5 g g−1 for gravimetric deliverable capacity due to their extremely low mass density. However, the lightest materials also have the largest pores and encompass the largest volumes, resulting in low volumetric deliverable capacities of about 60 vSTP/v. As discussed in Section 4.3, the heaviest materials have no accessible pores and both the volumetric and gravimetric deliverable capacities approach zero. This can also be observed in Fig. 8c and d, where the dependency of the volumetric deliverable capacity on the largest pore diameter and the pore fraction is plotted, respectively. Histograms of the structures with the 5% highest and lowest volumetric deliverable capacity support the previous claims and identify that the COFs with the highest volumetric deliverable capacity have a pore diameter between 0.7 nm and 3.4 nm and exhibit quite a broad pore fraction within the range of 0.2 to 0.7. While these materials do not have the highest gravimetric deliverable capacity, there are still structures with high volumetric deliverable capacity that meet the ARPA-E target for gravimetric deliverable capacity. The highest volumetric deliverable capacity, 187.4 vSTP/v, is observed for the boronate ester based COF ths-c3_11-01-01_06-08-01_06-08-01, which has a gravimetric deliverable capacity of 0.37 g g−1. Among the structures that meet the gravimetric deliverable capacity ARPA-E target, the COF with the highest volumetric deliverable capacity is the imine COF ths-c3_11-02-04_04-03-04_04-03-04, which has a deliverable capacity of 141.1 vSTP/v and 0.50 g g−1.
Whereas the methane molecules at low pressures are mainly located at the adsorption sites of the framework, they are more distributed over the pore volume at higher pressures. Therefore, the adsorption behavior is dominated by methane–framework interactions at low pressures and methane–methane interactions at high pressures, respectively. Thus, as evidenced in Fig. 8b, the deliverable capacity, i.e., the difference between the methane uptake at high and low pressure, is an interplay between the two interaction types. When methane–framework interactions are less important than methane–methane interactions, the uptake at low pressures will be small and therefore, a higher deliverable capacity will be obtained. Contrarily, when the methane–framework interactions are more dominant, a lot of guests will remain bound to the framework at low pressures, decreasing the deliverable capacity. In Fig. S49 of the ESI,† the dependency of the methane uptake on the dimensionality of the framework is visualized. Despite that no COF in the database meets the ARPA-E target for the volumetric deliverable capacity, many of the here identified structures have volumetric deliverable capacities comparable with record-holding materials, such as MOF-5 (182 vSTP/v),152 HKUST-1 (183 vSTP/v),153 Co(bdp) (197 vSTP/v),154 or NJU-Bai 43 (198 vSTP/v).155 However, the hypothetical COF proposed by Mercado et al.83 that consists of a carbon–carbon linked triazine framework and achieves a volumetric deliverable capacity of 216 vSTP/v, is not matched in this work, as we did not consider carbon–carbon linkages in our database. Several studies have proven that the targets imposed by the ARPA-E are too ambitious and that physical limits are preventing these objectives from being achieved by the current state-of-the-art materials.51,156 These observations are valid also for the COFs in our ReDD-COFFEE database. However, upon functionalization, as discussed in Section 2.3, or by anchoring alkali metals to the framework,157 an enhanced volumetric deliverable capacity could be obtained.
We screened the textural properties of the COFs in the ReDD-COFFEE database and highlighted interesting property–property relations. Furthermore, our database is compared to databases of other material classes, i.e., MOFs and zeolites, which showed that COFs possess the lowest mass densities among the studied materials. They combine high gravimetric accessible surface areas and pore fractions with low volumetric accessible surface areas. Finally, we screened the diverse subset for attractive COF storage materials for ANG in vehicular transport. Property–property relations were determined and the structural characteristics of the top candidates were selected. The highest observed volumetric deliverable capacity is comparable with the current record-holding materials. This study forms the basis for future, more elaborate screening studies focussing on gas storage and separation processes, such as carbon capture from flue gasses, as well as the mechanical stability of COFs, which will maximally benefit from the increased accuracy of the derived force fields. We hope our ReDD-COFFEE database—which is freely available on the Materials Cloud (https://doi.org/10.24435/materialscloud:nw-3j)—may also encourage other researchers to perform high-throughput simulations on these materials and further tap into the potential of functional COF materials.
Footnote |
† Electronic supplementary information (ESI) available: Additional details about the database construction and force field generation and validation, together with extensive analysis and illustrative case studies. Further information and analysis of the diversity metrics and subset selection. Supplementary property–property relations for textural and adsorption properties. Benchmark of the adopted parameters in the force field optimizations, Zeo++ calculations, and GCMC calculations. All 268![]() |
This journal is © The Royal Society of Chemistry 2023 |