Open Access Article
Siyuan
Yang‡
ab,
Qianqian
Mao‡
a,
Heng
Ji‡
a,
Dingyue
Hu
a,
Jinjin
Zhang
a,
Linjiang
Chen
*cd and
Ming
Liu
*ab
aDepartment of Chemistry, Zhejiang University, Hangzhou, Zhejiang 310027, China. E-mail: mingliu@zju.edu.cn
bHangzhou Global Scientific and Technological Innovation Center (HIC), Zhejiang University, Hangzhou, Zhejiang 311200, China
cKey Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China. E-mail: linjiangchen@ustc.edu.cn
dSchool of Chemistry and School of Computer Science, University of Birmingham, Birmingham, B15 2TT, UK. E-mail: linjiangchen@ustc.edu.cn
First published on 8th April 2025
The development and sharing of computational databases for metal–organic frameworks (MOFs) and covalent organic frameworks (COFs) have significantly accelerated the exploration and application of these materials. Recently, molecular materials have emerged as a notable subclass of porous materials, characterized by their crystallinity, modularity, and processability. Among these, macrocycles and cages stand out as representative molecules. Experimental discovery of a target molecular material from a vast possibility of structures for defined applications is generally impractical due to high experimental costs. This study presents the most extensive Computation-ready Experimental (CoRE) database of macrocycles and cages (MCD) to date, comprising 7939 structures. Using the MCD, we conducted simulations of binary CO2/CH4 competitive adsorption under conditions relevant to industrial applications. These simulations established a structure–property–function relationship, enabling the identification of materials with potential for CO2/CH4 separation. Among them, a macrocycle, NDI-Δ, exhibited promising CO2 adsorption capacity and selectivity, as confirmed by gas sorption and breakthrough experiments.
953 MOFs constructed from 103 different building blocks.14 The database has been instrumental in numerous experimental discoveries of functional metal–organic frameworks (MOFs).11,15 Colón et al. introduced the Topologically Based Crystal Constructor (ToBaCCo) database, featuring 13
512 MOFs with 41 unique topologies.16 Concurrently, the development of COF databases has been remarkable, amassing over 560
000 structures.17–20 Experimental MOF structures, like those found in the Cambridge Structural Database (CSD), often contain solvent molecules within their pores, exhibit positional disorders in certain fragments, and lack hydrogen atoms, among other issues. These factors make such MOF structures unsuitable for direct use in computational studies. To address this, Chung et al. created a database of Computation-ready Experimental (CoRE) MOFs.21 In the CoRE-MOF database each MOF structure undergoes a multi-stage cleaning process to resolve structural issues, making them directly useable in calculations. The creation of the CoRE-MOF database featured systematic data curation and standardization procedures, and the approximately 14
000 structures22 originating from experimental data are immediately ready for computational studies. The database subsequently proved useful for researchers seeking to explore the potential of MOFs in various applications through computational studies.23–25 Later, Rosen et al. developed the Quantum MOF (QMOF) database, which consists of Density Functional Theory (DFT) optimized experimental MOFs with a range of DFT derived properties for each MOF, such as band gaps, Density-Derived Electrostatic and Chemical protocol 6 (DDEC6) charges, spin densities, and more.26
Similarly, Tong et al. compiled a CoRE-COF database of now over 600 covalent organic frameworks (COFs), extracted from the experimental literature.27 Based on CoRE structures, Ongari et al. developed a database comprising 324 “Clean, Uniform, and Refined with Automatic Tracking from Experimental Database” (CURATED) COF structures and updated it to 871 structures.28 The CURATED COF database has further undergone an optimization process for both atomic coordinates and cell dimensions of the CoRE structures, employing a multi-step DFT approach. Subsequently, DDEC6 charges were assigned to each atom, enhancing the database's utility.
Apart from MOFs and COFs, molecular materials have recently emerged as a notable subclass of porous materials, characterized by their modularity and processability.29 Unlike MOFs and COFs, which are extended framework structures, molecular materials are formed by the assembly of discrete molecules. The discrete feature provides the molecular materials with good solubility in common solvents or high solution dispersibility, and therefore promotes processability during applications, compared to MOFs and COFs.30 As many of these molecules have been intensively investigated as hosts in supramolecular systems, there are a great number of single crystal structures that have been reported and archived, leaving a valuable resource for potential high throughput structure screening. Therefore, there are continuous efforts to develop new porous molecular materials via computational design and computational screening. Evans et al. suggested that using small organic molecules exclusively as the building blocks for cage-based porous molecular structures could yield up to 1060 possible variants.31 This highlights the vast potential of using these organic entities for innovative porous molecular crystal discovery. Msayib et al. carried out a focused exploration within the CSD for molecular crystals with the capability for adsorption of hydrogen and nitrogen and identified 23 promising candidates.32 There are several molecular crystal databases that have been developed, such as the organic porous molecular crystals database (oPMC)33 and the Cage Database (CDB).34 Recently, Li et al. established the first database of metal–organic cages (MOCs), containing 1839 structures, and also the largest database to date of experimental organic cages (OCs), containing 7736 cages.35 This was achieved by integrating topological data analysis (TDA) and supervised and unsupervised learning methods. However, none of these databases entirely align with the criteria for ideal CoRE structures. Even for the most comprehensive OC database, a significant limitation arises from the presence of solvent molecules within many OCs, which obstruct channels or pores, thereby complicating the assessment of their potential applications. Moreover, the issue of redundant coordinates in the structure files, attributable to the occupancy ratio of atoms, requires meticulous correction and alignment with reference literature. Additionally, we noticed the inclusion of structures in the OC database that do not conform to the definitions of cages or macrocycles, necessitating their exclusion. Furthermore, the database's reliance on TDA to primarily consider the heaviest molecule within a structure has led to the unintended inclusion of rotaxanes and pseudorotaxanes.
This study introduces a CoRE database of macrocycles and cages, two of the most representative porous molecular materials. Using the CSD, we updated structures not previously catalogued within the OC. All structures were carefully curated and optimized in two steps, applying semi-empirical DFT to both atomic coordinates and cell dimensions. Subsequently, DFT-derived DDEC6 partial charges were assigned to each atom. This CoRE macrocycle and cage database (MCD) can be directly used for screening molecular systems for target functions. We conducted competitive Grand Canonical Monte Carlo (GCMC) simulations on a selected dataset from MCD to assess the selective adsorption efficiency of these structures for CO2 over CH4. Among the selected candidates, a macrocycle NDI-Δ was identified for its promising CO2 adsorption capacity and selectivity, as confirmed by gas sorption and breakthrough experiments. These results demonstrate the effectiveness of the MCD database for identifying promising molecular candidates for a target application.
Utilizing ConQuest (version 5.44, updated as of April 2023), we initiated a systematic search for potential candidates of cages and macrocycles, employing the ‘must-have’ criteria that were previously utilized for the OC database. The selection criteria were specific, requiring structures to have well-defined coordinates, to be non-polymeric and entirely organic, and to exclude any structures containing metal atoms. This selection process resulted in identifying 26
667 potential candidates, marking a notable increase from the 18
294 OC database candidates discovered using a previous version of the CSD. Details on the specific ‘must-have’ fragments and the corresponding numbers of hits are provided in Table S1.†
From the 26
667 candidates identified, 7736 had already been classified as ‘organic cages’ in the OC database, leaving 18
931 candidates with classifications yet to be determined. As previously discussed, the TDA method might categorize some structures that do not conform to the traditional definitions of cages or macrocycles. Furthermore, there is a deliberate effort to exclude rotaxanes and pseudorotaxanes, given the reported limited porosity of some macrocycles. Considering these factors, we undertook a manual review of the 18
931 unclassified structures. This review process aimed to identify and select those candidates that feature at least one macrocycle or cage structure, ensuring the relevance and accuracy of our database's content.
To ensure that structures from the CSD are suitable for computational simulations, a comprehensive cleaning and correction process was implemented to achieve a ‘computationally ready’ status. This process included the addition of hydrogen atoms, completion of missing atoms, removal of solvents and small guest molecules, adjustment of atomic positions, elimination of redundant atoms, correction of elemental assignments, structure exclusion and charge neutralization. These structural cleaning processes are detailed further in the Experimental section and ESI Section S2.†
After the structure cleaning process, more than 10
500 configurations were curated. However, the removal of solvent molecules may inadvertently increase the porosity of these configurations, which could lead to biased outcomes in simulations. Therefore, it is necessary to perform geometry optimization on these configurations to ensure accuracy. The application of DFT for optimization across such a large dataset presents considerable computational challenges, especially for structures generally comprising thousands of atoms. The GFNn-xTB series, particularly the GFN2-xTB model, has been shown to offer remarkable accuracy in geometry reproduction across a diverse array of systems compared to other semi-empirical methods.36,37 Its effectiveness has been demonstrated in the geometry optimization of large structures, such as transition metal complexes,38 periodic peptides and proteins,39 proving its capability to accurately reproduce structures as confirmed by X-ray diffraction data. GFN2-xTB was used to optimize the curated structures in two steps. First, the atomic coordinates were optimized with fixed cell parameters, followed by a second optimization process allowing full structural flexibility.
Notably, structural errors – whether missing elements or inaccuracies introduced during the cleaning phase – were identified through warnings in the optimization convergence process. These issues were then rectified, and the affected structures were reprocessed.
Finally, DFT-derived DDEC6 partial charges were computed for the optimized structures, utilizing the electron density that was computed by Perdew–Burke–Ernzerhof (PBE)40/DZVP-MOLOPT-PBE-GTH basis sets41 with DFT-D3(BJ) dispersion corrections.42
MCD includes both single-molecule crystals and cocrystals, with approximately 25% (1965 out of 7939) of the structures in the MCD containing more than one kind of molecule, highlighting the comprehensive coverage and versatility of the database in capturing the structural diversity of macrocycle and cage crystals.
The changes of cell volumes, the diameter of the largest inclusion sphere (Di), and the pore limiting diameter (Df) between structures optimized by GFN2-xTB and those further optimized by DFT are insignificant (Fig. 3c to e). The correlation coefficients (R2) for the linear relationship y = x for cell volume, Di, and Df are exceptionally high, standing at 0.99 for cell volume and 0.98 for both Di and Df. This indicates a strong agreement between the two sets. Specifically, for cages, the consistency across all three geometric parameters is remarkable, with an R2 value of 0.99, denoting very slight variations in most cases. However, the structure REWKIQ demonstrates a notable deviation, particularly in Di, where it underwent a significant contraction from 4.33 Å to 2.19 Å (nearly a 50% reduction) after DFT optimization. Additionally, Df saw a decrease from 1.38 Å to 0.99 Å. As a result, the structural similarity between the DFT and GFN2-xTB-optimized configurations of REWKIQ is relatively low, at 77.28%. This particular case of REWKIQ also highlighted the most substantial change in dispersion energy, at −0.0101 eV per atom, with a total energy difference of −0.0130 eV. These energy variations suggest that the discrepancies observed are primarily due to changes in molecular movement and denser packing after DFT optimization, which significantly affect the dimensions of Di and Df. The details of REWKIQ structural comparison can be seen in Fig. S4.†
The accessible volume (Fig. 3f) and surface area (Fig. 3g) of GFN2-xTB optimized structures underwent slightly larger adjustments during DFT optimization than cell volume, Di and Df. The R2 values for accessible volume and surface area are 0.97 and 0.95, respectively, indicating a strong linear relationship despite these adjustments. Notably, for cages, the R2 values remain exceptionally high at 0.99. This high level of consistency may be attributed to the predefined pore and window structures characteristic of cage molecules. Interestingly, after DFT optimization, 17 structures were identified as non-porous, of which 13 belonged to similarity stage I, indicating that their similarity to their GFN2-xTB-optimized counterparts exceeded 90%. Within this subset, FAFHUQ exhibited the lowest similarity at 95%. On the other hand, GUNDEZ underwent a transition from non-porous to porous as a result of DFT optimization, achieving a similarity of 94% alongside an accessible volume of 0.056 cm3 g−1 and a surface area of 898.3 m2 g−1. These observations underline the potential for even minor modifications in molecular crystal structures to significantly impact their geometric properties. Such changes are crucial considerations in the preliminary screening process for material selection, demonstrating the importance of recognizing the flexibility and dynamic properties inherent in molecular crystals.
The changes in accessible surface area, volume, and void fraction are shown in Fig. 4d to f. From the dataset, 1789 structures were initially identified as being porous to CO2 in their cleaned state, but this number fell to 1142 after optimization. 712 structures that were initially porous lost their porosity after optimization, while 65 structures changed from being non-porous to porous. Reflecting the patterns seen in cell volume, Di, and Df, a significant proportion of structures experienced reductions in accessible surface area (90%), accessible volume (82%), and void fraction (80%). This analysis explicitly excludes structures categorized as non-porous both before and after optimization.
The efficiency of computationally driven material discovery using high-throughput GCMC simulations based on other CoRE databases has been demonstrated in several previous studies.46,47 First, we selected structures in MCD that have been previously studied for CO2 and CH4 adsorption isotherms in the literature, and the simulated isotherms for both CO2 and CH4 showed good agreement in shape with the experimental adsorption isotherms (Table S4†). The adsorption mechanisms of these materials were captured by the GCMC simulations. The IAST selectivity values from simulated and experimental isotherms are comparable, as is the selectivity ranking. However, due to the inherent flexibility48–51 of molecular materials and the loss of crystallinity after activation, GCMC simulations are not able to precisely reproduce the exact adsorption quantities of all materials. The selectivity and ranking derived from simulations remain valid for guiding the identification of promising materials. We conducted high-throughput GCMC simulations to assess the competitive adsorption of CO2/CH4, aiming to demonstrate that MCD is capable of identifying potential materials for specific applications. Criteria for selection included a requirement for structures in MCD to have a Df exceeding 3.80 Å,52 the kinetic diameter of CH4, leading to a subset of 697 structures for analysis.
Fig. 5a visualizes the correlation between CO2 uptake and CO2/CH4 selectivity across these structures, revealing that materials exhibiting the highest selectivity generally possess smaller Df values. The materials displaying the top five selectivity were identified as DORZAO (4.25 Å), HOWNEO (5.62 Å), PUNMUH (5.47 Å), CUVHOT (4.98 Å), and OQIVAO (6.19 Å), arranged in descending order of selectivity. Notably, a larger accessible surface area was not indicative of increased CO2 uptake under the conditions tested. The structures with the highest CO2 uptakes had surface areas of 1181 m2 g−1 (REDMET), 1456 m2 g−1 (CUMHUO), 1455 m2 g−1 (KODMIC), 1097 m2 g−1 (WASPEO), and 2581 m2 g−1 (FINCIP), respectively. More detailed structure–property relationships can be found in ESI Section S7.†
DORZAO, the γ polymorph of (−)-NDI-Δ, is a macrocycle first reported by Stoddart's group in 2013.53 For simplicity, we refer to it as NDI-Δ throughout the text. It stood out as the structure with the highest selectivity within the screening range, attaining a selectivity value of 50.14 and a CO2 adsorption capacity of 1.97 mmol g−1, with negligible CH4 adsorption. Fig. 5b and c highlight two types of adsorption sites: intrinsic and extrinsic pores, in the simulated crystal structure after CO2/CH4 competitive adsorption. Notably, CH4 adsorption within the intrinsic pore was virtually absent, as observed in the simulation movie outputs from RASPA. This result indicates CO2's energetic preference for adsorption within this triangular pore structure, illustrating the structure's specificity and effectiveness for selective CO2 adsorption.
According to the synthetic method reported in the literature,53,54 we successfully prepared the rigid triangular macrocycle NDI-Δ by reacting naphthalenediimides (NDIs) with (RR)-trans-1,2-cyclohexanediamine. However, we were unable to obtain the γ-NDI-Δ polymorph under the same crystallization conditions after many attempts. Instead, a new phase, which we named γ′-NDI-Δ, with reasonably good crystallinity was obtained (Fig. S6†). CO2 and CH4 adsorption/desorption isotherms revealed relatively low adsorption capacities of 0.31 mmol g−1 and 0.08 mmol g−1, respectively (Fig. S7†). This lower capacity may be attributed to differences in molecular packing between γ′-NDI-Δ and γ-NDI-Δ, where the imperfect packing in γ′-NDI-Δ likely hinders access to adsorption sites as expected in γ-NDI-Δ from the database. Despite the low adsorption capacity, the breakthrough experiments of γ′-NDI-Δ at 1 bar and 298 K demonstrated promising CO2/CH4 selectivity. The breakthrough times were 14 min g−1 for CH4 and 20 min g−1 for CO2 (Fig. S8†).
Further efforts to obtain additional NDI-Δ phases were made. A second phase, named NDI-Δ-Evap (Fig. S6†), was prepared by evaporating the corresponding solution in CH2Cl2. This phase exhibited slightly improved gas adsorption performance compared to γ′-NDI-Δ (Fig. S7†). A third phase, NDI-Δ-CH2Cl2, previously reported in the literature,54 was obtained by slow diffusion of CH3OH into a CH2Cl2 solution of NDI-Δ. The PXRD patterns of the synthesized NDI-Δ-CH2Cl2 matched well with simulated results from single-crystal data, confirming its phase purity (Fig. S9†). Gas adsorption studies showed that NDI-Δ-CH2Cl2 displayed decent CO2 adsorption capacities of 1.84 mmol g−1 (273 K) and 1.24 mmol g−1 (298 K) (Fig. 5d), despite becoming amorphous after activation.
The isosteric heat of adsorption (Qst) at zero coverage was 44.05 kJ mol−1 for CO2 and 33.89 kJ mol−1 for CH4, indicating the inherent selectivity of NDI-Δ toward CO2. More importantly, dynamic breakthrough experiments with a CO2/CH4 (1
:
1, v/v) mixture at 1 bar and 298 K (Fig. 5f) showed a 14 min g−1 breakthrough interval between CH4 (20 min g−1) and CO2 (34 min g−1), suggesting that the screened molecular adsorbent NDI-Δ can efficiently separate CO2 from CH4 at one breakthrough cycle. Additionally, the breakthrough curves remained stable over at least five cycles without significant changes in retention times, demonstrating the material's good reusability.
More importantly, we are able to perform high-throughput GCMC simulations based on the database to screen out target molecules for specific applications, such as competitive adsorption of CO2 and CH4. From the high-throughput computational screening of a dataset selected from the database, we can readily identify a macrocycle, NDI-Δ, which exhibited promising CO2 adsorption capacity and selectivity toward CO2 over CH4 even in its amorphous phase. It is worth noting that it would be very difficult to predict that such a macrocycle could efficiently separate CO2 from CH4 based solely on its chemical structure, even with the extensive knowledge of its rich host–guest chemistry. This demonstration underscores the database's potential to accelerate the discovery of functional molecular materials.
(1) Hydrogen atom addition: hydrogen atoms were added to structures missing hydrogen coordinates using BIOVIA Materials Studio,44 to complete the molecular framework.
(2) Missing atom inclusion: absent atoms were added to the structures to maintain structural integrity.
(3) Solvent and guest molecule removal: solvent and small guest molecules were extracted, except in instances where an equivalent clean macrocycle or cage with similar packing was already documented in the MCD.
(4) Atom position adjustment: atom positions were altered to correct elongated, unrealistic bonds, thereby preventing errors in subsequent optimization steps.
(5) Coordinate redundancy elimination: redundant coordinates within structures have been removed.
(6) Correction of element labels: wrong element labels within the structures have been corrected.
(7) Structure exclusion: structures that were too messy to correct, did not contain macrocycles or cages, contained fullerenes, rotaxanes or those with coordinatively bonded metal elements were excluded.
(8) Charge neutralisation: the net charges of the structures were evaluated and adjusted to neutrality.
:
50 CO2
:
CH4 molar ratio, under conditions of 1 bar and 298 K. In these simulations, the structures of the macrocycles and cages were treated as rigid bodies. The Lorentz–Berthelot mixing rules were applied for interaction potentials, and electrostatic interactions were accurately depicted using the coulomb potential and Ewald summation. Each simulation ran for 10
000 cycles for equilibration and 200
000 cycles for production, executing a range of movements including translation, rotation, regrowth, identity changes and swap moves.
Footnotes |
| † Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5sc01532d |
| ‡ These authors contributed equally to this work. |
| This journal is © The Royal Society of Chemistry 2025 |