Sushil Kumar, Gergo Ignacz and Gyorgy Szekely*
Advanced Membranes and Porous Materials Center, Physical Science and Engineering Division (PSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia. E-mail: gyorgy.szekely@kaust.edu.sa; Tel: +966128082769 Web: http://www.szekelygroup.com
First published on 8th October 2021
Covalent organic frameworks (COFs) have attracted considerable interest owing to their structural predesign ability, controllable chemistry, long-range periodicity, and pore interior functionalization ability. The most widely adopted solvothermal synthesis of COFs requires the use of toxic organic solvents. In line with the 5th principle of green chemistry and the United Nations’ 12th Sustainable Development Goal, we aim to mitigate the adverse effect of solvents on COF synthesis. Here we have investigated twelve green solvents for the sustainable synthesis of five series of COFs using the solvothermal approach. Crystallinity and porosity were used to assess the quality of the obtained COFs. In addition, the suitability of the solvents in the synthesis of crystalline and porous COFs was investigated and color-coded for the final green assessment. In particular, γ-butyrolactone (for TpPa, TpBD, and TpAzo), para-cymene (TpAnq), and PolarClean (TpTab) were found to be excellent green solvents to produce high-quality COFs. For the first time, we successfully used quantitative structure–property relationships in combination with machine learning approaches to predict both the surface area and crystallinity of COFs using the structure of the solvents and COF building blocks.
In the past few years, we have witnessed a significant development in synthetic methods for the preparation of highly porous and long-range ordered COFs. The methods include solvothermal synthesis, mechanochemical grinding,3,4 ionothermal synthesis,5 microwave-assisted synthesis,6 interfacial polymerization,7,8 and microfluidic synthesis.9 Among these methods, the solvothermal approach has been widely used in the construction of high-quality COFs.10 This approach relies on solvent selection for reaction media. In particular, the nature of the solvent, the solubility of the precursors, the temperature, and the duration of the reaction are crucial factors that affect the crystallinity and porosity of the resultant COFs. The solvothermal preparation of COFs often requires a combination of two organic solvents (e.g., mesitylene–dioxane) in a particular ratio. This method is not applicable to all types of COFs. Moreover, solvent mixtures are more difficult to recover and recycle, and are therefore undesirable from a green chemistry perspective.
The synthesis of newly designed COFs requires a cumbersome screening of organic solvents and their mixtures. The limited solubility of the precursors and their rate of diffusion in the selected solvent system significantly affect the crystallization process and, ultimately, the quality of the obtained COFs. Therefore, understanding the structure–property relationship of the solvent–precursor nexus is crucial in the synthesis of high-quality COFs. The reaction medium contributes substantially to the sustainability of synthetic processes.11 However, green solvents have rarely been applied in the solvothermal synthesis of COFs. Banerjee and co-workers successfully synthesized COFs in water, which is considered an environmentally friendly reaction medium, using the dynamic covalent chemistry approach.12 The resulting COFs were porous and crystalline. COFs with high surface areas were also successfully prepared in ethanol, which is considered a green solvent.13,14 Deep eutectic solvents were reported as green media for the synthesis of two-dimensional (2D) and three-dimensional (3D) COFs based on Schiff-base chemistry. However, the porosity and crystallinity of the prepared COFs were compromised.15
Identification of efficient green solvents in the synthesis of COFs is a tedious task that is commonly performed via trial-and-error experimentation. However, the quantitative structure–property relationship (QSPR) tool, which is an emerging technique among the major computational methods in modern molecule design, could offer a resource- and time-efficient solution.16 QSPR analysis refers to any practical approach by which the chemical structure is quantitatively correlated with the physicochemical properties of the molecule or material. QSPR models have already found application in assessing the potential impacts of chemicals and nanomaterials on both living and synthetic systems. To date, there have been no QSPR or related quantitative structure–activity relationship-based studies on the property prediction of COFs.
In this work, we surveyed various green solvents as reaction media for the synthesis of high-quality COFs. We prepared five series of β-ketoenamine-based COFs in twelve different green solvents (Fig. 1). We identified the best solvent for each series that is suitable to deliver highly porous and crystalline COFs. The QSPR was used to identify the key structural elements affecting the surface area and to determine if the resultant COFs are crystalline or amorphous by analysing the solvent–precursor pairs. We used the partial least squares (PLS) regression tool and 11 different machine learning (ML) algorithms for binary classification. Our study initiates the exploration of the field of COFs by design using advanced molecule design tools.
Fig. 1 Schematic representation of COF synthesis using Tp trialdehyde and five different amines in green solvents.
Binary classification was used for the prediction of the crystallinity of the COFs. The dataset consisted of the same descriptors that were used in the PLS dataset. The binary outcome of the reaction was “1” if the reaction resulted in a crystalline COF, and “0” if the reaction did not occur or resulted in an amorphous COF or a polymer. The final dataset contained 60 binary-valued outcomes and descriptors. The binary classification problem was chosen over regression analysis for the reaction outcome due to the small dataset and the missing correlation between the surface area, crystallinity, and yield. The dataset was split into training and test datasets in an 85:15 ratio. It was necessary to perform principal component analysis (PCA) and Y-scrambling (Y-randomization) due to the high dimensionality and the small dataset, respectively.20 The algorithms employed were k-nearest neighbours, sigmoid support vector machine (SVM), radial basis function (RBF) SVM, polynomial SVM, decision tree, random forest, artificial neural network, adaptive boosting (AdaBoost), naïve Bayes, and quadratic classifier algorithms (section S1, ESI†). All Python calculations were performed on 100% sustainable Google Cloud Platform.21
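The split and the Y-scrambling check described above can be illustrated with a minimal sketch. It rests on several assumptions that are not from the original work: the data are synthetic, the function names are hypothetical, and a simple nearest-centroid classifier stands in for the algorithms listed above. The sketch only shows the logic of comparing the accuracy on the true labels with the accuracy distribution obtained after shuffling the labels.

```python
import random


def train_test_split(n_samples, test_fraction=0.15, seed=0):
    """Shuffle sample indices and split them into train and test index lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def fit_centroids(X, y):
    """Return the mean feature vector (centroid) of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=sq_dist)


def accuracy(X, y, train, test):
    """Fit on the training indices and score on the test indices."""
    centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
    hits = sum(predict(centroids, X[i]) == y[i] for i in test)
    return hits / len(test)


def y_scrambling(X, y, train, test, n_rounds=100, seed=0):
    """Accuracy distribution obtained after randomly permuting the labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        y_perm = y[:]
        rng.shuffle(y_perm)
        scores.append(accuracy(X, y_perm, train, test))
    return scores


# Synthetic stand-in for the 60 binary outcomes (1 = crystalline, 0 = not).
rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [[rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]

train, test = train_test_split(len(y))      # 85:15 split
real_score = accuracy(X, y, train, test)
scrambled = y_scrambling(X, y, train, test)
mean_scrambled = sum(scrambled) / len(scrambled)
```

A model that has learned a genuine relationship should score clearly higher on the true labels than the average over the scrambled runs; comparable scores would indicate a chance correlation.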
Fig. 2 Examples of experimental PXRD patterns and SEM images of TpPa-GBL, TpBD-GBL, TpAzo-GBL, TpAnq-PCl, and TpTab-PCl COFs.
The FTIR spectra of the COFs are in good agreement with those reported in the literature.4 The presence of strong peaks at 1250 cm−1 for ν(C–N) and 1575 cm−1 for ν(C=C) confirmed that the precursors, i.e., Tp and amines, were covalently linked together via the formation of β-ketoenamine moieties in the framework (section S6, ESI†). We have performed 13C CP-MAS solid-state NMR studies to explore the composition of the framework structure. The carbon signal present at approximately 180 ppm was assigned to the keto group, while the peak at 100 ppm corresponded to the C=C bond adjacent to the keto group (section S7, ESI†).
The chemical structure of the COFs was characterized using XPS profiles (section S8, ESI†). For example, the TpPa COF showed three intense peaks at 284.62, 399.63, and 530.62 eV, which correspond to C (1s), N (1s), and O (1s) signals, respectively. Detailed analysis of the high-resolution XPS profile is shown in Fig. S25, ESI.† The high-resolution profile for C (1s) displayed three main peaks and one additional π–π* satellite peak. The peak at 284.13 eV corresponded to the C=C bond of the aromatic rings, whereas the shoulders at 285.36 and 287.01 eV were assigned to the C–O and C=O bonds, respectively, present in the framework backbone. The high-resolution profile for N (1s) showed a peak at 399.63 eV, which corresponded to the C–NH moiety of the ketoenamine bond of the framework. In the high-resolution profile of O (1s), the peak signals that appeared at 530.49 and 532.21 eV were assigned to the C=O and C–O bonds, respectively. For the detailed analysis of the XPS profiles, refer to section S8 in the ESI.† All the COFs exhibited good thermal stability up to approximately 350 °C (section S9, ESI†). All the COFs displayed a sheet texture with lateral dimensions of 1–5 μm (section S10, ESI†).
The permanent porosity of the COFs was evaluated by measuring the nitrogen gas uptake at 77 K (section S12, ESI†). The obtained BET surface area (SABET) of the COFs spanned across a wide range of 30 to 1674 m2 g−1 depending on the green solvent employed (Fig. 3). Among all the COFs reported in this work, TpAzo-GBL exhibited the highest surface area of 1674 m2 g−1, followed by 1046 (TpBD-GBL), 1036 (TpTab-PCl), 1033 (TpAnq-Cym), and 888 (TpPa-GBL). Note that most of the COFs synthesized here exhibited improved surface area values as compared to the ones reported in conventional organic solvents.2 The pore size distributions for the as-synthesized COFs are presented in section S13 (ESI)† and were found to be approximately 15 Å (TpPa), 18 Å (TpBD), 22 Å (TpAzo), 18 Å (TpAnq), and 14 Å (TpTab), which were calculated on the basis of the NLDFT model.
Fig. 1 shows the list of the green solvents used for the synthesis of the COFs. Solvents can be classified into seven classes: carbonates, esters, ethers, sulfites, alcohols, aromatic solvents, and aprotic solvents. A color-coding system was introduced in the GlaxoSmithKline and CHEM21 solvent selection guides,22–24 which were successfully used to describe the sustainable synthesis of UiO-66.25 We employed the same color-coding system in this work (section S14, ESI†). The column “overall green assessment”, which shows the color code for the green solvents utilized for the synthesis of the COFs, is based on the solvent greenness given in the solvent selection guides (section S14, ESI†). The color codes for the boiling point, viscosity, presence of a characteristic PXRD peak (corresponding to diffraction from the (100) planes), and SABET columns are defined according to the ranges given in Table S14, ESI.† The conventional solvents reported for the synthesis of COFs were also included as a reference for comparison.
The color codes for the last two columns define the rank by default and the ranking after discussion. The “rank by default” column indicates the composite color extracted from the combined evaluation of the solvent and COF properties. Owing to the prime importance of the crystallinity and surface area of the COFs in a wide range of applications, the final color code in the “rank by default” column is dominated by the porosity of the COFs. Finally, the color code in the “ranking after discussion” column indicates the compatibility of the employed solvent and has been interpreted after an overall evaluation of the solvent properties in the generation of crystalline and porous COFs. In general, the green code denotes efficient solvents with minor issues, the yellow code denotes solvents that can be used but were found to be less efficient, and the red code denotes solvents that are either not recommended (according to the solvent selection guides) or resulted in COFs with very low crystallinity and porosity.
To assess the suitability of green solvents in the preparation of high-quality COFs, we calculated the relative SABET, relative crystallinity, and relative yield for the COFs. As shown in Fig. 4a, the TpPa, TpBD, and TpAzo COFs synthesized in GBL displayed high BET surface area values. In contrast, in the case of the TpAnq and TpTab COFs, the Cym and PCl solvents were found to be efficient in delivering highly porous COFs. In terms of the crystallinity of the COFs, no clear trend emerged and the data points were scattered all over the plot (Fig. 4b). All the solvents afforded COFs with moderate to low relative crystallinity. This suggests difficulty in correlating the crystallinity of the as-synthesized COFs with the solvents used. A similar observation was made with the relative yield plot (Fig. 4c); the data points were randomly distributed across the plot, making it difficult to correlate the yield directly with the solvents used in this study. For example, PC resulted in high yields for TpBD and TpAzo; however, it afforded moderate to low yields of the other COFs. In other words, on the basis of relative crystallinity and yield, it is difficult to observe a strong correlation of these COF properties with the solvents employed.
Fig. 4 (a) Relative surface area, (b) relative crystallinity, and (c) relative yield of the TpPa, TpBD, TpAzo, TpAnq, and TpTab series of COFs synthesized in twelve green solvents. The value provided in parentheses along the x-axis denotes the maximum value of (a) BET surface area (m2 g−1), (b) crystallinity, and (c) yield (%) used in the calculations (section S4, ESI†).
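The relative quantities plotted in Fig. 4 can be sketched as a simple normalization. The sketch assumes that each relative value is the measured value divided by the maximum of its COF series, which is one reading of the caption; the exact definition is given in section S4 of the ESI. The function name and the two placeholder solvents are hypothetical, and only the 1674 m2 g−1 value for TpAzo-GBL is taken from the text.

```python
def relative_values(values):
    """Scale each value by the series maximum so the best entry equals 1.0."""
    maximum = max(values.values())
    return {key: value / maximum for key, value in values.items()}


# BET surface areas (m2 g-1) for one COF series. Only the GBL value is taken
# from the text; "solvent_A" and "solvent_B" are invented placeholders.
tpazo_sa_bet = {"GBL": 1674.0, "solvent_A": 837.0, "solvent_B": 30.0}

relative_sa_bet = relative_values(tpazo_sa_bet)
```

Normalizing within each series makes the solvents comparable across COFs whose absolute surface areas differ widely.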
To address this problem, for the very first time, we utilized an ML approach to deduce the structure–property relationship between the solvents and the resultant COFs. The surface area of the COFs is co-dependent on the type of solvent(s) used. Thus, classical ab initio DFT calculations would require overly complex methods to quantify the properties of COFs.26 To overcome the issues with solvent dependency, we used QSPR computational tools to predict the surface area and to verify whether the resultant COF can be synthesized in the crystalline form. We hypothesized that, by determining the structure of the solvent and the structure of the COF, a predictive relationship could be drawn while other parameters are kept constant. Using a dataset of only 60 points with high-capacity ML and deep learning methods remains a challenge because these methods generally require a large amount of data to obtain good predictive results. Using the QSPR approach, we developed a quantitative structure–property relationship to predict the key structural elements necessary to generate high surface area and crystalline COFs by analyzing the solvent–amine precursor pairs. Initially, a cross-correlation analysis of the obtained results was necessary to identify any relationships among the surface area, crystallinity, and yield. No direct correlation was observed for the highly scattered, randomly distributed points of the yield–surface area results (Fig. S54a, ESI†). Similarly, the crystallinity–yield (Fig. S54b, ESI†) and the crystallinity–surface area (Fig. S54c, ESI†) datasets did not reveal any correlation. The non-correlated data indicate that, for example, a COF obtained in a high yield does not necessarily have a high surface area. The absence of correlations across the results means that the surface area, crystallinity, and yield data need to be predicted separately, because none of them can be derived from the others.
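The pairwise cross-correlation check described above can be sketched as follows. The text does not name the correlation measure, so the Pearson coefficient is used here as an assumption; the function names are hypothetical and the numbers are invented for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_table(columns):
    """Compute the coefficient for every unordered pair of named columns."""
    names = list(columns)
    return {
        (a, b): pearson_r(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented example values; they are not data from the study.
results = {
    "surface_area": [888.0, 120.0, 1046.0, 300.0, 1674.0, 45.0],
    "crystallinity": [0.6, 0.7, 0.2, 0.9, 0.5, 0.4],
    "yield": [75.0, 80.0, 60.0, 55.0, 70.0, 90.0],
}

table = correlation_table(results)
```

Coefficients close to zero for all three pairs would support treating the surface area, crystallinity, and yield as separate prediction targets.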
With only 43 measured surface area data points and 2639 calculated descriptors (predictor features), the original dataset was high-dimensional and prone to dimensionality issues, making the application of classical prediction methods challenging.27 To overcome the issues related to high-dimensional datasets, PLS regression and PCA were applied to the dataset. PLS regression and PCA are useful when the number of predictor features is high and the features are possibly cross-correlated. Using a PLS model, the response features were predicted from a large set of predictor features by reducing the latter to a smaller set of uncorrelated components (projection to latent structures). In the model-building phase, the original dataset contained a matrix of 3672 molecular descriptors of the used solvents and amine precursors as the X matrix, and the surface area and the binary results of the corresponding COF as the Y vector. The first two PLS components were plotted against each other, and the outliers were removed based on a 95% confidence ellipse. The resultant matrix of (39 × 2631) was split and standardized.
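The outlier removal and standardization steps described above can be illustrated with a short sketch. It assumes that the 95% confidence ellipse is drawn from the sample covariance of the two score vectors with a chi-square limit for two degrees of freedom; the software used in the study may instead apply a Hotelling T2 limit, which gives a slightly different boundary. The function names and score values are hypothetical.

```python
import math


def standardize(column):
    """Center a column to zero mean and scale it to unit sample variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / std for v in column]


def inside_confidence_ellipse(t1, t2, level=0.95):
    """Flag each (t1, t2) score pair that lies inside the confidence ellipse."""
    n = len(t1)
    m1 = sum(t1) / n
    m2 = sum(t2) / n
    s11 = sum((a - m1) ** 2 for a in t1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in t2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(t1, t2)) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # Chi-square quantile for 2 degrees of freedom has a closed form.
    limit = -2.0 * math.log(1.0 - level)
    flags = []
    for a, b in zip(t1, t2):
        da, db = a - m1, b - m2
        # Squared Mahalanobis distance using the inverse 2x2 covariance.
        d2 = (s22 * da * da - 2.0 * s12 * da * db + s11 * db * db) / det
        flags.append(d2 <= limit)
    return flags


# Hypothetical scores on the first two components, plus one distant point.
base = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)] * 5
points = base + [(10.0, 10.0)]
scores_1 = [p[0] for p in points]
scores_2 = [p[1] for p in points]

keep = inside_confidence_ellipse(scores_1, scores_2)
kept_points = [p for p, flag in zip(points, keep) if flag]
```

Samples flagged as outside the ellipse would be dropped before the remaining rows are split and each descriptor column is standardized.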
The optimal number of PLS components was found to be 3 with seven-fold cross-validation and a blind thickness of 1 based on the average minimum of the RMSECV values. The RMSEC and RMSECV values were found to be 119 and 174 from the Y-scrambling test, respectively. In contrast, RMSEP was 199 based on the Y-scrambling test. The insignificant difference between the cross-validation and the test R2 score values indicates no overfitting. The prediction error agrees well with the measured general error of the surface area of the microporous materials.28 Fig. 5 shows general model training and test data with the corresponding trend line. The error of the surface area was found to increase with an increase in the surface area. In general, the model shows a strong correlation between the predicted and measured surface area. Based on the VIP scoring, 196 descriptors were selected (refer to VIP scoring, section S15, ESI†) from descriptors with the highest VIP scoring related to the amine precursors’ and the solvents’ electronic structures. Of the best 196 descriptors, 90 were ligand descriptors (45%), which means that the BET surface area is dependent on the structure of both the solvent and the ligand. Interestingly, out of the top 50 descriptors, only 12 belonged to the ligands (24%), and the first ligand descriptor ranked only 17th in the PLS prediction list sorted by absolute value. The highest scoring descriptors belonged to the hybridization factor, spatial autocorrelation values (Moran's index), electrotopological state indexes and the logP of the solvent. The highest scoring ligand descriptor was also a spatial autocorrelation index (electronegativity weighted Geary index). Fig. S1† shows the VIP scoring in decreasing absolute order. There was no single outstanding descriptor; instead, several descriptors had mid-range VIP scores, emphasizing the complexity of surface area prediction.
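The error metrics and the cross-validation layout mentioned above can be sketched as follows. The sketch reads "seven-fold cross-validation and a blind thickness of 1" as a venetian-blinds scheme, which is an interpretation rather than a statement from the text. A mean-value predictor stands in for the PLS model purely to keep the example self-contained, and the surface area values are synthetic.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def venetian_blind_folds(n_samples, n_splits=7, thickness=1):
    """Assign consecutive blocks of `thickness` samples to folds in rotation."""
    folds = [[] for _ in range(n_splits)]
    for i in range(n_samples):
        folds[(i // thickness) % n_splits].append(i)
    return folds


def cross_validated_rmse(y, folds):
    """RMSECV with a mean-value predictor as a stand-in for the PLS model."""
    squared_errors = []
    for test in folds:
        held_out = set(test)
        train_y = [v for i, v in enumerate(y) if i not in held_out]
        prediction = sum(train_y) / len(train_y)
        squared_errors.extend((y[i] - prediction) ** 2 for i in test)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic surface areas (m2 g-1) for 39 samples; not data from the study.
surface_areas = [100.0 + 40.0 * i for i in range(39)]

folds = venetian_blind_folds(len(surface_areas))
rmsecv = cross_validated_rmse(surface_areas, folds)
```

With a real model, the same fold layout would be repeated for each candidate number of components, and the count giving the lowest RMSECV would be retained.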
For the captured variance values and model parameter diagram, refer to section S15 (ESI).† The crystallinity of the COFs depends on the PXRD measurement parameters, while yield results generally have a high error. Thus, the yield and crystallinity results were combined and simplified for use in the prediction. The binary classification problem was created by combining the yield and crystallinity results into simple crystalline COF/amorphous COF data. The original dataset contained a matrix of (60 × 2631) molecular descriptors of the used solvents and amine precursors as the X matrix and the binary values of crystalline/amorphous COFs as the Y vector. The results of the binary classification ML algorithms and classical statistical methods are shown in Fig. 5. The performance of the naïve Bayes and QDA algorithms was better than that of the SVM, decision tree, random forest, artificial neural network, and boosting algorithms. This difference can be attributed to the insufficient data, in which case the ML algorithms tend to underperform the classical statistical methods. Both the naïve Bayes and QDA reached an accuracy score of 0.87. For details of each algorithm, refer to section S15, ESI.†
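To illustrate why a simple statistical classifier can work on a small dataset, the following is a minimal Gaussian naïve Bayes sketch. It is not the implementation used in the study: the function names are hypothetical, the single feature stands in for a principal component, and the values are invented. The model needs only a prior, a mean, and a variance per class and feature, so few parameters have to be estimated.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate the prior and per-feature mean and variance for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2.0 * math.pi * var) - (v - m) ** 2 / (2.0 * var)
            for v, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


# Invented one-feature example: 1 = crystalline, 0 = amorphous or no product.
X_train = [[0.0], [0.2], [-0.2], [3.0], [3.2], [2.8]]
y_train = [0, 0, 0, 1, 1, 1]

nb_model = fit_gaussian_nb(X_train, y_train)
predictions = [predict_gaussian_nb(nb_model, x) for x in X_train]
train_accuracy = sum(p == t for p, t in zip(predictions, y_train)) / len(y_train)
```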
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d1gc02796d
This journal is © The Royal Society of Chemistry 2021