Mikhail Suyetin
Institute of Nanotechnology, Karlsruhe Institute of Technology, P. O. Box 3640, 76021 Karlsruhe, Germany. E-mail: msuyetin@gmail.com
First published on 9th April 2021
Multiple linear regression analysis, as a part of machine learning, is employed to develop equations for the quick and accurate prediction of the methane uptake and working capacity of metal–organic frameworks (MOFs). Only three crystal characteristics of MOFs (geometric descriptors) are employed for developing the equations: surface area, pore volume and density of the crystal structure. The values of the geometric descriptors can be obtained much more cheaply in terms of time and other resources compared to running calculations of gas sorption or performing experimental work. Within this work sets of equations are provided for the different cases studied: a series of MOFs with NbO topology, a set of benchmark MOFs with outstanding methane storage and working capacities, and the whole CoRE MOF database (11000 structures).
Computational approaches have played a great role in studying MOFs and other porous materials for gas storage: grand canonical Monte Carlo (GCMC) methods for revealing adsorption isotherms, classical molecular dynamics (MD) for studying gas diffusion in MOFs, density functional theory for obtaining favourable binding sites and changes in crystal structure, etc. Materials simulations have been predicting novel, not yet synthesized MOFs since 2004,26 when the design of new IRMOF materials for methane storage was proposed using computer simulations. The large-scale screening of hypothetical MOFs has been performed computationally by creating porous structures from chemical databases of building blocks, which are based on known MOFs.27 More than 300 MOFs with outstanding methane storage capacities have been identified, and structure–property relationships were also revealed. A total of 122835 MOFs have been designed computationally to reveal both the total methane uptake and working capacity as a function of the void fraction, volumetric surface area and heat of adsorption, identifying a maximal working capacity at room temperature.28
A lot of effort has been put into creating databases of structures for computational screening that are free from solvents, disorder, etc. The Computation-Ready, Experimental Metal–Organic Framework Database (CoRE MOF Database) contains over 14000 porous, three-dimensional MOFs.29 Conveniently for potential users, pore analytics and physical property data are included as well.29 Another database is a subset of the Cambridge Structural Database (CSD), where 69666 MOF materials were identified by the Cambridge Crystallographic Data Centre (CCDC).30 A total of 13512 MOFs with 41 different edge-transitive topologies were generated using the ToBaCCo code, which employed a reverse topological approach.31 The structure files of experimentally obtained, published MOFs can contain disorder, solvent molecules, etc., which make them unsuitable for computational work.
Recently, machine learning (ML) has become an important tool in designing new materials, leading materials chemistry towards more rational design.32 The World Economic Forum identified the union of big data and artificial intelligence as the Fourth Industrial Revolution, which can dramatically improve the research process.33 The high-throughput screening (HTC) of databases employing MD or GCMC is computationally very demanding. On the one hand, ML methods can significantly decrease the complexity of computational screening while still providing results with high accuracy; on the other hand, the existing results of the HTC of databases are an outstanding opportunity for employing ML methods to reveal desired properties. ML has been employed to perform an analysis of the chemical diversity of MOFs.34 ML methods are well developed nowadays, but in the case of MOFs there is still a lot of work that needs to be done. ML is a very promising approach for discovering new MOFs and revealing structure–property relationships. Predictive algorithms are employed to assist and sometimes replace simulations. There are some nice examples of employing ML for discovering the properties of porous materials: an artificial neural network has been used to identify the performance limits of methane storage in zeolites, and revealed good agreement in the methane working capacity of the top structures between the zeolites and the structures generated by the neural network.35 A MOF generation approach based on ML was developed in ref. 36 by devising and constructing the supramolecular variational autoencoder (SmVAE). SmVAE is employed for "inverse design", where MOFs with the best performance are identified and generated. A generative adversarial artificial neural network has been created to produce 121 crystalline porous materials, employing inputs in the form of energy and material dimensions.37 Finally, there has been a nice overview of ML algorithms for the chemical sciences.38
Experimental and/or computational work needs to be done to reveal the desired property of a structure. In the case of identifying the sorption properties of experimentally obtained MOFs, computer simulations need to be run or specific equipment has to be employed. This is acceptable for studying several crystals, but for revealing the properties of a big family of structures conventional approaches are too costly in terms of time, money, workforce, etc. ML can help considerably in revealing the properties of MOFs, saving both experimental and computational effort. In spite of the wide employment of ML techniques, they have been used very rarely for developing equations describing the sorption properties (including the working capacity) of MOFs.
Linear regression is a supervised ML algorithm. Simple linear regression employs the slope–intercept form f(x) = kx + b, where x is the input data (independent variable), f(x) is the prediction (dependent variable), k is the slope coefficient for the x variable and b is the y intercept; k and b are adjusted via learning to give an accurate prediction. Multiple linear regression is the most popular form of linear regression. It is employed to show the relationship between one dependent variable and two or more independent variables: f(x, y, z) = ix + jy + kz + b, where x, y and z are the independent variables, f(x, y, z) is the dependent variable and i, j, k and b are the adjustable parameters.
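The fitting step described above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes ordinary least squares solved through the normal equations; the fitting procedure actually used in this work is not specified in the text. The function names and the noiseless toy data are hypothetical and are not taken from the paper.

```python
def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: variables are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_linear_regression(rows, targets):
    """Return [b, i, j, k, ...] for f = b + i*x + j*y + k*z + ...

    Assumption: ordinary least squares via the normal equations
    (X^T X) w = X^T t, where X has a leading column of ones.
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    xtx = [[sum(d[u] * d[v] for d in design) for v in range(p)] for u in range(p)]
    xty = [sum(d[u] * t for d, t in zip(design, targets)) for u in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical noiseless toy data generated from f = 1 + 2x + 3y - 4z.
rows = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 1, 3)]
targets = [1 + 2 * x + 3 * y - 4 * z for x, y, z in rows]
b, i, j, k = fit_multiple_linear_regression(rows, targets)
```

Because the toy data are noiseless, the fit recovers the generating parameters (b = 1, i = 2, j = 3, k = −4) up to rounding. With real descriptors of very different magnitudes, a library routine based on a QR or SVD factorization would be numerically safer than the normal equations.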
The main goal of this work is to show that multiple linear regression analysis is an outstanding tool for revealing the structure–property relationships of MOFs. More importantly, multiple linear regression analysis allows analytical equations to be developed, showing that the methane total uptake and working capacity at different thermodynamic conditions can be calculated from three variables based on the crystal characteristics of MOFs (geometric descriptors): surface area, pore volume and density. The values of the descriptors can be obtained routinely and very quickly, in comparison to GCMC simulations or experimental work, with well-known and highly efficient simulation packages such as Poreblazer,39,40 Zeo++,41 etc.
Therefore, an experimentalist or theoretician who has a file with a crystal structure, or a set of such files, can easily obtain the methane total uptake and working capacity by simply employing the equations. The performance of the model can be measured by several characteristics: the mean absolute error (MAE), mean square error (MSE), root-mean-square error (RMSE) and the coefficient of determination, R2, as described below, where xi is obtained from experiments or GCMC simulations, yi is the value predicted by multiple linear regression and ȳ is the average of the predicted values.
It should be noted that a higher value of R2 and lower values of MAE, MSE and RMSE indicate better accuracy of the ML model used. R2 lies in the range between 0 and 1, where 1 means that the prediction from the set of geometrical descriptors is made without any error and 0 means that the prediction cannot be made from any of the geometrical descriptors.
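The four characteristics can be computed as in the sketch below. It assumes the textbook definitions, with R2 = 1 − SS_res/SS_tot taken about the mean of the observed values; the exact formulas used in this work are not reproduced in the text, so this form of R2 in particular is an assumption. The function name and the three data points are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, MSE, RMSE and R2 for paired observed/predicted values.

    Assumption: standard definitions; R2 uses the mean of the observed values.
    """
    n = len(observed)
    residuals = [x - y for x, y in zip(observed, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mean_observed = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((x - mean_observed) ** 2 for x in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical values, for illustration only (not data from this work).
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]
metrics = regression_metrics(observed, predicted)
```

For these made-up numbers every prediction is off by 10, so MAE = RMSE = 10, MSE = 100 and R2 = 1 − 300/20000 = 0.985.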
The following parameters (descriptors) of the crystal structures of MOFs of different sizes are used to develop the equations for predicting the gravimetric total uptake and working capacity of methane sorption at a temperature of 298 K: surface area (Sa), density of the crystal (Dc) and pore volume (Pv). The crystal structure parameters, as well as the values of the total uptake (at a pressure of 65 bar) and working capacity (over the pressure range of 65–5 bar) at a temperature of 298 K, are summarized in Table 1.
MOFs | Sa, m2 g−1 | Pv, cm3 g−1 | Dc, g cm−3 | Total uptake, cm3 g−1 | Working capacity, cm3 g−1 | Ref. |
---|---|---|---|---|---|---|
ZJU-5a | 2829 | 1.08 | 0.679 | 367 | 277 | 46 |
UTSA-75a | 2836 | 1.06 | 0.698 | 360 | 275 | 46 |
UTSA-76a | 2820 | 1.09 | 0.699 | 368 | 282 | 46 |
UTSA-77a | 2807 | 1.08 | 0.690 | 361 | 272 | 46 |
UTSA-78a | 2840 | 1.09 | 0.694 | 363 | 275 | 46 |
UTSA-79a | 2877 | 1.08 | 0.697 | 366 | 277 | 46 |
NOTT-101a | 2805 | 1.08 | 0.688 | 344 | 263 | 46 |
UTSA-111a | 3252 | 1.229 | 0.590 | 397 | 309 | 25 |
UTSA-20a | 1620 | 0.66 | 0.909 | 278 | 206 | 47 |
UTSA-88a | 1771 | 0.685 | 0.860 | 288 | 215 | 48 |
UTSA-80a | 2280 | 1.03 | 0.694 | 336 | 251 | 49 |
PCN-14 | 2000 | 0.85 | 0.829 | 334 | 228 | 47 |
NOTT-100a | 1661 | 0.677 | 0.927 | 248 | 150 | 50 |
NOTT-102a | 3342 | 1.268 | 0.587 | 404 | 327 | 50 |
NOTT-103a | 2958 | 1.157 | 0.643 | 367 | 285 | 50 |
NOTT-109a | 2110 | 0.850 | 0.790 | 306 | 215 | 50 |
NJU-Bai 41 | 2370 | 0.92 | 0.741 | 331 | 232 | 4 |
NJU-Bai 42 | 2830 | 1.07 | 0.693 | 356 | 278 | 4 |
NJU-Bai 43 | 3090 | 1.22 | 0.639 | 397 | 310 | 4 |
The equations developed are shown below:
Total_uptake = 181.726 + 0.02 × Sa + 138.777 × Pv − 38.763 × Dc
Working_capacity = 266.409 + 0.038 × Sa + 23.013 × Pv − 176.169 × Dc
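As a worked example, the two equations for the NbO series can be evaluated for a single entry of Table 1. The sketch below uses the coefficients exactly as printed and the NOTT-102a descriptors from the table; the function names are hypothetical.

```python
def nbo_total_uptake(sa, pv, dc):
    """Total uptake (cm3 g-1) at 65 bar and 298 K, NbO-series equation."""
    return 181.726 + 0.02 * sa + 138.777 * pv - 38.763 * dc


def nbo_working_capacity(sa, pv, dc):
    """Working capacity (cm3 g-1) for 65-5 bar at 298 K, NbO-series equation."""
    return 266.409 + 0.038 * sa + 23.013 * pv - 176.169 * dc


# NOTT-102a descriptors from Table 1: Sa in m2 g-1, Pv in cm3 g-1, Dc in g cm-3.
sa, pv, dc = 3342, 1.268, 0.587
total = nbo_total_uptake(sa, pv, dc)
working = nbo_working_capacity(sa, pv, dc)
```

This gives about 402 cm3 g−1 for the total uptake and about 319 cm3 g−1 for the working capacity, against the tabulated values of 404 and 327 cm3 g−1.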
The results obtained with multiple linear regression analysis show that for the family of MOFs with the same topology (NbO) R2 = 0.931 for the total uptake and R2 = 0.913 for the working capacity. Delightfully, the MAE is very small: 7.59 cm3 g−1 and 9.33 cm3 g−1 for the total uptake and working capacity, respectively. The RMSE shows moderate values: 10.56 cm3 g−1 and 12.31 cm3 g−1 for the total uptake and working capacity, respectively.
The main conclusion from this part is that, using only the crystal structure parameters of a series of MOFs with NbO topology, anyone can calculate the methane working capacity and total uptake from the equations developed, very easily, extremely fast and with high precision. This is extremely useful for discovering new structures and screening MOFs with the same topology. For example, a user can draw and optimize a MOF in Materials Studio (or another program), obtain the values of the geometrical descriptors using the simulation packages Poreblazer or Zeo++, or tools in Materials Studio, and then use the equations to get accurate values of the methane total uptake and working capacity. Of course, the same approach can be extended to other MOFs with other topologies. Once the equations are developed, there is no need to run simulations and/or perform experimental work.
MOFs | Sa, m2 g−1 | Pv, cm3 g−1 | Dc, g cm−3 | Total uptake at 240 K, cm3 g−1 | Total uptake at 270 K, cm3 g−1 | Total uptake at 298 K, cm3 g−1 | Working capacity at 240 K, cm3 g−1 | Working capacity at 270 K, cm3 g−1 | Working capacity at 298 K, cm3 g−1 | Ref. |
---|---|---|---|---|---|---|---|---|---|---|
NiMOF-74 | 1350 | 0.56 | 1.195 | 251 | 232 | 210 | 67 | 89 | 108 | 47 |
UTSA-20 | 1620 | 0.66 | 0.909 | 319 | 297 | 253 | 129 | 182 | 187 | 47 |
MOF-505 | 1661 | 0.68 | 0.927 | 300 | 270 | 248 | 97 | 121 | 150 | 50 |
PCN-14 | 2000 | 0.85 | 0.829 | 362 | 326 | 277 | 145 | 185 | 189 | 47 |
HKUST-1 | 1850 | 0.78 | 0.883 | 369 | 341 | 302 | 120 | 190 | 215 | 47 |
NOTT-109 | 2110 | 0.85 | 0.790 | 362 | 341 | 306 | 138 | 199 | 215 | 50 |
NU-135 | 2530 | 1.02 | 0.751 | 387 | 357 | 306 | 188 | 232 | 226 | 54 |
UTSA-80 | 2280 | 1.03 | 0.694 | 442 | 390 | 336 | 207 | 258 | 251 | 49 |
NOTT-101 | 2805 | 1.08 | 0.684 | 472 | 414 | 346 | 231 | 276 | 265 | 50 |
UTSA-76 | 2820 | 1.09 | 0.699 | 491 | 431 | 368 | 240 | 293 | 282 | 55 |
NOTT-103 | 2958 | 1.16 | 0.643 | 496 | 440 | 367 | 252 | 306 | 285 | 50 |
NOTT-102 | 3342 | 1.27 | 0.587 | 532 | 477 | 404 | 310 | 354 | 327 | 50 |
NOTT-122a/NU-125 | 3120 | 1.29 | 0.578 | 519 | 469 | 401 | 296 | 334 | 317 | 56 and 57 |
NU-800 | 3149 | 1.34 | 0.546 | 559 | 452 | 359 | 410 | 370 | 310 | 58 |
ZJU-36 | 4014 | 1.60 | 0.496 | 617 | 514 | 409 | 458 | 425 | 353 | 59 |
NU-140 | 4300 | 1.97 | 0.43 | 728 | 591 | 465 | 551 | 484 | 395 | 60 |
NU-111 | 4930 | 2.09 | 0.409 | 856 | 694 | 504 | 653 | 584 | 438 | 47 |
Al-soc-MOF-1 | 5585 | 2.30 | 0.34 | 882 | 712 | 579 | 697 | 632 | 518 | 24 |
In contrast to the previous case, several equations are developed for the set of benchmark MOFs, which have different topologies, metals in nodes, etc. The values of R2, MAE and RMSE are therefore expected to be more moderate. The equations are developed for the quick and accurate prediction of the methane uptake and working capacity employing only three crystal characteristics of the MOFs (descriptors): surface area, pore volume and density. The equations are shown below:
At 298 K
Total_uptake = 233.476 + 0.062 × Sa + 2.595 × Pv − 87.024 × Dc
Working_capacity = 189.033 + 0.061 × Sa + 4.318 × Pv − 135.846 × Dc
At 270 K
Total_uptake = 200.152 + 0.057 × Sa + 99.547 × Pv − 78.894 × Dc
Working_capacity = 53.433 + 0.059 × Sa + 121.325 × Pv − 94.025 × Dc
At 240 K
Total_uptake = 106.071 + 0.040 × Sa + 249.422 × Pv − 35.399 × Dc
Working_capacity = −224.435 + 0.007 × Sa + 379.715 × Pv + 50.945 × Dc
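The six benchmark equations can be collected in one lookup table keyed by temperature, which makes it straightforward to evaluate a structure at all three temperatures. The sketch below copies the coefficients as printed above and uses the NOTT-102 descriptors from the benchmark table; the dictionary layout and the function name are hypothetical.

```python
# Coefficients (intercept, Sa, Pv, Dc) copied from the equations above.
BENCHMARK_COEFFICIENTS = {
    298: {"total_uptake": (233.476, 0.062, 2.595, -87.024),
          "working_capacity": (189.033, 0.061, 4.318, -135.846)},
    270: {"total_uptake": (200.152, 0.057, 99.547, -78.894),
          "working_capacity": (53.433, 0.059, 121.325, -94.025)},
    240: {"total_uptake": (106.071, 0.040, 249.422, -35.399),
          "working_capacity": (-224.435, 0.007, 379.715, 50.945)},
}


def predict_benchmark(temperature_k, quantity, sa, pv, dc):
    """Evaluate one benchmark equation; the result is in cm3 g-1."""
    b, c_sa, c_pv, c_dc = BENCHMARK_COEFFICIENTS[temperature_k][quantity]
    return b + c_sa * sa + c_pv * pv + c_dc * dc


# NOTT-102 descriptors from the benchmark table.
sa, pv, dc = 3342, 1.27, 0.587
predictions = {
    (t, q): predict_benchmark(t, q, sa, pv, dc)
    for t in BENCHMARK_COEFFICIENTS
    for q in ("total_uptake", "working_capacity")
}
```

For NOTT-102 the predicted total uptake is about 393, 471 and 536 cm3 g−1 at 298, 270 and 240 K, against tabulated values of 404, 477 and 532 cm3 g−1. The predicted working capacity is about 319, 350 and 311 cm3 g−1, against 327, 354 and 310 cm3 g−1.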
The equations developed in this section make it possible to estimate the methane total uptake and working capacity of newly designed MOFs. The values of R2, MAE and RMSE show the robustness of the models obtained. The coefficients of determination for the working capacity are: R2 = 0.979 at T = 298 K, R2 = 0.987 at T = 270 K and R2 = 0.990 at T = 240 K. The coefficients of determination for the total uptake are: R2 = 0.965 at T = 298 K, R2 = 0.980 at T = 270 K and R2 = 0.984 at T = 240 K. An interesting trend is observed: the lower the temperature, the higher the R2.
Working_capacity (35–5.8 bar) = 39.989 + 0.026 × Sa + 12.789 × Pv − 14.862 × Dc
A much smaller set of MOFs can be studied for the comparison of the models’ performances. The following equation is developed by employing 500 randomly chosen MOFs for training from the CoRE MOF database:
Working_capacity_500 MOFs (35–5.8 bar) = 51.567 + 0.022 × Sa + 13.989 × Pv − 20.526 × Dc
Training with 500 MOFs gives an R2 that is almost the same as that obtained when employing 11000 MOFs. The MAE and RMSE are slightly larger.
Also, the equation developed via training with 500 MOFs can be applied to the rest of the CoRE MOF database, treating it as a test set. The following values of the model’s characteristics are obtained:
R2 = 0.895; MAE = 9.69 cm3 g−1; MSE = 165.29; RMSE = 12.86 cm3 g−1.
These results show that the model’s characteristics are very close to those obtained via the employment of 11000 MOFs: the R2 is a bit smaller, while the MAE, MSE and RMSE are a little bit bigger.
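As a quick consistency check on the test-set values, and assuming the usual relation between the two error measures, the RMSE should equal the square root of the MSE:

```latex
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{165.29} \approx 12.86\ \mathrm{cm^{3}\,g^{-1}}
```

This agrees with the reported RMSE of 12.86 cm3 g−1.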
The accuracy of the prediction might be further enhanced by including some more geometrical descriptors in the equation:61 void fraction (Vf), LCD (largest cavity diameter) and PLD (pore limiting diameter):
Working_capacity (35–5.8 bar) = 36.191 + 0.023 × Sa − 3.405 × Pv − 14.153 × Dc + 35.154 × Vf + 0.689 × LCD − 0.695 × PLD
Interestingly, no increase in the accuracy of the model is observed by employing the equation with six descriptors: the R2 values are the same and the MAE, MSE and RMSE are almost the same. In the case of the equations with three descriptors, these characteristics are even a little bit better.
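The three- and six-descriptor equations for the CoRE MOF database can be evaluated side by side, as in the sketch below. The coefficients are copied as printed. The descriptor values are hypothetical placeholders and do not describe any real structure, and the descriptor units are assumed to match those of the database entries used for fitting, which are not stated here. The function names are also hypothetical.

```python
def working_capacity_three(sa, pv, dc):
    """CoRE MOF working capacity (35-5.8 bar), three-descriptor equation."""
    return 39.989 + 0.026 * sa + 12.789 * pv - 14.862 * dc


def working_capacity_six(sa, pv, dc, vf, lcd, pld):
    """CoRE MOF working capacity (35-5.8 bar), six-descriptor equation."""
    return (36.191 + 0.023 * sa - 3.405 * pv - 14.153 * dc
            + 35.154 * vf + 0.689 * lcd - 0.695 * pld)


# Hypothetical descriptor values, for illustration only (not a real MOF).
sa, pv, dc = 2000.0, 1.0, 0.8
vf, lcd, pld = 0.7, 12.0, 8.0

wc_three = working_capacity_three(sa, pv, dc)
wc_six = working_capacity_six(sa, pv, dc, vf, lcd, pld)
difference = wc_six - wc_three
```

For these placeholder inputs the two equations differ by about 2, which is small compared with the MAE values reported above. A single made-up point is only an illustration; the accuracy comparison itself rests on the R2, MAE, MSE and RMSE values computed over the database.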
A set of equations is developed for predicting the methane total uptake and working capacity for MOFs with the same topology (NbO, in the case studied). The model exhibits very high accuracy. Several equations are developed for the set of benchmark MOFs, which have different topologies, metals in nodes, etc. The values of R2, MAE and RMSE show the robustness of the models obtained, for example for the working capacity: R2 = 0.979 at 298 K, R2 = 0.987 at 270 K and R2 = 0.990 at 240 K. The GCMC results from the CoRE MOF database are used to develop equations for predicting the methane working capacity that take into account only three parameters. The resulting model gives R2 = 0.899, MAE = 9.23 cm3 g−1 and RMSE = 12.60 cm3 g−1. Further extension of the model with more descriptors does not lead to an increase in its accuracy. This is very convenient for both experimentalists and theoreticians, who can easily obtain the methane total uptake and working capacity from the equations, needing only a crystal structure file (or files) and the values of the three descriptors.
This journal is © The Royal Society of Chemistry 2021