Shivanshu Shekhar a and Chandra Chowdhury *b
aDepartment of Electrical Engineering, Indian Institute of Technology Madras, Chennai 600036, India
bInstitute of Catalysis Research and Technology (IKFT), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany. E-mail: pc.chandra12@gmail.com
First published on 13th December 2023
Metal–organic frameworks (MOFs) can serve as gas capture, sensing, and storage systems. It is common practice to screen a vast database and select the MOF with the best adsorption properties before performing an adsorption calculation. The cost of computing thermodynamic quantities is often a limiting factor in high-throughput computational research, inhibiting the development of MOFs for separation and storage applications. In recent years, machine learning has emerged as a promising alternative to traditional approaches such as experiments and simulations for predicting material properties. The most difficult part of this process is choosing features that produce interpretable representations of materials and that transfer across a variety of prediction tasks. We investigate a feature-based representation of materials using tools from topological data analysis; in particular, we use persistent homology to describe the geometry of MOFs with greater accuracy. We demonstrate our method by predicting the hydrogen storage capacity of MOFs under a temperature and pressure swing from 100 bar/77 K to 5 bar/160 K, using 4029 MOFs from the CoRE MOF-2019 database of synthesised structures. Our topological descriptor is used in conjunction with more conventional structural features, and their usefulness for prediction tasks is explored. In addition to demonstrating a significant improvement over the baseline, our findings highlight that topological features capture information complementary to the structural features.
Since metal–organic frameworks (MOFs) are synthesised modularly from metal centres and organic ligands, they form a novel class of functional porous crystalline solids with a wide range of controllable properties and a wide variety of chemical and structural forms.11,12 Theoretically, MOF materials can be designed in an almost infinite number of ways, with the modification of metal ions/clusters and organic ligands enabling the tailoring of their porosity and pore chemistry for an extensive range of conceivable uses.13,14 MOFs have shown promise for a variety of adsorbent applications.15,16 The BET surface area and porosity of MOFs are relatively large compared with other materials that can store hydrogen molecules, such as carbon nanotubes (CNTs), hydrides, zeolites, and clathrates. Opportunities in gas separation, catalysis, energy storage, and conversion have recently piqued interest in MOF-based materials. Molecular simulations have proved very useful for predicting the adsorption and diffusion behaviour of guest species in nanoporous materials such as MOFs.17,18 They have made it possible to calculate quantities such as Henry's coefficients, adsorption loadings, and diffusion coefficients under different conditions. However, this strategy is often restricted to tens of thousands of structures because of the high computational cost of molecular simulations.
The field of machine learning (ML) offers an exciting avenue for solving this problem. Screening huge MOF databases and predicting their properties with ML algorithms is faster than using molecular simulations. The creation of an appropriate descriptor is crucial to the success of ML.19,20 Descriptors need to categorise nanoporous materials readily on the basis of structural characteristics, as well as capture their performance properties, such as their adsorption properties for gas storage and separation. How to systematically characterise the similarity of pore architectures is a critical question in constructing a descriptor for nanoporous materials. Nanoporous materials are commonly described by their pore volume, density, surface area, maximum included sphere, etc.21,22 Unfortunately, despite being easily calculable and correlating with a material's performance, these descriptors are still not sufficient to identify the best materials. Because each such coefficient captures only a portion of the pore's information, and there is no established rule for how to combine them into a descriptor, a collection of these coefficients may not encode enough information to characterise the pore structure.
A novel descriptor for nanoporous materials has been presented recently that uses topological principles to assess the degree of similarity between pore patterns.23,24 Describing the full pore architecture of a material requires high-dimensional data sets that are beyond the capabilities of most traditional data-mining methods. Thus, topological data analysis (TDA) was used to examine the multi-dimensional pore-structure data.
TDA is applicable to high-dimensional and noisy datasets because it analyses the data's “shape,” or overall structure, rather than its individual features. Missing information can affect TDA's output to some extent, but the method remains useful for discriminating between data sets of different shapes. Topology is the subfield of mathematics that studies properties of shapes preserved under continuous deformation. TDA analyses the “shape” of large, high-dimensional data to find meaningful structure and valuable subgroups within the data. TDA has been implemented effectively in a number of medical applications, such as the identification of a previously unknown sub-type of breast cancer from patient gene-expression data.25 In addition, TDA's scope has recently expanded to identification and characterisation tasks in materials research.26,27 Very recently, several studies have shown effective applications of TDA to the adsorption properties of nanoporous materials.28–31 It has been shown that combining topological features with structural features enhances predictive power for some adsorption applications in nanoporous materials.
Inspired by the above studies, we have designed a deep learning (DL) framework for predicting the H2 storage capacity of MOFs using topological as well as conventional features. With the advent of DL algorithms, ML is expected to help propel the materials revolution to a paradigm of full autonomy within the next 5–10 years.32,33 One groundbreaking example is the ability to recreate a phase transition with only a few layers of convolutional neural networks (CNNs).34 DL algorithms were first designed for image recognition; computer vision algorithms, for example, can swiftly and accurately analyse large numbers of images and extract relevant data.35 Since TDA produces visual data, using deep learning-based computer vision algorithms to examine its outputs is a natural fit. This line of thinking led us to implement a CNN-based architecture called the residual network (ResNet), which was initially developed by a team of Microsoft researchers and has since shown significant gains in a number of different settings.36 In particular, inspired by several successful implementations of the ResNet model in image detection, we conceived of employing ResNet in our TDA investigation to extract crucial features from persistence images.37–40 To the best of our knowledge, DL models of this sophistication have never been used in such scenarios before. In our model, we first predict the hydrogen deliverable capacity using only conventional feature vectors; we then incorporate topological features alongside the conventional ones and show a marked improvement in predicting the target property. Our model shows reasonable performance compared with other models in the literature, although it should be emphasised that the datasets employed by the various studies are distinct.41,42 The manuscript is organised as follows: the next section describes the computational setup and methodology, including the dataset, the construction of the topological feature vectors, and the ResNet model used here; the following section presents the results and discussion; and the final section draws conclusions from this work.
A persistence diagram (PD) is a set of points in two dimensions, where each point records the birth and death parameters of a homology class in a given dimension. Dimension-0, -1, and -2 persistence diagrams of the distance function provide quantitative measures of a system's connectedness, gaps, and empty spaces, respectively. The persistence diagram is not, however, immediately usable in ML systems. The fundamental issue is that machine learning algorithms require fixed-dimensional vectors as input, whereas converting a PD to a vector from the coordinates of its points would give a vector whose dimension is double the number of points. Additionally, the PDs of different atomic configurations contain different numbers of points, preventing them from being treated as vectors of a common dimension. Fig. 1 shows a schematic of the construction of a PD. Here, the point set is represented by two hexagons joined at a single vertex, with edge lengths of one unit, and a ball of increasing radius is grown around each point. Initially every point is a separate connected component; at a radius of 1 unit the set merges into a single connected component, so one point appears in the 0D PD. Similarly, at a radius of around 1.75 units the arrangement becomes fully circularly connected, and a corresponding point appears in the 1D PD. The literature is replete with suggestions for how to express the PD in a form that can be used by ML techniques. To the best of our knowledge, no method is demonstrably superior to the rest; one of the most widely used approaches is the persistence image (PI). For a more mathematical description of the persistence image, see ref. 43.
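As an illustration, the following is a minimal sketch of how persistence diagrams of this kind can be computed from an atomic point cloud. It assumes the `ripser` Python package and a placeholder `coords` array, neither of which is specified in the text.

```python
# Minimal sketch: persistence diagrams from a point cloud via a
# Vietoris-Rips filtration (assumes the `ripser` package; `coords` is a
# placeholder for the (N, 3) Cartesian coordinates of a MOF's atoms).
import numpy as np
from ripser import ripser

coords = np.random.rand(200, 3) * 20.0   # stand-in for real atomic coordinates
result = ripser(coords, maxdim=2)        # homology in dimensions 0, 1, and 2
dgm0, dgm1, dgm2 = result["dgms"]        # arrays of (birth, death) pairs

# lifetime (death minus birth) of each 1D feature, i.e. loops/channels
lifetimes = dgm1[:, 1] - dgm1[:, 0]
```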
After transforming a PD's coordinates from birth time vs. death time to birth time vs. lifetime (i.e., death minus birth), the PI can be calculated. The persistence of a feature is often called its “lifetime.” The resulting cloud of points is then transformed into a persistence surface, a kind of continuous surface map. This can be achieved, for instance, by placing a distribution function (e.g., a Gaussian) on each topological feature in the transformed PD and summing to obtain a fully continuous surface. To ensure that more persistent features are given more prominence in the final persistence image, a weighting function is often employed. Finally, integration over the cells of a predetermined grid pixelises the persistence surface. The locations and values of the pixels together give a representation of the PD that is amenable to machine learning methods, since it is stable and of fixed size regardless of variations in the original PD (such as the number of points).
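The conversion from PD to PI can be sketched as follows, assuming the `persim` package and the `dgm1` diagram from the previous snippet; the pixel size is an illustrative choice, not a value taken from the text.

```python
# Minimal sketch: persistence image from a persistence diagram (assumes the
# `persim` package and `dgm1` from the previous snippet). By default the
# imager moves points to (birth, persistence) coordinates, smooths them with
# Gaussians weighted by persistence, and pixelises the resulting surface.
from persim import PersistenceImager

pimgr = PersistenceImager(pixel_size=0.5)  # grid resolution (assumed value)
pimgr.fit([dgm1])                          # set birth/persistence ranges from data
img = pimgr.transform([dgm1])[0]           # 2D array: one pixelised surface
```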
ResNet-18 is a convolutional network architecture that is 18 weighted layers deep. The network was designed to efficiently support many convolutional layers with varying filter sizes and numbers of filters, followed by a global average pooling layer and a fully connected layer for classification. Convolution, max-pooling, fully connected, and softmax layers are all present in the network. It has been used for a wide range of computer vision tasks, including image classification, object detection, and semantic segmentation. ResNet-18 is a relatively shallow architecture compared with larger ResNet variants such as ResNet-50 or ResNet-101, yet it still achieves state-of-the-art performance on many computer vision benchmarks.44–46 Fig. 4 represents the layered structure of the ResNet-18 model.
Fig. 4 Overall representation of the ResNet-18 model (upper panel) and Block XY of the model, where X and Y represent the filter number and stride used, respectively.
In our method, we use the ResNet-18 architecture to extract meaningful feature representations from the images. We use half the number of filters at each layer relative to the vanilla implementation; after average pooling, we concatenate the image feature vector with our structural feature vector and pass the result through multiple linear layers to obtain the final output. The XY in Block XY in the architecture diagram denotes the filter number and stride used, respectively. We use a stride of one for the first block layer and two for the rest.
We finally apply the average-pooling operation to obtain a 256-dimensional feature map, which is passed through two linear layers with ReLU activation to give 128 features; these are concatenated with the structural feature vectors and downsized via linear layers to output a single number. Table 1 summarises the hyperparameters used, including the number of layers and the resulting number of trainable parameters.
Parameter | Value
---|---
Initial learning rate | 1 × 10⁻³
Dropout ratio | 0.2
Bias | True
Batch size | 32
Number of epochs | 200
Activation | ReLU
Number of layers | 21
Random state | 10
Batch normalization | True
Trainable parameters (millions) | 3.222017
The ResNet-18 architecture begins with a convolutional layer of filter size 7 × 7, stride 2, and padding 3; this layer applies 32 such filters. Its outputs pass through a batch normalisation layer, a ReLU activation function, and a max-pooling layer of kernel size 3 and stride 3 with padding 1. The result is then passed through a series of blocks, each containing four convolutional layers with differing numbers of filters. The first convolutional layer of every block has a stride of 2, except in the first block, where the stride is 1; all block convolutions use 3 × 3 filters with padding of 1, and the remaining convolutions in each block use a stride of 1.
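Putting the pieces together, the following is a minimal PyTorch sketch of the architecture described above: a half-width ResNet-18 trunk (32-filter stem; block widths 32/64/128/256, half the vanilla 64/128/256/512), average pooling to a 256-dimensional image feature, reduction to 128 dimensions, concatenation with the structural feature vector, and linear layers down to a single predicted value. The single input channel, the intermediate widths (192 and 64), and the omission of dropout are assumptions made for the sketch rather than details fixed by the text.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convolutions with batch norm and a skip connection."""
    def __init__(self, cin, cout, stride):
        super().__init__()
        self.conv1 = nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(cout)
        self.conv2 = nn.Conv2d(cout, cout, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(cout)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection on the skip path whenever the shape changes
        self.down = (nn.Sequential(nn.Conv2d(cin, cout, 1, stride=stride, bias=False),
                                   nn.BatchNorm2d(cout))
                     if (stride != 1 or cin != cout) else nn.Identity())

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.down(x))

class TopoResNet(nn.Module):
    def __init__(self, n_struct_features=3):
        super().__init__()
        # stem: 7x7 conv, stride 2, padding 3, 32 filters (half of vanilla 64)
        stem = [nn.Conv2d(1, 32, 7, stride=2, padding=3, bias=False),
                nn.BatchNorm2d(32), nn.ReLU(inplace=True),
                nn.MaxPool2d(3, stride=3, padding=1)]  # stride 3 per the text
        blocks, cin = [], 32
        for i, w in enumerate([32, 64, 128, 256]):     # half of vanilla widths
            stride = 1 if i == 0 else 2                # stride 1 in the first block only
            blocks += [BasicBlock(cin, w, stride), BasicBlock(w, w, 1)]
            cin = w
        self.trunk = nn.Sequential(*stem, *blocks,
                                   nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # two linear layers with ReLU reduce the 256-dim pooled feature to 128
        self.reduce = nn.Sequential(nn.Linear(256, 192), nn.ReLU(),
                                    nn.Linear(192, 128), nn.ReLU())
        # after concatenation with the structural features, downsize to one output
        self.regress = nn.Sequential(nn.Linear(128 + n_struct_features, 64),
                                     nn.ReLU(), nn.Linear(64, 1))

    def forward(self, image, struct_feat):
        x = self.reduce(self.trunk(image))         # (B, 128) image features
        x = torch.cat([x, struct_feat], dim=1)     # fuse with structural features
        return self.regress(x)                     # (B, 1) predicted capacity
```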
In any deep learning model, the hyperparameters play a crucial role in deciding how well the model performs. The model can be fine-tuned for optimal performance by adjusting hyperparameters such as the batch size, learning rate, and number of training epochs, while monitoring the training and validation loss and accuracy. In our model, we have used optimised hyperparameters.
The model's effectiveness also relies heavily on the learning rate. If the learning rate is too small, training is slow and the network weights are updated only very gradually; a higher learning rate, on the other hand, may cause the training to diverge from the expected behaviour. The learning rate is chosen by minimising the loss function of the neural network. In the current experimental setup, we use an initial learning rate of 0.001 and find that the model has converged after 200 training epochs.
We leverage the ResNet-18 architecture to incorporate the feature vectors along with the images. We first pass the images alone through the ResNet model and extract final features, which are passed through linear layers to restructure them to the required dimension. This image feature vector is then concatenated with the original feature vectors, without any preprocessing, to give the final feature vector, which is subsequently passed through a neural network that predicts a single value. We used the mean squared error (MSE) as our loss metric, given by eqn (1), where yi, ŷi, and m denote the true values, predicted values, and the number of samples in the dataset, respectively. The model is trained end to end using images and feature vectors.
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 \qquad (1)$$
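A minimal end-to-end training loop consistent with the reported hyperparameters (initial learning rate 10⁻³, batch size 32, 200 epochs, MSE loss) might look as follows; the optimiser choice (Adam) and the `train_loader` yielding (persistence image, structural features, capacity) batches are assumptions not stated in the text.

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=200, lr=1e-3, device="cpu"):
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed optimiser
    loss_fn = nn.MSELoss()                                   # eqn (1)
    for epoch in range(epochs):
        running = 0.0
        for images, features, targets in train_loader:
            images = images.to(device)
            features = features.to(device)
            targets = targets.to(device)
            optimizer.zero_grad()
            preds = model(images, features).squeeze(1)  # (B,) predicted capacities
            loss = loss_fn(preds, targets)
            loss.backward()
            optimizer.step()
            running += loss.item() * targets.size(0)
        print(f"epoch {epoch + 1}: train MSE = {running / len(train_loader.dataset):.4f}")
```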
We used 7 image features. The weight matrix has relatively high values for only a few parts of the image features and input vector features, which signifies that the neural network focuses only on the important parts of the image and feature vector and largely ignores the rest. This does not represent overfitting, as the weights are not too large; the model is not totally ignoring the other features, but rather giving them significantly lower weight. When we pass random images, such as dog images, instead of the actual persistence images, the top left of the right panel, which corresponds to the dog-image feature vectors, appears very dense compared with the original persistence-image feature vectors (top left of the left panel). This means the model gives every image feature almost equal weight, i.e., it cannot extract meaningful features and instead uses all available features to lower the loss. To compensate for the increased weight placed on the image features, the model assigns less weight to the structural feature vectors (top right of the right panel), resulting in a sparser representation. We also obtain a higher loss for the dog images than for the actual persistence images. To highlight the effect more clearly, we repeated the trials with 128 image features; the results are displayed in the bottom panel of Fig. 5.
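For illustration, a weight-matrix heatmap of the kind discussed above could be produced as follows, assuming matplotlib and the `TopoResNet` sketch given earlier; the layer inspected (`model.regress[0]`) is a stand-in for the fused layer examined in Fig. 5, and a trained model would be inspected in practice.

```python
import matplotlib.pyplot as plt

model = TopoResNet(n_struct_features=3)  # from the earlier architecture sketch
# first linear layer after concatenation; weight shape: (64, 128 + 3)
W = model.regress[0].weight.detach().abs().numpy()
plt.imshow(W, aspect="auto", cmap="viridis")
plt.xlabel("input index (image features | structural features)")
plt.ylabel("output unit")
plt.colorbar(label="|weight|")
plt.show()
```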
From Fig. 6, it is seen that 3 features contribute most to the target-property predictions. The test loss when using these 3 feature vectors is 0.059, as shown in Table 2. When the persistence images are used as image feature vectors, taking into account all 0D, 1D, and 2D topological features along with the traditional structural feature vectors, the test loss decreases to 0.043, which demonstrates the complementarity of the image features with the conventional features. To further validate our approach, we used the same dataset under a different condition: predicting the usable hydrogen storage capacity of MOFs at 77 K for a pressure swing between 100 and 5 bar (the PS condition). Interestingly, here too, using persistence images together with the feature vectors yields better accuracy than using feature vectors alone. Table 2 summarises the performance of the feature vectors against that of the persistence images. The larger improvement from persistence images under the PS condition compared with the TPS condition is likely because the functional relationships between the output capacities (UG/UV) and input features differ between the PS and TPS conditions, as observed in previously reported structure(feature)–property(capacity) relationships.18
Fig. 6 Multivariate feature importance for predicting the H2 deliverable capacity of MOFs under TPS (upper panel) and PS (lower panel) conditions.
Condition | Method | Test loss
---|---|---
TPS | 7-features | 0.069
TPS | 3-features | 0.059
TPS | 3-features + PI | 0.043
PS | 7-features | 0.076
PS | 3-features | 0.044
PS | 3-features + PI | 0.033
Predicting H2 storage in MOFs is a well-known challenge, and several prior studies have attempted it with ML models in an effort to reduce both computational and experimental costs.41,42,47 Among these, Ahmed et al.41 conducted an in-depth analysis drawing on several existing datasets to make forecasts under TPS and PS conditions, and they concluded that the extremely randomised trees (ERT) model is effective, reporting RMSE values of 0.18 and 0.23 (capacity units) for UG under TPS and PS conditions, respectively. By building an ML model on top of TDA, we can forecast H2 storage with reasonable accuracy. The prediction of H2 storage capacity in MOFs is an area of critical research, and in this work we are able, for the first time, to integrate topological information with the state-of-the-art ResNet model, which has a proven record of success in computer vision.
MOF | Density | GSA | VSA | VF | PV | LCD | PLD | UG at TPS | UV at TPS
---|---|---|---|---|---|---|---|---|---
XAHQAA | 0.17 | 6250.1 | 1065.2 | 0.95 | 5.44 | 23.04 | 21.61 | 19.33 | 15.72
XAHPUT | 0.18 | 6301.4 | 1125.9 | 0.94 | 5.15 | 21.83 | 20.59 | 18.46 | 14.93
NIBJAK | 0.22 | 5417.2 | 1210.4 | 0.94 | 4.09 | 32.0 | 17.55 | 16.51 | 13.2
RAVXOD | 0.18 | 3299.1 | 590.9 | 0.88 | 5.02 | 71.64 | 71.5 | 15.45 | 12.66
RUTNOK | 0.24 | 6199.7 | 1493.0 | 0.9 | 3.73 | 24.61 | 14.65 | 14.89 | 12.05

GSA = gravimetric surface area; VSA = volumetric surface area; VF = void fraction; PV = pore volume; LCD = largest cavity diameter; PLD = pore limiting diameter; UG/UV = usable gravimetric/volumetric capacity.
Other than density, a high gravimetric surface area in MOFs is crucial for hydrogen storage, particularly for mobile applications such as vehicles, where minimising weight is essential for operational efficiency and range. This metric indicates more surface area per unit weight, providing more adsorption sites for hydrogen molecules and thus higher storage capacity. While volumetric surface area, which measures surface area per unit volume, is also important, it is secondary to gravimetric considerations in scenarios where weight has a greater impact on performance than the space occupied by the storage system. Consequently, materials with a high gravimetric surface area are favoured for their ability to store a significant amount of hydrogen without adding substantial weight to the energy storage system. This is confirmed by our study, as evident from the GSA and VSA values, both of which are high for all five best MOFs. In addition, the best MOFs have a high VF, which can be explained by chemical intuition. MOFs with a high void fraction are particularly effective for hydrogen storage because the vast empty space translates to a higher surface area for the adsorption of hydrogen molecules. This structural characteristic ensures that there are ample sites for the physical adsorption of hydrogen, optimising the storage capacity. Additionally, a high void fraction allows more efficient diffusion of hydrogen molecules throughout the MOF, promoting uniform distribution and accessibility of the adsorption sites. The delicate balance of having pores just slightly larger than the hydrogen molecules maximises the van der Waals forces necessary for adsorption without overly restricting or loosening the hydrogen molecules. Consequently, MOFs with large void fractions are typically superior for hydrogen storage, providing both high capacity and fast kinetics, which are essential for real-world energy applications. MOFs with higher PV also show higher hydrogen storage capacity, owing to the increased space available for hydrogen adsorption.

From careful inspection of Table 3, it is seen that MOFs with a low LCD and a high PLD show high hydrogen adsorption capacity. The LCD refers to the size of the largest void space within the MOF structure; a lower LCD indicates smaller, more compact cavities, which can be beneficial for maximising surface area and adsorption sites within a given volume. Meanwhile, a high PLD, which represents the smallest diameter through which a molecule must pass to access a cavity, ensures that hydrogen molecules can easily enter and fill these cavities. This combination of a compact cavity structure with accessible pores allows efficient storage of hydrogen: the small cavities increase the surface interaction with hydrogen molecules, boosting adsorption capacity, while the larger pore entrances facilitate easy diffusion of hydrogen into the MOF. To give an insight into the structure of these MOFs, we include the structures of two of them, namely XAHQAA and RUTNOK, in Fig. 7.
Fig. 7 Structures of the two best MOFs in terms of hydrogen storage capacity. The left panel shows the structure of the XAHQAA MOF and the right panel the structure of the RUTNOK MOF.
Footnote
† Electronic supplementary information (ESI) available: Figure showing the effect of training set size. See DOI: https://doi.org/10.1039/d3ma00591g
This journal is © The Royal Society of Chemistry 2024