Rifan Hardian (a), Zhenwen Liang (b), Xiangliang Zhang (b) and Gyorgy Szekely (a)*
(a) Advanced Membranes and Porous Materials Center, Physical Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia. E-mail: gyorgy.szekely@kaust.edu.sa; Web: https://www.szekelygroup.com; https://www.twitter.com/SzekelyGroup
(b) Computer, Electrical and Mathematical Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
First published on 9th October 2020
Materials discovery is rapidly revolutionizing all aspects of our lives. However, the design and fabrication of materials are often unsustainable and resource-intensive. Hence, we need a paradigm shift towards designing sustainable materials in silico. Machine learning, a subfield of artificial intelligence (AI), is emerging within the sustainability agenda because it promises to benefit science and engineering through improved quality, performance, and predictive power. Here we present a new methodology to extend the application of AI to develop materials in an environmentally friendly way. We demonstrate successful materials development by combining design of experiments with a new machine learning module that comprises a support vector machine, an evolutionary algorithm, and a desirability function. We use our AI-based method to realize the sustainable electrochemical synthesis of a ZIF-8 metal–organic framework and explore the hyperdimensional relationship between the synthesis parameters, product qualities, and process sustainability. The presented AI-based methodology paves the way for solving the challenge of the materials fabrication-sustainability nexus, and facilitates the paradigm shift from the wet lab to the wired lab.
Conventional materials discovery is performed by varying one factor at a time, an approach with several limitations: experimental datapoints are inherently limited, possible interactions between factors are not revealed, and the optimum is rarely achieved, among others. The design of experiments (DoE) is a systematic alternative that achieves a good balance between a reduced number of experiments and efficiency. DoE allows the factors to be varied and investigated simultaneously, thus accelerating discovery and optimization while conserving precious resources, labor, and time, and ultimately resulting in a more sustainable approach.2–4 Because DoE uses a minimal number of experiments, methodologies are needed that can expand the limited experimental results into much larger virtual datasets. We hypothesize that machine learning (ML), a growing area of AI, will not only allow us to generate datasets with higher-dimensional interactions but also help to predict the outcome of otherwise unrealized experiments (Fig. 1).
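To make the run-count saving concrete, the sketch below compares a full three-level factorial for four factors with a Taguchi L9 orthogonal array built by the standard modulo-3 construction. The source does not state which design was used here, so the L9 array is an illustrative assumption, not the design of this work.

```python
import itertools

def full_factorial(n_factors, n_levels):
    """Every combination of level codes 0..n_levels-1 for all factors."""
    return list(itertools.product(range(n_levels), repeat=n_factors))

def taguchi_l9():
    """L9(3^4) orthogonal array: columns a, b, a+b, a+2b (mod 3)."""
    return [(a, b, (a + b) % 3, (a + 2 * b) % 3) for a in range(3) for b in range(3)]

def is_orthogonal(design, n_levels=3):
    """True if every pair of columns contains each level pair equally often."""
    n_cols = len(design[0])
    for i, j in itertools.combinations(range(n_cols), 2):
        pairs = [(row[i], row[j]) for row in design]
        counts = [pairs.count(p) for p in itertools.product(range(n_levels), repeat=2)]
        if len(set(counts)) != 1:
            return False
    return True

full = full_factorial(4, 3)
l9 = taguchi_l9()
print(len(full), len(l9), is_orthogonal(l9))  # 81 9 True
```

Because each level pair appears equally often in every pair of columns, main effects can still be estimated from 9 runs instead of 81, at the price of confounding interactions with main effects in this saturated design.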
ML utilizes different statistical methods to learn from various data types via a set of algorithms.5 ML methods first learn the patterns and rules that underlie a dataset by evaluating a portion of that data and then build a model to make predictions.6 In fact, ML has already been used to discover and predict the performance of new materials and to optimize processes in the field of molecular and materials science.6,7 The use of ML to address environmental issues is also a rapidly growing area of research.8,9 For instance, Cao et al.2 fabricated photovoltaic devices using consecutive experimental design and ML. Support vector regression and random forest have been used to screen and identify the properties of metal–organic frameworks (MOFs) from structure databases.10 Furthermore, a combination of high-throughput molecular simulations with an artificial neural network algorithm has been implemented to predict the mechanical properties of MOFs.11
In this work, we describe a new ML module that synergistically combines DoE, a support vector machine, an evolutionary algorithm, and a desirability function to obtain MOFs sustainably. Unlike multivariate optimization, which often results in multiple solutions, this new strategy can propose a single solution, which is useful for effective decision-making. To demonstrate the new AI approach, we develop a sustainable electrochemical synthesis of MOFs.
MOFs are a class of porous materials built from metal cations bridged together by organic linkers to form a framework.12 The broad range of possible organic and inorganic components enables the design of MOFs with almost limitless structures. Their interesting properties, such as flexible porous structures, high surface areas, and high reactivity, make MOFs of great interest for many potential applications, including gas storage, separation, water harvesting, catalysis, shock absorption, and drug delivery, among many others.13,14 As MOFs are considered the next chemistry powerhouse, there is growing interest in their sustainable fabrication.15–18
ZIF-8 is a type of MOF that has received much attention because of its high thermal and chemical stability, its high surface area, unique properties such as gate opening due to linker twisting, and its ability to undergo phase transformation.19–21 ZIF-8 is constructed from zinc atoms connected by 2-methylimidazole linkers, forming a porous structure with a pore aperture of 3.4 Å and a cavity size of 11.6 Å. In this work, ZIF-8 was selected because it is one of the most studied MOFs with various applications; however, the developed AI methodology could also be applied to other MOFs.
Owing to the renewed interest in electroorganic chemistry as a route to rapid, environmentally friendly, and cleaner synthesis,22,23 we decided to investigate its potential for synthesizing ZIF-8. A plethora of parameters govern the electrochemical synthesis of MOFs, such as voltage, current, and reaction time, as well as the concentrations and ratios of the precursors and electrolytes.24,25 Solvent selection is another parameter that can be controlled in the synthesis of MOFs; solvents can be characterized using the Hansen solubility parameters, polarity, and dielectric constant, among others.
Herein, we propose that AI is a silver bullet for sustainable materials development, demonstrated through the electrochemical synthesis of ZIF-8, to produce a high-quality product and a sustainable process (Fig. 1).
Fig. 2 Factor contribution and interactions. (a) XRD pattern and (b) SEM image of ZIF-8 synthesized from entry 15 (Table S2†) in the experimental design. Factor contributions of each parameter to each response under the category of (c) product quality and (d) process sustainability. Each axis and color represents a different factor contribution and response, respectively. Subscript n indicates a normalized value. Ce is the electrolyte concentration, V is the applied voltage, t is the reaction time, and Cl is the linker concentration. (e) Individual parameter-response interactions on product quality and process sustainability. (f) Two-parameter-response interaction on crystallinity. (g) Complexity of the multivariable interaction in the studied system, where the solid and dashed lines indicate direct and indirect (through an intermediate) parameter-response interactions, respectively.
The higher a parameter's contribution to the observed responses, the greater the variance it produces in those responses. The linker concentration had the most significant effect, governing both crystallinity and purity (Fig. 2c); low linker concentrations led to the formation of amorphous materials (Table S4†). In contrast, all parameters were almost equally significant in controlling the yield of the synthesized product. The factor contributions to process sustainability are illustrated in Fig. 2d. The applied voltage contributed the most to the total energy consumption during the electrochemical synthesis of ZIF-8, while the reaction time produced the highest variance in the E-factor and carbon footprint responses.
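The exact procedure used to determine the factor contributions in this work is given in the ESI and is not reproduced here. A common DoE definition, assumed in the sketch below, is the ANOVA percent contribution: the between-level sum of squares of a factor divided by the total sum of squares. The two-level, four-factor design and the response are synthetic.

```python
import itertools
import numpy as np

def percent_contribution(levels, y):
    """ANOVA-style percent contribution of each factor to the variance of y.

    levels: (n_runs, n_factors) array of level codes
    y:      (n_runs,) response values
    """
    levels = np.asarray(levels)
    y = np.asarray(y, dtype=float)
    grand_mean = y.mean()
    ss_total = np.sum((y - grand_mean) ** 2)
    contributions = []
    for j in range(levels.shape[1]):
        ss_factor = 0.0
        for level in np.unique(levels[:, j]):
            mask = levels[:, j] == level
            # Between-level sum of squares: n_level * (level mean - grand mean)^2
            ss_factor += mask.sum() * (y[mask].mean() - grand_mean) ** 2
        contributions.append(100.0 * ss_factor / ss_total)
    return np.array(contributions)

# Hypothetical 2^4 full factorial with a synthetic response that depends
# strongly on the first factor and weakly on the second.
design = np.array(list(itertools.product([0, 1], repeat=4)))
y = 3.0 * design[:, 0] + 1.0 * design[:, 1]
print(percent_contribution(design, y))  # about 90 %, 10 %, 0 %, 0 %
```

A factor whose levels barely shift the mean response gets a contribution near zero, which is how a variance-based ranking such as the one in Fig. 2c and d separates dominant from minor parameters.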
This can be explained by the longer reaction time allowing enough time for the nucleation and growth of the product, which in turn resulted in higher yields. Consequently, the reaction time strongly influenced both the E-factor and the carbon footprint, which were inversely proportional to the yield. A detailed description of how the factor contributions were determined is available in section 2.2 of the ESI.†
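The inverse link between yield and E-factor follows from the definition of the E-factor as mass of waste per mass of product. The sketch below assumes a simplified form in which waste is all input mass not recovered as product; the input mass and theoretical product mass are hypothetical.

```python
def e_factor(total_input_mass_kg, product_mass_kg):
    """Simplified E-factor: kg of waste per kg of product, taking waste as
    every kg of input that does not end up as product."""
    if product_mass_kg <= 0:
        raise ValueError("product mass must be positive")
    return (total_input_mass_kg - product_mass_kg) / product_mass_kg

# Hypothetical batch: fixed input mass and theoretical (100% yield) product mass.
inputs_kg = 20.0
theoretical_product_kg = 1.0

for yield_fraction in (0.4, 0.6, 0.8):
    product_kg = yield_fraction * theoretical_product_kg
    print(f"yield {yield_fraction:.0%}: E-factor {e_factor(inputs_kg, product_kg):.1f}")
```

With the inputs held constant, raising the yield from 40% to 80% lowers the E-factor from 49 to 24, so any parameter that mainly drives yield (here, reaction time) also dominates the variance of the E-factor.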
Nevertheless, the individual parameter-response interactions do not account for possible interactions between the parameters themselves. We therefore evaluated the two-parameter-response interactions (see section 2.4 in the ESI†). In contrast to the individual correlation (Fig. 2e), the effect of electrolyte concentration on crystallinity depended on the other parameters (Fig. 2f). For instance, with a short reaction time and low applied voltage, increasing the electrolyte concentration increased crystallinity, whereas with a longer reaction time and higher applied voltage, increasing the electrolyte concentration decreased crystallinity. With a short reaction time and low applied voltage, only a small amount of metal cations was generated from the anode; this limitation could easily be overcome by increasing the electrolyte concentration, and hence the conductivity, which favors the formation of crystalline material. With a longer reaction time and higher applied voltage, however, an abundance of metal cations was generated from the anode. In this case, increasing the electrolyte concentration was not effective in facilitating ionic transfer, because the resulting high concentrations of both zinc cations and electrolyte led to oversaturation.
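The sign reversal described above is what a two-factor interaction means in DoE terms: the effect of one factor changes with the level of another. A minimal sketch for a 2 × 2 case follows; the crystallinity values are hypothetical, chosen only to reproduce the qualitative pattern, and reaction time and voltage are lumped into a single "low/high" factor B as a simplification.

```python
def simple_effects(y_ll, y_lh, y_hl, y_hh):
    """Effect of factor A at the low and at the high level of factor B.

    Naming: y_<A level><B level>, with l = low and h = high.
    """
    effect_a_at_b_low = y_hl - y_ll
    effect_a_at_b_high = y_hh - y_lh
    return effect_a_at_b_low, effect_a_at_b_high

def interaction_effect(y_ll, y_lh, y_hl, y_hh):
    """Two-factor interaction AB: half the change in A's effect across B."""
    low, high = simple_effects(y_ll, y_lh, y_hl, y_hh)
    return (high - low) / 2.0

# Hypothetical crystallinity (%): A = electrolyte concentration,
# B = combined reaction time/voltage setting (short/low vs long/high).
y = dict(y_ll=40.0, y_lh=80.0, y_hl=60.0, y_hh=65.0)
print(simple_effects(**y))      # (20.0, -15.0): A helps at low B, hurts at high B
print(interaction_effect(**y))  # -17.5
```

A large interaction term relative to the main effects is the signal that one-factor-at-a-time conclusions (as in Fig. 2e) would be misleading.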
Beyond two-parameter interaction effects, the complexity of using a multivariable system for materials fabrication is illustrated in Fig. 2g. The process parameters influenced the different responses either directly or through intermediate processes, such as salting-out, metal-ion generation, driving-force effects, and the formation of other undesired materials. With the aid of ML algorithms, we generated predictive virtual datasets through the grid-search method, which enabled us to construct overlaid contour plots to visualize the multidimensional interactions. In the next section, we discuss the implementation of ML to investigate the multidimensional correlation between the parameters and the observed responses in terms of product quality and process sustainability.
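Overlaying contour plots amounts to intersecting the acceptable regions of several responses on a shared parameter grid. A minimal sketch, assuming two normalized parameters, two made-up response surfaces in place of the ML predictions, and hypothetical target thresholds:

```python
import numpy as np

# Hypothetical normalized grid for two parameters (26 steps each, 0 to 1).
axis = np.linspace(0.0, 1.0, 26)
voltage_n, time_n = np.meshgrid(axis, axis, indexing="ij")

# Hypothetical smooth surfaces standing in for ML-predicted responses.
crystallinity_n = 0.5 * voltage_n + 0.5 * time_n
e_factor_n = 1.0 - 0.8 * time_n

# A grid point is kept only if every response meets its (hypothetical) target.
sweet_spot = (crystallinity_n >= 0.6) & (e_factor_n <= 0.5)

print(sweet_spot.mean())  # fraction of the grid inside the overlap region
```

With matplotlib, the same arrays could be passed to `contour` or `contourf` to draw each response's threshold line on one set of axes; the boolean mask is the region the eye looks for in such an overlay.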
Fig. 3 Machine learning-generated 4D response surfaces for product quality and process sustainability. Response surface plots from the SVM fits of the parameters on (a–c) product quality and (d–f) process sustainability. The electrolyte concentration was set at 0.01 M, while the applied voltage (V), reaction time (t), and linker concentration (Cl) were varied. The sampling points from the DoE are shown as dots in each surface plot. The virtual dataset consists of 456 976 data points in total. Refer to the ESI† for the comprehensive 4D plots.
Fig. 3a–c show that the highest product quality was achieved at high voltage, longer reaction time, low electrolyte concentration, and high linker concentration. These conditions also gave the best process sustainability, as indicated by low values of the E-factor, energy consumption, and carbon footprint (Fig. 3d–f). Taken together, these results suggest that high voltage produces more zinc cations as the source of the metal nodes that form the MOF. The longer reaction time allowed enough time for nucleation and crystal growth to occur, and the high linker concentration ensured that there was enough linker to form a framework with the metal cations. Meanwhile, the low electrolyte concentration provided sufficient system conductivity while avoiding salting-out.
Thus, the broader view provided by multidimensional analysis clearly helps in critically assessing how the variables interact. Multiparameter interactions can be visualized more easily, and their correlation with each response can be carefully investigated. However, each response surface plot can correlate multiple parameters with only one response. As shown in Fig. S28 and S29,† correlating four parameters with six responses yielded a total of 18 panels of 4D plots with 54 layers, giving 486 ways to interpret the plots. Hence, correlating ten variables in this way is still not practical for proposing a single optimum condition. In our case, visualizing all of the interactions in one figure is neither practically possible nor interpretable, as it would require a ten-dimensional image. Below, we discuss an ML-based strategy that evaluates the optimum conditions for a hyperdimensional system and proposes a single solution.
In AI Module 2, we used the experimental data from DoE as the input for the SVM algorithm, followed by the generation of 50 random virtual data points as the initial population for the evolutionary algorithm (Fig. 1b). The desirability function was then applied as the final optimization step (see Fig. S1† for the detailed flow chart). The evolutionary algorithm, inspired by genetic evolution, evolves the initial data population through crossover and mutation.27 Specifically, in this study we implemented the nondominated sorting genetic algorithm II (NSGA-II).28
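The source does not give the form of its desirability function, so the sketch below assumes the widely used Derringer–Suich form: each response is mapped to a desirability between 0 and 1 (larger-is-better for quality, smaller-is-better for sustainability metrics), and the global desirability is their geometric mean. The bounds, exponents, and candidate predictions are hypothetical.

```python
import numpy as np

def d_maximize(y, low, high, s=1.0):
    """Larger-is-better desirability: 0 at or below `low`, 1 at or above `high`."""
    return np.clip((np.asarray(y, dtype=float) - low) / (high - low), 0.0, 1.0) ** s

def d_minimize(y, low, high, s=1.0):
    """Smaller-is-better desirability: 1 at or below `low`, 0 at or above `high`."""
    return np.clip((high - np.asarray(y, dtype=float)) / (high - low), 0.0, 1.0) ** s

def global_desirability(d):
    """Geometric mean of the individual desirabilities along the last axis."""
    d = np.asarray(d, dtype=float)
    return np.prod(d, axis=-1) ** (1.0 / d.shape[-1])

# Hypothetical normalized predictions for three candidate conditions.
# Columns: crystallinity, purity, yield (maximize);
#          E-factor, energy, carbon footprint (minimize).
preds = np.array([
    [0.9, 1.0, 0.8, 0.3, 0.4, 0.3],
    [0.7, 0.9, 0.9, 0.2, 0.2, 0.2],
    [0.5, 0.6, 0.4, 0.6, 0.7, 0.6],
])
d = np.hstack([d_maximize(preds[:, :3], 0.0, 1.0),
               d_minimize(preds[:, 3:], 0.0, 1.0)])
D = global_desirability(d)
best = int(np.argmax(D))
print(D, best)  # candidate 1 has the highest global desirability here
```

The geometric mean drives D to zero if any single response is unacceptable, which is what lets the final step collapse a set of trade-off solutions into one recommended condition. For a quality-only objective, only the first three desirabilities would enter the mean.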
From a practical perspective, we considered two optimization objectives for decision-making (Fig. 4a). The first objective aimed to maximize product quality only, while the second targeted both product quality and process sustainability. During the evolutionary algorithm, the two objectives converged after different numbers of generations, as measured by the inverted generational distance (IGD) values (Fig. 4c and d). The solution converged at a later generation as the number of optimized variables increased.
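IGD measures how closely the front found at a given generation covers a reference Pareto front: for every reference point it takes the distance to the nearest found point and averages these distances. The source does not say which reference front was used, so both the reference front and the generation fronts below are hypothetical.

```python
import numpy as np

def igd(reference_front, approximation):
    """Inverted generational distance: mean distance from each reference
    point to its nearest point in the approximation set (lower is better)."""
    ref = np.asarray(reference_front, dtype=float)
    approx = np.asarray(approximation, dtype=float)
    # Pairwise Euclidean distances, shape (n_ref, n_approx).
    dists = np.linalg.norm(ref[:, None, :] - approx[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

# Hypothetical two-objective reference front (both objectives minimized).
reference = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])

# Hypothetical fronts from an early and a late generation.
early = np.array([[0.4, 1.2], [1.2, 0.4]])
late = np.array([[0.05, 1.0], [0.5, 0.55], [1.0, 0.05]])

print(igd(reference, early), igd(reference, late))  # IGD falls as the front converges
```

Plotting IGD against generation number gives convergence curves of the kind shown in Fig. 4c and d; a curve that flattens later indicates a harder optimization problem.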
The global desirability graphs show the data points for DoE and AI Modules 1 and 2 for the normalized input parameters (Fig. 4e and f). The limited number of DoE results was insufficient to identify the optimal point, while AI Module 1 created an unnecessarily large dataset; although its accuracy was good, the computational cost increased (Fig. 4b). Incorporating the evolutionary algorithm in AI Module 2 made it possible to screen only the best data population. AI Module 2 therefore did not require many data points, which increased the computational speed for finding the optimum solution from 0.01 s−1 (AI Module 1) to 9.09 s−1. Owing to its small number of datapoints, DoE had the highest computational speed (28.57 s−1).
The single solutions for the optimum condition for each objective with the different methodologies are compared in Table S12† and visualized in Fig. S30.† The solutions from both AI modules were better than the DoE results. Objective 2, accomplished with AI Module 2, required the least computational power and maintained a high-quality product while minimizing the environmental impact (Fig. 4g). Overall, the best performance (86% crystallinity, 100% purity, and 88% yield) was achieved using 0.07 M electrolyte and 1.86 M linker concentrations at 18.5 V with a reaction time of 0.9 h. Under these conditions, the E-factor and carbon footprint were 11 and 27 kg kg−1, respectively, and the corresponding energy consumption was 7 kW h kg−1.
The hyperparameters of the SVM model, such as ‘gamma’, are crucial for its performance: a gamma value that is too small leads to low accuracy, while too high a value can cause overfitting. To choose the best values of ‘gamma’ and ‘C’, we applied cross-validation, after which gamma was set to 1 and C to 10. Min-max normalization was used to scale all input and output variables to the range 0–1. A step of 0.04 was used to generate a four-dimensional grid (one dimension for each of the four input variables in this experiment). Each dimension therefore has 1/0.04 + 1 = 26 steps, so the total number of generated points was 26 × 26 × 26 × 26 = 456 976. The model was validated by comparing the fits of two ML algorithms, the SVM and random forest (RF); the validation is provided in section 2.12 of the ESI.†
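The workflow in the paragraph above (min-max scaling, cross-validated choice of gamma and C, a 26-step-per-axis 4D grid, and an RF cross-check) can be sketched with scikit-learn. This assumes an RBF-kernel support vector regressor (`SVR`) and 5-fold `GridSearchCV`; the candidate gamma/C values, the fold count, and the 20-run dataset are hypothetical, since the source gives only the final values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Hypothetical stand-in for the DoE table: 20 runs x 4 inputs on arbitrary
# scales and one synthetic response (not data from this work).
X_raw = rng.uniform(low=[0.0, 0.0, 0.0, 0.0], high=[1.0, 10.0, 100.0, 5.0], size=(20, 4))
y_raw = X_raw[:, 3] + 0.005 * X_raw[:, 1] * X_raw[:, 2] + rng.normal(0.0, 0.1, 20)

# Min-max normalization of inputs and output to the 0-1 range.
x_scaler = MinMaxScaler().fit(X_raw)
y_scaler = MinMaxScaler().fit(y_raw.reshape(-1, 1))
X = x_scaler.transform(X_raw)
y = y_scaler.transform(y_raw.reshape(-1, 1)).ravel()

# Cross-validated choice of gamma and C (candidate values are assumptions;
# this work reports gamma = 1 and C = 10 for its own data).
search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"gamma": [0.1, 1.0, 10.0], "C": [1.0, 10.0, 100.0]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
svr = search.best_estimator_
svr_mse = -search.best_score_

# Cross-check with a random forest on the same unshuffled 5-fold split.
rf_mse = -cross_val_score(RandomForestRegressor(random_state=0), X, y,
                          cv=5, scoring="neg_mean_squared_error").mean()

# 4D virtual grid: step 0.04 -> 26 points per axis -> 26**4 points.
step = 0.04
n_steps = int(round(1.0 / step)) + 1
axis = np.linspace(0.0, 1.0, n_steps)
grid = np.stack(np.meshgrid(axis, axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 4)

# Predict on the grid and map back to the original response units.
y_grid = y_scaler.inverse_transform(svr.predict(grid).reshape(-1, 1)).ravel()
print(n_steps, grid.shape, search.best_params_, svr_mse, rf_mse)
```

Rounding `1.0 / step` before adding 1 guards against floating-point steps such as 0.04 not dividing 1 exactly; the grid itself is built in normalized units, so `x_scaler.inverse_transform(grid)` would recover physical parameter values.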
After designing and training the ML model, an evolutionary algorithm was used to perform optimization based on this surrogate model. The nondominated sorting genetic algorithm (NSGA-II) was used to obtain the best solutions. Our genetic algorithm started from a randomly initialized population of 50 individuals, which formed the potential solution set of the problem. After initialization of the first-generation population, the population evolved through crossover and mutation to produce increasingly better approximate solutions. In each generation, the best individuals were selected according to their fitness with respect to the objective functions. To track the progress of evolution, we used the inverted generational distance (IGD) as the evaluation metric.
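The selection step that distinguishes NSGA-II is ranking the population into Pareto fronts and breaking ties by crowding distance. The sketch below implements only those two pieces, following the standard fast non-dominated sort and crowding-distance definitions, with all objectives minimized; the crossover and mutation operators used in this work are not specified in the source and are omitted, and the objective values are hypothetical.

```python
import numpy as np

def dominates(a, b):
    """True if a Pareto-dominates b when all objectives are minimized."""
    return bool(np.all(a <= b) and np.any(a < b))

def fast_non_dominated_sort(F):
    """Split objective vectors F (n x k) into Pareto fronts of indices."""
    F = np.asarray(F, dtype=float)
    n = len(F)
    dominated_sets = [[] for _ in range(n)]    # solutions that p dominates
    domination_count = np.zeros(n, dtype=int)  # how many solutions dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(F[p], F[q]):
                dominated_sets[p].append(q)
            elif dominates(F[q], F[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_sets[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front

def crowding_distance(F_front):
    """Crowding distance within one front (larger = less crowded)."""
    F_front = np.asarray(F_front, dtype=float)
    m, k = F_front.shape
    if m <= 2:
        return np.full(m, np.inf)
    dist = np.zeros(m)
    for j in range(k):
        order = np.argsort(F_front[:, j])
        f_min, f_max = F_front[order[0], j], F_front[order[-1], j]
        dist[order[0]] = dist[order[-1]] = np.inf  # always keep boundary points
        if f_max == f_min:
            continue
        for r in range(1, m - 1):
            gap = F_front[order[r + 1], j] - F_front[order[r - 1], j]
            dist[order[r]] += gap / (f_max - f_min)
    return dist

# Hypothetical objective values (both minimized) for five candidate solutions.
F = np.array([[1, 4], [2, 2], [4, 1], [3, 3], [5, 5]])
fronts = fast_non_dominated_sort(F)
print(fronts)  # [[0, 1, 2], [3], [4]]
print(crowding_distance(F[fronts[0]]))  # boundary points inf, middle point 2.0
```

In a full NSGA-II loop, parents and offspring are merged, sorted into fronts, and the next population of 50 is filled front by front, using the largest crowding distances to choose among members of the last front that only partly fits.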
Footnote
† Electronic supplementary information (ESI) available: For detailed experimental design, calculations, data analysis, structure and morphology analysis. See DOI: 10.1039/d0gc02956d
This journal is © The Royal Society of Chemistry 2020