Junhyeong Lee, Donggeun Park, Mingyu Lee, Hugon Lee, Kundo Park, Ikjin Lee and Seunghwa Ryu*
Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea. E-mail: ryush@kaist.ac.kr
First published on 4th August 2023
In the last few decades, the influence of machine learning has permeated many areas of science and technology, including the field of materials science. This toolkit of data-driven methods has accelerated the discovery and production of new materials by accurately predicting the complicated physical processes and mechanisms that are not fully described by existing materials theories. However, the availability of a growing number of increasingly complex machine learning models confronts us with the question of “which machine learning algorithm to employ”. In this review, we survey the common machine learning algorithms used for materials design and provide a guideline for selecting the most appropriate model considering the nature of the design problem. To this end, we classify material design problems into four categories: (i) the training dataset is sufficiently large to capture the trend of the design space (interpolation problem), (ii) the design space is too vast to be explored thoroughly with the initial training dataset alone (extrapolation problem), (iii) multi-fidelity datasets are available (a small accurate dataset and a large approximate dataset), and (iv) only a small dataset is available. The most successful machine learning-based surrogate models and design approaches are discussed for each case along with the pertinent literature. This review focuses mostly on the use of ML algorithms for the inverse design of complicated composite structures, a topic that has received much attention recently with the rise of additive manufacturing.
With its development, ML has also significantly revolutionized the field of materials design and manufacturing by replacing various classification- or regression-related tasks that had previously been performed by humans. For instance, ML models have taken over laborious classification tasks of field experts, such as the real-time detection of abnormalities in manufacturing equipment22,23 and the evaluation of product quality.24–26 There have also been numerous studies on using ML-based regression models to capture the correlation between the design variables of material structures and the resultant performance parameters.27–32 Furthermore, in recent years, ML has been widely applied to the inverse design of materials and their manufacturing process parameters, encompassing the entire range of a manufacturing process (material development (or selection) – structural design – process parameter optimization).33–36
Inverse design inherently poses certain difficulties. Firstly, the majority of inverse design problems are ill-posed, meaning that a design aimed at a target performance is not uniquely defined and numerous feasible solutions exist because there are more variables than constraints. Furthermore, the design solution might exhibit instability, where minor variations in the desired performance lead to substantial changes in the input design. To address these ill-posed problems, appropriate constraints, such as a limitation on the design space or projection to a low-dimensional space, can be incorporated to make the problem well-defined.37,38 Machine learning-based, data-driven methodologies should therefore be employed with these fundamental challenges of inverse design in mind.
The essence of ML-based inverse design is to significantly reduce the cost of generating new data by replacing simulations or experiments with an AI-based surrogate model. In a conventional optimization loop, newly suggested structure designs or process parameter sets are evaluated or labeled by conducting numerical simulations or experiments at every iteration until the optimization process converges to a solution. Therefore, for optimization problems in which the acquisition of new data is expensive and time-consuming, a surrogate model trained on an accumulated dataset can be extremely beneficial, significantly reducing the costs. For instance, in composite design problems, the elastic properties of various fiber-reinforced composite structures can be computed in a fraction of a second by either analytical theories or simple simulations. With this data acquisition method, either gradient-based optimization or greedy search based on a genetic algorithm can be used for inverse design problems. However, the computation of their non-linear response beyond the elastic regime requires time-consuming simulations or experiments.39 As another example, the mechanical properties of 3D-printed composites with highly complex geometrical configurations cannot be evaluated simply by analytical models or quick simulations.29,40 Thus, for the past few years, ML-based surrogate models have been extensively utilized to predict and optimize the performance of a new set of composite design problems. For example, starting from the prediction of mechanical properties of comparatively simple-structured fiber-reinforced composites,41,42 prediction and design in complex grid composite materials have been studied using ML surrogate models.29,43
Despite intensive research and successes in the past years, there still remain challenges in constructing and utilizing AI surrogate models for solving inverse design problems, as summarized in Table 1. First, because of weak generalization performance in an unseen domain, AI models tend to give inaccurate predictions for extrapolation tasks. To find the optimum in a vast design space, exploration of the design space outside the initial training dataset is necessary; however, an ML-based prediction model has difficulty in accurately estimating the objective function value of a design that is far from the training dataset.44–50 Second, although many existing ML-based design studies use data from computer simulations to train ML models because of the ease of accumulating larger datasets, there exists a systematic difference between simulation and experimental results in most cases. Hence, an inverse design based on an ML model trained with simulation datasets would find the optimum within the manifold of simulation results, which may not represent the realistic optimum. Failure to close the simulation–experiment gap results in an inaccurate or implausible design, significantly hindering the real-life applicability of ML-based design algorithms. Finally, the training of deep neural network (DNN) surrogate models generally requires massive amounts of labeled data, which are not always available in material design problems.51,52 For cases where simple simulation (or theory)-based prediction is unavailable and high-throughput experiments are difficult to set up, one can only use a small dataset collected through manually conducted material experiments or highly time-consuming simulations, which prohibits the application of deep learning. Therefore, no matter how excellent the emerging DNN architectures are, the predictive performance of DNN models without sufficient training data may not meet the standards required for an inverse design task. More efficient and practical applications of data-driven optimization can be carried out if one can recognize and overcome the aforementioned challenges.
Table 1 Challenges in utilizing AI surrogate models for inverse design and the methods discussed in this review

| Challenges | Methods | Ref. # |
|---|---|---|
| Weak generalization performance in an unseen domain | Active learning and data augmentation | 100–104 |
| | Design of DNN architecture | 105 and 106 |
| Difference between simulations and experimental results | Gaussian process approach | 108–120 |
| | NN-based approach | 121–123, 127–129 |
| Accessible to only small datasets | Bayesian optimization | 104, 134, 137 and 138 |
While there are existing reviews on ML-based inverse design in materials and manufacturing,7,10,53–58 few comprehensively discuss the suitability of different methodologies given problem-specific characteristics. This review addresses that gap, offering guidelines for selecting appropriate ML methodologies in consideration of factors such as the scale of the design space and data fidelity. We classify inverse design problems into four categories with respect to the size of the dataset and design space and suggest appropriate design strategies for each case, as shown in Fig. 1. The first section considers an ideal case in which the design space is relatively small and the dataset is large enough to capture the overall input–output relationship throughout the design space, such that common interpolation-based inverse design schemes can be adopted without concern for the aforementioned challenges (Case 1). The second section considers design problems that have a vast design space (such as combinatorial or complex shape optimization problems with a high degree of freedom), such that one has to devise a way to mitigate the DNN's weak generalization performance outside the training set (Case 2). Here, an active learning-based gradual ML model update method or careful design of the DNN architecture is suggested to resolve the challenge. The third section highlights ML-based methods to close the systematic gap between simulations and experiments (or between a large low-cost, low-fidelity dataset and a small high-cost, high-fidelity dataset) (Case 3). Transfer learning or multi-fidelity regression methods are suggested for such a case, as these algorithms are capable of incorporating multiple datasets having similar properties. We note that the term “active learning” is applied when operating within the same data domain (for instance, supplementing a machine learning model initially trained with FEM data with more FEM data), whereas the term “transfer learning” is used when dealing with two distinct yet related datasets (such as FEM simulations and experimental data). The fourth section considers material design problems that have a relatively small design space and a small dataset available for the training of a surrogate model, usually because the objective function is too expensive to evaluate (Case 4). Such design problems can be approached by Bayesian optimization, a sequential design strategy that tries to reach the optimal solution with a minimal number of data acquisitions. Finally, the review closes by describing the ongoing challenges that are yet to be solved, as well as the prospects and future directions of ML-based materials research.
Fig. 1 This figure presents the four major types of inverse design problems in the context of ML-based methods in materials design and manufacturing, alongside representative research examples for each case. Case 1 illustrates scenarios where there is a wealth of data and a relatively constrained design space. The figure showcases (a) Inverse modeling network (Reproduction with permission from ref. 65. Copyright (2018) American Chemical Society), (b) Forward modeling network coupled with optimization algorithm (Reproduction with permission from ref. 59. Copyright (2020) American Chemical Society), and (c) Generative adversarial network (From ref. 93, Licensed under CC BY-NC 4.0) as examples of effective interpolation strategies in this context. Case 2 depicts large design space scenarios and the associated challenges of weak generalization performance of ML models. (d) Active learning and data augmentation (From ref. 100, Licensed under CC BY 4.0) and (e) Innovative DNN architecture (Reproduction with permission from ref. 106. Copyright (2022) Elsevier) are featured as solutions to enhance generalization and extrapolation in unseen domains. Case 3 highlights the challenge of reconciling differences between diverse datasets. Research approaches such as the (f) Surrogate model by Gaussian process theory (Reproduction with permission from ref. 111. Copyright (2022) Elsevier) and the (g) Surrogate model by neural network (Reproduction with permission from ref. 128. Copyright (2022) Elsevier) are showcased as potential solutions. Case 4 presents the conundrum of limited dataset and a small design space. (h) Single objective Bayesian optimization (Reproduction with permission from ref. 134. Copyright (2022) Elsevier) and (i) Multi-objective Bayesian optimization (Reproduction with permission from ref. 104. Copyright (2022) Springer Nature) are included as strategies to achieve optimal design under these circumstances.
In this section, we investigate the case of an inverse design problem in which the optimal design configuration does not deviate significantly from the scope of the initial training set. Here we refer to such a case as an interpolation problem. To effectively tackle the interpolation problem, we present some representative ML-based strategies that are suitable when the amount and reliability of the available data are sufficient to describe the input–output relation over the entire design space. In such a case, the initially trained ML model generally has excellent predictive performance over the entire design space, and thus the optimum design can be found without having to update the ML model during the optimization. The inverse modeling network, the forward modeling network combined with a conventional optimization scheme, and the recently emerged generative adversarial network (GAN) are reviewed in this section.
Several subsequent studies tried to improve the inverse model approach to overcome this limitation. Kabir et al. (2008) trained an inverse design neural network that predicts the values of geometrical parameters by taking the electrical parameters as input and then used the trained surrogate model for designing microwave guide filters. At first, the study divided the given training dataset into different groups such that no group contains samples having the same performance but different designs. Then, they constructed multiple inverse modeling networks for each group of training sets, which were later integrated into one comprehensive prediction model. As a result, the proposed method showed higher prediction accuracy than the conventional DNN models that are trained with all training samples at once (Fig. 2).64
Fig. 2 Comparison of the predictions made by the inverse modeling network trained using the direct inverse modeling method and the proposed division method for microwave guide filter design. The figure presents the predicted relationship between the inner mean diameter (Di) of a spiral inductor and the effective quality factor (Qeff). Applying the proposed division method reduces the error significantly, decreasing it from 13.6% to 0.05%. Reproduction with permission from ref. 64. Copyright (2008) IEEE.
However, although the division method could be readily implemented for inverse design problems that have a small and simple data structure, dividing the training data becomes far more challenging for a complex design space. Hence, as an alternative solution, Liu et al. (2018) proposed a tandem network architecture, in which an inverse modeling network is attached in front of a forward modeling network, as described in Fig. 3. To model the correlation between design and response (performance), the forward modeling network located at the back of the architecture (right side of the figure) is trained first. After fixing the weights trained in the previous step, the remaining inverse modeling network is trained to reduce the error between the predicted response and the desired response. Finally, the trained tandem network can generate a design candidate on the intermediate layer M by taking the desired response as an input. Even if there were multiple design solutions for one identical response in the training dataset, the forward modeling network depicting the many-to-one mapping was trained accurately. Furthermore, since the inverse modeling of the second training stage did not aim to reproduce the actual designs in the training data, the network could be trained effectively despite the data inconsistency. In this study, the proposed tandem-shaped network was applied to the design of a nanophotonic structure having the desired performance.65
Fig. 3 Proposed tandem-shaped neural network for inverse design problem of the nanophotonic structure. The forward modeling network is represented by dashed lines, with the inverse modeling network attached in front. The red nodes represent the response, while the blue nodes represent the design variables. Reproduced with permission from ref. 65. Copyright (2018) American Chemical Society.
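For readers unfamiliar with the tandem idea, the sketch below outlines it in PyTorch: a forward network (design → response) is trained first and frozen, and an inverse network (desired response → design) is then trained only through the error between the predicted and the desired response. All layer sizes and the placeholder data are illustrative assumptions and do not correspond to the actual network of ref. 65.

```python
# Minimal tandem-network sketch (illustrative sizes, not those of ref. 65).
import torch
import torch.nn as nn

forward_net = nn.Sequential(          # design -> response; assumed already trained
    nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 32))
inverse_net = nn.Sequential(          # desired response -> design candidate
    nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 8))

for p in forward_net.parameters():    # freeze the pretrained forward model
    p.requires_grad = False

opt = torch.optim.Adam(inverse_net.parameters(), lr=1e-3)

def tandem_loss(desired_response):
    design = inverse_net(desired_response)        # intermediate layer "M"
    predicted = forward_net(design)
    return nn.functional.mse_loss(predicted, desired_response)

# one training step on a batch of desired responses (placeholder data)
responses = torch.randn(16, 32)
loss = tandem_loss(responses)
opt.zero_grad(); loss.backward(); opt.step()
```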
The approach employing the inverse modeling network carries the benefit of rapidly proposing optimal design candidates once the neural network is trained. Consequently, this strategy has found applications across a range of inverse design problems.66–71 However, the dimension of the input variables that correspond to the material performance is usually lower than the dimension of the output variables that correspond to the materials design parameters, thereby limiting the dimension of the recommended optimal values. Such a problem may not be an issue in the case of simple problems in which the dimension of design variables is relatively small, but the strategy may not be applicable to more complex design problems with higher input and output dimensions.
In a sequential optimization strategy where the optimization process gradually approaches the global optimum by repeatedly augmenting the model with new data, the acquisition of a new dataset may take a considerable amount of time if numerous iterations of data augmentation are required. This is especially true when we use computer simulations and experiments that take significant time to predict the material performance for a given design variable set. Therefore, many studies have been conducted to efficiently find the optimum by combining an AI surrogate model with the workflow of existing data-driven optimization algorithms. For instance, Kim et al. (2020) combined the forward modeling network with a conventional genetic algorithm to optimize the structure of an axisymmetric adhesive pillar. In this study, a DNN-based surrogate model was trained with its input being 501 design variables that characterize the 2D shape of an adhesive pillar and its output being the interfacial stress distribution at the boundary between the pillar and a substrate. The stress distribution was compressed into 30 valid features through principal component analysis (PCA) for efficient training of the neural network. Also, in order to select adhesive pillars with the desired detachment type, an additional DNN for classification was trained. The trained neural networks were able to predict the output accurately and quickly for 1000 proposed adhesive pillar designs. Based on the predictive power of the networks, the genetic algorithm was performed, and the optimal pillar shapes that minimize the interfacial stress singularity were found, as shown in Fig. 4.59 Many other studies have performed optimization by combining the forward modeling network with optimization algorithms in various design problems.72–79
Fig. 4 (a) Design space and schematic of inverse design of the shape of the adhesive pillar combining the forward modeling network and the genetic algorithm. (b) Cross-sectional area and distribution of interfacial stress (σyy) normalized by ideal flat stress (σI) distribution for optimized adhesive pillar design with a sharp edge (Left) and a truncated edge (Right). Reproduction with permission from ref. 59. Copyright (2020) American Chemical Society.
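The coupling of a trained forward surrogate with a genetic algorithm can be condensed into a few lines of Python. The snippet below is a toy illustration only: the surrogate is a placeholder function rather than the actual 501-variable DNN with PCA-compressed outputs used in ref. 59, and the GA operators (uniform crossover, Gaussian mutation) are generic choices.

```python
# Illustrative GA over a trained forward surrogate (not the actual pipeline of ref. 59).
import numpy as np

def surrogate(x):
    # placeholder for a trained DNN predicting performance; higher is better
    return -np.sum((x - 0.3) ** 2, axis=1)

rng = np.random.default_rng(0)
pop = rng.random((200, 10))                 # 200 candidate designs, 10 design variables
for gen in range(50):
    fitness = surrogate(pop)                # cheap evaluation via the surrogate
    parents = pop[np.argsort(fitness)[-50:]]            # keep the fittest designs
    idx = rng.integers(0, 50, size=(200, 2))
    mask = rng.random((200, 10)) < 0.5
    pop = np.where(mask, parents[idx[:, 0]], parents[idx[:, 1]])  # uniform crossover
    pop += 0.02 * rng.standard_normal(pop.shape)                  # mutation
    pop = pop.clip(0.0, 1.0)

best = pop[np.argmax(surrogate(pop))]       # candidate to verify with FEM or experiment
```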
A different approach, called the generative inverse design network, finds the optimal designs having the desired performance by using back-propagation in neural networks. Generally, back-propagation is the process of optimizing the parameters (weights and biases) of the hidden layers to minimize the loss function, whose value quantitatively defines the error between the network's prediction and the ground truth.80 After the training stage, we can find the optimal design by fixing all neural network parameters except for the input features, so that the input values are tuned to minimize a loss function through back-propagation.81,82
For example, Peurifoy et al. (2018) solved the inverse design problem using the back-propagation-based approach to find the optimal thickness combination of a nanoparticle that produces a desired light scattering spectrum.81 First, to construct the AI regression model, the neural network was trained with data collected from a light scattering simulation, which generates the light scattering spectrum for a set of parametrized thickness values of multi-layered nanoparticles. Then, by fixing all weights except for the input features, the optimized particle designs having the desired scattering spectrum were obtained through back-propagation. The study demonstrated that the neural network (NN) outperformed the numerical non-linear optimization method by achieving a significantly closer minimum in some cases (Fig. 5).
Fig. 5 (a) Schematic of the neural network predicting the scattering cross section (σ/πr²) at varying wavelengths from the thickness value of each shell of the nanoparticle. (b) The desired spectrum (Blue), the spectrum from the NN-based design method (Orange dashed), and the spectrum from the non-linear numerical optimization method (Black dashed) for an eight-shell nanoparticle made of alternating shells of TiO₂ and silica. The numbers in the legend denote the designed input shell thickness. From ref. 81, Licensed under CC BY-NC 4.0.
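The design-by-back-propagation procedure described above can be summarized in a short PyTorch sketch: the surrogate's weights are frozen and gradient descent is applied to the input features themselves. The network, target spectrum, and bounds below are placeholders, not those of ref. 81.

```python
# Sketch of design by back-propagation: weights frozen, only the input is optimized.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 128), nn.ReLU(), nn.Linear(128, 200))  # trained surrogate (placeholder)
for p in model.parameters():
    p.requires_grad = False                       # freeze all network weights

target_spectrum = torch.rand(200)                 # desired response (placeholder)
design = torch.rand(8, requires_grad=True)        # e.g. shell thicknesses, now trainable
opt = torch.optim.Adam([design], lr=1e-2)

for step in range(2000):
    loss = nn.functional.mse_loss(model(design), target_spectrum)
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        design.clamp_(0.0, 1.0)                   # keep the design physically admissible
```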
GANs have also been employed to solve inverse design problems, as they can generate new design candidates with excellent predicted performance that remain similar to the designs within the original training set.91–93 Additionally, modified forms of GANs, such as the conditional GAN (CGAN)94 and the Wasserstein GAN (WGAN),95 have further expanded the scope of inverse design by making the training of the model easier and expanding the types of tasks that DL can perform.96–98
For example, Kim et al. (2020) applied the WGAN architecture to build a network called ZeoGAN and solved the inverse design problem of a porous material to obtain the desired level of methane heat of adsorption. Engineered to address common challenges in traditional GANs such as unstable training and mode collapse, the WGAN introduces a novel role for the discriminator. Unlike in a traditional GAN, where the discriminator only verifies the authenticity of data, the discriminator in a WGAN (often referred to as a critic to highlight the difference in roles) is trained to estimate the Earth-Mover's distance (EMD), or Wasserstein distance, which quantifies the minimal effort required to reshape the actual data distribution to match the artificial one. This serves as a robust indicator of their similarity. The WGAN's strategy of using the critic to estimate the EMD leads to improved stability during training and more precise evaluations of data similarity, which significantly boosts the efficiency of the training process and results in the generation of high-quality, realistic data. In this research, a generator that can create structure and energy distributions similar to those of the porous material from a noise input was trained, and the optimal candidates were obtained by modifying the generator's loss function to obtain the desired methane heat of adsorption (Fig. 6). Since the generator can be trained for other target properties, and the identical framework can be applied to other gases, the expandability of the optimization framework proposed in this research is superb.97
Fig. 6 Schematic representation of ZeoGAN architecture employed for inverse design of porous material. The generator and critic components were trained to minimize the EMD between real and generated inputs composed of materials grids (red representing silicon atoms and yellow representing oxygen atoms) and methane potential energy (green). An auxiliary neural network was trained to explicitly predict the lattice constant. From ref. 97, Licensed under CC BY-NC 4.0.
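A minimal WGAN-style training step is sketched below to make the critic's role concrete: the critic is trained to widen the gap between its scores on real and generated samples (an estimate of the Earth-Mover's distance, enforced here with the original weight-clipping Lipschitz constraint), while the generator is trained to raise the critic's score of its own samples. The shapes and networks are illustrative and not those of ZeoGAN in ref. 97; for inverse design, a property-matching term would be added to the generator loss.

```python
# Minimal WGAN training step (weight-clipping variant); shapes are illustrative only.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 1024))   # noise -> material grid
C = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 1))    # critic: scalar score
opt_c = torch.optim.RMSprop(C.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)

real = torch.randn(32, 1024)          # placeholder for real structure/energy grids
z = torch.randn(32, 64)

# critic step: maximize E[C(real)] - E[C(fake)], an estimate of the EM distance
loss_c = -(C(real).mean() - C(G(z).detach()).mean())
opt_c.zero_grad(); loss_c.backward(); opt_c.step()
for p in C.parameters():              # crude Lipschitz constraint via weight clipping
    p.data.clamp_(-0.01, 0.01)

# generator step: minimize -E[C(fake)]; a property-matching loss can be added here
loss_g = -C(G(z)).mean()
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```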
As another example, Yilmaz & German (2020) conducted an airfoil inverse design study using a CGAN. Unlike the GAN, the CGAN feeds a conditional vector during training to limit or add conditions to the image generated by the generator. In this study, the neural networks were trained to generate only airfoil shape designs with the desired stall angle by adding the design range and the angle of attack of the aircraft wing as a conditional vector. As a result, various airfoil designs that satisfy the required stall angle condition were successfully obtained (Fig. 7).98
The second section introduces AI model-based inverse design methodologies considering a vast design space and extrapolation tasks. The methods can be applied to inverse design problems in which the data has sufficient fidelity, yet the amount is not enough to describe the whole design space. Active learning strategy and novel network architecture will be reviewed in this section to effectively explore the unseen design domain.
For example, Kim et al. (2021) proposed a DNN-based forward design framework to explore the unseen design space efficiently. The researchers undertook the optimization task of arranging stiff and soft materials to maximize mechanical properties such as stiffness, strength, and toughness within an 11 × 11 grid composite consisting of 71 stiff and 50 soft blocks. The forward modeling network, trained on an initial training dataset of 100,000 randomly arranged samples from an enormous possibility space of 1.8 × 10³⁴ configurations, demonstrated limited predictive capability for well-ordered configurations significantly diverging from the initial dataset. The proposed method gradually reinforced the DNN model through active learning by repeatedly training the model with the new candidate designs suggested by greedy sampling and a genetic algorithm. Such a sequential training method allowed the model to propagate toward the optimal designs having excellent mechanical stiffness and strength in the vast design space (Fig. 8). The study shows that a new composite design with optimal stiffness and strength can be found in a very efficient way, the size of the augmented dataset, computed by material descriptor, being only 0.5% of that of the initial dataset. The study also highlighted that the surrogate model must at least have a ‘reasonable’ extrapolation performance if the model is to be used for greedy sampling and the genetic algorithm. For example, the DNN model used in this study was trained with the data having low material properties (lower 90%) in terms of stiffness and strength, and the model consequently showed inevitable prediction errors when dealing with the designs having the top 10% of material properties, as the model was carrying out an extrapolation task. However, although the model was not able to accurately predict the values of stiffness and strength for these designs, it was capable of determining the relative ranking of the designs by their performance. As a result, the surrogate could be combined with the greedy sampling algorithm and genetic algorithm, together forming the active transfer-learning framework. However, the same approach could not be adopted for the optimization of the composite for toughness, which corresponds to the total area under the stress–strain curve, as the DNN model failed to show the minimal predictive power needed to rank the predicted designs. Such insufficient predictive power can be improved based on domain knowledge from solid mechanics.100
Fig. 8 (a) Gradual expansion of the reliable prediction domain through data addition using the NN prediction-based genetic algorithm. (b) Optimized grid composite configuration for stiffness (Left) and strength (Right). (c) Increase in the stiffness (Left) and strength (Right) of grid composites on each update of the NN prediction-based genetic algorithm. From ref. 100, Licensed under CC BY 4.0.
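The active-learning loop described above can be condensed into the sketch below, in which an inexpensive analytic function stands in for the FEM evaluation and a random-forest regressor stands in for the DNN surrogate; the greedy step simply re-labels the surrogate's top-ranked candidates and augments the training set. These stand-ins are assumptions for illustration and differ from the actual models of ref. 100.

```python
# Schematic active-learning loop in the spirit of ref. 100 (toy stand-ins throughout).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def run_simulation(X):
    # stand-in for the expensive ground-truth evaluation (FEM or experiment)
    return np.sin(3 * X).sum(axis=1) - 0.1 * (X ** 2).sum(axis=1)

X = rng.random((100, 5)); y = run_simulation(X)            # small initial dataset
for iteration in range(10):
    model = RandomForestRegressor(n_estimators=100).fit(X, y)   # (re)train surrogate
    pool = rng.random((5000, 5))                                 # cheaply generated candidates
    pred = model.predict(pool)
    candidates = pool[np.argsort(pred)[-20:]]                    # greedy: top designs by surrogate
    labels = run_simulation(candidates)                          # verify only the promising ones
    X = np.vstack([X, candidates]); y = np.concatenate([y, labels])

print("best design found:", X[np.argmax(y)], "value:", y.max())
```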
The active transfer learning-based framework has been successfully applied to the optimization of composites and structures for other target properties. Demeke et al. (2022) adopted the active transfer learning-based optimization framework for the inverse design of a thermoelectric power generator to achieve high power and efficiency.101 Lee et al. (2022) found a superb lattice structure with high stiffness and strength by applying the framework to the optimization of the density and mechanical properties of a lattice structure composed of a crisscross pattern of beam elements. As such, the adaptive framework has provided a solution to optimization problems with a myriad of possible shapes.102
On the other hand, optimization using back-propagation combined with active transfer learning also enables exploration toward a wider design space. Chen and Gu (2020) introduced a generative inverse design networks (GIDNs) framework that recommends optimal designs based on back-propagation and active learning. The GIDNs framework proceeds in three stages: predictor training, recommendation of optima based on back-propagation, and active transfer learning. In the predictor training stage, the AI surrogate model that predicts the output performance of an unknown input design is trained with the initial training samples. Next, the desired values are put into the output layer, the weights of the hidden layers are fixed, and the optimal design candidates are recommended using back-propagation. Finally, the study proceeds with active transfer learning, which evaluates the actual performance of the recommended candidates so that the new dataset can be used to update the predictor. By iteratively performing the process above, the predictor is gradually updated to have high predictive power for higher-performance designs that are far from the initial training data (Fig. 9a). This study optimized the geometrical configuration of a grid composite structure, a 2D array of stiff and soft materials ordered in a random manner, to design a higher-toughness composite using the aforementioned inverse materials design method. The size of the initial training dataset was 800,000, and the designer network recommended 800,000 data points for active learning in each iteration. The number of data points needed for optimization was significantly smaller than the vast number of possible combinations (2¹²⁸) in the grid composite. Through this study, it was observed that the algorithm based on GIDNs and active learning was capable of identifying high-toughness designs that were not attainable with the initial predictor (Fig. 9b).103
Fig. 9 (a) The schematic of Generative Inverse Design Networks (GIDNs). The predictor is a trained DNN that predicts performance outputs based on input design variables, minimizing the error between real and predicted values. The designer produces optimized designs by back-propagation. The optimized designs are utilized for active learning, updating both the predictor and designer for further iterations. (b) Increase of grid composite toughness during each iteration of the GIDNs-based optimization process for three different volume fractions (12.5%, 25%, and 50%). The numbers displayed below the composite configurations represent their corresponding toughness values. From ref. 103, Licensed under CC BY 4.0.
Furthermore, Jung et al. (2022) proposed a process parameter optimization method for the injection molding process using constrained GIDN (CGIDN). The conventional GIDN has a limitation in that the recommended input design from the desired output is unbounded. The CGIDN proposed in the study recommends the process parameter set within a desired range by applying a constraint to the input layer via the sigmoid function. As a result, they were able to find the optimal injection molding process parameter set that simultaneously minimizes deflection after injection and the cycle time required for production.104
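One simple way to impose such input bounds, in the spirit of the CGIDN's sigmoid constraint, is to optimize an unconstrained latent vector and map it through a sigmoid onto the admissible range, as sketched below. The predictor, bounds, and objective weights are illustrative placeholders, not the injection-molding setup of ref. 104.

```python
# Sketch of sigmoid-bounded input optimization (CGIDN-like idea; all values illustrative).
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # trained surrogate (placeholder)
for p in predictor.parameters():
    p.requires_grad = False

lower = torch.tensor([10., 0.5, 200., 1.0])   # illustrative lower bounds of 4 process parameters
upper = torch.tensor([60., 5.0, 320., 10.0])  # illustrative upper bounds
z = torch.zeros(4, requires_grad=True)        # unconstrained latent design vector
opt = torch.optim.Adam([z], lr=0.05)

for step in range(1000):
    design = lower + (upper - lower) * torch.sigmoid(z)   # always inside [lower, upper]
    deflection, cycle_time = predictor(design)
    loss = deflection + 0.1 * cycle_time                  # weighted objective (illustrative)
    opt.zero_grad(); loss.backward(); opt.step()
```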
Previously, the U-Net architecture, which has shown successful results in image-to-image regression in various engineering and scientific fields, was utilized to predict the local stress field for an unknown configuration of the grid composite. U-Net was able to successfully predict the stress and strain fields corresponding to the composite constituents by compressing the spatial information of the composite shape and then supplementing and expanding the compressed information. Yet, U-Net predicts the local material fields without considering various spatial kernel effects and thus discards a lot of information from an algorithmic point of view. Hence, it was difficult to guarantee the generalizability of the model over a vast design space, indicating that U-Net still has limitations in making predictions for a grid composite structure that is very different from the training datasets in terms of the relative volume fraction (VF) of the two constituent materials.107
To enhance the generalizability of the prediction model, it is crucial to thoroughly capture the correlation between the arrangement of the two constituents of the grid composite and the local mechanical deformation. For this purpose, Park et al. (2022) proposed a neural network architecture that combines kernels of different sizes, rather than using fixed-size kernels as in the U-Net, to efficiently extract the relationship between composite configuration and strain field at multiple scales. Specifically, the convolutional unit in the encoder section simultaneously utilizes three kernel sizes: 2 × 2, 4 × 4, and 8 × 8 (Fig. 10a), allowing effective usage of both local and global information in grid composites. The feature maps from these kernels are subsequently merged through a feature fusion layer (addition) and carefully concatenated using skip connections to mitigate gradient loss. The concatenated feature maps are then squeezed by bottleneck layers and passed through max-pooling layers to reduce the dimension of the feature maps while preserving meaningful information. Fig. 10b illustrates a schematic of the multi-dense block structure describing the aforementioned process. This study then applied the transpose convolution operation in the decoder section to recover the dimension of the feature maps reduced in the encoder section.
Fig. 10 (a) Feature extraction using multiple kernels having varying sizes. (b) The schematic of the multi-dense block structure. (c) Comparison of the generalizability of M-Net and U-Net. M-Net shows better predictive power on the extrapolation test. Reproduction with permission from ref. 106. Copyright (2022) Elsevier.
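A minimal PyTorch sketch of such a multi-kernel encoder block is given below: parallel 2 × 2, 4 × 4, and 8 × 8 convolutions are fused by addition, concatenated with the input via a skip connection, squeezed by a 1 × 1 bottleneck, and max-pooled. Channel counts are illustrative assumptions, and the block is only a schematic of the idea, not the published M-Net of ref. 106.

```python
# Sketch of a multi-kernel ("multi-dense") encoder block; channel counts are illustrative.
import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=k, padding='same')   # stride 1 required for 'same'
            for k in (2, 4, 8)])
        self.bottleneck = nn.Conv2d(in_ch + out_ch, out_ch, kernel_size=1)  # squeeze after skip concat
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        fused = sum(branch(x) for branch in self.branches)   # feature fusion by addition
        fused = torch.relu(fused)
        skip = torch.cat([x, fused], dim=1)                   # skip connection by concatenation
        return self.pool(torch.relu(self.bottleneck(skip)))

block = MultiKernelBlock(1, 16)
out = block(torch.randn(4, 1, 32, 32))    # e.g. a batch of binary composite grids
```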
The proposed M-Net architecture successfully predicted the strain field of grid composite structures having a VF significantly different from the designs in the training set (Fig. 10c). In addition, the modified model showed equivalent predictive performance even with a dataset roughly one-third the size of that used for the existing model. This research clearly showed that one can redesign the DNN architecture to suit the purpose of the optimization problem. Here, the modified network exhibited excellent extrapolation performance near the optimal design, even without the sequential active learning process introduced in the previous section.
Furthermore, Park et al. (2023) proposed a double generative network (DGNet) to explore the design space over extrapolation regions having higher effective stiffness than the initial training dataset. This model aims to predict the stress and strain fields from the composite material configuration and derive the effective stiffness from the predicted stress field. In order to sequentially utilize the shape and grid-averaged fields as input features for predicting composite stress fields, Park et al. suggested a deep learning framework that consists of two generators based on the M-Net architecture. The first generator predicts grid-averaged fields, which are then fed into the second generator, producing high-resolution stress fields. To ensure generalized predictive performance in extrapolation regimes, a CGAN was utilized to train the deep learning framework adversarially (Fig. 11). The proposed deep learning framework accurately predicted structures with superior stiffness compared to the initial training dataset, whereas the conventional U-Net showed a significant degradation in predictive performance in the extrapolation regions.105 We foresee more research toward new DNN architecture designs in the material/structure optimization field to enable a faster and more efficient search for optimal materials designs outside the initial training set.
Fig. 11 The architecture of DGNet. The DGNet is composed of two generators, G-1 and G-2. G-1 generates a grid-averaged image from the shape of the digital composite, and G-2 generates a high-resolution stress/strain field image. The DGNet is trained adversarially, and the effective stiffness is estimated from the generated high-resolution image. Reproduction with permission from ref. 105. Copyright (2023) Elsevier.
Two notable approaches exist for Gaussian process-based multi-fidelity surrogate modeling. One approach involves constructing an extended correlation matrix that considers the relationship between low- and high-fidelity data.108 The other approach entails creating a low-fidelity surrogate model first and then using high-fidelity data to correct this low-fidelity surrogate model.109 While both methods exhibit superior performance compared to traditional single-fidelity surrogate models, the relative superiority of one method over the other has not been thoroughly validated. However, in cases where low-fidelity data are abundant and inexpensive, and where additional high-fidelity data can be generated, the latter approach is generally known to be more efficient from the perspective of surrogate modeling.
Fig. 12 illustrates a single-fidelity model created using only high-fidelity data, and a multi-fidelity model built by incorporating low-fidelity data. The multi-fidelity surrogate model outperforms the high-fidelity surrogate model in terms of accuracy, as shown in Fig. 12. This superior performance can be attributed to the low-fidelity data effectively capturing the overall trend of the high-fidelity system.
Fig. 12 Concept illustration of multi-fidelity surrogate modeling. Reproduction with permission from ref. 111. Copyright (2022) Elsevier.
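The "correct the low-fidelity model" strategy can be illustrated with a toy one-dimensional example: a GP is fitted to abundant low-fidelity data, a scaling factor is estimated at the few high-fidelity points, and a second GP models the remaining discrepancy. The functions and the crude scaling estimate below are assumptions for illustration and do not reproduce the exact hierarchical Kriging or co-Kriging formulations of refs. 108–111.

```python
# Toy sketch of a correction-based multi-fidelity GP (hierarchical-Kriging flavour).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f_low(x):  return 0.5 * np.sin(8 * x) + 0.2 * x     # cheap, biased model
def f_high(x): return np.sin(8 * x) + 0.1               # expensive ground truth

X_lf = np.linspace(0, 1, 40)[:, None]; y_lf = f_low(X_lf).ravel()    # abundant LF data
X_hf = np.linspace(0, 1, 6)[:, None];  y_hf = f_high(X_hf).ravel()   # scarce HF data

gp_lf = GaussianProcessRegressor(kernel=RBF(0.1)).fit(X_lf, y_lf)    # low-fidelity GP

rho = np.polyfit(gp_lf.predict(X_hf), y_hf, 1)[0]       # crude estimate of the scaling factor
gp_delta = GaussianProcessRegressor(kernel=RBF(0.1)).fit(
    X_hf, y_hf - rho * gp_lf.predict(X_hf))             # GP on the remaining discrepancy

X_test = np.linspace(0, 1, 200)[:, None]
y_mf = rho * gp_lf.predict(X_test) + gp_delta.predict(X_test)   # multi-fidelity prediction
```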
Numerous studies have constructed multi-fidelity surrogates by categorizing high- and low-fidelity models according to various criteria. For instance, Guo et al. (2021) and Lee et al. (2022) distinguished between high-fidelity models (fine mesh) and low-fidelity models (coarse mesh) by implementing different mesh sizes for variable stiffness composites and railcar structures, respectively.110,111 Yong et al. (2019) distinguished high-fidelity models (3D elements) and low-fidelity models (2D elements) by using different mesh types for gas turbine engines.112 Moreover, Liu et al. (2020) developed a low-fidelity model of a mesostructure using homogenized effective dynamic properties.113 In light of these examples, high- and low-fidelity models can be differentiated based on experiments/simulations, simulations/analytical functions, or non-linear solvers/linear solvers, among others, depending on the situation. Moreover, instead of considering only two fidelities (i.e., bi-fidelity), the multi-fidelity approach can be extended if experiments or analytical functions are available. In such cases, fidelities can be divided into levels such as experiment, high-fidelity simulation model, low-fidelity simulation model, and analytical function.
Some studies have attempted to determine whether or not a multi-fidelity surrogate model should be utilized. Various approaches based on maximum likelihood estimation,114 normalized cross-validation error,114 and Pearson correlation coefficient115,116 metrics have been proposed. These methods do exhibit some limitations, as reported in ref. 114–116. However, to summarize the literature, the consensus among most studies is that higher values of the Pearson correlation coefficient between low- and high-fidelity data generally favor the use of multi-fidelity models over single-fidelity models.
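In practice this screening step reduces to computing the Pearson correlation between low- and high-fidelity responses evaluated at common sample points, as in the short sketch below; the synthetic data and the 0.9 threshold are illustrative assumptions.

```python
# Illustrative fidelity-screening check via the Pearson correlation coefficient.
import numpy as np

rng = np.random.default_rng(1)
y_hf = rng.standard_normal(15)                      # high-fidelity responses at 15 common points
y_lf = 0.8 * y_hf + 0.2 * rng.standard_normal(15)   # low-fidelity responses at the same points

r = np.corrcoef(y_lf, y_hf)[0, 1]                   # Pearson correlation coefficient
use_multi_fidelity = r > 0.9                        # illustrative threshold
```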
As an application example, Lee et al. (2022) applied the Gaussian process-based multi-fidelity surrogate framework to a real-world large-scale system. They constructed high- and low-fidelity models of a railcar structure with different mesh densities, as shown in Fig. 13. The computational cost ratio between the high- and low-fidelity models was 70, and the relative error in accuracy between the two models was approximately 70%. Furthermore, since the value of the Pearson correlation coefficient between the low-fidelity and high-fidelity data exceeded 0.9, the creation of a multi-fidelity surrogate model was recommended. In this study, a single-fidelity model was created using the Kriging method with 15 high-fidelity samples. In addition, the hierarchical Kriging model, which is one of the multi-fidelity surrogate modeling methods, was constructed utilizing the same 15 high-fidelity samples and an additional 100 low-fidelity samples. Comparing the performance of the two models, the accuracy increased by about three times while the computational cost remained almost the same. Moreover, both accuracy and efficiency improved compared to a Kriging model that used 20 high-fidelity samples. Through this application, it was demonstrated that the performance of the multi-fidelity framework was superior to that of the single-fidelity framework in terms of both accuracy and efficiency.111
Fig. 13 Finite element model of railcar with 5 design variables: (a) side view of high-fidelity model and (b) side view of low-fidelity model. Reproduction with permission from ref. 111. Copyright (2022) Elsevier.
When low-fidelity data are not significantly cheaper than high-fidelity data, an adaptive sequential sampling approach should be employed to efficiently find optimal solutions.117–119 In such cases, appropriate utilization of low- and high-fidelity data is crucial. Huang et al. (2006) suggested a Co-Kriging-based sequential sampling method, introducing a measure utilizing the cross-correlation coefficient and the cost ratio between low- and high-fidelity models.117 Subsequently, Zhang et al. (2018) proposed the variable-fidelity expected improvement method, a hierarchical Kriging-based sequential sampling method.118 This method provides information on the location and fidelity level of the next sample point using the scaling factor and the uncertainty information of the low- and high-fidelity models. Furthermore, when parallel computing is possible, strategies for efficient sequential sampling within the allocated batch size have been investigated.120 Through these sequential sampling strategies, the optimization process can be conducted efficiently without wasting either high-fidelity or low-fidelity data. Therefore, these approaches compare favorably to simply adding either high-fidelity or low-fidelity data.
In the era of big data, many researchers are actively employing neural network-based multi-fidelity surrogate models. In such models, a loss function consisting of the difference between predicted and true values (e.g., mean squared error) is typically minimized to estimate the parameters (e.g., weights and biases). If multi-fidelity datasets are available, an additional loss term is added to the original loss function to construct the neural network-based multi-fidelity surrogate model. This approach has been successfully applied to a variety of engineering problems.55,121–123 Moreover, if low-fidelity data are used to incorporate information from analytical functions (e.g., partial differential equations), this concept aligns with the idea of physics-informed neural networks.124,125
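A generic way to realize such a composite loss is sketched below: a shared trunk feeds a low-fidelity head and a high-fidelity head, and the two data losses are combined with a weighting factor. This is an illustration of the general idea under assumed shapes and placeholder data, not any specific published architecture.

```python
# Sketch of a composite multi-fidelity loss for a neural-network surrogate (illustrative).
import torch
import torch.nn as nn

trunk = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh())
head_lf = nn.Linear(64, 1)     # predicts the low-fidelity response
head_hf = nn.Linear(64, 1)     # predicts the high-fidelity response
params = list(trunk.parameters()) + list(head_lf.parameters()) + list(head_hf.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

X_lf, y_lf = torch.rand(500, 3), torch.rand(500, 1)   # abundant cheap data (placeholders)
X_hf, y_hf = torch.rand(20, 3), torch.rand(20, 1)     # scarce expensive data (placeholders)

lam = 0.5                                             # weight of the low-fidelity loss term
for epoch in range(2000):
    loss = nn.functional.mse_loss(head_hf(trunk(X_hf)), y_hf) \
         + lam * nn.functional.mse_loss(head_lf(trunk(X_lf)), y_lf)
    opt.zero_grad(); loss.backward(); opt.step()
```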
In addition, the concept of transfer learning can also be applied to handle multi-fidelity datasets. Transfer learning is a strategy to construct a neural network that learns the input–output correlations of the desired dataset (typically, a small dataset with high fidelity) by using the dataset to fine-tune a pre-trained network, which is initially trained with a preliminary dataset (typically, a large dataset with low fidelity). The fine-tuning process of transfer learning refers to either retraining only a few hidden layers of the pre-trained neural network or slightly modifying the overall weights with a reduced learning rate and fewer epochs. Hence, if a pre-trained model exists for a similar task (pertaining to the preliminary dataset), it is easier to develop a surrogate model of interest, as the fine-tuning of the pre-trained model generally requires a dataset of relatively small size. Owing to this advantage, transfer learning has drawn considerable attention, particularly in research fields that have multiple sources of data: one domain where data can be easily collected and another where data acquisition is difficult.126
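The fine-tuning procedure can be sketched as follows: pre-train on the large low-fidelity dataset, freeze the early layers, and retrain only the last layer on the small high-fidelity dataset with a reduced learning rate and fewer epochs. Network sizes, datasets, and hyperparameters are illustrative placeholders.

```python
# Sketch of transfer learning by fine-tuning (illustrative sizes and placeholder data).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))

# 1) pre-training on the large, low-fidelity (preliminary) dataset
X_pre, y_pre = torch.rand(5000, 5), torch.rand(5000, 1)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for epoch in range(500):
    loss = nn.functional.mse_loss(net(X_pre), y_pre)
    opt.zero_grad(); loss.backward(); opt.step()

# 2) freeze all layers except the last one
for layer in net[:-1]:
    for p in layer.parameters():
        p.requires_grad = False

# 3) fine-tuning on the small, high-fidelity dataset with a reduced learning rate
X_ft, y_ft = torch.rand(100, 5), torch.rand(100, 1)
opt_ft = torch.optim.Adam(net[-1].parameters(), lr=1e-4)
for epoch in range(50):                                  # fewer epochs
    loss = nn.functional.mse_loss(net(X_ft), y_ft)
    opt_ft.zero_grad(); loss.backward(); opt_ft.step()
```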
For example, Xu et al. (2021) applied the transfer learning technique to build an AI surrogate model that predicts the material properties of grid composites from the microstructure of the composites. First, the study developed a pre-trained CNN model that predicts a statistical parameter dataset (500 samples), called the analytical solution of geometry and distribution features (ASGDF), for a given grid configuration, which pertains to a problem whose input–output data can be computed relatively easily. After that, the pre-trained CNN was fine-tuned with a smaller number of FEM datasets (208 samples) so that the final model could predict the effective elastic modulus of the composite. By using transfer learning, it was possible to reduce the amount of FEM data for CNN training by half (Fig. 14).127
Fig. 14 Transfer learning workflow for microstructure–property prediction using a CNN. The CNN is pre-trained with the ASGDF labels, which are easier to compute, and then fine-tuned with the real mechanical properties. Reproduction with permission from ref. 127. Copyright (2021) Elsevier.
As another example, Jung et al. (2022) used the transfer learning technique to predict the non-linear mechanical responses of fiber-reinforced composites. The mean field homogenization technique can quickly compute the non-linear mechanical response beyond yield for composites containing ellipsoidal reinforcement, based on a few theoretical assumptions. However, if the shape of the reinforcement is not ellipsoidal or the volume fraction of the reinforcement material is higher than 20%, the prediction accuracy drops significantly. In contrast, although it demands a higher computational cost and time, finite element method (FEM)-based calculations with a fine mesh provide data of higher accuracy than those computed with the homogenization theory. This study pre-trained the DNN using 49,000 homogenization-based data points and then fine-tuned the DNN with 1400 high-accuracy FEM data points. As a result, the constructed DNN could accurately predict the elastoplastic response of given composite geometries. The transfer-learned AI model showed higher prediction accuracy than a DNN trained only with FEM data. In Fig. 15a, model A refers to the model pre-trained with homogenization-based data, model B is obtained by fine-tuning model A with FEM data, and model C is trained with FEM data only. By fine-tuning to the specific target task (here, the target inclusion geometry), model B shows better prediction performance than model C, which suffers from overfitting due to the insufficient dataset size. Fig. 15b compares the predictions of models B and C against the ground truth FEM results for the four cases with the largest relative error for an ellipsoidal particle-reinforced composite; model B shows refined prediction performance compared to model C.128
Fig. 15 (a) The schematic of the training process of the transfer-learned DNN model for predicting the elastoplastic behavior of fiber-reinforced composites. Model A is trained using a homogenization dataset to capture the overall trend, while model B is fine-tuned using FEM data to improve the accuracy of the exact values. Model C is trained with a small FEM dataset, which was not sufficient to capture all details, and suffers from overfitting. (b) Comparison of predictions against ground truth FEM data for the four cases with the largest relative error for an ellipsoidal particle-reinforced composite. Reproduction with permission from ref. 128. Copyright (2022) Elsevier.
In addition, multi-fidelity surrogate models can be combined with an appropriate optimization algorithm to solve the inverse design problem. For instance, Dong et al. (2021) combined a DNN-based surrogate model constructed by transfer learning with conventional data-driven optimization algorithms, such as genetic algorithms and Bayesian optimization, to inverse-design an optical material (composite metal oxides) having a desired light absorption spectrum. To be specific, the purpose of the study was to find the mole ratio of a listed material composition that results in the desired absorption spectrum. The challenge lay in the training of an AI model, as a relatively small number of data points was available for the materials in the list. To overcome this hurdle, the researchers pre-trained the initial model with a large pool of available datasets, although their material compositions were different from the materials of interest. After that, they fine-tuned the pre-trained model with a small number of data points having the material compositions of interest. As a result, they were able to construct a surrogate model that predicts the absorption spectrum based on the mole ratio of the material composition. Then, the genetic algorithm and Bayesian optimization were carried out with the transfer-learned surrogate model to discover the optimal design.129
Gaussian process regression (GPR) is a representative regression methodology that can estimate the predicted value and its reliability at the same time.131 GPR assumes that the data points follow a multivariate Gaussian distribution and defines a covariance function between the data points to calculate the mean, which corresponds to the prediction at an input point, and the standard deviation, which indicates the reliability of the prediction. The BO algorithm then computes the ‘acquisition function’ of various design candidates based on the mean and variance estimated by GPR, and the design that has the highest acquisition function value is recommended as the design to be evaluated next. The expected improvement function, one of the most well-known acquisition functions, is calculated as a weighted summation of an exploitation part, which is related to finding a value close to the optimum, and an exploration part, which is related to the uncertainty of the model. An expected improvement function with an appropriate balance between exploration and exploitation should be used in order to effectively approach the global optimum.132,133
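The sketch below shows a Bayesian-optimization loop with the expected-improvement acquisition: the GPR posterior mean and standard deviation at candidate points are combined so that candidates with either a promising mean (exploitation) or a large uncertainty (exploration) score highly. The toy objective, kernel, and ξ parameter are illustrative assumptions, not the experimental setup of ref. 134.

```python
# Sketch of Bayesian optimization with the expected-improvement (EI) acquisition.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(mu, sigma, y_best, xi=0.01):
    # EI for maximization: exploitation (mu - y_best) weighted against exploration (sigma)
    z = (mu - y_best - xi) / np.maximum(sigma, 1e-12)
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

f = lambda x: np.sin(5 * x) * (1 - x)                       # expensive objective (toy stand-in)
X = np.array([[0.1], [0.5], [0.9]]); y = f(X).ravel()       # small initial dataset

for iteration in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    X_cand = np.linspace(0, 1, 500)[:, None]                 # candidate designs
    mu, sigma = gp.predict(X_cand, return_std=True)
    x_next = X_cand[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, [x_next]]); y = np.append(y, f(x_next))   # evaluate and augment
```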
Recently, Park et al. (2022) adopted BO to optimize the toughness of a staggered platelet composite structure, one of the representative biomimetic composite structures mimicking nacre. This composite material has a structure in which a stiff material is placed in a brick-like form within a soft polymer matrix. Because the prediction accuracy of toughness from either analytical models or computer simulations is not satisfactory, the authors collected the toughness data by using a 3D printer to build actual composites and conducting uniaxial tensile tests. With this accurate, yet expensive-to-evaluate, data collection method, they designed the maximum-toughness structure via Bayesian optimization with a relatively small number of experiments. The initial training phase utilized 14 data points, and the optimization process required only 5 additional data points. The results from the study are visualized in Fig. 16.134
Fig. 16 (a) Design variable setting for the optimization of the staggered platelet composite structure. The length ratio (ξ) of this problem is fixed at 0.09. (b) The GPR model for varying φ while keeping ρ fixed at 6.7. The solid line denotes the regression mean, and the shaded area indicates the standard deviation of the regression (Left). Additionally, the heatmap of expected improvement is presented (Right). (c) 14 initial experimental data points and 5 data points added by BO in each iteration (Left), and the performance values of the data points (Right). Reproduction with permission from ref. 134. Copyright (2022) Elsevier.
BO can also be extended to solve problems involving multiple objective functions. Multi-objective Bayesian optimization (MBO) aims to find Pareto-optimal solutions for multiple objective functions in a trade-off relationship (such as toughness and strength for structural materials, or production speed and defect rate for a manufacturing process).135,136 Recently, several studies in materials design and manufacturing have adopted MBO to solve inverse design problems with multiple objectives.104,137,138 For example, Jung et al. (2022) applied MBO to optimize the injection molding process parameters and were able to determine the Pareto-optimal process conditions that minimize both the cycle time to produce a single product and the deflection that occurs after production. An initial set of 10 data samples was generated for training the GPR-based surrogate model; subsequently, the optimization process was carried out with 250 additional iterations, incorporating the collection of new data points (Fig. 17).104
Fig. 17 (a) Plot of the results of MBO for every 50 iterations. As the number of iterations increases, the Pareto line becomes more distinct. A design can be selected from the data points on the plotted Pareto line that meet the specified condition. (b) The deflection distribution for the optimized process parameter set, together with the corresponding cycle time and maximum deflection. Reproduction with permission from ref. 104. Copyright (2022) Springer Nature.
This review categorizes several ML-based optimization methodologies according to the characteristics of the available training data and the size of the design space. First, in the case where sufficiently large training data are available to capture the input–output trend over the entire design space, inverse modeling networks, conventional optimization methods combined with forward modeling networks, and GANs are suggested as suitable methods. Second, when the initial training set and the optimum are far apart in a vast design space, methodologies resolving the extrapolation challenge are introduced: the gradual update of an ML model via active transfer learning, and the design of an improved neural network architecture. Third, when two datasets with different fidelities are present, the domain transfer of an AI model using transfer learning was introduced. Finally, under the scarcity of data due to the objective function being too expensive to evaluate, we suggest a Bayesian optimization framework that makes efficient use of the data to determine the global optimum.
Despite the advent of numerous innovative AI model-based inverse design methods, substantial challenges persist in effectively implementing AI models in the manufacturing and materials design sectors. Foremost, procuring the initial training data for building the AI surrogate model can be time-consuming, especially for problems with vast design spaces, which demand several hundreds to thousands of initial training data points. Moreover, the issue of extrapolation during the design phase frequently necessitates consideration, even after data acquisition. It is clear that future research must focus on devising methodologies that can efficiently leverage minimal data for inverse design in high-dimensional design spaces. For instance, physics-informed neural network (PINN)-based methodologies have been extensively investigated recently to address the data paucity and the extrapolation challenge inherent in solving design problems.124,139 Such data-efficient deep learning surrogate models could be integrated with suitable optimization algorithms for inverse design. Furthermore, the quality of data currently accessible in the manufacturing industry remains suboptimal; a large portion of the experimental data obtained from the field is either unlabeled or noisy. As a result, research on optimizing parameters in manufacturing processes using ML-based approaches has been less prevalent than studies focusing on material composition or structural design. This review primarily discusses inverse design methodologies grounded in supervised learning using labeled data. However, exploration of semi-supervised training-based methodologies, capable of utilizing unlabeled data to create surrogate models, is a promising avenue for further research in data-driven inverse design.140,141 Lastly, it is important to note that even in the most straightforward scenario of interpolation, significant errors can often manifest in ML models. Hence, it is crucial not to rely overly on ML and blindly trust its outcomes without checking the quantity and quality of the dataset and the prediction accuracy of the ML model.
In conclusion, ML-based inverse design frameworks have become an innovative route for solving complex inverse design problems that could not be handled before. However, it is crucial to choose the right algorithms according to the characteristics of the dataset and design space, and this review provides a concise guideline for doing so in the field of materials design and manufacturing. In the future, design methodologies that can exploit small, unlabeled, and noisy datasets should be further investigated to extend the impact of data-driven design methods to more practical engineering applications. At the same time, in addition to the development of algorithms, efforts to establish databases composed of standardized, high-quality datasets should proceed in parallel.