Mani Valleti,*a Rama K. Vasudevan,b Maxim A. Ziatdinovbc and Sergei V. Kalinin*d
aBredesen Center for Interdisciplinary Research, University of Tennessee, Knoxville, TN 37916, USA. E-mail: svalleti@vols.utk.edu
bCenter for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
cComputational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
dDepartment of Materials Science and Engineering, University of Tennessee, Knoxville, TN 37916, USA. E-mail: sergei2@utk.edu
First published on 4th November 2022
Automated chemical synthesis, materials fabrication, and spectroscopic physical measurements often bring forth the challenge of process trajectory optimization, i.e., discovering the time dependence of temperature, electric field, or pressure that gives rise to optimal properties. Due to the high dimensionality of the corresponding vectors, these problems are not directly amenable to Bayesian optimization (BO). Here we propose an approach based on the combination of generative statistical models, specifically variational autoencoders (VAEs), and Bayesian optimization. The set of potential trajectories is formed based on best practices in the field, domain intuition, or human expertise. The variational autoencoder encodes the generated trajectories as latent vectors and also allows new trajectories to be generated by sampling from the latent space. In this manner, Bayesian optimization of the process is realized in the latent space of the system, reducing the problem to a low-dimensional one. We apply this approach to a ferroelectric lattice model and demonstrate that it allows the discovery of field trajectories that maximize the curl in the system. Analysis of the corresponding polarization and curl distributions allows the relevant physical mechanisms to be decoded.
Common to materials discovery, optimization, and imaging problems is the need to explore non-convex parameter spaces toward desired functionality.8–11 For materials synthesis, this is often the compositional space of a selected multicomponent phase diagram12,13 or the processing history.14,15 For imaging, it is the image plane of the object16 or the parameter space of the microscope controls.17 From the optimization perspective, the critical considerations are the dimensionality and completeness of the parameter space and the properties of the target function defined over it.
For compositional spaces in synthesis and instrument tuning alike, the parameter space is generally low-dimensional and complete, often allowing the use of classical Bayesian-optimization-based methods.18–21 In cases where the function defined over the parameter space is discontinuous, classical BO methods fail. However, if the character of these behaviors is partially known, the use of structured Gaussian processes with a mean function tailored to the specific behavior makes it possible to address this problem.22
However, many problems, including molecular and materials discovery,23 active learning of structure–property relationships,24,25 and process optimization,26 require optimization in very high-dimensional spaces, e.g., the chemical space of the system,27 possible processing trajectories, or local microstructures. It is well recognized that a solution to this problem is possible if the behaviors of interest form low-dimensional manifolds in these high-dimensional spaces, and hence the active learning problem is inextricably linked to the discovery of this manifold.
Recently, we demonstrated the use of deep kernel learning methods for the discovery of structure–property relationships.28–31 Here, the algorithm has access to the full structural information in the system and seeks to explore the relationship between structure and functionality through sequential measurements. The deep kernel learning part builds the correlative relationship between structure (e.g., image patch) and properties (spectra), whereas physics-based search criteria guide the discovery toward specific objects of interest.
Here, we extend this concept toward optimization in high-dimensional spaces for applications such as materials processing or adaptive imaging and spectroscopy in physical measurements. From a practical viewpoint, we aim to optimize not the full set of possible trajectories, but a practically relevant subset that contains sufficient variability to approach a desired target. This subset can be chosen based on domain expertise, prior published data, or domain-specific intuition. One way to achieve this is to use trajectories parametrized via a certain functional form; however, this approach also leads to large parameter spaces. Hence, we introduce the concept of deep kernel learning on virtual spaces and demonstrate it for a lattice model of a ferroelectric material. The proposed approach is, however, universal and can be applied to other processing problems.
As a model family of trajectories, we use an expansion in Legendre polynomials Lp(x),

$$f(x) = \sum_{p=0}^{N} A_p L_p(x) \quad (1)$$
Here, as an illustration, we generate 5000 trajectories governed by eqn (1), where the coefficients A′p are sampled from a uniform distribution on [0, 1] and N = 14. These randomly sampled coefficients are then scaled as Ap = A′p/p (for p ≥ 1) to reduce the effects of higher-order polynomials and make the trajectories more practical. Twenty-five randomly sampled trajectories from the generated family are shown in Fig. 2a. The generated trajectories lie in a 15-dimensional space whose pth dimension corresponds to the Legendre polynomial Lp and, by construction, this family contains trajectories that represent practically important processing functions, e.g., the temperature profile during field annealing, the field evolution during poling of a ferroelectric material, etc. However, optimization in such a high-dimensional space is complex.
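As a concrete illustration, a minimal NumPy sketch of this sampling scheme is given below; the 100-point grid on [−1, 1] and the random seed are our assumptions, not values from the paper.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)
N, n_traj = 14, 5000
x = np.linspace(-1, 1, 100)          # trajectory support (assumed grid)

# Sample A'_p ~ U[0, 1], then damp higher orders: A_p = A'_p / p for p >= 1
A = rng.uniform(0.0, 1.0, size=(n_traj, N + 1))
A[:, 1:] /= np.arange(1, N + 1)

# Evaluate f(x) = sum_p A_p L_p(x) for every coefficient vector
trajectories = np.stack([legendre.legval(x, a) for a in A])  # (5000, 100)
```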
Fig. 2 (a) Twenty-five randomly selected functions generated by eqn (1); the y-axis and x-axis limits are set to [−1, 1] for all plots and are omitted for brevity. (b) The latent space is uniformly sampled on [−1, 1] in each latent dimension, and the sampled points are decoded back into the space of functions. (c) Functions generated by eqn (1), represented by their encodings in the latent space and colored by RMSE, illustrating reconstruction quality vs. latent encoding. (d) RMSE between the input functions and their encoded–decoded reconstructions as a function of the number of latent dimensions.
We subsequently explore the encoding of the generated functions into a low-dimensional latent space via the variational autoencoder (VAE) described in our previous works.25,32–35 Here, the trajectories generated as described above act as the input to the VAE. The VAE builds a smooth encoding of the trajectories, mapping them to an n-dimensional continuous latent space. Each trajectory is then represented as an n-dimensional vector (l1, …, ln) in the latent space, where li is the ith component of the latent vector. The dimensionality of the latent space, n, is a user-selected parameter.
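The paper uses the pyroVED library (see the data availability section); as a library-agnostic stand-in, a minimal fully connected VAE in PyTorch illustrating the same encode–sample–decode structure might look as follows. The architecture sizes, learning rate, and epoch count are arbitrary assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryVAE(nn.Module):
    """Minimal fully connected VAE for 1-D trajectories of length d_in."""
    def __init__(self, d_in=100, d_latent=2, d_hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.mu = nn.Linear(d_hidden, d_latent)
        self.logvar = nn.Linear(d_hidden, d_latent)
        self.dec = nn.Sequential(nn.Linear(d_latent, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_in))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(x_hat, x, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="sum")                # reconstruction term
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL divergence
    return recon + kld

# Full-batch training on the trajectory set generated above
vae = TrajectoryVAE(d_in=100, d_latent=2)
X = torch.tensor(trajectories, dtype=torch.float32)
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for epoch in range(200):
    x_hat, mu, logvar = vae(X)
    loss = vae_loss(x_hat, X, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()
```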
The dimensionality of the latent space is set to 2 for visualization purposes. The two-dimensional latent space is uniformly sampled, and each sampled point is decoded back into the space of trajectories. This decoded latent space, shown in Fig. 2b, helps visualize the trajectories encoded in different parts of the latent space; we will refer to such plots as the decoded latent space from now on. It can be observed from the decoded latent space that the trajectories are encoded in a continuous and smooth manner, which is characteristic of the VAE. Each trajectory in the real space is represented as a scatter point in the latent space and colored by the root mean squared error (RMSE) between the original function and the output of the VAE. The reconstruction error here is the average RMSE between the trajectories in the input space and the encoded–decoded functions, i.e., the outputs of the VAE. This plot is shown in Fig. 2c, from which it can be observed that the reconstruction error is not a function of the latent coordinates, i.e., there is no specific region of the latent space where the trajectories are encoded poorly by the VAE. We will refer to such plots, where trajectories are plotted as scatter points in the latent space, as latent distributions. Finally, to assess the effect of latent space dimensionality on the encoding, we plot the reconstruction error vs. the number of latent dimensions in Fig. 2d. The error initially decreases with the number of latent dimensions and then increases slightly, because with a large number of latent dimensions the latent space starts to encode the noise in the trajectories.
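Using the sketch above, the decoded latent space (Fig. 2b) and the per-trajectory RMSE coloring (Fig. 2c) could be produced roughly as follows; the grid resolution is an assumption.

```python
import numpy as np
import torch

# Decode a uniform [-1, 1] x [-1, 1] grid of latent points (cf. Fig. 2b)
g = np.linspace(-1, 1, 10)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
with torch.no_grad():
    decoded = vae.dec(torch.tensor(grid, dtype=torch.float32))   # (100, d_in)

# Per-trajectory reconstruction RMSE for the latent-distribution plot (cf. Fig. 2c)
with torch.no_grad():
    X_hat, mu, logvar = vae(X)
rmse = torch.sqrt(((X - X_hat) ** 2).mean(dim=1))                # (5000,)
```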
To obtain insight into the relationship between the original parameters that generated the trajectories and the latent distribution, we plotted the latent distribution with colors corresponding to the parameters. Latent distributions color-coded by A1, A2, and A4 are shown in Fig. 3a–c, respectively. The latent space appears to depend strongly on A1 (Fig. 3a) and A2 (Fig. 3b), while it does not appear to have any correlation with A4 (Fig. 3c). This is intuitively clear, as the first two coefficients in the linear expansion of the functions have larger weights by construction. The dependence of the latent space on the coefficients of the higher-order polynomials diminishes with the order and can be explored further in the provided Jupyter notebook.
However, the latent space can be sampled to form trajectories in the real space with a continuous variability of higher-order parameters. To demonstrate this property, we uniformly sampled the latent space into 10000 points, with each latent dimension varying between [−2, 2]. These latent points are then decoded into the real space and expanded into a Legendre polynomial series using the least-squares method, i.e., $(l_1, l_2) \Rightarrow f(x) \approx \sum_{p} \tilde{A}_p L_p(x)$. With this, we plot the expansion coefficients over the latent space as shown in Fig. 3d and e. To show that the latent space varies smoothly in both lower-order and higher-order polynomials, we used the coefficients Ã1, Ã2, Ã3 in Fig. 3d and Ã9, Ã10, Ã11 in Fig. 3e. The values of these coefficients in the expanded Legendre series act as the color channels of RGB images, which aids in visualizing the variability of three coefficients per image. Note that while the resulting distributions are generally multimodal, they nonetheless show clear variability over the latent space for both lower- and higher-order coefficients. In such a way, the process trajectories are effectively encoded in the latent space, and the corresponding parameters form smooth fields over the latent space with a small number of minima and maxima. Assuming that the relationship between the trajectory of the process and the resultant materials functionality is smooth almost everywhere, this strongly suggests that Bayesian optimization can be performed over the latent space, where the decoded trajectories are used as the input to the process and the resultant functionality defines the target function.
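A sketch of this least-squares Legendre re-expansion of the decoded curves, assuming `decoded` is an (n_points, d_in) array of decoded trajectories:

```python
import numpy as np
from numpy.polynomial import legendre

# Least-squares fit of each decoded trajectory to a degree-14 Legendre series
x = np.linspace(-1, 1, decoded.shape[1])
coeffs = legendre.legfit(x, decoded.numpy().T, deg=14).T   # (n_points, 15)

# Coefficient triplets as RGB channels (cf. Fig. 3d and e)
rgb_low = coeffs[:, 1:4]     # ~A1, ~A2, ~A3
rgb_high = coeffs[:, 9:12]   # ~A9, ~A10, ~A11
```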
Prior to optimizing over the latent space with Bayesian optimization, we assess the effect on the VAE of trajectories in the input dataset that are not part of the physical system. This can also be treated as a case where the input trajectories are not limited to a single family of curves. We generated a dataset in which a fixed proportion of the trajectories are linear (complying with the physical system) and the remainder are sinusoidal. The linear trajectories have variability in slope and y-intercept, which leads to variability in their range; for the sinusoidal trajectories, only the frequency is varied. In this section, we refer to the sinusoidal trajectories as wrong trajectories, i.e., trajectories that do not belong to the original distribution.
We trained three different VAEs with varying proportions (1%, 10%, 30%) of wrong trajectories in the input data. The decoded latent spaces for the three proportions are shown in Fig. 4a (1%), Fig. 4b (10%), and Fig. 4c (30%). It can be observed that the latent space is affected by the presence of wrong trajectories irrespective of their proportion. Furthermore, all the input functions are plotted as points in the latent space, with the linear trajectories colored red and the sinusoidal trajectories colored green. This latent-space representation of linear and wrong trajectories is shown in Fig. 4d (1%), Fig. 4e (10%), and Fig. 4f (30%). In all three cases, the sinusoidal trajectories occupy a very small portion of the latent space, because the variability of the linear trajectories is much higher than that of the sinusoidal trajectories, which are varied only in frequency. The Jupyter notebook referenced in the data availability statement shows an example in which 90% of the input curves are sinusoidal, and the discussion of the latent space still holds true. The VAE latent space is expected to vary smoothly, and certain areas, when decoded, should produce curves that have properties of both families. This ability of the VAE to produce curves that are not part of the input dataset can be observed in the decoded latent spaces in all three cases (Fig. 4a–c), where sinusoidal curves overlap the linear curves when moving radially away from the center of the image. From this analysis, it can be deduced that the latent space is always affected, even by a limited presence of wrong trajectories. The families of curves are grouped together in the latent space, with the area assigned to each family proportional to its variability.
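A sketch of how such a mixed dataset might be constructed; the parameter ranges and the 100-point grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(-1, 1, 100)
n, frac_wrong = 5000, 0.10                  # e.g., 10% "wrong" trajectories
n_wrong = int(frac_wrong * n)

# Linear trajectories: variable slope and y-intercept
slope = rng.uniform(-1, 1, n - n_wrong)
intercept = rng.uniform(-1, 1, n - n_wrong)
linear = slope[:, None] * t + intercept[:, None]

# Sinusoidal ("wrong") trajectories: only the frequency varies
freq = rng.uniform(1, 10, n_wrong)
sinusoidal = np.sin(freq[:, None] * np.pi * t)

data = np.concatenate([linear, sinusoidal])              # VAE input
is_wrong = np.concatenate([np.zeros(n - n_wrong), np.ones(n_wrong)])
```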
With this efficient encoding of complex multiparametric trajectories in a low-dimensional parameter space demonstrated, we further extend this approach toward process optimization. Here, we implement Bayesian optimization over a pre-encoded set of trajectories. The latter is generated based on domain-specific intuition and prior knowledge, or can be derived assuming some desirable factors of variability for a given set of functions. The latent vector is decoded to yield the process trajectory, the experiment is run using this trajectory, and the experimental output is used as the signal guiding the BO over the latent space.
The ferroelectric is represented by a square lattice of polarization vectors Pi,j = (Px, Py). The local free energy at each lattice site takes the Ginzburg–Landau double-well form coupled to the local electric field,

$$F_{i,j} = \frac{\alpha}{2}\left(\mathbf{P}_{i,j}\cdot\mathbf{P}_{i,j}\right) + \frac{\beta}{4}\left(\mathbf{P}_{i,j}\cdot\mathbf{P}_{i,j}\right)^{2} - \mathbf{E}_{\mathrm{loc}}\cdot\mathbf{P}_{i,j} \quad (2)$$

where the local field acting on each site is the sum of the external, depolarization, and site-dependent defect fields:
$$\mathbf{E}_{\mathrm{loc}} = \mathbf{E}_{\mathrm{ext}} + \mathbf{E}_{\mathrm{dep}} + \mathbf{E}_{\mathrm{d}}(i,j) \quad (3)$$
(4) |
The total free energy of the lattice is then the sum of the local free energies and the interactions between the lattice sites and their neighbors. We consider only nearest-neighbor interactions in this analysis; more complex models, in which the interactions extend over several unit cells, can be constructed from this one. The total free energy of the lattice system is given by eqn (5), where K is the coupling constant and the interactions are summed over the entire neighborhood via the summation over k, l. The summation over i, j then yields the total interaction energy over the entire lattice.
$$F = \sum_{i,j}\left[F_{i,j} + \frac{K}{2}\sum_{k,l}\left(\mathbf{P}_{i,j}-\mathbf{P}_{k,l}\right)^{2}\right] \quad (5)$$
Finally, the time dynamics of the system are given by the classical Landau–Khalatnikov equation, eqn (6). The polarization at each timestep is then updated by calculating the derivatives of the total free energy with respect to the polarization field at the previous timestep, as given by eqn (7) and (8).
$$\gamma\,\frac{\partial \mathbf{P}_{i,j}}{\partial t} = -\frac{\partial F}{\partial \mathbf{P}_{i,j}} \quad (6)$$
$$P_{x}^{t+1}(i,j) = P_{x}^{t}(i,j) - \frac{\Delta t}{\gamma}\,\frac{\partial F}{\partial P_{x}^{t}(i,j)} \quad (7)$$
$$P_{y}^{t+1}(i,j) = P_{y}^{t}(i,j) - \frac{\Delta t}{\gamma}\,\frac{\partial F}{\partial P_{y}^{t}(i,j)} \quad (8)$$
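For concreteness, a minimal explicit-Euler sketch of one update step, assuming the free energy reconstructed above; the coefficients, coupling, timestep, and periodic boundaries are all illustrative assumptions, and the actual FerroSim implementation is linked in the data availability section.

```python
import numpy as np

def lk_step(Px, Py, Ex, Ey, alpha=-1.0, beta=1.0, K=1.0, dt=0.01, gamma=1.0):
    """One Landau-Khalatnikov timestep (eqn (6)-(8)) on an n x n lattice."""
    P2 = Px**2 + Py**2
    # Local free-energy derivatives from eqn (2): alpha*P + beta*|P|^2*P - E_loc
    dF_dPx = alpha * Px + beta * P2 * Px - Ex
    dF_dPy = alpha * Py + beta * P2 * Py - Ey
    # Nearest-neighbor coupling from eqn (5): 2K * (4P - sum of neighbors)
    for P, dF in ((Px, dF_dPx), (Py, dF_dPy)):
        nn = (np.roll(P, 1, 0) + np.roll(P, -1, 0) +
              np.roll(P, 1, 1) + np.roll(P, -1, 1))
        dF += 2 * K * (4 * P - nn)
    # Explicit Euler update, eqn (7) and (8)
    return Px - dt / gamma * dF_dPx, Py - dt / gamma * dF_dPy
```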
The external electric field trajectories are drawn from the family

$$E(t) = A\,\mathrm{e}^{\alpha t}\sin(\omega t) + B \quad (9)$$
These trajectories are then projected into a two-dimensional latent space using the VAE, as shown in Fig. 6. This smooth, continuously varying latent space is sampled to generate electric fields belonging to the family of curves described above. Twenty-five randomly drawn electric field trajectories from the generated family are shown in Fig. 6a. The two-dimensional latent space is uniformly sampled, decoded, and shown in Fig. 6b. The latent distributions, i.e., the trajectories plotted as points in the 2-D latent space, are colored as a function of α in Fig. 6c, ω in Fig. 6d, and B in Fig. 6e. The latent space appears to be a strong function of α and ω, which vary smoothly and continuously over the latent space. The latent space does not appear to be strongly correlated with A or B of the input curve; the effect of A and B is absent because each curve is normalized prior to the application of the VAE.
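A sketch of generating and normalizing this family of field trajectories; all parameter ranges below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 1200)                  # 1200 timesteps (assumed grid)
n = 5000
A = rng.uniform(0.5, 1.5, n)
B = rng.uniform(-0.5, 0.5, n)
alpha = rng.uniform(-5.0, 1.0, n)                # decay/growth exponent
omega = rng.uniform(2.0, 40.0, n) * np.pi        # angular frequency

# E(t) = A exp(alpha t) sin(omega t) + B, eqn (9)
E = A[:, None] * np.exp(alpha[:, None] * t) * np.sin(omega[:, None] * t) + B[:, None]
E /= np.abs(E).max(axis=1, keepdims=True)        # per-curve normalization
```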
The latent space is uniformly sampled into 10000 points and then decoded back into the space of electric fields to study the variation of the sum of absolute curls. The sum of the magnitudes of the curl at each lattice site will be referred to as the curl of the system from here on, for brevity. The decoded electric field curves are normalized by design and are multiplied by a constant (150) to make them practical for the simulations. This constant is user-selected, and the discussions that follow depend on its value. Once a curve is decoded, only the first 900 timesteps are used as the input to FerroSIM, with the last 300 timesteps discarded, as mentioned earlier. Equilibration timesteps, during which the external electric field is held constant for 50 timesteps, are then appended to all the curves. The FerroSIM simulations are then run for each of the electric fields corresponding to the 10000 points in the latent space. The value of the curl at the end of the 950 timesteps is calculated for each point in the latent space and is shown in the middle of Fig. 7.
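The curl of the 2-D polarization field has only a z-component; a finite-difference sketch of the summed quantity (array shapes and discretization are assumptions) is:

```python
import numpy as np

def total_curl(Px, Py):
    """Sum of |curl P|_z over the lattice: curl_z = dPy/dx - dPx/dy."""
    dPy_dx = np.gradient(Py, axis=1)
    dPx_dy = np.gradient(Px, axis=0)
    return np.abs(dPy_dx - dPx_dy).sum()
```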
To explore the effects of the final value of the total polarization on the resulting curl of the system, we plotted the total polarization at the end of the simulation as a function of the latent space in Fig. 8a. To negate the effect of the polarization magnitude on the final value of the curl, the curl is recalculated after normalizing the polarization vectors at each lattice site at the end of the simulations; the simulations themselves are run using the actual polarization vectors for the entire duration. The recalculated curl (referred to as the normalized curl) is shown in Fig. 8b. The normalized curl surface provides insight into how much the polarization field rotates, without accounting for the magnitude of the polarization vectors. It can be observed from Fig. 8a and b that the polarization surface and the normalized curl surface are inverted with respect to each other, i.e., the normalized curl is maximal where the polarization is minimal and vice versa. To visualize this better, the total polarization surface is used as the R channel and the normalized curl as the G channel of an RGB image (with the B channel left empty), shown in Fig. 8c. Most of the image is either entirely green (large normalized curl) or entirely red (large polarization), with very little overlap. This is because when the magnitude of the polarization is small, any curl (deviation from the average polarization field) in the system does not lead to a substantial coupling term in the free energy (eqn (5)), while such deviations from the average polarization vector still lead to a large value of the normalized curl. Conversely, at large polarization fields such deviations are not sustainable, and any small deviations contribute little to the normalized curl, so its value in these regions is small. The final value of the curl, however, also depends on the actual magnitude of the polarization vectors: a field with larger polarization vectors yields a higher curl than a field with smaller polarization vectors but a similar rotational tendency. To show this, the total polarization is used as the R channel and the curl surface from Fig. 7 as the G channel of an RGB image, shown in Fig. 8d; here there is some overlap between the green and red regions. From this discussion, it is apparent that neither a large nor a small value of the total polarization corresponds to the maximum curl; the polarization needs to be at an optimal value to create a large curl.
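The normalized curl discussed above can be obtained by applying the same operation to unit polarization vectors, as in this sketch reusing the `total_curl` helper above:

```python
import numpy as np

def normalized_curl(Px, Py, eps=1e-12):
    """Curl recomputed after normalizing each lattice site's polarization."""
    mag = np.sqrt(Px**2 + Py**2) + eps   # eps guards against zero vectors
    return total_curl(Px / mag, Py / mag)
```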
Fig. 8 (a) Total polarization magnitude as a function of latent space. (b) Unit curl: sum of absolute curls as a function of latent space when the polarization vectors at each lattice site are normalized at the end of the 950 timesteps. (c) Total polarization magnitude and unit curl as a function of latent space, represented as the red and green channels of an RGB image. (d) Total polarization magnitude and the curl surface from Fig. 7, represented as the red and green channels, respectively, of an RGB image. The blue channel is set to zero in both (c) and (d).
We carried out additional analyses to investigate the coincidence of the local maxima of the curl surface with the local optima of the electric field (Fig. 9). Points 2, 5, and 7 from Fig. 7 are considered for this analysis. Point 2 corresponds to a large value of the final curl; point 7 corresponds to a large value of the curl when the system starts to equilibrate, which then falls rapidly to a very small value. Finally, point 5 corresponds to a large negative exponential factor (α) in the electric field, which causes the field to decay to a constant value in less than a cycle. For these three points in the latent space, the polarizations in the x and y directions and the applied electric field are plotted as a function of time in Fig. 9a. The curl and polarization magnitudes are plotted as a function of time in Fig. 9b and as a function of the electric field in Fig. 9c. It can be observed from Fig. 9a that the polarization lags the electric field in every case; this is intuitive, as the effect always lags the cause in any kinetic simulation (akin to the learning rate in ML). From Fig. 9b it can be observed that the curl is maximal a few timesteps after the polarization crosses zero, because the polarization vectors are in a state of maximum normalized curl right after the coercive field. Subsequently, the polarization vectors grow in magnitude because of the high value of the applied electric field, and it takes a few timesteps for the polarization to grow to a magnitude that constitutes the maximum curl for that half cycle of the electric field. In our simulation, the half cycle of the electric field is approximately equal to the number of timesteps the curl takes to attain its local maximum. This results in the apparent coincidence of the local optima of the electric field with the local maxima of the curl, which can also be observed in Fig. 9c. This local maximum of the curl is not stable, as the electric field has a very high absolute value at its optima and destroys the curl in the subsequent timesteps; this destruction is what results in the low final value of the curl for point 7. For point 5, since the electric field is always at a high value, and so are the polarization fields, any small deviations in the polarization field can still produce some curl. The results presented here depend on the various parameters that enter the FerroSIM simulation; any change in these parameters will lead to a different curl surface, but the underlying physics remains the same.
Fig. 9 (a) Total polarization in the x-direction (red), total polarization in the y-direction (red, dotted), and applied electric field (blue) as a function of time. (b) Curl (green) and total polarization magnitude (red) as a function of time. (c) Curl (green) and total polarization magnitude (red) as a function of the applied electric field. The applied electric field is expressed in units of the coercive field of the system. The rows represent points 2, 5, and 7 of the curl surface in Fig. 7. The onset of the equilibration region is marked with a black scatter point in (a) and (b).
Bayesian optimization is an extension of Gaussian process (GP) regression. GP regression is an approach for interpolating, or learning, a function f given a set of observations D = {(x1, y1), …, (xN, yN)}. It is assumed that the arguments xi are known exactly, whereas the observations can contain zero-mean Gaussian noise, i.e., yi = f(xi) + ε. The key assumption of the GP method is that the function f has a prior distribution $f(x) \sim \mathcal{GP}(0, K_f(x, x'))$, where Kf is a covariance function (kernel). The kernel function defines the relationship between the values of the function across the parameter space, and its functional form is postulated as part of the fit. Here we implemented the Matern kernel function given by eqn (10)
$$K_{f}(x, x') = \sigma^{2}\,\frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{\sqrt{2\nu}\,\lVert x - x'\rVert}{l}\right)^{\nu} K_{\nu}\!\left(\frac{\sqrt{2\nu}\,\lVert x - x'\rVert}{l}\right) \quad (10)$$

where l is the length scale, ν controls the smoothness of the kernel, and Kν is the modified Bessel function of the second kind.
The learning is performed via Bayesian inference in function space, and the expected value of the function, the corresponding uncertainties, and the kernel hyperparameters are optimized simultaneously. The output of the GP is the predicted dataset, uncertainty maps representing the quality of the prediction, and the kernel hyperparameters. What makes GP stand out from other interpolation methods is that it provides not only the function value but also the uncertainty associated with the prediction. Bayesian optimization (BO) then samples the input space based on a built-in or user-defined acquisition function that uses the GP predictions. The acquisition function can be set to a pure exploration state, where the point selected by BO is either random or chosen proportionally to the uncertainty of the prediction, or to an exploitation state, where the next point in the input space is selected based on the target value of the function. In our case, the acquisition function is a linear combination of the mean and standard deviation, defined by eqn (11).
$$\mathrm{acq}(x;\lambda) = \mu(x) + \lambda\,\sigma(x) \quad (11)$$
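The paper's GP/BO code uses the GPim package (see the data availability section); as an illustrative stand-in, a minimal BO loop over the 2-D latent grid with scikit-learn might look as follows. Here `run_ferrosim` is a hypothetical callback that decodes a latent point, runs one simulation, and returns the curl; the λ value and the Matern ν are our assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def bayes_opt(grid, run_ferrosim, n_seed=50, n_iter=500, lam=2.0, seed=3):
    """UCB-style BO (eqn (11)) over a fixed grid of latent points."""
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(grid), n_seed, replace=False))  # 0.5% seed points
    y = [run_ferrosim(grid[i]) for i in idx]
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(grid[idx], y)
        mu, sigma = gp.predict(grid, return_std=True)
        nxt = int(np.argmax(mu + lam * sigma))       # acq(x) = mu + lambda*sigma
        if nxt in idx:                               # fall back to an unseen point
            nxt = int(rng.choice(np.setdiff1d(np.arange(len(grid)), idx)))
        idx.append(nxt)
        y.append(run_ferrosim(grid[nxt]))
    return gp, np.asarray(idx), np.asarray(y)
```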
Applications of the BO algorithm are ubiquitous and can be found in a plethora of fields, including but not limited to materials science,38–40 catalysis,41,42 3D printing,43,44 chemistry,20,45 solar cells,5,46 and pharmaceuticals.47,48 While the application of BO in the latent space of a VAE is a recent development,44,45,49–52 optimization in the latent space of trajectories, where the trajectories in the input dataset are constructed for a domain-specific application, remains unexplored.
The curl surface obtained through brute force is the result of running 10000 FerroSIM simulations. For the BO–GP approach, we ran only 0.5% of the simulations; the remaining points are interpolated by the GP. At each iteration, BO samples a point in the latent space based on the exploration–exploitation trade-off, the simulation corresponding to that point is run, and the actual value of the curl is obtained. This point is then added to the GP input data, and the process is repeated for 500 iterations. The results of the BO–GP analysis are shown in Fig. 10.
The initial set of points at the start of the BO process is shown in Fig. 10a; these are the 0.5% of the total points into which the latent space was divided. Fig. 10b shows the points progressively explored by BO over 500 iterations. GP interpolations of the curl surface can be plotted using the Jupyter notebook referenced in the data availability statement. Finally, the explored points are plotted on the GP-predicted curl surface at the end of 500 iterations in Fig. 10c and on the brute-force curl surface in Fig. 10d. BO does a suboptimal job of exploring one of the two local maxima because the curl surface is multimodal, with multiple local optima, and none of the initial seed points lies in this region. Nevertheless, with a mere 5.5% of the points explored by the end of this exercise, BO was able to identify the two local maxima of the curl surface even with incomplete information.
This approach is further used to maximize specific functionalities of a material represented via a lattice Ginzburg–Landau model. We have demonstrated that this approach can tune the process toward maximization of the total curl of the polarization field in the system, a symmetry-breaking phenomenon inconsistent with the symmetry of the lattice or the applied stimulus. The maximization of the curl is found to occur in the vicinity of the phase transition, as a result of spontaneous symmetry breaking in the vicinity of defects.
While this approach is applied here to a specific family of trajectories and a specific model, we believe it to be universally applicable to a broad range of optimization problems, such as materials processing, ferroelectric poling, and so on. It should be noted that the choice of the initial trajectory set has to be based on domain-specific considerations: it should be broad enough to contain sufficient variability, but at the same time sufficiently narrow to avoid overly localized minima.
• The code for FerroSim model can be found at https://github.com/ramav87/FerroSim/tree/rama_dev_updated
• The code for Variational Autoencoder can be found at https://github.com/ziatdinovmax/pyroVED
• The code for Gaussian processes and Bayesian optimization can be found at https://github.com/ziatdinovmax/GPim
Footnote
† This manuscript has been authored by UT-Battelle, LLC, under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for the United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).