Boris N.
Slautin
*e,
Utkarsh
Pratiush
a,
Ilia N.
Ivanov
b,
Yongtao
Liu
b,
Rohit
Pant
c,
Xiaohang
Zhang
c,
Ichiro
Takeuchi
c,
Maxim A.
Ziatdinov
d and
Sergei V.
Kalinin
*ad
aDepartment of Materials Science and Engineering, University of Tennessee, Knoxville, TN 37996, USA. E-mail: sergei2@utk.edu
bCenter for Nanophase Materials Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
cDepartment of Materials Science and Engineering, University of Maryland, College Park, MD 20742, USA
dPacific Northwest National Laboratory, Richland, WA 99354, USA
eInstitute for Materials Science and Center for Nanointegration Duisburg-Essen (CENIDE), University of Duisburg-Essen, Essen, 45141, Germany. E-mail: boris.slautin@uni-due.de
First published on 15th July 2024
The rapid growth of automated and autonomous instrumentation brings forth opportunities for the co-orchestration of multimodal tools that are equipped with multiple sequential detection methods or several characterization techniques to explore identical samples. This is exemplified by combinatorial libraries that can be explored in multiple locations via multiple tools simultaneously or downstream characterization in automated synthesis systems. In co-orchestration approaches, information gained in one modality should accelerate the discovery of other modalities. Correspondingly, an orchestrating agent should select the measurement modality based on the anticipated knowledge gain and measurement cost. Herein, we propose and implement a co-orchestration approach for conducting measurements with complex observables, such as spectra or images. The method relies on combining dimensionality reduction by variational autoencoders with representation learning for control over the latent space structure and integration into an iterative workflow via multi-task Gaussian Processes (GPs). This approach further allows for the native incorporation of the system's physics via a probabilistic model as a mean function of the GPs. We illustrate this method for different modes of piezoresponse force microscopy and micro-Raman spectroscopy on a combinatorial Sm-BiFeO3 library. However, the proposed framework is general and can be extended to multiple measurement modalities and arbitrary dimensionality of the measured signals.
Often, of interest are multiple aspects of the structure and functionalities across combinatorial libraries. The investigation of such couplings entails a comprehensive study of multiple electric, mechanical, chemical, and structural properties of materials. This demands the utilization of a diverse array of local investigative methods, such as SPM, electron microscopy, and Raman microscopy. Each method serves as a critical tool for revealing distinct aspects of these couplings, allowing for a thorough understanding of different dimensions and properties. However, the extensive range of the required characterization methods often makes the exploration process time-consuming, creating a significant gap between the rates of synthesis and exploration.
In recent years, the extensive adoption of machine learning (ML) workflows has facilitated a reduction in the substantial gap between the synthesis and characterization rates, accelerating the exploration and optimization of targeted material functionalities.3 The deployed ML agents can provide support in various aspects, from mundane data treatment, including image segmentation and features detection,6,7 to big data analysis,8 as well as automating decision-making processes and governing the experiment itself.9,10 The concept of automated experimentation (AE) in microscopy involves the automatic, i.e., beyond human choice, operation of the microscope, typically establishing the sequence of imaging or spectroscopy measurement locations. This approach enables operators to redirect their focus from the tedious task of equipment tuning and microscope control at each exploration step toward more advanced result analysis and higher-level decision-making, such as establishing experimental objectives.
AE exploration frequently relies on Bayesian Optimization (BO), which has proven its efficiency in many studies, serving as a robust solution for both material design1,11–13 and automated material characterization.14–18 The Gaussian Process (GP) is the most used probabilistic surrogate model for approximating the true objective function in BO.19,20 It is important to note, however, that employing entirely data-driven approaches has limited effectiveness in material discovery due to the complex nature of materials' properties.21 Physics-informed and structured GPs allow overcoming this issue by providing a diverse array of opportunities to integrate physical insights.22 Depending on our awareness of a system, additional knowledge can be incorporated through physical-derived prior mean functions,16,21 and kernel functions,23 alongside the implementation of optimization constraints and regularization techniques.24 The widely used multi-fidelity BO methods combine information from different sources, that provide various levels of precision or require different computational resources, to efficiently integrate and exploit these datasets to balance the accuracy with the computational expenses to achieve optimal results.25,26 In microscopy, BO has been successfully employed for governing explorations using multiple methods, including SPM,17,27–29 scanning transmission electron microscopy electron energy-loss spectroscopy (STEM-EELS),30 4D STEM,31 and neutron and X-ray scattering.32–34
The rapid growth of automated labs has brought forth the challenge of the co-orchestrated operation of multiple tools. One example of this is the exploration of a given materials system with multimodal tools, where, due to technical limitations, data in different modalities can be acquired sequentially. Another example is the use of combinatorial libraries that can be explored in multiple locations using various tools simultaneously. The third involves similar synthesis systems at different geographic locations with dissimilar downstream analytics. Numerous scalable frameworks have been introduced to integrate various devices and their configurations – sometimes distributed worldwide – into autonomous or semi-autonomous automated experimental cycles.5,10,35,36 By entrusting the orchestrating role in such systems to AI, these frameworks can dynamically adjust experimental trajectories in response to real-time data, human interventions, and predefined constraints. Despite the diversity in architectures among these frameworks, their common feature lies in their ability to leverage data from various agents, thereby facilitating the accelerated exploration and optimization of material systems.
The alterations in compositions in combinatorial libraries prompt variations in the structures and multiple functional properties of materials, necessitating the utilization of numerous methods (multimodal discovery) to study their correlations and intrinsic nature. While implementing BO with high-dimensional input spaces is challenging due to the exponential growth of the search space (the curse of dimensionality), the complexity of constructing accurate surrogate models, and the difficulty of optimizing the acquisition function, the low-dimensional compositional space of a combinatorial library is an ideal candidate for employing BO for automated discovery.37,38 However, different measurements provide specific insights into the system, often resulting in high-dimensional datasets, like spectra or images. In this case, the straightforward implementation of BO requires constructing multi-output surrogate models, leading to the same scalability problems as for high-dimensional input spaces.39,40 To address this challenge, one effective solution is to leverage the encoding capabilities of techniques like variational autoencoders (VAEs) to transform the high-dimensional raw data into a lower-dimensional representation.41 The combination of a VAE with the GP-based BO has been proven to be efficient for such tasks as molecule design optimization,42,43 and the optimization of robot controllers.44 Typically, BO optimization processes data in the reduced latent space, while the selected objects are mapped back to the real space by the decoder.
In combinatorial libraries, the objective of active experimentation (AE) is to uncover the correlation between the local composition and target functionality or between different functionalities in as few steps as possible. Despite the diverse nature of the measured signals, it is anticipated that the property-changing profiles along the compositional change axis in a combinatorial library will exhibit similarity. The essence of multimodal orchestration lies in leveraging knowledge about a compositional correlation uncovered for one property to expedite the exploration of another property measured by a different method, thereby accelerating the overall characterization process. Accelerated autonomous property optimization in combinatorial libraries by embedding additional compositional information through pre-acquired phase mapping was recently demonstrated by A. G. Kusne et al.5
Here, we take the next step by introducing the multimodal co-orchestration framework. The proposed framework optimizes the exploration trajectory within the compositional spaces of combinatorial libraries by leveraging the compositional dependencies of VAE latent variables as objective functions for BO surrogate models. This approach allows the AI agent to handle the initial data interpretation, enabling the training of GPs on the compositional dependencies of complex features without the need for their direct interpretation. While this workflow was developed for combinatorial libraries, it could also be used for similar synthesis systems.
During the initial co-orchestration stage, the amount of gathered knowledge about the system is limited. Therefore, introducing new data at each exploration step can markedly alter the VAE distributions and the profiles of the latent variables throughout the latent space. This mirrors the evolution of our knowledge about the system (i.e., our understanding of the relationships between the composition and functionality) with acquiring new data. The significant alterations in the VAE distributions require retraining the VAE and MTGP at each step from scratch.
It is important to note that the transition toward a steady co-orchestration happens independently for different modalities at different times. The number of exploration steps needed for this transition indicates the complexities of the composition-dependent functionalities measured by the specific modality (i.e., method).
For our experiment, we utilized two predefined datasets showcasing local electromechanical hysteresis loops and local Raman spectra both measured for identical compositions within the Sm-BFO combinatorial library. Both datasets consisted of 94 spectra (hysteresis loops), collected at equidistant intervals along the axis of a compositional gradient. The local composition at the measured locations changed from 20% Sm-BFO at location 0 to pure BFO at location 93.
In the SPM experiment, a motorized stage in the NanoSurf DriveAFM microscope (Switzerland) was used to automatically position the sample under the SPM probe, enabling the selection of various compositions. We developed a Python-based workflow to automatically capture hysteresis loops across the combinatorial library. To avoid sample or probe damage during long-distance movements driven by the motorized stage, the SPM probe was withdrawn (∼200 μm above the surface) before each motorized stage movement. After reaching the next location, the probe re-approached to resume the hysteresis measurements. The electromechanical hysteresis loops measurements were done in the band-excitation piezoresponse switching spectroscopy mode (BEPS)46 using the AEcroscopy platform that was detailed in our previous work.47 The loops appear as a response to the modulated triangle waveform applied by the scanning probe microscope's tip. This approach enables the study of the local ferroelectric switching dynamic.48,49 The measurements were performed within a rectangular grid of locations for obtaining better statistics. Following this, the data were averaged along the constant composition axis, resulting in a dataset comprising 94 averaged loops measured along the Sm concentration gradient axis.
The Raman experiments were performed utilizing a 633 nm wavelength laser, which was focused using a 20× lens. This configuration offered a spatial resolution of approximately 5.54 μm. The sampling estimated optical depth was ∼134 nm (nBFO = 2.6). We normalized the raw Raman spectra by area to exclude the influence of the focus position (Fig. S1b†). The spectra primarily captured the response from the bulk STO, while the contribution from the Sm-BFO epitaxial film constituted only a minor portion of the overall signal. We extracted a BFO proxy signal by taking the difference between the local Raman spectra and the mean spectrum across the entire dataset (Fig. S1c†). This enabled us to eliminate the influence of the STO bulk substrate from the collected response. It is important to note that the resulting footprint spectra should not be directly interpreted as BFO Raman spectra. However, the observed changes in them, occurring with the compositional alterations, were associated with the structural variation within the Sm-BFO compositional library. The spectra were also collected at locations within the square grid and averaged along the constant composition axis. The resulting dataset comprised spectra collected for similar Sm concentrations as those used for the BEPS hysteresis loop measurements.
The multi-task GPs were implemented by the GPax Python package.50 The linear model of coregionalization (LMC) with two latent processes was used to capture the correlations across tasks. We utilized radial basis functions (RBFs) as kernels for the latent processes. A brief theoretical description of the LMC-based MTGP is provided in the ESI Section.†
The experiments were driven by pure exploration, employing a maximum uncertainty acquisition function. Each automated experiment began with 3 seed measurements at randomly selected compositions (locations), followed by 30 exploration steps.
The AEs were conducted in the simulation mode using pre-acquired BEPS and Raman datasets (see the Experimental section). Three distinct modalities were utilized: out-of-field polarization (represented by amp·cos(phase) of BEPS hysteresis loops),48 Raman spectra, and the frequency of electromechanical resonance observed during BEPS hysteresis loop measurements under applied voltage. Despite the simultaneous measurement of the polarization and frequency signals in real SPM experiments, their interdependence is indirect.56 This allows us to consider them as separate modalities for the AE simulation.
After a few initial exploration steps, we obtained that the substantial expansion of the VAE training datasets by the newly acquired data did not result in significant alterations in the configuration of both the whole and current latent distributions (Fig. 2a–d). The variations between distributions at different exploration steps were constrained by their position, degrees of compression/extension in some directions, and random orientation within the latent space. An observed stability in the general shape of the latent distributions indicates the potential ability to move from the initial co-orchestration to a steady co-orchestration stage. However, a random orientation of the latent distributions at each subsequent step results in a redistribution of the encoded features among latent variables. This alters the latent variables' dependencies on the composition used to train the MTGP.
The stabilization of the orientation of the latent distribution within the latent space was achieved by introducing the linear VAE (LVAE), a modified version of vanilla VAE. LVAE incorporates custom loss in addition to the standard reconstruction and KL divergence losses. This custom loss is computed as the absolute difference between the first latent variable (z1) and its corresponding position along the compositional axis, normalized within the range of [0,1] (x′):
custom loss = |z1 − x′|, x′ ∈ [0,1] | (1) |
The ground truth latent distributions and corresponding compositional dependencies of the latent variables were computed for all modalities by training the LVAE with the entire datasets (Fig. 3). To evaluate the proficiency of the trained LVAE models in encoding and reconstructing the raw data, we decoded spectra from different locations within the compositional profile and compared them with the original raw data. The predicted spectra effectively mirrored the dependencies present in the raw data, reducing instrumental noise (Fig. S2†).
![]() | ||
Fig. 3 LVAE ground truth: (a, c, and e) latent distributions and (b, d, and f) compositional dependencies of the latent variables for Raman, BEPS polarization, and BEPS frequency modalities. |
The LVAE fosters the accumulation of features with a proportional relationship between the latent representation and the local composition within the z1 latent variable. Simultaneously, it isolates features with complex compositional dependencies within the z2 latent variable. We utilized the latent variable z2 to construct the multimodal space for learning the multi-task GP. Therefore, the primary objective of the co-orchestration AE algorithm was to reconstruct the compositional dependencies (profiles) of the z2 latent variable for each modality, leveraging the correlation between them to expedite the process (Fig. 3b, d and f). We observed a predominantly linear downward trend with a few noticeable outlying points in the dependence of z2 on the local composition (location) for the Raman measurements (Fig. 3b). The z2 values exhibited a growing variability as the composition shifted toward pure BFO. More intriguing dependencies were observed in the latent dependencies of the BEPS modalities. The z2 for the BEPS polarization dependence exhibited an extremum at the midpoint of the exploring compositional range (Fig. 3d). The downward trend preceding the extremum transitioned into a gradual raise in the second part of the profile, accompanied by a noticeable increase in noise levels. This extremum position also aligned with a sharp drop in the BEPS frequency latent variable profile (Fig. 3f), where z2 maintained a constant value before this descent. The significant alterations in the middle of the library were associated with the phase transition of the system to the ferroelectric state.
The LVAE, trained with the data available at each exploration step, was employed to encode the entire dataset, establishing the step-specific ground truth (GT) for the z2 compositional dependence (represented by the dashed lines in Fig. 4). In both experiments, the ground truth z2 profiles for both modalities maintained their shapes consistently throughout the entire exploration. However, random latent variable profile reflections relative to the horizontal line were observed. Toward the end of the experiment, we observed quite accurate GP predictions for both modalities in the first experiment and specifically for Modality 1 in the second experiment. The reconstruction of the latent variable profile for Modality 0 in the second experiment was hindered by the presence of a sharp drop. For effective reconstruction, it is essential to employ a structural GP and introduce a suitable mean function.
For a more consistent analysis, we monitored the evolution of the key characteristics of the multi-task BO throughout the entire exploration in both experiments (Fig. 5). The experiments exhibited similar behavioral patterns but with some noticeable differences from each other.
The overall downward trends in the kernel lengths, B01 coefficients of the coregularization matrices (see the ESI†), prior noise, and GP uncertainties were consistent and preserved in both experiments. The gradual reduction of the GP uncertainties, asymptotically approaching zero as the experiments advanced, reflected the exploration process (Fig. 5g and h). Interestingly, the rate of decrease in GP uncertainties was equal for both modalities. The incremental reduction in B10 provides evidence that multi-task coupling played a significant role at the beginning of the experiment, but its importance diminished progressively with the ongoing exploration (Fig. 5c and d). There were noticeable jumps in the B10 coefficients during the final exploration steps, particularly evident in the second experiment, corresponding to the occasional appearance of compositional profile reflections relative to the horizontal line (Fig. S3†). The prior noise for Modality 0 (BEPS frequency) in the second experiment exhibited higher values and a more noticeable variability (Fig. 5f). This peculiarity aligns with the sharp drop of the z2 value in the middle of the compositional profile. To reconstruct such dependencies, structural GP is commonly employed, whereas the implementation of vanilla GP (including the MTGP used here) demonstrates limited effectiveness.
We observed a gradual rising trend in the kernel lengths of the latent processes at the beginning of the AE with subsequent saturation in both experiments (Fig. 5a and b). However, during the final steps of exploration in the first experiment, there was a noticeable reduction in the kernel lengths (Fig. 5a), which was accompanied by a gradual decrease to zero for B10 for both latent processes (Fig. 5c). We interpret this phenomenon as evidence of the existence of two subsequent exploration phases in the first experiment. During the main part of the exploration, the model endeavors to capture the general trends in the compositional profiles of the modalities and the correlations between them. This leads to the gradual adjustment and stabilization of both the coregularization coefficients and the kernel lengths of the latent processes, consequently facilitating a smooth decrease in the GP uncertainties and prior noises (Fig. 5e and g). Once general trends are discovered, the algorithm transitions to the second phase, where the noise characteristics of the profiles are explored, resulting in a reduction of the kernel lengths and a decrease to zero in the coupling coefficients. The transition to the second phase may be regarded as the trigger to conclude the exploration. We speculate that, in the second simulation, the process did not reach the point of transitioning to noise exploration.
To evaluate the performance of the co-orchestration workflow in comparison to independent vanilla GP explorations, we trained vanilla GPs for both modalities at each iteration using identical kernel lengths and noise prior distributions as those employed in the MTGP (Fig. 6). The RMSE and CRPS metrics were calculated relative to the step-specific ground truth. As the evolution of the absolute values of the RMSE and CRPS throughout the experiment (Fig. 6a–d) was not highly informative due to the constant changes in ground truth compositional profiles, we analyzed the ratio of the error metrics estimated for the MTGP and the vanilla GP (Fig. 6e–h). In regions where the ratios RMSEMTGP/RMSEGP and CRPSMTGP/CRPSGP were less than one, the MTGP algorithm outperformed the vanilla GP.
During the exploration, we identified several areas where the MTGP and the vanilla GP exhibited varying performance relative to each other. In the very first few steps in both experiments, the vanilla GP showcased a higher RMSE but much better CRPS, accompanied by fast changes in the B10 coefficient (Fig. 5c and d). Such behavior may be explained by the lack of acquired information at the initial stage necessary to build an effective coupling model, leading to controversial results. We speculate that in this area, the performance of MTGP may be higher as well as lower than the vanilla GP depending on the specific shape of the compositional profiles. In the first experiment, at most of the steps after the initial region, MTGP performed better or equal to the vanilla GP by both metrics. However, at the terminal stage of exploration, the vanilla GP showed better RMSE results for Modality 1, while for Modality 0, both algorithms demonstrated equal RMSE values (Fig. 6e). In contrast, MTGP demonstrated an advantage in CRPS for Modality 1 and performed equally to the vanilla GP for Modality 0 (Fig. 6g). We attributed this behavior to the vanilla GP's quicker transition to noise exploration in the compositional dependencies of Modality 1 compared to MTGP, resulting in a better RMSE metric for the noisy ground truth. The second experiment produced notably controversial results, as both the RMSE and CRPS ratios consistently remained at or above one after the initial stage for Modality 1 (Fig. 6f and h). Interestingly, for Modality 0, where a sharp drop was presented in the compositional profiles, MTGP showed an advantage in both metrics till the last 6 steps, where the RMSE ratio became higher than one. Hence, for Experiment 1, we cannot conclusively state that the co-orchestration with MTGP outperformed the vanilla GP, at least in terms of reconstructing the step-specific compositional profiles at each iteration.
In the final stage of our investigation, we delved into the possibilities of identifying the point at which the shift from the initial co-orchestration to a steady co-orchestration state becomes feasible. The Kolmogorov–Smirnov (KS) criterion, which measures the maximum difference in cumulative distribution functions, was utilized to evaluate the divergence between the distributions of VAE latent variables and estimate their stability throughout the exploration. At each step, we encode the entire distributions using the VAE, trained on the available points. After that, we calculated the ground truth KS (GT KS) criteria between the encoded distribution and the ground truth latent distribution for the corresponding modality (Fig. 7, diamonds). Moreover, KS criteria were also computed at each step to compare a specific latent distribution (represented by only available points) with the distribution from the previous step (Fig. 7, circles). We calculated the criterion for each iteration exclusively for the modality to which a data point was added during that specific step. It should be noted that in the real AE, only the KS criterion could be defined, while GT KS was available only in our simulation.
![]() | ||
Fig. 7 Kolmogorov–Smirnov criterion evolution defined for the z1 and z2 latent variables for both the (a) first and (b) second experiments. |
The KS and GT KS criteria presumably indicated a downward trend, except for the final two steps in Experiment 1, in the latent distributions of the linearized z1 variable (Fig. 7). However, the KS criteria for the z2 latent distribution exhibited distinct behaviors in the first and second experiments. In Experiment 1, when the latent distributions were stable and exhibited minimal changes with ongoing simulation, we observed the stabilization of the KS and GT KS criteria around 0.2 (Fig. 7a). The steady and small values of the KS criterion throughout the exploration can be considered indicative triggers for transitioning toward a stable co-orchestration. Oppositely, in the second experiment, the instability of the latent distribution of Modality 1 (BEPS frequency) resulted in elevated values to be observed in both the KS and GT KS criteria (Fig. 7b). Developing approaches to enhance the stability of the latent distribution, particularly by mitigating random reflections, is an important future task.
We proposed the utilization of variational autoencoders to encode raw spectra into low-dimensional representations. To improve the orientational stability of the VAE latent distributions throughout the experiment, we introduced the linear VAE model. In the LVAE, a special custom loss was incorporated alongside the standard reconstruction and KL divergence losses during model training to stabilize the resulting representations. Ensuring the stability of the latent distribution throughout the experiment enables the acquisition of steady compositional dependencies of the latent variables. The stability of the compositional profiles of latent variables, in turn, allows a possible transition from the initial co-orchestration stage, where the VAE and MTGP are trained from scratch, toward a steady co-orchestration characterized by incremental learning.
The capabilities of multimodal co-orchestration were validated by autonomous experimentation simulations in the Sm-BFO combinatorial library. The workflow demonstrated its effectiveness at optimizing the exploration trajectory, especially when latent variables exhibit smooth compositional dependencies. Nevertheless, partial effectiveness was also demonstrated in multimodal co-orchestration with a piecewise compositional dependency. The KS criterion, computed for the latent distributions at the subsequent exploration steps, is proposed as an indicative signal to determine the possibility of transitioning from the initial to the steady co-orchestration stages.
The co-orchestration workflow is expected to significantly enhance the efficiency of combinatorial library exploration, thereby narrowing the gap between the current synthesis and characterization rates. Further potential improvements could encompass the development of advanced methods for stabilizing VAE representations, the incorporation of structured GPs, and the implementation of cost-aware policies for multimodal Bayesian optimization.
We believe that novel multi-task approaches mark the beginning of a new chapter in autonomous experimentation. The proposed co-orchestration workflow offers a flexible and reliable algorithm for the efficient exploration of combinatorial libraries and similar material systems.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4dd00109e |
This journal is © The Royal Society of Chemistry 2024 |