Progress and prospects for accelerating materials science with automated and autonomous workflows

Integrating automation with artificial intelligence will enable scientists to spend more time identifying important problems and communicating critical insights, accelerating discovery and development of materials for emerging and future technologies.


Introduction
Grand missions, such as combating climate change through proliferation of renewable energy technologies, necessitate technological advancements for which discovery of functional materials is often a prerequisite. 1,2 Historically, transformative materials discoveries have been the result of serendipity from experimenting in a related area and/or decades of systematic materials development. 1 Early examples of automated synthesis and screening techniques were implemented 3-11 to accelerate both processes, 12 for example in the identification of a hysteresis-free shape memory alloy. 13 Continued automation of materials experiments is motivated by potential benefits including lower per-experiment costs, elimination of human error, and the ability to run active learning-driven experiments that identify and explore the most promising regions of materials parameter space. 12,14 In solid state materials science, advancements in automation have largely been driven by the combinatorial materials science community, where comprehensive exploration of a high dimensional materials parameter space requires a substantial number of synthesis and screening experiments. While these efforts have provided automation of individual research tasks for a wide variety of materials and functional properties, manual execution of several experiment steps, as well as manual design of experiments and data interpretation, results in partially-automated workflows. The emerging vision of autonomous materials discovery 12,15 requires a higher level of automation. Establishment of an autonomous workflow is referred to as "closing the loop" since complete task-to-task integration is required to allow computer-controlled iteration. Initial 14,16 and ongoing progress towards realizing such closed-loop systems can be tracked by the level of process automation and integration in a workflow.
Dr Stein conducts research at the intersection of laboratory automation, data science, and materials science to unravel composition-structure-processing-function relationships in energy-related materials. An alumnus in physics of Georg-August Universitaet Goettingen, he graduated as a doctor of engineering at Ruhr-Universitaet Bochum summa cum laude in 2017. He works with Dr Gregoire at the Joint Center for Artificial Photosynthesis at Caltech to discover new and improved materials for renewable energy storage and production.
Dr John Gregoire leads the High Throughput Experimentation group at Caltech where he is also the Thrust Coordinator for Photoelectrocatalysis in the Joint Center for Artificial Photosynthesis, a U.S. DOE Energy Innovation Hub. His research team explores, discovers and understands energy-related materials via combinatorial and high throughput experimental methods and their integration with materials theory and artificial intelligence. The group seeks to accelerate scientific discovery by automating critical components of materials discovery workflows, from synthesis and screening to data interpretation.
Sanchez-Lengeling and Aspuru-Guzik 17 recently described the advent of closed-loop experimentation as a paradigm shift in materials and molecular discovery. The illustration of Fig. 1 provides the high level template of a closed-loop workflow, and in the present work we critically review the progress towards this vision in solid materials experiments. The integration of sequential automated processes is challenging due to the need for mutually compatible parameters and planning, with requirements spanning from a commensurate sample format, to a protocol for decision-making based on results from the prior experiment, and to the identification of measurement failure. To facilitate the analysis of where process integration has been successfully implemented as well as the remaining challenges, we present a framework and ontology for the automation of the materials experiment lifecycle.
The exploration of vast materials spaces (i.e. composition, structure, processing, morphology) via combinatorial materials science has yielded a wide variety of discoveries and advancements in fundamental knowledge 14,18-20 and has additionally produced experiment databases with unprecedented breadth of materials and measured properties, as exemplified by the recent publication of the High Throughput Experimental Materials database (HTEM) 21 based on photovoltaics materials and the Materials Experiments and Analysis Database (MEAD) 22 based on solar fuels materials. These compilations of raw and analyzed 23 data from individual combinatorial materials science laboratories complement the suite of computational materials databases 60,61 as well as a rapidly growing number of materials data repositories including the Citrination platform, 24 the Materials Data Facility (MDF), 25 and text mining of the literature. 26 For the purposes of the present analysis of automating 12,16,27 materials science workflows, these databases serve as successful examples of experiment automation and as resources that can be used to accelerate experiment planning, for example by training machine learning models to identify promising materials. In such planning, it is important to note the complementary search goals of optimizing a given material property and establishing relationships that represent fundamental materials knowledge. Mapping composition-structure-processing-function relationships is a tenet of combinatorial materials research, 28-30 which contrasts with direct implementation of active learning to optimize 31 one or a few properties without requiring acquisition of data to elucidate the underpinnings of the materials optimization. Indeed, the experiment workflow and its operation must be designed to meet the specific research goals, although workflow automation is important for accelerating many different modes of discovery.
We discuss the lifecycle of materials science experiments and the three primary stages of workflow acceleration: (i) the integration of new techniques into traditional research tasks to accelerate process throughput, (ii) the integration of research tasks into a cohesive workflow to mitigate bottlenecks, and (iii) integration of tasks with automated analysis and decisions to close experiment loops and enable autonomous iteration thereof. We find that the solid state materials science community has demonstrated tremendous progress in the first stage, substantial progress in the second stage including high throughput workflows, and seminal demonstrations in the third stage with relatively simple workflows, making concurrent

The experimental materials science research lifecycle
At a high level, the experiment lifecycle † for functional materials discovery consists of a set of core research tasks: synthesis, processing, characterization and performance evaluation. This set transcends the specific techniques used to perform each task, and the generality of these tasks is evident in their consistent discussion in reviews, 1,32 laboratory workflow descriptions, 6,33,34 and database designs for high throughput materials science. 5,6,10,32,35-37 Often unmentioned, though virtually always performed, are the additional core research tasks of planning, data management, data interpretation, and quality control. Individual experiments and sequences of experiments require these tasks, with the extent and style varying with research strategy. In a traditional materials experiment, the 4 experiment tasks are performed manually, as are the complementary 4 tasks, for example planning via a stated hypothesis and data management via lab notebooks. The corresponding workflow can be represented as shown in Fig. 2a and represents the foundation on which more advanced and accelerated workflows are built. As noted above, the first stage of workflow acceleration involves implementation of techniques we refer to as "accelerators" into one or more of the workflow tasks. Classifying all possible accelerators is more subjective than the above classification of workflow tasks, and for the present work we find that the 6 accelerators noted in Fig. 2b enable effective annotation of experimental workflows from the literature. Some accelerator-task combinations are readily achievable, for example parallelization of processing by annealing multiple materials in a furnace. Other combinations may not be meaningful, such as active learning of data management. Of the many combinations that are both meaningful and impactful, some have been effectively realized while others are opportunities for further experiment acceleration, as summarized below for each accelerator.
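The task and accelerator vocabulary described above lends itself to a small data structure for annotating workflows. The following is a minimal sketch, not the authors' implementation: the eight tasks come from the text, the six accelerators are taken from the subsection headings that follow, and the class and method names (`Workflow`, `annotate`, `unaccelerated_tasks`) are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Task(Enum):
    # The four experiment tasks and the four complementary tasks named in the text.
    PLANNING = "planning"
    SYNTHESIS = "synthesis"
    PROCESSING = "processing"
    CHARACTERIZATION = "characterization"
    PERFORMANCE_EVALUATION = "performance evaluation"
    DATA_MANAGEMENT = "data management"
    DATA_INTERPRETATION = "data interpretation"
    QUALITY_CONTROL = "quality control"


class Accelerator(Enum):
    # Assumed to match the six accelerators discussed in the subsections below.
    AUTOMATION = "automation"
    PARALLELIZATION = "parallelization"
    DATA_REPOSITORY = "data repository"
    MACHINE_LEARNING = "machine learning"
    ACTIVE_LEARNING = "active learning"
    AUTOMATED_REASONING = "automated reasoning"


@dataclass
class Workflow:
    """Hypothetical container mapping each task to the accelerators applied to it."""
    name: str
    annotations: dict = field(default_factory=dict)

    def annotate(self, task, *accelerators):
        # Record one or more accelerators for a task.
        self.annotations.setdefault(task, set()).update(accelerators)

    def unaccelerated_tasks(self):
        # Tasks still performed in the traditional, manual way.
        return [task for task in Task if not self.annotations.get(task)]


# A traditional workflow has no accelerators on any of its eight tasks.
traditional = Workflow("traditional experiment")

# The furnace example from the text: parallelized processing, all else manual.
furnace = Workflow("batch furnace anneal")
furnace.annotate(Task.PROCESSING, Accelerator.PARALLELIZATION)
```

Listing `furnace.unaccelerated_tasks()` then shows the seven tasks that remain candidates for acceleration, which mirrors how the workflows in the literature are annotated in this review.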

Automation and parallelization
Automated execution of a serial experiment typically involves incorporation of robotics into a traditional experiment. Parallelization typically involves development of custom instrumentation to perform many experiments simultaneously. Both approaches are commonly used in combinatorial materials science where accelerated synthesis techniques include co-sputtering, 6 co-evaporation, 10 ink-jet printing, 38 combinatorial ball-milling, 39 high-throughput hydrothermal synthesis, 40,41 and bulk ceramic hot-pressing. 42 Similarly, the acceleration of the characterization of materials properties and evaluation of performance for a target functionality has been the focus of extensive methods development in the past two decades, with notable demonstrations including electrochemical testing, 43-46 X-ray diffraction, 47-49 processing, 9,50,51 optical spectroscopy, 52,53 electric properties, 65,66 shape memory, 13,54 and phase dynamics. 9 These advancements in experiment automation have undoubtedly led to discoveries that would not have been made in the same time frame using traditional techniques. Automation and parallelization-based removal of synthesis and characterization bottlenecks introduces new challenges for further acceleration of materials discovery, which are generally being addressed with data and data science-related accelerators.

Data repositories
As noted above, the emergence of experiment databases from high throughput experimentation offers opportunities for data-based acceleration. The established uses of data repositories for accelerating research tasks include data interpretation for crystallography by matching X-ray diffraction patterns to those from a database, 55 planning synthesis based on phase diagrams, 56 and planning catalyst performance evaluation using computational databases of Pourbaix stability. 57,58 Data-driven discoveries are typically enabled by a data repository produced via careful data management. While guidelines such as FAIR 59 exist, these general guidelines focus on data dissemination and do not express the data management requirements for establishing autonomous loops, which require fully automated data ingestion and seamless communication between experimental tasks.
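As a rough illustration of repository-based data interpretation, the sketch below matches a measured diffraction pattern against reference patterns by cosine similarity. It is a toy under stated assumptions: the patterns are made-up intensity vectors on a shared 2θ grid, the phase names are placeholders, and practical phase identification relies on far more careful peak-position and intensity analysis than this similarity score.

```python
import math


def cosine_similarity(a, b):
    # Similarity of two intensity vectors sampled on the same 2-theta grid.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(measured, database):
    # Return the name of the reference pattern most similar to the measurement.
    return max(database, key=lambda name: cosine_similarity(measured, database[name]))


# Hypothetical reference patterns (illustrative values only, not real data).
database = {
    "phase_A": [0.0, 1.0, 0.0, 0.5, 0.0],
    "phase_B": [0.8, 0.0, 0.0, 0.0, 1.0],
}

# A noisy measurement that resembles phase_A.
measured = [0.05, 0.9, 0.1, 0.4, 0.0]
match = best_match(measured, database)
```

For a closed loop, the relevant point is that such a lookup runs without human input, so its result can be passed directly to the next task in the workflow.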

Machine learning
Acceleration by Machine Learning (ML) models encompasses a broad range of applications of computer science algorithms to perform regression, classification or embedding tasks. The recent literature abounds with discussions of the existing and potential impact of ML in materials research. Given recent reviews covering this topic, 62 the present discussion focuses on its role in experiment workflows. ML-based acceleration of research tasks typically involves either research planning or data interpretation through evaluation of ML models trained on prior data. Representative examples include selection of composition spaces for exploring metallic glasses based on ML predictions of glass forming ability 70 and identification of ultraincompressible materials. 71 ML methods have also been developed to accelerate data interpretation in areas including phase mapping from XRD patterns, 18 microscopy data, 51 signal identification in spectroscopy data, 73 annotation of microstructure images, 74 and visualization of complex compositions. 34,73 ML methods can also be developed into active learning and reasoning techniques, although due to their different roles with respect to experiments, those techniques are discussed separately, as detailed below.

Active learning
Active learning involves the choice of the next experiment based on an acquisition function that typically requires a prediction for a figure of merit and the uncertainty thereof. 75 ML models are used for the prediction and uncertainty estimation, with a distinguishing feature of active learning being the need to update the model in real time during execution of the experimental workflow. Active learning is a key component of closed-loop workflows that can ultimately yield self-driving laboratories. 44 Algorithms such as Phoenics 63 have been specifically developed for chemistry experiments and integrated into workflow management software such as ChemOS. 64 The carbon nanotube (CNT) autonomous research system (ARES) project, 65 which is discussed further below, is an example of a closed-loop system for a workflow in which tasks such as data interpretation are readily automated. There have been additional implementations of active learning in materials science to accelerate individual tasks, for example by acquiring only the necessary X-ray diffraction patterns for phase diagram characterization. 66 Sophisticated examples of active learning exist in related fields, including functional genomics, 67 separations optimization, 64 and multi-objective molecular optimization for small molecule drug discovery. 68 While many optimization-oriented searches are amenable to acceleration via active learning, its utility for materials discovery has yet to be sufficiently explored and demonstrated, making the above examples a springboard for assessing the ability of active learning to accelerate complex experimental workflows and the generation of fundamental understanding in materials science.
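The loop described above (predict a figure of merit and its uncertainty, pick the next experiment with an acquisition function, measure, update) can be sketched in a few lines. The code below is a minimal illustration under stated assumptions and does not reproduce Phoenics, ChemOS or ARES: `run_experiment` is a hypothetical stand-in for a real measurement, the surrogate is a crude kernel-weighted average rather than a Gaussian process, and the acquisition function is a simple upper confidence bound (prediction plus `kappa` times uncertainty).

```python
import math


def run_experiment(x):
    # Hypothetical stand-in for a measurement: figure of merit peaking at x = 0.7.
    return math.exp(-((x - 0.7) ** 2) / 0.02)


def predict(x, observed, length_scale=0.1):
    """Return (mean, uncertainty) from a crude kernel-weighted surrogate.

    Assumption: this replaces the ML model a real system would use. The mean is
    a kernel-weighted average of measured values; the uncertainty is zero at a
    measured point and approaches one far from all measurements.
    """
    weights = [math.exp(-((x - xo) ** 2) / (2 * length_scale ** 2)) for xo, _ in observed]
    total = sum(weights)
    if total < 1e-12:
        mean = sum(y for _, y in observed) / len(observed)
    else:
        mean = sum(w * y for w, (_, y) in zip(weights, observed)) / total
    uncertainty = 1.0 - max(weights)
    return mean, uncertainty


def acquisition(x, observed, kappa=1.0):
    # Upper confidence bound: favors high predictions and unexplored regions.
    mean, uncertainty = predict(x, observed)
    return mean + kappa * uncertainty


def active_learning(candidates, initial, budget):
    observed = [(x, run_experiment(x)) for x in initial]
    for _ in range(budget):
        measured = {x for x, _ in observed}
        remaining = [x for x in candidates if x not in measured]
        if not remaining:
            break
        next_x = max(remaining, key=lambda x: acquisition(x, observed))
        # Appending the result updates the surrogate before the next iteration.
        observed.append((next_x, run_experiment(next_x)))
    return observed


candidates = [i / 20 for i in range(21)]
history = active_learning(candidates, initial=[0.0, 1.0], budget=8)
best_x, best_y = max(history, key=lambda pair: pair[1])
```

The value of `kappa` sets the balance between exploiting the current best prediction and exploring uncertain regions. In a real workflow the model update must keep pace with the experiment, which is the real-time requirement noted above.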

Automated reasoning
For complex measurement workflows where competing interpretations of the data need to be considered or a model needs to be reinterpreted given the most recent measurements, the data interpretation, quality control, and planning tasks are not readily automated with existing algorithms. This motivates the development of automated reasoning, which accelerates these tasks with AI methods that mimic and/or supersede human execution (i.e. "superhuman performance" 69). Examples of automated incorporation of physics and chemistry-based models into such tasks include tuning the morphology of a thin film based on a structure zone diagram 51 and fine-tuning the composition to obtain a desired doping type in semiconducting metal oxides based on spinel doping rules. 70 The opportunity for AI development in this area is the topic of a recent perspective, 69 and among the promising research directions is the establishment of generative models that expand the purview of active learning to design materials based on desired properties. 71 While inverse design has been successfully demonstrated for discovery of functional materials, 70-73 integration into automated workflows remains a challenge for solid state materials research. The corresponding high level challenge for closed-loop experimentation of solid state materials is that the scope of a given automated synthesis tool is often quite limited compared to the scope of materials that may be predicted by an active learning or inverse design algorithm. In organic synthesis, for example, there has been more success in developing workflows that encompass the entirety of the synthesis scope of interest, enabling deeper integration of automated reasoning. 17

Integration of tasks into a workflow
The most common type of accelerated discovery workflow consists of an automation-accelerated synthesis and an automation-accelerated characterization or performance evaluation, followed by extensive manual analysis, interpretation, and planning of both additional characterization experiments and future iterations of the workflow. Most commonly the highly automated instruments require manual interfacing (e.g. alignment, measurement parameter setup, supervision for quality control), where increased human involvement corresponds to a lower degree of integration. To simplify the present discussion, we consider two classes of task integration, distinguished by whether expert involvement is required. Where it is required, the integration is designated "expert mediated" and is considered incomplete. This level of integration is prone to creating bottlenecks due to the scarcity of experts. Technique integration by robotics is not distinguished from integration by trained technicians in the present work because the resulting impact on workflow throughput requires more in-depth evaluation of the specific workflow.
To further illustrate how accelerated materials experiments have been integrated, we inspect four reported projects and construct the corresponding workflows in Fig. 3. Each workflow exhibits unique aspects that collectively frame the state of the art in accelerated materials discovery and illustrate the intricacies of workflow acceleration. The scope of each workflow schematic is the sequence of tasks described in the respective publications, and the largest demonstrated equivalent of traditional experimentation is provided for each workflow.
The primary example of closed-loop discovery in solid state materials science is the ARES project for carbon nanotube synthesis. Nikolaev et al. 14 demonstrated optimization of carbon nanotube growth with a workflow that mitigates expert-mediated integration and features acceleration by automation and active learning. Automated control of growth temperature, pressure, and atmospheric conditions enables a unique growth condition in each experiment, with a series of experiments performed by spatially addressing an array of seeds on a substrate. Processing and characterization are intertwined as laser illumination provides both heating and excitation for Raman spectroscopy, producing spectrograms that are analyzed to determine the nanotube growth rate. 14,65 With this materials characterization also providing the figure of merit, the workflow contains no further performance evaluation. The automated data management and interpretation enables closed-loop operation for up to approximately 100 growth experiments planned by active learning-based selection of growth conditions. Expert intervention in this closed loop occurs occasionally (estimated to be 1-3%) to assess the quality of the active learning and adjust the objective as necessary. Upon exhaustion of the array of CNT growth seeds, manual intervention is required to change samples and restart the workflow.
The photoanode discovery pipeline in Fig. 3b represents the tiered screening by Yan et al. 20 that includes both theory and experiment-based down-selection of candidate metal oxides. With respect to the experiments, the computational screening is an accelerant and represented as such in the planning task. The Materials Project database 60 serves as the primary repository, with additional calculations specific to photoanode screening, and while these calculations are critical to the success of the work, they are not fully integrated into the experimental workflow. Synthesis, processing, characterization, and performance evaluation are accelerated using automation, with tens to thousands of materials being synthesized or measured automatically. While this sequence of tasks is in principle amenable to more autonomous operation, setup and selection of meaningful experimental conditions are performed by an expert, resulting in expert-mediated linkages in the workflow. The heavy use of parallelization and automation is supported by automatic data management and quality control, with data interpretation requiring expert mediation. A key attribute of this workflow is the establishment of automated techniques for a large breadth of experimental tasks, from synthesis to performance evaluation, that can operate on libraries with up to ca. 2000 unique materials. 74 The research strategy involves collection of combinatorial materials datasets that facilitate data interpretation and scientific discovery, as well as evaluation of every prediction from the computational screening to assess its efficacy. These aspects of the research limit the value of further task-to-task integration and application of active learning, with the broader message being that the impact of the closed-loop concept varies with research strategy and goals.
The workow of Fig. 3c describes a different implementation of combinatorial materials science for studying functional materials where synthesis, processing and performance evaluation are accelerated by parallelization and automation with expert-mediated integration similar to that of Fig. 3b. The unique aspect of this work is the use of an active learning loop in the middle of the workow to accelerate the mapping of phase boundaries in a composition library, demonstrating the use of active learning in a sub-workow to accelerate a bottleneck experiment (and save valuable beamline time). The synchrotron X-ray diffraction (XRD) characterization described by Kusne et al. 66 includes on-the-y data interpretation and automated selection of the next composition for XRD measurements, with occasional expert supervision of the clustering-based identication of pure-phase patterns.
The atomic-scale phase evolution workflow by Li et al. 29 illustrated in Fig. 3d uses a specialized nanometer-sized reactor to assess phase stability, with ca. 1 hour of experiment time yielding the same data as over 500 days of annealing in traditional bulk experiments. Using data repositories of phase diagrams and stability ranges of multicomponent complex metal alloys to plan synthesis, an array of 36 reactors is deposited, for example with equiatomic mixtures of the Cantor alloy Cr-Mn-Fe-Co-Ni. 75 The loop in this workflow is based on the step-wise annealing of the reactor array with subsequent atom probe tomography (APT) characterization after each processing step. Each APT characterization involves destruction of one of the reactors, and the number of reactors is made to be several times larger than the number of processing steps due to routine failure of the APT measurement. The critical advancement enabled by a small autonomous loop is the real-time monitoring of APT data acquisition with well-integrated quality control. Data interpretation is performed by comparison to external data and visualization is done through a machine learning model. 30,76 The richness of the APT data coupled with significant annealing time reduction yields high throughput knowledge generation even though the workflow contains mostly expert-mediated integration of tasks. Increased autonomy in the workflow would only be warranted after substantial advances in automated data interpretation.
For each of these workflows, the nominal time to execute the entire workflow is on the order of 1 day. The equivalent number of passes through a traditional workflow, or the number of days of traditional experimentation to produce the equivalent data, provides the nominal acceleration factor of the workflow, which is only equal to the acceleration factor of knowledge discovery if the selection of experiments and quality of the resulting data is equivalent to those of traditional experiments. Assessment of such data value is beyond the scope of the present discussion but remains a critical consideration for quantifying workflow acceleration, particularly in settings where the research goals involve understanding the underlying materials science as opposed to performance optimization.
Fig. 4 Visualization of the landscape of materials experiment workflow in terms of the scientific complexity of automated tasks and the workflow automation complexity, which is based on the number, variety, speed, and difficulty of experimental steps in the workflow. The advancements in combinatorial materials science and high throughput experimentation (CMS/HTE) have been largely along this latter (horizontal) axis, and initial demonstrations of autonomous loops have made progress on the former (vertical) axis with automation of more intellectually challenging research tasks. The nominal locations of the 4 workflows from Fig. 3 are noted by stars. While research will push the frontier of automated experiments along both axes (arrows with italics), the most complex scientific tasks will remain the responsibility of human experts for the foreseeable future.
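The nominal acceleration factor discussed for the four workflows is a simple ratio, sketched below with the figures quoted for the nanoreactor workflow of Li et al. The function name is hypothetical, and pairing the "over 500 days" of bulk annealing with the nominal 1 day workflow duration is an assumption made for illustration. Since the source gives 500 days as a lower bound, both results are lower bounds as well.

```python
def nominal_acceleration_factor(traditional_time, workflow_time):
    """Ratio of traditional to accelerated execution time, both in the same unit."""
    if traditional_time < 0 or workflow_time <= 0:
        raise ValueError("times must be positive and expressed in the same unit")
    return traditional_time / workflow_time


HOURS_PER_DAY = 24

# Annealing step alone: ca. 1 hour in the nanoreactor vs. over 500 days in bulk.
annealing_factor = nominal_acceleration_factor(500 * HOURS_PER_DAY, 1)

# Whole workflow: nominal execution time of about 1 day (assumed pairing).
workflow_factor = nominal_acceleration_factor(500, 1)
```

As the text cautions, this ratio equals the acceleration of knowledge discovery only if the selection of experiments and the data quality match those of the traditional route, so it is best read as an upper estimate of the practical benefit.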

Conclusions and outlook
The urgent need for better materials demands faster turnaround cycles from basic research, such that better, more efficient, more eco-friendly, and more economically viable materials can enter the market sooner than the traditionally observed 40 years. 1 Accelerated materials experiment workflows have been demonstrated to increase throughput by up to a few orders of magnitude compared to traditional methods. Surveying the reported workflows reveals two primary areas for workflow sophistication: the integration of sequential tasks without requiring expert involvement and the expansion of feedback loops to incorporate a larger fraction of the workflow tasks. The ARES workflow achieves both of these goals with a relatively small workflow compared to functional materials discovery research, where the variety of characterization and performance evaluation experiments increases the number of workflow tasks as well as the demands on data management, data interpretation, and quality control.
To visualize progress to date and the expected advances from ongoing research, Fig. 4 illustrates the continuum of materials workflows in terms of the scientific complexity and workflow automation complexity. To elucidate our intended meaning of scientific complexity, representative tasks spanning minimal complexity to very complex are listed. Arguably the most important aspect of a successful science program is the ability to identify interesting problems and ask the important questions that guide research activities. These tasks are beyond the purview of present autonomous research and will be for the foreseeable future. Advances in natural language processing for materials science may automate aspects of scientific communication, but critical analysis of the literature and communication of the insights provided by a given experiment will continue to rely on human intellect for the foreseeable future.
Determining the most effective advancements in a materials experiment workflow requires critical evaluation of bottlenecks for progress against the research goals. Even when expert mediation is required between tasks, workflow throughput is often limited by the manual steps at the front and back ends of automated experiments. These peripheral activities, which fall under the intermediate "complicated" level of scientific complexity in Fig. 4, can be difficult (or currently impossible) to fully automate due to the routine use of expert knowledge, for example in judgement of data quality based on extensive previous experience with related data. Advances in artificial intelligence (AI) for materials encompass a wide variety of strategies for addressing these challenges, which will be critical for expanding the scope of autonomous loops. This approach to pushing the frontier of materials workflows is illustrated by the "Materials AI" arrow in Fig. 4 and will ideally accompany the expansion of autonomous loops to include more complex and more varied experimental tasks. This complementary approach to pushing the frontier of materials workflows is illustrated by the "Build on HTE" arrow due to the demonstrated successes in experiment automation from the high throughput experimentation community. The ability to leverage this existing work makes autonomous workflows more readily extendable into complex automation as compared to the extremes of complex scientific reasoning.
An outstanding question with regard to the next generation of experimental workflows is how to best combat human biases that can severely limit innovation. 77 Advanced autonomous experimentation may remove biases within a given search space through computationally designed experiments. However, the scope of the search space is limited by both instrument capabilities and active learning strategy, whose designs originate with human identification of the materials space of interest. To the extent that human biases disseminate from the "complex" scientific tasks of Fig. 4, bias removal within an autonomous workflow must be complemented by sociological solutions for removing bias in decisions beyond the experiment workflow.
We are aware of several research groups that are building autonomous experiments in the "next generation" regime of Fig. 4, including emerging reports from perovskite synthesis 78 and molecular materials for organic photovoltaics 79 and organic hole transport materials. 80 Continuation of these concerted efforts to increase automation and develop tailored AI algorithms will enable the materials science community to realize a paradigm shift in scientific discovery where expert scientists can dedicate a substantially larger fraction of their time to performing the critical tasks of identifying important problems and communicating critical insights.

Conflicts of interest
There are no conflicts to declare.