Orchestrating nimble experiments across interconnected labs



Introduction
Materials Acceleration Platforms (MAPs) 1 aim to leverage modern artificial intelligence (AI) algorithms to accelerate the discovery of molecular and solid state materials. For solid state materials, automation and integration with computation 2 have historically been achieved via combinatorial methods 3,4 and recently implemented using autonomous or self-driving labs. 5,6 The pace of advancement in experiment automation is staggering and motivates a rethinking of how instruments are made and controlled. Initial efforts in AI-guided automated workflows have naturally focused on achieving super-human efficiency of data acquisition. Here, automated experiment selection alleviates reliance on human researchers but does not supplant human governance of the scientific line of inquiry and the strategy for its exploration. 7 Recent developments in physics- and chemistry-aware models 8,9 as well as hypothesis learning algorithms 10 exemplify the ever-increasing sophistication of automated experiment design. While such algorithms do not yet rival the decision making of human experts, the trajectory of AI algorithms clearly indicates that the frontier of experiment automation is not the automation of individual experiments but rather entire workflows and their ensembles.

a Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA 91125, USA; E-mail: guevarra@caltech.edu, gregoire@caltech.edu
b Liquid Sunlight Alliance, California Institute of Technology, Pasadena, CA, USA
c TUM School of Natural Sciences, Department of Chemistry, and Munich Data Science Institute, Technical University of Munich, Munich, Germany
+ Present address: Good Terms LLC, CO, USA
‡ Present address: deepXscan GmbH, Dresden, Germany
Expanding upon the vision of interconnected workflows for AI emulation of team science, Bai et al. 11 have envisioned worldwide coordination of self-driving labs driven by the rapidly evolving fields of knowledge graphs, semantic web technologies, and multi-agent systems. Ren et al. 12 emphasize the critical need for interconnected laboratories to leverage resources and learn epistemic uncertainties. To realize this collective vision, experiment automation software that builds upon the state of the art [13][14][15][16][17][18][19][20][21][22][23][24] must be developed to interconnect laboratories and their research workflows. A hallmark of human scientific research is on-the-fly adaptation of experimental workflows based on recent observations. Human scientists also interleave workflows spanning materials discovery to device prototyping. 25 The interleaved execution of multiple workflows typically involves shared resources, which is often a practical necessity for minimizing the capital expense of establishing any experimental workflow. These considerations require "nimble" experiment automation, and in the present work we describe our approach to automating nimble, interconnected workflows via asynchronous programming. Parallel execution of automated workflows can be realized via "Interconnected workflows" wherein a human or machine Science Manager oversees multiple automated workflows that each run on their own central processing unit (CPU). This scheme is appropriate when each workflow can operate independently. If two workflows share an experiment resource, one strategy for their automation is to integrate them into a hybrid workflow that is executed from a single CPU. However, the shared runtime among all processes on a single CPU inherently limits operational flexibility, a runtime interdependence that is impractical for interconnected labs. In addition to addressing the needs of shared resources, asynchronous programming provides the requisite flexibility for experiment automation tasks such as passing messages, writing data to files, and polling devices. These needs can be partially fulfilled with multi-thread programming, although we find asynchronous programming to be a more natural solution. For a more technical discussion of the difference between asynchronous programming and threading, we refer the reader to Ref. 26.
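These automation tasks can be illustrated with a minimal asyncio sketch (the names here are hypothetical, not the HELAO-async API): a device-polling coroutine and a message-logging coroutine interleave on a single event loop without threads or locks.

```python
import asyncio

async def poll_device(readings: list, n: int = 3):
    """Simulate periodic polling of an instrument."""
    for i in range(n):
        await asyncio.sleep(0.01)      # yields control while "waiting" on hardware
        readings.append(f"reading-{i}")

async def log_messages(queue: asyncio.Queue, log: list):
    """Consume status messages as they arrive."""
    while True:
        msg = await queue.get()
        if msg is None:                # sentinel: shut down cleanly
            break
        log.append(msg)

async def main():
    readings, log = [], []
    queue = asyncio.Queue()
    logger = asyncio.create_task(log_messages(queue, log))
    await queue.put("workflow started")
    await poll_device(readings)        # runs concurrently with the logger task
    await queue.put("workflow finished")
    await queue.put(None)
    await logger
    return readings, log

readings, log = asyncio.run(main())
print(readings)  # ['reading-0', 'reading-1', 'reading-2']
print(log)       # ['workflow started', 'workflow finished']
```

Because both coroutines voluntarily yield at each `await`, neither blocks the other, which is the property that makes this style a natural fit for message passing, file writing, and device polling.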

Results and Discussion
As an example of a shared resource, consider a lab in which a central piece of equipment such as an X-ray diffractometer or reactive annealing chamber is used in several distinct workflows. Traditional methods of experiment automation would involve each workflow taking ownership of that equipment during the workflow's runtime. Combining all workflows in a single instance of automation software limits the ability of different workflows to start and stop as dictated by science management and/or equipment maintenance needs. Human researchers address this challenge by creating a system to schedule the requested usage of shared equipment, and the automation analogue is to have the shared equipment operated by a broker whose runtime is independent of all other workflow automation software. This is the central tenet of "Nimble interconnected workflows" as depicted in Fig. 1, wherein each resource family is controlled by an asynchronous Resource Manager. The runtime independence of Resource Managers and Workflow Orchestrators enables each Orchestrator to maintain focus on a single research workflow while empowering a Science Manager to coordinate efforts across many workflows in any number of physical laboratories.
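The broker concept can be sketched in Python's asyncio (hypothetical names; this is not the HELAO-async implementation): a Resource Manager task owns the shared instrument and serves requests queued by any number of workflow tasks, so no workflow ever takes ownership of the equipment.

```python
import asyncio

async def resource_manager(requests: asyncio.Queue):
    """Owns the shared instrument; processes one request at a time."""
    while True:
        item = await requests.get()
        if item is None:                     # shutdown sentinel
            break
        sample, reply = item
        await asyncio.sleep(0.01)            # simulated measurement time
        reply.set_result(f"XRD pattern of {sample}")

async def workflow(name: str, samples, requests: asyncio.Queue, results: dict):
    """A workflow submits requests instead of taking ownership of the tool."""
    for sample in samples:
        reply = asyncio.get_running_loop().create_future()
        await requests.put((sample, reply))
        results[f"{name}:{sample}"] = await reply

async def main():
    requests, results = asyncio.Queue(), {}
    manager = asyncio.create_task(resource_manager(requests))
    await asyncio.gather(
        workflow("wf1", ["A", "B"], requests, results),
        workflow("wf2", ["C"], requests, results),
    )
    await requests.put(None)                 # the broker stops on its own schedule
    await manager
    return results

results = asyncio.run(main())
```

Either workflow can be started or stopped without affecting the manager, which is the runtime independence that the broker pattern provides.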
The series of workflow automation capabilities illustrated by Fig. 1 has been largely mirrored by the evolution of reported automation software for materials chemistry. Seminal demonstrations of software for automating an experiment workflow include ARES, 13 ChemOS, 14 and Bluesky. 15 Continual development of these platforms has resulted in new capabilities such as the generalized ARES-OS 16 and remote operation with Bluesky. 17 ChemOS 14 and ESCALATE 27 have increasingly incorporated ancillary aspects of automation such as encoding design of experiments and interfacing with databases. These efforts have built toward multi-workflow integration in HELAO 18,19 and NIMS-OS 20 as well as object-oriented, modular frameworks for co-development of multiple MAPs 21,23 and multi-agent automation frameworks. 22 Enabling independent operation of workflow components ultimately requires asynchronous programming, as envisioned by the present work and ChemOS2.0. 24 The abstraction of lab equipment as asynchronous web servers is implemented in HELAO-async using FastAPI, 28 with alternative options including SiLA2, 29 as demonstrated in ChemOS2.0.
The implementation of "Nimble interconnected workflows" in HELAO-async is outlined in Fig. 2. A Science Manager (see Fig. 1) is implemented as an Operator for active science management and an Observer for passive science management. The Orchestrator manages workflow-level automation, which generally involves launching a series of Actions on the workflow's suite of Action Servers. In the parlance of Fig. 1, Action Servers are the Resource Managers, which execute actions via Device Drivers that comprise the Hardware/Software Resources. The design principle of this framework is to enable asynchronous launching of workflows via Operator-Orchestrator communication, as well as asynchronous execution of workflows via Orchestrator-Action Server communication. When multiple Orchestrators share a resource, queuing and prioritization are managed by the respective Action Server.

Fig. 2 The HELAO-async framework is outlined as a specific implementation of the "Nimble interconnected workflow" concept from Fig. 1. Orchestrators manage workflows by controlling Action Servers that manage resources via Device Drivers. By establishing an independent FastAPI server for each Orchestrator and Action Server, workflows have independent runtimes while sharing resources as needed. A human or AI Operator manages the collection of Orchestrators, and an Observer can consume data streams from any server. The use of FastAPI endpoints and websockets for these interactions creates flexibility for the implementation of Operators and Observers.
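Queuing and prioritization inside a shared Action Server can be sketched with a priority queue (an illustrative scheme, not the HELAO-async code): requests from multiple Orchestrators carry a priority, and the server drains them in priority order, with a sequence counter preserving first-come-first-served order among equal priorities.

```python
import asyncio

async def action_server(queue: asyncio.PriorityQueue, executed: list):
    """Drain queued actions in (priority, arrival) order."""
    while not queue.empty():
        priority, seq, action = await queue.get()
        executed.append(action)              # stand-in for performing the action

async def main():
    queue = asyncio.PriorityQueue()
    executed = []
    # Two Orchestrators enqueue actions; lower number = higher priority.
    for seq, (priority, action) in enumerate([
        (2, "orch1:measure"),
        (1, "orch2:aliquot"),    # urgent: a time-sensitive aliquot jumps the queue
        (2, "orch2:measure"),
    ]):
        await queue.put((priority, seq, action))
    await action_server(queue, executed)
    return executed

executed = asyncio.run(main())
print(executed)  # ['orch2:aliquot', 'orch1:measure', 'orch2:measure']
```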
The HELAO-async implementation described herein is intended to be agnostic with respect to the type of Operator and Observer, which may involve any combination of a human researcher, an autonomous operator selecting experiments via an AI-based acquisition function, or a more general broker 19 for coordinating experiments across many workflows. The scope of an Orchestrator is that of a single workflow, and by implementing each Orchestrator as a FastAPI server, the parameterized workflow is exposed to the Operator via custom FastAPI endpoints. Fig. 2 also depicts Observers, which can subscribe to the FastAPI websockets established by an Orchestrator and/or Action Server. Each Orchestrator and Action Server must be programmed to publish data of interest to websockets, which enables any number of Observers to listen in as needed. Our common implementation to date is a web browser-based Observer (a.k.a. Visualizer) that researchers can launch to monitor quasi-real-time data streams, a critical capability for experiment quality control.
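The publish/subscribe pattern behind these Observers can be sketched with plain asyncio queues standing in for FastAPI websockets (names here are illustrative, not the HELAO-async API): the server pushes each datum to every subscriber, so any number of Observers can attach without the server changing.

```python
import asyncio

class DataStream:
    """Minimal pub/sub hub: each subscriber gets its own queue of data."""

    def __init__(self):
        self.subscribers: list[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        q = asyncio.Queue()
        self.subscribers.append(q)
        return q

    async def publish(self, datum):
        for q in self.subscribers:           # fan out to every Observer
            await q.put(datum)

async def main():
    stream = DataStream()
    viz1, viz2 = stream.subscribe(), stream.subscribe()   # two Observers listening
    for v in [1.23, 1.31]:                                # e.g. measured overpotentials
        await stream.publish(v)
    seen1 = [viz1.get_nowait(), viz1.get_nowait()]
    seen2 = [viz2.get_nowait(), viz2.get_nowait()]
    return seen1, seen2

seen1, seen2 = asyncio.run(main())
```

In the real framework the `publish` step would write to a websocket rather than an in-process queue, but the fan-out structure is the same.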
The asynchronous operation at the Orchestrator and Action Server levels was particularly motivated by hierarchical active learning schemes, for example human-in-the-loop 30,31 or fully autonomous hierarchical active learning. 32 When workflow-level decisions are made by a human with autonomous workflow execution, a community of asynchronous Orchestrators enables humans to execute on any available workflow. We envision that this mode of operation will be critical for integrating physically separated laboratories and realizing cloud laboratories. 11,12,33 HELAO-async is being actively developed in 2 public repositories: https://github.com/High-Throughput-Experimentation/helao-core encompasses the API data structures and https://github.com/High-Throughput-Experimentation/helao-async contains instrument drivers, API server configurations, and experiment sequences. These repositories contain drivers for our suite of experimental resources spanning motion control, liquid handling, electrochemistry, analytical chemistry, and optical spectroscopy. Our Orchestrator-level implementations include scanning droplet electrochemistry, 34 scanning optical spectroscopy, 35 electrochemical cells with scheduled electrolyte aliquots for monitoring corrosion, 36 and several methods of coupling electrochemical transformations with analytical detection of the chemical products. 37 The latter two examples share the need for liquid and/or gas aliquoting from operational electrochemical cells, for which we typically use a Tri Plus robotic sample handling system (CTC Analytics) that is a shared resource across multiple workflows.
Given our safety and security protocols, readers of this manuscript may not execute HELAO-async code with our laboratory equipment. While we encourage the duplication and adaptation of our hardware and/or software for operation in other labs, we have built a virtual demo for the present purposes of introducing HELAO-async. To create a minimal implementation of Fig. 2, the demo contains two independent Orchestrators, each with a dedicated Action Server that simulates the acquisition of electrochemistry data to characterize the overpotential for the oxygen evolution reaction (OER). We have packaged previously-acquired electrochemistry data with the demo. 38 The two Orchestrators share a common resource, which in practice may be the robotic sample handling system. In the demo, the shared Action Server is an active learning agent that manages requests for new acquisition instructions from the two Orchestrators. This shared-resource Action Server runs independently and is unaffected by the runtime of each Orchestrator, which is demonstrated in the demo by independently starting and stopping the Orchestrators, representing the asynchronous operation of research workflows within one or across multiple laboratories. Running the demo batch script will open 5 user interface browsers: 2 Operators that control the respective Orchestrators, 2 Visualizers (Observers) that show the data streams from the respective electrochemistry Action Servers, and a Visualizer (Observer) for the shared active learning Action Server. This Visualizer shows the progress of the active learning campaign, including the contributions from each of the independent Orchestrators. A snapshot of these 5 web browser interfaces is shown in Fig. 3. Due to the use of static random seeding of the active learning, the demo runs deterministically, where the contents of Fig. 3 show the status approximately 17 minutes after launching the demo batch script.

Fig. 3 A screenshot from the HELAO-async demo. The command line windows are a. the miniconda python environment from which the demo was launched, b. the instance of the first Orchestrator "demo0" and its Visualizer, c. the instance of the second Orchestrator "demo1" and its Visualizer, and d. the instance of the Visualizer for the shared Action Server "GP SIM". e,f. The web user interfaces for the Operators for the Orchestrators. While various experiment controls may reside in these web interfaces, the demo involves automated execution of active learning campaigns as programmed in the sequence "OERSIM_activelearn." The three Visualizers are g. for the GP SIM Action Server, as well as h. and i. for the electrochemistry Action Servers orchestrated by demo0 and demo1, respectively. The GP SIM Visualizer (g.) shows (top) the most recent communication with an Orchestrator, (bottom) a list of recent acquisitions, and (middle) histograms showing the distribution of catalyst overpotentials. The 4 histograms correspond to the 2 combinatorial libraries associated with their respective Orchestrators and the 2 types of overpotentials, those previously measured and those predicted by the Gaussian Process for unmeasured compositions.
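The demo topology (two Orchestrators requesting acquisition instructions from one shared active-learning Action Server) can be mocked in a few lines of asyncio. The names below are hypothetical and this is not the packaged demo code; a static random seed stands in for the demo's deterministic seeding.

```python
import asyncio
import random

async def al_server(requests: asyncio.Queue, rng: random.Random):
    """Shared agent: hands out the next composition to measure on request."""
    compositions = [f"x={i/10:.1f}" for i in range(11)]
    while True:
        reply = await requests.get()
        if reply is None:
            break
        reply.set_result(rng.choice(compositions))  # stand-in for a GP acquisition

async def orchestrator(name, n, requests, acquired):
    """Each Orchestrator asks the shared agent what to measure next."""
    for _ in range(n):
        reply = asyncio.get_running_loop().create_future()
        await requests.put(reply)
        acquired.append((name, await reply))        # "measure" the suggestion

async def main():
    rng = random.Random(0)                          # static seed: deterministic run
    requests, acquired = asyncio.Queue(), []
    agent = asyncio.create_task(al_server(requests, rng))
    await asyncio.gather(
        orchestrator("demo0", 2, requests, acquired),
        orchestrator("demo1", 2, requests, acquired),
    )
    await requests.put(None)                        # agent shuts down independently
    await agent
    return acquired

acquired = asyncio.run(main())
```

Stopping either orchestrator coroutine leaves the agent and the other orchestrator unaffected, mirroring the independent start/stop behavior of the demo.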
While the HELAO-async schematic of Fig. 2 indicates the intended role and scope of each FastAPI server with respect to the universal research roles summarized in Fig. 1, there remains flexibility in how to implement HELAO-async for a given workflow or ensemble of workflows. Regarding the scope of a single Action Server, a set of resources may be bundled in a single FastAPI server based on i) their intended use as a grouping of shared resources, ii) the need for synchronization among the resources, or iii) safety-related interdependencies. As examples, consider i) an autosampler for a piece of equipment and the piece of equipment itself, which may have distinct drivers but will always be used together, so it is best to code their joint actions in an Action Server and abstract the joint action of sampling and measuring as a single FastAPI endpoint; ii) an isolation valve and a pump where the isolation valve needs to be opened before the pump starts and the pump needs to stop before the isolation valve is closed, which are couplings of driver steps that are best hard-coded within a single Action Server; and iii) a set of motors where the limits of the first motor depend on the position of the second motor, for which evaluating the safety of a given motor movement is best done within a single Action Server. While these examples illustrate why multiple resources should be bundled in a single Action Server, the primary counterexamples involve resource sharing and hardware/software modularity. Programming the action queuing and prioritization for an Action Server that is shared among multiple Orchestrators is best done by minimizing the set of resources in the Action Server. To best leverage Action Server code for multiple physical instantiations of resources, the scope of an Action Server should be limited to the set of resources that are always implemented collectively into a workflow.
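Case ii) can be sketched as follows (illustrative names, not a HELAO-async driver): the safe valve/pump ordering is hard-coded inside a single Action Server method, so callers can only request the composite, safety-checked action rather than the individual driver steps.

```python
import asyncio

class ValvePumpServer:
    """Bundles an isolation valve and pump so their coupling cannot be violated."""

    def __init__(self):
        self.log = []

    async def _step(self, action: str):
        await asyncio.sleep(0)          # stand-in for an asynchronous driver call
        self.log.append(action)

    async def run_pump(self, duration_steps: int = 1):
        """Single endpoint enforcing open-before-start and stop-before-close."""
        await self._step("valve:open")
        await self._step("pump:start")
        for _ in range(duration_steps):
            await self._step("pump:running")
        await self._step("pump:stop")
        await self._step("valve:close")

server = ValvePumpServer()
asyncio.run(server.run_pump())
print(server.log)
# ['valve:open', 'pump:start', 'pump:running', 'pump:stop', 'valve:close']
```

Exposing only `run_pump` as the FastAPI endpoint, rather than separate valve and pump endpoints, is what makes the interdependency impossible to violate from outside the Action Server.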
The multi-Orchestrator demo described above additionally illustrates optionality with respect to the implementation of AI-guided design of experiments. If the AI agent is intended to be a Science Manager across multiple workflows, it should be implemented as an Operator in HELAO-async. However, in the demo, the active learning engine is implemented as an Action Server that is a shared resource for the 2 Orchestrators. In this case the Science Manager is a human who configured each Orchestrator to receive guidance from the active learning Action Server, which is a prudent mode of operation when the human will routinely switch an Orchestrator between executing experiments according to AI vs. human guidance. As such, this demo may be prescient in the context of instrument control using large language models that integrate human and AI design of experiments. 39 While self-driving labs have traditionally been constructed with the AI acquisition function as the Operator, the future of MAPs will likely relegate active learning to be a resource that facilitates but does not govern the operation of automated experimental workflows. The asynchronous programming and implementation of workflows as networked servers in HELAO-async are designed to enable development of individual automated workflows followed by their seamless interconnection with additional workflows that can be managed by any combination of human and artificial intelligence.

Conclusions
The advent of self-driving labs as a fully autonomous implementation of a materials acceleration platform has broadened the purview of experiment automation beyond traditional high throughput experimentation. Concomitantly, the materials automation community has envisioned international networks of laboratories that may be controlled by multi-agent AI systems and/or human-in-the-loop active learning. As new automated workflows are developed in isolation and then fed into broader networks of capabilities, instrument control software must be prepared to make the transition from single-workflow orchestration to participation in many-workflow automation schemes. Traditional lab automation software cannot effectively manage sharing of resources among workflows while maintaining an independent runtime environment for each workflow. We introduce HELAO-async as a framework for facilitating the automation of individual resources, their incorporation into workflows, and the interconnection of workflows. Combining object-oriented and asynchronous programming, HELAO-async abstracts resource managers and workflow orchestrators as FastAPI servers. Communication, both between servers and with additional programming instances such as web user interfaces and AI-driven experiment brokers, is implemented using FastAPI websockets and endpoints. In addition to a demo virtual instrument to facilitate learning about HELAO-async, the present work introduces the open source code repositories that house the automation software for a suite of materials acceleration platforms in the high throughput experimentation group at the California Institute of Technology.

Fig. 1
Fig. 1 illustrates the hierarchical components of a scientific workflow, with the highest hierarchy being a Science Manager that designs research projects and strategies for their implementation. The Workflow Orchestrator implements these strategies as (dynamic) series of experiments. Execution of the workflows requires Resource Managers that process experiment instructions and allocate the Hardware/Software Resources with which the experiments are performed.

Fig. 1 The hierarchies of research scope, which are intended to generally describe communication among entities for any research modality, are illustrated as a Science Manager that establishes projects and strategy, Workflow Orchestrators that manage the implementation of the strategy in an experiment workflow, Resource Managers that process experiment instructions and allocate the necessary resources, and the Hardware/Software Resources with which the experiments are performed. With this ontology, 3 generations of experiment automation software are outlined in which an individual workflow is automated, interconnected workflows are run in parallel, and interconnected workflows are operated with flexible management that enables a distinct runtime for each Orchestrator and Resource Manager.