Open Access Article
Yuya Tahara-Arai *ab, Akari Kato c, Koji Ochiai d, Kazuya Azumi d, Koichi Takahashi d, Genki N. Kanda *ef and Haruka Ozaki *ab
aLaboratory of Bioinformatics, University of Tsukuba, 1-1-1 Tennōdai, Tsukuba, Ibaraki 305-8577, Japan. E-mail: arai.yuya.qa@alumni.tsukuba.ac.jp
bLaboratory for AI Biology, RIKEN Center for Biosystems Dynamics Research, 6-7-1 Minatojima Minamimachi, Chuo-ku, Kobe, Hyogo, Japan. E-mail: ai-biology@ml.riken.jp
cResearch DX Foundation Team, RIKEN Data and Computational Sciences Integration Research Program, 6-7-1 Minatojima Minamimachi, Chuo-ku, Kobe, Hyogo, Japan
dLaboratory for Biologically Inspired Computing, RIKEN Center for Biosystems Dynamics Research, 6-7-1 Minatojima Minamimachi, Chuo-ku, Kobe, Hyogo, Japan
eMedical Research Laboratory, Institute of Integrated Research, Institute of Science Tokyo, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 113-8519, Japan. E-mail: genki.kanda@tmd.ac.jp
fRobotics Innovation Center, Research Infrastructure Management Center, Institute of Science Tokyo, Japan
First published on 10th March 2026
Laboratory automation increasingly requires handling complex, condition-dependent protocols that combine sequential, branching, and iterative operations. Many systems use task-oriented models, where control proceeds task to task through a predefined list. These are effective for linear, static protocols but are poorly suited to adapting to changing sample conditions or representing loops and conditional branches. We introduce the General Experimental Management System (GEMS), which instead adopts a sample-centred approach, progressing state to state, with each state defining both the operations to perform and the rules for transitioning based on observations. By formalising every experimental protocol as a partially observable Markov decision process (POMDP) and expressing its deterministic execution logic as a deterministic finite automaton (DFA), GEMS can represent heterogeneous workflow structures within a single, coherent framework and enable direct compilation into instrument-executable workflows. Its architecture includes a hierarchical experiment model, a penalty-aware scheduler combining a greedy baseline with simulated annealing refinement, and a file-based interface for instrument-agnostic control. We demonstrate GEMS in two contrasting cases: (i) fully automated Bayesian optimisation of liquid mixtures using a pipetting robot and imaging, and (ii) dynamic, long-term scheduling of multiple mammalian cell cultures executed by a LabDroid robot with autonomous imaging, passaging, medium exchange, and fault recovery. In both, GEMS maintained protocol constraints while adapting schedules in real time, showing that a state-to-state, sample-centred model offers an abstraction broad enough to span heterogeneous workflows.
Existing laboratory automation platforms typically represent experimental workflows within one of four broad paradigms. First, timeline-based systems, such as Clarity LIMS and Autoprotocol, encode protocols as a fixed sequence of steps, making them straightforward to read but inherently unsuitable for conditional branching.10,11 In this work, we treat Autoprotocol as a timeline-style description standard rather than a prescriptive programming language; accordingly, our comparisons focus on representational scope (fixed step lists), not on execution semantics.11 Second, static directed acyclic graph (DAG) schedulers, exemplified by Green Button Go and SAMI EX, capture task dependencies in a graphical form and generate a single optimised schedule prior to execution; however, the graph cannot be modified during execution.12–15 Third, dynamic flow-chart frameworks—including AlabOS, ChemOS, GLAS, and COPE—maintain tasks in a queue or database and determine the subsequent action at run time, thereby supporting closed-loop optimisation, explicit loops, and adaptive resource allocation.16–22 In COPE, for example, the Logic Builder allows loops, producing directed graphs that may contain cycles. Finally, domain-specific languages such as χDL and IvoryOS treat a protocol as source code: χDL compiles declarative scripts into hardware-independent instructions, while IvoryOS assembles Python workflows from a visual canvas; both support loops and conditional logic but delegate most scheduling decisions to the execution engine.23,24
Most existing frameworks in these categories adopt a task-oriented model, in which control moves task to task through a predefined list. While effective for linear protocols and static resource allocation, this perspective makes it difficult to track the evolving condition of individual samples or to express loops and conditional branches without regenerating the entire task list. From a theoretical standpoint, such limitations can be understood by modelling experimental procedures as partially observable Markov decision processes (POMDPs),25–27 in which an unobserved latent sample state is inferred from measurements and subsequent actions are chosen according to an evolving belief state. When the policy is deterministic and the observation set is finite, the decision process can be expressed as a deterministic finite automaton (DFA, Mealy automaton).28–30 This minimal structure makes state transitions explicit, supports formal verification, and can be compiled directly into instrument-executable workflows. Framing protocols in this way suggests a natural alternative to the task-to-task paradigm: a sample-centred model in which the primary unit is the state of each sample, and control moves state to state. Each state encapsulates both the operations to be performed and the rules for transitioning based on observations. This shift aligns naturally with condition-dependent protocols and unifies short, deterministic sequences with long-running, branching workflows in a single abstraction.
Related work spanning timeline lists, static DAG schedulers, dynamic flow-chart frameworks, and DSLs has advanced throughput and, in some cases, closed-loop control; however, task-oriented models seldom treat per-sample state and progression rules as first-class control units, and scheduling/timing often remains entangled with control logic. Task-based controllers that support branching/looping, mid-run insertion/deletion, parallelism and rescheduling have been reported (e.g., Jensen and colleagues31), and other works explicitly articulate task–sample interactions (e.g., Lapkin & Kraft;32 Gregoire & Stein33). Our approach complements these directions by placing a minimal per-sample deterministic finite automaton (DFA, Mealy automaton) at the core of control, separating progression (state transitions) from task emission while delegating timing and resource allocation to a penalty-aware scheduler. This combination unifies short deterministic sequences and long, branch-rich protocols within a single representation and supports direct compilation to instrument-executable workflows. Our contributions are: (i) a constructive mapping from protocol policies to a machine-independent Mealy automaton; (ii) a state-centred experiment model that decouples transition rules from task emission and compiles to instrument-executable tasks; and (iii) a penalty-based scheduler that respects time windows and enables on-the-fly rescheduling.
Here we present GEMS (General Experimental Management System), a general-purpose workflow engine that represents experimental protocols as DFAs derived from POMDPs. GEMS integrates (i) state-based protocol definition, (ii) instrument-agnostic scheduling with penalty-aware optimisation, and (iii) observations that store both results and metadata for decision-making. To demonstrate the generality of this framework under diverse temporal and structural constraints, we apply GEMS in two contrasting case studies: (1) sequential parameter optimisation of liquid mixtures guided by Bayesian design, and (2) dynamic scheduling of parallel mammalian cell cultures with automated imaging and passaging. Together, these examples show that a state-to-state, sample-centred model can support robust, fully automated laboratory operation across both short deterministic sequences and long, branch-rich workflows.
Consider a deterministic policy π: Σ* → A* that maps any finite observation history to a (possibly empty) finite sequence of laboratory actions. We separate transition rules (stage progression) from task-generation rules (emitted operations), matching GEMS's transition function g and task function f. Define an equivalence over histories: h1 ≡ h2 iff for all w ∈ Σ*, π(h1w) and π(h2w) prescribe the same continuation of actions. Let Q = Σ*/≡ be the set of equivalence classes and q0 = [ε]. Define the transition function δ([h], o) = [ho] and the output function λ([h], o) as the action sequence that π prescribes upon observing o after history h. By construction, for any history h and next symbol o, the action sequence prescribed by π equals the output emitted by the resulting Mealy automaton.
This makes states and progression conditions explicit while keeping task generation independent of transitions, and supports formal checks and direct compilation to instrument-executable workflows.
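A minimal sketch of such a per-sample Mealy automaton in Python (the class, state names, and observation symbols are illustrative assumptions, not GEMS's API):

```python
# Minimal Mealy automaton: each step consumes one observation symbol and
# emits a finite action sequence; the state summarises the whole history.
class MealyAutomaton:
    def __init__(self, initial, delta, lam):
        self.state = initial
        self.delta = delta   # (state, observation) -> next state
        self.lam = lam       # (state, observation) -> emitted action sequence

    def step(self, obs):
        actions = self.lam[(self.state, obs)]
        self.state = self.delta[(self.state, obs)]
        return actions

# Hypothetical two-state culture protocol: keep imaging until the density
# observation reads "high", then passage and return to culturing.
delta = {("culture", "low"): "culture", ("culture", "high"): "passage",
         ("passage", "done"): "culture"}
lam = {("culture", "low"): ["image"], ("culture", "high"): ["image"],
       ("passage", "done"): ["passage", "exchange_medium"]}

m = MealyAutomaton("culture", delta, lam)
history = [m.step(o) for o in ["low", "low", "high", "done"]]
# history == [["image"], ["image"], ["image"], ["passage", "exchange_medium"]]
```

Because the transition table (delta) and the output table (lam) are separate, progression rules can be verified or edited without touching task emission, mirroring the g/f separation above.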
The structure of GEMS comprises:
• Lab: the top-level collection. Each experiment runs independently and specifies a directory for saving its results.
• Experiment: the digital representation of a protocol. Internally, this is a DFA composed of a set of states, a transition function (edges), and observations that record all outcomes (Fig. 2A).
• Machines: physical instruments available in the laboratory. Each record stores a textual description and the associated machine type defining the operations that the instrument can perform.
• State: a stage of the experiment. Each state contains (i) a task function that generates operations for that stage, and (ii) a transition function that determines the next state (Fig. 2A).
• Task: a data structure specifying the operation to be performed, its optimal start time, a penalty function for quantifying the cost of deviation from the optimal start time, and the acceptable machine type (Fig. 2B). In GEMS, all task start times are expressed on a user-defined integer timeline (relative time), not wall-clock timestamps.
• Task group: an ordered sequence of tasks with fixed intervals. Once the start time of the first task is determined, the remaining tasks are scheduled automatically.
• Task function (user-defined in Python): a function that receives the observations and emits the operations required in the current state (Fig. 2A).
• Transition function (user-defined in Python): a function that receives the observations and returns the identifier of the next state (Fig. 2A).
• Penalty function: a numerical cost function evaluating the deviation between the optimal and scheduled times.
• File-based user interface: instead of an interactive graphical user interface, GEMS relies on the file system. At start-up, the plugin manager creates the directories experimental_setting and mode and polls them for changes. Whenever a plugin file or a command file is added or modified, the corresponding method is executed and the file is removed thereafter. This design allows both external drivers and human users to control the workflow using only a text editor.
• Dry-run simulation for capacity planning: the dry-run mode executes the DFA-driven transition function and task emission without sending commands to hardware. Users can vary the number of instruments per type and adjust penalty settings to obtain feasibility, predicted lateness and indicative utilisation before execution. This supports instrument-count selection and maintenance-window placement for larger deployments, while the per-experiment DFA remains unchanged.
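The file-based interface can be sketched as a polling dispatcher (a simplified illustration; `poll_commands`, the `.cmd` extension, and the handler names are assumptions, not the actual plugin manager):

```python
import os
import tempfile

# Sketch of a file-based command interface: each file dropped into the
# command directory names a method; the dispatcher runs the corresponding
# handler and deletes the file afterwards, so a text editor suffices.
def poll_commands(directory, handlers):
    executed = []
    for name in sorted(os.listdir(directory)):
        stem = os.path.splitext(name)[0]
        if stem in handlers:
            handlers[stem]()                           # run the method
            os.remove(os.path.join(directory, name))   # consume the file
            executed.append(stem)
    return executed

log = []
handlers = {"pause": lambda: log.append("paused"),
            "resume": lambda: log.append("resumed")}

with tempfile.TemporaryDirectory() as d:
    for cmd in ["pause", "resume"]:
        open(os.path.join(d, cmd + ".cmd"), "w").close()
    ran = poll_commands(d, handlers)
    leftover = os.listdir(d)
# ran == ["pause", "resume"]; leftover == []
```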
Fig. 2 schematically depicts the relationships between these elements: task contains a penalty function for cost calculation during scheduling, and an executable machine type. State contains a task function and a transition function. Experiment comprises a DFA of states and the observations. Lab manages multiple independent experiments and the available machines. Here, we use independence to mean independence of decision inputs: each experiment's transition and task functions consult only its own observations and current state, not those of other experiments.
Fig. 3 outlines the execution flow: state updates in the transition function, task updates in the task function, and schedule updates in the task scheduler (Fig. 2B).
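One iteration of that flow can be sketched as follows (the dictionary layout and function names are assumptions for illustration, not the GEMS API):

```python
# Illustrative engine step, in the order of Fig. 3:
# 1) the transition function picks the next state from the observations,
# 2) the task function of the new state emits tasks,
# 3) the scheduler assigns start times to the emitted tasks.
def run_step(experiment, observations, scheduler):
    state = experiment["states"][experiment["current"]]
    next_id = state["transition_fn"](observations)              # state update
    experiment["current"] = next_id
    tasks = experiment["states"][next_id]["task_fn"](observations)  # task update
    return scheduler(tasks)                                     # schedule update

experiment = {
    "current": "measure",
    "states": {
        "measure": {"transition_fn": lambda obs: "analyse" if obs else "measure",
                    "task_fn": lambda obs: ["image_plate"]},
        "analyse": {"transition_fn": lambda obs: "measure",
                    "task_fn": lambda obs: ["compute_rgb"]},
    },
}
scheduled = run_step(experiment, observations=[{"well": "A1"}],
                     scheduler=lambda tasks: [(t, 0) for t in tasks])
# scheduled == [("compute_rgb", 0)]; experiment["current"] == "analyse"
```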
• NonePenalty: returns a constant value of 0; suitable when the timing of the task is completely flexible.
• LinearPenalty: takes a single parameter, penalty_coefficient c. The penalty is Penalty = |tscheduled − toptimal| × c.
• LinearWithRangePenalty: parameters lower, lower_coefficient, upper, and upper_coefficient. Let Δ = tscheduled − toptimal. The penalty is zero while Δ lies within [lower, upper] and grows linearly outside this window, with slope lower_coefficient below it and upper_coefficient above it.
• CyclicalRestPenalty: parameters cycle_start_time, cycle_duration, and rest_time_ranges. With t = (tscheduled − cycle_start_time) mod cycle_duration, scheduling is penalised whenever t falls within one of the rest ranges. The helper function adjust_time_candidate_to_rest_range shifts a candidate time forward to the next active period.
• CyclicalRestPenaltyWithLinear: extends CyclicalRestPenalty by adding a linear term when outside rest ranges.
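These penalty shapes can be sketched as plain functions (the signatures are assumptions, and treating rest ranges as infinitely costly is an illustrative modelling choice, not a statement about GEMS internals):

```python
def linear_penalty(t_sched, t_opt, c):
    # Penalty grows linearly with the absolute deviation from the optimum.
    return abs(t_sched - t_opt) * c

def linear_with_range_penalty(t_sched, t_opt, lower, lower_c, upper, upper_c):
    # Zero inside the tolerated window [lower, upper] around the optimum,
    # linear with separate slopes outside it.
    delta = t_sched - t_opt
    if delta < lower:
        return (lower - delta) * lower_c   # too early
    if delta > upper:
        return (delta - upper) * upper_c   # too late
    return 0

def cyclical_rest_penalty(t_sched, cycle_start, cycle_duration, rest_ranges,
                          rest_cost=float("inf")):
    # Fold the candidate time into one cycle and forbid rest windows.
    t = (t_sched - cycle_start) % cycle_duration
    in_rest = any(a <= t < b for a, b in rest_ranges)
    return rest_cost if in_rest else 0

# Examples: 2 units late at coefficient 3; a nightly rest window 20..24.
print(linear_penalty(12, 10, 3))                       # 6
print(linear_with_range_penalty(12, 10, -1, 5, 1, 2))  # 2
print(cyclical_rest_penalty(21, 0, 24, [(20, 24)]))    # inf
```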
• Linear protocols (Fig. 1D, left): each operation is represented by a distinct state and the sample progresses through these states exactly once in a fixed order, with a well-defined terminal state.
• Terminatable protocols with a loop (Fig. 1D, middle): transitions may return to a previous state, so that the sample can revisit a subset of states multiple times before eventually reaching a terminal state, as in protocols with conditional repeats.
• Open-ended protocols without a terminal state (Fig. 1D, right): the state graph contains no designated terminal state, allowing continuous or indefinitely repeated procedures such as long-term monitoring.
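The three topologies differ only in their transition functions, which can be sketched as follows (state names and the density threshold are hypothetical):

```python
def linear_transition(state, obs):
    # Linear protocol: visit each state exactly once, then terminate.
    order = ["dispense", "image", "score", "end"]
    return None if state == "end" else order[order.index(state) + 1]

def looping_transition(state, obs):
    # Terminatable loop: revisit "culture" until the observed density
    # reaches the target, then move to the terminal "harvest" state.
    if state == "culture":
        return "harvest" if obs["density"] >= 0.8 else "culture"
    return None  # "harvest" is terminal

def open_ended_transition(state, obs):
    # Open-ended monitoring: alternate indefinitely, no terminal state.
    return {"image": "wait", "wait": "image"}[state]

print(linear_transition("image", {}))                   # "score"
print(looping_transition("culture", {"density": 0.5}))  # "culture"
print(open_ended_transition("wait", {}))                # "image"
```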
Formally, let G denote the set of task groups and M the set of machines. Each task j inside a task group g is described by its processing time pg,j, the interval dg,j from the previous task, and the required machine type µg,j.
The scheduling procedure consists of two steps:
1. A baseline greedy scheduler that produces an initial feasible schedule for each task group to avoid machine conflicts.
2. A simulated annealing (SA) refinement that adjusts start times to improve the schedule while preserving the task order within each group.
Within a group, once the group start time tg is fixed, the task start times follow as sg,1 = tg and sg,j = sg,j−1 + pg,j−1 + dg,j (j > 1). The greedy scheduler places each group near t*g, where t*g is the user-specified optimal start time for group g. For each group, iteratively try integer offsets Δ ∈ {0, 1, −1, 2, −2, …} until a conflict-free placement is found, and assign the group's start time as tg = t*g + Δ. Internally, times are handled relative to a reference time tref (the scheduler operates on tg − tref) and results are then mapped back by adding tref; wall-clock conversion is deferred to the display/dispatch stage. A candidate solution is the set of group start times S = {tg | g ∈ G}.
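The greedy baseline can be sketched under simplifying assumptions (a single shared machine and integer time slots; GEMS additionally matches machine types µg,j, so this is an illustration, not the implementation):

```python
def offsets(limit=50):
    # Candidate deviations from the optimal start: 0, 1, -1, 2, -2, ...
    yield 0
    for k in range(1, limit):
        yield k
        yield -k

def group_times(t, tasks):
    # tasks: list of (processing_time p, interval-from-previous d).
    # s1 = t; sj = s(j-1) + p(j-1) + dj for j > 1.
    times = [t]
    for j in range(1, len(tasks)):
        times.append(times[-1] + tasks[j - 1][0] + tasks[j][1])
    return times

def greedy_schedule(groups, optimal):
    busy, assigned = set(), {}
    for g, tasks in groups.items():
        for delta in offsets():
            times = group_times(optimal[g] + delta, tasks)
            slots = {u for t, (p, _) in zip(times, tasks)
                     for u in range(t, t + p)}
            if not slots & busy:            # conflict-free placement found
                busy |= slots
                assigned[g] = optimal[g] + delta
                break
    return assigned

# Two single-task groups (processing time 2) both requesting t = 0:
print(greedy_schedule({"A": [(2, 0)], "B": [(2, 0)]}, {"A": 0, "B": 0}))
# -> {'A': 0, 'B': 2}
```

Group A occupies slots {0, 1}; group B fails at offsets 0, +1, and −1 and lands at +2, illustrating the symmetric search around each optimum.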
During SA refinement, a neighbouring solution is generated by perturbing the start time of a randomly chosen group, t′g = tg + δ, and then wrapping the perturbed time back into the feasible range.

• Basic clear solution: 0.50 g sodium bicarbonate in 50 mL water.
• Bromothymol blue: 0.04% (w/v).
The initial volume ratio of the three stock solutions (acidic red solution, basic clear solution, and bromothymol blue) was set to 0.2 : 0.3 : 0.5, and a target RGB value was defined. Batch Bayesian optimisation (q-Expected Improvement policy) was carried out with BoTorch34,35 over three rounds. In each round, eight candidate mixing ratios (conditions) were generated, and each condition was tested in four adjacent wells (four technical replicates), occupying a total of 32 wells on a 96-well plate (Fig. 4E). Dispensing was performed on an Opentrons OT-2 robot (Opentrons Labworks Inc.) in descending order of component ratio to minimise mixing variance; the final volume in each well was 100 µL. Python scripts in the Opentrons Protocol API format were auto-generated by the OT-2 driver and executed on the instrument.
After dispensing, a Logitech c920n webcam (Logitech c920n; Logitech International S.A.) captured images of the plate. Images were processed on a 14-inch MacBook Pro (Apple, California, US): each well region was cropped, and the average RGB value was computed. A white sheet placed beneath the plate minimised background variation. A target mixture was prepared in all 96 wells at the same ratio, and the RGB values of each well were measured. To reduce variation due to the imaging environment, for each cropped well the Manhattan distance between its measured RGB and the RGB of the corresponding target well was calculated. This Manhattan distance was used as the objective for the next optimisation round. GEMS defined distinct states for mixing, imaging, and evaluation, linked by one-to-one transitions.
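The per-well objective can be sketched as follows (the RGB values are made up for illustration; function names are not from the paper's code):

```python
def manhattan_rgb(measured, target):
    # Manhattan (L1) distance between two RGB triples.
    return sum(abs(m - t) for m, t in zip(measured, target))

def condition_score(measured_wells, target_wells):
    # Average the per-well distances over the four technical replicates
    # of one condition, using each well's matching target well.
    d = [manhattan_rgb(m, t) for m, t in zip(measured_wells, target_wells)]
    return sum(d) / len(d)

measured = [(120, 80, 60), (118, 82, 61), (122, 79, 58), (121, 81, 60)]
target = [(110, 85, 65)] * 4
print(condition_score(measured, target))  # 20.0
```

Comparing each cropped well against its positionally matched target well, rather than a single global target value, is what cancels imaging-environment variation across the plate.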
Prior to robotic culture, cells were prepared manually: they were seeded at the A1 well position in six-well plates, washed once with phosphate-buffered saline (PBS) (10010023; Thermo Fisher Scientific; Lot: 2412443), detached with 0.05% trypsin (25300054; Thermo Fisher Scientific; Lot: 2713076) by gentle pipetting after 2 min at room temperature, and replated at 1 × 105 and 2 × 105 cells per well. The plates were then transferred to the CO2 incubator in the LabDroid booth unit.
300–45,000 cells were seeded per well in 1.5 mL medium.
400 cells per 20 mL medium in eight 50 mL tubes.
Meanwhile, eight new six-well plates were coated with iMatrix-511 at 0.5 µg cm−2 (Lot: 24B215) as described above, incubated for at least 60 min at 37 °C with 5% CO2, and prepared for plating the cell suspensions.
Two cell lines with different characteristics—hiPSCs and HEK293A cells—were used to evaluate the loop structure.
Five hiPSC lineages were started initially; four days later, five HEK lineages were added. All operations were performed by a LabDroid Maholo (Robotic Biology Institute Inc., Japan), using the same setup as described by Ochiai et al.40 Schedules were rebuilt dynamically in GEMS whenever new HEK lineages were introduced or removed. Two hiPSC plates were lost owing to device failure; the corresponding series were deleted from the observations, and the remaining schedule was updated automatically.
1. Data grouping—each experimental result was assigned a passage index gj (passage_group). The initial density n0,i for passage i was set to the observation result in that passage.
2. Combined model—
3. Weighted least squares—The weight of observation j was wj = 2−gj. SciPy's curve_fit function was called with:
4. Time-to-target—Given a target density Nt, the time for passage i was calculated as
The helper function calculate_optimal_time_from_df applied this formula to the latest measurements.
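Steps 2–4 can be sketched as follows, assuming for illustration a simple exponential growth model n(t) = n0,i·exp(k·t) with a shared rate k (the paper's combined model is not reproduced here, and these function names are hypothetical):

```python
import math

def fit_rate_weighted(points):
    """Weighted least squares for the shared rate k from
    (t, n, n0, passage_group) tuples, log-linearised as
    log(n / n0) = k * t with weights w = 2**(-passage_group)."""
    num = den = 0.0
    for t, n, n0, group in points:
        w = 2.0 ** (-group)       # the paper's weighting w_j = 2^{-g_j}
        y = math.log(n / n0)
        num += w * t * y
        den += w * t * t
    return num / den

def time_to_target(n0, k, n_target):
    # Time until density reaches n_target under the assumed model.
    return math.log(n_target / n0) / k

k = fit_rate_weighted([(1.0, math.e ** 0.5, 1.0, 0),
                       (2.0, math.e ** 1.0, 1.0, 0)])
print(round(k, 3))                            # 0.5
print(round(time_to_target(0.1, k, 0.8), 3))  # ln(8)/0.5 ≈ 4.159
```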
Cell density was estimated following the method described by Ochiai et al.40 Multiple microscope images were stitched, the well region was selected, and cell areas Ac were extracted using the Canny algorithm via the Cell Density Calculator. Using the pixel-count function f, density C was defined as the ratio of the cell-area pixel count to the well-area pixel count.
The best ratio was obtained in Round 3, with a distance of 61.62—a 30% improvement over the best condition (88.13) in the random initial round (Fig. 4D–F). By Round 2, the distance had already been reduced to 72.13. Each optimisation cycle—from proposal generation to execution, image processing, scoring, and scheduling of the next protocol—was completed within 60 min. In total, 96 mixtures (36 unique compositions) were tested, and all steps were executed deterministically under version control. These results demonstrate that integrating GEMS with a Bayesian optimiser enables efficient, fully automated optimisation experiments under routine laboratory conditions.
The experiment timeline is summarised in Fig. 5D. Fig. SF1 (Table ST2) provides an experiment-level verification of the orchestration by visualising the per-lineage state transitions interpreted by GEMS; detailed execution timelines are provided in SI Video S1 (operation-level Gantt) and SI Video S2 (experiment-level Gantt), with the underlying request, scheduled, and actual start timestamps summarised in SI Table ST1. SI Video S3 provides a step-by-step visualisation of the state-transition history. In this run, we deliberately challenged the system by adding new experiments while it was already managing ongoing cultures. We initiated five hiPSC series on Day 0 (14 Nov 2024). While the system was actively controlling these, we introduced a second set by adding five HEK293A series on Day 4 (18 Nov 2024) through the instantiation of new experiment objects. Later in the run, two hiPSC series were lost owing to unrelated incubator failures on 1 Dec and 5 Dec. After removing the affected series, GEMS automatically recomputed the schedule, reallocated machine time, and continued all remaining experiments without interruption. These mid-run additions and removals implicitly stress-tested scalability. GEMS re-optimised schedules on the fly as instrument availability and workload shifted, without modifying the per-experiment DFA.
Over the 36-day run, the system executed ten HEK passages, five HEK samplings, nine hiPSC passages, and three hiPSC samplings (Fig. 5E). At HEK passage, the mean density was 0.82 ± 0.04 (target 0.80); at HEK sampling, 0.39 ± 0.11 (target 0.40). For hiPSCs, the mean density was 0.32 ± 0.06 at passage (target 0.30) and 0.33 ± 0.03 at sampling (target 0.30). Despite differences in growth rates and handling schedules, both cell types consistently reached their targets with minimal overshoot. These results demonstrate that GEMS can autonomously maintain multiple mammalian cell cultures at user-defined density targets, dynamically reschedule tasks in response to the introduction of new experiments as well as unexpected failures, and support robust long-term biological experiments without human intervention.
Tangible gains of adopting this sample-centred representation are threefold. First, explicit per-sample state memory and progression rules (Mealy automaton) allow the next action to be selected without regenerating global task lists or re-authoring workflows. Second, timing is decoupled from control logic via a penalty-aware scheduler, enabling on-the-fly rescheduling, maintenance windows, and capacity changes without altering the state–transition diagram. Third, operational robustness improves: mid-run insertion or removal of experiments and recovery from failures are handled by updating the active set, after which GEMS automatically rebuilds the schedule (as evidenced in the parallel HEK/hiPSC run, Fig. 5D–E). Throughout, observations maintain a consistent record of data and metadata, preserving downstream analyses as protocols evolve.
The RGB optimisation of a three-component liquid mixture confirmed that GEMS functions effectively as a framework for sequential parameter search. Ratios proposed by batch Bayesian optimisation were passed through a task function directly into OT-2 Python protocols, and the metrics required for the next round were collected automatically. Because the experimental operation, imaging, and analysis stages are separated into individual tasks, changes to the optimisation algorithm or the evaluation function do not require modification of the state-transition diagram. Even when the number of loops or samples increases, the observations preserve data and metadata in a consistent format, ensuring that downstream analyses remain coherent.
In the cell-culture study, hiPSCs and HEK293A cells—differing in growth rate and handling frequency—were managed simultaneously on the same system across multiple lines and passages. Images captured by a LabDroid Maholo were used to update growth curves in real time, and GEMS automatically determined passaging, medium exchange, and sampling times based on the predicted density. Notably, GEMS was able to incorporate additional experiments during an ongoing run—adding a second set of cultures mid-schedule—and re-optimised the entire task allocation without disrupting current operations. When a subset of lines was lost owing to hardware failure, removing the corresponding experiment was sufficient for GEMS to rebuild the remaining schedule and continue the other experiments without interruption. These results highlight that DFA-based management can simplify recovery tasks and enable dynamic adaptation in laboratories where mid-run changes and unexpected sample losses are unavoidable.
Comparison of the RGB and cell-culture case studies reveals two extremes: the former involves a short, deterministic sequence of operations, whereas the latter contains continuous branches and loops triggered by measurements. The fact that a single software framework supports both indicates that the abstraction level of GEMS is broad enough to accommodate diverse sample types and objectives. Because a task function is loosely coupled to its device driver, replacing or adding equipment requires only minimal edits, potentially reducing refurbishment costs. Likewise, optimisation algorithms and image-analysis pipelines, implemented in Python, can be swapped rapidly—an advantage for laboratories that iterate quickly.
Several limitations remain. First, writing a transition function requires domain knowledge, and complex conditions may be challenging for novice users to encode; the development of a graphical editor for specifying state transitions would therefore be valuable. Second, the current DFA model does not natively support workflows in which a single experimental sample splits into multiple experimental samples. Although this can be circumvented by registering multiple experiments with copied observations, a generalised implementation is still needed.
In summary, GEMS provides a compact abstraction layer for managing dynamic, cross-device experiments in a sample-centred, state-to-state paradigm. Although its practicality was demonstrated in two bench-scale case studies, the deterministic state description can be scaled seamlessly to facility-level operations. Because experiment definitions are decoupled from device information, each can be updated independently, enabling new instruments or assays to be integrated without downtime. This capability aligns with the needs of cloud laboratories that already operate hundreds of instruments, as well as future laboratory automation facilities.8,9 Although this work targets representational unification rather than algorithmic speed-ups, the design scales in three complementary ways. Control-logic scale keeps protocol decisions local to each experiment through a Mealy automaton, so decision-making is independent of instrument count. Resource scale expands capacity by adding instruments of the required types; the penalty-aware scheduler allocates tasks while the per-experiment logic remains unchanged. Planning scale is supported by a dry-run mode that allows users to vary instrument counts and penalty settings to estimate lateness and utilisation before execution (see Materials, Dry-run simulation for capacity planning). Together with the dynamic rescheduling seen in the parallel HEK/hiPSC run (Fig. 5D–E), these features support progression from bench to facility level. Looking ahead, integration with a graphical state editor and high-performance schedulers could further reduce the barrier to reliable, large-scale automation.
Supplementary information (SI): Fig. SF1, Videos S1–S3, Tables ST1 and ST2, and accompanying captions are available at https://doi.org/10.5281/zenodo.18102950. See DOI: https://doi.org/10.1039/d5dd00409h.
This journal is © The Royal Society of Chemistry 2026