Carlos Floyd,*ab Aaron R. Dinner ab and Suriyanarayanan Vaikuntanathan*ab
aDepartment of Chemistry, The University of Chicago, Chicago, Illinois 60637, USA. E-mail: csfloyd@uchicago.edu; svaikunt@uchicago.edu
bThe James Franck Institute, The University of Chicago, Chicago, Illinois 60637, USA
First published on 15th May 2025
Active nematics are paradigmatic active matter systems which generate micron-scale patterns and flows. Recent advances in optical control over molecular motors now allow experimenters to control the non-equilibrium activity field in space and time and, in turn, the patterns and flows. However, engineering effective activity protocols remains challenging due to the complex dynamics. Here, we explore a model-free approach for controlling active nematic fields using reinforcement learning. Combining machine learning with trial-and-error exploration of the system dynamics, reinforcement learning bypasses the need for accurate parameterization and model representation of the active nematic. We apply this technique to demonstrate how local activity fields can induce effective interactions between nematic defects, enabling them to follow designer dynamical laws. Moreover, the sufficiency of our low-dimensional system observables and actions suggests that coarse projections of the active nematic field can be used for precise feedback control, making experimental or biological implementation of such feedback loops plausible.
Recent works have focused on guiding active nematic flows through the external control of spatiotemporally dynamic activity fields α(r,t). Experimental advances in optical actuation of molecular motors using light fields motivate studying activity fields as externally controlled functions.18–26 Light-based control has also recently been developed over Ca2+-powered assembly and contraction of proteins found in protists27,28 and over cytoskeletal network growth.29 In simulations, optimal time-dependent activity fields α(r,t) can be derived using knowledge of the nematohydrodynamic equations of motion to guide nematic and polar defects along desired paths.30–33 Other techniques allow targeted modulation of nematic channel flow and design of localized “topological tweezers” for precise defect manipulation.34–36
A key difficulty in implementing the above-mentioned control methods is their reliance on accurate system models and parameterization. Here, we explore controlling active nematics using a model-free machine learning technique called reinforcement learning (RL).37,38 This approach allows a program to develop a closed-loop control policy for a dynamical system purely through trial and error, without relying on precise model specifications.
To our knowledge RL has not yet been applied to control active nematics, although previous studies used it to control other active matter systems such as flockers, driven colloids, and branched actin networks.39–41 These studies focus on design tasks that aim to achieve a specific static property of the system, such as target net flocking motion or average cluster size. In contrast, we study control tasks that guide active nematic defects to follow prescribed virtual interactions with one another (see Fig. 1).42,43 Our goal is to learn activity field protocols that can effectively override the natural defect dynamics (such as passive Coulomb-like attraction) to impose a user-specified interaction law, a more general design objective termed “cyberphysics” in ref. 38. Additionally, we show that efficient protocols can be learned even when feedback is limited to imperfect information such as a coarse projection of the full nematic field configuration.
Fig. 1 Schematic overview of the closed-loop control over active nematic defect dynamics enabled by a trained RL policy. One step of the feedback loop is depicted.
![]() | (1) |
![]() | (2) |
![]() | (3) |
Although defects in an active nematic fluid evolve under complex hydrodynamics (see eqn (10) and (11) below), theoretical work has shown that their motion can be approximated using simpler one- and two-body dynamical equations.35,45,46 The positions p± and orientations ϕ± approximately obey differential equations which are coupled to each other through elastic interactions, and they are also coupled to the local activity profile α(r) and its gradients. These equations are derived from the hydrodynamic Stokes flow of the active nematic fluid and involve numerical prefactors such as the effective defect size, friction coefficient, elasticity constant, shear viscosity, and others. As predicted by these equations, in the absence of activity (i.e., for passive nematics) a pair of oppositely-charged defects will attract each other without rotating in a Coulomb-like interaction until they annihilate, preserving the total topological charge. This default behavior is illustrated in Fig. 3. These effective dynamical equations enable the design of precise “activity tweezers” that manipulate defect dynamics, but this technique is complicated by the difficulty of estimating the model parameters involved.35,47 Here, we simply use the fact that defect positions and orientations couple in some way to activity gradients to find tweezer-like activity protocols purely through exploration (i.e., RL). This approach is thus agnostic to the underlying model parameters and dynamics.
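As a schematic caricature of this default behavior (and not of the full nematohydrodynamics or the effective equations of ref. 35, 45 and 46), the sketch below integrates a two-body Coulomb-like attraction between a ±1/2 defect pair; the mobility `mu` and the core radius `r_core` at which the pair is considered annihilated are arbitrary illustrative values rather than the model-derived prefactors discussed above.

```julia
# Illustrative two-body caricature of passive defect relaxation: oppositely charged
# defects drift toward each other with speed ~ 1/separation and annihilate when their
# cores overlap. The mobility mu and core radius r_core are arbitrary stand-ins.
function passive_pair_trajectory(p_plus, p_minus; mu = 1.0, r_core = 1.0, dt = 0.01, nsteps = 100_000)
    traj = [(copy(p_plus), copy(p_minus))]
    for _ in 1:nsteps
        r = p_minus .- p_plus
        d = hypot(r[1], r[2])
        d < r_core && break                 # pair annihilates, conserving total charge
        f = mu .* r ./ d^2                  # Coulomb-like attraction, |f| ~ 1/d
        p_plus  = p_plus  .+ dt .* f        # defects approach each other without rotating
        p_minus = p_minus .- dt .* f
        push!(traj, (copy(p_plus), copy(p_minus)))
    end
    return traj
end

traj = passive_pair_trajectory([0.0, 0.0], [30.0, 0.0])
length(traj)    # number of integration steps until annihilation
```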
For every task we run several experiments using different task parameters or different seeds for the pseudo-random number generator, and the outcome of each experiment is a trained policy function πθ(S). An experiment is divided into episodes, each of which begins by resetting the active nematic field using some distribution of initial conditions (Fig. 4). Every episode consists of a fixed number of steps, Nsteps, although an episode may terminate before this number of steps if the number of defects changes through annihilation or creation. At every step, the RL agent views the state S and chooses a new action A, corresponding to a given configuration of the activity field α(r,t), which is applied for the duration of the step. A step consists of Ndt iterations of the numerical integrator. The fixed time interval which elapses during one step is thus Ndtdt where dt is the time resolution of the integrator; in simulation units we have dt = 1. The experiment ends when a number of episodes, Neps, is reached or when the wall time of the program exceeds a specified value. Throughout this paper we set Ndt = 50, Nsteps = 75, and we run experiments for 3 hours, corresponding to roughly Neps = 150 episodes per experiment. The choice to fix wall time rather than Neps is arbitrary and made simply for convenience.
Fig. 4 Schematic illustration of the structure of an experiment, episode, and step as used in our RL training program.
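To make the nesting of experiments, episodes, and steps concrete, the following minimal, runnable sketch reproduces the loop structure described above. The "system" and the helper functions (reset_nematic!, observe_state, select_action, apply_activity!, step_integrator!, compute_reward, terminated) are toy stand-ins introduced only for this illustration; in the actual program these calls go to the nematic solver and the DDPG agent.

```julia
# Toy stand-in for the active nematic: only the defect separation and defect count.
mutable struct ToySystem
    rsep::Float64
    ndefects::Int
end

# Placeholder stubs (hypothetical names) marking where the real solver/agent plug in.
reset_nematic!()              = ToySystem(30.0 + 40.0 * rand(), 2)
observe_state(sys::ToySystem) = (sys.rsep - 55.0) / 50.0          # low-dimensional state
select_action(S)              = clamp(-S + 0.05 * randn(), -1.0, 1.0)
apply_activity!(sys, A)       = nothing                           # would set alpha(r, t)
step_integrator!(sys, A, dt)  = (sys.rsep += 0.02 * A * dt)       # stand-in dynamics
compute_reward(sys)           = -abs(sys.rsep - 55.0)
terminated(sys)               = sys.ndefects != 2                 # annihilation/creation

function run_toy_experiment(; Ndt = 50, Nsteps = 75, dt = 1.0,
                              wall_limit = 3 * 3600.0, max_episodes = 150)
    t_start, episode = time(), 0
    while time() - t_start < wall_limit && episode < max_episodes
        episode += 1
        sys = reset_nematic!()                   # new episode: re-drawn initial condition
        for s in 1:Nsteps
            S = observe_state(sys)
            A = select_action(S)
            apply_activity!(sys, A)              # activity held fixed during the step
            for _ in 1:Ndt
                step_integrator!(sys, A, dt)     # Ndt * dt time units elapse per RL step
            end
            R = compute_reward(sys)              # the agent stores (S, A, R, T) here
            terminated(sys) && break             # end episode early if defects change
        end
    end
    return episode
end

run_toy_experiment()
```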
The main actor and critic networks are updated using stochastic gradient descent (SGD) with mini-batches of Nbatch = 32 samples from a replay buffer, which stores tuples (S(s), A(s), R(s), T(s)) (state, action, reward, and termination flag) collected at every step s. For each experiment, an initial random policy is used for 10Nbatch steps to populate the replay buffer, after which the main actor network μθ(S) is used for action selection, with SGD updates performed every 5 steps. How the mini-batches are used to update the network parameters is described in ref. 48 and 49. All neural networks have two hidden layers with 32 neurons each, and they are trained using the ADAM optimizer with a learning rate of 0.001 and a weight norm clip of 1.0. The discount factor is γ = 0.99 (see ref. 48 and 49 for a definition) and the weight transfer factor is ρ = 0.995. These parameter values were chosen to be consistent with previous implementations of the DDPG method.48,49
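The two bookkeeping ingredients quoted above, uniform mini-batch sampling from the replay buffer and the soft "weight transfer" update of the target networks with ρ = 0.995, can be sketched independently of any deep-learning package as follows. The Transition container and the raw-array "networks" are illustrative only; the actual actor and critic gradient steps follow ref. 48 and 49.

```julia
# Replay-buffer sampling and Polyak target update, sketched with plain arrays.
struct Transition
    S::Vector{Float64}   # state
    A::Vector{Float64}   # action
    R::Float64           # reward
    T::Bool              # termination flag
end

# Uniform sampling of Nbatch = 32 transitions from the buffer.
sample_minibatch(buffer::Vector{Transition}, Nbatch::Int = 32) =
    buffer[rand(1:length(buffer), Nbatch)]

# Soft update theta_target <- rho * theta_target + (1 - rho) * theta_main, parameter-wise.
function polyak_update!(target_params, main_params; rho = 0.995)
    for (pt, pm) in zip(target_params, main_params)
        @. pt = rho * pt + (1 - rho) * pm
    end
    return target_params
end

# Toy usage: a buffer of 10 * Nbatch random transitions and a tiny two-layer "network".
buffer = [Transition(randn(1), randn(1), randn(), false) for _ in 1:10 * 32]
main   = [randn(32, 1), randn(32), randn(1, 32), randn(1)]
target = deepcopy(main)
batch  = sample_minibatch(buffer)
polyak_update!(target, main)
```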
The actions are chosen as A = πθ(S) + ε, where ε ∼ 𝒩(0, 0.05) is a small Gaussian noise term added to the policy output to encourage exploration of the available actions. The resulting action A is then clipped to the range [−1,1]. State values provided as inputs to the network are normalized to lie approximately within the same range. For each task, the action values are linearly mapped from this range to the corresponding scaled parameters of the activity field α(r,t), such that −1 maps to the lowest scaled values and 1 maps to the largest.
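A minimal sketch of this action post-processing (noise injection, clipping, and the linear map onto a physical parameter range) is given below; the helper names are ours, and the range [−5, 5] in the usage lines corresponds to the task-1 activity amplitude described later.

```julia
# Exploration noise, clipping to [-1, 1], and the linear map onto a physical range.
# sigma = 0.05 is the noise scale quoted above; helper names are illustrative.
noisy_action(policy_output; sigma = 0.05) =
    clamp.(policy_output .+ sigma .* randn(length(policy_output)), -1.0, 1.0)

# Map a clipped action component from [-1, 1] onto the task-specific range [lo, hi].
to_physical(a, lo, hi) = lo + (a + 1.0) / 2.0 * (hi - lo)

A      = noisy_action([0.3])            # a one-component action
alpha0 = to_physical(A[1], -5.0, 5.0)   # e.g. an activity amplitude in [-5, 5]
```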
To numerically integrate the active nematic hydrodynamics equations in eqn (10), we use a custom Julia implementation of Heun's finite difference method. The implementation was previously described and validated in ref. 50–52. We parameterize the nematic system so that the unit of length is equal to the equilibrium nematic persistence length (see Appendix).53 To train the RL policy, we combine this numerical solver with an implementation of the DDPG algorithm provided by the ReinforcementLearning.jl package.54
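For reference, a generic Heun (explicit trapezoidal) update for an array-valued field is sketched below; this is a textbook illustration of the scheme named above, not the solver of ref. 50–52.

```julia
# Generic Heun (explicit trapezoidal) step for dQ/dt = f(Q) on an array-valued field.
function heun_step(Q, f, dt)
    k1 = f(Q)                          # slope at the current field (predictor)
    Qp = Q .+ dt .* k1                 # explicit Euler predictor
    k2 = f(Qp)                         # slope at the predicted field (corrector)
    return Q .+ (dt / 2) .* (k1 .+ k2) # trapezoidal combination
end

Q  = randn(64, 64)                     # toy field on a 64 x 64 grid
Q1 = heun_step(Q, q -> -q, 0.1)        # one step of the relaxational dynamics dQ/dt = -Q
```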
Fig. 5 Results for task 1: to reach a target horizontal separation l0. (A) Schematic illustration of the task, in which a nearly uniform disk of activity is applied to the center of the +1/2 defect. The amplitude of the activity is controlled as an action, and its radial profile for different amplitudes is shown on the right. The dashed green line denotes the position of the +1/2 defect in the activity field at the start of each step. (B) Trajectories of the activity amplitude (top) and resulting horizontal separation rsep (bottom) with a trained RL policy. The dashed line represents l0 = 55 for this task. (C) Snapshots of the active nematic field at the end of three different steps for the episode in panel (B). See Fig. 3B caption for a description of the visualization. Color here denotes the amplitude of the activity. The black dashed line represents the target separation l0 = 55. (D) Two trajectories of rsep with a trained RL policy for five values of l0, shown as different colors. The values of l0 are shown as dashed lines. (E) Fitted values of lfit0, plotted against their target values of l0. Standard deviations are shown as shaded areas (smaller than symbols).
The state used in the RL algorithm at step s is S(s) = (rsep(s) − l0)/50 (where 50 is a rough scale factor), and the action is the amplitude α0 of the activity profile scaled to the range [−5,5]. We note that allowing the activity to change sign is unphysical as a typical active nematic fluid has either only extensile (α > 0) or contractile (α < 0) force dipoles. We allow activity to change sign in this task as a first illustration of a simple control mechanism, and in the remaining tasks we constrain the sign of the activity and only vary its magnitude or position. The shape of the activity profile is an azimuthally symmetric function centered on p+ with radial dependence
![]() | (4) |
A typical episode for l0 = 55 is shown in Fig. 5B and C. The policy applies a strong negative activity field to the +1/2 defect, which propels it away from the −1/2 defect. Upon reaching the desired separation of 55 lattice units, the applied activity weakens in magnitude and, in a feedback control-like manner, nudges the +1/2 defect further backward whenever the −1/2 defect moves toward it due to the attractive elastic interaction between them. This maintains the separation at rsep = 55 for the remainder of the episode. Although in principle the activity could be applied when the separation is at the spring equilibrium, rsep = l0, to exactly balance the attractive force, the actual separation oscillates slightly around this point due to the discretization of the simulation domain. We also do not expect the algorithm to learn a numerically exact policy.
We train RL policies for a range of values of l0 and observe that in each case the algorithm converges to a policy in which the target separation is reached by the end of each episode (Fig. 5D). We fit the learned lfit0 by taking the average of rsep over the last 25 steps of each episode, over the last 40 episodes of each experiment, and over 10 experiments using different random initial seeds. The learned lfit0 matches the target l0 (Fig. 5E) in each case. Thus, RL can learn how to adjust the activity amplitude as a function of defect position to reach a target separation, for a simple physical set-up in which the defects are aligned and move primarily due to the activity-induced propulsion of the +1/2 defect.
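This nested averaging can be written compactly as below; the data layout rsep_data[experiment][episode] (a vector of rsep values over the steps of that episode) is an assumption made only for this illustration.

```julia
using Statistics

# Nested averaging used to estimate l0_fit: mean over the last 25 steps of each episode,
# then over the last 40 episodes of each experiment, then over the 10 experiments.
episode_mean(traj)        = mean(traj[end-24:end])                     # last 25 steps
experiment_mean(episodes) = mean(episode_mean.(episodes[end-39:end]))  # last 40 episodes
l0_fit(rsep_data)         = mean(experiment_mean.(rsep_data))          # all experiments

# Toy data: 10 experiments x 50 episodes x 75 steps, fluctuating around rsep = 55.
rsep_data = [[55.0 .+ 0.5 .* randn(75) for _ in 1:50] for _ in 1:10]
l0_fit(rsep_data)    # ≈ 55
```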
drsep/dt = −k0(rsep − l0), | (5) |
![]() | (6) |
![]() | (7) |
We train the RL program to achieve these target dynamics by leveraging a recently demonstrated modality of defect motion under activity gradients, rather than the modality of +1/2 defect propulsion under constant activity used in the previous task. −1/2 defects have been shown to couple to second-order and higher gradients in α(r),35 and, using this fact, we parameterize the activity field in the vicinity of the −1/2 defect using the profile of eqn (4) with shape parameters 1 and 5, which has non-zero gradients of all orders. We allow the RL program to adjust the amplitude α0 of the activity field (in the range [0,12]) and the horizontal offset r− between the −1/2 defect and the center of the activity field (in the range [−10,10]); see Fig. 2 and 6A.
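The two-component action for this task maps linearly onto the amplitude and offset ranges quoted above; a sketch (with our own helper names) is:

```julia
# Decode the two-dimensional action for task 2: component 1 sets the amplitude alpha0
# in [0, 12], component 2 the horizontal offset r_minus in [-10, 10] between the
# -1/2 defect and the center of the activity field.
scale(a, lo, hi) = lo + (clamp(a, -1.0, 1.0) + 1.0) / 2.0 * (hi - lo)

function decode_task2_action(A::AbstractVector)
    alpha0  = scale(A[1],   0.0, 12.0)   # activity amplitude
    r_minus = scale(A[2], -10.0, 10.0)   # horizontal offset of the field center
    return alpha0, r_minus
end

decode_task2_action([0.5, -0.2])         # -> (9.0, -2.0)
```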
Fig. 6 Results for task 2: for rsep to obey the dynamics of an overdamped spring with rest length l0 = 50 and varying stiffness k0. (A) Schematic illustration of the task, in which an inhomogeneous activity field is applied near the −1/2 defect. The position of the activity field and its amplitude are controlled as actions, and its radial profile for different amplitudes is shown on the right. (B) Top: Trajectories of the activity amplitude (black) and horizontal distance from the −1/2 defect (red) for a trained RL policy. Bottom: The resulting horizontal separation rsep trajectory. The thick dashed line on the bottom represents l0 = 50, the target stiffness is k0 = 0.0015, and the thin dashed line denotes an exponential fit with kfit0 = 0.00124. (C) Snapshots of the active nematic field at the end of three different steps for the episode in panel (B). See Fig. 3B caption for a description of the visualization. The black dashed line represents the target separation l0 = 50. (D) Top: A set of 20 trajectories of rsep with a trained RL policy for k0 = 0.001, with exponential fits shown as dashed lines. Bottom: Same as top, but with k0 = 0.004. (E) Fitted values kfit0 plotted against their target values of k0. Standard deviations are shown as shaded areas. For this plot we exclude episodes which randomly start within 2 of rsep = 50 to focus on trajectories in which rsep changes appreciably during the episode. See Video S1 (ESI†) for a movie of the trained RL algorithm performing this task.
We fix l0 = 50 in eqn (5) and vary k0, the overdamped spring constant. We show typical trajectories obtained with a trained policy for k0 = 0.0015 in Fig. 6B and C. The policy simultaneously adjusts the amplitude and offset of the activity profile relative to the −1/2 defect so that the defect separation rsep gradually approaches the “rest length” of l0 = 50. Viewing several episodes for policies trained on k0 = 0.001 and k0 = 0.004, we see that the deviation from the target decays roughly exponentially with a rate that depends on the target stiffness (Fig. 6D).
We consider a range of k0 values and average kfit0, which we obtain as the fitted rate of exponential approach of rsep(s) to l0, over the last 40 episodes of each experiment and over 10 experiments using different random initial seeds (Fig. 6E). We observe good agreement between k0 and kfit0, albeit with deviations at high values of k0. At these values the policy has difficulty pulling the defects faster than the nematic material timescales allow. Despite these limitations, the generally good agreement between k0 and kfit0 indicates that the RL program can utilize a range of physical effects (including coupling to gradients of activity) to pull defects and make them obey target dynamics rather than just target static configurations.
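One simple way to extract such a rate from a single trajectory is a linear least-squares fit of log|rsep − l0| against elapsed time; the recipe below is our illustration and may differ in detail from the fitting used for Fig. 6E.

```julia
# Estimate kfit0 from a trajectory rsep(s): under the target dynamics the deviation
# |rsep - l0| decays as exp(-k0 * t), so the slope of log|rsep - l0| vs. time is -kfit0.
function fit_k(rsep::AbstractVector, l0::Real, dt_step::Real)
    t = dt_step .* (0:length(rsep)-1)            # elapsed time at each RL step
    y = log.(abs.(rsep .- l0) .+ 1e-12)          # guard against log(0) at the target
    tbar, ybar = sum(t) / length(t), sum(y) / length(y)
    slope = sum((t .- tbar) .* (y .- ybar)) / sum((t .- tbar) .^ 2)
    return -slope
end

# Synthetic check: a trajectory decaying at k0 = 0.0015, with Ndt * dt = 50 time units
# elapsing per RL step, recovers kfit0 ≈ 0.0015.
rsep = 50.0 .+ 30.0 .* exp.(-0.0015 .* 50.0 .* (0:74))
fit_k(rsep, 50.0, 50.0)
```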
d sin(ϕ+)/dt = −kθ sin(ϕ+), | (8) |
Fig. 7 Results for task 3: for sin(ϕ+) to obey the dynamics of an overdamped spring with zero rest length and varying stiffness kθ. (A) Schematic illustration of the task, in which an inhomogeneous activity field is applied near the +1/2 defect. The negative angle ϕα is indicated. This angle is controlled as an action, and the activity field's radial profile is shown on the right. The dashed green line denotes the position of the +1/2 defect in the activity field at the start of each step. (B) Top: Trajectory of the angular position ϕα of the activity field for a trained RL policy. Bottom: The resulting trajectory for orientation of the +1/2 defect sin(ϕ+). The thick dashed line on the bottom represents sin(ϕ+) = 0, the target stiffness is kθ = 0.0007, and the thin dashed line denotes an exponential fit with kfitθ = 0.00067. (C) Snapshots of the active nematic field at the end of three different steps for the episode in panel (B). See Fig. 3B caption for a description of the visualization. The black dashed wedge represents the current orientation sin(ϕ+) which approaches the target value of 0. (D) Top: A set of 20 trajectories of sin(ϕ+) with a trained RL policy for kθ = 0.00025, with exponential fits shown as dashed lines. Bottom: Same as top, but with kθ = 0.0008. (E) Fitted values kfitθ plotted against the target value of kθ. For this plot we exclude episodes which randomly start within 0.05 of sin(ϕ+) = 0 to focus on trajectories in which sin(ϕ+) changes appreciably during the episode. See Video S2 (ESI†) for a movie of the trained RL algorithm performing this task.
We vary the stiffness kθ in eqn (8). We show typical trajectories obtained with a trained policy for kθ = 0.0007 in Fig. 7B and C. Under the policy, the angle ϕα fluctuates around a mean value so as to cause the +1/2 defect to rotate at approximately the desired exponential rate, kθ, toward ϕ+ = ζ+ = 0. In Fig. 7D we show several episodes for kθ = 0.00025 and kθ = 0.0007, indicating a clear difference in the learned decay rate of ζ+ toward 0. Fitting kfitθ for the last 40 episodes of each experiment and over 10 experiments with different random initial seeds, we observe good agreement with the target value of kθ (Fig. 7E). We thus see that RL can make both the positions and orientations of defects appear to follow dynamical laws that differ from those intrinsic to the system.
Our current implementation has several limitations that could be improved in future work. First, we employed a fairly restrictive parameterization of the controlled activity field, chosen to ensure a low-dimensional and easily trainable action space. This design supports the idea that simple spatiotemporal patterns can couple effectively to defect dynamics in active nematics. However, future work could refine this approach, for example, by using multi-agent control40,57 to explore higher-dimensional settings that might allow reconstruction of more spatially intricate control fields such as those shown to function as precise “topological tweezers”.35,36 Second, our model of active nematics includes several simplifying assumptions: we adopt a high-friction limit,35,58 assume negligible density variations,45 and omit detailed modeling of the dynamics of the activity-carrying agents, such as dispersal.59,60 These choices were made for computational efficiency, as the RL algorithm requires many trial iterations to train. Our algorithm converged in under three hours without heavy optimization, however, suggesting substantial room to scale up in future implementations. A key advantage of the RL framework is its agnosticism to the specific physical model, which means that, at least in principle, it can learn effective control strategies across a range of active fluid models. Finally, our reward computation could be improved. We observe small deviations from ideal spring-like behavior in cases with soft spring constants (near k0 = 0) or when the defects are near their equilibrium separation. In these regimes, the expected defect motions are minimal, so small erroneous motions are weakly penalized. Achieving more uniform adherence to spring-like behavior may require a reweighted reward function, and further refinements could enable more complex, multi-objective control tasks.
The scientific merit of demonstrating control over active nematics via RL is at least two-fold. First, as mentioned earlier, RL offers a practical method for engineering soft active matter systems with minimal reliance on accurate model specification, although it does present several technical challenges. When deploying RL to control a new system, careful attention must be given to parameterizing the control fields and designing the reward functions in a way that allows the algorithm to efficiently learn to solve the abstract optimization problem. Like other machine learning applications, this process involves fine-tuning various hyperparameters, such as learning rates and neural network architectures. Additionally, reliable state observations are important (although there exist stochastic variants of RL using uncertain state measurements), and accurately measuring nematic field configurations in experiments requires considerable technical attention.61,62 Despite these challenges, RL has a proven track record of solving highly complex problems, often surpassing human capabilities,63,64 making its application to controlling active matter systems quite promising.
Second, the ability of RL programs to successfully manipulate the dynamics of active nematic defects using very low-dimensional state and action spaces suggests that an effective low-dimensional description is sufficiently accurate to capture the defects' dynamics. This finding supports the theoretical arguments presented in ref. 35 and 45. Relatedly, in previous work we demonstrated that dynamic activity fields can be iteratively constructed without RL and without knowledge of the active nematic dynamics using physically motivated yet imperfect local feedback rules.65 Recent experiments have similarly demonstrated closed-loop control over active nematic flow speeds which matches predictions from a highly coarse-grained model.18 We have also demonstrated that low-dimensional RL may in principle be used to provide feedback control over other biomimetic mechanochemical protein networks such as the ultrafast Ca2+-powered contractile proteins found in certain protists.27,28 The sufficiency of imperfect, low-dimensional information to guide active nematics defects has implications for designing experimentally efficient feedback control over living systems and for understanding what information is sufficient to enable biologically plausible control over mechanochemical dynamics during biophysical processes like morphogenesis.7–15
![]() | (9) |
The tensor Q couples to a flow field v and spontaneously seeks to minimize its local free energy. We consider the overdamped limit and the limit of high substrate friction (as in ref. 35, 45, 58), yielding the following equations of motion:
∂tQij = Sij(v) + ΓHHij, | (10) |
vi = (1/γv)∂k(σaik(Q) + σEik(Q)). | (11) |
Here, Hij is the symmetric and traceless part of −δF/δQij, where F is the free energy functional (see below), Sij is a flow-coupling term, and γv is a coefficient of friction. In this overdamped limit, v is an instantaneous function of Q, so eqn (10) is closed in Q. This simplification allows for computational efficiency, which is advantageous in training RL programs. The active stress tensor is given by67,68
σaij = −αQij, | (12) |
![]() | (13) |
The Landau–de Gennes free energy is:
![]() | (14) |
![]() | (15) |
The isotropic part of the passive stress in this model depends only on f as we neglect pressure contributions resulting from variations in density of the fluid.45
The flow coupling term is
Sij(v) = −vk∂kQij + ΦikQkj+ + Qik+Φkj − 2ξQij+(Qkl∂kvl),
![]() | (16) |
![]() | (17) |
![]() | (18) |
Φij = ξΨij − Ωij. | (19) |
In these equations, ξ, A0, U, ΓH, L, γv are parameters whose meanings are described in ref. 21, 52, 66, to which we refer the reader for additional details on this model. Throughout the paper we use ξ = 0.7, A0 = 0.1, U = 3.5, ΓH = 1.5, L = 0.1, and γv = 10 (all in simulation units). For these choices the equilibrium nematic persistence length is the same as one length unit.53
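To make the high-friction limit concrete, the sketch below evaluates the velocity of eqn (11) from the active stress of eqn (12) alone (the elastic stress σE and all other terms are omitted for brevity). The grid size, the random Q components, the Gaussian activity spot, and the helper names are illustrative choices; γv = 10 matches the value quoted above.

```julia
# Velocity from the active stress alone in the high-friction limit of eqn (11).
# Q is stored as its two independent components (Qxx, Qxy) on a periodic grid;
# derivatives are central finite differences with grid spacing h.
dx(A; h = 1.0) = (circshift(A, (-1, 0)) .- circshift(A, (1, 0))) ./ (2h)
dy(A; h = 1.0) = (circshift(A, (0, -1)) .- circshift(A, (0, 1))) ./ (2h)

function overdamped_velocity(Qxx, Qxy, alpha; gamma_v = 10.0)
    sxx = -alpha .* Qxx                     # active stress, eqn (12): sigma^a = -alpha * Q
    sxy = -alpha .* Qxy
    syy = -sxx                              # tracelessness of Q implies sigma_yy = -sigma_xx
    vx = (dx(sxx) .+ dy(sxy)) ./ gamma_v    # v_i = (1/gamma_v) * d_k sigma_ik
    vy = (dx(sxy) .+ dy(syy)) ./ gamma_v
    return vx, vy
end

# Toy usage: weak random Q components and a Gaussian activity spot on a 64 x 64 grid.
Qxx, Qxy = 0.1 .* randn(64, 64), 0.1 .* randn(64, 64)
alpha    = [5.0 * exp(-((i - 32)^2 + (j - 32)^2) / 50) for i in 1:64, j in 1:64]
vx, vy   = overdamped_velocity(Qxx, Qxy, alpha)
```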
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5sm00063g |