Dominik Schildknecht,*a Anastasia N. Popova,b Jack Stellwagenc and Matt Thomsond
aBiology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA. E-mail: dominik.schildknecht@gmail.com
bApplied and Computational Mathematics, California Institute of Technology, Pasadena, CA, USA
cSchool of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
dBiology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA. E-mail: mthomson@caltech.edu
First published on 13th December 2021
The control of far-from-equilibrium physical systems, including active materials, requires advanced control strategies because the non-linear dynamics and long-range interactions between particles prevent explicit solutions to optimal control problems. In such situations, Reinforcement Learning (RL) has emerged as an approach to derive suitable control strategies. However, for active matter systems, it is an important open question how the mathematical structure and the physical properties determine the tractability of RL. In this paper, we demonstrate that RL can only find good mixing strategies for active matter systems that combine attractive and repulsive interactions. Using analytic results from dynamical systems theory, we show that combining both interaction types is indeed necessary for the existence of mixing-inducing hyperbolic dynamics and therefore for the ability of RL to find homogeneous mixing strategies. In particular, we show that for drag-dominated, translationally invariant particle systems, mixing relies on combined attractive and repulsive interactions. Our work therefore demonstrates which experimental developments are needed to make protein-based active matter applicable, and it provides a classification of microscopic interactions based on macroscopic behavior.
To better understand nature, and to replicate it for applications, researchers have started to study so-called active matter. Here, active matter refers to systems of proteins or other particles that continuously consume energy to sustain non-equilibrium dynamics and that can self-organize to perform specific tasks. In particular, recent work suggested using active matter to achieve macroscopic tasks, such as generating fluid flows,13–15 flow rectification,16 and equilibration of glassy systems.17
The unsolved scientific challenge is how to assemble the existing (microscopic) active matter systems to achieve macroscopic results such as mixing and material transport. While machine learning has recently been applied to various topics in active matter (see ref. 18 for a review), the literature on active matter control remains limited. In macroscopic systems, where individual agent properties can be controlled, various goals have been achieved, such as navigation through fluids at various Reynolds numbers19–25 and self-organization emerging in a multi-agent setting to solve several tasks.26–31 Using microscopic systems to achieve macroscopic behavior, however, is challenging. Indeed, while some limited control has been achieved, such as the ability to generate simple fluid flows,13–16,32 induce global particle momentum,33 and accelerate the equilibration of otherwise glassy systems,17 control strategies for more complex scenarios are absent. The reason is that active matter systems are interaction-dominated, so strategies have to exert indirect control via agent–agent interactions rather than single-agent properties. Therefore, finding successful strategies to achieve macroscopic goals requires advanced control methods, including policy-based reinforcement learning.
The key question we address in this paper is how to find control strategies for the macroscopic task of mixing using protein-based active matter systems. In particular, we focus on finding mixing strategies for active particles by applying RL to our recent numerical active matter model.1 If we find good mixing solutions for the active particles, these strategies could then be implemented experimentally to induce mixing in the surrounding medium, for example, via fluid–matter interactions. For our model of active particles, we observe that RL fails to learn good strategies if only attractive or only repulsive interactions are available, but finds good strategies if attractive and repulsive interactions can be combined. We analyze this puzzling behavior using dynamical systems theory, particularly the theory of hyperbolic dynamics and Anosov diffeomorphisms,34 to prove that mixed interactions are indeed necessary to render the problem solvable. Our results therefore provide a guideline for making protein-based active matter applicable and answer the question, raised in a recent review,18 of how different interactions can lead to macroscopic behavior.
Our model1 required only attractive interactions to describe the existing experimental platform.14,15,44 However, this restriction will turn out to limit mixing. Hence, an extended model is introduced as follows: the system consists of Np particles in two dimensions, indexed by i, which can be in three states: attractive-activated (pai = 1, pri = 0), repulsive-activated (pai = 0, pri = 1), or inactivated (pai = 0, pri = 0). Inactivated particles become activated when they are inside the corresponding activation area (which will form our control input), and activated particles deactivate back to the inactivated state with a rate λ. Activated particles interact with other similarly activated particles via a spring potential.‡ The system's dynamics is described by the equations of motion
(1a)
(1b)
(1c)
(1d)
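The activation and deactivation dynamics described above can be sketched as follows. This is an illustrative stand-in (the extracted eqn (1) is incomplete here): the per-time-step deactivation probability 1 − exp(−λΔt) is an assumed discretization of the rate λ, and the boolean area masks are hypothetical control inputs.

```python
import numpy as np

def step_states(p_a, p_r, in_attract_area, in_repulse_area, lam, dt, rng):
    """One stochastic update of the activation states: inactivated particles
    inside an activation area become activated; activated particles
    deactivate with rate lam (assumed per-step probability 1 - exp(-lam*dt))."""
    # Deactivate currently activated particles with probability 1 - exp(-lam*dt).
    deact = rng.random(p_a.shape) < 1.0 - np.exp(-lam * dt)
    p_a = np.where(deact, False, p_a)
    p_r = np.where(deact, False, p_r)
    # Inactivated particles inside an activation area become activated.
    inactive = ~(p_a | p_r)
    p_a = p_a | (inactive & in_attract_area)
    inactive = ~(p_a | p_r)
    p_r = p_r | (inactive & in_repulse_area)
    return p_a, p_r
```

Note that a particle never carries both states at once, matching the three-state description above.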
Using this model of drag-dominated particle dynamics with controllable pairwise spring interactions, we built an RL environment using OpenAI gym.45 In particular, Np = 96 particles are initialized randomly in a box with periodic boundary conditions, ensuring an equal number of particles in the two halves of the system (for future reference, tagged “left” and “right”). The system is then integrated according to eqn (1) with a time-step of Δt = 0.05 for Nt = 100 time-steps. A sketch of the initial condition can be found in Fig. 1b, and an example of a target state is depicted in Fig. 1c. The detailed simulation parameters can be found in the ESI.† The observation and control spaces are introduced by using an Ng × Ng = 4 × 4 square grid, as depicted in Fig. 1d. In particular, observations are given to the algorithm by providing a separate count for each tag (corresponding to the orange and blue colors in Fig. 1b and c) and each bin. The system is controlled by assigning each square of the binning to either activation area or to none at all, so that the action space at every time-step is 3^(4×4) ≈ 43 × 10^6 dimensional if both interactions are included.§
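The binned observation described above can be sketched without the gym dependency as follows; the box length and the axis convention of the binning are illustrative assumptions.

```python
import numpy as np

def binned_observation(pos, tags, box_length=1.0, n_grid=4):
    """Count particles of each tag ("left" = 0, "right" = 1) in an
    n_grid x n_grid binning of the periodic box, as fed to the RL agent."""
    # Map periodic positions to integer bin indices along each axis.
    idx = np.floor((pos % box_length) / box_length * n_grid).astype(int)
    obs = np.zeros((2, n_grid, n_grid), dtype=int)
    for tag in (0, 1):
        sel = idx[tags == tag]
        # np.add.at accumulates correctly even when several particles share a bin.
        np.add.at(obs[tag], (sel[:, 0], sel[:, 1]), 1)
    return obs
```

With both interaction types, each of the 16 bins can then be assigned one of three actions (attractive, repulsive, none), giving the 3^16 dimensional action space quoted above.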
It should be noted that the individual simulations are relatively small, at only 96 particles and a coarse binning into a 4 × 4 grid. We chose such small system sizes because the individual simulations are run many thousands of times, and as such, they need to run very quickly. Nevertheless, because the computational cost of an individual simulation scales only as O(Np^2), future work could easily extend to larger particle numbers. In contrast, due to the exponential scaling of the action space with grid resolution, increasing the spatial resolution would slow down learning, and possibly more advanced methods such as curriculum learning would need to be implemented to achieve fine-grained control. However, the main result of this paper, namely the necessity of two types of interactions, will continue to hold independent of the number of particles or the grid resolution, as we will show in Section 4.
(2)
Since using eqn (2) alone leads to degenerate solutions that collapse all particles into dense clusters, we use an additional (time-step) homogeneity reward, constructed analogously to eqn (2):
(3)
We introduce the episode rewards as the mean over the time-step rewards for an entire episode:
r_episode = (1/Nt) Σ_t r(t)   (4)
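Since eqn (2) and (3) were not recovered in this extraction, the sketch below uses one plausible choice of mixing and homogeneity rewards consistent with the surrounding text (penalizing per-bin tag imbalance and per-bin occupancy deviations, respectively); the paper's exact functional forms may differ. The episode reward is the mean over time-step rewards, as in eqn (4), with α weighting mixing against homogeneity.

```python
import numpy as np

def mixing_reward(obs):
    """Hypothetical mixing reward (stand-in for eqn (2)): penalize the
    per-bin imbalance between the "left" and "right" tag counts."""
    left, right = obs[0].astype(float), obs[1].astype(float)
    return -np.mean(np.abs(left - right))

def homogeneity_reward(obs):
    """Hypothetical homogeneity reward, built analogously (stand-in for
    eqn (3)): penalize deviations of the total bin occupancy from its mean."""
    total = obs.sum(axis=0).astype(float)
    return -np.mean(np.abs(total - total.mean()))

def episode_reward(observations, alpha):
    """Mean over time-step rewards (eqn (4)); alpha = 1 is purely
    mixing-focused, alpha = 0 purely homogeneity-focused."""
    r = [alpha * mixing_reward(o) + (1 - alpha) * homogeneity_reward(o)
         for o in observations]
    return float(np.mean(r))
```

Note how a "collapse all" configuration is not penalized by the mixing term alone (equal tag counts in one bin), but is strongly penalized by the homogeneity term, which is exactly the degeneracy discussed above.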
In the RL agents, we use a neural network, which receives the binned observation tensor as input. The neural network consists of three convolutional layers (with 32, 64, and 256 channels, and kernel sizes 3, 3, and 4, respectively) and a dense layer with Ng × Ng output neurons. The simulation thresholds the values of these output neurons to determine whether a bin should be activated.
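The described architecture can be sketched in PyTorch as follows; the paddings, the activation functions, and the two-channel input layout (one channel per tag) are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Sketch of the described network: three conv layers (32, 64, 256
    channels; kernels 3, 3, 4) followed by a dense layer with Ng*Ng
    output neurons. Paddings and ReLU nonlinearities are assumptions."""
    def __init__(self, n_grid=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 256, kernel_size=n_grid), nn.ReLU(),  # -> 256 x 1 x 1
            nn.Flatten(),
            nn.Linear(256, n_grid * n_grid),
        )

    def forward(self, obs):
        return self.body(obs)

def bin_activations(net, obs, threshold=0.0):
    """Threshold the output neurons to decide which bins to activate
    (here reshaped for n_grid = 4)."""
    with torch.no_grad():
        return (net(obs) > threshold).reshape(-1, 4, 4)
```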
The hyperparameters for PPO were tuned automatically during training using Ray Tune's Population Based Training (PBT)53 with 16 agents. Using PBT, unstable solutions could be overcome, and we observed converged strategies in all training situations. Additional hyperparameters required for PBT can be found in the ESI.†
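A PBT setup of this kind might look as follows with Ray Tune; the mutated hyperparameters, their ranges, and the perturbation interval are illustrative placeholders, not the paper's settings (those are in the ESI).

```python
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

# Illustrative PBT scheduler for a 16-agent population; the mutated
# hyperparameters and the perturbation interval are placeholders.
pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=10,
    hyperparam_mutations={
        "lr": tune.loguniform(1e-5, 1e-3),
        "clip_param": tune.uniform(0.1, 0.3),
    },
)
# Passed to the tuner together with the PPO trainer and num_samples=16,
# e.g. tune.run("PPO", scheduler=pbt, num_samples=16, config={...}).
```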
Fig. 2 Training with only attractive interactions achieves only collapsed clusters: In (a), the mean episode reward of the 16 agents training at α = 1 is shown as solid lines, with minimal and maximal episode rewards as an error band. It can be observed that while training converged, reward variations were still substantial, and several dips in reward were recovered using PBT. In (b), an exemplary time series for purely mixing-focused training (α = 1) is shown, in which a collapse of the particle density can be observed. The panels are enumerated from 1 to 4, corresponding to the time-steps 0, 20, 40, and 60, respectively. (A curated list of time series for all attractive-interaction strategies can be found in Fig. S4, ESI†). In (c), the composition of emergent strategies at various values of α is shown, where the intermediate emergent strategies take over from one another continuously. However, no emergent strategy displays homogeneous mixing.
To overcome the trivial “collapse all”-solution, α can be reduced so that homogeneity is weighted in. For the extreme case of α = 0, where the algorithm optimizes for homogeneity only, almost no particles were activated, as the initial condition already has a high homogeneity. We called this emergent strategy “activate little”; it also did not lead to homogeneous mixing. We analyzed the behavior for intermediate values of α by performing additional simulations at 7 further values of α (for a total of 9), and we hand-labeled the last validation video for each of the 16 PBT agents. It should be noted that hand-labeling produced interpretable strategies, and a quantitative analysis of the activation dynamics (cf. Fig. S7 and S8, ESI†) shows that the activation patterns can be roughly classified into groups consistent with our hand-labeling procedure. The resulting composition of emergent strategies is plotted as a function of α in Fig. 2c. There, we observe two additional strategies for intermediate values of α. Firstly, for some simulations at 0 ≤ α < 0.5, a “collapse some”-strategy emerged, which focuses on collecting dense particle clusters of opposite tags to remedy the worst bins but leaves most other parts of the system inactivated. Secondly, for various values of α, we observed a “collapse all, careful”-strategy, similar to the “collapse all”-strategy in that it tries to collect all particles into a single bin. However, it does so over longer periods of time (typically taking more than half of the simulation time) in order to avoid intermediate dense clusters, particularly if these dense clusters are of a single color. As an additional feature, because it gives the system longer times to react, the “collapse all, careful”-strategy tends to be more thorough in collapsing all particles into a single bin than the “collapse all”-strategy, which mainly focuses on speed.
While the exact details, i.e., which strategy emerges for which value of α, will depend on a variety of factors, such as the number of bins Ng × Ng, the initial condition, and the subjective assessment during hand-labeling (cf. ESI† Section SC), we can observe two main results: firstly, the strategies continuously transform from one to another as α is tuned. Secondly, none of the observed strategies achieves a homogeneously mixed state.
Therefore, using only attractive interactions, no strategy emerged that provides a solution similar to our target, as depicted in Fig. 1c, even when varying α. Indeed, we will show in Section 4 that no reward-shaping could overcome this problem, as attractive interactions are insufficient to arrive at a homogeneous and mixed state.
Fig. 3 Training with repulsive interactions has a dominant strategy that uses the periodic boundary conditions to achieve mixing: in (a), an exemplary time series for purely mixing-focused training (α = 1) is shown, in which activation of one side of the system can be observed, using the periodic boundary conditions to facilitate mixing. (A curated list of time series of all repulsive-interaction strategies can be found in Fig. S5, ESI†). In (b), the different emergent strategies are shown, demonstrating that the “activate one side”-strategy is dominant over most of the α-parameter space.
For α = 0, i.e., maximizing the homogeneity of the system, the network uses repulsive interactions in more densely populated bins to achieve homogenization beyond the initial condition (the “repulsive spreading”-strategy). However, as becomes clear from Fig. 3b, this spreading strategy only emerges very close to α = 0 and is quickly overtaken by the “activate one side”-strategy (cf. Fig. S7 and S8, ESI†). Indeed, the homogeneity reward strongly punishes dense clusters such as those emerging in the “collapse all”-strategy but only weakly punishes a situation where only half of the cells become empty. Hence, the “activate one side”-strategy is dominant over large parts of the α-parameter space. While the “activate one side”-strategy achieves some mixing with the repulsive-only interactions, the strategy is not tunable, and it relies heavily on the periodic boundary conditions. Indeed, we will demonstrate in Section 4 that phase-space-limiting boundary conditions (such as periodic boundary conditions) are required to facilitate mixing with repulsive interactions only.
Fig. 4 Training on a system with both interactions achieves homogeneous mixing by alternating attractive and repulsive interactions: in (a and b), two exemplary time series for oscillation strategies are shown, with and without collapse, respectively. (A curated list of time series for all combined-interaction strategies can be found in Fig. S6, ESI†). In (c), the fraction of runs with a particular strategy at various values of α is shown, with various emergent strategies continuously transforming into each other.
Overall, an interesting strategy evolution as a function of α emerges when both interactions are available, as shown in Fig. 4c. In particular, a smooth transition from one strategy to the next can be observed (perhaps with the exception of the “attractive-repulsive spreading”-strategy), and various intermediate strategies produce well-mixed states without relying on boundary conditions. It should be noted that the actual emergent strategies depend on various factors and are to a degree subjective due to the hand-labeling procedure. Indeed, both oscillatory strategies and the “collapse all”-strategy are similar (cf. the more quantitative analysis presented in Fig. S7 and S8, ESI†), and their differences can mainly be attributed to when the spread of a dense cluster occurs. As such, this classification relies on the hand-labeling procedure to correlate the particle configuration with the activation pattern. However, we can observe some general features: the strategy evolution is mostly continuous, and the existence of a homogeneous mixing strategy depends mainly on the interaction set. This result is surprising, as attractive or repulsive interactions alone are insufficient to produce efficient mixing, but the combination of both is sufficient. We now analyze this success–failure pattern of RL using dynamical systems theory. Doing so will reveal that the necessity of combined interactions is not a specific feature of the model in eqn (1) but a much more general statement that holds across different models.
We start by reconsidering the model in eqn (1). Without truncating the sums, the equations of motion are
ẋi = Σj≠i cij (xj − xi),   (5)
where cij denotes the state-dependent spring coupling between particles i and j (positive for attractive-activated pairs, negative for repulsive-activated pairs, and zero otherwise).
The long-term evolution of the system can be analyzed by considering the eigenvalues λi of the update map M at every time-step. Namely, ergodicity (and therefore mixing) requires the update to be measure-preserving in phase space,56 meaning that the determinant det M = Πi λi has to be 1. Suppose the update map M is sufficiently (namely C2) differentiable, has eigenvalues smaller and larger than 1, but maintains a determinant of 1. Then the map is so-called Anosov, is guaranteed to mix the phase space (i.e., the 2 × Np-dimensional space),34,57 and as such it is expected to mix the projection into the 2-dimensional real space.
Therefore, if we understand the spectrum of M (i.e., the set of all eigenvalues), strong statements about mixing can be made. The update matrix M can be expressed as
xi(t + Δt) = Σj Mij xj(t),   (6a)
Mij = Δt cij for i ≠ j,  Mii = 1 − Δt Σj≠i cij.   (6b)
Because M is symmetric, all eigenvalues are real.|| Because of Gershgorin's circle theorem,58 the spectrum of M is contained in the union of intervals
λ ∈ ⋃i [Mii − Δt Σj≠i |cij|, Mii + Δt Σj≠i |cij|].   (7)
Using eqn (7) for attractive interactions, where cij ≥ 0, the intervals become [1 − 2Δt Σj≠i cij, 1], so that the largest possible eigenvalue is 1, while all other eigenvalues are less than or equal to 1. Therefore, det M ≤ 1, and iterating M on any initial vector will either collapse a phase space direction (associated eigenvalue λ < 1) or leave it invariant (associated eigenvalue λ = 1). Hence, with only attractive interactions, the system will only be able to collapse, never leading to homogeneous mixing.
Analogously, for repulsive interactions, where cij ≤ 0, the intervals become [1, 1 + 2Δt Σj≠i |cij|], so that all eigenvalues are larger than or equal to 1, and det M ≥ 1. Therefore, without phase-space-limiting boundary conditions, the phase space is expanding, never leading to homogeneous mixing. However, with periodic boundary conditions, phase space is limited and can be folded into itself. Thus, the successful mixing with only repulsive interactions presented in Section 3.2 was only possible due to the periodic boundary conditions.
Finally, for combined attractive and repulsive interactions, the off-diagonal elements are around 0 but can have either sign,** so that the eigenvalues of M lie around 1 (more precisely, within Gershgorin intervals that now straddle 1). While it is not guaranteed that the determinant will be 1, the updates can now induce hyperbolic dynamics, i.e., phase space stretching in some directions and compressing in others, bringing the dynamics closer to an Anosov diffeomorphism and hence being able to induce mixing, in agreement with our successful mixing simulations in Section 3.3. Furthermore, our results indicate that in order to optimize mixing, the determinant should be close to 1. This optimal mixing could be achieved in future work by tuning α dynamically during training while monitoring the average determinant.
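The spectral argument can be checked numerically on a toy update matrix of the assumed form (off-diagonal entries Δt·cij, rows summing to 1); the coupling values below are illustrative.

```python
import numpy as np

def update_matrix(c, dt=0.05):
    """Build the update matrix from a symmetric coupling matrix c
    (c_ij > 0 attractive, c_ij < 0 repulsive): off-diagonal entries
    dt * c_ij, with diagonals chosen so that every row sums to 1."""
    m = dt * c.astype(float)
    np.fill_diagonal(m, 0.0)
    np.fill_diagonal(m, 1.0 - m.sum(axis=1))
    return m

# Toy couplings for three particles: pair (1,2) attracts, pair (1,3) repels.
c_mixed = np.array([[ 0.0, 1.0, -1.0],
                    [ 1.0, 0.0,  0.0],
                    [-1.0, 0.0,  0.0]])

eig_attr = np.linalg.eigvalsh(update_matrix(np.abs(c_mixed)))   # attractive only
eig_rep = np.linalg.eigvalsh(update_matrix(-np.abs(c_mixed)))   # repulsive only
eig_mix = np.linalg.eigvalsh(update_matrix(c_mixed))            # both signs

assert eig_attr.max() <= 1 + 1e-12        # attractive: spectrum bounded above by 1
assert eig_rep.min() >= 1 - 1e-12         # repulsive: spectrum bounded below by 1
assert eig_mix.min() < 1 < eig_mix.max()  # mixed: eigenvalues straddle 1 (hyperbolic)
```

The three assertions reproduce exactly the Gershgorin bounds derived above: contraction only, expansion only, and hyperbolic stretching-plus-compression, respectively.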
The results on the eigenvalue spectra can be verified numerically: additional simulations were performed for all three interaction cases. We analyzed the update matrices of the last validation runs for each of the agents at every time point of the simulation and plotted their eigenvalue frequencies as histograms in Fig. 5. Indeed, we observe the bounded spectra for the limited interaction sets, while the system with combined interactions exhibits a more balanced eigenvalue spectrum. While the spectrum for combined interactions in Fig. 5c is not entirely balanced around 1, and the determinant is larger than 1, the matrix is closer to describing an Anosov diffeomorphism and can realize one if necessary. In particular, the oscillatory strategies discussed in Section 3.3 alternate attractive and repulsive interactions to achieve eigenvalues larger and smaller than 1 over multiple time-steps (cf. Fig. S9, ESI†). In conclusion, we indeed observe the predicted eigenvalue spectra throughout all simulations, clearly demonstrating why limited interaction sets cannot provide mixing, whereas the combined interaction set can.
The results of this paper suggest further work in two areas: first, our results could be transferred to the experimental system14,15,43 to verify that mixing of active particles can induce mixing in the surrounding medium via fluid–matter interactions. While the experimental platform is not yet ready, since it lacks an additional repulsive light-controllable motor protein, our paper provides a clear and feasible guideline for the developments necessary to enable active-matter-based mixing. As such, we hope to provide the necessary motivation for the field to develop a dual-controlled active matter system.
Second, the strategies we found should be further refined for actual applications. While some computational effort could be saved by tuning α dynamically to match a unit determinant, future simulations should use a higher grid resolution, a larger number of particles, better boundary conditions, and more realistic particle–particle interactions to produce realistic strategies. While such refined simulations are highly interesting, their computational demand will undoubtedly be substantially larger, so we expect more advanced machine learning approaches such as transfer learning or curriculum learning to be necessary. Nevertheless, we are confident that our work paves the way for future applications of RL to this area of study, namely the control of complex interaction-dominated active matter systems.
Footnotes
† Electronic supplementary information (ESI) available: additional details on the simulations and supplemental figures. The code for the simulation can be found in ref. 1, and the OpenAI gym environment can be found at https://github.com/domischi/SpringBoxDDPG. The data and analysis scripts are accessible. See DOI: 10.1039/d1sm01400e
‡ I.e., attractive-activated particles only interact with other attractive-activated particles, and repulsive-activated particles only interact with other repulsive-activated particles.
§ If only one interaction type is included, the action space at every time-step is 2^(4×4) ≈ 65 × 10^3 dimensional.
¶ It should be noted that we define a strategy to summarize all actions of an agent throughout an entire simulation. In contrast, one could consider motifs, i.e., actions an agent performs during a short time period. However, most strategies observed in this paper can be described by a single motif that is used continuously, or at most two that alternate.
|| For non-symmetric matrices, the argument would be analogous, limiting the discussion to the absolute values of the eigenvalues.
** Per row, each off-diagonal element will have the same sign, as only particles in the same activation state interact with each other. However, because the interaction state can change, the same row can have the opposite sign at different times.
This journal is © The Royal Society of Chemistry 2022