Atsuyuki
Nakao
a,
Yu
Harabuchi
bcd,
Satoshi
Maeda
bcd and
Koji
Tsuda
*aef
aGraduate School of Frontier Sciences, The University of Tokyo, Kashiwa 2778561, Japan. E-mail: tsuda@k.u-tokyo.ac.jp
bInstitute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo 001-0021, Japan
cJST ERATO Maeda Artificial Intelligence for Chemical Reaction Design and Discovery Project, Sapporo 060-0810, Japan
dDepartment of Chemistry, Faculty of Science, Hokkaido University, Sapporo 060-0810, Japan
eRIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
fResearch and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba 305-0047, Japan
First published on 6th April 2022
Reaction path finding methods construct a graph connecting reactants and products in a quantum chemical energy landscape. They are useful in elucidating various reactions and provide footsteps for designing new reactions. Their enormous computational cost, however, limits their application to relatively simple reactions. This paper engages in accelerating reaction path finding by introducing the principles of algorithmic search. A new method called RRT/SC-AFIR is devised by combining rapidly exploring random tree (RRT) and single component artificial force induced reaction (SC-AFIR). Using 96 cores, our method succeeded in constructing a reaction graph for Fritsch–Buttenberg–Wiechell rearrangement within a time limit of 3 days, while the conventional methods could not. Our results illustrate that the algorithm theory provides refreshing and beneficial viewpoints on quantum chemical methodologies.
![]() | ||
Fig. 1 Potential energy surface and the corresponding reaction graph. Equilibrium structures are shown as EQ. |
In this paper, we leverage algorithmic search for efficient path finding. Algorithmic search algorithms such as A*,20 simulated annealing21 and Monte Carlo tree search22 have been applied to real-world path finding problems such as maze solving, vehicle routing and robot movement scheduling. Among various options, we focus on rapidly-exploring random tree (RRT)23 due to its simplicity and a good track record of successful applications.24–26 Initially, there are a start node (i.e., reactant) and a goal node (i.e., product) in the search space. The search graph is expanded from the start node by alternating the two steps: random node expansion and goal-oriented expansion (Fig. 2). In graph expansion, SC-AFIR is employed to generate adjacent nodes corresponding to neighboring equilibrium structures (denoted as EQs). Finally, our algorithm, termed RRT/SC-AFIR, terminates when a time limit is met.
To meet the challenges from high dimensional and highly constrained conformational spaces, we enhanced RRT with respect to the following two points: (1) fair node sampling and (2) similarity measure of structures. In step A shown in Fig. 2a, a node is randomly selected and expanded. If most of the nodes correspond to similar structures, simple sampling leads to biased exploration. For fairer sampling, the nodes are clustered according to a coarse-grained representation of structures called the connection pattern. A node is selected by the following two-step method. First, a cluster is selected in equal probability. Next, a node in the cluster is randomly selected. In step B shown in Fig. 2b, a node is selected using the similarity between a node and the goal node. Most of the available similarity measures of three-dimensional conformations are developed for comparing two molecules.27 Our equilibrium structure, however, often involves multiple molecules. Since the relative positions of distinct molecules cause only negligible changes in the potential energy, similarity measures of global geometry may fail. To focus on the differences in local geometry, we developed a similarity measure based on greedy match of local atomic environments.
In our computational experiments, RRT/SC-AFIR was evaluated in path finding tasks with respect to Wöhler's urea synthesis (WUS) and Fritsch–Buttenberg–Wiechell rearrangement (FBW). It was compared with two SC-AFIR-based methods: Boltzmann/SC-AFIR15 and kinetics/SC-AFIR.28 The former samples new nodes according to the Boltzmann distribution, while the latter uses kinetics-based navigation. We found that our method was the fastest in finding valid reaction paths that are consistent with the known paths (Fig. 3). Using 96 cores, RRT/SC-AFIR found valid paths of FBW within a time limit of 3 days, while the others could not. Taking statistics of path lengths, it is found that existing SC-AFIR methods tend to generate unreasonably long paths, indicating that RRT was effective in guiding the search towards the goal node.
![]() | ||
Fig. 3 Known reaction paths of (a) Wöhler's urea synthesis (WUS) and (b) Fritsch–Buttenberg–Wiechell rearrangement (FBW). |
In algorithmic search, taking balance between exploration (i.e., gathering knowledge with unpurposeful moves) and exploitation (i.e., moving toward the goal) is crucially important.20 If too much weight is put on exploitation, the algorithm is likely to stuck at local minima. Exploration helps it escape from local minima, but too much wandering is also harmful for fast search.
RRT/SC-AFIR takes the balance by alternating two distinctly different steps (Fig. 2). In step A (random node expansion), a node is chosen randomly and an adjacent node is attached to it. To avoid bias, the nodes are clustered using a labeled graph representation called the connection pattern (see Method 4.1). A set of nodes with an identical connection pattern is summarized as a cluster, where labeled graph isomorphism checking is conducted using the NetworkX python library (https://networkx.org/). A cluster is chosen randomly and a node in the cluster is selected in equal probability. An adjacent node is created by single AFIR-path calculations,29 where an artificial force in either the merging or splitting direction is added to two fragments formed around two randomly-selected atoms. For details about fragment formation, see the ESI.† The AFIR-path calculation finds another equilibrium structure by getting beyond the energy barrier, hence a new node. To keep the search in low energy regions, the following probabilistic filter is applied, i.e., the new node is accepted by the following probability:
In step B (goal-oriented expansion), a node close to the goal node is identified in terms of our similarity measure (see Method 4.2). A cluster is chosen according to the probability proportional to exp(csi), where the similarity to a cluster si is defined as the maximum value among the similarities between the goal and its members, and c is a parameter to control the exploration–exploitation balance. Then, a node is chosen according to the probability proportional to exp(czj) where zj denotes the similarity between the goal and a member. Then, an adjacent node is created by AFIR, and the same probabilistic filter as step A is applied.
Table 1 summarizes the goal time and path connecting time and gradient calculations per minute. In WUS, all the three methods found the goal node and the valid reaction path within a time limit of three days. RRT/SC-AFIR found valid reaction paths much faster than the other methods. Comparing kinetics/SC-AFIR and Boltzmann/SC-AFIR, the former was more efficient in finding the goal node, while the latter was better in finding valid paths. In FBW, only RRT/SC-AFIR found valid paths in time.
Reaction | Method | Goal time (min) | Path connecting time (min) | Gradient calculations per minute |
---|---|---|---|---|
RRT | 67 | 67 | 341.3 | |
WUS | Kinetics | 66 | 776 | 342.8 |
Boltzmann | 326 | 549 | 343.8 | |
RRT | 2575 | 2575 | 126.6 | |
FBW | Kinetics | — | — | 123.4 |
Boltzmann | — | — | 123.8 |
Fig. 4 shows the distribution of distances from all nodes of the constructed reaction graph to the start node. The distance between nodes is measured by the length of the shortest path. In both reactions, nodes explored by RRT/SC-AFIR are distributed near the start node, while the distributions of Boltzmann/SC-AFIR and kinetics/SC-AFIR cover broad ranges, indicating that they are likely to produce very long reaction paths. This phenomenon is more evident in FBW. Since known reaction paths shown in Fig. 3 consist of three and two steps for WUS and FBW, respectively, the distribution of RRT/SC-AFIR conforms better with the known paths.
![]() | ||
Fig. 4 Distance distribution from the start node in the combined reaction graph. (a) Wöhler's urea synthesis (WUS) and (b) Fritsch–Buttenberg–Wiechell rearrangement (FBW). |
Boltzmann/SC-AFIR and kinetic/SC-AFIR perform blind search, where the goal is not given a priori. If most feasible products at the present computational level are identical to experimentally validated products, they would find the goal node quickly, but it is not the case in both WUS and FBW as shown in Fig. 5. Since these products are more stable, Boltzmann/SC-AFIR and kinetics/SC-AFIR search around them, leading to unreasonably long paths. A main reason of this gap would be that solvent effects are ignored in the present computational level. These results indicate that, when products are known, RRT/SC-AFIR has a clear advantage in that it is robust against discrepancies between computation and reality.
(1) Make an empty list V.
(2) Let v be the minimum value v = mini,jaij, and i* and j* be the corresponding row and column indices, respectively.
(3) Attach v to the list V and remove row i* and column j*.
(4) Quit if A is empty. Otherwise go to 2.
Similarly, one can define the maximum value profile as well.
To define a similarity between two structures, an atom type is selected first. Suppose there are n and m atoms of the selected type in the first and second structures of interest, respectively (n ≤ m). Let dik denote the distance between atoms i and k in the same structure. Also, let Cij denote a matrix whose k, l element is the difference between dik in the first structure and djl in the second structure. When Vij is the minimum value profile of Cij, the distance between local environments around atoms i and j is defined as
This paragraph describes SC-AFIR options used in all the RRT/SC-AFIR, Boltzmann/SC-AFIR, and kinetics/SC-AFIR searches. The algorithm avoiding Hessian calculations is adopted. The searches using the three different methods are initiated from a common initial structure, i.e., a complex among NH+4 + OCN− + 2(H2O) for WUS and a complex between butyllithium and 1-bromo-2-methylpropane for FBW. In the calculations of WUS, all atoms are set as the target atom of the SC-AFIR method. In the case of FBW, lithium, atoms in the CH2 moiety adjacent to lithium, and all atoms in 1-bromo-2-methylpropane are set as the target atom. The model collision energy parameter γ of the AFIR method is set to 500 kJ mol−1. To prevent a molecule from going too far from the reaction centre, a weak force with γ = 100/[N(N − 1)/2] kJ mol−1 is applied to all atom pairs, where N corresponds to the number of atoms in each system. The force induced paths are optimized by the locally updated planes (LUP) method36 to obtain relaxed paths (LUP paths).
During the Boltzmann/SC-AFIR and kinetics/SC-AFIR searches, node clustering is done using the MatchDecScale = 7:
0 option of the GRRM20 program.34 The barrier along the LUP paths is used in the kinetics simulations during the kinetics/SC-AFIR searches. In kinetics/SC-AFIR, the initial population 1
:
0 is given to the initial structure, and a node from which the path calculation is done next is chosen based on the traffic volume of each node obtained by thermal equilibration of 3600 seconds at 200 K, 300 K, and 400 K, where the traffic volume is an index indicating influx/outflux of population to/from each node during the equilibration.28
RRT/SC-AFIR was developed as a stand-alone external code. The GRRM20 program34 equips SubSelectEQ and SubPathsGen options, and these options allow one to develop an SC-AFIR driver without accessing the internal functions of GRRM. More specifically, when these options are provided as SubSelectEQ=code1.exe and SubPathsGen=code2.exe in the GRRM input file, GRRM calls code1.exe and code2.exe when deciding a node from which the path calculation is done next and when choosing a fragment pair to which the artificial force is applied, respectively, and accepts their decisions. In this study, these options were used to develop RRT/SC-AFIR.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2cp01079h |
This journal is © the Owner Societies 2022 |