Searching chemical action and network (SCAN): an interactive chemical reaction path network platform

The interactive chemical reaction platform, SCAN, is developed for analyzing chemical reaction path networks. SCAN offers a chemical reaction path network database, visualization, and network analysis tools. In particular, SCAN is a web-based platform that allows users to perform interactive chemical reaction path network visualization and apply data science techniques with simple operations. SCAN is designed to provide a user-friendly graphical user interface, making prior programming knowledge and skillsets optional. Thus, SCAN is proposed as an alternative tool for analyzing and understanding chemical reaction path networks.


Introduction
Understanding a chemical reaction answers the fundamental mystery of how products are formed from reactants. First principles calculations reveal that a chemical reaction is a complex matter as it involves a tremendous number of intermediates. [1][2][3][4][5][6] In other words, a chemical reaction can be treated as a complex network consisting of numerous molecular interactions. 7 While it is difficult to capture the details of molecular interactions in an experiment, first principles calculations play a major role in understanding such a complex reaction map. 1,8,9 In particular, numerous automatic chemical reaction searching tools have been developed, such as the freezing string method with the Berny algorithm, single/double-ended growing string methods, artificial force induced reaction (AFIR), reaction mechanism generator (RMG), and KinBot. [10][11][12][13][14][15] Although such complex reaction path networks have become available, the question remains of how such networks can be understood and how knowledge can be extracted from them, which makes tools for extracting knowledge from networks necessary.
Extracting knowledge from the chemical reaction path network involves multiple steps and processes. In particular, organization of the chemical reaction database, statistical analysis, network visualization, and graph theory are involved. Several network visualization tools are available, such as Cytoscape and Gephi, which offer network visualization and graph theory analysis. 16,17 Moreover, graph theory techniques such as centrality analysis have been demonstrated to be effective for determining intermediates. 5,14,[18][19][20] However, these processes are strongly linked to each other, meaning that developing each process individually could limit the ability to extract knowledge. In addition, network data visualization and analysis often require particular skillsets as well as advanced programming skills, which can act as barriers to performing such analyses. Therefore, it is crucial to establish a centralized, interactive, and user-friendly platform that can utilize these processes simultaneously. Here, Searching Chemical Action and Network (SCAN) is introduced as a platform for interactive chemical reaction path networks, where the chemical reaction path network is produced by the AFIR method. 9,12,21,22 The SCAN platform is available at https://scan.sci.hokudai.ac.jp/, where it allows users to explore, visualize, and analyze the chemical reaction path network data generated by first principles calculations. Thus, SCAN allows for searching and understanding the complex chemical reaction path network.

SCAN architecture
The concept of SCAN is to store and share the chemical reaction path network generated by first principles calculations, while also providing interactive network visualization and network analysis. In order to achieve flexible reuse of data, a layered architecture is implemented as shown in Fig. 1, which consists of a data lake, data warehouse, and data mart. Here, the chemical reaction path network generated by the AFIR method is used as prototype data, where the data are previously published. 23 The chemical reaction path network data generated by AFIR contain numerous log files, which are classified as raw data. These raw data are stored in their original form with no modifications. This data storage unit is defined as a data lake. The raw data provided by the data lake are then preprocessed for network visualization and network analysis and stored in a data warehouse. Finally, the data warehouse is accessed by the data mart, which provides application services such as data visualization, data analysis, and an application programming interface for data sharing.
The SCAN platform has the option of an application programming interface (API). Users can access and retrieve all registered data using the key of information from their own applications. The information of geometries, energy, gradients, and physical properties can then be used in other applications such as informatics and machine learning.

Web-based browsing interface
The web application is built on the three-layered architecture illustrated in Fig. 1. It connects directly to the data mart, which allows the user to carry out network visualization, network analysis, and data downloading through the web graphical user interface without previous experience with programming or data preprocessing. This is particularly attractive as it expands access to chemical reaction analysis to researchers who may not have the knowledge or skillsets required for such research. The browsing interface is published at https://scan.sci.hokudai.ac.jp.

System architecture
The application consists of the frontend and the backing API as shown in Fig. 2. The frontend is implemented as a JavaScript web application using Next.js † while the backend is implemented with the Python FastAPI framework. ‡ The backend fetches stored data from the database in the data warehouse layer and returns the data as JSON in response to requests from the frontend application. The frontend application displays the fetched data as web pages. The application also provides access control of the stored data. This is implemented with Auth0, § an external authentication service. Only registered users can access the data within the SCAN platform.

Web interface
The top page of SCAN is shown in Fig. 3. At the top page, users can see the logo of the SCAN platform as well as the statistical counts of the stored data in the SCAN database which contains the number of the reaction maps as well as the number of nodes and edges. The top page also provides access to login which allows the user to create an account for the SCAN platform. In addition, the terms of use can be reached via the top page.

Data preprocessing tool
AFIR raw data stored in the data lake must be converted into a data format that can be used for network visualization and network analysis in SCAN. In particular, the data structure of the chemical reaction path network in SCAN is designed to consist of nodes (equilibrium states; EQs) and edges (reaction paths); details of the dataset generated using the AFIR method are described in the previous work. 23 However, AFIR produces a tremendous amount of raw data, which contain information such as 3D molecular geometries and energies. In order to extract the necessary information from the AFIR raw data, a parsing tool called grrmlog_parser is developed. The grrmlog_parser extracts all information necessary for network visualization and network analysis from the AFIR raw data as a Python object and is available on GitHub.¶
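The node-and-edge structure described above can be illustrated with a minimal Python sketch. This is not the grrmlog_parser API or the SCAN database schema; the class names `EqNode`, `PathEdge`, and `ReactionMap` and all of their fields are hypothetical and chosen only to show how EQs and reaction paths might be held in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class EqNode:
    """An equilibrium structure (EQ): one node of the network (hypothetical layout)."""
    eq_id: int
    energy: float                       # energy of the EQ, in the units of the raw data
    symbols: tuple[str, ...]            # atom symbols, one per atom
    coords: tuple[tuple[float, float, float], ...]  # 3D coordinates, one per atom


@dataclass(frozen=True)
class PathEdge:
    """A reaction path between two EQs: one edge of the network (hypothetical layout)."""
    edge_id: int
    source: int                         # eq_id of one end of the path
    target: int                         # eq_id of the other end of the path
    energy: float                       # energy at the highest point of the path


@dataclass
class ReactionMap:
    """A reaction path network: EQ nodes keyed by id plus a list of edges."""
    nodes: dict[int, EqNode] = field(default_factory=dict)
    edges: list[PathEdge] = field(default_factory=list)

    def add_node(self, node: EqNode) -> None:
        self.nodes[node.eq_id] = node

    def add_edge(self, edge: PathEdge) -> None:
        # Reject edges that point at EQs that were never registered.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both ends of an edge must be registered EQs")
        self.edges.append(edge)

    def adjacency(self) -> dict[int, set[int]]:
        """Undirected adjacency sets, the usual input for graph analysis."""
        adj: dict[int, set[int]] = {eq_id: set() for eq_id in self.nodes}
        for edge in self.edges:
            adj[edge.source].add(edge.target)
            adj[edge.target].add(edge.source)
        return adj


# Illustrative data only; the values are made up.
demo = ReactionMap()
demo.add_node(EqNode(0, -10.0, ("H", "H"), ((0.0, 0.0, 0.0), (0.0, 0.0, 0.74))))
demo.add_node(EqNode(1, -9.5, ("H", "H"), ((0.0, 0.0, 0.0), (0.0, 0.0, 1.50))))
demo.add_edge(PathEdge(0, 0, 1, -9.0))
```

Keeping the adjacency view separate from the stored records means that the same map can feed both the visualization (which needs geometries and energies) and the graph analysis (which needs only connectivity).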

Map search engine
The search engine is implemented for searching the chemical reaction path network data. When a user logs into the application, the search interface appears as shown in Fig. 4.
Here, the user can enter atom symbols as the search query in order to retrieve maps containing specified atoms. Search results from the keyword input are displayed as a list on the bottom of the page. Users may input multiple atoms with comma-separated atom symbols like "C,O", which equates to searching for reaction maps that include all of the specified atoms. The result list includes a visual representation of the initial structure of the calculation of the reaction map and its overview information such as the lowest/highest energy in the map and the number of EQs and transition states (TSs) included.
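The "include all of the specified atoms" semantics of a comma-separated query can be sketched as a subset test. The snippet below is only an illustration under assumptions: the function names, the map identifiers, and the in-memory dictionary are hypothetical, and the actual SCAN backend may implement the search differently (for example, as a database query).

```python
from __future__ import annotations


def parse_atom_query(query: str) -> set[str]:
    """Split a query such as "C,O" into a set of atom symbols, ignoring blanks."""
    return {token.strip() for token in query.split(",") if token.strip()}


def search_maps(query: str, maps: dict[str, list[str]]) -> list[str]:
    """Return the ids of maps whose atoms include every symbol in the query."""
    wanted = parse_atom_query(query)
    if not wanted:
        return []  # an empty query matches nothing in this sketch
    return [map_id for map_id, atoms in maps.items() if wanted <= set(atoms)]


# Hypothetical map ids and atom lists, for illustration only.
demo_maps = {
    "map-a": ["C", "H", "O"],
    "map-b": ["N", "H"],
    "map-c": ["C", "H", "N", "O"],
}
hits = search_maps("C,O", demo_maps)
```

With this data, the query "C,O" keeps `map-a` and `map-c` and drops `map-b`, because `map-b` contains neither carbon nor oxygen.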

Map detail view
When the user clicks one of the candidates returned by the search result, a map detail view of the search result is displayed as seen in Fig. 5 which includes detailed information of the reaction map.
This view also contains the interactive graph network viewer of the chemical reaction path network (Fig. 6). It displays the graph representation of the chemical reaction path network in a visual manner. The nodes represent EQs and the edges represent reaction paths in the map. The graph is automatically arranged with the force-directed graph layout technique. The nodes are colored according to the energy value.
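Coloring nodes by energy amounts to mapping each energy onto a color scale. The sketch below assumes a simple linear blue-to-red scale between the lowest and highest energies in a map; the color scale actually used by SCAN is not specified in the text, and the function names are hypothetical.

```python
from __future__ import annotations


def energy_to_hex(energy: float, e_min: float, e_max: float) -> str:
    """Map an energy onto a blue (low) to red (high) hex color, linearly."""
    if e_max == e_min:
        t = 0.0  # all energies equal: use the low-energy color
    else:
        t = (energy - e_min) / (e_max - e_min)
    t = min(1.0, max(0.0, t))  # clamp values outside the range
    red = round(255 * t)
    blue = round(255 * (1.0 - t))
    return f"#{red:02x}00{blue:02x}"


def color_nodes(energies: dict[int, float]) -> dict[int, str]:
    """Assign a hex color to every node id based on its energy."""
    if not energies:
        return {}
    e_min, e_max = min(energies.values()), max(energies.values())
    return {eq_id: energy_to_hex(e, e_min, e_max) for eq_id, e in energies.items()}


# Illustrative energies only.
colors = color_nodes({0: -10.0, 1: -7.5, 2: -5.0})
```

Here the lowest-energy node receives pure blue and the highest-energy node pure red, with intermediate nodes interpolated between the two.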
This reaction map viewer is implemented with VivaGraphJS,‖ which is a high-performance network graph rendering library. It supports dynamic layout of thousands of EQs and edges with the force-directed algorithm.
In the graph representation, when the user moves the mouse over a node, a window pops up to display the 3D structure of the atom coordinates (Fig. 6). Here, selecting the "switch view" button switches the display mode of the atom structure between a 2D and an interactive 3D model. In the 3D mode, users can rotate the atom structure with the mouse to look at the model from different viewpoints. The atom coordinates are displayed with ChemDoodle Web Components.** In addition, users can carry out simple graph analysis using this view. For example, betweenness centrality is calculated and the nodes that have higher betweenness values are highlighted. This may reveal important chemical reactions on the map. Users can also use other analysis methods such as frequency, closeness centrality, or PageRank.
In the map detail view, users can navigate to the list of EQs or edges in the map. By selecting one candidate, users can open the detail view of the selected EQ or edge.

Reaction path network data
Chemical reaction path networks generated using first principles calculations with the AFIR(QCaRA) method are chosen as the prototype data set, where QCaRA is designed to search for reactants from initial structures. [23][24][25][26][27][28] The data consist of nodes and edges, which represent EQs and reaction paths (defined as peak top (PT) in AFIR), respectively. In addition, the corresponding molecular structural information for EQs and reaction yields is included. Please see the previous work for details regarding the dataset. 23

Reaction analysis in SCAN
SCAN unveils unique features of the chemical reaction path network data generated by AFIR. The initial sets of molecules are presented as shown in Fig. 4. Here, AFIR searches various atomic configurations while the number of atoms remains the same. Thus, the chemical reaction path network generated by AFIR can be defined as the change of atomic configuration in a specified set of molecules and atoms. The question arises of how this network can be understood and identified. One approach is to use the provided reaction yields, as the AFIR data contain reaction yields at 200 K, 300 K, and 400 K. Fig. 7 shows the reaction path network of NC(=O)N.O.O as an example. Here, red indicates a high reaction yield, meaning that the red-colored nodes are potential candidates for reactants of the input structure, NC(=O)N.O.O. According to the AFIR(QCaRA) data, 23 these high-yield nodes have a higher chance of being reactants for the initial set of molecules. Some high-yield nodes have been previously experimentally validated. 29 Therefore, users can interactively inspect the potential reactants via SCAN. However, it must be noted that a great number of high-yield nodes are present in the network. From these data alone, it is unclear how to narrow down which nodes tend to occur more than others. Thus, the network reveals potential reactant candidates that form the initial structures, while identifying the exact set of molecules as reactants remains challenging.
SCAN is also equipped with data science techniques that can help with the analysis of the complex reaction path network. One technique is centrality analysis, for which SCAN offers betweenness centrality, closeness centrality, PageRank, and frequency analysis. These methods enable the search for key nodes based on the structure of the network data. Centrality analysis can be easily performed by choosing the corresponding tab in the visualization view.
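As an illustration of what betweenness centrality measures, the following sketch computes unnormalized betweenness for an unweighted, undirected graph using Brandes' algorithm. It is a generic textbook implementation and is not taken from the SCAN source code; how SCAN computes or normalizes the values is not stated in the text, and the adjacency-dictionary input format is an assumption.

```python
from __future__ import annotations

from collections import deque


def betweenness_centrality(adj: dict[int, set[int]]) -> dict[int, float]:
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order: list[int] = []
        preds: dict[int, list[int]] = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # w is seen for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting nodes farthest from s first.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted from both ends, so halve the totals.
    return {v: c / 2.0 for v, c in score.items()}


# A four-node chain 0-1-2-3: the two inner nodes carry all through-traffic.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
centrality = betweenness_centrality(chain)
```

In the chain example, the two inner nodes each lie on two shortest paths between other nodes and therefore score 2.0, while the end nodes score 0.0. In a reaction path network, high-scoring EQs are those through which many shortest routes between other EQs pass.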
Finally, a shortest path calculator is implemented in SCAN. Within SCAN, users can click the start and end nodes while holding the CTRL key. SCAN then automatically calculates the shortest path, which is highlighted in bright pink as shown in Fig. 8. Thus, SCAN offers data science tools for analyzing the chemical reaction path network with simple operations.
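A shortest path between two selected nodes can be found with a breadth-first search when every edge counts as one step. The sketch below makes that assumption explicitly; the text does not state whether SCAN measures path length by the number of edges or by an energy-based weight, and the function name and input format are hypothetical.

```python
from __future__ import annotations

from collections import deque


def shortest_path(adj: dict[int, list[int]], start: int, goal: int) -> list[int] | None:
    """Return one shortest path (fewest edges) from start to goal, or None."""
    if start not in adj or goal not in adj:
        raise KeyError("start and goal must both be nodes of the graph")
    parent: dict[int, int | None] = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            node: int | None = v
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for w in adj[v]:
            if w not in parent:  # first visit gives the shortest route to w
                parent[w] = v
                queue.append(w)
    return None  # goal is not reachable from start


# Illustrative graph: node 5 is isolated.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
route = shortest_path(graph, 0, 4)
```

If path length should instead reflect energy barriers, the breadth-first search would need to be replaced by a weighted method such as Dijkstra's algorithm.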

Conclusion
Searching Chemical Action and Network, SCAN, is designed and developed for interactive chemical reaction path network analysis. SCAN provides a user-friendly graphical user interface where users can access, visualize, and analyze the reaction path network. In particular, users can search and explore the chemical reaction path network generated by first principles calculations, and can also visualize and analyze the complex reaction path network via data science techniques with simple operations. SCAN source code is available at GitHub under the MIT license; thus, SCAN can be redistributed for any other chemical reaction path network data as long as the network data consist of nodes and edges. Hence, SCAN is proposed as an alternative environment for understanding complex reaction path networks, providing the ability to unveil the details of chemical reactions.

Data availability
Fourteen reaction maps are stored and published on the SCAN platform. All data available on the SCAN platform are licensed under the Creative Commons license "Attribution-NonCommercial-NoDerivatives 4.0 International" (CC BY 4.0). The source code of the SCAN platform is hosted at https://github.com/scan-team/scan-platform-test under GPL3.0 license.

Author contribution
JF and KT designed the architecture, JF, MK, and KT developed the platform. KT provided the data analysis tools. YH developed the grrmlog_parser. YH and SM calculated and analyzed the reaction path network.

Conflicts of interest
There are no conflicts to declare.